-
Notifications
You must be signed in to change notification settings - Fork 20
Open
Labels
backendRelated to backend codeRelated to backend codebugSomething isn't workingSomething isn't workingeffort:lowLikely a few hoursLikely a few hourspriority:highShould be addressed as a priorityShould be addressed as a priorityrepo:drone-tm
Description
- Create a database table
project_images:
Enums
-- 1. Create enum type for image status
CREATE TYPE image_status AS ENUM (
'staged', -- uploaded but not yet classified
'classified', -- successfully classified and moved
'invalid_exif', -- EXIF unreadable or missing
'unmatched', -- no task intersects this photo
'duplicate' -- hash collision within project
);Create table, partitioned:
CREATE TABLE project_images (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
project_id UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
task_id UUID REFERENCES tasks(id) ON DELETE SET NULL,
filename TEXT NOT NULL,
s3_key TEXT NOT NULL,
hash_md5 CHAR(32) NOT NULL,
location GEOMETRY(POINT, 4326), -- extracted from exif lat/lon
exif JSONB, -- The raw EXIF info if we need to process it again at some point
uploaded_by UUID REFERENCES users(id) ON DELETE SET NULL,
uploaded_at TIMESTAMP WITH TIME ZONE DEFAULT now(),
classified_at TIMESTAMP WITH TIME ZONE,
status image_status NOT NULL DEFAULT 'staged',
-- optional metadata
duplicate_of UUID REFERENCES project_images(id) ON DELETE SET NULL
);(Future plan if scalability becomes a problem) Partition by year:
CREATE TABLE project_images_partitioned (
LIKE project_images INCLUDING ALL
) PARTITION BY RANGE (uploaded_at);
CREATE TABLE project_images_2025
PARTITION OF project_images
FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');(for now performance should be fine with a good index, so don't do this)
Indexes
CREATE INDEX idx_project_images_project ON project_images(project_id);
CREATE INDEX idx_project_images_status ON project_images(status);
CREATE INDEX idx_project_images_hash ON project_images(hash_md5);
CREATE INDEX idx_project_images_task ON project_images(task_id);
-- Composite indexes for helpful speedup of common searches
CREATE INDEX IF NOT EXISTS idx_project_status_uploaded_at
ON project_images(project_id, status, uploaded_at);
-- To prevent duplicates:
CREATE UNIQUE INDEX uniq_project_hash
ON project_images(project_id, hash_md5);
-- Speed up classifier scans for staged images in `user-uploads` dir
CREATE INDEX idx_project_images_staged ON project_images(uploaded_at)
WHERE status = 'staged';
-- Geospatial index
CREATE INDEX idx_project_images_location ON project_images USING GIST (location);- Create a migration for each task to download related
images.jsonfor each, then insert into database in batches.
Something like this... (avoids dups)
INSERT INTO project_images (
project_id, task_id, filename, s3_key, hash_md5,
location, exif, uploaded_by, uploaded_at, status
)
VALUES
(:project_id, :task_id, :filename, :s3_key, :hash_md5,
ST_SetSRID(ST_MakePoint(:exif_lon, :exif_lat), 4326),
:exif::jsonb, :uploaded_by, :uploaded_at, :status)
ON CONFLICT (project_id, hash_md5) DO NOTHING;- Refactor backend code to use this table, and not the
images.jsonfile anymore.
#635 has additional details on the strategy going forward.
In summary, we should insert the EXIF for each uploaded image into our database, in batches during the sorting / classification process on the backend.
Metadata
Metadata
Assignees
Labels
backendRelated to backend codeRelated to backend codebugSomething isn't workingSomething isn't workingeffort:lowLikely a few hoursLikely a few hourspriority:highShould be addressed as a priorityShould be addressed as a priorityrepo:drone-tm