Tinkering with storing audio data in Hadoop DBs
Largely curiosity-driven, with the notion of using this tool as a way to reproduce issues with database technologies in Hadoop, while acknowledging that audio tracks are usually in the MB range (quite large for a DB).
The long-term view is to store slices of audio across standard units of time (i.e. frame count / sampling rate), annotated with some sentiment, allowing collections of different audio slices to be fetched based on that sentiment.
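As a minimal sketch of that slicing idea, assuming the signal is already a mono numpy array (function and variable names are illustrative, not from the codebase):

```python
import numpy as np

def slice_signal(signal: np.ndarray, sampling_rate: int, seconds: int = 1) -> list[np.ndarray]:
    """Split a mono signal into consecutive slices of `seconds` duration.

    Each slice spans seconds * sampling_rate frames; the final slice may
    be shorter if the signal does not divide evenly.
    """
    frames_per_slice = seconds * sampling_rate
    return [signal[i:i + frames_per_slice]
            for i in range(0, len(signal), frames_per_slice)]

# e.g. a 44.1 kHz track yields one slice per second of audio:
# slices = slice_signal(samples, sampling_rate=44100)
```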
The intended flow:
1). Client posts audio to storage (here an S3 bucket).
2). Audio files are then fetched by, say, a Spark job, which processes and imports them into a DB.
3). Audio data stored in the DB can be queried to return the original WAV.
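A rough sketch of steps 1 and 2 using boto3; the bucket and key names here are illustrative assumptions:

```python
import boto3

s3 = boto3.client("s3")

# 1) Client posts audio to S3 (bucket/key names are hypothetical).
s3.upload_file("track.wav", "audio-bucket", "incoming/track.wav")

# 2) A processing job (e.g. Spark) later fetches the object for import.
s3.download_file("audio-bucket", "incoming/track.wav", "/tmp/track.wav")
```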
A Track is the model of the data to be imported. It is composed of three model objects:
1). Audio - Holds the byte array; later, a map of byte-array chunks keyed by units of time derived from the sampling rate.
2). TrackMetaData - Holds metadata about the track, such as name, owner, and size.
3). AudioMetaData - Holds metadata about the audio signal, such as frame count, sampling rate, and duration.
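The model shapes might look roughly like the following; any field not named above is an assumption:

```python
from dataclasses import dataclass

@dataclass
class Audio:
    # Raw signal bytes; later a dict of byte chunks keyed by time unit.
    data: bytes

@dataclass
class TrackMetaData:
    name: str
    owner: str
    size: int  # bytes

@dataclass
class AudioMetaData:
    frame_count: int
    sampling_rate: int
    duration: float  # seconds

@dataclass
class Track:
    audio: Audio
    track_meta: TrackMetaData
    audio_meta: AudioMetaData
```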
The supporting classes:
1). AudioFactory - Constructs the model objects.
2). AudioHandler - Handles audio data as files and numpy conversions.
3). Workers - Tie together fetching audio with the AudioHandler and AudioFactory.
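For the AudioHandler's file-to-numpy conversion, one plausible route is the standard-library wave module; a sketch, assuming 16-bit PCM input (the function name is illustrative):

```python
import wave
import numpy as np

def wav_to_array(path: str) -> tuple[np.ndarray, int]:
    """Read a WAV file into a numpy array, returning (samples, sampling_rate)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    # Assumes 16-bit PCM; real code should check the sample width.
    samples = np.frombuffer(frames, dtype=np.int16)
    return samples, rate
```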
TODO:
1). Testing DatabaseWorker (HBase; see the HappyBase sketch after this list):
- Writing works fine, but should evaluate whether compression helps.
- Reading is in progress.
- Can be touched up while looking at exceptions and the logger.
- Much of the HappyBase usage deserves its own review pass, as this initial code was generated by ChatGPT.
2). Exceptions.
3). Logger.
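For reference while testing the DatabaseWorker, the HappyBase write/read round trip might look like this; the table name, column families, and row key are assumptions:

```python
import happybase

# Host of the HBase Thrift server; "tracks" is a hypothetical table name.
connection = happybase.Connection("localhost")
table = connection.table("tracks")

# Write: raw WAV bytes plus a little metadata under one row key.
with open("track.wav", "rb") as f:
    wav_bytes = f.read()
table.put(b"track-001", {
    b"audio:data": wav_bytes,
    b"meta:name": b"track.wav",
    b"meta:sampling_rate": b"44100",
})

# Read: fetch the row back and recover the original bytes.
row = table.row(b"track-001")
original_wav = row[b"audio:data"]
```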