Demo: Speech processing MFCC demo #603
base: main
This is pretty cool! I learn something every time I go through your PRs.
While this already looks good, I'm hoping we can polish the indexing parts a little more, and possibly figure out what we should change in the language to make code like this nicer to write.
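```
z = FToI (pow 2.0 15.0)
def W32ToI (x : Word32): Int =
  -- Reinterpret the low 16 bits of x as a two's complement integer.
  y:Int = internalCast _ x
  -- Bit patterns at or above 2^15 have the sign bit set, so they map to y - 2^16.
  select (y < z) y ((-1)*z + (y - z))
```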
Is that meant to reinterpret the Word32 as a two's complement encoded integer?
Yes! This is super hacky, but I couldn't come up with a better way. WAV files store int16s in this format.
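For reference, here is a small Python sketch of the same reinterpretation (not the PR's code; the name `u16_to_i16` is hypothetical). A value in [0, 2^16) holding a 16-bit two's complement pattern maps to itself below 2^15 and to `y - 2^16` from 2^15 upward. Note the strict `<` at the boundary: 0x8000 must decode to -32768, not 32768.

```python
import struct

HALF = 2 ** 15  # 32768: the first 16-bit pattern with the sign bit set

def u16_to_i16(y: int) -> int:
    """Reinterpret an unsigned 16-bit value as a two's complement int16."""
    if not 0 <= y < 2 ** 16:
        raise ValueError("expected a value in [0, 65536)")
    # Patterns below 2^15 are non-negative; the rest wrap around by 2^16.
    return y if y < HALF else y - 2 * HALF

# Cross-check against the standard library's int16 decoding.
for y in (0, 1, 0x7FFF, 0x8000, 0xFFFF):
    expected = struct.unpack("<h", struct.pack("<H", y))[0]
    assert u16_to_i16(y) == expected
```

For example, `u16_to_i16(0xFFFF)` gives -1 and `u16_to_i16(0x8000)` gives -32768.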
examples/alignment.dx
```
iter \i .
  case i < n of
    True ->
      _ = parse h (pChar ls.(i@_))
```
You might want to use `unsafeFromOrdinal` here. I guess we should add a `!@!` operator for that at some point.
What is the benefit? Is it faster?
Yup, faster both to compile and to execute. With a downside that if you do end up casting an integer out of range then you can get memory corruption and segfaults.
examples/alignment.dx
```
MelBins = Fin 28

hscale : ScaledRange Positive = AsScaledRange {start=0.0, end=samplerate / 2.0}
melscale : ScaledRange MelBins = AsScaledRange {start=mel 0.0, end=mel (samplerate / 2.0)}
```
Might be worth mentioning why you pick `samplerate / 2.0` as the end.
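The likely reason is the Nyquist limit: a signal sampled at `samplerate` Hz can only represent frequencies up to `samplerate / 2`, so the frequency axis and the mel axis both stop there. A minimal Python sketch of the mel-spaced range, assuming the common HTK-style formula `mel(f) = 2595 * log10(1 + f / 700)` (the PR's `mel` may use a different variant) and a hypothetical 16 kHz sample rate; `mel_to_hz` and `mel_band_edges` are illustrative names, not functions from the PR.

```python
import math

def mel(f_hz: float) -> float:
    # HTK-style mel formula (an assumption; the PR's `mel` may differ).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    # Exact inverse of `mel` above.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(samplerate: float, n_bins: int) -> list:
    """Return n_bins + 1 band edges in Hz, evenly spaced in mel from 0 to Nyquist."""
    nyquist = samplerate / 2.0  # highest frequency a sampled signal can represent
    lo, hi = mel(0.0), mel(nyquist)
    step = (hi - lo) / n_bins
    return [mel_to_hz(lo + k * step) for k in range(n_bins + 1)]

# 28 bins, matching `MelBins = Fin 28`; 16 kHz is a hypothetical sample rate.
edges = mel_band_edges(16000.0, 28)
```

The edges are dense at low frequencies and sparse near 8 kHz, which is the point of using the mel scale for speech.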
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
Still not fully cleaned up, but I made a second attempt at typed windowing and padding (added to the prelude for now). I think this better captures the core idea of running a sliding-window filter over a statically sized table. I might just be stubborn here, but I'm trying to avoid using List when it isn't necessary.
Re: how to go from Word8s to int16s, I figured this out the other day:
I'm pretty sure the code in the
Thanks, this is helpful. I wonder if these are both horribly inefficient, though. Also, in the WAV format the samples are two-byte ints. Any ideas?
Hmmm, for two-byte ints, I guess I would just add an Int16 type to the Dex compiler and prelude by copying all the Int32 stuff.
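Until an Int16 type exists, the decoding needs the byte arithmetic spelled out. 16-bit PCM WAV data stores each sample as a little-endian signed 16-bit integer, so every pair of bytes combines as `lo | (hi << 8)` followed by the two's complement sign fix. A minimal Python sketch of that arithmetic (the name `decode_pcm16` is hypothetical), cross-checked against `struct.unpack`:

```python
import struct

def decode_pcm16(data: bytes) -> list:
    """Decode little-endian signed 16-bit PCM samples (as in 16-bit WAV data)."""
    if len(data) % 2:
        raise ValueError("16-bit PCM data must have an even number of bytes")
    out = []
    for i in range(0, len(data), 2):
        lo, hi = data[i], data[i + 1]      # little-endian: low byte comes first
        u = lo | (hi << 8)                 # combine into an unsigned 16-bit value
        out.append(u - 0x10000 if u >= 0x8000 else u)  # two's complement sign fix
    return out

raw = bytes([0x01, 0x00, 0xFF, 0xFF, 0x00, 0x80, 0xFF, 0x7F])
samples = decode_pcm16(raw)
# The standard library's struct module gives the same result in one call.
assert samples == list(struct.unpack("<%dh" % (len(raw) // 2), raw))
```

Here `samples` is `[1, -1, -32768, 32767]`.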
```
m => (Window left right) => n =
  for i : m.
    for j : (Window left right).
      k = fromOrdinal _ $ (ordinal i) + (ordinal j)
```
This should really use `unsafeFromOrdinal`.
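The reason an unchecked index is safe here is an invariant of the padding: if the padded table has at least `m + width - 1` entries, then `i + j` is always in range. A Python sketch of the padded sliding window, assuming (hypothetically) that `Window left right` has `left + 1 + right` positions and that `pad` adds `left` and `right` fill values at the two ends; `pad` and `window` here are illustrative, not the PR's definitions.

```python
def pad(x, left, right, fill=0.0):
    # Extend the signal so every window around an original sample stays in range.
    return [fill] * left + list(x) + [fill] * right

def window(padded, n_out, width):
    """For each output position i, collect padded[i + j] for j in range(width)."""
    # Checked once up front: i + j <= (n_out - 1) + (width - 1) < len(padded).
    # This is the invariant that would justify an unchecked
    # (unsafeFromOrdinal-style) index inside the loop in the Dex version.
    assert len(padded) >= n_out + width - 1
    return [[padded[i + j] for j in range(width)] for i in range(n_out)]

signal = [1.0, 2.0, 3.0, 4.0]
left, right = 1, 1
frames = window(pad(signal, left, right), len(signal), left + 1 + right)
```

Each `frames[i]` holds the samples from `i - 1` to `i + 1`, with zeros past either end.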
```
' Needs to be called with `castTable` to do the striding.

def stride (tab : (m & Fin len)=>n) : m => n =
  for i. tab.(i, 0@_)
```
It would be better to do `0@_` outside of the loop, because at the moment we might fail to hoist it, and it's not free.
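A Python sketch of what `stride` computes, with the constant index bound once before the loop in the spirit of the hoisting suggestion. It assumes the `(m & Fin len)` table is laid out row-major as a flat list of `m * len` items (an assumption about `castTable`, not something the PR states); the function and parameter names are illustrative.

```python
def stride(tab, m, length):
    """Keep element 0 of each of the m consecutive groups of size `length`.

    `tab` is a flat list of m * length items viewed as an (m, length) table,
    mirroring the `(m & Fin len) => n` shape of the Dex version.
    """
    assert len(tab) == m * length
    first = 0  # bound once, outside the loop (the analogue of hoisting 0@_)
    return [tab[i * length + first] for i in range(m)]

picked = stride(list(range(12)), 4, 3)
```

With 12 items viewed as 4 groups of 3, `picked` is `[0, 3, 6, 9]`.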
```
  pad (for i. init) (for j. pad init tab.j)

def stride2 (tab : (m & Fin vlen)=>(n & Fin hlen) => o) : m => n => o =
  for i j. tab.(i, 0@_).(j, 0@_)
```
For now can we keep those functions in the example? I'm thinking of adding some support for windowing via a generalization of the tiling infrastructure we already have (windowing is just tiling with overlapping tiles!)
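The "windowing is just tiling with overlapping tiles" idea can be seen in one function with separate tile size and step: `step == size` gives ordinary non-overlapping tiles, and `step < size` gives overlapping windows. A Python sketch (the name `tiles` is hypothetical and is not Dex's tiling API):

```python
def tiles(x, size, step):
    """Return every full tile x[s : s + size] whose start s is a multiple of step.

    step == size gives ordinary non-overlapping tiling;
    step < size gives overlapping tiles, i.e. windowing.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [x[s:s + size] for s in range(0, len(x) - size + 1, step)]

x = list(range(8))
plain = tiles(x, 4, 4)       # non-overlapping tiles
windows = tiles(x, 4, 2)     # overlapping tiles: windows advancing by 2
```

`plain` is `[[0, 1, 2, 3], [4, 5, 6, 7]]`, and `windows` adds the overlapping tile `[2, 3, 4, 5]` between them.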
```
PostWindow = Fin $ idiv (size Datsize) (size Step)

frame_split : PostWindow => FrameWindow => Float =
  stride $ castTable (_ & Step) $ window $ pad 0.0 signal
```
wow this is pretty neat
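To see why the `frame_split` pipeline produces strided frames, here is the same composition in Python under stated assumptions: `pad` appends `frame_len - 1` zeros at the end, `window` produces one window per original sample, and `castTable (_ & Step)` groups consecutive windows in row-major order, so `stride` keeps every `step`-th window. These are guesses about the PR's helpers, and the Python `frame_split` is a hypothetical stand-in.

```python
def frame_split(signal, frame_len, step):
    """Sketch of pad -> window -> group by step -> keep the first of each group."""
    n = len(signal)
    assert n % step == 0, "this sketch assumes the length is a multiple of step"
    padded = list(signal) + [0.0] * (frame_len - 1)              # pad
    windows = [padded[i:i + frame_len] for i in range(n)]         # one window per sample
    groups = [windows[g * step:(g + 1) * step] for g in range(n // step)]  # (n/step, step)
    return [group[0] for group in groups]                         # stride: keep first

signal = [float(v) for v in range(1, 9)]
frames = frame_split(signal, frame_len=4, step=2)
```

Building every window and then discarding all but one per group is the literal reading of the pipeline; slicing `padded[g * step : g * step + frame_len]` directly gives the same frames without the intermediate list.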
This is a demo of speech preprocessing. It is mostly an attempt to try out some applied signal processing and see what is possible. It includes
Things that are great:
Things that I am still struggling with:
Things that are a problem: