MMX implementation of LV::Video's blit_overlay_alphasrc() is broken #230
Comments
Consider rewriting this using SIMD intrinsics. Its equivalent intrinsic Or use ORC, if it's possible.
I think ORC is a bit overkill. However, writing this in SIMD intrinsics would be great. I am no longer familiar with the code, but I am sure there were some prefetchnta instructions or the like nearby, and those are probably important for good throughput.
@dsmit, yeah, you're right. Prefetch only exists from SSE onwards, though. I rewrote the MMX code using intrinsics and ran it through Godbolt. It seems GCC turns them into SSE code using XMM registers anyway. Clang keeps using the MM registers but throws in SSE shuffle instructions. Maybe it's time to drop MMX and use SSE2, which was introduced in 2000 with the Pentium 4.
I think 3DNow! had some prefetch instructions, and we used those, although that was a long time ago and it's stuff I haven't worked with for over a decade. But I'd drop MMX in a heartbeat and move to 128/256-bit wide SIMD. I didn't know using intrinsics gave such wild variations between implementations.
@dsmit, it's probably because GCC and Clang target SSE by default on x86-64. Since SSE is a superset of MMX, the compiler is free to use wider registers and additional instructions to achieve the same results.
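One way to check which SIMD baseline a compiler has enabled is to look at its predefined macros. The sketch below is an illustration only: `simd_baseline()` is a hypothetical helper, not part of the project, and it relies on the `__MMX__`, `__SSE__` and `__SSE2__` macros that GCC and Clang define when those instruction sets are enabled (MSVC uses different macros).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: lists the x86 SIMD feature macros that GCC/Clang
// defined for this translation unit, or "none" if there are none.
inline std::string simd_baseline()
{
    std::string out;
#if defined(__MMX__)
    out += "MMX ";
#endif
#if defined(__SSE__)
    out += "SSE ";
#endif
#if defined(__SSE2__)
    out += "SSE2 ";
#endif
    if (out.empty())
        out = "none";
    assert(!out.empty());  // always reports something
    return out;
}
```

Printing the result of `simd_baseline()` from a build without extra `-m` flags shows what the compiler is allowed to use when it lowers intrinsics.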
VideoBlit::blit_overlay_alphasrc_mmx() is the SIMD implementation of VideoBlit::blit_overlay_alphasrc() using x86 MMX instructions. The MMX register mm6 is used for an unpack, but its value is never initialized. This could be due to confusion introduced when the code was translated into AT&T syntax.
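To illustrate the kind of fix discussed here, below is a minimal sketch of a per-pixel alpha blend written with SSE2 intrinsics, where the zero register used by the unpack is initialized explicitly. It is not the project's code. The blend formula (`dst += (alpha * (src - dst)) >> 8`, with alpha read from byte 3 of the source pixel and applied to all four bytes), the 4-bytes-per-pixel layout, and the names `blend_pixel_scalar`, `blend_pixel` and `blend_row` are all assumptions made for the example. Without SSE2 it falls back to the scalar path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Scalar reference (assumed formula): dst += floor(alpha * (src - dst) / 256)
// on all four bytes of a pixel, alpha taken from byte 3 of src.
inline void blend_pixel_scalar(std::uint8_t* dst, const std::uint8_t* src)
{
    const int alpha = src[3];
    for (int c = 0; c < 4; ++c) {
        // Keep only the low 16 bits of the product, as a 16-bit SIMD multiply does.
        const std::uint16_t prod =
            static_cast<std::uint16_t>(alpha * (src[c] - dst[c]));
        dst[c] = static_cast<std::uint8_t>(dst[c] + (prod >> 8));
    }
}

// Same blend with SSE2 when available; src and dst must not overlap.
inline void blend_pixel(std::uint8_t* dst, const std::uint8_t* src)
{
#if defined(__SSE2__)
    std::int32_t s32, d32;
    std::memcpy(&s32, src, 4);
    std::memcpy(&d32, dst, 4);

    // The unpack needs a register that is known to be zero; leaving it
    // uninitialized is the kind of bug described in this issue.
    const __m128i zero  = _mm_setzero_si128();
    const __m128i alpha = _mm_set1_epi16(src[3]);

    // Widen the four bytes of each pixel to 16-bit lanes.
    const __m128i s = _mm_unpacklo_epi8(_mm_cvtsi32_si128(s32), zero);
    const __m128i d = _mm_unpacklo_epi8(_mm_cvtsi32_si128(d32), zero);

    __m128i t = _mm_mullo_epi16(_mm_sub_epi16(s, d), alpha);  // low 16 bits
    t = _mm_srli_epi16(t, 8);       // bits 8..15 of each product
    t = _mm_add_epi8(d, t);         // byte-wise add, wraps modulo 256
    t = _mm_packus_epi16(t, zero);  // narrow back to bytes

    d32 = _mm_cvtsi128_si32(t);
    std::memcpy(dst, &d32, 4);
#else
    blend_pixel_scalar(dst, src);
#endif
}

// Blends n_pixels consecutive 4-byte pixels from src onto dst.
inline void blend_row(std::uint8_t* dst, const std::uint8_t* src,
                      std::size_t n_pixels)
{
    assert(n_pixels == 0 || (dst != nullptr && src != nullptr));
    for (std::size_t i = 0; i < n_pixels; ++i)
        blend_pixel(dst + 4 * i, src + 4 * i);
}
```

A real replacement would process several pixels per iteration and follow whatever the scalar blit_overlay_alphasrc() does with the destination alpha byte; this sketch only shows the zeroed-register unpack pattern on one pixel at a time.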