MMX implementation of LV::Video's blit_overlay_alphasrc() is broken #230

kaixiong · 2023-01-31T22:38:55Z

VideoBlit::blit_overlay_alphasrc_mmx() is the SIMD implementation of _VideoBlit::blit_overlay_alphasrc() using x86 MMX instructions. The MMX register mm6 is used for an unpack but its value is never iniitalized. Could be a confusion due to a translation into AT&T syntax.

movd %[spix], %%mm0
movd %[dpix], %%mm1
movq %%mm0, %%mm2
movq %%mm0, %%mm3
psrlq $24, %%mm2
movq %%mm0, %%mm4
psrld $24, %%mm3
psrld $24, %%mm4
psllq $32, %%mm2
psllq $16, %%mm3
por %%mm4, %%mm2
punpcklbw %%mm6, %%mm0    ; <-- mm6 is never initialized (mm0 is destination, mm6 is source)
por %%mm3, %%mm2
punpcklbw %%mm6, %%mm1    ; <-- mm6 is never initialized (mm0 is destination, mm6 is source)
psubsw %%mm1, %%mm0
pmullw %%mm2, %%mm0
psrlw $8, %%mm0
paddb %%mm1, %%mm0
packuswb %%mm0, %%mm0
movd %%mm0, %[dest]

The text was updated successfully, but these errors were encountered:

kaixiong · 2023-01-31T22:44:50Z

Consider rewriting this using SIMD intrinsics. Its equivalent intrinsic _mm_unpackhi_pi16 returns the result, rather than modifying one operand in place.

Or use ORC, if it's possible.

dsmit · 2023-02-02T14:39:51Z

I think ORC is a bit overkill.

However writing this in SIMD intrinsics would be great, I am no longer familiair with the code, but I am sure there were some prefetchnta's or alikes nearby, those are probably important for good throughput.

…ntrinsics (#230).

kaixiong · 2023-02-02T21:35:07Z

@dsmit, yeah you're right. Prefetch only exists from SSE and onwards though.

I rewrote the MMX code using intrinsics and ran it through Godbolt. Seems like GCC turns them into SSE using XMM registers anyway. Clang keeps the use of MM registers but throws in SSE shuffling instructions.

Maybe it's time to drop MMX and use SSE2. SSE2 was introduced in 2000 to P4s.

dsmit · 2023-02-03T09:57:27Z

I think 3DNow! had some prefetch instructions, and we've used those, although this is a long time ago, and stuff I haven't been working with for over a decade.

But I'd drop MMX in a heartbeat and move to 128/256 wide SIMD.

I didn't know using intrinsics gave such wild variations between implementations.

…ntrinsics (#230).

kaixiong · 2023-02-05T02:41:14Z

@dsmit, it's probably because x86-64 GCC and Clang targets SSE by default. Since SSE is a superset of MMX, the compiler is free to use wider registers and additional instructions to achieve the same results.

…ntrinsics (#230).

kaixiong added bug critical labels Jan 31, 2023

kaixiong added this to the 0.4.2 milestone Jan 31, 2023

kaixiong self-assigned this Jan 31, 2023

kaixiong added a commit that referenced this issue Feb 2, 2023

Core (LV::Video): Rewrite MMX alpha blending of 32-bit videos using i…

1ec968b

…ntrinsics (#230).

kaixiong mentioned this issue Feb 2, 2023

Core (LV::Video) Fix alpha blending of 32-bit videos (#230) #244

Open

kaixiong added a commit that referenced this issue Feb 4, 2023

Core (LV::Video): Rewrite MMX alpha blending of 32-bit videos using i…

ed61ea4

…ntrinsics (#230).

kaixiong added a commit that referenced this issue Feb 5, 2023

Core (LV::Video): Rewrite MMX alpha blending of 32-bit videos using i…

0188f89

…ntrinsics (#230).

kaixiong added a commit that referenced this issue Feb 11, 2023

Core (LV::Video): Rewrite MMX alpha blending of 32-bit videos using i…

8eb5707

…ntrinsics (#230).

hartwork removed this from the 0.4.2 milestone Mar 20, 2023

kaixiong added a commit that referenced this issue Mar 31, 2023

Core (LV::Video): Rewrite MMX alpha blending of 32-bit videos using i…

cac3d91

…ntrinsics (#230).

kaixiong added a commit that referenced this issue Feb 8, 2024

Core (LV::Video): Rewrite MMX alpha blending of 32-bit videos using i…

937abfa

…ntrinsics (#230).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MMX implementation of LV::Video's blit_overlay_alphasrc() is broken #230

MMX implementation of LV::Video's blit_overlay_alphasrc() is broken #230

kaixiong commented Jan 31, 2023

kaixiong commented Jan 31, 2023

dsmit commented Feb 2, 2023

kaixiong commented Feb 2, 2023

dsmit commented Feb 3, 2023

kaixiong commented Feb 5, 2023

MMX implementation of LV::Video's blit_overlay_alphasrc() is broken #230

MMX implementation of LV::Video's blit_overlay_alphasrc() is broken #230

Comments

kaixiong commented Jan 31, 2023

kaixiong commented Jan 31, 2023

dsmit commented Feb 2, 2023

kaixiong commented Feb 2, 2023

dsmit commented Feb 3, 2023

kaixiong commented Feb 5, 2023