SSE detection made wrong assumptions on SSSE3-less processor #140
Hello,
On some processors the decoder executes SSE instructions that the CPU does not support.
If I run the decoder on an AMD Turion, it crashes immediately with:
Error: SIGILL – Illegal instruction (core dumped)
This pull request tries to fix the issue by avoiding the unavailable instructions.
Regards,
Olaf Schulz
========================================================================
The cause is the use of SSSE3 or SSE4.1 instructions in functions that
are assumed to be SSE2 compatible.
The following list gives the offending instruction for each failing function.
It could be useful to restore older variants of the functions, if available.
(I have not checked whether such variants are available.)
SSSE3 is reported as available, but the processor does not support it.
Detected flags:
MMX:1 SSE:1 SSE2:1 SSE3:1 SSSE3:1 SSE4a:0 SSE4_1:0 SSE4_2:0 AVX:0 AVX2:0
                          ^^^^^^^ SSSE3 is wrongly detected on the AMD Turion, see Appendix A
Solution: Use the correct register, ecx, not edx.
int have_SSSE3 = !!(ecx & (1<< 9));
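As a cross-check of the register choice, here is a minimal sketch of decoding CPUID leaf 1, assuming the bit positions documented in the Intel and AMD manuals (MMX, SSE, SSE2 in EDX; SSE3, SSSE3, SSE4.1, SSE4.2 in ECX). The names `CpuFlags` and `decode_leaf1` are hypothetical and not part of the decoder:

```cpp
#include <cstdint>

// Hypothetical container for the flags discussed above.
struct CpuFlags {
  bool mmx, sse, sse2;                 // reported in EDX
  bool sse3, ssse3, sse4_1, sse4_2;    // reported in ECX
};

// Decode the register values returned by CPUID with EAX=1.
// Taking the registers as arguments keeps the bit logic testable
// without executing CPUID itself.
inline CpuFlags decode_leaf1(uint32_t ecx, uint32_t edx) {
  CpuFlags f;
  f.mmx    = (edx >> 23) & 1;
  f.sse    = (edx >> 25) & 1;
  f.sse2   = (edx >> 26) & 1;
  f.sse3   = (ecx >>  0) & 1;
  f.ssse3  = (ecx >>  9) & 1;  // ECX, not EDX: EDX bit 9 is the APIC flag
  f.sse4_1 = (ecx >> 19) & 1;
  f.sse4_2 = (ecx >> 20) & 1;
  return f;
}
```

Since bit 9 of EDX reports the on-chip APIC, which the Turion has (see `apic` in Appendix A), testing that bit would explain the false SSSE3:1. With GCC or Clang, the register values could be obtained with `__get_cpuid(1, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`.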
Nemiver debugger stopped at
0x00007ffff7bb108e <_Z18sao_band_8bit_sse2PhiPKhiiiiiiii+110>: pshufb %xmm0,%xmm1
pshufb is an SSSE3 instruction, which the processor does not support.
Solution: Change the line to the following (and rename the function).
if (have_SSSE3) { accel->sao_band_8 = sao_band_8bit_sse2; }
Nemiver debugger stopped at
0x00007ffff7ba89df <_Z27put_hevc_luma_direct_8_sse2PslPKhliiS_+207>: pinsrq $0x0,(%r11),%xmm0
pinsrq is an SSE4.1 instruction.
Solution: Move the function into the if (have_SSE4_1) block (and restore
the older SSE2 variant, if available).
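The gating described in this solution can be sketched as follows. This is only an illustration under assumptions: `AccelTable`, `kernel_fn`, `init_accel` and the three `kernel_*` stand-ins are hypothetical names, and `kernel_sse2` represents the older SSE2 variant only if one exists.

```cpp
// Hypothetical function-pointer type and table; the real table has other members.
typedef int (*kernel_fn)(int);

// Stand-ins for the real kernels; each returns a tag identifying the variant.
inline int kernel_scalar(int) { return 0; }
inline int kernel_sse2(int)   { return 2; }
inline int kernel_sse4_1(int) { return 41; }

struct AccelTable { kernel_fn put_luma; };

// Select the most capable variant whose instruction set the CPU supports.
// A kernel is only installed under the flag of the newest instruction it uses.
inline void init_accel(AccelTable* accel, bool have_SSE2, bool have_SSE4_1) {
  accel->put_luma = kernel_scalar;                       // always safe
  if (have_SSE2)   { accel->put_luma = kernel_sse2; }    // SSE2-only code
  if (have_SSE4_1) { accel->put_luma = kernel_sse4_1; }  // may use pinsrq
}
```

On a CPU with SSE2 but without SSE4.1, the table then keeps the SSE2 (or scalar) entry instead of a function containing pinsrq.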
Nemiver debugger stopped at
0x00007ffff7ba8b3f <_Z29put_hevc_chroma_direct_8_sse2PslPKhliiiiS_+207>: pinsrq $0x0,(%r11),%xmm0
Solution: Like 2.
if (have_SSE2) { [...]
Switch for put_pred_8_sse2 and put_bipred_8_sse2. Both functions contain the
instruction pextrd (SSE4.1):
Nemiver debugger stopped at
0x00007ffff7ba7f6f <_Z15put_pred_8_sse2PhlPKslii+447>: pextrd $0x0,%xmm0,%r8d
0x00007ffff7ba8107 <_Z17put_bipred_8_sse2PhlPKsS1_lii+375>: pextrd $0x0,%xmm0,%r10d
Solution: Like 2.
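If an SSE2-only variant is wanted instead, one possible approach is sketched below. It assumes the SSE4.1 instruction is only needed to pull out the lowest 32-bit lane, as the `$0x0` immediate in the pextrd disassembly suggests; I have not verified this against the actual source. The helper `pack4_store` is hypothetical. It uses `_mm_cvtsi128_si32` (movd), which is part of SSE2.

```cpp
#include <cstdint>
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics only, no SSSE3/SSE4.1 headers

// Hypothetical helper: saturate four 16-bit values to 8 bit and store 4 bytes.
inline void pack4_store(uint8_t* dst, const int16_t* src) {
  __m128i v = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(src)); // movq
  v = _mm_packus_epi16(v, v);                // unsigned saturation to 8 bit
  const int32_t low = _mm_cvtsi128_si32(v);  // movd: lane 0, available in SSE2
  std::memcpy(dst, &low, sizeof(low));
}
#else
// Portable fallback so the sketch also builds on non-x86 targets.
inline void pack4_store(uint8_t* dst, const int16_t* src) {
  for (int i = 0; i < 4; i++) {
    int v = src[i];
    dst[i] = static_cast<uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
  }
}
#endif
```

The compiler flags used for the file matter as well: if SSE4.1 code generation is enabled for the whole translation unit, the compiler is free to emit SSE4.1 instructions even for code written with SSE2 intrinsics.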
Appendix A)
cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 36
model name : AMD Turion(tm) 64 Mobile Technology MT-32
stepping : 2
cpu MHz : 800.000
cache size : 512 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good nopl pni lahf_lm vmmcall
bogomips : 1600.02
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
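The flags line can also be checked programmatically. In /proc/cpuinfo on Linux, SSE3 is listed as `pni` and SSSE3 as `ssse3`. The line above contains `pni` but no `ssse3` or `sse4_1`. A minimal sketch follows (the helper name `has_cpu_flag` is hypothetical). It matches whole words, so `sse` does not match inside `sse2`:

```cpp
#include <sstream>
#include <string>

// Return true if `name` appears as a whole word in a space-separated flags line.
inline bool has_cpu_flag(const std::string& flags_line, const std::string& name) {
  std::istringstream in(flags_line);
  std::string word;
  while (in >> word) {
    if (word == name) return true;
  }
  return false;
}
```

For the Turion flags listed above, `has_cpu_flag(flags, "pni")` is true while `has_cpu_flag(flags, "ssse3")` is false.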