Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed multithreading on E2K #32

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

kosumosu
Copy link

  • Fixed multithreading problem in l_studio.cpp when cache entry was not locked and could be destroyed in callback on async IO thread (should affect all architectures).
  • Added alternative thread-safe collections implementation, since e2k does not support 128-bit atomics. New implementation could be improved though, since atomic shared_ptr operations seem to use mutexes inside. At least it works.
  • Simplified code handling sequential jobs in viewrender.cpp since thread pool supports serialization of jobs anyway.
  • Plus some smaller changed (see commit messages)

@@ -906,7 +908,7 @@ class CModelRender : public IVModelRender,
void DrawModelExecute( IMatRenderContext *pRenderContext, const DrawModelState_t &state, const ModelRenderInfo_t &pInfo, matrix3x4_t *pCustomBoneToWorld = NULL );

void InitColormeshParams( ModelInstance_t &instance, studiohwdata_t *pStudioHWData, colormeshparams_t *pColorMeshParams );
CColorMeshData *FindOrCreateStaticPropColorData( ModelInstanceHandle_t handle );
std::shared_ptr<CColorMeshData> FindOrCreateStaticPropColorData( ModelInstanceHandle_t handle );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would advise against using std:: stuff in the engine, you should instead use one of the custom classes in public/tierX (maybe CRefPtr), that way someone later can go and optimize that class and have it affect the entire codebase

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found no ready-to-use primitives there. CRefPtr and other stuff will require CColorMeshData to implement ref counting inside it. It also have to be thread safe. I'd better rely it on using std::shared_ptr instead. First, correctness is better then speed. And second, I don't think it will impact performance here - there's plenty of code between creation and release.
I would insist on accepting it as is and improving in another PR.

#endif

//-----------------------------------------------------------------------------

PLATFORM_INTERFACE bool RunTSQueueTests( int nListSize = 10000, int nTests = 1 );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these 2 lines need to go in your tslist_alternative file, it's redefined in tslists_atomics.h

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@LWSS
Copy link
Contributor

LWSS commented May 29, 2022

Simplified sequential jobs code

this commit rapes my performance. I get around 15fps (vs 300 locked)
2022-05-28-201650_1335x967_scrot

@alexvornoffice
Copy link

Simplified sequential jobs code

this commit rapes my performance. I get around 15fps (vs 300 locked)

So what FPS boost do we get from using one core to more cores? (4 or 8)

@r-a-sattarov
Copy link
Contributor

Simplified sequential jobs code
this commit rapes my performance. I get around 15fps (vs 300 locked)

So what FPS boost do we get from using one core to more cores? (4 or 8)

On Elbrus-801 PC (Elbrus-8C, 32Gb DDR3-ECC, Radeon R7 240 2Gb) FPS increases by 2-3 times.

before
Снимок экрана_2022-08-12_10-32-25

after (with host_thread_mode 1)
Снимок экрана_2022-08-12_11-17-59

@kosumosu
Copy link
Author

kosumosu commented Feb 21, 2023

Simplified sequential jobs code

this commit rapes my performance. I get around 15fps (vs 300 locked))

Reverted this commit. I cannot check how it impacts performance since the game just doesn't work with nvidia drivers on linux amd64. So far I figured out it locks vertex buffer while it is still locked on another thread. Usually this happens when rendering quads for text. But this is unrelated issue.

@dbachilo
Copy link

Can confirm that e2k performance improved significantly. On Elbrus-8CB (e2k_v5 arch) with Radeon RX 570 playing on cs_office map I used to get 11-13 fps in open spaces and 19-20 fps inside buildings. After building e2k_fixup branch I get 20-25 and 30-40 fps respectively.

@deogar
Copy link

deogar commented Sep 18, 2023

Hi everyone, so I've compiled both main branch and this PR and haven't found any measurable drop of performance on my system (AMD Ryzen 5800HS, Radeon Vega 8 Integrated Graphics, Arch Linux kernel 6.5.2).

Both builds felt pretty much the same during playing (de_dust2, CT respawn -> main gates with bots), I've recorded first 30 seconds of match metrics using MangoHUD and here are the results:

Run 1: Main branch
Average FPS: 184.4
1% Min FPS: 65.4
0.1% Min FPS: 36.5
GPU Load: 69.4
CPU Load: 14.1

Run 2: PR #32
Average FPS: 215.1
1% Min FPS: 57.9
0.1% Min FPS: 41.0
GPU Load: 73.1
CPU Load: 12.5

Hope this helps!

csgo_linux64_2023-09-18_02-43-43_summary.csv
csgo_linux64_2023-09-18_02-39-36_summary.csv

@blackletum
Copy link

considering this project hasn't gotten any commits in 15 months, you should fork this and continue work on it :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants