Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tsan fix browsethread #13872

Open
wants to merge 8 commits into
base: 2.5
Choose a base branch
from
Open

tsan fix browsethread #13872

wants to merge 8 commits into from

Conversation

m0dB
Copy link
Contributor

@m0dB m0dB commented Nov 10, 2024

Fixes #13861 and improves the code quality of BrowseThread a bit.

start(QThread::LowPriority);

qDebug() << "Wait to start browser background thread";
// Wait until the thread is running
Copy link
Contributor Author

@m0dB m0dB Nov 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this fixes the data race

Copy link
Contributor Author

@m0dB m0dB Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed. This is the fix for the data race. Connecting the signals/slots while the thread is not yet running triggers the issue.

static QWeakPointer<BrowseThread> s_weakInstanceRef;
static QMutex s_mutex;

s_mutex.lock();
Copy link
Contributor Author

@m0dB m0dB Nov 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or maybe this mutex placement. either way, both are improvements.

EDIT: no, I reverted this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's understand and double check that. The old code IMHO does not suffere any issues: https://doc.qt.io/qt-6/qsharedpointer.html#thread-safety -> Reference counting is thread safe. We only must protect the s_weakInstanceRef = strong.toWeakRef(); case which the rety code does in the original version. The old code is lock free exept on the first invocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will check again where the issue comes from, here or the thread being used before being started.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. The original code was correct. I have reverted it, except for making

   static QWeakPointer<BrowseThread> s_weakInstanceRef;
   static QMutex s_mutex;

local variables in the function instead of globals.

Please resolve if you are okay with the latest version.

m_path_mutex.lock();
void BrowseThread::requestPopulateModel(mixxx::FileAccess path, BrowseTableModel* client) {
// Inform the thread to populate a new path.
// Note: if another request is currently being processed, this will replace it.
Copy link
Contributor Author

@m0dB m0dB Nov 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is explained in the comment in line 76 seems weird to me. it makes sense when the client is the same, but what if it's a different client? anyway, I didn't want to change the behaviour.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it works, because only one table is shown at a time. But indeed an area of improvement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense.

Copy link
Member

@daschuer daschuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments. I have not yet fully understand the issue.

src/library/browse/browsethread.cpp Outdated Show resolved Hide resolved
@m0dB
Copy link
Contributor Author

m0dB commented Nov 11, 2024

Some comments. I have not yet fully understand the issue.

To explain the issue: tsan reported a data race in browsetablemodel.cpp:163 when connecting the table model with the result from BrowseThread::getInstanceRef(). IIRC the fix is waiting until the thread is started before returning from getInstanceRef (in the BrowseThread constructor) but I also changed getInstanceRef itself because the placing of the mutex didn’t seem correct to me.

The other changes are mainly esthetics / code clarity.

I do have my doubts about this code, it seems strange to me that a new populate request for any browsetablemodel aborts the running request. But I didn’t want to change to behavior.

static QWeakPointer<BrowseThread> s_weakInstanceRef;
static QMutex s_mutex;

s_mutex.lock();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's understand and double check that. The old code IMHO does not suffere any issues: https://doc.qt.io/qt-6/qsharedpointer.html#thread-safety -> Reference counting is thread safe. We only must protect the s_weakInstanceRef = strong.toWeakRef(); case which the rety code does in the original version. The old code is lock free exept on the first invocation.

m_path_mutex.lock();
void BrowseThread::requestPopulateModel(mixxx::FileAccess path, BrowseTableModel* client) {
// Inform the thread to populate a new path.
// Note: if another request is currently being processed, this will replace it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it works, because only one table is shown at a time. But indeed an area of improvement.

src/library/browse/browsethread.cpp Outdated Show resolved Hide resolved

QMutex m_mutex;
QWaitCondition m_locationUpdated;
volatile bool m_bStopThread;
QWaitCondition m_condition;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QWaitConditions are normally named for the condition that shall become true. Just "condition" is a bit poor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the thing is, I am using it for 2 things: waiting for the thread to be started and waiting for a new populate request.

Copy link
Contributor Author

@m0dB m0dB Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have renamed the member vars for clarity, to express that these are requests (between the main thread and the browse thread) and what. Please review and resolve.

m_mutex.lock();
while (!m_bRun) {
m_condition.wait(&m_mutex);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not understood the loop. Isn't there only one event/codition we are waiting for? m_bRun shall become a semaphore? On the other hand syncronizing the thread start by halting the main thread feels wrong. It would be better to protect the shared sources explicit. What are the shared resources in this case we have tsan complains?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we are waiting for m_bRun to be set to true, as set by the thread::run method. A semaphore would work as well, but since we already have the condition, I take advantage of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not very clear to me what this problem is exactly, but this is the tsan report:

WARNING: ThreadSanitizer: data race (pid=48840)
  Read of size 8 at 0x0001466d5f28 by thread T29:
    #0 void doActivate<false>(QObject*, int, void**) <null>:230508612 (mixxx:arm64+0x10011f4e8)
    #1 QMetaObject::activate(QObject*, QMetaObject const*, int, void**) <null>:230508612 (mixxx:arm64+0x10011db88)
    #2 QThread::started(QThread::QPrivateSignal) <null>:230508612 (mixxx:arm64+0x1001e8c5c)
    #3 QThreadPrivate::start(void*) <null>:230508612 (mixxx:arm64+0x100248f38)

  Previous write of size 8 at 0x0001466d5f28 by main thread (mutexes: write M0, write M1, write M2):
    #0 QObjectPrivate::addConnection(int, QObjectPrivate::Connection*) <null>:230508612 (mixxx:arm64+0x10011bdac)
    #1 QObjectPrivate::connectImpl(QObject const*, int, QObject const*, void**, QtPrivate::QSlotObjectBase*, int, int const*, QMetaObject const*) <null>:230508612 (mixxx:arm64+0x100121dc4)
    #2 QObject::connectImpl(QObject const*, void**, QObject const*, void**, QtPrivate::QSlotObjectBase*, Qt::ConnectionType, int const*, QMetaObject const*) <null>:230508612 (mixxx:arm64+0x100121790)
    #3 QMetaObject::Connection QObject::connect<void (BrowseThread::*)(BrowseTableModel*), void (BrowseTableModel::*)(BrowseTableModel*)>(QtPrivate::FunctionPointer<void (BrowseThread::*)(BrowseTableModel*)>::Object const*, void (BrowseThread::*)(BrowseTableModel*), QtPrivate::FunctionPointer<void (BrowseTableModel::*)(BrowseTableModel*)>::Object const*, void (BrowseTableModel::*)(BrowseTableModel*), Qt::ConnectionType) qobject.h:223 (mixxx:arm64+0x100b046e4)
    #4 BrowseTableModel::BrowseTableModel(QObject*, TrackCollectionManager*, RecordingManager*) browsetablemodel.cpp:163 (mixxx:arm64+0x100b05824)

The code in the main thread:

  162     m_pBrowseThread = BrowseThread::getInstanceRef();                 
  163     connect(m_pBrowseThread.data(),                                   
  164             &BrowseThread::clearModel,                                
  165             this,
  166             &BrowseTableModel::slotClear,                             
  167             Qt::QueuedConnection);

So, my conclusion was that the connection to the thread before it was running was problematic.

I will re-assess that this is indeed the case, but I do think that is it. And if so, I don't see a problem with waiting for the thread to have started. That happens only once, and it will be fast anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A semaphore would work as well, but since we already have the condition, I take advantage of it.

This reusing makees the code harder to read. So I would prefer two primitived for two tasks.

The ThreadSanitizer warning is really confusing, what does it check for? Whether a variable is written/read from two threads? It must be a control structure in Qt. However waiting for the thread to start should not change the issue for my undertsanding.

But if it is the solution i don't mind.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the solution work? I am afraid instead of a solution, we trick ThreadSanitizer.
Can, it happen that it treated the unrelated mutex as a proper guard?
Qt allows to send signals and reconnect them concurrently. So it should not be a problem.
Maybe only a missing atomic annotation in Qt?

Is there a way to find out what 0x0001466d5f28 is and which mutexted I are considered?

Or is it s QT bug? Is 2.4 with Qt 5 affected as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, I am now building Qt5. But it is on purpose that the 2.5 branch of mixxx vcpkg has Qt6 as the default? I am confused now as to what we are supposed to use for Mixxx 2.5.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am running into lots of problems trying to build against Qt5, so it would be much appreciated if you could help me building Qt6 with vcpkg with this Qt sanitize thread option.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixxx 2.4.2 will be our last Qt5 release. So there is IMHO no need to deal with QT5 in this case. We just keep the Qt5 support in our 2.5 branch just in case we want to test regresions.

I suggest to issue a PR against our vcpkg branch and use the build package downloaded from GitHub. For my undersanding you jsut need to add the sanitizer flag at the position I pointed out above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I prefer to use Qt 6, but you pointed it out in the qt5 portfile.cmake. The qt6 portfile.cmake is completely different, and I don't see where to add this option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this works. I'd prefer to set this from my triplets file arm64-osx-tsan.cmake, but for now this have to do.

diff --git a/ports/qtbase/portfile.cmake b/ports/qtbase/portfile.cmake
index 8b9e3b0d3..4a96412d7 100644
--- a/ports/qtbase/portfile.cmake
+++ b/ports/qtbase/portfile.cmake
@@ -92,7 +92,7 @@ INVERTED_FEATURES
)

list(APPEND FEATURE_OPTIONS -DCMAKE_DISABLE_FIND_PACKAGE_Libudev:BOOL=ON)
-list(APPEND FEATURE_OPTIONS -DFEATURE_xml:BOOL=ON)
+list(APPEND FEATURE_OPTIONS -DFEATURE_xml:BOOL=ON -DFEATURE_sanitize_thread:BOOL=ON)

m_condition.wait(&m_mutex);
}
auto path = std::move(m_path);
auto modelObserver = m_modelObserver;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a gurantee that the m_modelObserver outlives the thread?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, because it's holds a shared pointer to it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_modelObserver holds a share pointer to the Browsthread? Maybe you can add a comment to the code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a single, refcounted, BrowseThread. This is what is returned by BrowseThread::getInstanceRef();

In BrowseTableModel.cpp it is stored: m_pBrowseThread = BrowseThread::getInstanceRef();

So, when all BrowseTableModel instances are gone, the ref count is 0, and the BrowseThread instance is destroyed.

But I am not sure why I have to explain this or add more comments. This is not my code! I only want to solve the thread sanitizer warnings. If you feel the (existing!) code needs more explanation, go ahead.

src/library/browse/browsethread.cpp Outdated Show resolved Hide resolved
m_mutex.lock();
m_bRun = true;
m_condition.wakeOne();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wakeOne() and wait() below is confusing, Let's use a second sync method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put the request in a separate method.

m_path_mutex.unlock();
m_locationUpdated.wakeAll();
m_modelObserver = modelObserver;
m_condition.wakeOne();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What can happen if the thread is not yet waiting? In this case no one is waked up.
But the m_modelObserver and m_path have a value. I think the thread can first check if m_modelObserver is not null and pick that up before waiting. Probably the waiting in the constructor is obsolete than?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if m_path has a value, the thread will not enter the wait state.

@m0dB
Copy link
Contributor Author

m0dB commented Nov 12, 2024

I have checked with Qt 6 with thread sanitizer activated through the Qt feature flags, and it still has to same problem, so I would like to ask to merge this PR. I feel I addressed all comments.

@m0dB
Copy link
Contributor Author

m0dB commented Nov 17, 2024

Is there is a reason not to merge this? I confirm that, including with a thread sanitizer enabled Qt, I get TSAN warnings unless I wait for the thread to have started.

@daschuer
Copy link
Member

daschuer commented Nov 20, 2024

I did a bit of a research and discovered a related QT issues.

https://bugreports.qt.io/browse/QTBUG-100336
.. looks quite similar. My guess is that the different backtrace happens due to not optimized code.

Can you please use the attached the testing code and confirm that this is actually our issue?

This bug is fixed in: Qt v6.8.0 and planned to 6.5 the stable branch we use for Mixxx 2.5
qt/qtbase@75d82af

I was able to trace it down to qt/qtbase@34fe923

receiver->d_func()->connections.loadRelaxed()->currentSender = previous

Where it has been portet from QAtomic in v5.14.0
But the issue exists way longer with the old API.

The issue seems affect a bunch of atomic variables and whenever a connection is astabisched between threads. Most likely this fix fixes only one symptome of the issue and the issue will happen again with the other signals.
Luckily this is currently "only" a linter issue and we do not have a real user issue with that, I would prefer to revert the quite bulky and run-time consuming solution here and just wait for the upstream fix. The other refactorings in this thread are welcome.

An alternative solution for all signals would be to connect the the connections via "m_model_observer" from the worker thread itself, this way the affected variables are only touched by one thread. I consider this a nice refactoring that can remain even if the upstrem issue is fixed. But I don't want to put that work on you. You may also directky "QMetaObject::invokeMethod" without connecting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants