'Using optimistic locks in C++ and memory order
I'm wondering how 'optimistic locks' (as described e.g. here: https://db.in.tum.de/~leis/papers/artsync.pdf) can be used in C++ with regard to non-atomic data and what memory orders should be used.
My understanding from the above paper is that following code will synchronize t1
and t2
and t1
will either print both variables updated or none.
static constexpr uint64_t locked_bit = (1ULL << 63);
std::atomic<int> data1 = 0;
std::atomic<int> data2 = 0;
std::atomic<uint64_t> versioned_lock = 0;
int main() {
std::thread t1([&]{
restart:
auto s = versioned_lock.load();
if (s & locked_bit)
goto restart;
auto local_data1 = data1.load();
auto local_data2 = data2.load();
if (s == versioned_lock.load()) {
// data has not been overwritten yet, can safely use local_data
std::cout << local_data1 << " " << local_data2 << std::endl;
} else {
// possible race, local_data might be garbage?
}
});
std::thread t2([&]{
auto current_lock_version = versioned_lock.load();
current_lock_version++;
versioned_lock.store(current_lock_version | locked_bit);
data1.store(1234);
data2.store(4321);
versioned_lock.store(current_lock_version);
});
t1.join();
t2.join();
}
My questions are:
- I this a valid C++ code with no UB?
- Are my assumption about synchronization correct? Will
t1
actually print either "0 0" or "1234 4321"? - What memory orders should be used for reading/writing to
versioned_lock
? Should it bememory_order_seq_cst
or can it be something less restrictive? Will it work also ifdata1
anddata2
are non-atomic (justint
or even some more complex data type) and does it affect the required memory order (or maybe there is a need for atomic_thread_fence)?
Solution 1:[1]
Is this a valid C++ code with no UB?
I think so. The usual way to get UB in such a program is a data race, but that can only happen when a non-atomic variable is written concurrently, and every shared variable in this program is atomic.
Are my assumption about synchronization correct? Will t1 actually print either "0 0" or "1234 4321"?
Yes. All the loads and stores in this version are seq_cst
, so they
occur in some total order, which I will use < to denote. Let L1, L2 denote the two loads of
versioned_lock
in t1, and S1, S2 the two stores in t2.
If S1, S2 both occur before L1, then both the new values will be seen
and we print 1234 4321
, which is fine. If S1, S2 both occur after L2 then we
print 0 0
, which is also fine. If either S1 or S2 occurs in between
L1 and L2, then L1 and L2 return different values and we do not print
anything, which again is fine.
The only ordering that remains is S1 < L1 < L2 < S2. In this case L1 returns the value with the locked bit set, and so we restart the loop instead of printing.
What memory orders should be used for reading/writing to versioned_lock? Should it be memory_order_seq_cst or can it be something less restrictive?
I think that the following suffices:
static constexpr uint64_t locked_bit = (1ULL << 63);
std::atomic<int> data1 = 0;
std::atomic<int> data2 = 0;
std::atomic<uint64_t> versioned_lock = 0;
int main() {
std::thread t1([&]{
restart:
auto s = versioned_lock.load(std::memory_order_acquire); // L1
if (s & locked_bit)
goto restart;
auto local_data1 = data1.load(std::memory_order_relaxed);
auto local_data2 = data2.load(std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_acquire); // AF
if (s == versioned_lock.load(std::memory_order_relaxed)) { // L2
// data has not been overwritten yet, can safely use local_data
std::cout << local_data1 << " " << local_data2 << std::endl;
} else {
// possible race, local_data might be garbage?
}
});
std::thread t2([&]{
auto current_lock_version = versioned_lock.load(std::memory_order_relaxed);
current_lock_version++;
versioned_lock.store(current_lock_version | locked_bit, std::memory_order_relaxed); // S1
std::atomic_thread_fence(std::memory_order_release); // RF
data1.store(1234, std::memory_order_relaxed);
data2.store(4321, std::memory_order_relaxed);
versioned_lock.store(current_lock_version, std::memory_order_release); // S2
});
t1.join();
t2.join();
}
That is, the accesses to versioned_lock
that clear and test the locked
bit must be release and acquire, the stores of the data must be
preceded by a release fence, and the loads of the data followed by an
acquire fence. Intuitively, this should keep the data accesses from leaking out from between the lock accesses that protect them. (You could also make all the stores of the data
be release and then drop the release fence, but that would be unnecessarily
strong: we do not care if the two data stores get reordered with each
other. It would make a bigger difference if we had many more data elements. The same applies to the option of making the data loads acquire.)
To prove that this works, notice that the two results we must avoid are 1234 0
and 0 4321
. So suppose that the load of data1
returns 1234
(the case when data2
returns 4321
is equivalent). Since this is the value that was stored by t2, the release fence (RF) synchronizes with the acquire fence (AF). Now S1 is sequenced before RF, and AF is sequenced before L2, so we conclude that S1 happens before L2. Thus L2 cannot return 0; it either returns locked_bit
or 1
.
If L1 returns a different value from L2 then we will not print. If L1 returns locked_bit
we also will not print. The one remaining case is that L1 and L2 both return 1. In this case, L1 has read the value written by S2, and L1/S2 are acquire/release, so S2 synchronizes with L1. The data stores are sequenced before S2, and L1 is sequenced before the data loads, so both data stores happen before both data loads. Therefore we see both the new values, and print 1234 4321
.
Will it work also if data1 and data2 are non-atomic (just int or even some more complex data type)?
No, that will not work at all. Nothing in this algorithm prevents the
stores and loads of data1, data2
from conflicting; it's just that
when they do conflict, our tests of the versioned lock tell us not to
use the data. But if they were not atomic, the conflicting stores and
loads would be a data race, causing undefined behavior whether we use
the data or not.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |