std::memory_order
From Cppreference

Defined in header <atomic>

enum memory_order {
    memory_order_relaxed,
    memory_order_consume,
    memory_order_acquire,
    memory_order_release,
    memory_order_acq_rel,
    memory_order_seq_cst
};
On multicore systems, when a thread writes a value to memory, it becomes immediately available for reading on the same core, but threads executing on other cores may see the previous value for some time, and when they get to see values change, it may not be in the same order as what other threads see. In addition, both C++ compilers and CPUs may reorder regular memory accesses within a single thread for efficiency.
Each atomic operation accepts a std::memory_order as an additional parameter, which specifies how non-atomic memory accesses are to be ordered around this atomic operation.
Constants

Defined in header <atomic>

memory_order_relaxed
    The operation does not order memory.

memory_order_consume
    Performs a consume operation on the affected memory location, marking the root of a data dependency tree. Reads from the affected memory location that carry a data dependency cannot be reordered before the load; other reads can be. On most platforms, this affects compiler optimization only.

memory_order_acquire
    Performs an acquire operation on the affected memory location, making regular memory writes in other threads, released through the atomic variable to which it is applied, visible to the current thread. No reads from the affected memory location can be reordered before the load.

memory_order_release
    Performs a release operation on the affected memory location, making regular memory writes visible to other threads through the atomic variable to which it is applied.

memory_order_acq_rel
    The operation has both acquire and release semantics.

memory_order_seq_cst
    The operation has both acquire and release semantics and, in addition, has sequentially-consistent operation ordering.
Sequentially-consistent ordering
The default is std::memory_order_seq_cst, which establishes a single total ordering over all atomic operations so tagged: all threads see the same order of such atomic operations, and no memory_order_seq_cst atomic operations can be reordered. Sequential ordering is necessary for many multiple-producer/multiple-consumer situations where all consumers must observe the actions of all producers occurring in the same order.
On all multicore systems, total sequential ordering requires a full memory fence CPU instruction which may become a performance bottleneck since it forces all memory accesses to propagate to every thread.
Release-Acquire ordering
If an atomic store is tagged std::memory_order_release and an atomic load from the same variable is tagged std::memory_order_acquire, pairwise synchronization is established between the thread that does the release and the thread that does the acquire. Different threads can see different ordering, but the thread that does the release will observe exactly the same order of atomic operations as the thread that does the acquire on these atomic variables. Moreover, any non-atomic and relaxed atomic stores that happen before the release in the first thread will be guaranteed to be completed from the point of view of the second thread before it does the acquire.
Release-acquire synchronization is transitive: if after the load-acquire, the second thread does a store-release on some other atomic, which the third thread load-acquires, non-atomic and relaxed events that happened in the first thread are now guaranteed to be visible to the third thread. The second thread in this case may use the tag std::memory_order_acq_rel (see example).
On strongly-ordered systems (x86, SPARC, IBM mainframe), release-acquire ordering is automatic. No additional CPU instructions are issued for this synchronization mode; only certain compiler optimizations are affected (e.g. the compiler is prohibited from moving non-atomic stores past the atomic store-release or from performing non-atomic loads earlier than the atomic load-acquire).
Release-Consume ordering
If an atomic store is tagged std::memory_order_release and an atomic load from the same variable is tagged std::memory_order_consume, a weaker form of synchronization is established, known as "dependency ordering". Only the non-atomic and relaxed atomic operations that carry a data dependency to and from the atomic that participates in the store-release and load-consume link are sequenced, all other operations within each thread may be reordered freely (see example). Like the synchronizes-with relationship established by release and acquire, the dependency-ordered-before relationship established by release and consume can propagate through multiple threads.
On all mainstream CPUs other than DEC Alpha, dependency ordering is automatic: no additional CPU instructions are issued for this synchronization mode; only certain compiler optimizations are affected (e.g. the compiler is prohibited from performing speculative loads on the objects that are involved in the dependency chain).
Release sequence
If some atomic is store-released and several other threads perform read-modify-write operations on that atomic, a "release sequence" is formed: all threads that perform the read-modify-writes to the same atomic synchronize with the first thread and each other even if they have no memory_order_release semantics. This makes single producer - multiple consumers situations possible without imposing unnecessary synchronization between individual consumer threads.
Relaxed ordering
Atomic operations tagged std::memory_order_relaxed do not participate in any synchronization and do not impose any ordering except that once a thread reads a value, a subsequent read from the same thread from the same object cannot read an earlier value. For example, with x and y initially zero,
// Thread 1:
r1 = y.load(memory_order_relaxed);
x.store(r1, memory_order_relaxed);
// Thread 2:
r2 = x.load(memory_order_relaxed);
y.store(42, memory_order_relaxed);
is allowed to produce r1 == r2 == 42: the two operations within each thread are not ordered with respect to each other, so the store to y in thread 2 may become visible before its load from x.
Examples

std::memory_order_seq_cst
This example demonstrates a situation where sequential ordering is necessary. Any other ordering may trigger the assert because it would be possible for the threads c and d to observe changes to the atomics x and y in opposite order.
#include <thread>
#include <atomic>
#include <cassert>

std::atomic<bool> x = ATOMIC_VAR_INIT(false);
std::atomic<bool> y = ATOMIC_VAR_INIT(false);
std::atomic<int> z = ATOMIC_VAR_INIT(0);

void write_x()
{
    x.store(true, std::memory_order_seq_cst);
}

void write_y()
{
    y.store(true, std::memory_order_seq_cst);
}

void read_x_then_y()
{
    while (!x.load(std::memory_order_seq_cst))
        ;
    if (y.load(std::memory_order_seq_cst))
        ++z;
}

void read_y_then_x()
{
    while (!y.load(std::memory_order_seq_cst))
        ;
    if (x.load(std::memory_order_seq_cst))
        ++z;
}

int main()
{
    std::thread a(write_x);
    std::thread b(write_y);
    std::thread c(read_x_then_y);
    std::thread d(read_y_then_x);
    a.join();
    b.join();
    c.join();
    d.join();
    assert(z.load() != 0); // will never happen
}
std::memory_order_relaxed
The following example demonstrates a task (updating a global counter) that requires atomicity but no ordering constraints, because no other memory locations are synchronized through the counter.
#include <vector>
#include <iostream>
#include <thread>
#include <atomic>

std::atomic<int> cnt = ATOMIC_VAR_INIT(0);

void f()
{
    for (int n = 0; n < 1000; ++n)
        cnt.fetch_add(1, std::memory_order_relaxed);
}

int main()
{
    std::vector<std::thread> v;
    for (int n = 0; n < 10; ++n)
        v.emplace_back(f);
    for (auto& t : v)
        t.join();
    std::cout << "Final counter value is " << cnt << '\n';
}
Output:
Final counter value is 10000
std::memory_order_release and std::memory_order_acquire
Concurrent queues, double-checked locking and other producer-consumer situations require release ordering in the publisher thread and acquire ordering in the consumer thread. This pattern establishes pairwise synchronization between threads.
#include <thread>
#include <atomic>
#include <cassert>

std::atomic<int> x = ATOMIC_VAR_INIT(0);
std::atomic<int> y = ATOMIC_VAR_INIT(0);

void thread1()
{
    x.store(1, std::memory_order_release);
    y.store(1, std::memory_order_release);
}

void thread2()
{
    int a = y.load(std::memory_order_acquire);
    int b = x.load(std::memory_order_acquire);
    if (a == 1)
        assert(b == 1); // may fail if memory_order_relaxed is used
}

int main()
{
    std::thread t1(thread1);
    std::thread t2(thread2);
    t1.join();
    t2.join();
}
std::memory_order_acq_rel
The following example demonstrates transitive release-acquire ordering across three threads.
#include <thread>
#include <atomic>
#include <cassert>
#include <vector>

std::vector<int> data;
std::atomic<int> flag = ATOMIC_VAR_INIT(0);

void thread_1()
{
    data.push_back(42);
    flag.store(1, std::memory_order_release);
}

void thread_2()
{
    int expected = 1;
    while (!flag.compare_exchange_strong(expected, 2, std::memory_order_acq_rel))
        expected = 1;
}

void thread_3()
{
    while (flag.load(std::memory_order_acquire) < 2)
        ;
    assert(data.at(0) == 42); // will never fire
}

int main()
{
    std::thread a(thread_1);
    std::thread b(thread_2);
    std::thread c(thread_3);
    a.join();
    b.join();
    c.join();
}
std::memory_order_release and std::memory_order_consume
This example demonstrates dependency-ordered synchronization: the integer data is not related to the pointer to the string by a data dependency, and thus the consumer is not guaranteed to observe its updated value.
#include <thread>
#include <atomic>
#include <cassert>
#include <string>

std::atomic<std::string*> ptr;
int data;

void producer()
{
    std::string* p = new std::string("Hello");
    data = 42;
    ptr.store(p, std::memory_order_release);
}

void consumer()
{
    std::string* p2;
    while (!(p2 = ptr.load(std::memory_order_consume)))
        ;
    assert(*p2 == "Hello"); // never fires
    assert(data == 42);     // may or may not fire
}

int main()
{
    std::thread t1(producer);
    std::thread t2(consumer);
    t1.join();
    t2.join();
}