diff --git a/CppCoreGuidelines.md b/CppCoreGuidelines.md
index 2c55aa1..d278e5e 100644
--- a/CppCoreGuidelines.md
+++ b/CppCoreGuidelines.md
@@ -1,6 +1,6 @@
# C++ Core Guidelines
-April 17, 2016
+April 23, 2016
Editors:
@@ -10155,8 +10155,13 @@ Performance is very sensitive to cache performance and cache algorithms favor si
# CP: Concurrency and Parallelism
-The core component of concurrent and parallel programming is the thread. Threads
-allow you to run multiple instances of your program independently, while sharing
+We often want our computers to do many tasks at the same time (or at least to make them appear to run at the same time).
+The reasons for doing so vary (e.g., waiting for many events using only a single processor, processing many data streams simultaneously, or utilizing many hardware facilities),
+and so do the basic facilities for expressing concurrency and parallelism.
+Here, we articulate a few general principles and rules for using the ISO standard C++ facilities for expressing basic concurrency and parallelism.
+
+The core machine support for concurrent and parallel programming is the thread.
+Threads allow you to run multiple instances of your program independently, while sharing
the same memory. Concurrent programming is tricky for many reasons, most
importantly that it is undefined behavior to read data in one thread after it
was written by another thread, if there is no proper synchronization between
@@ -10181,36 +10186,50 @@ them, and it seems that there is much interest in making the writing of
concurrent programs in C++ even easier. We expect some of the library-related
guidance here to change significantly over time.
+This section needs a lot of work (obviously).
+Please note that we start with rules for relative non-experts.
+Real experts must wait a bit;
+contributions are welcome,
+but please think about the majority of programmers who are struggling to get their concurrent programs correct and performant.
+
Concurrency and parallelism rule summary:
* [CP.1: Assume that your code will run as part of a multi-threaded program](#Rconc-multi)
* [CP.2: Avoid data races](#Rconc-races)
+* [CP.3: Minimize explicit sharing of writable data](#Rconc-data)
+* [CP.4: Think in terms of tasks, rather than threads](#Rconc-task)
+* [CP.8: Don't try to use `volatile` for synchronization](#Rconc-volatile)
See also:
* [CP.con: Concurrency](#SScp-con)
* [CP.par: Parallelism](#SScp-par)
-* [CP.simd: SIMD](#SScp-simd)
+* [CP.mess: Message passing](#SScp-mess)
+* [CP.vec: Vectorization](#SScp-vec)
* [CP.free: Lock-free programming](#SScp-free)
+* [CP.etc: Etc. concurrency rules](#SScp-etc)
### CP.1: Assume that your code will run as part of a multi-threaded program
##### Reason
-It is hard to be certain that concurrency isn't used now or sometime in the future.
+It is hard to be certain that concurrency isn't used now or will be sometime in the future.
Code gets re-used.
Libraries using threads may be used from some other part of the program.
Note that this applies most urgently to library code and least urgently to stand-alone applications.
+However, thanks to the magic of cut-and-paste, code fragments can turn up in unexpected places.
##### Example
- double cached_computation(int x)
+ double cached_computation(double x)
{
- static int cached_x = 0;
+ static double cached_x = 0.0;
static double cached_result = COMPUTATION_OF_ZERO;
+ double result;
+
if (cached_x == x)
return cached_result;
- double result = computation(x);
+ result = computation(x);
cached_x = x;
cached_result = result;
return result;
@@ -10227,8 +10246,8 @@ There are several ways that this example could be made safe for a multi-threaded
* Refuse to build and/or run in a multi-threaded environment.
* Provide two implementations, one which is used in single-threaded environments and another which is used in multi-threaded environments.
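For instance, the mutex option might look like this (a sketch; `computation` here is a trivial stand-in for the real work, and `COMPUTATION_OF_ZERO` is replaced by its value):

```cpp
#include <mutex>

double computation(double x) { return x * x; }  // stand-in for the real work

double cached_computation(double x)
{
    static std::mutex cache_mutex;
    static double cached_x = 0.0;
    static double cached_result = 0.0;          // = COMPUTATION_OF_ZERO

    std::lock_guard<std::mutex> lock(cache_mutex);  // one thread in here at a time
    if (cached_x == x)
        return cached_result;
    double result = computation(x);
    cached_x = x;
    cached_result = result;
    return result;
}
```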
-**Exception**: There are examples where code will never be run in a multi-threaded environment.
-However, there are also many examples where code that was "known" to never run in a multi-threaded program
+**Exception**: Code that is never run in a multi-threaded environment.
+Be careful: there are many examples where code that was "known" to never run in a multi-threaded program
was run as part of a multi-threaded program. Often years later.
Typically, such programs lead to a painful effort to remove data races.
Therefore, code that is never intended to run in a multi-threaded environment should be clearly labeled as such and ideally come with compile or run-time enforcement mechanisms to catch those usage bugs early.
@@ -10241,9 +10260,10 @@ Unless you do, nothing is guaranteed to work and subtle errors will persist.
##### Note
-In a nutshell, if two threads can access the same named object concurrently (without synchronization), and at least one is a writer (performing a non-`const` operation), you have a data race. For further information of how to use synchronization well to eliminate data races, please consult a good book about concurrency.
+In a nutshell, if two threads can access the same object concurrently (without synchronization), and at least one is a writer (performing a non-`const` operation), you have a data race.
+For further information on how to use synchronization well to eliminate data races, please consult a good book about concurrency.
-##### Example
+##### Example, bad
There are many examples of data races that exist, some of which are running in
production software at this very moment. One very simple example:
@@ -10262,88 +10282,1137 @@ including:
`id` plus one.
* Thread A and B load `id` and increment it simultaneously. They both get the
same ID.
+
+Local static variables are a common source of data races.
+
+##### Example, bad
+
+    void f(fstream& fs, regex pat)
+    {
+        array<double, max> buf;                 // max assumed to be a known constant
+        int sz = read_vec(fs, buf, max);        // read from fs into buf
+        gsl::span<double> s {buf};
+        // ...
+        auto h1 = async([&]{ sort(par, s); });  // spawn a task to sort
+        // ...
+        auto h2 = async([&]{ return find_all(buf, sz, pat); });  // spawn a task to find matches
+        // ...
+    }
+
+Here, we have a (nasty) data race on the elements of `buf` (`sort` will both read and write).
+All data races are nasty.
+Here, we managed to get a data race on data on the stack.
+Not all data races are as easy to spot as this one.
+
+##### Example, bad
+
+ // code not controlled by a lock
+
+ unsigned val;
+
+ if (val < 5) {
+ // ... other thread can change val here ...
+ switch(val) {
+ case 0: // ...
+ case 1: // ...
+ case 2: // ...
+ case 3: // ...
+ case 4: // ...
+ }
+ }
+
+Now, a compiler that does not know that `val` can change will most likely implement that `switch` using a jump table with five entries.
+Then, a `val` outside the [0..4] range will cause a jump to an address that could be anywhere in the program, and execution would proceed there.
+Really, "all bets are off" if you get a data race.
+Actually, it can be worse still: by looking at the generated code you may be able to determine where the stray jump will go for a given value;
+this can be a security risk.
##### Enforcement
Some is possible, do at least something.
+There are commercial and open-source tools that try to address this problem, but static tools often have many false positives and run-time tools often have a significant cost.
+We hope for better tools.
+
+Help the tools:
+
+* less global data
+* fewer `static` variables
+* more use of stack memory (and don't pass pointers around too much)
+* more immutable data (literals, `constexpr`, and `const`)
+
+### CP.3: Minimize explicit sharing of writable data
+
+##### Reason
+
+If you don't share writable data, you can't have a data race.
+The less sharing you do, the less chance you have to forget to synchronize access (and get data races).
+The less sharing you do, the less chance you have to wait on a lock (so performance can improve).
+
+##### Example
+
+    bool validate(const vector<Reading>&);
+    Graph<Temp_node> temperature_gradients(const vector<Reading>&);
+    Image altitude_map(const vector<Reading>&);
+    // ...
+
+    void process_readings(istream& socket1)
+    {
+        vector<Reading> surface_readings;
+        socket1 >> surface_readings;
+        if (!socket1) throw Bad_input{};
+
+        auto h1 = async([&] { if (!validate(surface_readings)) throw Invalid_data{}; });
+        auto h2 = async([&] { return temperature_gradients(surface_readings); });
+        auto h3 = async([&] { return altitude_map(surface_readings); });
+        // ...
+        auto v1 = h1.get();
+        auto v2 = h2.get();
+        auto v3 = h3.get();
+        // ...
+    }
+
+Without those `const`s, we would have to review every asynchronously invoked function for potential data races on `surface_readings`.
+
+##### Note
+
+Immutable data can be safely and efficiently shared.
+No locking is needed: You can't have a data race on a constant.
+
+##### Enforcement
+
+???
+
+
+### CP.4: Think in terms of tasks, rather than threads
+
+##### Reason
+
+A `thread` is an implementation concept, a way of thinking about the machine.
+A task is an application notion, something you'd like to do, preferably concurrently with other tasks.
+Application concepts are easier to reason about.
+
+##### Example
+
+ ???
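The source leaves this example open; a minimal sketch of the contrast (our own code, with `read_value` as a hypothetical stand-in for real work) might be:

```cpp
#include <future>
#include <thread>

int read_value() { return 42; }  // hypothetical stand-in for real work

// Thread-level thinking: we manage the machinery (a shared result slot,
// an explicit join) ourselves.
int thread_style()
{
    int result = 0;
    std::thread t([&]{ result = read_value(); });
    t.join();
    return result;
}

// Task-level thinking: we state what should be done; the result (or an
// exception) comes back through a future, with no explicit sharing.
int task_style()
{
    auto f = std::async(std::launch::async, read_value);
    return f.get();
}
```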
+
+##### Note
+
+With the exception of `async()`, the standard-library facilities are low-level, machine-oriented, at the threads-and-locks level.
+This is a necessary foundation, but we have to try to raise the level of abstraction: for productivity, for reliability, and for performance.
+This is a potent argument for using higher-level, more application-oriented libraries (if possible, built on top of standard-library facilities).
+
+##### Enforcement
+
+???
+
+### CP.8: Don't try to use `volatile` for synchronization
+
+##### Reason
+
+In C++, unlike some other languages, `volatile` does not provide atomicity, does not synchronize between threads,
+and does not prevent instruction reordering (neither compiler nor hardware).
+It simply has nothing to do with concurrency.
+
+##### Example, bad
+
+ int free_slots = max_slots; // current source of memory for objects
+
+ Pool* use()
+ {
+ if (int n = free_slots--) return &pool[n];
+ }
+
+Here we have a problem:
+This is perfectly good code in a single-threaded program, but have two threads execute it and
+there is a race condition on `free_slots`, so two threads might get the same value of `free_slots` (and thus the same slot).
+That's (obviously) a bad data race, so people trained in other languages may try to fix it like this:
+
+ volatile int free_slots = max_slots; // current source of memory for objects
+
+ Pool* use()
+ {
+ if (int n = free_slots--) return &pool[n];
+ }
+
+This has no effect on synchronization: The data race is still there!
+
+The C++ mechanism for this is `atomic` types:
+
+    atomic<int> free_slots = max_slots;  // current source of memory for objects
+
+ Pool* use()
+ {
+ if (int n = free_slots--) return &pool[n];
+ }
+
+Now the `--` operation is atomic,
+rather than a read-increment-write sequence where another thread might get in-between the individual operations.
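A small self-contained demonstration of the difference (our own code, not the `Pool` example): with `std::atomic`, concurrent increments cannot be lost, so the final count is exact.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// With std::atomic, concurrent read-modify-write operations cannot lose
// updates: the final count is exactly nthreads * per_thread.
int count_with_atomic(int nthreads, int per_thread)
{
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;
    for (int i = 0; i < nthreads; ++i)
        pool.emplace_back([&]{
            for (int j = 0; j < per_thread; ++j)
                ++counter;   // atomic increment, no window for interleaving
        });
    for (auto& t : pool) t.join();
    return counter.load();
}
```

With a plain `int` counter, the same program could report a smaller total because increments from different threads overwrite each other.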
+
+##### Alternative
+
+Use `atomic` types where you might have used `volatile` in some other language.
+Use a `mutex` for more complicated examples.
+
+##### See also
+
+[(rare) proper uses of `volatile`](#Rconc-volatile2)
## CP.con: Concurrency
-???
+This section focuses on relatively ad-hoc uses of multiple threads communicating through shared data.
+
+* For parallel algorithms, see [parallelism](#SScp-par)
+* For inter-task communication without explicit sharing, see [messaging](#SScp-mess)
+* For vector parallel code, see [vectorization](#SScp-vec)
+* For lock-free programming, see [lock free](#SScp-free)
Concurrency rule summary:
-* ???
-* ???
+* [CP.20: Use RAII, never plain `lock()`/`unlock()`](#Rconc-raii)
+* [CP.21: Use `std::lock()` to acquire multiple `mutex`es](#Rconc-lock)
+* [CP.22: Never call unknown code while holding a lock (e.g., a callback)](#Rconc-unknown)
+* [CP.23: Think of a joining `thread` as a scoped container](#Rconc-join)
+* [CP.24: Think of a detached `thread` as a global container](#Rconc-detach)
+* [CP.25: Prefer `gsl::raii_thread` over `std::thread` unless you plan to `detach()`](#Rconc-raii_thread)
+* [CP.26: Prefer `gsl::detached_thread` over `std::thread` if you plan to `detach()`](#Rconc-detached_thread)
+* [CP.27: Use plain `std::thread` for `thread`s that detach based on a run-time condition (only)](#Rconc-thread)
+* [CP.28: Remember to join scoped `thread`s that are not `detach()`ed](#Rconc-join)
+* [CP.30: Do not pass pointers to local variables to non-`raii_thread`s](#Rconc-pass)
+* [CP.31: Pass small amounts of data between threads by value, rather than by reference or pointer](#Rconc-data)
+* [CP.32: To share ownership between unrelated `thread`s use `shared_ptr`](#Rconc-shared)
+* [CP.40: Minimize context switching](#Rconc-switch)
+* [CP.41: Minimize thread creation and destruction](#Rconc-create)
+* [CP.42: Don't `wait` without a condition](#Rconc-wait)
+* [CP.43: Minimize time spent in a critical section](#Rconc-time)
+* [CP.44: Remember to name your `lock_guard`s and `unique_lock`s](#Rconc-name)
+* [CP.50: Define a `mutex` together with the data it protects](#Rconc-mutex)
+* ??? when to use a spinlock
+* ??? when to use `try_lock()`
+* ??? when to prefer `lock_guard` over `unique_lock`
+* ??? Time multiplexing
+* ??? when/how to use `new thread`
-???? should there be a "use X rather than `std::async`" where X is something that would use a better specified thread pool?
+### CP.20: Use RAII, never plain `lock()`/`unlock()`
-Speaking of concurrency, should there be a note about the dangers of `std::atomic` (weapons)?
-A lot of people, myself included, like to experiment with `std::memory_order`, but it is perhaps best to keep a close watch on those things in production code.
-Even vendors mess this up: Microsoft had to fix their `shared_ptr` (weak refcount decrement wasn't synchronized-with the destructor, if I recall correctly, although it was only a problem on ARM, not Intel)
-and everyone (gcc, clang, Microsoft, and Intel) had to fix their `compare_exchange_*` this year, after an implementation bug caused losses to some finance company and they were kind enough to let the community know.
+##### Reason
-It’s worth noting that `volatile` in C++ is not related to concurrency or
-parallelism in any way. Some languages have chosen to give it threading-related
-semantics, so programmers familiar with such languages tend to think that the
-meaning is similar. Sadly, these programmers are mistaken. The C++ standard
-provides some ordering guarantees on volatile operations, but these guarantees
-are far fewer and weaker than the guarantees on threading primitives. Therefore,
-using `volatile` in place of threading primitives in portable code is both
-unsafe and highly discouraged.
+Avoids nasty errors from unreleased locks.
- if (source->pool != YARROW_FAST_POOL && source->pool != YARROW_SLOW_POOL) {
- THROW(YARROW_BAD_SOURCE);
+##### Example, bad
+
+ mutex mtx;
+
+ void do_stuff()
+ {
+ mtx.lock();
+ // ... do stuff ...
+ mtx.unlock();
+ }
+
+Sooner or later, someone will forget the `mtx.unlock()`, place a `return` in the `... do stuff ...`, throw an exception, or something.
+
+##### Example
+
+ mutex mtx;
+
+ void do_stuff()
+ {
+        unique_lock<mutex> lck {mtx};
+ // ... do stuff ...
}
-??? Is `std::async` worth using in light of future (and even existing, as libraries) parallelism facilities? What should the guidelines recommend if someone wants to parallelize, e.g., `std::accumulate` (with the additional precondition of commutativity), or merge sort?
+##### Enforcement
+
+Flag calls of member `lock()` and `unlock()`. ???
+
+
+### CP.21: Use `std::lock()` to acquire multiple `mutex`es
+
+##### Reason
+
+To avoid deadlocks on multiple `mutex`es.
+
+##### Example
+
+This is asking for deadlock:
+
+    // thread 1
+    lock_guard<mutex> lck1(m1);
+    lock_guard<mutex> lck2(m2);
+
+    // thread 2
+    lock_guard<mutex> lck2(m2);
+    lock_guard<mutex> lck1(m1);
+
+Instead, use `lock()` (note that `lock_guard` cannot defer locking, so `unique_lock` is needed here):
+
+    // thread 1
+    unique_lock<mutex> lck1(m1, defer_lock);
+    unique_lock<mutex> lck2(m2, defer_lock);
+    lock(lck1, lck2);
+
+    // thread 2
+    unique_lock<mutex> lck2(m2, defer_lock);
+    unique_lock<mutex> lck1(m1, defer_lock);
+    lock(lck2, lck1);
+
+Here, the writers of `thread1` and `thread2` are still not agreeing on the order of the `mutex`es, but order no longer matters.
+
+##### Note
+
+In real code, `mutex`es are rarely named to conveniently remind the programmer of an intended relation and intended order of acquisition.
+In real code, `mutex`es are not always conveniently acquired on consecutive lines.
+
+I'm really looking forward to being able to write plain
+
+    unique_lock lck1(m1, defer_lock);
+
+and have the `mutex` type deduced.
+
+##### Enforcement
+
+Detect the acquisition of multiple `mutex`es.
+This is undecidable in general, but catching common simple examples (like the one above) is easy.
+
+
+### CP.22: Never call unknown code while holding a lock (e.g., a callback)
+
+##### Reason
+
+If you don't know what a piece of code does, you are risking deadlock.
+
+##### Example
+
+ void do_this(Foo* p)
+ {
+        lock_guard<mutex> lck {my_mutex};
+ // ... do something ...
+ p->act(my_data);
+ // ...
+ }
+
+If you don't know what `Foo::act` does (maybe it is a virtual function invoking a derived class member of a class not yet written),
+it may call `do_this` (recursively) and cause a deadlock on `my_mutex`.
+Maybe it will lock on a different mutex and not return in a reasonable time, causing delays to any code calling `do_this`.
+
+##### Example
+
+A common example of the "calling unknown code" problem is a call to a function that tries to gain locked access to the same object.
+Such problems can often be solved by using a `recursive_mutex`. For example:
+
+    class X {
+        recursive_mutex my_mutex;
+        // ...
+    public:
+        template<typename Action>
+        void do_something(Action f)
+        {
+            unique_lock<recursive_mutex> lck {my_mutex};
+            // ... do something ...
+            f(this);  // f will do something to *this, possibly re-entering do_something
+            // ...
+        }
+    };
+
+If, as it is likely, `f()` invokes operations on `*this`, we must make sure that the object's invariant holds before the call.
+
+##### Enforcement
+
+* Flag calling a virtual function with a non-recursive `mutex` held
+* Flag calling a callback with a non-recursive `mutex` held
+
+
+### CP.23: Think of a joining `thread` as a scoped container
+
+##### Reason
+
+To maintain pointer safety and avoid leaks, we need to consider what pointers are used by a `thread`.
+If a `thread` joins, we can safely pass pointers to objects in the scope of the `thread` and its enclosing scopes.
+
+##### Example
+
+ void f(int * p)
+ {
+ // ...
+ *p = 99;
+ // ...
+ }
+
+    int glob = 33;
+
+ void some_fct(int* p)
+ {
+ int x = 77;
+ raii_thread t0(f,&x); // OK
+ raii_thread t1(f,p); // OK
+ raii_thread t2(f,&glob); // OK
+        auto q = make_unique<int>(99);
+ raii_thread t3(f,q.get()); // OK
+ // ...
+ }
+
+A `raii_thread` is a `std::thread` with a destructor that joins, and that cannot be `detach()`ed.
+By "OK" we mean that the object will be in scope ("live") for as long as a `thread` can use the pointer to it.
+The fact that `thread`s run concurrently doesn't affect the lifetime or ownership issues here;
+these `thread`s can be seen as just a function object called from `some_fct`.
+
+##### Enforcement
+
+Ensure that `raii_thread`s don't `detach()`.
+After that, the usual lifetime and ownership (for local objects) enforcement applies.
+
+
+### CP.24: Think of a detached `thread` as a global container
+
+##### Reason
+
+To maintain pointer safety and avoid leaks, we need to consider what pointers are used by a `thread`.
+If a `thread` is detached, we can safely pass pointers to static and free store objects (only).
+
+##### Example
+
+ void f(int * p)
+ {
+ // ...
+ *p = 99;
+ // ...
+ }
+
+ int glob = 33;
+
+ void some_fct(int* p)
+ {
+ int x = 77;
+ std::thread t0(f,&x); // bad
+ std::thread t1(f,p); // bad
+ std::thread t2(f,&glob); // OK
+        auto q = make_unique<int>(99);
+ std::thread t3(f,q.get()); // bad
+ // ...
+ t0.detach();
+ t1.detach();
+ t2.detach();
+ t3.detach();
+ // ...
+ }
+
+By "OK" we mean that the object will be in scope ("live") for as long as a `thread` can use the pointers to it.
+By "bad" we mean that a `thread` may use a pointer after the pointed-to object is destroyed.
+The fact that `thread`s run concurrently doesn't affect the lifetime or ownership issues here;
+these `thread`s can be seen as just a function object called from `some_fct`.
+
+##### Enforcement
+
+In general, it is undecidable whether a `detach()` is executed for a `thread`, but simple common cases are easily detected.
+If we cannot prove that a `thread` does not `detach()`, we must assume that it does and that it outlives the scope in which it was constructed.
+After that, the usual lifetime and ownership (for global objects) enforcement applies.
+
+
+### CP.25: Prefer `gsl::raii_thread` over `std::thread` unless you plan to `detach()`
+
+##### Reason
+
+An `raii_thread` is a thread that joins at the end of its scope.
+
+Detached threads are hard to monitor.
+
+??? Place all "immortal threads" on the free store rather than `detach()`?
+
+##### Example
+
+ ???
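`gsl::raii_thread` is a proposed GSL type rather than an existing library facility; a minimal sketch of the idea, a `std::thread` wrapper that always joins in its destructor, might look like:

```cpp
#include <thread>
#include <utility>

// Sketch of the idea behind the proposed gsl::raii_thread: a thread that
// always joins in its destructor and offers no detach().
class raii_thread {
    std::thread t;
public:
    template<typename F, typename... Args>
    explicit raii_thread(F&& f, Args&&... args)
        : t(std::forward<F>(f), std::forward<Args>(args)...) {}
    raii_thread(raii_thread&&) = default;
    ~raii_thread() { if (t.joinable()) t.join(); }  // join; never detach
};
```

Leaving the scope of a `raii_thread` guarantees the thread has finished, so pointers to locals passed to it stay valid for the thread's whole lifetime.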
+
+##### Enforcement
+
+???
+
+### CP.26: Prefer `gsl::detached_thread` over `std::thread` if you plan to `detach()`
+
+##### Reason
+
+Often, the need to `detach` is inherent in the `thread`'s task.
+Documenting that aids comprehension and helps static analysis.
+
+##### Example
+
+ void heartbeat();
+
+ void use()
+ {
+ gsl::detached_thread t1(heartbeat); // obviously need not be joined
+ std::thread t2(heartbeat); // do we need to join? (read the code for heartbeat())
+ // ...
+ }
+
+##### Enforcement
+
+Flag unconditional `detach` on a plain `thread`
+
+
+### CP.27: Use plain `std::thread` for `thread`s that detach based on a run-time condition (only)
+
+##### Reason
+
+`thread`s that are supposed to unconditionally `join` or unconditionally `detach` can be clearly identified as such.
+The plain `thread`s should be assumed to use the full generality of `std::thread`.
+
+##### Example
+
+ void tricky(thread* t, int n)
+ {
+ // ...
+ if (is_odd(n))
+ t->detach();
+ // ...
+ }
+
+ void use(int n)
+ {
+        thread t {tricky, &t, n};
+ // ...
+ // ... should I join here? ...
+ }
+
+##### Enforcement
+
+???
+
+
+
+### CP.28: Remember to join scoped `thread`s that are not `detach()`ed
+
+##### Reason
+
+A `thread` that is neither `join()`ed nor `detach()`ed when it is destroyed terminates the program (`std::terminate()` is called).
+
+##### Example, bad
+
+ void f() { std::cout << "Hello "; }
+
+ struct F {
+ void operator()() { std::cout << "parallel world "; }
+ };
+
+ int main()
+ {
+ std::thread t1{f}; // f() executes in separate thread
+ std::thread t2{F()}; // F()() executes in separate thread
+ } // spot the bugs
+
+##### Example
+
+ void f() { std::cout << "Hello "; }
+
+ struct F {
+ void operator()() { std::cout << "parallel world "; }
+ };
+
+ int main()
+ {
+ std::thread t1{f}; // f() executes in separate thread
+ std::thread t2{F()}; // F()() executes in separate thread
+
+ t1.join();
+ t2.join();
+ } // one bad bug left
+
+??? Is `cout` synchronized?
+
+##### Enforcement
+
+* Flag `join`s for `raii_thread`s ???
+* Flag `detach`s for `detached_thread`s
+
+
+### CP.30: Do not pass pointers to local variables to non-`raii_thread`s
+
+##### Reason
+
+In general, you cannot know whether a non-`raii_thread` will outlive the scope of the variables (so that those pointers would become invalid).
+
+##### Example, bad
+
+ void use()
+ {
+ int x = 7;
+ thread t0 { f, ref(x) };
+ // ...
+ t0.detach();
+ }
+
+The `detach()` may not be so easy to spot.
+Use a `raii_thread` or don't pass the pointer.
+
+##### Example, bad
+
+ ??? put pointer to a local on a queue that is read by a longer-lived thread ???
+
+##### Enforcement
+
+Flag pointers to locals passed in the constructor of a plain `thread`.
+
+
+### CP.31: Pass small amounts of data between threads by value, rather than by reference or pointer
+
+##### Reason
+
+A small amount of data is cheaper to copy and access than to share using some locking mechanism.
+Copying naturally gives unique ownership (simplifies code) and eliminates the possibility of data races.
+
+##### Note
+
+Defining "small amount" precisely is impossible.
+
+##### Example
+
+    string modify1(string);
+    void modify2(shared_ptr<string>);
+
+The call of `modify1` involves copying two `string` values; the call of `modify2` does not.
+On the other hand, the implementation of `modify1` is exactly as we would write it for single-threaded code,
+whereas the implementation of `modify2` will need some form of locking to avoid data races.
+
+##### Enforcement
+
+???
+
+
+### CP.32: To share ownership between unrelated `thread`s use `shared_ptr`
+
+##### Reason
+
+If threads are unrelated (that is, not known to be in the same scope or one within the lifetime of the other)
+and they need to share free store memory that needs to be deleted, a `shared_ptr` (or equivalent) is the only
+safe way to ensure proper deletion.
+
+##### Example
+
+ ???
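The source leaves this example open; one possible sketch (names are ours): two threads each hold a `shared_ptr` copy to the same free-store buffer, and the buffer is deleted when the last copy disappears, regardless of which thread finishes last.

```cpp
#include <memory>
#include <thread>

struct Buffer { int data[2] = {0, 0}; };  // some shared free-store data

// Each lambda holds its own shared_ptr copy; the Buffer is deleted when the
// last copy goes away, so neither thread needs to know which finishes last.
std::shared_ptr<Buffer> run_shared()
{
    auto buf = std::make_shared<Buffer>();
    std::thread t1([buf]{ buf->data[0] = 1; });
    std::thread t2([buf]{ buf->data[1] = 2; });
    t1.join();
    t2.join();
    return buf;
}
```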
+
+##### Note
+
+* A static object (e.g., a global) can be shared because it is not owned in the sense that some thread is responsible for its deletion.
+* An object on free store that is never to be deleted can be shared.
+* An object owned by one thread can be safely shared with another as long as that second thread doesn't outlive the owner.
+
+##### Enforcement
+
+???
+
+
+### CP.40: Minimize context switching
+
+##### Reason
+
+Context switches are expensive.
+
+##### Example
+
+ ???
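The source leaves this example open; one common technique (sketched here with our own names) is to have a consumer drain everything available per wakeup, so that one suspension/resumption cycle services many items instead of one.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <vector>

// A consumer that takes everything available per wakeup: one
// suspension/resumption cycle can service many items.
class Batch_queue {
    std::mutex mtx;
    std::condition_variable cond;
    std::deque<int> q;
public:
    void put(int v)
    {
        { std::lock_guard<std::mutex> lck(mtx); q.push_back(v); }
        cond.notify_one();
    }
    std::vector<int> get_all()  // block until something arrives, then take it all
    {
        std::unique_lock<std::mutex> lck(mtx);
        cond.wait(lck, [this]{ return !q.empty(); });
        std::vector<int> batch(q.begin(), q.end());
        q.clear();
        return batch;
    }
};
```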
+
+##### Enforcement
+
+???
+
+
+### CP.41: Minimize thread creation and destruction
+
+##### Reason
+
+Thread creation is expensive.
+
+##### Example
+
+ void worker(Message m)
+ {
+ // process
+ }
+
+ void master(istream& is)
+ {
+        for (Message m; is >> m; )
+            run_list.push_back(new thread(worker, m));
+ }
+
+This spawns a `thread` per message, and the `run_list` is presumably managed to destroy those tasks once they are finished.
+
+Instead, we could have a set of pre-created worker threads processing the messages
+
+    Sync_queue<Message> work;
+
+    void master(istream& is)
+    {
+        for (Message m; is >> m; )
+            work.put(m);
+    }
+
+    void worker()
+    {
+        while (true) {
+            Message m;
+            work.get(m);
+            // process m
+        }
+    }
+
+ void workers() // set up worker threads (specifically 4 worker threads)
+ {
+ raii_thread w1 {worker};
+ raii_thread w2 {worker};
+ raii_thread w3 {worker};
+ raii_thread w4 {worker};
+ }
+
+##### Note
+
+If your system has a good thread pool, use it.
+If your system has a good message queue, use it.
+
+##### Enforcement
+
+???
+
+
+### CP.42: Don't `wait` without a condition
+
+##### Reason
+
+A `wait` without a condition can miss a wakeup or wake up simply to find that there is no work to do.
+
+##### Example, bad
+
+ std::condition_variable cv;
+ std::mutex mx;
+
+ void thread1()
+ {
+ while (true) {
+ // do some work ...
+            std::unique_lock<std::mutex> lock(mx);
+ cv.notify_one(); // wake other thread
+ }
+ }
+
+ void thread2()
+ {
+ while (true) {
+            std::unique_lock<std::mutex> lock(mx);
+ cv.wait(lock); // might block forever
+ // do work ...
+ }
+ }
+
+Here, if some other `thread` consumes `thread1`'s notification, `thread2` can wait forever.
+
+##### Example
+
+    template<typename T>
+    class Sync_queue {
+    public:
+        void put(const T& val);
+        void put(T&& val);
+        void get(T& val);
+    private:
+        mutex mtx;
+        condition_variable cond;  // this controls access
+        list<T> q;
+    };
+
+    template<typename T>
+    void Sync_queue<T>::put(const T& val)
+    {
+        lock_guard<mutex> lck(mtx);
+        q.push_back(val);
+        cond.notify_one();
+    }
+
+    template<typename T>
+    void Sync_queue<T>::get(T& val)
+    {
+        unique_lock<mutex> lck(mtx);
+        cond.wait(lck, [this]{ return !q.empty(); });  // prevent spurious wakeup
+        val = q.front();
+        q.pop_front();
+    }
+
+Now if the queue is empty when a thread executing `get()` wakes up (e.g., because another thread has gotten to `get()` before it),
+it will immediately go back to sleep, waiting.
+
+##### Enforcement
+
+Flag all `wait`s without conditions.
+
+
+### CP.43: Minimize time spent in a critical section
+
+##### Reason
+
+The less time is spent with a `mutex` taken, the less chance that another `thread` has to wait,
+and `thread` suspension and resumption are expensive.
+
+##### Example
+
+ void do_something() // bad
+ {
+        unique_lock<mutex> lck(my_lock);
+ do0(); // preparation: does not need lock
+ do1(); // transaction: needs locking
+ do2(); // cleanup: does not need locking
+ }
+
+Here, we are holding the lock for longer than necessary:
+We should not have taken the lock before we needed it and should have released it again before starting the cleanup.
+We could rewrite this to
+
+ void do_something() // bad
+ {
+ do0(); // preparation: does not need lock
+ my_lock.lock();
+ do1(); // transaction: needs locking
+        my_lock.unlock();
+ do2(); // cleanup: does not need locking
+ }
+
+But that compromises safety and violates the [use RAII](#Rconc-raii) rule.
+Instead, add a block for the critical section:
+
+ void do_something() // OK
+ {
+ do0(); // preparation: does not need lock
+ {
+            unique_lock<mutex> lck(my_lock);
+ do1(); // transaction: needs locking
+ }
+ do2(); // cleanup: does not need locking
+ }
+
+##### Enforcement
+
+Impossible in general.
+Flag "naked" `lock()` and `unlock()`.
+
+
+### CP.44: Remember to name your `lock_guard`s and `unique_lock`s
+
+##### Reason
+
+An unnamed local object is a temporary that immediately goes out of scope.
+
+##### Example
+
+    unique_lock<mutex>(m1);
+    lock_guard<mutex> {m2};
+    lock(m1, m2);
+
+This looks innocent enough, but it isn't: the first statement declares a default-constructed `unique_lock` named `m1`, and the second locks and immediately releases `m2` via a temporary, so no lock is held where the programmer expects.
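The intended version names the locks so they survive to the end of the scope (a sketch; `m1` and `m2` stand for the mutexes of the example above):

```cpp
#include <mutex>

std::mutex m1, m2;  // stand-ins for the mutexes in the example above

void do_work()
{
    std::unique_lock<std::mutex> lck1(m1, std::defer_lock);  // named: lives to scope end
    std::unique_lock<std::mutex> lck2(m2, std::defer_lock);
    std::lock(lck1, lck2);  // acquire both without deadlock
    // ... both mutexes are held here ...
}   // both released here
```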
+
+##### Enforcement
+
+Flag all unnamed `lock_guard`s and `unique_lock`s.
+
+
+
+### CP.50: Define a `mutex` together with the data it guards
+
+##### Reason
+
+It should be obvious to a reader that the data is to be guarded and how.
+
+##### Example
+
+ struct Record {
+ std::mutex m; // take this mutex before accessing other members
+ // ...
+ };
+
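A usage sketch (the `value` member and `update` function are our own additions): because the `mutex` sits next to the data, every user of `Record` knows what to lock.

```cpp
#include <mutex>

struct Record {
    std::mutex m;   // take this mutex before accessing other members
    int value = 0;  // sample guarded data (our own addition)
};

// The declaration itself tells every user which mutex guards the data:
void update(Record& r, int v)
{
    std::lock_guard<std::mutex> lck(r.m);
    r.value = v;
}
```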
+##### Enforcement
+
+??? Possible?
-???UNIX signal handling???. May be worth reminding how little is async-signal-safe, and how to communicate with a signal handler (best is probably "not at all")
## CP.par: Parallelism
-???
+By "parallelism" we refer to performing a task (more or less) simultaneously ("in parallel with") on many data items.
Parallelism rule summary:
* ???
* ???
+* Where appropriate, prefer the standard-library parallel algorithms
+* Use algorithms that are designed for parallelism, not algorithms with unnecessary dependency on linear evaluation
-## CP.simd: SIMD
+
+
+## CP.mess: Message passing
+
+The standard-library facilities are quite low-level, focused on the needs of close-to-the-hardware critical programming using `thread`s, `mutex`es, `atomic` types, etc.
+Most people shouldn't work at this level: it's error-prone and development is slow.
+If possible, use a higher-level facility: messaging libraries, parallel algorithms, and vectorization.
+This section looks at passing messages so that a programmer doesn't have to do explicit synchronization.
+
+Message passing rules summary:
+
+* [CP.60: Use a `future` to return a value from a concurrent task](#Rconc-future)
+* [CP.61: Use `async()` to spawn a concurrent task](#Rconc-async)
+* message queues
+* messaging libraries
+
+???? should there be a "use X rather than `std::async`" where X is something that would use a better specified thread pool?
+
+??? Is `std::async` worth using in light of future (and even existing, as libraries) parallelism facilities? What should the guidelines recommend if someone wants to parallelize, e.g., `std::accumulate` (with the additional precondition of commutativity), or merge sort?
+
+
+### CP.60: Use a `future` to return a value from a concurrent task
+
+##### Reason
+
+A `future` preserves the usual function call return semantics for asynchronous tasks.
+There is no explicit locking, and both correct (value) return and error (exception) return are handled simply.
+
+##### Example
+
+ ???
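The source leaves this example open; a sketch (our own names): the `future` delivers either the computed value or the task's exception, with ordinary call semantics.

```cpp
#include <future>
#include <stdexcept>

int divide(int a, int b)
{
    if (b == 0) throw std::invalid_argument("divide by zero");
    return a / b;
}

// The future delivers either the value or the exception, with ordinary
// function-call semantics and no explicit locking.
int safe_divide(int a, int b)
{
    std::future<int> f = std::async(std::launch::async, divide, a, b);
    return f.get();  // returns the value or rethrows the task's exception
}
```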
+
+##### Note
???
-SIMD rule summary:
+##### Enforcement
+
+???
+
+### CP.61: Use `async()` to spawn a concurrent task
+
+##### Reason
+
+A `future` preserves the usual function call return semantics for asynchronous tasks.
+There is no explicit locking, and both correct (value) return and error (exception) return are handled simply.
+
+##### Example
+
+ ???
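The source leaves this example open; a sketch (our own names): spawn a task for half the work with `async()` and do the other half in the calling thread.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Spawn a task for the front half of the work; do the back half in the
// calling thread; async() takes care of the thread machinery.
long long parallel_sum(const std::vector<int>& v)
{
    auto mid = v.begin() + v.size() / 2;
    auto handle = std::async(std::launch::async,
                             [&v, mid]{ return std::accumulate(v.begin(), mid, 0LL); });
    long long back = std::accumulate(mid, v.end(), 0LL);
    return handle.get() + back;
}
```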
+
+##### Note
+
+Unfortunately, `async()` is not perfect.
+For example, there is no guarantee that a thread pool is used to minimize thread construction.
+In fact, most current `async()` implementations don't.
+However, `async()` is simple and logically correct so until something better comes along
+and unless you really need to optimize for many asynchronous tasks, stick with `async()`.
+
+##### Enforcement
+
+???
+
+
+## CP.vec: Vectorization
+
+Vectorization is a technique for executing a number of tasks concurrently without introducing explicit synchronization.
+An operation is simply applied to elements of a data structure (a vector, an array, etc.) in parallel.
+Vectorization has the interesting property of often requiring no non-local changes to a program.
+However, vectorization works best with simple data structures and with algorithms specifically crafted to enable it.
+
+Vectorization rule summary:
* ???
* ???
## CP.free: Lock-free programming
-Lock-free programming is writing concurrent code without the use of
-locks. Because there are no locks, lock-free algorithms tend to be far more
-subtle and error-prone than their locked counterparts. Many operations that are
-trivial with locking (e.g. deleting a link from a shared linked list) are much
-harder without them (following the example, how do you know you’re the *only*
-thread inspecting that particular link, so you can free it?)
-
-Because of the added difficulty, expert-level knowledge of many subsystems,
-including the hardware your program is running on, is generally required in
-order to write an efficient and correct lock-free algorithm.
+Synchronization using `mutex`es and `condition_variable`s can be relatively expensive.
+Furthermore, it can lead to deadlock.
+For performance and to eliminate the possibility of deadlock, we sometimes have to use the tricky low-level "lock-free" facilities
+that rely on briefly gaining exclusive ("atomic") access to memory.
+Lock-free programming is also used to implement higher-level concurrency mechanisms, such as `thread`s and `mutex`es.
Lock-free programming rule summary:
-* ???
-* ???
+* [CP.100: Don't use lock-free programming unless you absolutely have to](#Rconc-lockfree)
+* [CP.101: Distrust your hardware/compiler combination](#Rconc-distrust)
+* [CP.102: Carefully study the literature](#Rconc-litterature)
+* how/when to use atomics
+* avoid starvation
+* use a lock free data structure rather than hand-crafting specific lock-free access
+* [CP.110: Use a conventional pattern for double-checked locking](#Rconc-double)
+* how/when to compare and swap
-### Don't use lock-free programming unless you absolutely have to
+
+### CP.100: Don't use lock-free programming unless you absolutely have to
##### Reason
It's error-prone and requires expert level knowledge of language features, machine architecture, and data structures.
+##### Example, bad
+
+    extern atomic<Link*> head;    // the shared head of a linked list
+
+    Link* nh = new Link(data, nullptr);    // make a link ready for insertion
+    Link* h = head.load();                 // read the shared head of the list
+
+    do {
+        if (h->data <= data) break;        // if so, insert elsewhere
+        nh->next = h;                      // next element is the previous head
+    } while (!head.compare_exchange_weak(h, nh));    // write nh to head or to h
+
+Spot the bug.
+It would be really hard to find through testing.
+Read up on the ABA problem.
+
+**Exception**: [Atomic variables](#???) can be used simply and safely.
+
+##### Note
+
+Higher-level concurrency mechanisms, such as `thread`s and `mutex`es, are implemented using lock-free programming.
+
**Alternative**: Use lock-free data structures implemented by others as part of some library.
+
+### CP.101: Distrust your hardware/compiler combination
+
+##### Reason
+
+The low-level hardware interfaces used by lock-free programming are among the hardest to implement well and among
+the areas where the most subtle portability problems occur.
+If you are doing lock-free programming for performance, you need to check for regressions.
+
+##### Note
+
+Instruction reordering (static and dynamic) makes it hard for us to think effectively at this level (especially if you use relaxed memory models).
+Experience, (semi)formal models and model checking can be useful.
+Testing - often to an extreme extent - is essential.
+"Don't fly too close to the wind."
+
+##### Enforcement
+
+Have strong rules for re-testing in place that cover any change in hardware, operating system, compiler, and libraries.
+
+
+### CP.102: Carefully study the literature
+
+##### Reason
+
+With the exception of atomics and a few other standard patterns, lock-free programming is really an expert-only topic.
+Become an expert before shipping lock-free code for others to use.
+
+##### References
+
+* Anthony Williams: C++ concurrency in action. Manning Publications.
+* Boehm, Adve: "You Don't Know Jack About Shared Variables or Memory Models". Communications of the ACM, Feb 2012.
+* Boehm: "Threads Basics". HPL TR 2009-259.
+* Adve, Boehm: "Memory Models: A Case for Rethinking Parallel Languages and Hardware". Communications of the ACM, August 2010.
+* Boehm, Adve: "Foundations of the C++ Concurrency Memory Model". PLDI 08.
+* Mark Batty, Scott Owens, Susmit Sarkar, Peter Sewell, and Tjark Weber: "Mathematizing C++ Concurrency". POPL 2011.
+* Damian Dechev, Peter Pirkelbauer, and Bjarne Stroustrup: Understanding and Effectively Preventing the ABA Problem in Descriptor-based Lock-free Designs. 13th IEEE Computer Society ISORC 2010 Symposium. May 2010.
+* Damian Dechev and Bjarne Stroustrup: Scalable Non-blocking Concurrent Objects for Mission Critical Code. ACM OOPSLA'09. October 2009
+* Damian Dechev, Peter Pirkelbauer, Nicolas Rouquette, and Bjarne Stroustrup: Semantically Enhanced Containers for Concurrent Real-Time Systems. Proc. 16th Annual IEEE International Conference and Workshop on the Engineering of Computer Based Systems (IEEE ECBS). April 2009.
+
+
+### CP.110: Use a conventional pattern for double-checked locking
+
+##### Reason
+
+Double-checked locking is easy to mess up.
+
+##### Example
+
+    atomic<bool> x_init;
+
+    if (!x_init.load(memory_order_acquire)) {
+        lock_guard<mutex> lck(x_mutex);
+        if (!x_init.load(memory_order_relaxed)) {
+            // ... initialize x ...
+            x_init.store(true, memory_order_release);
+        }
+    }
+
+    // ... use x ...
+
+
+##### Enforcement
+
+??? Is it possible to detect the idiom?
+
+
+## CP.etc: Etc. concurrency rules
+
+These rules defy simple categorization:
+
+* [CP.200: Use `volatile` only to talk to non-C++ memory](#Rconc-volatile2)
+* [CP.201: ??? Signals](#Rconc-signal)
+
+### CP.200: Use `volatile` only to talk to non-C++ memory
+
+##### Reason
+
+`volatile` is used to refer to objects that are shared with "non-C++" code or hardware that does not follow the C++ memory model.
+
+##### Example
+
+ const volatile long clock;
+
+This describes a register constantly updated by a clock circuit.
+`clock` is `volatile` because its value will change without any action from the C++ program that uses it.
+For example, reading `clock` twice will often yield two different values, so the optimizer had better not optimize away the second read in this code:
+
+ long t1 = clock;
+ // ... no use of clock here ...
+ long t2 = clock;
+
+`clock` is `const` because the program should not try to write to `clock`.
+
+##### Note
+
+Unless you are writing the lowest level code manipulating hardware directly, consider `volatile` an esoteric feature that is best avoided.
+
+##### Example
+
+Usually C++ code receives `volatile` memory that is owned elsewhere (hardware or another language):
+
+ int volatile* vi = get_hardware_memory_location();
+ // note: we get a pointer to someone else's memory here
+ // volatile says "treat this with extra respect"
+
+Sometimes C++ code allocates the `volatile` memory and shares it with "elsewhere" (hardware or another language) by deliberately escaping a pointer:
+
+ static volatile long vl;
+ please_use_this(&vl); // escape a reference to this to "elsewhere" (not C++)
+
+##### Example, bad
+
+`volatile` local variables are nearly always wrong -- how can they be shared with other languages or hardware if they're ephemeral?
+The same applies almost as strongly to member variables, for the same reason.
+
+ void f() {
+ volatile int i = 0; // bad, volatile local variable
+ // etc.
+ }
+
+ class mytype {
+ volatile int i = 0; // suspicious, volatile member variable
+ // etc.
+ };
+
+##### Note
+
+In C++, unlike in some other languages, `volatile` has [nothing to do with synchronization](#Rconc-volatile).
+
+##### Enforcement
+
+* Flag `volatile T` local and member variables; almost certainly you intended to use `atomic` instead.
+* ???
+
+### CP.201: ??? Signals
+
+???UNIX signal handling???. May be worth reminding how little is async-signal-safe, and how to communicate with a signal handler (best is probably "not at all")
+
+
# E: Error handling
Error handling involves: