diff --git a/CppCoreGuidelines.md b/CppCoreGuidelines.md
index 4731486..136cf16 100644
--- a/CppCoreGuidelines.md
+++ b/CppCoreGuidelines.md
@@ -9891,10 +9891,10 @@ This also applies to `%`.

 ??? should this section be in the main guide???

-This section contains rules for people who needs high performance or low-latency.
-That is, rules that relates to how to use as little time and as few resources as possible to achieve a task in a predictably short time.
+This section contains rules for people who need high performance or low latency.
+That is, these are rules that relate to how to use as little time and as few resources as possible to achieve a task in a predictably short time.
 The rules in this section are more restrictive and intrusive than what is needed for many (most) applications.
-Do not blindly try to follow them in general code because achieving the goals of low latency requires extra work.
+Do not blindly try to follow them in general code, because achieving the goals of low latency requires extra work.

 Performance rule summary:

@@ -10086,7 +10086,30 @@ Performance is very sensitive to cache performance and cache algorithms favor si

 # CP: Concurrency and Parallelism

-???
+The core component of concurrent and parallel programming is the thread. Threads
+allow you to run multiple instances of your program independently, while sharing
+the same memory. Concurrent programming is tricky for many reasons, most
+importantly that it is undefined behavior to read data in one thread after it
+was written by another thread, if there is no proper synchronization between
+those threads. Making existing single-threaded code execute concurrently can be
+as trivial as adding `std::async` or `std::thread` strategically, or it can
+necessitate a full rewrite, depending on whether the original code was written
+in a thread-friendly way.
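As a minimal sketch of the "trivial" end of that spectrum, the following (the names `sum_range` and `parallel_sum` are ours, not from the guidelines) uses `std::async` to split a read-only computation across two tasks. This is only safe because both tasks merely *read* the shared vector, so they never race on a write:

```cpp
#include <future>
#include <numeric>
#include <vector>
#include <cstddef>

// Sum v[lo, hi): a pure read of shared data, so concurrent calls are safe.
long sum_range(const std::vector<int>& v, std::size_t lo, std::size_t hi) {
    return std::accumulate(v.begin() + lo, v.begin() + hi, 0L);
}

long parallel_sum(const std::vector<int>& v) {
    std::size_t mid = v.size() / 2;
    // The upper half may run on another thread; get() joins and fetches it.
    auto upper = std::async(std::launch::async, sum_range, std::cref(v), mid, v.size());
    long lower = sum_range(v, 0, mid);
    return lower + upper.get();
}
```

Code that mutates shared state, by contrast, cannot be parallelized this mechanically; that is the "full rewrite" end of the spectrum.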
+
+The concurrency/parallelism rules in this document are designed with three goals
+in mind:
+* To help you write code that is amenable to being used in a threaded
+  environment
+* To show clean, safe ways to use the threading primitives offered by the
+  standard library
+* To offer guidance on what to do when concurrency and parallelism aren't giving
+  you the performance gains you need
+
+It is also important to note that concurrency in C++ is an unfinished
+story. C++11 introduced many core concurrency primitives, C++14 improved on
+them, and it seems that there is much interest in making the writing of
+concurrent programs in C++ even easier. We expect some of the library-related
+guidance here to change significantly over time.

 Concurrency and parallelism rule summary:

@@ -10151,6 +10174,26 @@ Unless you do, nothing is guaranteed to work and subtle errors will persist.

 In a nutshell, if two threads can access the same named object concurrently (without synchronization), and at least one is a writer (performing a non-`const` operation), you have a data race.
 For further information of how to use synchronization well to eliminate data races, please consult a good book about concurrency.

+##### Example
+
+Many examples of data races exist, some of which are running in production
+software at this very moment. One very simple example:
+
+    int get_id() {
+        static int id = 1;
+        return id++;
+    }
+
+The increment here is an example of a data race. This can go wrong in many ways,
+including:
+
+* Thread A loads the value of `id`, the OS context switches A out for some
+  period, during which other threads create hundreds of IDs. Thread A is then
+  allowed to run again, and `id` is written back to that location as A's read of
+  `id` plus one.
+* Thread A and B load `id` and increment it simultaneously. They both get the
+  same ID.
+
 ##### Enforcement

 Some is possible, do at least something.
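One minimal way to eliminate this particular race, assuming sequential integer IDs are all that is required, is to make the counter a `std::atomic` so the increment becomes a single indivisible read-modify-write:

```cpp
#include <atomic>

int get_id() {
    static std::atomic<int> id{1};
    return id.fetch_add(1);   // atomic read-modify-write: no data race,
                              // every caller receives a distinct value
}
```

This fixes the race on `id` itself; whether sequential IDs are the right design under contention is a separate question.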
@@ -10171,7 +10214,14 @@ A lot of people, myself included, like to experiment with `std::memory_order`, b

 Even vendors mess this up: Microsoft had to fix their `shared_ptr` (weak refcount decrement wasn't synchronized-with the destructor, if I recall correctly, although it was only a problem on ARM, not Intel) and everyone (gcc, clang, Microsoft, and Intel) had to fix their `compare_exchange_*` this year, after an implementation bug caused losses to some finance company and they were kind enough to let the community know.

-It should definitely be mentioned that `volatile` does not provide atomicity, does not synchronize between threads, and does not prevent instruction reordering (neither compiler nor hardware), and simply has nothing to do with concurrency.
+It’s worth noting that `volatile` in C++ is not related to concurrency or
+parallelism in any way. Some languages have chosen to give it threading-related
+semantics, so programmers familiar with such languages tend to think that the
+meaning is similar. Sadly, these programmers are mistaken. The C++ standard
+provides some ordering guarantees on volatile operations, but these guarantees
+are far fewer and weaker than the guarantees on threading primitives. Therefore,
+using `volatile` in place of threading primitives in portable code is both
+unsafe and highly discouraged.

     if (source->pool != YARROW_FAST_POOL && source->pool != YARROW_SLOW_POOL) {
        THROW(YARROW_BAD_SOURCE);
     }

@@ -10201,7 +10251,16 @@ SIMD rule summary:

 ## CP.free: Lock-free programming

-???
+Lock-free programming is writing concurrent code without the use of
+locks. Because there are no locks, lock-free algorithms tend to be far more
+subtle and error-prone than their locked counterparts. Many operations that are
+trivial with locking (e.g. deleting a link from a shared linked list) are much
+harder without them (following the example, how do you know you’re the *only*
+thread inspecting that particular link, so you can free it?).
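To make the `volatile` point concrete, here is a hedged sketch (the counter and thread counts are ours, for illustration only) of a shared counter done with a real threading primitive. Declaring the counter `volatile int` instead would still be a data race, and therefore undefined behavior: `volatile` prevents neither concurrent access nor torn read-modify-writes.

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<int> hits{0};   // correct: each ++ is an atomic increment
                            // (a `volatile int` here would be a data race)

void worker() {
    for (int i = 0; i < 100000; ++i)
        ++hits;             // well-defined on std::atomic<int>
}

int count_hits() {
    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i)
        pool.emplace_back(worker);
    for (auto& t : pool)
        t.join();
    return hits.load();     // reliably 4 * 100000 with std::atomic
}
```

With the atomic counter the result is deterministic; the `volatile` variant would typically lose increments, and the standard gives no guarantee about its result at all.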
+
+Because of the added difficulty, expert-level knowledge of many subsystems,
+including the hardware your program is running on, is generally required in
+order to write an efficient and correct lock-free algorithm.

 Lock-free programming rule summary:
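As a taste of what this section is about, here is a minimal sketch (our own illustration, not code from the guidelines) of a lock-free stack *push* built on `compare_exchange_weak`. Pop is deliberately omitted: safe memory reclamation, the "who may free this link?" problem mentioned above, is exactly where the expert-level difficulty lives.

```cpp
#include <atomic>

struct Node {
    int value;
    Node* next;
};

std::atomic<Node*> head{nullptr};

void push(int v) {
    Node* n = new Node{v, head.load(std::memory_order_relaxed)};
    // If another thread changed head between our load and our store,
    // compare_exchange_weak reloads the current head into n->next and we retry.
    while (!head.compare_exchange_weak(n->next, n,
                                       std::memory_order_release,
                                       std::memory_order_relaxed)) {
        // retry until our node is installed atop an unchanged head
    }
}
```

Even this fragment leans on memory-order choices that are easy to get wrong; that is why the rules below steer most code toward locks and higher-level abstractions instead.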