I recently read this paper titled, Understanding Real-World Concurrency Bugs in Go (PDF), that studies concurrency bugs in Golang and comments on the new primitives for messages passing that the language is often known for.
I am not a very good Go programmer, so this was an informative lesson in various ways to achieve concurrency and synchronization between different threads of execution. It is also a good read for experienced Go developers as it points out some important gotchas to look out for when writing Go code. The fact that it uses real world examples from well known projects like Docker, Kubernetes, gRPC-Go, CockroachDB, BoltDB etc. makes it even more fun to read!
The authors analyzed a total of 171 concurrency bugs from several prominent Go open source projects and categorized them in two orthogonal dimensions, one each for the cause of the bug and the behavior. The cause is split between two major schools of concurrency
Along the cause dimension, we categorize bugs into those that are caused by misuse of shared memory and those caused by misuse of message passing
and the behavior dimension is similarly split into
we separate bugs into those that involve (any number of ) goroutines that cannot proceed (we call themblocking bugs) and those that do not involve any blocking (non-blocking bugs)
Interestingly, they chose the behavior to be blocking instead of deadlock since the former implies that atleast one thread of execution is blocked due to some concurrency bug, but the rest of them might continue execution, so it is not a deadlock situation.
Go has primitive shared memory protection mechanisms like Mutex
, RWMutex
etc. with a caveat
Write lock requests in Go have ahigher privilege than read lock requests.
as compared to pthread in C. Go also has a new primitive called sync.Once
that can be used to guarantee that a function is executed only once. This can be useful in situations where some callable is shared across multiple threads of execution but it shouldn't be called more than once. Go also has sync.WaitGroups
, which is similar to pthread_join
to wait for various threads of executioun to finish executing.
Go also uses channels for the message passing between different threads of executions called Goroutunes. Channels can be buffered on un-buffered (default), the difference between them being that in a buffered channel the sender and receiver don't block on each other (until the buffered channel is full).
The study of the usage patterns of these concurrency primitives in various code bases along with the occurence of bugs in the codebase concluded that even though message passing was used at fewer places, it accounted for a larger number of bugs(58%).
Implication 1:With heavier usages of goroutines and newtypes of concurrency primitives, Go programs may potentiallyintroduce more concurrency bugs
Also, interesting to note is this observation in tha paper
Observation 5:All blocking bugs caused by message passing are related to Go’s new message passing semantics like channel. They can be difficult to detect especially when message passing operations are used together with other synchronization mechanisms
The authors also talk about various ways in which Go runtime can detect some of these concurrency bugs. Go runtime includes a deadlock detector which can detect when there are no goroutunes running in a thread, although, it cannot detect all the blocking bugs that authors found by manual inspection.
For shared memory bugs, Go also includes a data race detector which can be enbaled by adding -race
option when building the program. It can find races in memory/data shared between multiple threads of execution and uses happened-before algorithm underneath to track objects and their lifecycle. Although, it can only detect a part of the bugs discovered by the authors, the patterns and classification in the paper can be leveraged to improve the detection and build more sophisticated checkers.