One of the hardest and most exhausting problems in software engineering is the
race condition. Race conditions are sneaky, cunning, and nasty, often changing
form and hiding behind many different stack traces. In this article, I’ll
present my naive approach to this problem, explain why it wasn’t the best
option, and show which tool can help solve the issue in projects written in Go.
The problem
One day, I was presented with an error in a fork of the socketio library.
socketio is a library implementing real-time communication between a browser
and a server. In my version of the library, there was a race condition that
sporadically led to a panic inside the library.
Debugging it the hard way
When a program panics, the Go runtime prints a stack trace to the standard
output. That stack trace was my starting point on the hunt for the bug.
A colleague also gave me a reproduction guide and a test repository with a
simple app using the library, which isolated the problem from the project. The
bug seemed to be connected with the pingLoop() and NextWriter() functions.
After some testing, I had gathered three unique stack traces and some foggy
ideas about the root of the problem, but the solution was still beyond my
horizon. The weakness of my debugging method was that I could see only one
stack trace at a time, while race conditions are caused by two or more threads
competing for access to a shared resource. This quickly led me to scatter
print statements across the code and to collect lots of logs to analyze. The
gathered logs confirmed my assumptions: at some point, two threads were
fighting for access to one variable. The library was detecting the concurrent
access and throwing an error. Sadly, this method soon felt like trying to
brute-force the solution. I had one half of the problem in the stack trace,
but couldn’t find the other half.
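To make the failure mode concrete, here is a minimal sketch (my own example,
not code from the library) of the kind of race I was chasing: two goroutines
competing for the same variable with no synchronization between them.

package main

import (
	"fmt"
	"time"
)

func main() {
	shared := 0

	// Writer goroutine: mutates the shared variable.
	go func() {
		shared++ // unsynchronized write
	}()

	// Reader goroutine: observes the same variable concurrently.
	go func() {
		fmt.Println("reader sees", shared) // unsynchronized read
	}()

	// Crude wait so both goroutines get a chance to run.
	time.Sleep(100 * time.Millisecond)
}

Each goroutine produces its own, perfectly innocent-looking stack trace; only
by seeing both accesses together does the race become visible.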
A better approach
At some point, a colleague hinted that I should try the -race option of Go.
export GORACE=history_size=7
go run -race main.go
Enabling this option was the solution to my problems. It quickly presented me
with much more information than I could have hoped for. When the -race flag is
enabled, Go carefully watches every variable access that could result in a
race condition. The only issue is that it sometimes fails to reconstruct a
complete report, printing [failed to restore the stack] in place of a stack
trace. But even with this limitation, it gathers very detailed information
about the threads accessing the same variable at the same time.
GORACE is the environment variable through which you can supply parameters for
the -race option. In this case, I needed only the history_size parameter.
Increasing history_size allocates more memory per goroutine for remembering
previous memory accesses, but acceptable values are 0..7, so even with the
maximal history size, stack restoration will fail from time to time. You can
read more about the -race parameters in
the docs.
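For reference, history_size is not the only knob; the docs linked above also
describe parameters such as halt_on_error, which makes the program exit after
the first reported race. Multiple GORACE parameters are space-separated:

export GORACE="history_size=7 halt_on_error=1"
go run -race main.go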
This tool quickly pointed me to the problematic place in the code. It showed
that the problem was located in two functions that lacked the mutex shared
with the NextWriter() function. The solution was to add that mutex to the
Write() and Close() functions.
// Write writes bytes p.
func (e *PacketEncoder) Write(p []byte) (int, error) {
	WriterLocker.Lock()
	defer WriterLocker.Unlock()
	return e.w.Write(p)
}

// Close closes the encoder.
func (e *PacketEncoder) Close() error {
	WriterLocker.Lock()
	defer WriterLocker.Unlock()
	if e.closer != nil {
		return e.closer.Close()
	}
	return nil
}
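NextWriter() shares this mutex, but its code is not shown above. As a
self-contained illustration of the pattern (my own sketch with hypothetical
names such as safeEncoder and SwapWriter, not the library’s actual code): any
function that swaps the underlying writer must take the same lock as the
functions that use it.

package main

import (
	"bytes"
	"fmt"
	"io"
	"sync"
)

// safeEncoder guards every access to the shared writer with one mutex,
// mirroring the fix applied to Write(), Close(), and NextWriter().
type safeEncoder struct {
	mu sync.Mutex
	w  io.Writer
}

// Write uses the shared writer under the lock.
func (e *safeEncoder) Write(p []byte) (int, error) {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.w.Write(p)
}

// SwapWriter plays the role of NextWriter(): it replaces the underlying
// writer and must hold the same lock to avoid racing with Write().
func (e *safeEncoder) SwapWriter(w io.Writer) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.w = w
}

func main() {
	e := &safeEncoder{w: &bytes.Buffer{}}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); e.Write([]byte("hello")) }()
	go func() { defer wg.Done(); e.SwapWriter(&bytes.Buffer{}) }()
	wg.Wait()
	fmt.Println("done")
}

Run this sketch with go run -race and the detector stays quiet; remove either
lock and it reports a race on the writer field.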
Summary
Race conditions are quite challenging problems. Solving them may be a lot of
work, but if you know your toolset well, it gets easier. If you are working in
an environment you are not familiar with, look for the available tools. With
Go especially, use the -race flag.
If you think we can help improve the security of your firmware, or you are
looking for someone who can boost your product by leveraging advanced features
of the hardware platform you use, feel free to
book a call with us or
drop us an email at contact<at>3mdeb<dot>com. If you are interested in similar
content, feel free to sign up for our newsletter