Context is for goroutine cancellation

How to stop or gracefully shut down goroutines

Apr 7, 2023

Starting a goroutine is as easy as adding the go keyword in front of a function call, but managing the lifecycle of a goroutine is not.

If you only need to start a few goroutines and wait for their completion, you are off the hook thanks to sync.WaitGroup. However, what if a goroutine has to run for a specific duration or repeatedly in a loop until the initiating code terminates?

Does it matter? After all, if the main goroutine terminates, any other goroutine will also stop. It does matter, because depending on what the goroutines are doing, an abrupt stop might leave your system in an inconsistent or invalid state. Channels are commonly used to signal to a goroutine that it should shut down, and I often see a dedicated signaling channel created for this, for example chan bool or chan struct{}.

Why create an additional channel when Go provides a ready-to-use channel for precisely this purpose? Here is an example of a function that periodically refreshes data and shuts down gracefully, even if a refresh is still in progress.

func refreshLoop(ctx context.Context) error {
  ticker := time.NewTicker(500 * time.Millisecond)
  defer ticker.Stop()

  for {
    select {
      case <-ctx.Done():
        return ctx.Err()
      case <-ticker.C:
        // check the context here too: if both channels are ready, select picks a case at random
        if ctx.Err() != nil {
          return ctx.Err()
        }
        refresh(ctx)
    }
  }
}

Instead of defining a separate channel, we can pass context.Context, a common first argument in many modern Go APIs. Context is often misunderstood because it does several jobs at once: it carries deadlines, cancellation signals, and request-scoped values. Understanding it, however, allows you to write expressive and concise concurrent code. More importantly, it gives you a framework for building graceful shutdown into any goroutine you launch.

Limiting execution time

Context cancellation is also useful without a select statement. Suppose you want to execute code with a time constraint, for example when you can’t guarantee if or when the code will return once canceled. You can wrap the code in a goroutine and block until the channel returned by ctx.Done() is closed:

ctx, cancel := context.WithTimeout(ctx, 500*time.Millisecond)
go func() {
  // ...
  cancel() // signal that the task has finished
}()

// blocks until the task calls cancel or the timeout expires
<-ctx.Done()

If the task finishes within the allotted time, it cancels the context, which unblocks the waiting code. If it times out, the channel is closed as well, thus establishing an upper bound on the time spent waiting. I am happy to see that, starting in Go 1.20, these concepts are further expanded with the introduction of context.WithCancelCause.

Cancel cause

Once the channel returned by ctx.Done() is closed, ctx.Err() returns a non-nil error explaining why: Canceled if the context was canceled, or DeadlineExceeded if the context’s deadline passed. Before Go 1.20, that was all the information available. In the example above, we explicitly cancel the context when the work is done in order to close the channel, so Canceled alone cannot tell us whether the task finished or a parent context was canceled. Now, with context.WithCancelCause, we can attach a more specific error to the cancellation and retrieve it with context.Cause, which lets us distinguish between a parent context canceling the task and a cancellation due to the context no longer being needed.

ctx, cancel := context.WithCancelCause(ctx)
// ...
cancel(errors.New("upstream request has timed out"))

The code waiting on the task can now check the cause of the context cancellation and decide whether it is worth logging.

ctx, cancel := context.WithTimeout(ctx, 500*time.Millisecond)
go func() {
  // perform task
  cancel()
}()

<-ctx.Done()

// only log unexpected context errors
if err := context.Cause(ctx); !errors.Is(err, context.Canceled) {
  log.Error(err)
}

Initially, I thought of using context.WithCancelCause to pass a cancellation reason indicating that the task has finished, but we should reserve errors for erroneous execution paths and not use them to signal expected outcomes.

Defer your cancellation

Finally, as with any concurrent code that can block, ensure there is a path for recovery from unexpected events. Production code often makes use of recover. If you start a goroutine with a deferred recover, you need to make sure that cancel is still called when the goroutine panics, for example by deferring the cancel function. Otherwise, a panic skips the cancel statement, and the code waiting on <-ctx.Done() will block forever if no other code ends up canceling the context.

Go’s context API can be overwhelming at first glance, but continuously revisiting the package documentation or, even better, examining its source code while using context will go a long way.