Konrad Reiche

A cautionary tale of cgo embedding

Dave Cheney famously said it:

cgo is not Go. You’re not writing a Go program that uses some logic from a C library, instead you’re writing a Go program that has to coexist with a belligerent piece of C code that is hard to replace, has the upper hand negotiations, and doesn’t care about your problems. With cgo, C calls the shots, not your code.

Today I experienced this firsthand while debugging a project that uses github.com/augustoroman/v8, a Go wrapper that exposes a subset of the API of v8, which is itself written in C++. In a couple of test scenarios the program would simply crash with the following C stack trace:

Received signal 11 SEGV_MAPERR 000000000007

==== C stack trace ===============================

 [0x000000d25164]
 [0x7f0150b43dd0]
 [0x0000004a1787]
[end of stack trace]
signal: segmentation fault

The Go project itself doesn’t use cgo explicitly, only implicitly through the library. Assuming the crash was caused by v8 running into problems with the JavaScript code, I kept tracing function calls around the invocation of v8, but without success. Eventually it turned out that the segmentation fault happened while printing the result of the computation: a struct containing a *big.Float. The float itself was a nil pointer. Go’s formatter prints nil pointers as:

<nil>

This discovery led me to the following minimal example that reproduces the crash:

var f *big.Float
_ = v8.NewIsolate()
fmt.Println(f)

I started debugging with Delve. To my surprise, it was invoking:

func (x *Float) Format(s fmt.State, format rune) {
    // ...
    buf = x.Append(buf, byte(format), prec)

Why does Go even enter methods that clearly require state when the receiver is a nil pointer? I assumed cgo had completely corrupted the memory space and wreaked havoc in my world. Far from it: as it turns out, this is expected Go behavior. Calling a method on a nil pointer is legal, and the fmt package relies on a catchPanic method to recover from the resulting nil dereference and print <nil> instead. This is the most unidiomatic Go code I have seen so far in the standard library.

func (p *pp) catchPanic(arg interface{}, verb rune) {
	if err := recover(); err != nil {
		// If it's a nil pointer, just say "<nil>". The likeliest causes are a
		// Stringer that fails to guard against nil or a nil pointer for a
		// value receiver, and in either case, "<nil>" is a nice result.
		if v := reflect.ValueOf(arg); v.Kind() == reflect.Ptr && v.IsNil() {
			p.buf.WriteString(nilAngleString)
			return
		}
		// Otherwise print a concise panic message. Most of the time the panic
		// value will print itself nicely.
		if p.panicking {
			// Nested panics; the recursion in printArg cannot succeed.
			panic(err)
		}

		oldFlags := p.fmt.fmtFlags
		// For this output we want default behavior.
		p.fmt.clearflags()

		p.buf.WriteString(percentBangString)
		p.buf.WriteRune(verb)
		p.buf.WriteString(panicString)
		p.panicking = true
		p.printArg(err, 'v')
		p.panicking = false
		p.buf.WriteByte(')')

		p.fmt.fmtFlags = oldFlags
	}
}
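The effect of catchPanic can be observed without v8 or big.Float. The following is a minimal sketch; the types config and boom are hypothetical and exist only for illustration. A Stringer that dereferences a nil pointer receiver is rendered as <nil>, while a panic from a non-nil value is rendered as a short message containing PANIC= (the exact wording of that message differs between Go versions, so the sketch only checks for the marker).

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// config is a hypothetical type whose String method dereferences its
// receiver without guarding against nil.
type config struct{ name string }

func (c *config) String() string { return "config(" + c.name + ")" }

// boom is a hypothetical value type whose String method always panics.
type boom struct{}

func (boom) String() string { panic(errors.New("boom")) }

// formatNil formats a nil *config. The String method panics with a nil
// dereference, which fmt recovers from and replaces with "<nil>".
func formatNil() string {
	var c *config
	return fmt.Sprint(c)
}

// panicIsReported reports whether fmt turned the panic raised by a non-nil
// value into an inline PANIC message instead of crashing the program.
func panicIsReported() bool {
	return strings.Contains(fmt.Sprint(boom{}), "PANIC=")
}

func main() {
	fmt.Println(formatNil())
	fmt.Println(panicIsReported())
}
```

Both cases depend on recover() seeing the panic, which in the nil pointer case only happens if the Go runtime still handles the SIGSEGV raised by the invalid memory access.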

That means the panic-recover mechanism no longer works once v8 is initialized. How does v8 intercept the panic? Normally, an unrecovered panic is printed like this:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x751ba4]

A nil pointer dereference panic starts out as a SIGSEGV signal delivered to the process, which the Go runtime’s own signal handler turns into a panic. C code can register its own signal handlers with signal or, even better, sigaction. The latter also gives access to the previously installed handler. The v8 Go wrapper didn’t register any such handlers, but the v8 C++ project does for debugging purposes.

After 11 hours of debugging I was pretty sure I would need a crude hack to fix this. Luckily, it was simply a matter of constructing the v8 platform with stack trace dumping turned off:

-  v8::Platform *platform = v8::platform::CreateDefaultPlatform();
+  v8::Platform *platform = v8::platform::CreateDefaultPlatform(0, v8::platform::IdleTaskSupport::kDisabled, v8::platform::InProcessStackDumping::kDisabled);

Without this option being exposed by the v8 API, I would have been forced to call sigaction myself before v8 is initialized, keep a reference to the Go signal handler, and reinstall it once v8 is initialized.

What shocked me most after this debugging journey was that importing a Go library with cgo components can completely wreck Go’s runtime behavior without your knowledge. It’s inherently dangerous to have code like this anywhere near a production environment.

The culprit in this case is nonetheless v8, which is explicitly designed to be embedded into other runtime environments, e.g. other C++ code, Java, etc. A lot of runtimes make bold assumptions about the code embedding them:

All of this is wrong. Programs may process a SIGINT very differently, for instance by interrupting the current execution instead of terminating the process, and after SIGTERM finalizers are often run to clean up and free resources. Go, for instance, prints useful debug information after SIGQUIT has been received.

This is only one example of why cgo is not Go and why I will always write cgo with a lowercase g.