Crash early and crash often for more reliable software

Like the null check in the previous section, an assertion failure is trivial to debug.

The fix may either be to remove the assertion and add handling code, or to make the calling code adhere to the assertion.
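As a rough illustration of the two possible fixes, here is a minimal C++ sketch. The `average` and `averageOrZero` functions and the "never empty" invariant are hypothetical and are not taken from Envoy.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper. Invariant: callers never pass an empty vector.
// Note: the standard assert() is compiled out when NDEBUG is defined, so a
// real code base may prefer an always-on variant.
inline double average(const std::vector<int>& samples) {
  assert(!samples.empty());
  long sum = 0;
  for (int sample : samples) {
    sum += sample;
  }
  return static_cast<double>(sum) / static_cast<double>(samples.size());
}

// One possible fix when the assertion fires: make the calling code adhere to
// the invariant by never calling average() with nothing to average.
inline double averageOrZero(const std::vector<int>& samples) {
  return samples.empty() ? 0.0 : average(samples);
}
```

If an empty input turns out to be a legitimate case inside `average` itself, the other fix applies: drop the assertion and add a real, tested handling branch there.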

Using code coverage and style guidelines to limit error handling complexity

Although code coverage is an imperfect tool (a high coverage percentage does not necessarily mean a program has quality test coverage), enforcing a very high level of coverage is very useful: it encourages liberal use of assertions and limits error handling to only what is necessary.

In the Envoy error handling guidelines we write:

Tip: If the thought of adding the extra test coverage, logging, and stats to handle an error and continue seems ridiculous because “this should never happen”, it’s a very good indication that the appropriate behavior is to terminate the process and not handle the error. When in doubt, please discuss.

In essence, we force all error branches and assertions to be covered by tests.

If the programmer feels this time is wasteful, that feeling is a useful forcing function for simplifying the code logic.
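A minimal sketch of what covering an error branch can look like, assuming a hypothetical `parsePort` helper (not part of Envoy). Malformed input can really happen at runtime, so it gets an explicit error branch, and each branch is exercised by a test:

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper: parses a TCP port that comes from a user or a config
// file. Bad input is an expected runtime error, so it is handled, not asserted.
inline bool parsePort(const std::string& text, int& port_out) {
  char* end = nullptr;
  errno = 0;
  const long value = std::strtol(text.c_str(), &end, 10);
  if (errno != 0 || end == text.c_str() || *end != '\0' || value < 1 ||
      value > 65535) {
    return false;  // error branch: must be covered by a test
  }
  port_out = static_cast<int>(value);
  return true;
}

// A test in the spirit of "every error branch is covered".
inline void testParsePort() {
  int port = 0;
  assert(parsePort("8080", port) && port == 8080);  // success branch
  assert(!parsePort("http", port));                 // not a number
  assert(!parsePort("70000", port));                // out of range
}
```

Each extra error branch costs roughly one more test case, which is the pressure that keeps handled errors down to the ones that matter.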

Ownership semantics used to prevent crashes can lead to complexity and bugs

One other area in which I often see extra complexity and bugs added, ostensibly to prevent crashing, is object/data ownership semantics.

Fundamentally, there are three different ways data can be allocated and tracked in a program:

1. Stack.
2. Heap with a single owner (e.g., std::unique_ptr<> in C++, standard borrow checking in Rust, etc.). Note that in many popular languages this ownership type is not available in practice because all heap allocated objects are reference-counted or garbage-collected and allow for possible unintentional sharing (e.g., Java, Python, JS, Go, etc.).
3. Heap with multiple owners (e.g., std::shared_ptr<> in C++, Java, Go, Python, Rust shared pointers, etc.).

Stack allocation is relatively simple and easy to understand, so I’m going to primarily discuss how (2) and (3) relate to crashing early and code complexity.

At a high level, code which uses heap data with a single owner is substantially easier to reason about than code that uses reference-counted data.

A single piece of code allocates data and a single piece of code frees it.

Very simple.
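To make this concrete, here is a small C++ sketch of single ownership with std::unique_ptr. The `Connection` and `ConnectionPool` types are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical type used only for illustration.
struct Connection {
  explicit Connection(std::string n) : name(std::move(n)) {}
  std::string name;
};

// The pool is the single owner: it frees the connection when it is destroyed
// or when a new connection replaces the old one.
class ConnectionPool {
 public:
  void adopt(std::unique_ptr<Connection> conn) {
    assert(conn != nullptr);  // invariant: callers hand over a real object
    conn_ = std::move(conn);
  }

  // Non-owning view; callers must not outlive the pool.
  const Connection* current() const { return conn_.get(); }

 private:
  std::unique_ptr<Connection> conn_;
};
```

Ownership moves explicitly at the call site, for example `pool.adopt(std::move(conn))`, after which `conn` is null and the pool alone decides when the object is freed.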

The alternative is shared ownership.

The use of shared ownership can make code extremely difficult to reason about.

How and when will an object be freed? Will there be any memory leaks due to circular references? (Somewhat ironically, I’ve seen many more production memory leaks in software written in Java and Python due to circular shared references than in well-written C++ that makes heavy use of single owner semantics.)

The downside of the single ownership approach is the ease of creating “use after free” situations in C/C++.

Rust avoids “use after free” entirely with the borrow checker, while still allowing for single owner semantics.

This is incredibly powerful from both a correctness perspective and a single data owner perspective, and I look forward to the day when most code is written in languages with Rust-like semantics.

That said, given that the majority of code in the world is still written in C/C++, Java, Python, JS, and similar languages, I will continue this discussion from that viewpoint.

In C/C++, a “use after free” crash can sometimes be difficult to debug (again making Rust borrow semantics very enticing from a productivity standpoint), but it is very clearly a sign that the program crashed and an invariant has been violated.

The alternative that I sometimes see tried in C/C++ is favoring Java/Python-like shared object ownership in an effort to avoid these types of crashes.

The thinking goes that if an object is never freed while there is a reference to it, the program will never crash.

Yet in my experience this inevitably leads to greater code complexity and more bugs due to circular references, logic that is hard to reason about, etc.
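The circular reference problem is easy to reproduce. The following is a minimal sketch with a hypothetical `Node` type: two objects that hold std::shared_ptr references to each other are never freed, even after every outside handle is gone.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type used only for illustration.
struct Node {
  std::shared_ptr<Node> peer;
};

// Returns true if the first node is still alive after all local handles have
// gone out of scope, i.e. if it has leaked.
inline bool leaksAfterScope(bool make_cycle) {
  std::weak_ptr<Node> watch;  // observes the node without owning it
  {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->peer = b;
    if (make_cycle) {
      b->peer = a;  // a and b now keep each other alive
    }
    assert(b.use_count() == 2);  // held by the local handle and by a->peer
    watch = a;
  }
  const bool leaked = !watch.expired();
  // Break the cycle so that this sketch itself does not leak memory.
  if (auto a = watch.lock()) {
    a->peer.reset();
  }
  return leaked;
}
```

Here `leaksAfterScope(true)` reports a leak and `leaksAfterScope(false)` does not. Nothing crashes in the leaking case, which is exactly what makes this class of bug harder to notice than a use after free. When shared ownership is genuinely required, std::weak_ptr is the usual way to break such a cycle.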

Only use shared memory ownership when the program logic actually calls for it.

For the same reasons that I advocate for limited error handling and extra assertions above, using single owner semantics is preferred.

In Rust, the compiler will verify correctness.

In C/C++ the compiler will not, but letting the program crash and fixing the uncovered invariant violation is far preferable to introducing needless ownership and code complexity in an attempt to avoid crashes of this type altogether.

For languages that do not allow for explicit single owner semantics, I recommend aggressively setting references to null when no longer in use.

This reduces the chance of circular references and should make what are effectively use-after-free issues more obvious.

Conclusion

Limiting software complexity is one of the primary mechanisms available to us to limit defects.

Very often, invariant violations that cause a fatal crash are substantially easier to debug and fix than complex code that attempted to prevent the crashes in the first place.

Specifically, I recommend using the following three techniques for limiting error handling and code complexity:

1. Limit error handling to only errors that can actually happen during normal control flow. Crash otherwise.
2. Liberally use assertions to document invariant state and crash if violated.
3. Use single owner data semantics if at all possible to limit code complexity, and if doing this using C/C++, let the program crash if an ownership invariant is violated.

The three strategies above will limit code complexity and generally yield bugs that are more obvious and easier to fix.
