A case about parsing errors

So this is my first blog post, and I would like to start by making a point about parsing errors, since this weekend I was refactoring the error handling architecture in edge264, a software decoder for the H.264/AVC video codec.

Decoding a multimedia stream involves a lot of parsing (easily half of the codebase). You receive a bitstream from which you extract a sequence of values, compressed using e.g. Exp-Golomb or arithmetic coding. These form a header first, then the actual data, which after a few more decompression stages yield pixels. But what happens if the bitstream contains an error?
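To make the Exp-Golomb case concrete, here is a minimal sketch of reading one unsigned Exp-Golomb value (written ue(v) in the H.264 syntax tables): count the leading zero bits n, skip the following 1, then read n more bits as a suffix, giving 2^n - 1 + suffix. The BitReader type and the get_bit/get_ue names are hypothetical, not edge264's code, and returning 0 past the end of the buffer is an arbitrary policy chosen for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader (an illustration, not edge264's code). */
typedef struct {
    const uint8_t *data;
    size_t size_bits; /* number of bits available in data */
    size_t pos;       /* index of the next bit to read */
} BitReader;

/* Reads one bit, most significant bit first. Past the end of the buffer it
   returns 0: an arbitrary policy chosen for this sketch. */
static unsigned get_bit(BitReader *br) {
    if (br->pos >= br->size_bits)
        return 0;
    unsigned bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u;
    br->pos++;
    return bit;
}

/* Unsigned Exp-Golomb, ue(v): n leading zeros, a 1, then an n-bit suffix.
   The decoded value is 2^n - 1 + suffix. */
static uint32_t get_ue(BitReader *br) {
    unsigned n = 0;
    while (get_bit(br) == 0) {
        if (++n == 32)
            return UINT32_MAX; /* no valid ue(v) codeword has 32 leading zeros */
    }
    uint32_t suffix = 0;
    for (unsigned i = 0; i < n; i++)
        suffix = (suffix << 1) | get_bit(br);
    return ((uint32_t)1 << n) - 1 + suffix;
}
```

For instance, the bytes 0xA6 0x40 start with the codewords 1, 010, 011 and 00100, which decode to 0, 1, 2 and 3. Even this tiny reader has to pick a behaviour for truncated or malformed input, which is exactly the question at hand.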

Know your enemy

As far as I can tell, there are three kinds of errors to expect:

In practice, software such as ffmpeg dedicates a decent amount of code (search for AV_LOG_ERROR) and documentation to error handling, that is, providing an alternate code path for erroneous input and keeping a defined behaviour afterwards. However, such projects rarely provide a clear strategy for error reporting, that is, what should be presented to the end user when something is wrong, and which errors trigger it.

The common default choice is to define an error code for every wrong input one can think of, and optionally print an error message when it is caught. It then becomes the programmer’s responsibility to handle all of them, however rare they may be in real-world streams. In software such as VLC, these messages are eventually forwarded to the user through an Error Console.

However, in the context of a multimedia stream, users have no way to fix an incoming bitstream by hand, should any of the above errors actually happen. Their only recourse is probably to move the antenna to catch a clearer signal, and only if they have access to it. For these users, rendering a black frame is probably just as good as printing [h264 @ 0xba7bc00] illegal bit depth value (28, 8).

Moreover, transport protocols and containers provide mechanisms for detecting errors and possibly recovering from them. The TCP network protocol reorders incoming packets, and can request the retransmission of a missing one. The MPEG2-TS multimedia container signals corrupted packets, and is augmented with error detection/correction for digital TV broadcasting. Thus in practice most glitches are detected upstream, so that very few bitstream errors actually reach the audio/video decoders.
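As an illustration of such upstream detection, the sketch below reads the two MPEG2-TS header fields a demuxer can use to discard damaged packets before they ever reach the decoder: the transport_error_indicator bit, which signals that the packet contains at least one uncorrectable bit error, and the 4-bit continuity_counter, which reveals lost packets. The type and function names are hypothetical, and the continuity check is deliberately simplified.

```c
#include <stdbool.h>
#include <stdint.h>

#define TS_PACKET_SIZE 188
#define TS_SYNC_BYTE 0x47

/* Header fields of an MPEG2-TS packet that help detect damage upstream. */
typedef struct {
    bool transport_error; /* transport_error_indicator: uncorrectable bit error */
    uint16_t pid;         /* 13-bit packet identifier */
    uint8_t continuity;   /* 4-bit continuity_counter */
} TsHeader;

/* Hypothetical helper: returns false if the sync byte is missing. */
static bool parse_ts_header(const uint8_t pkt[TS_PACKET_SIZE], TsHeader *h) {
    if (pkt[0] != TS_SYNC_BYTE)
        return false;
    h->transport_error = (pkt[1] & 0x80) != 0;
    h->pid = (uint16_t)(((pkt[1] & 0x1F) << 8) | pkt[2]);
    h->continuity = pkt[3] & 0x0F;
    return true;
}

/* Simplified check: drop packets flagged as corrupt, or whose counter does not
   follow the previous one on the same PID (prev < 0 means no previous packet).
   Ignores adaptation_field_control, duplicates and discontinuity_indicator. */
static bool ts_packet_usable(const TsHeader *h, int prev_continuity) {
    if (h->transport_error)
        return false;
    return prev_continuity < 0 || h->continuity == ((prev_continuity + 1) & 0x0F);
}
```

A packet rejected here is typically dropped or concealed by the demuxer, so the video decoder never sees the corrupted bytes.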

Single out the uses

Let’s proceed by identifying the few uses of a multimedia library, before trying to derive a good error reporting strategy:

With these in mind, we can see the problem with competing open source H.264 decoders (for Cisco OpenH264, search for WELS_LOG): they try to serve a bit of every use, without being a perfect fit for any of the three. There are just too many errors a user cannot understand, yet not all irregular inputs are reported.

To settle this I want to make a distinction between a validator and a parser. The validator takes a possibly erroneous input stream and reports everything wrong with it, without actually decoding it. Its output is meant to help fix the encoding library, or the stream itself, and its goal is to guarantee that any accepted stream complies with the target specification. The parser, in turn, takes a possibly erroneous stream but only bothers with decoding it for end users. Error reporting is minimal, if present at all, but the parser should try to recover gracefully from a bad stream, because no user will want to fix it.
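The difference can be sketched on a few sequence parameter set fields whose ranges are fixed by the H.264 specification (chroma_format_idc in 0..3, bit_depth_luma_minus8 and bit_depth_chroma_minus8 in 0..6). The validator reports every violation and decodes nothing, while the parser never complains and clamps each value into range. The SpsFields struct and all functions are hypothetical illustrations, assuming the values have already been read from the bitstream.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subset of sequence parameter set fields, already read from
   the bitstream but not yet checked. */
typedef struct {
    uint32_t chroma_format_idc;       /* H.264 range: 0..3 */
    uint32_t bit_depth_luma_minus8;   /* H.264 range: 0..6 */
    uint32_t bit_depth_chroma_minus8; /* H.264 range: 0..6 */
} SpsFields;

/* Returns 1 (and logs a message if log is non-NULL) when v exceeds max. */
static int check_range(FILE *log, const char *name, uint32_t v, uint32_t max) {
    if (v <= max)
        return 0;
    if (log != NULL)
        fprintf(log, "%s = %u, expected 0..%u\n", name, (unsigned)v, (unsigned)max);
    return 1;
}

/* Validator: reports every violation and decodes nothing.
   Returns the number of errors found. */
static int validate_sps(const SpsFields *s, FILE *log) {
    int errors = 0;
    errors += check_range(log, "chroma_format_idc", s->chroma_format_idc, 3);
    errors += check_range(log, "bit_depth_luma_minus8", s->bit_depth_luma_minus8, 6);
    errors += check_range(log, "bit_depth_chroma_minus8", s->bit_depth_chroma_minus8, 6);
    return errors;
}

static uint32_t clamp_u32(uint32_t v, uint32_t max) {
    return v > max ? max : v;
}

/* Parser: never fails on these fields; it silently clamps each value into
   its legal range so the decoder state is always well defined. */
static void parse_sps(const SpsFields *in, SpsFields *out) {
    out->chroma_format_idc = clamp_u32(in->chroma_format_idc, 3);
    out->bit_depth_luma_minus8 = clamp_u32(in->bit_depth_luma_minus8, 6);
    out->bit_depth_chroma_minus8 = clamp_u32(in->bit_depth_chroma_minus8, 6);
}
```

With a luma bit depth of 28 as in the ffmpeg message above (bit_depth_luma_minus8 = 20), the validator flags the field, while the parser simply carries on as if it were 14 bits, the maximum H.264 allows.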

In edge264 I decided to focus on parsing (search for e->ret) and avoid validation entirely, which is in fact a very difficult problem! All input values are silently clamped to the bounds specified in H.264, so that one can always expect a correct internal state. The few mandatory tests for unsupported features are collapsed into common exit branches (5 in total). Earlier versions had no error reporting at all, but then a video player could not know when to stop trying to decode a flawed stream. Hence the two error codes (for unsupported and erroneous streams).
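A possible shape for the resulting interface, sketched under assumptions: the decoder returns one of two error codes, one for streams using an unsupported feature and one for streams too damaged to continue, and all unsupported-feature tests share a single exit branch. The names (DEC_UNSUPPORTED, DEC_ERROR, Header, check_header, decode_all) are hypothetical rather than edge264's actual API, and treating interlaced coding and separate colour planes as unsupported is an arbitrary choice for the example.

```c
#include <stdint.h>

/* Hypothetical return codes for the two error cases (not edge264's API). */
enum {
    DEC_OK = 0,
    DEC_UNSUPPORTED = 1, /* valid H.264 using a feature this decoder lacks */
    DEC_ERROR = 2,       /* stream too damaged to keep decoding */
};

/* Hypothetical header summary; values are assumed already clamped. */
typedef struct {
    uint32_t frame_mbs_only_flag;        /* 0 means interlaced coding may occur */
    uint32_t separate_colour_plane_flag;
    int bits_left; /* remaining bits in the unit, negative if over-read */
} Header;

static int check_header(const Header *h) {
    /* All unsupported-feature tests collapsed into one exit branch
       (the choice of features is arbitrary for this sketch). */
    if (h->frame_mbs_only_flag == 0 || h->separate_colour_plane_flag != 0)
        return DEC_UNSUPPORTED;
    /* Out-of-range values were clamped while reading, so the remaining
       failure is a unit that ended before its syntax did. */
    if (h->bits_left < 0)
        return DEC_ERROR;
    return DEC_OK;
}

/* Player-side loop: keep decoding while OK, stop on either error code.
   *decoded receives the number of units accepted before stopping. */
static int decode_all(const Header *units, int n, int *decoded) {
    *decoded = 0;
    for (int i = 0; i < n; i++) {
        int ret = check_header(&units[i]);
        if (ret != DEC_OK)
            return ret;
        (*decoded)++;
    }
    return DEC_OK;
}
```

Since out-of-range values never surface as errors, the caller only has to distinguish these two outcomes to decide when to stop feeding a flawed stream.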

Conclusion

Separating validation from parsing (or explicitly going for both) helps make a clear commitment to which errors are tested and which are ignored. In the case of edge264, it helped me focus on the rest of the decoding, and placed fewer constraints on code structure (function return values, trace output, no pesky goto fail branches). Further usage will show how it fares in practice, and whether more errors should eventually be reported. As for the actual details of a simple code architecture, these are left for later articles 🙂

Also, for coherence I stayed within the scope of video decoding in this post, but I am fairly convinced that the separation of validation and parsing should also exist in other domains, especially with man-made streams, such as code compilation and HTML parsing.