Please Destroy My Face: Reverse Engineering Scorched Earth’s MTN File Format

What does the program do frequently, and what does it do infrequently? When you develop an application, you design your data structures and serialization techniques to match your use cases; when you reverse engineer, you can infer data structures and serialization techniques from the program’s use cases.

I place special emphasis on this sort of analysis, because later on, we’ll use this technique to make a major intuitive leap, so keep it in mind.

The first thing I did was read through the instruction manual for the MTN file creator.

It lays out the file requirements fairly succinctly:

The original for the file may be from any source that will create a file that can be edited and saved as a 16 color .bmp file.

From this, it’s clear the format was built with a very limited scope, and the MTN file structure likely resembles a 16-color bitmap fairly closely.

If more image formats were supported, or the MTN file creator was more permissive regarding the characteristics of bmp images, I would presume the parser was more robust, or the file format was designed in a more general way.

I continued in this way, reading the author’s site, the documentation, and so on to get as much information as possible.

Many of the details I gathered ended up being irrelevant, but I’ll highlight a few key facts:

- Scorched Earth is a DOS program.
- It was written using Borland C++ and Turbo Assembler.
- Scorched Earth was developed in the early/mid 90s, while the author was studying at Caltech.
- It was not a commercial program, and was released as a nearly full-featured shareware product.
- The file format likely resembles a bitmap (as previously discerned).
- File sizes for mountain files are significantly smaller than those of the original files.

So from these facts, I make the following assumptions:

- The file structure is likely 4 parts (Signature, Header, Palette, and Pixel data), to match the bitmap files MTNs are generated from.
- As with bitmap files, I expect integers will make up most of the header. Early versions of Borland C++ and Turbo Assembler use 16 bits for ints, rather than the traditional 32, so my default datatype assumption will be a 16-bit unsigned int.
- Some sort of compression technique is likely at play, as the MTN files are smaller than the source images.
- There is little-to-no intentional obfuscation of the file format. Scorched Earth was not written with explicit commercial intent; it was developed because it was fun and interesting for the author. Beyond basic shareware restrictions, there was no DRM/copy protection, so there’s no reason to believe the executable or MTN file format is obfuscated.

Bitmapping out a plan

The first thing I did was build a toolkit to generate 4-bit/16-color bitmap images.

I’m sure this raises questions, something like “Zach, you seem to be a reasonable person, why would you reinvent this positively prehistoric wheel?”.

The thing to consider about prehistoric wheels is that modern solutions often don’t work in truly prehistoric contexts.

Whoops

There is meaningful nuance between a bitmap image containing only 16 colors, and a 16-color bitmap.

Bitmaps declare the number of colors available in the header explicitly.

Many popular libraries and image manipulation programs only support saving bitmaps with a palette of 256 colors, regardless of how many colors are actually used.

It makes sense: the only benefit to “true” 16 color bitmaps is saving a few bytes of storage, in exchange for significantly higher complexity in saving the files.

While there is likely some solution, or some well-supported library to do this, it didn’t make sense to look too hard.

The format I was going to be reverse engineering was likely super similar to a 16-color bitmap, so it made sense to get deeply familiar with the structure of the file.

So given all that, using the Construct library for Python, I built a very simple schema to parse and construct 16-color bitmaps.

There are two main components of the bitmap file format I want to highlight.

The first is the palette.

Bitmap files contain a palette, which is a sequence of color codes, describing each color in the image.

Color codes are stored in a somewhat strange way, in BGR order rather than RGB.

The palette also contains an alpha value; however, I don’t include that in this schema.

The second interesting component is the pixel data storage.

Pixels are stored as a sequence of integers, each representing a color from the palette.

The palette can be thought of as a zero-indexed array of color codes for this context.

Pixels are stored in terms of rows, padded to a multiple of 4 bytes.

The first row stored is the bottom row of the image, and the final row stored is the top row of the image.

If you’re interested in learning more, I highly recommend the BMP File Format article on Wikipedia; it explains everything quite well.
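To make the structure concrete, here is a minimal Construct sketch of such a 16-color BMP schema. This is my reconstruction, not the exact schema from the original post; it assumes a fixed 40-byte BITMAPINFOHEADER and no compression:

```python
from construct import Struct, Const, Int16ul, Int32ul, Int32sl, Byte, Array

# Minimal 16-color BMP schema (a sketch; real-world BMPs have more variants).
bmp_16color = Struct(
    "signature" / Const(b"BM"),
    "file_size" / Int32ul,
    "reserved" / Const(0, Int32ul),
    "data_offset" / Int32ul,               # where the pixel rows begin
    "header_size" / Const(40, Int32ul),    # BITMAPINFOHEADER
    "width" / Int32sl,
    "height" / Int32sl,                    # positive height => rows stored bottom-up
    "planes" / Const(1, Int16ul),
    "bits_per_pixel" / Const(4, Int16ul),  # 4 bits per pixel => at most 16 colors
    "compression" / Const(0, Int32ul),
    "image_size" / Int32ul,
    "pixels_per_meter_x" / Int32sl,
    "pixels_per_meter_y" / Int32sl,
    "colors_used" / Int32ul,               # 16 for a "true" 16-color bitmap
    "colors_important" / Int32ul,
    # Palette entries are BGR plus an alpha byte, kept here only as filler.
    "palette" / Array(16, Struct("b" / Byte, "g" / Byte, "r" / Byte, "alpha" / Byte)),
    # Pixel data follows: one nibble per pixel, rows padded to 4 bytes,
    # bottom row first. (Row decoding is omitted from this sketch.)
)
```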

Using the schema above, I was able to create my own test files, and really get started.

Now we’re in business!

File Analysis: Signature Detection and Palette Analysis

Finding the file signature is a good way to start reverse engineering.

Often, it tells you something about the design of the parser.

Here, finding it is easy — every mountain file begins with MT 0xBEEF 256.

This is very similar to how bitmap files start with BM, which makes my earlier assumption (that the parser resembles a bitmap parser) look more likely to be correct.

Next, I look for the palette, as it’s a large chunk of data that’s likely to be mapped pretty closely to the version in the bitmap files.

As a refresher, bitmaps use a straightforward space-saving technique, in the form of a color palette.

In the file, all the color values used are listed in the header, and pixels, rather than describing their color explicitly, simply reference an index in the palette.

In an image with two colors, black and white, the palette could be described like:

0: (0, 0, 0)
1: (255, 255, 255)

and the pixel data would look something like:

0 1 0 1 0 1 0 1 1 1 1 1 1

In terms of binary layout, bitmap palettes are stored as 4 sequential bytes, each an integer from 0–255, in the order Blue, Green, Red, Alpha.

You’ll notice this BGRA pattern is somewhat reversed from the standard RGBA pattern you may be familiar with.
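As a toy illustration of my own (not from the original tooling), decoding indexed pixels is just a table lookup:

```python
# Indexed pixels are lookups into the palette table.
palette = [(0, 0, 0), (255, 255, 255)]            # 0: black, 1: white
pixels = [0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1]  # the example data above
rgb = [palette[i] for i in pixels]                # indices -> (r, g, b) triples
```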

Let’s take a step back and talk about the mindset required to reverse engineer.

Your goal is to get inside the head of the programmer who wrote the original code.

A helpful proxy is to think about what you would have done if you wrote the original code.

So from that perspective, I know that I’m very lazy, and if at all possible, I would have just copied the palette wholesale, and used some prewritten code or library to parse it.

Applying this theory, I search the binary output for the exact palette bytes, and come back with nothing.

Okay, so let’s make some more assumptions.

The game has no need for alpha, and if I’m trying to save all the space I can, maybe I just stored the BGR values.

I would expect to find 48 bytes, in 16 triples.

I search again, and still no dice.

So, let’s go with a different scenario.

Maybe, if I was developing the game, I started copying the palette directly, but it ended up being a huge pain to map between the BGR pattern of bitmaps and the RGB pattern of everything else, so I just rolled my own RGB storage for the palette.

So I search again, for 48 bytes, in 16 triples, but I swap the B and R values from the original palette.
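A sketch of that search (my own helper, assuming the bitmap palette has already been parsed into (b, g, r) triples):

```python
# Hunt for the bitmap's palette inside the MTN bytes, with B and R swapped.
def find_swapped_palette(mtn_bytes: bytes, bmp_palette) -> int:
    needle = bytes(v for (b, g, r) in bmp_palette for v in (r, g, b))  # BGR -> RGB
    return mtn_bytes.find(needle)  # byte offset of the 48-byte palette, or -1
```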

This time, I get matches in all of my test files — I’ve managed to identify the palette.

At this stage, my schema looks something like the sketch below. Wow! I already have 2/4 segments identified — I’m sure the rest will be just as straightforward!
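A reconstruction of that partial schema (field names are mine, and the placement of the 18 opaque header bytes between the signature and the palette is an assumption):

```python
from construct import Struct, Const, Int16ul, Bytes, Byte, Array

# Partial MTN schema: signature and palette identified, header still opaque.
mtn_partial = Struct(
    "signature" / Const(b"MT"),
    "magic" / Const(0xBEEF, Int16ul),  # byte order of the magic is my assumption
    "version" / Const(256, Int16ul),   # the mysterious constant 256
    "header" / Bytes(18),              # 18 unidentified header bytes
    # Unlike BMP's BGR, the MTN palette is plain RGB with no alpha.
    "palette" / Array(16, Struct("r" / Byte, "g" / Byte, "b" / Byte)),
    # Pixel data: still a mystery at this point.
)
```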

Header Analysis: Part One

The header is the next bit of low-hanging fruit. It’s only 18 bytes long, and I expect most of the critical fields will be lifted directly from the bitmap header or derived from the MTN file itself.

The lowest of the low-hanging fruit in my eyes is the height and width of the MTN file.

Both Scorched Earth and the MTN generator were written in C, which means that the memory for the pixel data had to be explicitly allocated, likely by using the height and width values.

While it’s possible the height and width come after the palette, it’s much more likely that the file header is modeled after the bitmap header, so I restrict my search to the 18 bytes between the signature and the palette.

This assumption pays off — the very first value in the header is the width in every test file.

The height, however, is not present.

It’s not present even when I search the whole file, at least, not in a consistent location.

So, I started looking at the header bytes manually, and I noticed something — there was a value in a consistent position within the header that was the height of the image, minus 1.

I’ll save you some suspense — as far as I can tell, this is indeed the height value; however, there’s no obvious logical reason for it to be decreased by one.

I presume it’s either some memory management quirk/optimization, or an honest-to-goodness bug.

One thing to be careful of when reverse engineering is to not get too hung up on tiny inconsistencies or bits of weirdness.

Sometimes they’re signals that you’re missing some bit of complexity or don’t fully understand the implementation; just as often, they’re simply bugs, or details that aren’t relevant to your analysis.

The programmers who wrote the programs you reverse are as human as you, so try not to waste too much time asking why.

As I was investigating height and width, I also identified two other fields.

First, it seemed there was a constant value of 16.

I can only assume this is the color count of the image data.

I assume this is to support a potential expansion to 256 colors at some later date.

Next, I identify a field that contains the size in bytes of the palette plus the size of the pixel data.

This is an interesting one, because it’s stored as a short, just 16 bits, though many of the MTN files distributed with the game have sizes that exceed this limit.

In cases where the value exceeds what a short can hold, the most significant bits are simply truncated.

This implies to me it’s not meaningfully used in parsing, but I document it anyway.

At this point, I feel I have identified all the fields I can for now, and my schema looks something like the sketch below. The header isn’t complete, but I think it’s time to make some headway on the pixel data, as that’s certainly going to be a long fight.
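Concretely, the header portion might look like this sketch (width-first is confirmed by the tests above; the order and names of everything else are my guesses):

```python
from construct import Struct, Const, Int16ul, Bytes

# Header sketch: the 18 bytes between the signature and the palette.
mtn_header = Struct(
    "width" / Int16ul,                   # the first value in every test file
    "height_minus_one" / Int16ul,        # image height, inexplicably minus 1
    "color_count" / Const(16, Int16ul),  # constant 16, presumably the color count
    "data_size" / Int16ul,               # palette + pixel bytes, truncated to 16 bits
    "unknown" / Bytes(10),               # the rest of the header, unidentified
)
```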

Pixel Data: Round One

To start, I get naive — as naive as possible. I assume the data is represented identically to the bitmap data.

That means 4 bits (also known as a nibble) per pixel, where each pixel is a value from 0–15 that refers to an index in the palette.

So, I start with a simple input image. And, it turns out, it comes out looking pretty…

Don’t do drugs, kids.

…horrific.

This isn’t obvious from the images here, but the dimensions of the image aren’t even right — not even close.

If you recall, I mentioned that the file sizes were often smaller for the MTN files than the source files, but this wasn’t the case for the small images (on the order of 10×10) I was using; their MTN files were often larger.

This was curious, and it implied there was some sort of overhead that was only recouped when the image was large enough.

At this point, I was fairly lost, so I began another phase of recon.

And by that I mean I played a few games of Scorched Earth.

Intermission: Building a Mountain

To figure this out, we’re going to have to understand a bit about how terrain works in Scorched Earth.

Scanned mountains were only added to the most recent version of the game, meaning it’s very likely the MTN file format was built to match the data structure of terrain.

So, how IS terrain stored in memory? Unfortunately, I can’t just attach a debugger to the process to inspect the memory, and even if I could, if I knew exactly what I was looking for, I wouldn’t need the debugger in the first place.

So, the next best thing is to observe it in action.

This is the technique I mentioned before — sometimes, to understand the data structures of an application, you have to understand how the application uses them.

I play a few games, and pay special attention to how the terrain behaves, and I notice something.

Let’s see if you can spot it:

Do you see it?

And here, there’s some particularly interesting behavior in the menu:

Do you see it now?

Terrain operations occur almost entirely on the vertical axis — there is little-to-no horizontal movement, but significant vertical movement in the case of gravity and, in the case of the menu, to draw terrain on the screen.

As previously mentioned, bitmap pixels are stored as rows, and individual pixel values in the pixel byte array are horizontally adjacent.

When pixels are written to graphics buffers, the most common practice is to draw from the upper left corner to the bottom right corner.

Thus, the bitmap pixel storage is fairly workable for this use case.

However, this is simply convention — there is no requirement to draw images in that way when dealing with low-level hardware interfaces.

If you wanted to draw your images starting in the bottom left corner, and work your way to the upper right corner, as in the menu above, there’s no technical reason why you couldn’t.

So why would Scorched Earth want to do this? Let’s take another look at gravity and destructible terrain.

When missiles explode in Scorched Earth, they destroy some amount of terrain.

Any terrain that was above the now-destroyed terrain has gravity applied to it.

The gravity physics are fairly simple — each pixel of terrain simply moves directly downward until it rests on another terrain pixel.

So, in terms of code, you need to do two operations.

First, check if the pixel below a given terrain pixel is terrain or empty.

Then, if it’s empty, shift the terrain pixel down once, and continue until each terrain pixel rests on another terrain pixel.
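As a sketch (my own illustration of the algorithm just described, not code from the game), settling a single column of terrain where 0 means empty sky looks like this:

```python
# Settle one column: every terrain pixel falls straight down until it
# rests on the terrain below it. Index 0 is the top of the column.
def apply_gravity(column: list[int]) -> list[int]:
    falling = [px for px in column if px != 0]  # terrain pixels, order preserved
    sky = len(column) - len(falling)            # everything above becomes empty
    return [0] * sky + falling                  # sky on top, terrain packed below
```

Notice that if a column is stored contiguously in memory, this entire pass is one linear scan over adjacent bytes.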

While this algorithm can be implemented with horizontally-adjacent pixels, it’s horrifically inefficient.

Why? Spatial locality!

When you access data in a program, the data generally has to be loaded into a CPU register.

Loads check the various CPU caches first, then go to main memory, and lastly, if relevant, to disk or network.

When the data is found, the CPU reads an entire cache line at a time into its cache.

Thus, if you’re reading sequential memory values, then your CPU cache hit rate will be very high.

This, alongside a number of other CPU optimization techniques that take advantage of spatial locality, vastly reduces the average amount of time it takes to access sequential bytes in memory.

Pixel data in a 2D image is generally stored in the form of a matrix, and matrices tend to be the poster child for the importance of respecting spatial locality in algorithm design.

For example, when performing matrix multiplication, sequencing the nested for loops correctly can increase performance up to 5 times.
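The classic demonstration (a toy example of mine, not from the article’s sources) is computing C = A × B with the same three loops in two different orders; with row-major storage, the second version’s inner loop walks B’s rows sequentially instead of striding down its columns:

```python
# Both functions compute C = A @ B for n x n row-major lists of lists.
def matmul_ijk(A, B, n):
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]  # inner loop strides down B's columns
    return C

def matmul_ikj(A, B, n):
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            for j in range(n):
                C[i][j] += A[i][k] * B[k][j]  # inner loop walks B's rows sequentially
    return C
```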

So, our seemingly simple algorithm for applying gravity can quickly become wildly inefficient if we don’t carefully consider the implications of the data structure.

All of this, put together, leads to a revelation: Sequential pixels in MTN files are not horizontally adjacent, as they are in bitmaps, but vertically adjacent, such that they can take advantage of spatial locality when applying terrain destruction and gravity transformations.

The way terrain is drawn on the menu screen, and the way pixels are stored in MTN files, are simple consequences of this critical use case.

With this earthshaking revelation, we can do… absolutely nothing.

At least, not yet.

While this does give us an idea of how the pixels are structured, it does not explain why we’re seeing significantly more nibbles than we’d expect to have pixels.

To solve that riddle, we’ll have to look at some actual data.

Pixel Data: Round Two, in Which I Stare into the Nibbles Until They Stare Back at Me

We will start by looking at the pixel data for a simple example image.

Glorious

I used this image for a few reasons.

It’s quite a simple design; it has few pixels (the actual image is 10×10), and the limited number of colors makes investigating the data very easy.

If you recall, a nibble is 4 bits, and two nibbles make 1 byte.

A nibble can contain values from 0 to 15, which matches the total number of available colors.

In this example, a nibble value of 0 maps to a black pixel, and 1 to a white pixel.

Ideally, we should be able to see the output image structure without actually having to render an image.

The raw pixel data for this image, in terms of nibbles, is:

0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0

The astute among you will immediately see a pattern: nibbles matching 0 X 0 0.

If we break this into rows, we can see something interesting:

0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0
0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0
0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0
0, 9, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0
0, 9, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0
0, 9, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0
0, 9, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0
0, 9, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0
0, 9, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0

The rows have fairly regular sizes! It seems we might be getting somewhere.

The next thing I notice is that the numerical values at the beginning of the rows seem to almost correspond to the number of nibbles that follow: after 0 8 0 0 and 0 7 0 0 there are 8 nibbles, and after 0 9 0 0 there are 10.

The reason for this is simple — byte alignment.

Recall that these values are nibbles, that is 4 bits.

In general, when you load a binary stream, it’s represented and parsed in terms of bytes.

To this end, you want a single segment of data to end on a byte boundary, rather than a nibble boundary, because it makes parsing the data much, much easier.

So here, every row is padded, such that it ends on a byte boundary.

That means rows with a count of 8 aren’t padded (8 nibbles == 4 bytes), but rows with 7 nibbles are padded with 1 nibble to make 4 bytes, and rows with 9 are padded with 1 nibble to 10 nibbles, or 5 bytes.

The padding nibbles can be discarded when parsing the rows, as they are irrelevant in terms of image data.
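Spelled out as a hand-rolled sketch (my illustration; the actual parser uses Construct, as we’ll see shortly):

```python
# Decode length-prefixed, byte-aligned nibble rows.
def parse_rows(data: bytes) -> list[list[int]]:
    rows, pos = [], 0
    while pos < len(data):
        count = data[pos]              # row length prefix (one byte, for now)
        pos += 2                       # skip the prefix byte and the padding byte
        row_bytes = (count + 1) // 2   # nibbles, rounded up to a whole byte
        nibbles = []
        for b in data[pos:pos + row_bytes]:
            nibbles += [b >> 4, b & 0x0F]
        rows.append(nibbles[:count])   # discard the alignment nibble, if present
        pos += row_bytes
    return rows
```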

This means the image data, row by row, looks like:

[0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 1, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 1, 0]
[0, 0, 0, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 1, 0, 0, 0]
[0, 0, 1, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 1, 0, 0, 0, 0, 0]
[1, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 1, 0, 0, 0, 0, 0, 0, 0]

We’re closer to something reasonable — this is (mostly) a 10×9 matrix.

If you recall from when we were discussing the header, the MTN conversion program seems to trim 1 from the image height, meaning that, other than those 3 missing pixels in the upper right corner, this is the expected amount of pixel data.

Let’s try to recall way, way, back, at the beginning of the article.

The first bit of recon.

You may recall a message from the MTN generation program output:

The Sky must start in the upper left corner of the source file screen.

All of the source file sky must be edited to the same color.

What is sky? How does sky relate to terrain?

Yes, I’m reusing this. Do I look like I’m made of GIFs? Don’t judge me.

In the game, sky is the beautiful blue-purple gradient we see in the GIF above.

It is a background that cannot be interacted with in game.

The goal of the MTN generator is to output, well, mountains! Thus, it has to differentiate land (which tanks can drive around on and destroy) from sky (where such things are illegal).

This instruction in the readme exists so that all sky pixels can be excised from the BMP file as the MTN file is generated.

In the game, rendering must be a three-step process.

A matrix is first initialized with a sky gradient, the terrain is drawn on top of it, and lastly, it is painted to the screen.
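A sketch of that pipeline, with hypothetical helpers standing in for the real renderer, and assuming full-height terrain columns for simplicity:

```python
# Three-step render: sky gradient first, terrain over it, then blit.
def render_frame(width, height, columns, sky_color_at, blit):
    frame = [[sky_color_at(y) for _ in range(width)] for y in range(height)]
    for x, column in enumerate(columns):  # terrain columns, top-to-bottom
        for y, px in enumerate(column):
            if px != 0:                   # 0 = empty/sky in this sketch
                frame[y][x] = px
    blit(frame)                           # hand the finished frame to the display
```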

So, while our 10×9 matrix is, in our eyes, missing 3 pixels, in the game, all this means is for those 3 pixels, the background will be visible.

To make a small leap, I can say the 3 missing pixels map to the upper-left corner of the source image.

Logically, based on the source image, I would say only a single pixel (the upper-left corner pixel) should be removed as sky, so there may be a bug that I do not understand, or a more complex algorithm in play.

In any case, I can simply fill those values in with dummy values to complete our matrix.

It won’t match the source exactly, but it will be close enough for our purposes, and likely won’t impact larger test cases much at all.

With this, we can once again update our schema.

I’ve largely breezed over the Construct-specific syntax, as it’s been fairly straightforward up until this point, but I want to highlight the pixels structure now:

```python
"pixels" / Array(
    this.width,
    FocusedSeq(
        "items",
        "count" / Rebuild(Int8ul, len_(this.items)),
        "padding" / Padding(1),
        "items" / Bitwise(Aligned(8, Array(this.count, Nibble))),
    ),
)
```

In English, this represents an Array of length width, containing sub-arrays.

The sub-arrays are made up of Nibbles, padded such that they reach the byte boundary, as we discussed.

The sub-arrays’ size is determined by the first byte in the sequence.

It is interpreted as an int, and the byte after it is interpreted as padding.

BMP Extraction

Finally, we’re ready to begin extracting BMPs from MTNs using the pixel interpretation logic described above, and we’ll be using an aggressively blue color to highlight the sky so we can validate that the correct pixels are getting filled in.

Initial examples look largely fine.

There are two major transformations to perform:

- Rotation, as evidenced by the picture on the right
- Mirroring, less clearly

The mirroring is subtle, but it has to do with how bitmap pixels are stored.

Inexplicably, bitmap pixels are stored bottom to top.

That means the first pixel in a bitmap pixel array maps to the coordinates (height, 0).

However, MTN pixels are stored top to bottom, so if I simply rotate it without first performing a vertical mirror, the image will end up facing the incorrect direction.
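In code, the pair of transformations amounts to a transpose plus a vertical flip (my sketch, assuming equal-length, top-to-bottom MTN columns):

```python
# Convert MTN column-major pixels into BMP row order.
def mtn_columns_to_bmp_rows(columns: list[list[int]]) -> list[list[int]]:
    rows = [list(r) for r in zip(*columns)]  # transpose: columns become top-down rows
    return rows[::-1]                        # BMP wants the bottom row first
```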

With these transformations, things come out looking lovely.

Picturesque!

Except…

Oops.

Not quite.

There’s clearly something that I missed during the pixel data analysis.

Thankfully, this is a fairly simple fix.

If you recall, pixel rows were prefixed by something like 0 X 0 0, which I interpreted as an 8-bit int followed by 8 bits of padding.

However, that means it can only handle rows of up to 255 pixels.

If a value looked something like 0 8 0 1, meaning the parser should expect 264 pixels, our previous parsing routine would partially corrupt the image.

So, if we instead interpret those bytes as a 16-bit integer, rather than an 8-bit integer plus 8 bits of padding, we get this beautiful image:

All better!

And with that, I’ve managed to extract a bitmap from an MTN file!

At this point, I had managed to divine the values of a few more header fields, so with those, and the fix to our pixel schema described above, we land on our final schema.
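Reconstructed in Construct, the final schema looks something like this sketch (field names and the layout of the unknown bytes are my own labels; the authoritative version lives in the GitHub repo linked at the end):

```python
from construct import (Struct, Const, Int16ul, Bytes, Byte, Array,
                       FocusedSeq, Rebuild, Bitwise, Aligned, Nibble, len_, this)

# Final MTN schema, as I understand it (a sketch, not the original gist).
mtn_format = Struct(
    "signature" / Const(b"MT"),
    "magic" / Const(0xBEEF, Int16ul),    # byte order of the magic is my assumption
    "version" / Const(256, Int16ul),     # see the addendum: actually a big-endian 1
    "width" / Int16ul,
    "height_minus_one" / Int16ul,        # image height, minus the mysterious 1
    "color_count" / Const(16, Int16ul),
    "data_size" / Int16ul,               # palette + pixel bytes, truncated to 16 bits
    "unknown" / Bytes(10),               # unused junk, per the addendum
    "palette" / Array(16, Struct("r" / Byte, "g" / Byte, "b" / Byte)),
    "pixels" / Array(
        this.width,
        FocusedSeq(
            "items",
            "count" / Rebuild(Int16ul, len_(this.items)),  # now 16 bits, no padding byte
            "items" / Bitwise(Aligned(8, Array(this.count, Nibble))),
        ),
    ),
)
```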

Header Analysis: Round Two — Giving Up

Now we reach the limitations of this sort of analysis. I strongly suspect the remaining unknowns have no impact on parsing, and are either entirely unused, or represent metadata useful for debugging only.

If we were using disassembly, we could see when (if ever) these fields were written, but we don’t have access to that data.

Forcing the remaining unknowns to 0 doesn’t have any impact on parsing in-game, so for now, I simply ignore these.

If I figure anything else out later, I’ll add an addendum.

Addendum: Asking the Author

After I completed the analysis, I sent an email to the original author, Wendell Hicken, asking him to confirm my assumptions about the remaining unknowns.

He gave me two interesting pieces of information.

First, the constant 256 is actually a constant 1 (it’s interpreted as big endian, rather than little endian), and represents a version of the MTN file, in case other versions were added later.
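In other words, assuming the version field’s two bytes are 00 01 on disk:

```python
import struct

raw = b"\x00\x01"            # the version field's bytes (my assumed layout)
struct.unpack("<H", raw)[0]  # -> 256 when read little-endian, as I had been doing
struct.unpack(">H", raw)[0]  # -> 1 when read big-endian, as the author intended
```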

Secondly, he confirmed the remaining unknowns in the header are unused junk data.

My assumption is these were likely allocated to allow future expansion or iteration on the file format.

This isn’t uncommon, as it simplifies the parser significantly.

With this, we have completely documented the file format!

Tying it all together: Time to Destroy my face

Once I had a successful routine to extract BMPs from MTNs, it was a simple matter to take a BMP, reverse the transformations, and begin generating MTN files.
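In rough outline, the reverse direction looks like this sketch (hypothetical helper shape; the real tool lives in the GitHub repo linked below):

```python
# BMP -> MTN, reversing the extraction transforms (sky handling omitted).
def bmp_to_mtn(palette_bgr, bmp_rows):
    rgb = [dict(r=r, g=g, b=b) for (b, g, r) in palette_bgr]  # swap BGR back to RGB
    top_down = bmp_rows[::-1]                    # undo BMP's bottom-up row order
    columns = [list(c) for c in zip(*top_down)]  # transpose rows into MTN columns
    pixel_bytes = sum(2 + (len(col) + 1) // 2 for col in columns)
    return mtn_format.build(dict(
        width=len(columns),
        height_minus_one=len(top_down) - 1,      # the off-by-one quirk, preserved
        data_size=(48 + pixel_bytes) & 0xFFFF,   # truncated, like the original field
        unknown=b"\x00" * 10,                    # junk bytes, zeroed
        palette=rgb,
        pixels=columns,
    ))
```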

Conclusion

In the end, there’s not much more to say.

The goal was terrifically dumb, but the investigation and resulting tool were quite interesting.

If you’re interested in what I’ve discussed here, the code is available on GitHub for you to peruse.

Thanks for reading, and I’d encourage everyone to play some games of Scorched Earth, and destroy my face.
