Creating AI for GameBoy Part 2: Collecting Data From the Screen

If you missed Part 1: Coding a Controller, click here to catch up.

In this edition, I will be going over how to intelligently get information out of the game through various image processing and classification techniques.

This is important for any game AI, but crucial for its application in Fire Emblem, because the game is played entirely through decision making.

The reason that getting information out of a game is so important is that we will be training a machine learning algorithm to play, and since ML algorithms require datasets to learn, we need to be able to generate data.

With that in mind, we need to know which information we need in order to establish our features and targets.

Feature data should be represented as game states and actions taken, while targets should be our metric of how well we are playing the game.

As before, the code in its current state can be found on my GitHub.

This task requires a handful of libraries, so I think it is worthwhile to briefly go over which ones are useful for which tasks:

- Python Imaging Library (PIL): used to screencap/provide the images
- cv2: used to process images and convert colors
- pytesseract: used to obtain text out of images
- Keras: used to train a model to obtain numbers out of images
- scikit-learn: used to train a model to obtain numbers out of images

The basic workflow of the image processing tasks is as follows:

1. Take a large screenshot with PIL (on my old laptop, the function is prohibitively slow at ~0.75s)
2. Subset the image as necessary with a function outlined below (if PIL weren't so slow, multiple screenshots would be just fine)
3. Convert colors and pad as necessary with cv2 and a padding function
4. Apply image classification models to the processed image to obtain data

The function below is what I used to take a subset of an image taken with PIL. While it might be faster to use slicing methods on the array, the difference is negligible when compared to the 0.6+ seconds it takes for the PIL ImageGrab function to take our initial picture.
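To make step 1 concrete, here is a minimal sketch of the capture step, assuming PIL's ImageGrab is available; the function names and the bbox coordinates are my own, not the project's:

```python
# One slow full-screen grab, converted to a NumPy array that every later
# step subsets in memory. The bbox is a hypothetical emulator-window region.
import numpy as np
from PIL import ImageGrab

def capture(bbox=(0, 0, 480, 432)):
    # the prohibitively slow call (~0.75s on my old laptop); everything
    # downstream crops this one array instead of grabbing again
    return to_array(ImageGrab.grab(bbox=bbox))

def to_array(img):
    # PIL image -> NumPy array of shape (height, width, channels)
    return np.array(img)
```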

```python
def subscreen(x0, y0, x1, y1, screen):
    sub_img = []
    for i in range(y0, y1):
        row = []
        for j in range(x0, x1):
            row.append(screen[i][j])
        sub_img.append(np.array(row))
    sub_img = np.array(sub_img)
    return sub_img
```

We need to know the name of our character: Lyn

For these two pictures, we are most interested in the text data: the name of the character we are about to use and the options that are available to us.

For this, pytesseract is an invaluable tool.

Google’s Tesseract is an open-source Optical Character Recognition (OCR) tool, available in Python through the pytesseract wrapper.

Usage of this library is incredibly simple; we can obtain the text ‘Lyn’ from the blue pane in the top left with one line of code (top), as well as the options presented to us in the second picture (bottom):

```python
# for black text
text = pytesseract.image_to_string(image)

# for white text
text = pytesseract.image_to_string(inverted_grayscale(image))
```

pytesseract can tell us what options we have

The processing function inverted_grayscale may vary from case to case; in this case it turns the picture gray and inverts all of the pixel values, turning the white pixels black and vice versa.

cv2 has great color conversion functions that could be substituted for the processing function I wrote for this specific case, but the real magic is in what pytesseract enables us to do.

Pytesseract allows us to forgo creating a labeled image dataset, training a classification model, and otherwise reinventing the wheel, which is a huge time saver, as we are about to see…

The ‘Status’ screen shows the current game state

The following two images represent the game state and character stats, respectively.

These screens are full of numbers, so there is a lot of information to be gained from them.

Ideally, we would just use the pytesseract code from above, but sadly these block letters give the package some trouble.

After labeling multiple pictures for a dataset and training a handful of convolutional neural network (CNN) models in Keras, I decided to double back to the simpler scikit-learn package to see if the slow-training CNN was truly necessary to detect which images corresponded to which numbers.

With the first try at a logistic regression model, without any hyperparameter manipulation, I was able to achieve 99.5% accuracy; presumably the remaining 0.5% came from a mislabeled image. Confession time: I mislabeled 2 out of 500 images, showing just how good the model truly is.
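For illustration, here is a sketch of training such a classifier with scikit-learn. The data here is synthetic (a distinct base pattern per digit class, plus mild noise) to mimic the high within-class consistency of the real screenshot crops; the real project trained on labeled images of the block numerals:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins for 30x24 grayscale digit crops: ten classes, each a
# fixed base pattern with a little noise, flattened to one row per image.
n_classes, per_class, n_pixels = 10, 50, 30 * 24
bases = rng.integers(0, 256, size=(n_classes, n_pixels)).astype(float)
X = np.vstack([b + rng.normal(0, 5, size=(per_class, n_pixels)) for b in bases])
y = np.repeat(np.arange(n_classes), per_class)

# No hyperparameter tuning, as in the article; the classes are so consistent
# that a plain logistic regression separates them easily.
block_reader = LogisticRegression(max_iter=1000).fit(X, y)
train_acc = block_reader.score(X, y)
```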

The actual usage of the code for data extraction is below, with edits for clarification:

```python
# using logistic regression model 'block_reader'
# padder function adds black to outline if image is too small
stats['str'] = block_reader.predict(image.reshape(to fit model))[0]
```

The reason the more unsophisticated model is so successful is due to the small image size, distinct differences in classes, and consistency within each class.

To unpack that a little further, it is much more conceivable that a logistic regression model can handle a 30×24 image than something like 400×600 due to the relatively small number of features (pixels) in each observation.
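The pixel arithmetic behind that comparison, with the sizes quoted above:

```python
# feature (pixel) counts per flattened observation
small = 30 * 24    # 720 features: tractable for a linear model
large = 400 * 600  # 240,000 features: far harder without dimensionality reduction
```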

The differences in classes allow different pixels to hold different importances in classification.

The consistency within each class is another factor in the success of an out-of-the-box model, because the model’s testing data is all remarkably similar to, if not exactly what it has seen before in the training data (which is why we can use training accuracy as an evaluation metric).

Now, with both Google’s and our own trained model, we are ready to gain information from Fire Emblem.

All we need to do is take the picture, process it, apply our model to obtain data, and incorporate that into the gameplay process!
