This is a somewhat futile experiment in trying to bring a new story to the caped crusader, and in learning about recurrent neural networks and long short-term memory networks along the way.
A very short Recurrent Neural Networks explainer

So as not to re-tread well-explained topics: there are a number of great resources for building a foundational understanding of what happens inside a neural network, but here is a brief look at the difference between a standard network and the recurrent variation.
A traditional neural network takes in a series of inputs and, after computation, provides outputs. Fairly straightforward. It creates a static model based on the information used to train it.
In contrast, a recurrent network creates a dynamic model that uses feedback from the previous computation to help determine the next. Essentially, it keeps a kind of memory that is fed back through the model to improve it, as the image above shows in both compact form (left) and unfolded form (right).
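To make that feedback loop concrete, here is a minimal NumPy sketch of a single recurrent step (toy sizes and random weights, purely illustrative): each new hidden state is computed from the current input plus the previous hidden state.

```python
import numpy as np

# One step of a vanilla recurrent cell: the new hidden state depends on
# the current input AND the previous hidden state (the feedback loop).
def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Toy dimensions: 8-dimensional inputs, 16-dimensional hidden state.
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(8, 16)) * 0.1
W_hh = rng.normal(size=(16, 16)) * 0.1
b_h = np.zeros(16)

h = np.zeros(16)                           # the "memory" starts empty
for x_t in rng.normal(size=(5, 8)):        # a sequence of 5 inputs
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # feed the previous state back in
```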
Recurrent networks perform especially well when working with information that depends on understanding a specific sequence, like text. Though you may dread its outcome sometimes, text autocomplete is a form of recurrent network.
It is not just basing the choice of next word on a single previous word, but potentially the past few words, building on the relationship it knows between them.
Try to fill in the blank for the following exchange:

Batman breaks into the warehouse and he shouts “Stop!” The Joker laughs and says “Never _____!”

Though you may have ideas, that blank could be pretty much any word. With the additional context found earlier in the exchange, we may guess that he says “Batman” or “Bats”. It is this type of flow through a sequence where a recurrent network can shine.
Moving to LSTM

(Image: from Understanding LSTM Networks)

We can run into a major issue with recurrent networks: factoring in all of those additional steps can make the gradient vanish or explode (essentially, too much multiplication of tiny fractions can really muck things up).
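A toy illustration of why this happens: the gradient gets multiplied once per time step, so factors even slightly below or above 1 quickly collapse toward zero or blow up.

```python
# Repeated multiplication across 100 time steps:
print(0.9 ** 100)   # ~2.7e-05 -> the gradient effectively vanishes
print(1.1 ** 100)   # ~13780.6 -> the gradient explodes
```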
Recalling everything that came before can be too dense, but we do want to use some of what was learned.
Long short-term memory units (LSTMs) are a form of logic that helps combat this issue. LSTMs keep a separate memory that is stored and used along the way through the calculation. When deciding on a new output, the model takes into account the current input, the previous output, AND the previous memory it stored. Once it generates the new output, it then updates that memory.
You can imagine how useful something like this may be when working with text, which relies heavily on following the flow of words.
For another great resource on walking through the internal function of this process, follow the link in the caption of the image.
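If it helps to see those mechanics as code, here is a rough NumPy sketch of a single LSTM step using the standard gate formulation (illustrative weights and sizes, not any particular library's internals): the current input and previous output decide how much of the stored memory to keep, what to add to it, and what to output.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold the weights for the four gates:
    forget (f), input (i), candidate (g), and output (o)."""
    z = x_t @ W + h_prev @ U + b               # current input + previous output
    f, i, g, o = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)
    c_t = f * c_prev + i * np.tanh(g)          # update the stored memory
    h_t = o * np.tanh(c_t)                     # the new output uses that memory
    return h_t, c_t

# Toy sizes: 8-dimensional input, 16-dimensional hidden state and memory.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4 * 16)) * 0.1
U = rng.normal(size=(16, 4 * 16)) * 0.1
b = np.zeros(4 * 16)

h, c = np.zeros(16), np.zeros(16)
for x_t in rng.normal(size=(5, 8)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```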
Let’s get into generation!

Text generation can work at either a character or a word level.
Once trained on a set of input data, it attempts to learn a pattern so that it can effectively predict what will come next.
Similar to any numerical model that tries to find the best output based on the input features, the model has words or characters stored in a numerical context that it can understand and interpret.
As you may guess, the problem can be that the numerical representation of words or characters may not always create a very logical flow of text.
For this exercise, I tried a few different approaches to get things right.
My input: a collection of Wikipedia plot summaries for all Batman movies, comics, and graphic novels (49 in total).

Recurrent neural network models that I’m attempting:

- Character-level LSTM model
- Word-level LSTM model with pre-trained word vectors
- GPT-2 model

Character-level LSTM model

This method works by indexing the text by individual characters, learning the patterns of how each word is formed.
To start, we take our text, break it up into individual characters, and create an index for them:

```python
import numpy as np

chars = sorted(list(set(all_plots)))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))
```

We then break the whole text into chunks, mimicking sentences:

```python
maxlen = 100
step = 3
sentences = []
next_chars = []
for i in range(0, len(all_plots) - maxlen, step):
    sentences.append(all_plots[i: i + maxlen])
    next_chars.append(all_plots[i + maxlen])
print('nb sequences:', len(sentences))
```

From there, all we need to do is vectorize these sentences with the character indexes we created:

```python
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1
```

Voila! The details are all set; all we have to do now is create our LSTM model and run:

```python
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.optimizers import RMSprop

model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars), activation='softmax'))

optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
```

Results

Not great.
I did not have the time or processing power to run this model for more than a few hundred epochs, and at the end of the day I still ended up with garbled letters and the occasional proper name.
Our text, though not a book, is still 211,508 characters long and that is pretty dense to process.
It became really good at using formal character names, but that is the bulk of its success.
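For context on how the sample below was produced: after compiling, the model is trained with a standard Keras fit loop and then sampled from one character at a time, feeding each prediction back in as input. A sketch of that loop, reusing the names from above (the epoch count and temperature here are placeholders, not my exact settings):

```python
model.fit(x, y, batch_size=128, epochs=200)   # epoch count is illustrative

def sample(preds, temperature=1.0):
    # Rescale the predicted distribution with a temperature, then draw one index.
    preds = np.log(np.asarray(preds, dtype='float64') + 1e-8) / temperature
    probs = np.exp(preds) / np.sum(np.exp(preds))
    return np.random.choice(len(probs), p=probs / probs.sum())

seed = 'Batman'.rjust(maxlen)                 # left-pad the prompt to the window size
generated = seed
for _ in range(400):
    x_pred = np.zeros((1, maxlen, len(chars)))
    for t, char in enumerate(seed):
        x_pred[0, t, char_indices[char]] = 1
    preds = model.predict(x_pred, verbose=0)[0]
    next_char = indices_char[sample(preds, temperature=0.5)]
    generated += next_char
    seed = seed[1:] + next_char               # slide the window forward
```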
Here is an example of the type of text returned with just the input prompt of “Batman”:

And the bat-signal, killed by a nemperpored of the murder of his father s bomb a Batman, he seeping in batman to attack dark knockshous up Batman and prompts the criminal artic and he is he all of the underal the Batcave, Gordon in the standia, the story of the head Harvey Dent and is unfils have story aissaveral to all be were stance and he will murce that his father in the sulveit by the story

GIBBERISH!!!

Word-level LSTM model with pre-trained word vectors

Next, I moved up to a word-level LSTM model.
Where I had previously broken the text up into individual characters, the model might have an easier time working with whole words. To also help my chances at coherent phrasing, I included the help of pre-trained word vectors. The pre-trained vectors give each word an embedding weight to use as a starting point for the calculation, and the model works from there. We may not know how the phrasing should go, but we do know the relationship between some words; for example, “hot” would not be highly correlated with “ice”.
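To make that idea concrete before walking through the preprocessing, here is a rough sketch of how pre-trained word vectors can seed a Keras Embedding layer. The sizes and the randomly generated `weights` matrix below are stand-ins, not the code from this project; in practice the matrix comes from the trained word vectors.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

# Stand-in values: in practice these come from the pre-trained word vectors.
vocab_size, embedding_size = 5000, 100
weights = np.random.normal(size=(vocab_size, embedding_size))

model = Sequential()
model.add(Embedding(input_dim=vocab_size,
                    output_dim=embedding_size,
                    weights=[weights]))       # start from the pre-trained vectors
model.add(LSTM(128))
model.add(Dense(vocab_size, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
# Training would then be the usual fit call, e.g. model.fit(train_x, train_y, epochs=...)
```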
So, we start by bringing in that pre-trained model:

```python
import gensim

# Note: `sentences` must already hold the tokenized plot text (as built in the next snippet).
word_model = gensim.models.Word2Vec(sentences, size=100, min_count=1, window=5, iter=100)
pretrained_weights = word_model.wv.syn0
vocab_size, embedding_size = pretrained_weights.shape
```

And similarly build out our features:

```python
max_sentence_len = 40
sentences = [[word for word in sen.split()[:max_sentence_len]] for sen in all_plots.split('.')]

def word2idx(word):
    return word_model.wv.vocab[word].index

def idx2word(idx):
    return word_model.wv.index2word[idx]

print('Preparing the data for LSTM...')
train_x = np.zeros([len(sentences), max_sentence_len], dtype=np.int32)
train_y = np.zeros([len(sentences)], dtype=np.int32)
for i, sentence in enumerate(sentences):
    for t, word in enumerate(sentence[:-1]):
        train_x[i, t] = word2idx(word)
    train_y[i] = word2idx(sentence[-1])
```

And off to training!

Results

So-so.
I let it run for 500 epochs, but that is still not enough to get close to some good coherent phrasing.
Are the results I got more fun and slightly more usable? Absolutely. Do they sound anything like natural language? Not even close. This should make sense: the model gets the benefit of word relationships in a vector space and learns from the structure of the plots, but none of that relates to proper grammar.
More epochs may start to get closer (I’ve seen examples that run for a few thousand epochs and do better), but I haven’t the time.
Some of the highlights, input text in bold:

Batman fights to castle aircraft barely brutality more.
Batman reveals Christmas halfway moments audience Robin, arrive escape insanity torches structure.
Joker tries to toxin masked long-abandoned confirmed commitment attached women orphanage introducing brother.
Batman fights spies teacher, works Dr Switch pass, sanction coffin levitates it?

Ha! Close.
But, no.
OpenAI GPT-2 Model

When regular training and adding some basic pre-trained vectors to our model aren’t enough, you’ve got to bring in the big guns. GPT-2 is the generative pre-trained Transformer from OpenAI.
It is fairly fast if up and running on a GPU, and its ability to mimic language is very impressive by comparison.
The level to which it gives itself away varies by text.
If you’d like to take a quick run at it, check out Talk to Transformer.
For this example, I used gpt-2-simple so that I could easily fine-tune my model.
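The gpt-2-simple workflow is only a few calls: download a base model, fine-tune it on a text file, and generate from a prompt. Roughly along these lines (the file name, step count, and generation settings here are illustrative, not my exact run):

```python
import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name='124M')        # the small base model

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              'batman_plots.txt',            # illustrative name for the plot-summary file
              model_name='124M',
              steps=400)                     # fine-tuning steps

text = gpt2.generate(sess,
                     prefix='Batman',
                     length=200,
                     temperature=0.7,
                     return_as_list=True)[0]
print(text)
```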
Results

Wow. This is some wonderful stuff. Is it entirely original? No; the text borrows heavily from phrasing found in the original summaries, so it may be a bit overfit at times.
The good news is that it is in a fairly readable form and a delightfully weird mix of storylines.
These snippets almost read like scripts for movies from people who only read the Wikipedia pages of different comics, so fairly accurate!

Sample from 400 Epochs in:

Sarah narrates the story of how before her marriage to Jason, she killed her abusive husband and took up the mantle of Robin.
Although initially offended by the title, Sarah later comes to accept it, even though it was actually Batman who committed the murders.
The story begins after Robin infiltrates a high-level military conspiracy convened to address the deteriorating security of world governments following the death of long ago military general James Garvin.
Members of the conspiracy include: Joker, Sorcerer, Witch Doctor, Satanic Ritual-Bearer, and their leader, Alan MacGregor.
Joker leads the conspirators in an attempt to seduce the female conspirator, who is played by an unknown actor.
The female states that she does not believe in vampires, but rather in a supreme being known as the Joker’s human, who was responsible for the deaths of hundreds of innocents at his hands.
Now we’re talking! Look at that list of conspirators. Mastermind Alan MacGregor!!?? I looked in the original text and the names “alan” and “macgregor” were used, but never as a single proper name.
Sample from 800 Epochs in:

Once again, Batman and Robin make quick work of the Syndicate.
Both raid the lair of the Syndicate’s leader, Mr. Freeze. Batman swings into the fray via skylight, while Jason Todd and Scarlet head towards an unknown destination. Meanwhile, Mr. Freeze, still wearing body armour, bores clement tortured and chained to a tree in Arkham Asylum, as Scarlet recovers from her ordeal.
Batman and Robin begin to have crime fighting relationship problems because of Freeze’s obsession with Scarlet, but Bruce eventually convinces Dick to trust him.
Bruce ponders kidnapping Scarlet to study, but decides against it because it would expose Batman to the greatest danger that he has faced.
Later, Batman boards a Gothic-looking ship and kidnaps Scarlet away from Gotham.
He later talks with her and promises that he will support her in her studies, but only if she follows his orders.
Weird, controlling Batman! Interesting.

Sample from 1600 Epochs:

At the Batcave, Bruce announces to the world that he has abducted and murdered Gotham’s warden, and that Gotham is his oyster.
He states that Gotham’s law enforcement should be afraid, and that the only way to keep them safe is to kill each other.
The three decide to go after Mr. Freeze together. Bruce invites Gordon and Fox to meet him in the Batcave. The trio confront Mr. Freeze, and he admits that he has come to kill them all. Mr. Freeze tries to convince Bruce that he is powerless against Bruce’s plans, but Bruce proves him wrong.
Getting kinda dark. Not surprising for a character who trends that way. With results like these, I don’t think bots are coming for writers’ jobs anytime soon, but it was a fun experiment in language modeling. Now time to celebrate with a quick batusi!

Thanks for reading!