Bewildering Brain

Not unless you mail them from Desolation Row.

It's obvious: even though the lyrics have a clear Dylanesque style, they feel like a cut-out of reality.

They read like a copy-paste of his work, a collage of different verses assembled into a new song.

Many phrases are identical to the ones written by Bob.

This was to be expected: when we push for higher readability by using bigrams, we also reduce the variance of the predicted words.

As a result, we often get three or more consecutive words that come from the same original verse.

Using unigrams isn't the answer either: meaning would be lost because the syntactic and morphological order of the words is not respected, leaving a word soup in random order.

Markov chains have the advantage of being easy to implement and of using fewer variables to predict results, but that comes hand in hand with poor predictions.

To counter this, I jumped to a more complex model: a recurrent neural network, or RNN.

Another detail worth mentioning: the algorithm keeps predicting, regardless of length, until it reaches the end of the chain (the <END> tag).

This means that a run of the algorithm can output 2 verses and the next one 50.
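To make the idea concrete, here is a minimal sketch of a bigram-based Markov generator of the kind described above (the function names and the <END> convention are illustrative, not the project's actual code):

```python
import random
from collections import defaultdict

END = "<END>"

def build_bigram_model(verses):
    """Map each pair of consecutive words to the words observed right after it."""
    model = defaultdict(list)
    for verse in verses:
        words = verse.split() + [END]
        for i in range(len(words) - 2):
            model[(words[i], words[i + 1])].append(words[i + 2])
    return model

def generate(model, seed):
    """Walk the chain from a seed bigram until the <END> tag is drawn."""
    w1, w2 = seed
    out = [w1, w2]
    while (w1, w2) in model:
        nxt = random.choice(model[(w1, w2)])
        if nxt == END:
            break
        out.append(nxt)
        w1, w2 = w2, nxt
    return " ".join(out)
```

Because the next word is drawn only from words that actually followed that exact bigram in the corpus, readable output tends to reproduce whole stretches of the original verses, which is exactly the copy-paste effect described above.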

“There must be some kind of way outta here” — Said the joker to the thief.

RNN (Recurrent Neural Network)

Recurrent networks are a type of artificial neural network that recognizes patterns in sequential data such as text, genomes, handwriting, spoken language or numerical time series coming from sensors, stock markets or government agencies.

These algorithms take time and sequence into account; they have a temporal dimension.

In comparison to Markov Chains, recurrent networks have memory.

To understand recurrent networks, you first need to understand the basics of ordinary networks: feedforward ones.

Both types of networks are called neural because of the way they channel information through a series of mathematical operations carried out at the nodes of the network.

One carries information until the end without touching the same node more than once, while the other runs cycles over the same network.

The latter one is called recurrent.

In the case of feedforward networks, you feed them an input and get an output.

In supervised learning the output would be a label.

This means raw data gets mapped to a category by recognizing patterns that decide, for example, whether an input image should be classified as a cat or a dog.

Recurrent networks, on the other hand, take as input not only the current example they are seeing, but also what they have perceived previously in time.

The decision taken at time t-1 affects the one taken a moment later, at time t.

So recurrent networks have two sources of input: the present and the recent past, which combine to determine how to respond to new data — just like a human brain.
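To make the contrast concrete, here is a minimal NumPy sketch (illustrative only, not the model used in this project): a feedforward layer depends only on its current input, while a recurrent step also takes in the hidden state left over from the previous time step.

```python
import numpy as np

def feedforward_step(x, W, b):
    # The output depends only on the current input x: no memory.
    return np.tanh(W @ x + b)

def recurrent_step(x_t, h_prev, W_x, W_h, b):
    # The output depends on the current input x_t AND the previous hidden
    # state h_prev: the "recent past" is carried from one time step to the next.
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Tiny usage example with random weights (hypothetical sizes).
rng = np.random.default_rng(0)
x_t, h_prev = rng.normal(size=3), np.zeros(5)
W_x, W_h, b = rng.normal(size=(5, 3)), rng.normal(size=(5, 5)), np.zeros(5)
h_t = recurrent_step(x_t, h_prev, W_x, W_h, b)  # h_t becomes h_prev at time t+1
```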

We also need to look at the concept of Long Short-Term Memory (LSTM) to understand the complete process.

What would happen if I fed my network all of Dylan's songs and asked it to finish the title of this song: Subterranean Homesick …?

We know the next word is Blues (if you don’t know the song, it’s of the utmost importance you listen to it now).

The network won't know, since this information doesn't repeat often, or its occurrences aren't close enough together to be remembered.

For humans, it's obvious and intuitive that if a phrase appears in the title of a book, it must be an important part of the plot, even if that's the only time it appears.

Unlike an RNN, we would definitely remember.

Even though plain RNNs fail to remember, there are techniques, like LSTM networks, that handle this kind of situation successfully.

While RNNs remember everything up until a certain limited depth, LSTM networks learn what to remember and what to forget.

This allows LSTM networks to reach and use memories that are beyond an RNN's range.

Memories that, because of their perceived importance, were retained by the LSTM network in the first place.

How do we achieve this? Recurrent networks, in general, have a simple structure of repeated modules through which data flows.

The simple ones are commonly built from layers that use just hyperbolic tangents (tanh) as activation functions.

LSTM networks, on the other hand, have a more complicated structure, combining tanh with several other functions, among them sigmoids.

They have not only input and output gates, but also a third gate we could call the forget gate.

This gate determines whether the incoming information is worth remembering.

If not, the information is discarded.

How does this decision making work? Each gate is associated with a weight.

At each iteration, the sigmoid function is applied to the input, and the output is a value between 0 and 1.

 0 means nothing gets through and 1 means everything does.

Afterwards, the values of each layer are updated through a back-propagation mechanism.

This allows the gate to learn, with time, which information is important and which one is not.
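As a rough illustration of those three gates (a NumPy sketch, not the code from the repository I used), a single LSTM step can be written like this; the weight matrices and biases are exactly what back-propagation adjusts over time:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U and b hold the parameters of the forget (f),
    input (i) and output (o) gates plus the candidate cell state (g)."""
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # what to forget (0..1)
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # what to write (0..1)
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # what to expose (0..1)
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])  # candidate memory
    c_t = f * c_prev + i * g   # keep part of the old memory, add some new
    h_t = o * np.tanh(c_t)     # hidden state passed on to the next time step
    return h_t, c_t
```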

To implement all this, I started from Greg Surma's text-predictor.

I implemented some small changes to the model, adapted it to Python 3 and played a little bit with the hyperparameters until I got satisfactory results.

The model is character-based: all the unique characters in the corpus are extracted along with their respective frequencies.

The tensors are then built by replacing each character with its frequency-based index.

The length of the output is predefined by a parameter.

In this case it’s fixed.
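A rough sketch of that character-level preprocessing could look like the following (the file name is hypothetical and this is not the exact code from the repository):

```python
from collections import Counter
import numpy as np

with open("dylan_lyrics.txt") as f:  # hypothetical path to the lyrics corpus
    text = f.read()

# Unique characters with their frequencies, most frequent first.
counts = Counter(text)
chars = [c for c, _ in counts.most_common()]
char_to_idx = {c: i for i, c in enumerate(chars)}
idx_to_char = {i: c for c, i in char_to_idx.items()}

# The corpus becomes a tensor of integer indices, ready to be batched
# and fed to the network; idx_to_char maps predictions back to text.
data = np.array([char_to_idx[c] for c in text], dtype=np.int32)
```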

For more detail, you can check my code in my GitHub account.

alexing/lyrics_prediciton: Comparing Markov Chains with RNNs on text prediction (github.com/alexing/lyrics_prediciton)

Enough with technicalities: let's check the results! The interesting part is not only the final output, but the learning process the algorithm went through at each iteration.

How it went from a jumble of characters to fully formed verses in just a few cycles.

We can also appreciate the learning curve, watching how the loss function gets minimized until it settles around an asymptotic value near 0.6 in less than 90k iterations.

Iteration 0

5p4:HStdgoTxFtBy/IBBDJe!l5KTHldE42:lt(-cdbA2lB2avwshp-w,M)cKyP]8 1arOsfsJSAcWU6sU6E"JV54X9bfrxFV1EEnajkozd'Tk3iQkuUp02oekrQI-,,rAt-(PyfE6z-v7]8utBnD/Nxv:m;0Mw!)cYbnugqo7t MXQhnq?X7qBTgKp9dJAojO2.87cN?:/0SJq:kBSyKDaj5G0"U466;y8'7cxNLsYXVTUIxqbW0i0bZh8okns) Hf2?2R2hxddb;zXfv3J4iLfv-qOK4y[gaQuImW!XUyyBch9)GgcFB5f[Ri6?FaGno pBMQl hD ;tPUnyWuxg!B Qd6ot30tAlnLg2n?tLctfXTaz:9pIC3Z;fnA]A?q9k"B2rm"eHTI"miA!d/iimz!/ndfzSKd.W[ALoLxE[l;PQI:PG ]EtUM4?(x4zBB-[wH;GJT/JYAzFGK9x05J1Ch[z2(/L4P?KiTYNK,7m

You know nothing, RNN model.

After a cold start, the model is initialized with garbage and outputs random characters.

Iteration 1000

temple at I hand you up to laby, You set you on always hole as madoo and use unknear, And thinking I want his dista, Coom on you make you.

" "What want Everybody was on Ira," ain't may bold by you.

And the pend.

Honey, day you don't eway you say" I mad in Game, No, contaw woman, How, way, Pryie you don't know, and couse I love are stone is sute curt suck block on Haye?.Now, a make for etcide is lord, Walles And he lad feel, Take, blace And mave wease with nothing, But you

In just a few iterations, the model learned which characters are the right ones for building a Dylan song.

It also learned the shape of a song: the length of the verses and basic punctuation rules, like capital letters at the start of sentences and the use of commas.

Iteration 2000

how.

You never you been todred, Just crying her face to the night.

Oh, uh, sang to time in you.

Timb friend carbed as lace.

We'll be the better does of my beantains, The mightenmed to cheat twist and you'll asy dressed them loves?.With the mough seen of the facing gold, Take er can Man, wanded like mind for your morning the night up the feet the wond pring, Take did a grost ever neum.

Pounsta fleason just comeless, them bads of me see there a womes of as too lotten up to turn, You

Some words are already real words, and the morphological relations between words start to show: adjectives and articles as modifiers of nouns.

Circumstantial modifiers, objects and predicates after verbs.

Iteration 4000

I world must be lady, babe, Ither didn't matked, don't remember helled things.

They'll eter came life, mamber And the company together That I thinking for you, though protaured in the dance please Follower, I ain't never the one?.Well, it air awa paries because a north that her in day you only think cannot the ground, her not a roll a mause where they're looked awhile, Can the Lad-eyes and the confesed white wiced to come me.

You're in two if it is, slele noners, Ain't mes was blow

Errors in word prediction are becoming less frequent: fewer vocabulary mistakes.

Iteration 35000

with their highway.

I cannon cloaked in a picture, It diamondy his man I'll see you even day he'd come there across, the moon after the parking, I'm dressed, I'm a bad line.

Sanalured like a coller standing in a woman.

I'll be banked inside, She sees – Shere road-luck in the dust he's well never know.

With degreeing on a whole farms, but don't think twice and I took forwlet Johanna I never crash.

I'm going to the jelf.

All I never been don't know what I Night – Don't mean

The model reaches an excellent morphological understanding of the verses: even if the words don't make much sense, the output has the form of poetry or song lyrics.

Iteration 102000

guess he wat hope this nose in the last table if I don't mean to know.

Well, I'm puts some dirty trouble for you in the law.

Go dishes – And all the way from a cow, They'll stone you when you knew that, you are, you're gonna have to guess that I after and flowing on you, laws are dead, might as I read is changed.

I've taking for you, yesterday, a Martin Luther game was tried.

He was without a home, Let the basement deep about hall.

" Well, I'm lose again the real land, Throw myThe amount of errors has diminished noticeably.

It definitely looks like it was written by a human.

Maybe that human is over the recommended dose of Xanax, but human after all.

Iteration 259000

guess in the confusion come with nothing in here together wrong.

I saw sold, he puts in my bed.

Going through the branchummy, There's an ended on the factiful longer and pierce of Blind expense And the wind blow you went, I've shine, Bent Before the world is on the snowfur warn – now, handled, your daughters are full of old, goes for dignity.

Oh, you got some kid no reason to game Just and it's light and it evonces, 'round up Indian.

Well, the bright the truth was a man patty

Our network learned to write songs like Bob Dylan.

OK, alright.

I accept it.

We still have some vocabulary errors, and the lyrics may not make much sense.

Even if lyrics generated by an artificial intelligence still have these little flaws, we can certainly see that the model correctly learned to copy the style of the provided dataset.

If we consider that the network learned how to do all this from scratch, and that at the beginning it had no notion of what a letter or a word is (let alone English grammar), we can agree that the results are surprising.

We were able to detect logical patterns in a dataset and reproduce them, and at no point did the network receive any input about what the language was, what its rules were, or even whether it was processing clinical images from medical patients or the corpus of Shakespeare's works.

Conclusions and future steps

In this article I set out to compare two very different methods of predicting text.

On the one hand, Markov Chains bring the advantage of being easy to implement.

No deep theoretical or technical knowledge is needed to develop them, but the predictions come out pretty basic and fall short of expectations.

The future of this field clearly lies in the use of RNNs, even though implementing, running and testing them requires a lot of time, processing power, disk space and memory for the tensors, and especially advanced technical and theoretical knowledge.

To further improve precision, we could check the output predictions against a dictionary in a post-processing stage.

This dictionary could be built from the unique words in the dataset or from an English language dictionary.

That way, if a predicted word is not present, we could remove it or swap it for the most similar word (the one at the smallest edit distance).
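As a sketch of that idea (not something implemented in the project), a post-processing pass could use Levenshtein edit distance against the vocabulary of the dataset:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def clean(predicted_words, vocabulary):
    """Swap each out-of-vocabulary word for its closest in-vocabulary neighbour."""
    fixed = []
    for w in predicted_words:
        if w.lower() in vocabulary:
            fixed.append(w)
        else:
            fixed.append(min(vocabulary, key=lambda v: levenshtein(w.lower(), v)))
    return fixed
```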

Again, if you wish, you can check my code in my public GitHub account.

Let me ask you one question, is your money that good? Will it buy you forgiveness, do you think that it could?

How much longer until artificial lyrics arrive? Who will be the first to squeeze the juice out of this market? There have already been holographic tours of late musicians.

Roy Orbison, Michael Jackson and Tupac are just a few examples worth mentioning.

Will this new era of "music after death" be the perfect excuse for artificial lyrics?

Thanks for reading!

Sources

Predicting Logic's Lyrics with Machine Learning (towardsdatascience.com)

A Beginner's Guide to LSTMs and Recurrent Neural Networks (skymind.ai)

Text Predictor — Generating Rap Lyrics: Language Modeling with Recurrent Neural Networks (LSTMs) (towardsdatascience.com)
