How to implement Seq2Seq LSTM Model in Keras #ShortcutNLP

If you got stuck with a dimension problem, this is for you.

Why do you need to read this? If you got stuck implementing seq2seq with Keras, I am here to help you.

When I wanted to implement seq2seq for a chatbot task, I got stuck many times, especially on the dimensions of the input data and the input layers of the neural network architecture.

So here I will explain a complete guide to seq2seq in Keras.

Let's get started!

Menu

1. What is a Seq2Seq Text Generation Model?
2. Task Definition and Seq2Seq Modeling
3. Dimensions of Each Layer from Seq2Seq
4. Preprocessing of Seq2Seq (in Chatbot Case): the simplest preprocessing code, which you can use today!

1. What is a Seq2Seq Text Generation Model?

Seq2Seq is a type of Encoder-Decoder model using RNNs.

It can be used as a model for machine interaction and machine translation.

By learning a large number of sequence pairs, this model generates one from the other.

Put more simply, the definition of Seq2Seq is:

- Input: text data
- Output: text data as well

And here are some examples of business applications of seq2seq:

- Chatbot (you can find one on my GitHub)
- Machine Translation (you can find one on my GitHub)
- Question Answering
- Abstractive Text Summarization (you can find one on my GitHub)
- Text Generation (you can find one on my GitHub)

If you want more information about Seq2Seq, I have a recommendation from Machine Learning at Microsoft on YouTube.

So let's take a look at the whole process!

— — — — —

2. Task Definition and Seq2Seq Modeling

(Image source: https://www.oreilly.com/library/view/deep-learning-essentials/9781785880360/b71e37fb-5fd9-4094-98c8-04130d5f0771.xhtml)

For training our seq2seq model, we will use the Cornell Movie-Dialogs Corpus, which contains 220,579 conversational exchanges between 10,292 pairs of movie characters, involving 9,035 characters from 617 movies.

Here is one of the conversations from the dataset:

Mike: "Drink up, Charley. We're ahead of you."
Charley: "I'm not thirsty."

Then we will feed these pairs of conversations into the Encoder and the Decoder. That means our neural network model has two input layers, as you can see below.

This is our Seq2Seq neural network architecture for this time:

(diagram: Seq2Seq architecture, copyright Akira Takezawa)

Let's visualize our Seq2Seq by using LSTM:

(diagram: Seq2Seq with LSTM, copyright Akira Takezawa)

3. Dimensions of Each Layer from Seq2Seq

(Image source: https://bracketsmackdown.com/word-vector.html)

The black box for an “NLP newbie” is, I think, this: how does each layer process the data and change its dimensions? To make this clear, I will explain how it works in detail.

The layers can be broken down into 4 different parts:

1. Input Layer (Encoder and Decoder)
2. Embedding Layer (Encoder and Decoder)
3. LSTM Layer (Encoder and Decoder)
4. Decoder Output Layer

Let's get started!

1. Input Layer of Encoder and Decoder (2D->2D)

Input Layer Dimension: 2D (sequence_length, None)

# 2D
encoder_input_layer = Input(shape=(sequence_length, ))
decoder_input_layer = Input(shape=(sequence_length, ))

NOTE: sequence_length is MAX_LEN, unified by padding during preprocessing

Input Data: 2D (sample_num, max_sequence_length)

# Input_Data.shape = (150000, 15)
array([[   1,   32,    2,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0],
       [ 123,   56,    3,   34,   43,  345,    0,    0,    0,    0,    0,    0,    0],
       [   3,   22,    1, 6543,   58,    6,  435,    0,    0,    0,    0,    0,    0],
       [ 198,   27,    2,   94,   67,   98,    0,    0,    0,    0,    0,    0,    0],
       [  45,   78,    2,   23,   43,    6,   45,    0,    0,    0,    0,    0,    0]], dtype=int32)

NOTE: sample_num can be the length of training_data (150000)

Output Data: 2D

NOTE: Input() is used only for Keras tensor instantiation

— — — — —

2. Embedding Layer of Encoder and Decoder (2D->3D)

Embedding Layer Dimension: 2D (sequence_length, vocab_size)

embedding_layer = Embedding(input_dim = vocab_size,
                            output_dim = embedding_dimension,
                            input_length = sequence_length)

NOTE: vocab_size is the number of unique words

Input Data: 2D (sequence_length, vocab_size)

# Input_Data.shape = (15, 10000)
array([[ 1, 1, 0, 0, 1, 0, ..., 0, 0, 1, 0, 0, 0, 0],
       [ 0, 0, 1, 0, 0, 1, ..., 0, 0, 0, 0, 0, 0, 1],
       [ 0, 1, 0, 0, 0, 0, ..., 0, 0, 1, 0, 0, 0, 0],
       [ 0, 1, 0, 0, 0, 1, ..., 0, 0, 0, 1, 0, 1, 0],
       [ 0, 0, 1, 0, 1, 0, ..., 0, 0, 1, 0, 1, 0, 0]], dtype=int32)

NOTE: Data should be a group of one-hot vectors

Output Data: 3D (num_samples, sequence_length, embedding_dims)

# Output_Data.shape = (150000, 15, 50)
array([[[ 1, 1, 0, 0, ..., 0, 1, 0, 0],
        [ 0, 0, 1, 0, ..., 0, 0, 0, 1],
        ...,
        [ 0, 1, 0, 0, ..., 1, 0, 1, 0],
        [ 0, 0, 1, 0, ..., 0, 1, 0, 0]],

       [[ 1, 1, 0, 0, ..., 0, 1, 0, 0],
        [ 0, 0, 1, 0, ..., 0, 0, 0, 1],
        ...,
        [ 0, 1, 0, 0, ..., 1, 0, 1, 0],
        [ 0, 0, 1, 0, ..., 0, 1, 0, 0]],

       ...] * 150000, dtype=int32)

NOTE: Data is word embedded in 50 dimensions
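To make this 2D->3D change concrete, here is a small sketch (my addition, not one of the original gists) that pushes a toy batch of padded ID sequences through an Embedding layer and prints the shapes; the batch size of 4 is arbitrary:

import numpy as np
from keras.layers import Input, Embedding
from keras.models import Model

VOCAB_SIZE, EMBEDDING_DIM, MAX_LEN = 10000, 50, 15

# Toy batch of 4 padded ID sequences, shape (4, 15)
batch = np.random.randint(0, VOCAB_SIZE, size=(4, MAX_LEN))

inputs = Input(shape=(MAX_LEN,))
embedded = Embedding(input_dim=VOCAB_SIZE, output_dim=EMBEDDING_DIM,
                     input_length=MAX_LEN)(inputs)
model = Model(inputs, embedded)

print(batch.shape)                  # (4, 15)     -> 2D input
print(model.predict(batch).shape)   # (4, 15, 50) -> 3D output

— — — — —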

3. LSTM Layer of Encoder and Decoder (3D->3D)

The tricky arguments of the LSTM layer are these two:

1. return_state: whether to return the last state along with the output
2. return_sequences: whether to return the complete output sequence or only the last output

You can find a good explanation in "Understand the Difference Between Return Sequences and Return States for LSTMs in Keras" by Jason Brownlee.

Layer Dimension: 3D (hidden_units, sequence_length, embedding_dims)

# HIDDEN_DIM = 20
encoder_LSTM = LSTM(HIDDEN_DIM, return_state=True)
encoder_outputs, state_h, state_c = encoder_LSTM(encoder_embedding)

decoder_LSTM = LSTM(HIDDEN_DIM, return_state=True, return_sequences=True)
decoder_outputs, _, _ = decoder_LSTM(decoder_embedding, initial_state=[state_h, state_c])

Input Data: 3D (num_samples, sequence_length, embedding_dims)

# Input_Data.shape = (150000, 15, 50)
array([[[ 1, 1, 0, 0, ..., 0, 1, 0, 0],
        [ 0, 0, 1, 0, ..., 0, 0, 0, 1],
        ...,
        [ 0, 1, 0, 0, ..., 1, 0, 1, 0],
        [ 0, 0, 1, 0, ..., 0, 1, 0, 0]],

       [[ 1, 1, 0, 0, ..., 0, 1, 0, 0],
        [ 0, 0, 1, 0, ..., 0, 0, 0, 1],
        ...,
        [ 0, 1, 0, 0, ..., 1, 0, 1, 0],
        [ 0, 0, 1, 0, ..., 0, 1, 0, 0]],

       ...] * 150000, dtype=int32)

NOTE: Data is word embedded in 50 dimensions

Output Data: 3D (num_samples, sequence_length, hidden_units)

# HIDDEN_DIM = 20
# Output_Data.shape = (150000, 15, 20)
array([[[ 0.0032, 0.0041, 0.0021, ..., 0.0020, 0.0231, 0.0010],
        [ 0.0099, 0.0007, 0.0098, ..., 0.0038, 0.0035, 0.0026],
        ...,
        [ 0.0099, 0.0007, 0.0098, ..., 0.0038, 0.0035, 0.0026],
        [ 0.0021, 0.0065, 0.0008, ..., 0.0089, 0.0043, 0.0024]],

       [[ 0.0032, 0.0041, 0.0021, ..., 0.0020, 0.0231, 0.0010],
        [ 0.0099, 0.0007, 0.0098, ..., 0.0038, 0.0035, 0.0026],
        ...,
        [ 0.0099, 0.0007, 0.0098, ..., 0.0038, 0.0035, 0.0026],
        [ 0.0021, 0.0065, 0.0008, ..., 0.0089, 0.0043, 0.0024]],

       ...] * 150000, dtype=float32)

NOTE: Data is reshaped by the LSTM into the 20-dimensional hidden space

Additional Information:

If return_state = False and return_sequences = False:

Output Data: 2D (num_samples, hidden_units)

# HIDDEN_DIM = 20
# Output_Data.shape = (150000, 20)
array([[ 0.0032, 0.0041, 0.0021, ..., 0.0020, 0.0231, 0.0010],
       [ 0.0076, 0.0767, 0.0761, ..., 0.0098, 0.0065, 0.0076],
       ...,
       [ 0.0099, 0.0007, 0.0098, ..., 0.0038, 0.0035, 0.0026],
       [ 0.0021, 0.0065, 0.0008, ..., 0.0089, 0.0043, 0.0024]], dtype=float32)
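If the two arguments are still unclear, this small sketch (my own, not from the article) compares the output shapes the different settings produce on a random embedded batch:

import numpy as np
from keras.layers import Input, LSTM
from keras.models import Model

HIDDEN_DIM, MAX_LEN, EMBEDDING_DIM = 20, 15, 50
x = np.random.random((4, MAX_LEN, EMBEDDING_DIM))    # toy embedded batch

inp = Input(shape=(MAX_LEN, EMBEDDING_DIM))

# Default: only the last output, shape (batch, hidden_units)
last_only = Model(inp, LSTM(HIDDEN_DIM)(inp))
print(last_only.predict(x).shape)                    # (4, 20)

# return_sequences=True: the full sequence, shape (batch, sequence_length, hidden_units)
full_seq = Model(inp, LSTM(HIDDEN_DIM, return_sequences=True)(inp))
print(full_seq.predict(x).shape)                     # (4, 15, 20)

# return_state=True: the output plus the last hidden and cell states
outputs, state_h, state_c = LSTM(HIDDEN_DIM, return_state=True)(inp)
states = Model(inp, [outputs, state_h, state_c])
print([o.shape for o in states.predict(x)])          # [(4, 20), (4, 20), (4, 20)]

— — — — —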

4. Decoder Output Layer (3D->2D)

Output Layer Dimension: 2D (sequence_length, vocab_size)

outputs = TimeDistributed(Dense(VOCAB_SIZE, activation='softmax'))(decoder_outputs)

NOTE: the TimeDistributed(Dense) layer allows us to apply a layer to every temporal slice of an input

Input Data: 3D (num_samples, sequence_length, hidden_units)

# HIDDEN_DIM = 20
# Input_Data.shape = (150000, 15, 20)
array([[[ 0.0032, 0.0041, 0.0021, ..., 0.0020, 0.0231, 0.0010],
        [ 0.0099, 0.0007, 0.0098, ..., 0.0038, 0.0035, 0.0026],
        ...,
        [ 0.0099, 0.0007, 0.0098, ..., 0.0038, 0.0035, 0.0026],
        [ 0.0021, 0.0065, 0.0008, ..., 0.0089, 0.0043, 0.0024]],

       [[ 0.0032, 0.0041, 0.0021, ..., 0.0020, 0.0231, 0.0010],
        [ 0.0099, 0.0007, 0.0098, ..., 0.0038, 0.0035, 0.0026],
        ...,
        [ 0.0099, 0.0007, 0.0098, ..., 0.0038, 0.0035, 0.0026],
        [ 0.0021, 0.0065, 0.0008, ..., 0.0089, 0.0043, 0.0024]],

       ...] * 150000, dtype=float32)

NOTE: Data is reshaped by the LSTM into the 20-dimensional hidden space

Output Data: 2D (sequence_length, vocab_size)

# Output_Data.shape = (15, 10000)
array([[ 1, 1, 0, 0, 1, 0, ..., 0, 0, 1, 0, 0, 0, 0],
       [ 0, 0, 1, 0, 0, 1, ..., 0, 0, 0, 0, 0, 0, 1],
       [ 0, 1, 0, 0, 0, 0, ..., 0, 0, 1, 0, 0, 0, 0],
       [ 0, 1, 0, 0, 0, 1, ..., 0, 0, 0, 1, 0, 1, 0],
       [ 0, 0, 1, 0, 1, 0, ..., 0, 0, 1, 0, 1, 0, 0]], dtype=int32)

After the data has passed through this fully connected layer, we use a reversed vocabulary (which I will explain later) to convert the one-hot vectors back into word sequences.
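Putting the four parts together, here is one way the whole training model could be assembled. This is a sketch consistent with the dimensions above, not the author's exact code; the optimizer and loss choices are my assumptions:

from keras.layers import Input, Embedding, LSTM, Dense, TimeDistributed
from keras.models import Model

MAX_LEN, VOCAB_SIZE, EMBEDDING_DIM, HIDDEN_DIM = 15, 10000, 50, 20

# 1. Input layers (2D)
encoder_inputs = Input(shape=(MAX_LEN,))
decoder_inputs = Input(shape=(MAX_LEN,))

# 2. Embedding layers (2D -> 3D)
encoder_embedding = Embedding(VOCAB_SIZE, EMBEDDING_DIM, input_length=MAX_LEN)(encoder_inputs)
decoder_embedding = Embedding(VOCAB_SIZE, EMBEDDING_DIM, input_length=MAX_LEN)(decoder_inputs)

# 3. LSTM layers (3D -> 3D); the decoder starts from the encoder states
encoder_outputs, state_h, state_c = LSTM(HIDDEN_DIM, return_state=True)(encoder_embedding)
decoder_outputs, _, _ = LSTM(HIDDEN_DIM, return_state=True, return_sequences=True)(
    decoder_embedding, initial_state=[state_h, state_c])

# 4. Decoder output layer: one softmax over the vocabulary per timestep
outputs = TimeDistributed(Dense(VOCAB_SIZE, activation='softmax'))(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], outputs)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.summary()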

— — — — —

4. Entire Preprocessing of Seq2Seq (in Chatbot Case)

(Image source: Creating A Language Translation Model Using Sequence To Sequence Learning Approach)

Before jumping into the preprocessing for Seq2Seq, I want to mention this: we need some variables to define the shape of our Seq2Seq neural network along the way of data preprocessing.

- MAX_LEN: to unify the length of the input sentences
- VOCAB_SIZE: to decide the dimension of a sentence's one-hot vector
- EMBEDDING_DIM: to decide the dimension of the word embedding (Word2Vec/GloVe) vectors

— — — — —

Preprocessing for Seq2Seq

OK, please keep this information in mind; let's start to talk about preprocessing.

The whole process can be broken down into 8 steps:

1. Text Cleaning
2. Put <BOS> tag and <EOS> tag for decoder input
3. Make Vocabulary (VOCAB_SIZE)
4. Tokenize Bag of Words to Bag of IDs
5. Padding (MAX_LEN)
6. Word Embedding (EMBEDDING_DIM)
7. Reshape the Data depending on neural network shape
8. Split Data for training, validation, and testing

— — — — —

1. Text Cleaning

Function

I always use my own function to clean text for Seq2Seq; a minimal sketch of it follows below.

Input

# encoder input text data
["Drink up, Charley. We're ahead of you.",
 'Did you change your hair?',
 'I believe I have found a faster way.']

Output

# encoder input text data
['drink up charley we are ahead of you',
 'did you change your hair',
 'i believe i have found a faster way']
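The original gist is not reproduced here, so this is only a minimal sketch of such a cleaner; the name clean_text and the small contraction list are my own assumptions:

import re

# A few common contractions; extend as needed (my assumption, not the author's list)
CONTRACTIONS = {"we're": "we are", "i'm": "i am", "you're": "you are",
                "it's": "it is", "don't": "do not", "can't": "cannot"}

def clean_text(text):
    """Lowercase, expand contractions, and drop punctuation."""
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    text = re.sub(r"[^a-z0-9 ]", " ", text)  # keep only letters, digits and spaces
    return " ".join(text.split())            # collapse repeated whitespace

print(clean_text("Drink up, Charley. We're ahead of you."))
# -> drink up charley we are ahead of you

— — — — —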

2. Put <BOS> tag and <EOS> tag for decoder input

Function

<BOS> means "Beginning of Sequence", <EOS> means "End of Sequence". A sketch of the tagging step follows below.

Input

# decoder input text data
[['with the teeth of your zipper', 'so they tell me', 'so which dakota you from'], ...]

Output

# decoder input text data
[['<BOS> with the teeth of your zipper <EOS>', '<BOS> so they tell me <EOS>', '<BOS> so which dakota you from <EOS>'], ...]
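Again, the gist itself is not reproduced here; a minimal sketch of the tagging step (the function name add_bos_eos is my own):

def add_bos_eos(sentences):
    """Wrap every decoder sentence with <BOS> and <EOS> tags."""
    return ['<BOS> ' + s + ' <EOS>' for s in sentences]

print(add_bos_eos(['with the teeth of your zipper', 'so they tell me']))
# -> ['<BOS> with the teeth of your zipper <EOS>', '<BOS> so they tell me <EOS>']

— — — — —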

3. Make Vocabulary (VOCAB_SIZE)

Function

A sketch of the vocabulary-building step follows below.

Input

# Cleaned texts
[['with the teeth of your zipper', 'so they tell me', 'so which dakota you from'], ...]

Output

>>> word2idx
{'genetically': 14816, 'ecentus': 64088, 'houston': 4172, 'cufflinks': 30399, "annabelle's": 23767, ...} # 14999 words

>>> idx2word
{1: 'bos', 2: 'eos', 3: 'you', 4: 'i', 5: 'the', ...} # 14999 indexes
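A minimal sketch of how word2idx and idx2word could be built, keeping only the most frequent VOCAB_SIZE words (my own sketch, not the author's gist; reserving ID 0 for padding is my choice):

from collections import Counter

VOCAB_SIZE = 15000

def build_vocab(sentences, vocab_size=VOCAB_SIZE):
    """Map the most frequent words to integer IDs (1-based; 0 is kept for padding)."""
    counts = Counter(word for s in sentences for word in s.split())
    most_common = [w for w, _ in counts.most_common(vocab_size - 1)]
    word2idx = {w: i + 1 for i, w in enumerate(most_common)}
    idx2word = {i: w for w, i in word2idx.items()}
    return word2idx, idx2word

word2idx, idx2word = build_vocab(['bos with the teeth of your zipper eos',
                                  'bos so they tell me eos'])

— — — — —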

4. Tokenize Bag of Words to Bag of IDs

Function

A sketch of the ID conversion follows below.

Input

# Cleaned texts
[['with the teeth of your zipper', 'so they tell me', 'so which dakota you from'], ...]

Output

# decoder input text data
[[10, 27, 8, 4, 27, 1107, 802],
 [3, 5, 186, 168],
 [662, 4, 22, 346, 6, 130, 3, 5, 2407], ...]
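A minimal sketch of the conversion, assuming the word2idx mapping from the previous step; skipping unknown words is my own choice:

def texts_to_ids(sentences, word2idx):
    """Replace every known word with its integer ID; unknown words are skipped."""
    return [[word2idx[w] for w in s.split() if w in word2idx] for s in sentences]

# Tiny stand-in mapping for illustration; in practice use the word2idx built above
word2idx = {'so': 3, 'they': 5, 'tell': 186, 'me': 168}
print(texts_to_ids(['so they tell me'], word2idx))  # -> [[3, 5, 186, 168]]

— — — — —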

5. Padding (MAX_LEN)

Function

A sketch of the padding step follows below.

Input

# decoder input text data
[[10, 27, 8, 4, 27, 1107, 802],
 [3, 5, 186, 168],
 [662, 4, 22, 346, 6, 130, 3, 5, 2407], ...]

Output

# MAX_LEN = 10
# decoder input text data
array([[  10,   27,    8,    4,   27, 1107,  802,    0,    0,    0],
       [   3,    5,  186,  168,    0,    0,    0,    0,    0,    0],
       [ 662,    4,   22,  346,    6,  130,    3,    5, 2407,    0], ...])
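Keras already ships pad_sequences for this; a minimal sketch, assuming post-padding and post-truncating as in the example above:

from keras.preprocessing.sequence import pad_sequences

MAX_LEN = 10

decoder_ids = [[10, 27, 8, 4, 27, 1107, 802],
               [3, 5, 186, 168],
               [662, 4, 22, 346, 6, 130, 3, 5, 2407]]

padded = pad_sequences(decoder_ids, maxlen=MAX_LEN, padding='post', truncating='post')
print(padded.shape)  # (3, 10)

— — — — —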

6. Word Embedding (EMBEDDING_DIM)

Function

We use pretrained GloVe word vectors. We can create the embedding layer with GloVe in 3 steps (a condensed sketch of all three follows below):

1. Load the GloVe file
2. Create the embedding matrix from our vocabulary
3. Create the embedding layer

Let's take a look!
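The three gists are not reproduced here, so this is a condensed sketch of my own; the file name 'glove.6B.50d.txt' is an assumption, and word2idx is the mapping built in step 3:

import numpy as np
from keras.layers import Embedding

EMBEDDING_DIM = 50
VOCAB_SIZE = 15000
MAX_LEN = 10

# 1. Load the GloVe file into a {word: vector} dictionary
glove = {}
with open('glove.6B.50d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        glove[values[0]] = np.asarray(values[1:], dtype='float32')

# 2. Create the embedding matrix from our vocabulary (row i = vector of the word with ID i)
embedding_matrix = np.zeros((VOCAB_SIZE, EMBEDDING_DIM))
for word, idx in word2idx.items():
    if word in glove and idx < VOCAB_SIZE:
        embedding_matrix[idx] = glove[word]

# 3. Create the embedding layer with the pretrained weights, frozen during training
embedding_layer = Embedding(VOCAB_SIZE, EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=MAX_LEN,
                            trainable=False)

— — — — —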

7. Reshape the Data to the neural network shape

Function

A sketch of the reshaping step follows below.

Input

# MAX_LEN = 10
# decoder input text data
array([[  10,   27,    8,    4,   27, 1107,  802,    0,    0,    0],
       [   3,    5,  186,  168,    0,    0,    0,    0,    0,    0],
       [ 662,    4,   22,  346,    6,  130,    3,    5, 2407,    0], ...])

Output

# output.shape = (num_samples, MAX_LEN, VOCAB_SIZE)
# decoder_output_data.shape = (15000, 10, 15000)
array([[[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 1., ..., 0., 0., 0.],
        ...,
        [1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0., 0.]],
       ...], dtype=float32)
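The decoder target has to become one one-hot vector per timestep. Here is a minimal NumPy sketch (my own; keras.utils.to_categorical would work as well):

import numpy as np

# Padded decoder output IDs, shape (num_samples, MAX_LEN)
decoder_output_ids = np.array([[10, 27, 8, 4, 27, 1107, 802, 0, 0, 0],
                               [ 3,  5, 186, 168, 0,   0,   0, 0, 0, 0]])

num_samples, MAX_LEN = decoder_output_ids.shape
VOCAB_SIZE = 15000

decoder_output_data = np.zeros((num_samples, MAX_LEN, VOCAB_SIZE), dtype='float32')
for i, sequence in enumerate(decoder_output_ids):
    for t, word_id in enumerate(sequence):
        decoder_output_data[i, t, word_id] = 1.0  # one-hot position for this timestep

print(decoder_output_data.shape)  # (2, 10, 15000)

— — — — —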

8. Split Data for training, validation, and testing

Function

A sketch of the split follows below.
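A minimal sketch using scikit-learn's train_test_split; the 80/10/10 ratio and the dummy shapes are my own assumptions:

import numpy as np
from sklearn.model_selection import train_test_split

# These arrays come from the previous steps; dummy placeholders shown here
encoder_input_data = np.zeros((1000, 10))        # (num_samples, MAX_LEN)
decoder_input_data = np.zeros((1000, 10))
decoder_output_data = np.zeros((1000, 10, 100))  # (num_samples, MAX_LEN, VOCAB_SIZE)

# 80% train, then split the remaining 20% into 10% validation and 10% test
(enc_train, enc_rest, dec_in_train, dec_in_rest,
 dec_out_train, dec_out_rest) = train_test_split(
    encoder_input_data, decoder_input_data, decoder_output_data,
    test_size=0.2, random_state=42)

(enc_val, enc_test, dec_in_val, dec_in_test,
 dec_out_val, dec_out_test) = train_test_split(
    enc_rest, dec_in_rest, dec_out_rest, test_size=0.5, random_state=42)

— — — — —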

References

- Creating A Language Translation Model Using Sequence To Sequence Learning Approach (chunml.github.io)
- Taming Recurrent Neural Networks for Better Summarization (www.abigailsee.com)
- oswaldoludwig/Seq2seq-Chatbot-for-Keras (github.com)
- Deep Learning Models for Question Answering with Keras (sujitpal.blogspot.com)
- Machine Translation using Sequence-to-Sequence Learning (nextjournal.com)
- Chatbots with Seq2Seq: Learn to build a chatbot using TensorFlow (complx.me)
- Building a chatbot with a Seq2Seq model: running an English-conversation sample with Torch (Seq2Seqモデルを用いたチャットボット作成 〜英会話のサンプルをTorchで動かす〜) | ALGO GEEKS (blog.algolab.jp)
