A Beginner’s Guide to Rasa NLU for Intent Classification and Named-entity Recognition

Ng Wai Foong · Jun 27

Image taken from https://givemechallenge.com/wp-content/uploads/2017/01/rasa-NLU.png

The purpose of this article is to explore the new way to use Rasa NLU for intent classification and named-entity recognition.

Since version 1.0.0, both Rasa NLU and Rasa Core have been merged into a single framework. As a result, there are some minor changes to the training process and the functionality available.

First and foremost, Rasa is an open source machine learning framework for automating text- and voice-based conversations. In other words, you can use Rasa to build contextual and layered conversations akin to an intelligent chatbot. In this tutorial, we will be focusing on the natural-language understanding part of the framework to capture the user's intention.

There are 5 sections in this tutorial:

1. Setup and installation
2. Data preparation and format
3. Training and testing
4. Running NLU Server
5. Conclusion

[Section 1] Setup and installation

I am using Python 3.6.7 installed in a virtual environment on a Windows operating system. It is recommended to install in a clean virtual environment as there are quite a number of Python modules to be installed.

Python module

For simplicity, we will just install a standard pipeline that can be used for all languages. In the official documentation, the team recommends the spaCy pipeline, but we will be using the supervised_embeddings pipeline, which is based on TensorFlow.
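For reference, selecting this pipeline in config.yml only takes two lines. This is a minimal sketch; the config.yml generated by rasa init in the next step will contain a fuller version:

```yml
# Minimal NLU configuration using the TensorFlow-based pipeline
language: en
pipeline: supervised_embeddings
```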

Activate the virtual environment and run the following command:

```
pip install rasa
```

It may take a while to install and upgrade the modules. Once it is completed, point to a directory of your choice and run the following command:

```
rasa init --no-prompt
```

You will be able to see the training process for both nlu and core using the default data.

The following files will be created:

- __init__.py: an empty file that helps Python find your actions.
- actions.py: code for your custom actions.
- config.yml: configuration of your NLU and Core models.
- credentials.yml: details for connecting to other services.
- data/nlu.md: your NLU training data.
- data/stories.md: your stories.
- domain.yml: your assistant's domain.
- endpoints.yml: details for connecting to endpoint channels.
- models/<timestamp>.tar.gz: your initial model.

The timestamp is in the format YYYYMMDD-hhmmss. NLU-only models will have an nlu prefix at the front.
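If you ever need to sort or inspect model archives programmatically, the timestamp in the file name can be parsed with Python's standard library. This is a small sketch assuming the naming scheme above:

```python
from datetime import datetime

# Model archives are named like models/nlu-20190515-144445.tar.gz;
# the timestamp part follows the YYYYMMDD-hhmmss format.
def model_trained_at(filename):
    """Extract the training time from a model archive name."""
    stamp = filename.split("nlu-")[-1].replace(".tar.gz", "")
    return datetime.strptime(stamp, "%Y%m%d-%H%M%S")

print(model_trained_at("models/nlu-20190515-144445.tar.gz"))  # 2019-05-15 14:44:45
```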

In fact, you have already trained a complete model that can be used for intent classification.

Let’s move on to the next section to learn more about the training data format.

[Section 2] Data preparation and format

If you would like to train it using your own custom data, you can prepare it in either markdown or JSON format.

I will be using markdown in this tutorial since it is the easiest.

Kindly refer to the following link for more information on all the available training data formats.

Open up data/nlu.md and start to modify the content according to your own use case.

Intent

You can specify an intent with ## intent:name_of_intent followed by a list of questions for that intent (one example per line, with a blank line between intents):

```md
## intent:goodbye
- bye
- goodbye
- see you around
- see you later
- talk to you later

## intent:ask_identity
- who are you
- what is your name
- how should i address you
- may i know your name
- are you a bot
```

Entity

You can specify an entity inside each question as [value](name of entity):

```md
## intent:ask_shop_open
- does the shop open on [monday](weekday)
- does the shop open on [wednesday](weekday)
- does the shop open on [friday](weekday)
```

In this case, weekday is the name of the entity and monday is the value.

You need to provide a lot of examples in order to capture the entity. Please note that upper case and lower case affect the accuracy: Monday is not the same as monday. Hence, it is advisable to train everything in lower case and parse input data to lower case during evaluation.
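Since the training data is all lower case, a simple pre-processing step like the following (a hypothetical helper, not part of Rasa) can be applied to user input before it reaches the model:

```python
def normalize(text):
    """Lower-case and trim user input so it matches the
    all-lower-case training examples."""
    return text.strip().lower()

print(normalize("  Does the shop open on Monday? "))  # does the shop open on monday?
```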

Lookup table

If you have a long list of values for a single entity, it is advisable to include a lookup table instead of filling in all of them as example sentences. There are two ways to do it. The first one is to include them in-line:

```md
## lookup:weekday
- monday
- tuesday
- wednesday
- thursday
- friday
```

The second is to list the values in a text file and include the path in-line. Let's try it with a new entity called countries:

```md
## lookup:countries
path/to/countries.txt
```

In countries.txt, you specify each element on a new line as follows:

```
singapore
malaysia
vietnam
indonesia
thailand
```

Just like the weekday entity, you have to provide a few examples for it to generalize.
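For long value lists, the lookup file can also be generated programmatically. A sketch, writing the same five countries used above:

```python
# Write one lookup value per line, which is the format Rasa
# expects for file-based lookup tables.
countries = ["singapore", "malaysia", "vietnam", "indonesia", "thailand"]

with open("countries.txt", "w") as f:
    f.write("\n".join(countries) + "\n")
```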

A few examples:

```md
## intent:inform_country_of_origin
- i am from [malaysia](countries)
- i am from [vietnam](countries)
- i came from [thailand](countries)
```

Synonym

Rasa also provides a way to identify synonyms and map them back to a single value.

The first method is to add it inline, like [synonym1](entity:value):

```md
## intent:ask_eaten
- what did you have for [breakfast](meal)
- what did you have for [break fast](meal:breakfast)
- what did you have for [breakfat](meal:breakfast)
```

The second method is as follows:

```md
## synonym:breakfast
- brekfast
- brokefast
```

What makes a synonym differ from a lookup table is that a synonym maps the value of the entity to a single value (breakfast in this example). In other words, synonyms are great for catching spelling mistakes and acronyms, while lookup tables are great for generalizing the examples.
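Conceptually, the synonym feature behaves like a lookup applied after entity extraction. A toy illustration (not Rasa's actual implementation):

```python
# Extracted entity values are mapped back to a single canonical value;
# anything unmapped passes through unchanged.
synonyms = {"brekfast": "breakfast", "brokefast": "breakfast", "break fast": "breakfast"}

def canonicalize(value):
    return synonyms.get(value, value)

print(canonicalize("brekfast"))  # breakfast
print(canonicalize("dinner"))    # dinner
```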

Regex

There is also a feature called regex that supports regular expressions:

```md
## intent:inform_zipcode
- my zipcode is [12345](zipcode)
- my zipcode is [33456](zipcode)
- my zipcode is [94056](zipcode)

## regex:zipcode
- [0-9]{5}
```

I have attached a sample text file for your reference.

Convert data format

Markdown is arguably the safest choice for beginners to create the data.

However, there can be cases where the training data is automated or comes from other sources, such as the LUIS, WIT or Dialogflow data formats, or plain JSON.

Rasa also provides a way for you to convert the data format.

Check out the following link to know more about it.

Make sure that the virtual environment is activated and run the following command (it converts md to json):

```
rasa data convert nlu --data data/nlu.md --out data/nlu.json -f json
```

- --data is the path to the file or directory containing Rasa NLU data.
- --out is the name of the file to save the training data in Rasa format.
- -f is the output format the training data should be converted into. Accepts either json or md.

Once you have all the required data, move it to the data folder and remove any existing data files. Let's move on to the next section.

[Section 3] Training and testing

Training model

To train the nlu model, you can just run the following command:

```
rasa train nlu
```

As stated in the official documentation, it will look for NLU training data files in the data folder and save a trained model in the models folder. Remember to remove any unnecessary data files from the data folder. The name of the model will be prefixed with nlu- to indicate that this is an NLU-only model. Having said that, you can specify the data path using the --data parameter. The full list of parameters can be found here.

Testing model

You can test the model by running an interactive shell via the following command:

```
rasa shell nlu
```

If you have multiple NLU models and would like to test a specific model, use the following command instead:

```
rasa shell -m models/nlu-20190515-144445.tar.gz
```

Check the following link to find out more about the additional parameters. You can input your text and press enter. The shell will return a JSON result indicating the intent and confidence.

[Section 4] Running NLU Server

Running server

Rasa also provides a way for you to start an NLU server which you can call via the HTTP API. Run the following command (modify the name of the model accordingly):

```
rasa run --enable-api -m models/nlu-20190515-144445.tar.gz
```

You should see the following output:

```
Starting Rasa Core server on http://localhost:5005
```

You can modify some settings by specifying the parameters together in the command. Check out the following link to find out more. The --cors parameter accepts a list of URLs. It enables Cross-Origin Resource Sharing, which tells a browser to let a web application running at one origin (domain) have permission to access selected resources from a server at a different origin. You can use "*" to whitelist all domains:

```
rasa run --enable-api -m models/nlu-20190515-144445.tar.gz --cors "*"
```

At the time of this writing, there seems to be no way to stop or interrupt the server. I did try Ctrl+C, but it only works from time to time. If you encounter such an issue, the only way is to kill the process: simply close the command prompt and re-run it.

Once the server is running, you can test the result using curl. Open up a new command prompt and run the following line:

```
curl localhost:5005/model/parse -d '{"text":"hello"}'
```

You should obtain a JSON result indicating the intent and confidence level as follows:

```json
{
  "intent": {"name": "greet", "confidence": 0.9770460725},
  "entities": [],
  "intent_ranking": [
    {"name": "greet", "confidence": 0.9770460725},
    {"name": "mood_unhappy", "confidence": 0.0257926807},
    {"name": "ask_identity", "confidence": 0.0009481288},
    {"name": "mood_great", "confidence": 0.0},
    {"name": "inform_identity", "confidence": 0.0},
    {"name": "goodbye", "confidence": 0.0}
  ],
  "text": "hello"
}
```

HTTP API

Rasa also comes with its own HTTP API that can be useful if you intend to call it via AJAX.

Kindly refer to the full list here.

In this tutorial, we will be concentrating on just one API call, which is used to predict the intent and entities of the message posted to the endpoint.

You can simply send a POST call to the following URL:

```
http://localhost:5005/model/parse
```

Note that the latest framework removed the ability to call multiple models in a single server. In the previous framework, we could specify our own model as a parameter to indicate which model to use for classification. Now, it is officially one model per server.
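The original post embedded an AJAX snippet at this point; as a rough equivalent, here is a sketch using only Python's standard library, assuming the server from the previous section is running on localhost:5005:

```python
import json
import urllib.request

def build_parse_request(text, url="http://localhost:5005/model/parse"):
    """Build a POST request for the /model/parse endpoint."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")

def parse_message(text):
    """Send the message to the NLU server and return the decoded JSON result."""
    with urllib.request.urlopen(build_parse_request(text)) as resp:
        return json.loads(resp.read().decode("utf-8"))

# With the server running:
# result = parse_message("hello")
# print(result["intent"]["name"], result["intent"]["confidence"])
```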

[Section 5] Conclusion

That's it, folks! Let's recap: we have learnt how to use Rasa NLU to train our own model for intent classification and entity extraction. The next step is to fine-tune and conduct further training to optimize the current model. You can check out Rasa Core as well if you intend to have a full-fledged chatbot framework that replies based on stories. On the other hand, using just the NLU part provides you with greater flexibility if you are already using another framework for your chatbot. If you are interested in setting up a chatbot on a messaging platform, check out the following link.

Thanks for reading and have a nice day!

Reference

- http://rasa.com/docs/rasa/user-guide/installation/
- http://rasa.com/docs/rasa/user-guide/rasa-tutorial/
- http://rasa.com/docs/rasa/api/http-api/
- http://rasa.com/docs/rasa/user-guide/command-line-interface
- http://rasa.com/docs/rasa/nlu/training-data-format/
- http://rasa.com/docs/rasa/nlu/using-nlu-only/
- http://rasa.com/docs/rasa/user-guide/messaging-and-voice-channels/