Do’s and Don’ts For Building Your Chatbot
Building A Natural Language Understanding Application For Unstructured Data Input
Usually, once a chatbot is launched, the default approach is to just add more and more utterances to each intent, somehow hoping that this will magically improve the chatbot’s performance. Somehow we believe that the more example utterances we add, the cleverer the bot will become.
Below are a few Do’s and Don’ts that will help you take your chatbot on a journey of improvement, so that with each iteration it grows in understanding and reacting to user utterances.
Do Define Distinct Intents
Make sure the vocabulary for an intent is specific to the intent it is meant for. Avoid having intents which overlap.
For example, if you have a chatbot which handles travel arrangements such as flights and hotels, you can choose:
- To have these two user intentions as separate intents
- Or to use a single intent with two entities for the specific data inside the utterance, be it flights or hotels.
If the vocabulary of two intents is the same, combine them into one intent and use entities.
Take a look at the following two user utterances:
- Book a flight
- Book a hotel
Both use the same wording, “book a”. The format is the same, so it should be one intent with different entities: one entity being flight, the other hotel.
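To make the idea concrete, here is a minimal sketch of the combined intent with a product entity. The intent and entity names are illustrative assumptions, not tied to any specific NLU platform:

```python
# A single "book" intent with a "product" entity, instead of two
# near-identical intents (names are illustrative, not platform-specific).
training_examples = [
    {"text": "Book a flight", "intent": "book", "entities": {"product": "flight"}},
    {"text": "Book a hotel", "intent": "book", "entities": {"product": "hotel"}},
]

def products_for_intent(examples, intent):
    """Collect the entity values seen for one intent."""
    return sorted({e["entities"]["product"] for e in examples if e["intent"] == intent})

print(products_for_intent(training_examples, "book"))
```

The vocabulary lives in one intent, and the thing being booked becomes entity data rather than a separate intent.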
Do Determine Intent Sweet Spots
Use prediction data to determine whether your intents are overlapping. Overlapping intents introduce ambiguity to your NLU model: it becomes hard to know how the model will react to a given user utterance, and even with testing, inconsistent results can occur.
Overlapping intents can swap places as the first- and second-ranked prediction after each build or training iteration. The scores of competing intents should not be so close that the interpretation flip-flops.
Hence good, distinct intents are vital.
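One way to spot this is to check the confidence gap between the two highest-scoring intents for each test utterance. A sketch, assuming your NLU environment returns a score per intent; the 0.15 threshold is an assumption to tune against your own prediction data:

```python
# Flag utterances whose top-two intent scores are too close to trust.
def top_two_margin(scores):
    """Return (best_intent, gap) between the two highest-scoring intents."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[0][0], ranked[0][1] - ranked[1][1]

# Hypothetical prediction scores for a single utterance.
scores = {"book_flight": 0.52, "book_hotel": 0.48, "cancel_booking": 0.10}
intent, gap = top_two_margin(scores)
if gap < 0.15:  # threshold is an assumption; tune on your own data
    print(f"Ambiguous: '{intent}' wins by only {gap:.2f} - review these intents")
```

Utterances flagged this way point at exactly the intent pairs that need to be merged or given more distinctive vocabulary.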
Do Iterate Using Versions
While taking your NLU model / chatbot through iterations, ensure that you use versioning, allowing you to track the performance of versions against each other and easing the process of rollback.
Most NLU environments, like IBM Watson Assistant and Microsoft Azure LUIS, allow for versioning. If you use an environment like RASA, it is even easier to version your chatbot iterations and keep track of changes and performance.
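Even without platform support, the bookkeeping can be as simple as recording a test metric per version tag. A minimal sketch; the tags and accuracy figures are invented for illustration:

```python
# Record a test metric per version tag so regressions are easy to
# spot and the best-performing version is easy to roll back to.
versions = {}

def record_version(tag, test_accuracy):
    versions[tag] = test_accuracy

def best_version():
    """Version tag with the highest recorded test accuracy."""
    return max(versions, key=versions.get)

record_version("v1.0", 0.81)
record_version("v1.1", 0.78)  # worse than v1.0: candidate for rollback
print(best_version())
```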
Do Build For Model Decomposition
Model decomposition typically follows this process:
- Create Intent based on chatbot user intentions
- Add 15–30 example utterances per intent based on real-world user input
- Label top-level data concept in example utterance
- Break data concept into sub-components
- Add descriptors (features) to sub-components
- Add descriptors (features) to intent
Once you have created an intent and added a few example user utterances, the following example describes entity decomposition:
Begin with identifying complete data concepts you need to extract from an utterance; this is your machine-learned entity.
Then decompose the phrase into its parts. This includes identifying sub-components (also known as entities), along with descriptors and constraints.
For example: if you want to extract an address, the top-level machine-learned entity could simply be called “Address”. While creating the address, identify some of its sub-components, such as:
- Street Name and Number
- Postal Code
Continue decomposing these elements by constraining the postal code with a regular expression. Then decompose the street address into a street number (using a pre-built number), a street name, and a street type.
The street type can be described with a descriptor list such as avenue, circle, road, and lane.
As you read this, a pattern should emerge: an utterance is decomposed into the basic elements that constitute the data you are interested in, and these basic elements are then constrained so they can be pattern-matched and assigned to a variable or slot.
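As a rough sketch, the address decomposition above could be approximated with a regular expression. The component names and formats here (a 5-digit postal code, four street types) are illustrative assumptions, not any vendor’s entity syntax:

```python
import re

# "Address" decomposed into constrained sub-components, each captured
# as a named group that can be assigned to a variable or slot.
ADDRESS = re.compile(
    r"(?P<street_number>\d+)\s+"
    r"(?P<street_name>[A-Za-z ]+?)\s+"
    r"(?P<street_type>avenue|circle|road|lane)\s*,?\s*"
    r"(?P<postal_code>\d{5})",
    re.IGNORECASE,
)

def extract_address(utterance):
    """Return the address sub-components as a dict, or None if absent."""
    match = ADDRESS.search(utterance)
    return match.groupdict() if match else None
```

Each named group corresponds to one sub-component of the decomposed data concept, with its own constraint.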
Do Add Patterns Later
Hopefully your NLU design and development environment allows for patterns. Patterns are very powerful and allow for precise extraction of multiple entities within an utterance.
It is best to implement patterns later, due to the weight they carry in prediction; it makes sense to first study real user utterances to determine what patterns are established in the way your users speak.
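To illustrate how a pattern can extract multiple entities at once, here is a sketch that compiles a template with {entity} placeholders into a regular expression with named groups. The placeholder syntax is an assumption for illustration, not any specific vendor’s pattern format:

```python
import re

# Compile "book a flight from {origin} to {destination}" into a regex
# whose named groups capture each entity in one match.
def compile_pattern(template):
    regex = re.sub(r"\{(\w+)\}", r"(?P<\1>.+?)", template)
    return re.compile("^" + regex + "$", re.IGNORECASE)

flight = compile_pattern("book a flight from {origin} to {destination}")
match = flight.match("Book a flight from Oslo to Paris")
```

Because the fixed wording around the placeholders must match, a pattern like this only fires when users actually phrase things that way, which is why studying real utterances first matters.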
Do Balance Utterances Across Intents
In order for your NLU predictions to be accurate, the number of example utterances defined for each intent must be comparatively equal.
Should there be an intent with 100 example utterances and an intent with 20 example utterances, the 100-utterance intent will be predicted at a disproportionately higher rate.
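A quick way to catch this is to compare example counts across intents. A sketch; the 3x ratio is an arbitrary assumption to adjust for your own model:

```python
# Flag intents whose example counts dwarf the smallest intent.
def imbalance_report(counts, max_ratio=3.0):
    smallest = min(counts.values())
    return sorted(intent for intent, n in counts.items() if n > max_ratio * smallest)

# Hypothetical utterance counts per intent.
counts = {"book_flight": 100, "book_hotel": 25, "cancel_booking": 20}
print(imbalance_report(counts))  # book_flight dominates the other two
```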
Do Leverage Suggest Features
Most chatbot development environments have a suggest feature for intents, and it is advisable to make use of it, even if only for ideas and guidance.
Do Monitor Chatbot Performance
You need to understand your chatbot’s performance, what drives its performance metrics, and what you need to tweak to improve them.
Don’t Add Too Many Example Utterances To Intents
Be discerning in adding example utterances to intents. If utterances are too similar, add a pattern.
Don’t Use Too Few Or Simple Entities
Creating and crafting entities is hard work, but don’t take shortcuts. Think clearly about the data you want to capture and create an adequate number of entities. If they are meant to be more complex in nature, create them accordingly.
Don’t Create Example Utterances With The Same Format
Utterances can be vastly different while having the same basic meaning. Variations can include utterance length, word choice, and word placement. Compare these two sets of examples.
Same format throughout:
- Buy a ticket to Seattle
- Buy a ticket to Paris
- Buy a ticket to Orlando
Varied format:
- Buy 1 ticket to Seattle
- Reserve two seats on the red eye to Paris next Monday
- I would like to book 3 tickets to Orlando for spring break
The second set is more plausible as real-world user input.
Don’t Mix The Definition Of Intents & Entities
Think of intents as verbs or actions and entities as nouns. The intent is the intention of the user, the action the user wants to take or expect from the chatbot. The entity or entities are the nouns or relative detail you want to capture.
Don’t Train & Publish With Every Single Example Utterance
Add 10 or 15 utterances before training and publishing. That allows you to see the impact on prediction accuracy. Adding a single utterance may not have a visible impact on the score.