Your Chatbot Has A Big Vulnerability And This Is How To Fix It
Improve Chatbot Resilience By Adding An Initial High-Pass NLP Layer
Most chatbot design processes start with the squad imagining what customers might say…
These are often thought up by employees who are very excited to be part of the chatbot initiative. This process of looking at products, services and current customer queries usually culminates in single-line user utterances. These single-line utterances are conveniently, but very artificially, organized and grouped according to pre-defined intents.
These utterances are usually a short, well-formed sentence with one verb and at most two nouns. The verb is used to assign the utterance to an intent, and the nouns are used to capture entities. But through all of this frantic planning, designing and developing, the memo never reaches the user. As a result, users do not play by the design rules that exist in the squad's shared understanding.
But We Tested
Testing regimes are usually closely aligned with the design process; in essence, the design is tested together with the expected user input. Moreover, the real user input is usually not known at this stage, and the expectations of the chatbot team are not aligned with the reality that is about to hit them. Part of the problem is that most other user interfaces can rely on models of convention.
Interfaces that have been around for a while, such as websites, mobile apps and IVR, have models of convention. These models have been established over years, and designers can call on them to ensure ease of navigation. Designers have the luxury of combining these models of convention into a good, or at least reasonable, user experience.
Whenever a new medium emerges, there are only loose patterns of behavior, and these cause frustration for designers and users alike. But somewhere along the line, form starts following function and a model of convention is established. User experiences are distilled and streamlined, and design principles emerge.
Chatbots do not have this luxury, at least not currently. And in the case of conversational interfaces, users' expectations overshoot the capability of the chatbot.
This is mainly because previous interfaces did not let users enter unstructured data. The sudden freedom of having a conversation and entering unstructured data tempts users to explore that freedom with long, elaborate utterances.
Our Tools Are Shaping Us
Almost all chatbot development environments require developers to create intents, each with a single-word name, and then at least 10 sentences for each intent. Entities then need to be tied to these intents. You can see intents as verbs and entities as nouns. There is some variation in how entities are defined: in some tools entities can be defined contextually within the intents, while in others entities are categorized into types such as lists and regular expressions. There is no real scope within these tools to examine, categorize or break up the user input.
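As a purely illustrative sketch (not the schema of any particular tool), the kind of definitions these environments ask for might look like this in Python. The intent names, example utterances, the `city` list entity and the `time` regular-expression entity are all hypothetical.

```python
import re

# Hypothetical intent definitions: a single-word name plus example utterances.
# Real tools typically ask for ten or more examples per intent.
INTENTS = {
    "weather": ["what is the weather tomorrow", "will it rain in Cape Town"],
    "alarm": ["wake me up at 06:30", "set an alarm for tomorrow morning"],
}

# A list entity maps synonyms to a canonical value; a regex entity matches a pattern.
LIST_ENTITIES = {"city": {"cape town": "Cape Town", "joburg": "Johannesburg"}}
REGEX_ENTITIES = {"time": re.compile(r"\b\d{1,2}:\d{2}\b")}

def extract_entities(utterance):
    """Return (entity_type, value) pairs found by the list and regex entities."""
    found = []
    lowered = utterance.lower()
    for entity_type, synonyms in LIST_ENTITIES.items():
        for surface, canonical in synonyms.items():
            # Whole-word match so "cape town" does not fire inside other words.
            if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
                found.append((entity_type, canonical))
    for entity_type, pattern in REGEX_ENTITIES.items():
        found.extend((entity_type, m.group()) for m in pattern.finditer(utterance))
    return found

print(extract_entities("Wake me at 06:30 in Cape Town"))
# [('city', 'Cape Town'), ('time', '06:30')]
```

Note that the whole utterance is matched as one unit: nothing in this structure examines or breaks up the user input first.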
What Goes Wrong
In short, the user inputs too much text. The medium affects the message: in some mediums, like SMS/text and messaging applications in general, user input tends to be shorter, while in mediums accessed via a keyboard or a browser, user input tends to be longer.
Longer user input can contain multiple sentences, with numerous user intents embedded in the text, as well as multiple entities.
Users don't always speak in single-intent, single-entity utterances. On the contrary, users will speak in compound utterances. The moment these complex utterances are thrown at the chatbot, the bot has to play a game of single-intent winner: which of the user's many intents is going to win this dialog turn?
But…what if the chatbot could detect that it has just received four sentences: the intent of the first is the weather tomorrow in Cape Town, the second is the stock price for Apple, the third is an alarm for tomorrow morning, and so on?
Too ambitious, you might think?
Not at all…it is very possible and doable, and the tools to achieve it exist. Best of all, these tools are open source and free to use…
Introduce a first, high-pass Natural Language Processing (NLP) layer. This layer analyzes the text of the user input, that is, the dialog or utterance sent through by the user. It pre-processes the text and makes the dialog digestible for the chatbot, allowing the chatbot to answer a long compound question the way we as humans would.
We as humans take the question from the top down and answer its different aspects in turn.
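A minimal sketch of the idea, assuming a hypothetical keyword-based `classify` function as a stand-in for a trained NLU model and a naive regular-expression sentence split: classifying the whole message produces a single winning intent, while classifying sentence by sentence recovers all three requests.

```python
import re

# Hypothetical keyword lists standing in for a trained intent classifier.
INTENT_KEYWORDS = {
    "weather": {"weather", "rain", "forecast"},
    "stock_price": {"stock", "share", "price"},
    "set_alarm": {"alarm", "wake"},
}

def classify(text):
    """Return the intent whose keywords overlap most with the text, or None."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {intent: len(words & kws) for intent, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def high_pass(utterance):
    """Split the utterance into sentences and classify each one separately."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", utterance) if s.strip()]
    return [(sentence, classify(sentence)) for sentence in sentences]

message = ("What's the weather tomorrow in Cape Town? "
           "Also, what is the stock price for Apple? "
           "Set an alarm for tomorrow morning.")

print(classify(message))  # stock_price: one intent wins, the other two are lost
for sentence, intent in high_pass(message):
    print(intent, "<-", sentence)
```

The steps below refine each part of this pass (language, sentences, entities, structure, markup and tokens).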
Step 1: Automatic Language Detection
A chatbot can only accommodate a finite number of languages; usually it is a single language. The last thing you want is your user rambling on in a language your chatbot does not accommodate.
Consider the scenario where your chatbot keeps on replying with an “I do not understand” dialog while the user tweaks their utterances in an attempt to get a suitable response. All the while, the language used by the user is not provisioned in the bot.
This can be a pain point, especially for multinational organizations.
It is easy to implement a first-pass check that determines the language of the user input and, if it is not supported, responds by advising the user of the languages available.
The nice part is that you don't always have to identify which of the 6,500 languages in the world your user speaks. You just need to know that the user is not using one of the languages your chatbot can speak.
It is, however, a nice feature for your chatbot to advise the user that they are currently writing in French, while the chatbot only makes provision for English and Spanish. This can be implemented in a limited fashion, though.
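A limited version of this check can be sketched with nothing more than stop-word counts. The word lists, the `SUPPORTED` set and the reply wording are assumptions for illustration; a real deployment would use a dedicated language-identification library or model, which is far more accurate, especially on short inputs.

```python
import re

# Tiny, hypothetical stop-word profiles; real systems use trained models.
STOPWORDS = {
    "English": {"the", "is", "and", "what", "for", "to", "of", "in", "my"},
    "Spanish": {"el", "la", "es", "y", "que", "para", "de", "en", "mi"},
    "French": {"le", "la", "est", "et", "que", "pour", "de", "en", "mon"},
}
SUPPORTED = {"English", "Spanish"}  # languages this hypothetical bot speaks

def guess_language(text):
    """Return the profile language with the most stop-word hits, or None."""
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(w in stops for w in words) for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def language_gate(text):
    """Return None if the language is supported, otherwise a reply for the user."""
    language = guess_language(text)
    if language in SUPPORTED:
        return None
    supported = " and ".join(sorted(SUPPORTED))
    if language is None:
        # Unknown language: we only need to know it is not one we support.
        return f"Sorry, I can only chat in {supported}."
    return f"It looks like you are writing in {language}, but I can only chat in {supported}."

print(language_gate("Quelle est la météo pour demain à Paris ?"))
```

A supported message passes straight through (the gate returns `None`), so the check costs the happy path almost nothing.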
Step 2: Sentence Boundary Detection
An initial step can be to extract reasonable sentences, especially when the format and domain of the input text are unknown. The number of sentences gives a loose gauge of the size of the input and the number of intents.
This also allows each sentence of the user input to be parsed separately, and the user to be answered accordingly.
Irrelevant sentences can be ignored, and sentences with a good intent and entity match can be given special attention when responding to the user.
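spaCy exposes sentence boundaries through `Doc.sents`. As a dependency-free illustration of the same step, here is a rule-based splitter that breaks on `.`, `!` or `?` followed by whitespace, unless the word before the break is in a small, hypothetical abbreviation list. It will still make mistakes (for example, a sentence that genuinely ends with "etc."), which is why statistical or parse-based segmenters are preferred in practice.

```python
import re

# Hypothetical abbreviation list: a period after these does not end a sentence.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "e.g.", "i.e.", "etc."}

def split_sentences(text):
    """Split at '.', '!' or '?' followed by whitespace, skipping known abbreviations."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]+\s+", text):
        candidate = text[start:match.end()].strip()
        if candidate.split()[-1].lower() in ABBREVIATIONS:
            continue  # e.g. "Dr. Smith" stays in one sentence
        sentences.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)  # last sentence may lack final punctuation
    return sentences

text = "Is Dr. Smith in today? I need a refill. Also, what are your hours"
print(len(split_sentences(text)))  # 3
```

The sentence count, `len(split_sentences(text))`, is the loose gauge of input size described above.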
Step 3: Find All Named Entities
But first, what is an entity?
Entities are the information in the user input that is relevant to the user’s intentions.
Intents can be seen as verbs (the action a user wants to execute) and entities as nouns (for example, the city, the date, the time, the brand or the product). Consider this: when the intent is to get a weather forecast, the relevant location and date entities are required before the application can return an accurate forecast.
Recognizing entities in the user's input helps you to craft more useful, targeted responses. For example, you might have a #buy_something intent. When a user makes a request that triggers the #buy_something intent, the assistant's response should reflect an understanding of what the customer wants to buy. You can add a product entity, and then use it to extract information from the user input about the product that the customer is interested in.
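The weather example can be sketched as a simple slot check: before acting on an intent, the bot verifies that every required entity has been captured and otherwise asks for the first missing one. The intent names, the `REQUIRED_ENTITIES` table and the question wording are hypothetical.

```python
# Hypothetical table of the entities each intent needs before it can act.
REQUIRED_ENTITIES = {
    "get_weather": ["location", "date"],
    "buy_something": ["product"],
}

def next_prompt(intent, entities):
    """Return a follow-up question for the first missing entity, or None if ready."""
    for name in REQUIRED_ENTITIES.get(intent, []):
        if name not in entities:
            return f"Which {name} do you mean?"
    return None  # all required entities present: safe to fetch the forecast

print(next_prompt("get_weather", {"location": "Cape Town"}))  # Which date do you mean?
```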
spaCy has an efficient named entity recognition system that also assigns labels. Its pretrained English pipelines identify a host of named and numeric entities, including places, companies, products and the like. For each entity, the following is available:
- Text: The original entity text.
- Start: Index of start of entity in the doc
- End: Index of end of entity in the doc
- Label: Entity label, i.e. type
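With a pretrained pipeline loaded, spaCy's own usage is along the lines of `for ent in doc.ents: print(ent.text, ent.start_char, ent.end_char, ent.label_)`. To show the shape of that output without a model download, the sketch below swaps in a tiny hypothetical gazetteer for the statistical model. It returns the same four fields (text, start and end character offsets, label) and borrows spaCy's GPE, ORG, DATE and TIME labels. Dictionary lookup only finds phrases it already knows, so treat this as an illustration of the output, not a replacement for spaCy's NER.

```python
import re
from typing import NamedTuple

class Entity(NamedTuple):
    text: str   # the original entity text
    start: int  # character offset where the entity starts
    end: int    # character offset just past the entity's end
    label: str  # entity label, i.e. type

# Hypothetical gazetteer standing in for spaCy's statistical NER model.
GAZETTEER = {
    "Cape Town": "GPE",
    "Apple": "ORG",
    "tomorrow morning": "TIME",
    "tomorrow": "DATE",
}

def find_entities(text):
    """Return non-overlapping gazetteer matches, preferring longer phrases."""
    entities, taken = [], set()
    for phrase in sorted(GAZETTEER, key=len, reverse=True):
        for match in re.finditer(r"\b" + re.escape(phrase) + r"\b", text):
            span = set(range(match.start(), match.end()))
            if span & taken:
                continue  # already covered by a longer entity
            taken |= span
            entities.append(Entity(match.group(), match.start(), match.end(), GAZETTEER[phrase]))
    return sorted(entities, key=lambda e: e.start)

text = "Weather in Cape Town tomorrow, and wake me tomorrow morning. How is Apple?"
for ent in find_entities(text):
    print(ent.text, ent.start, ent.end, ent.label)
```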
Step 4: Determine Dependencies
A word can have different meanings depending on how it is used within a sentence. Hence, analyzing how a sentence is constructed helps us determine how individual words relate to each other.
Take the sentence “Mary rows the boat.” There are two nouns, Mary and boat, and a single verb, rows. To understand the sentence correctly, word order is important: “The boat rows Mary” contains the same words with the same parts of speech, yet means something quite different. We cannot only look at the words and their parts of speech.
Working out these relationships by hand would be an arduous task, but within spaCy we can use noun chunks. According to the spaCy documentation, you can think of noun chunks as a noun plus the words describing the noun, for example “the lavish green grass” or “the world’s largest tech fund”. To get the noun chunks in a document, simply iterate over Doc.noun_chunks.
For each noun chunk in the sentence “Smart phones pose a problem for insurance companies in terms of fraudulent claims”, spaCy returns the following data:
- Text: The original noun chunk text.
- Root text: The original text of the word connecting the noun chunk to the rest of the parse.
- Root dep: The dependency relation connecting the root to its head.
- Root head text: The text of the root token's head.
The root dependency labels in this example are:
- NSUBJ: nominal subject.
- DOBJ: direct object.
- POBJ: object of preposition.
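In spaCy these fields are `chunk.text`, `chunk.root.text`, `chunk.root.dep_` and `chunk.root.head.text`, and the chunks come from the dependency parse. The sketch below is a simpler, swapped-in technique: it groups tokens with a tag pattern (determiners, adjectives or numbers followed by one or more nouns) over hand-supplied Universal POS tags for the example sentence, and takes the last noun as the root. Because it has no parse, it cannot recover the dependency label or head. The tags are assumed, not the output of a tagger.

```python
# Hand-tagged tokens for the example sentence (assumed Universal POS tags).
TAGGED = [
    ("Smart", "ADJ"), ("phones", "NOUN"), ("pose", "VERB"), ("a", "DET"),
    ("problem", "NOUN"), ("for", "ADP"), ("insurance", "NOUN"),
    ("companies", "NOUN"), ("in", "ADP"), ("terms", "NOUN"), ("of", "ADP"),
    ("fraudulent", "ADJ"), ("claims", "NOUN"),
]
MODIFIERS = {"DET", "ADJ", "NUM"}
NOUNS = {"NOUN", "PROPN"}

def noun_chunks(tagged):
    """Return (chunk_text, root_text) for each run matching MODIFIER* NOUN+."""
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        while j < len(tagged) and tagged[j][1] in MODIFIERS:
            j += 1  # skip over leading determiners/adjectives/numbers
        k = j
        while k < len(tagged) and tagged[k][1] in NOUNS:
            k += 1  # consume the nouns
        if k > j:  # at least one noun: tokens i..k-1 form a chunk
            words = [word for word, _ in tagged[i:k]]
            chunks.append((" ".join(words), words[-1]))  # last noun is the root
            i = k
        else:
            i += 1
    return chunks

for chunk, root in noun_chunks(TAGGED):
    print(chunk, "->", root)
```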
Step 5: Clean Text From Any Possible Markup
You can use a Python package for converting raw text into clean, readable text and extracting metadata from that text. Its functionality includes transforming raw text into readable text by removing HTML tags, and extracting metadata such as the number of words and the named entities in the text.
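As a standard-library stand-in for such a package, here is a minimal sketch using Python's `html.parser`. It drops tags, skips `<script>` and `<style>` contents, decodes entities such as `&nbsp;`, collapses whitespace and returns a word count as simple metadata. It does not extract named entities, and the set of block-level tags treated as word breaks is an assumption.

```python
import re
from html.parser import HTMLParser

# Assumed set of tags that should act as word breaks when removed.
BLOCK_TAGS = {"p", "br", "div", "li", "tr", "h1", "h2", "h3"}

class TextExtractor(HTMLParser):
    """Collect text content, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes &nbsp;, &amp;, ...
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

def clean_text(raw):
    """Strip markup, collapse whitespace and return the text with a word count."""
    parser = TextExtractor()
    parser.feed(raw)
    parser.close()
    text = re.sub(r"\s+", " ", "".join(parser.parts)).strip()
    return {"text": text, "word_count": len(text.split())}

result = clean_text("<p>My order&nbsp;#123 <b>hasn't</b> arrived.</p>"
                    "<script>track()</script><p>Please help!</p>")
print(result)
```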
Step 6: Tokens
Tokenization is the task of splitting a text into meaningful segments, referred to as tokens, such as words, numbers and punctuation marks.
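A regular-expression tokenizer is enough to illustrate the idea. The pattern is an assumption, not spaCy's rules: it keeps numbers such as `180.50` together, keeps a word with an internal apostrophe as one token, and splits off every other symbol. spaCy's tokenizer is more sophisticated; it applies language-specific exceptions, for example splitting "don't" into "do" and "n't".

```python
import re

# Numbers (with optional decimal/thousands parts), words with an optional
# internal apostrophe, or any single non-space symbol such as punctuation.
TOKEN_PATTERN = re.compile(r"\d+(?:[.,]\d+)*|\w+(?:'\w+)?|[^\w\s]")

def tokenize(text):
    """Return the list of tokens matched left to right."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("Don't sell my Apple shares at $180.50!"))
# ["Don't", 'sell', 'my', 'Apple', 'shares', 'at', '$', '180.50', '!']
```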
There is no magic remedy to make a conversational interface just that: conversational.
It will take time and effort.
But it is important to note that commercially available chatbot solutions should not be seen as a package whose limitations you need to abide by. Additional layers can be introduced to advise the user and inform the chatbot's basic NLU.
Within an organization, a chatbot must be seen as a Conversational AI interface whose aim is to further the conversation and give the user guidelines to take the conversation forward.
If user utterances just bounce off the chatbot and users need to figure out how to approach the conversation without any guidance, the conversation is bound to be abandoned.