GPT-3, Low-Code & Fine-Tuning
And Implementing Enterprise Scale Solutions
The advent of GPT-3 has sparked the discussion on low-code implementations for chatbots…
There has been much talk about the low-code approach to software development, how it acts as a catalyst for rapid development and as a vehicle for delivering solutions with minimal bespoke hand-coding.
Low-code interfaces are made available via a single tool, or a collection of tools, that are very graphic in nature and initially intuitive to use. This gives the guise of rapid onboarding and of speeding up the delivery of solutions to production.
As with many approaches of this nature, initially it seems like a very good idea. However, as functionality, complexity and scaling start playing a role, huge impediments are encountered.
In this article I want to explore:
- Two instances where a low-code approach to chatbots has worked, and how it was accomplished.
- What exactly fine-tuning refers to in chatbots, and why a low-code approach cannot accommodate it.
Current Examples of Low-Code
Two low-code implementations have been successful of late: IBM Watson Assistant Actions and Microsoft Power Virtual Agents.
These have been successful because the low-code component can be used as a stand-alone approach or, for complex implementations, act as an extension.
IBM Watson Assistant Actions
A new feature has been added to IBM Watson Assistant called Actions. This new feature allows users to develop dialogs in a rapid fashion.
The approach taken with Actions is one of an extreme non-technical nature. The interface is intuitive and requires virtually no prior development knowledge or training. User input (entities) variables are picked up automatically with a descriptive reference.
Conversational steps can be re-arranged and moved freely to update the flow of the dialog.
Updates are saved automatically, and machine learning takes place in the background.
The application (action) can be tested in a preview pane.
There is something about Actions which reminds me of Microsoft’s Power Virtual Agents interface.
The same general idea is there, but with Watson the interface is simpler and more minimalistic, and perhaps more of a natural extension of the current functionality.
- You can think of an action as an encapsulation of an intent. Or the fulfillment of an intent.
- An action is a single conversation to fulfill an intent and capture the entities.
- A single action is not intended to stretch across multiple intents or be a horizontally focused conversation.
- Think of an action as a narrow vertical and very specific conversation.
The concept of Actions is astute and the way it is introduced to Watson Assistant 100% complements the current development environment. There is no disruption or rework required of any sort.
Actions democratizes the development environment for designers to also create a conversational experience, again, without disrupting the status quo.
Actions used as intended will advance any Watson Assistant implementation.
But I hasten to add this caveat: Actions implemented in a way that was not intended will lead to impediments in scaling and in leveraging user input in terms of intents and entities.
Microsoft Power Virtual Agents
Microsoft necessitated native code when it came to the dialog flow and state management, making use of a bot emulator for testing. Code can run locally on your machine or in the Azure cloud, which in principle is a good thing.
But, when you opt for a visual design tool for call flow and state management, three impediments arise in time:
- Invariably you would want to include functions and extensions not available in your authoring environment.
Microsoft has extended their Conversational AI offering with an environment they call Power Virtual Agents (PVA). Before we look at the PVA functionality, it is important to note the following…
The PVA is a good design, prototype and wire-frame environment.
The PVA is an excellent tool to get your chatbot going, and a fairly advanced chatbot can be crafted with it, including API integration. The dialog authoring canvas is advanced in functionality.
Invariably, in time your chatbot is going to outgrow PVA, and then what? This is where the Microsoft Conversational AI ecosystem allows you to extend, in particular with Composer.
Extend To Bot Framework Skills
Power Virtual Agents enables you to extend your bot using Azure Bot Framework Skills. If you have already built and deployed bots in your organization (using Bot Framework pro-code tools) for specific scenarios, you can convert bots to a Skill and embed the Skill within a Power Virtual Agents bot.
You can combine experiences by linking re-usable conversational components, known as Skills.
Within an Enterprise, this could be creating one parent bot bringing together multiple sub-bots owned by different teams, or more broadly leveraging common capabilities provided by other developers.
Skills are themselves bots, invoked remotely, and a Skill developer template (.NET, TS) is available to facilitate the creation of new Skills.
With Power Virtual Agents you can provide exceptional support to customers and employees with AI-driven virtual agents, and easily create and maintain bots with a no-code interface. That is the sales pitch, and it is true.
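The parent-bot idea can be sketched in a few lines of Python. This is only an illustration of the routing pattern, not the Bot Framework API: the class ParentBot, the two skill functions and the keyword matching are all hypothetical, and the skills run in-process here, whereas real Skills are invoked remotely.

```python
from typing import Callable, Dict

# Each hypothetical skill is a callable that could be owned by a different team.
Skill = Callable[[str], str]


def hr_skill(utterance: str) -> str:
    return "HR skill handling: " + utterance


def it_skill(utterance: str) -> str:
    return "IT skill handling: " + utterance


class ParentBot:
    """Routes an utterance to the first registered skill whose keyword it contains."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def register(self, keyword: str, skill: Skill) -> None:
        self._skills[keyword] = skill

    def handle(self, utterance: str) -> str:
        lowered = utterance.lower()
        for keyword, skill in self._skills.items():
            if keyword in lowered:
                return skill(utterance)
        return "Sorry, no skill is available for that request."


bot = ParentBot()
bot.register("leave", hr_skill)       # assumed keyword for the HR sub-bot
bot.register("password", it_skill)    # assumed keyword for the IT sub-bot
print(bot.handle("Reset my password"))
```

A production parent bot would route on recognized intents rather than keywords; the keyword match only keeps the sketch short.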
Mention needs to be made of Microsoft Bot Framework Composer.
Fine-Tuning
When someone refers to the ability, or the extent, to which fine-tuning can be performed, what exactly are they referring to? In this section we are going to step through a few common elements which constitute fine-tuning.
- Forms & Slots
- Natural Language Generation (NLG)
- Dialog Management
Forms & Slots
An Intent is the user’s intention with their utterance, or engagement with your bot. Think of intents as verbs, or working words. An utterance or single dialog from a user needs to be distilled into an intent.
Entities can be seen as nouns, and are often referred to as slots. These are usually things like dates, times, cities, names, brands etc. Capturing these entities is crucial for taking action based on the user’s intent.
Think of a travel bot: capturing the cities of departure and destination, along with travel mode, costs, dates and times, is at the foundation of the interface. Yet this is the hardest part of the NLU process. Keep in mind that the user enters data randomly and unstructured, in no particular order.
We as humans identify entities based on the context we detect, and hence we know where to pick out a city name even though we have never heard that city name before.
Make sure the vocabulary for an intent is specific to the intent it is meant for. Avoid having intents which overlap.
For example, if you have a chatbot which handles travel arrangements such as flights and hotels, you can choose:
- To have these two user utterances and ideas as separate intents
- Or use the same intent with two entities for specific data inside the utterance; be it flights or hotels.
If the vocabulary of two intents is the same, combine the intents and use entities.
Take a look at the following two user utterances:
- Book a flight
- Book a hotel
Both use the same wording, “book a”. The format is the same, so it should be the same intent with different entities: one entity being flight and the other hotel.
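To make the single-intent, two-entity idea concrete, here is a minimal sketch. The function parse_utterance, the intent name book and the entity name booking_type are assumptions for illustration only; a real NLU engine would use a trained model rather than keyword matching.

```python
import re

# Hypothetical vocabulary: one shared intent, with the thing being booked as an entity.
BOOKING_TYPES = {"flight", "hotel"}


def parse_utterance(utterance: str) -> dict:
    """Return the intent and any booking_type entities found in the utterance."""
    tokens = re.findall(r"[a-z]+", utterance.lower())
    intent = "book" if "book" in tokens else "unknown"
    # Entities are the nouns: keep the words that name a known booking type.
    entities = [token for token in tokens if token in BOOKING_TYPES]
    return {"intent": intent, "booking_type": entities}


print(parse_utterance("Book a flight"))
print(parse_utterance("Book a hotel"))
```

Both utterances resolve to the same intent, and only the entity value differs.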
In most chatbot design endeavors, the process starts with intents. But what are intents? Think of it like this… a large part of this thing we call the human experience is intent discovery. If a clerk or general assistant is behind a desk and a customer walks up to them… the first action from the assistant is intent discovery: trying to discover the intention of the person entering the store, bank, company etc.
We perform intent discovery dozens of times a day, without even thinking of it.
Google is the biggest intent discovery machine in the world!
The Google search engine can be considered as a single dialog-turn chatbot. The main aim of Google is to discover your intent, and then return relevant information based on the discovered intent. Even the way we search has inadvertently changed. We do not search with key words anymore, but in natural language.
Intents can be seen as purposes or goals expressed in a customer’s dialog input. By recognizing the intent expressed in a customer’s input, the assistant can select an applicable next action.
Current customer conversations can be instrumental in compiling a list of possible user intents. The data can come from speech analytics (call recordings) or from live agent chat conversations. Lastly, think of intents as the verb.
Entities can be seen as the nouns.
Entities are the information in the user input that is relevant to the user’s intentions.
Intents can be seen as verbs (the action a user wants to execute), while entities represent nouns (for example the city, the date, the time, the brand, the product). Consider this: when the intent is to get a weather forecast, the relevant location and date entities are required before the application can return an accurate forecast.
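The weather example is essentially a form with required slots. A minimal sketch of that check follows, assuming hypothetical slot names (location, date) and prompt wording; it is one simple way to do slot filling, not how any particular platform implements it.

```python
from typing import Optional

# Hypothetical required slots for a weather-forecast intent.
REQUIRED_SLOTS = ("location", "date")

PROMPTS = {
    "location": "Which city would you like the forecast for?",
    "date": "For which date?",
}


def next_prompt(slots: dict) -> Optional[str]:
    """Return the question for the first missing slot, or None when the form is complete."""
    for name in REQUIRED_SLOTS:
        if not slots.get(name):
            return PROMPTS[name]
    return None


# The user gave a date but no city, so the bot still has to ask for the location.
print(next_prompt({"date": "tomorrow"}))
print(next_prompt({"location": "Cape Town", "date": "tomorrow"}))
```

Because the check runs over whatever slots have been captured so far, the user can supply the entities in any order.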
Recognizing entities in the user’s input helps you to craft more useful, targeted responses. For example, you might have a #buy_something intent. When a user makes a request that triggers the #buy_something intent, the assistant’s response should reflect an understanding of what the something is that the customer wants to buy. You can add a product entity, and then use it to extract information from the user input about the product that the customer is interested in.
Natural Language Generation (NLG)
NLG is a software process where structured data is transformed into natural conversational language for output to the user. In other words, structured data is presented in an unstructured manner to the user. Think of NLG as the inverse of NLU.
With NLU we take the unstructured conversational input from the user (natural language) and structure it for our software process. With NLG we take structured data from the backend and state machines and turn it into unstructured data: conversational output in human language.
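The simplest form of this is template-based generation, sketched below. The record fields and the template wording are invented for illustration; commercial NLG goes well beyond fixed templates.

```python
# Hypothetical structured record, as it might come from a backend system.
forecast = {"city": "Cape Town", "date": "tomorrow", "condition": "sunny", "high_c": 24}

TEMPLATE = "The forecast for {city} {date} is {condition}, with a high of {high_c} degrees Celsius."


def render(record: dict) -> str:
    """Turn structured data into a conversational sentence using a fixed template."""
    return TEMPLATE.format(**record)


print(render(forecast))
```

The structured fields go in, and a sentence in human language comes out, which is the reverse of what the NLU step does.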
Commercial NLG is emerging, and forward-looking solution providers are looking at incorporating it into their solutions.
Dialog Management
Most often in conversational journeys where the user makes use of a voice smart assistant, the dialog consists of one or two dialog turns: questions like what the weather is, or checking travel time.
With text-based conversations, like chatbots, multiple dialog turns are involved, hence management of the dialog becomes critical. For instance, if a user wants to make a travel booking or a restaurant reservation, the dialog will be longer.
Your chatbot typically has a domain, a specific area of concern, be it travel, banking, utilities etc.
Grounding is important to establish a shared understanding of the conversation scope.
You will see many chatbot conversations start with a number of messages initiated by the chatbot. The sole purpose of these messages is to ground the conversation going forward.
Secondly, the initiative can sit with the user or with the system; system-directed initiative. In human conversations the initiative is exchanged between the two parties in a natural way.
Ideally the initiative sits with the user, and once the intent is discovered, the system-directed initiative takes over to fulfill the intent.
If the initiative is not managed, the flow of dialog can end up being brittle, where the user struggles to inject intent and further the dialog. Or, even worse, the chatbot drops out of the dialog.
Digression is a common and natural part of most conversations. The speaker introduces a topic, then introduces a story that seems to be unrelated, and then returns to the original topic.
Digression can also be explained in the following way… a user is in the middle of a dialog (also referred to as a customer journey, topic or user story) which is designed to achieve a single goal, but the user decides to abruptly switch the topic and initiate a dialog flow that is designed to address a different goal.
Hence the user wants to jump midstream from one journey or story to another. This is usually not possible within a chatbot: once a user has committed to a journey or topic, they have to see it through. Normally the dialog does not support this ability for a user to change subjects.
Often an attempt to digress by the user ends in an “I am sorry” from the chatbot and breaks the current journey.
Hence the chatbot framework you are using should allow for this: to pop out of and back into a conversation.
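One common way to support popping out of and back into a conversation is to keep the suspended journeys on a stack. The sketch below assumes that approach; the class DialogStack and the journey names are hypothetical and not taken from any specific framework.

```python
from typing import List, Optional


class DialogStack:
    """Keeps suspended journeys so the bot can return to them after a digression."""

    def __init__(self) -> None:
        self._stack: List[str] = []

    @property
    def active(self) -> Optional[str]:
        """The journey currently in progress, or None if nothing is open."""
        return self._stack[-1] if self._stack else None

    def start(self, journey: str) -> None:
        # Starting a new journey suspends whatever is currently active.
        self._stack.append(journey)

    def finish(self) -> Optional[str]:
        """End the active journey and return the one being resumed, if any."""
        if self._stack:
            self._stack.pop()
        return self.active


stack = DialogStack()
stack.start("book_flight")
stack.start("check_weather")  # the user digresses in the middle of the booking
print(stack.active)
print(stack.finish())         # the digression is done, so the booking resumes
```

Instead of answering a digression with “I am sorry”, the bot handles the new topic and then picks the original journey up where it left off.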
Ambiguity is when we hear something which is open to more than one interpretation. Instead of just going off on a tangent which was not intended by the utterance, I perform the act of disambiguation by asking a follow-up question. This is, simply put, removing ambiguity from a statement or dialog.
Ambiguity makes sentences confusing. For example, “I saw my friend John with binoculars”. Does this mean John was carrying a pair of binoculars? Or that I could only see John by using a pair of binoculars?
Hence, I need to perform disambiguation and ask for clarification. A chatbot encounters the same issue when the user’s utterance is ambiguous; instead of going off on one assumed intent, it could ask the user to clarify their input. The chatbot can present a few options based on a certain context, from which the user can select and confirm the most appropriate option.
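A simple way to decide when to ask is to compare the confidence scores of the top two intents. The sketch below assumes the NLU engine returns a non-empty dictionary of scores per intent; the function respond, the margin of 0.15 and the intent names are all illustrative choices.

```python
def respond(scores: dict, margin: float = 0.15) -> str:
    """Pick the top intent, or ask the user to choose when the top two are too close."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < margin:
        # The scores are too close to call, so offer the two best options.
        options = " or ".join(name for name, _ in ranked[:2])
        return f"Did you mean {options}?"
    return f"Proceeding with {ranked[0][0]}."


print(respond({"book_flight": 0.48, "book_hotel": 0.45, "cancel": 0.07}))
print(respond({"book_flight": 0.9, "book_hotel": 0.1}))
```

The margin is a tuning knob: set it too high and the bot asks too often, too low and it guesses when it should have asked.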
Just to illustrate how effective we as humans are at disambiguating and detecting subtle nuances, have a look at the following two sentences:
- A drop of water on my mobile phone.
- I drop my mobile phone in the water.
These two sentences have vastly different meanings, and compared to each other there is no real ambiguity, but for a conversational interface this will be hard to detect and separate.
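A rough way to see why this is hard is to look at the two sentences the way a naive word-matching approach would. The sketch below compares them as bags of words; this is only an illustration, and real NLU engines use richer features than word overlap.

```python
import re


def bag_of_words(sentence: str) -> set:
    """Lower-case the sentence and return its set of distinct words."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


first = bag_of_words("A drop of water on my mobile phone.")
second = bag_of_words("I drop my mobile phone in the water.")

shared = first & second
print(sorted(shared))
print(len(shared), "shared words out of", len(first | second), "distinct words")
```

All the content words are shared, including “drop”, which is a noun in the first sentence and a verb in the second. Only word order and the small function words tell the two meanings apart.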
Looking at fine-tuning, it is clear that GPT-3 is not ready for this level of configuration, and when a low-code approach is implemented, it should be an extension of a more complex environment, in order to allow scaling into that environment.