A Comparison Of Nine Chatbot Environments

Discover Which Chatbot Framework Might Work Best For Your Organization

Introduction

This research is based on a ten-point comparison matrix; here I look at the strengths, weaknesses and possible growth areas of each solution…

I have built prototypes with most of the commercial cloud and opensource Conversational UI & AI platforms currently available. Obviously there will be important aspects I miss; any feedback in this regard will be much appreciated.

There are also a host of other design and development tools available; the idea here is to focus on the larger commercial and self-contained services.

General Trends

For starters, there are six general chatbot trends emerging…

1️⃣ There has been growing activity in voice interfaces, particularly access via a phone call rather than a dedicated voice assistant device. IBM Watson Voice Agent was launched in 2018, but from March 2021 it will be deprecated and fully integrated into Watson Assistant as the newly released phone integration. Google Dialogflow CX and NVIDIA Jarvis were also launched.

2️⃣ The deprecation of intents. This is also referred to as end-to-end learning. Intent deprecation introduces more flexibility in terms of user inputs and matching those inputs to a dialog node.

3️⃣ Intents and Entities continue to merge and contextual annotation of entities within the intent or utterance is becoming commonplace & very necessary. Compound entities are also becoming more important.

4️⃣ Data structures are being introduced to entities… This trend is visible with Rasa, the Alexa Conversations tool and especially Microsoft LUIS. Rasa calls it Entity Roles & Groups, AWS calls it Slots with Properties, and Microsoft LUIS has ML entities which can be decomposed.

5️⃣ Edge installations are becoming more important… NVIDIA Jarvis and Rasa come to mind for install-anywhere deployment.

6️⃣ Deprecation of the state machine is inevitable, and Rasa is leading the charge here. IBM is introducing automation to their Dialog Management system with customer effort scores and auto-disambiguation menus. Watson Actions also needs to be mentioned.

Overview Of Development Environment

Environments are generally very similar in their approach to tools available for crafting a conversational interface.

Considering what’s available, chatbot development environments can still be segmented into 4 distinct groups.

These being:

  • Leading Commercial Cloud Offerings
  • NLU / NLP Tools (mostly opensource)
  • The Avant-Garde & Edge
  • The Use-the-Cloud-You’re-In

Leading Commercial Cloud Offerings

The leading commercial cloud environments attract customers and users purely through their natural language processing prowess and market presence, and through ease of use with no installation or environment management required.

Microsoft Bot Framework Composer

Among these I count IBM Watson Assistant, Microsoft Bot Framework / Composer / LUIS / Virtual Agents, Google Dialogflow etc.

Established companies gravitate to these environments, at significant cost of course. These are seen as a safe bet, to meet their Conversational AI requirements.

They are seen as chatbot tool providers in and of themselves.

Scaling of any enterprise solution will not be an issue and continuous development and augmentation of the tools are a given. Resources abound with technical material, tutorials and more.

NLU / NLP Tools

There are also (some opensource) tools like Hugging Face, spaCy, Apache OpenNLP, RASA NLU and others which can be used to process natural language in your environment.

Some organizations are creating their own chatbot framework making use of these tools.

Industrial-Strength Natural Language Processing

This is the harder route and more time-consuming, but if you have an existing environment, augmenting it with natural language processing capability using these tools is a viable option.

The power of most of these opensource tools is truly astonishing. And with the documentation available, they can serve as a “no software cost” point of departure for a first foray into natural language processing. It needs to be noted that in some cases enterprise costs exist.

The Avant-Garde

Here RASA really finds itself alone at the forefront. Recently, from a speech access perspective, NVIDIA Jarvis arrived on the scene. Jarvis does have two impediments: it requires access to an NVIDIA GPU based on their Turing or Volta architecture, and its dialog development and management feature is still under development and has not been released yet.

RASA

Rasa follows a unique path in terms of wanting to deprecate the state machine with its hard-coded dialog flows/trees. Together with their Conversation Driven Design (CDD) in the form of Rasa-X, this is a very compelling option.

Their entities are contextually aware and they follow an approach where entities and intents really merge.

Compound entities are part of the offering. Entities can be segmented according to roles and groups.
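As a sketch of what roles make possible, consider an NLU parse where two entities of the same type are told apart by their role. The structure below is modelled loosely on Rasa's documented entity output; the sentence, field values and helper function are illustrative:

```python
# Simplified sketch of an NLU parse result where two "city" entities
# are disambiguated by their roles (structure modelled on Rasa's
# documented entity format; values here are illustrative).
parse_result = {
    "text": "Book a flight from Berlin to Amsterdam",
    "intent": {"name": "book_flight", "confidence": 0.97},
    "entities": [
        {"entity": "city", "value": "Berlin", "role": "departure"},
        {"entity": "city", "value": "Amsterdam", "role": "destination"},
    ],
}

def entity_by_role(result, role):
    """Return the value of the first entity carrying the given role."""
    for entity in result["entities"]:
        if entity.get("role") == role:
            return entity["value"]
    return None

print(entity_by_role(parse_result, "departure"))    # Berlin
print(entity_by_role(parse_result, "destination"))  # Amsterdam
```

Without roles, both values would simply be `city` entities and the dialog layer could not tell departure from destination.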

Deprecation of intents has been announced and initiated.

Based on their expansion, funding, developer advocacy and events, this is a company to watch.

Hopefully the bigger players will emulate them. One of their strong points is developer advocacy and being the technology of choice for seed projects.

RASA has succeeded in creating a loyal developer following.

Use-the-Cloud-You’re-In

I cannot help but feel Amazon Lex and Oracle Digital Assistant (ODA) find themselves in this group. My sense is that someone will not easily opt for ODA or Lex if they do not have an existing attachment to Oracle or AWS from a cloud perspective.

Especially if the existing attachment is Oracle Cloud or Oracle Mobile Cloud Enterprise. Or with AWS via Echo & Alexa.

Oracle Digital Assistant

Another impediment with ODA is cost. Free access plays a huge role in developer adoption and the platform gaining that critical mass. We have seen this with IBM being very accessible in terms of their free tier with an abundance of functionality.

Microsoft has gone a long way in making tools more accessible, especially developer environments. RASA, even though a relatively late starter, has invested much time and effort in developer advocacy. Google Dialogflow is also popular and often a point of departure for companies exploring NLU and NLP.

ODA is not accessible enough and the existing impediments to experimenting and prototyping are not helping.

Cross-Industry Trends

These trends include:

Chatbot Growth In Capability
  • Intent deprecation.
  • Intent Disambiguation with auto learning menus.
  • The merging of intents and entities
  • Deprecation of the State Machine. Or at least, towards a more conversational like interface.
  • Complex entities; introducing entities with properties, groups, roles etc.

There is both horizontal and vertical growth in chatbot technology. From the diagram above it is clear where this growth is taking place:

Vertical — Technology

The Conversational UI is moving away from a structured, preset menu- and keyword-driven interface towards unstructured natural language input and longer conversational input. Users are allowed to disambiguate when two or three intents are close in score, and this is used as a mechanism for auto-learning.
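A minimal sketch of such a disambiguation step, assuming the NLU layer returns a confidence score per intent (the margin and intent names below are illustrative, not any framework's API):

```python
# Illustrative sketch: when the top two intent confidences are close,
# ask the user to disambiguate instead of guessing. The option the
# user picks can then be logged as a training example (auto-learning).
def pick_intent(scores, margin=0.1):
    """scores: dict mapping intent name -> confidence (0..1)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (top, top_c), (second, second_c) = ranked[0], ranked[1]
    if top_c - second_c < margin:
        # Too close to call: surface both candidates to the user.
        return {"action": "disambiguate", "options": [top, second]}
    return {"action": "proceed", "intent": top}

# Close scores trigger a disambiguation menu...
print(pick_intent({"check_balance": 0.52, "transfer_funds": 0.48, "greet": 0.05}))
# ...while a clear winner proceeds directly.
print(pick_intent({"check_balance": 0.91, "transfer_funds": 0.20, "greet": 0.02}))
```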

Horizontal — User Experience

In this dimension the bot is transforming from a messaging bot to a truly conversational interface. Away from click navigation to eventual unrestricted compound natural language.

The Digital Employee

The end-game is where the digital employee, emerging from the chatbot environment, has evolved into areas of text and speech.

With contextual awareness on four levels:

  • Within the Current Conversation
  • From Previous Conversations
  • From CRM & Other Customer/User Related Data Sources
  • Across different mediums

The Growth Of A Digital Employee Within An Organization

The digital employee will grow across different mediums and modalities: mastering languages, with detection, translation, tone, sentiment and automatic categorization of conversations.

Mediums will include devices like Google Home, Amazon Echo, traditional IVR and more. Just as we as humans can converse in text or voice, so the digital employee will be able to converse in text or voice.

Chatbot Offerings Rating Matrix

In rating the nine chatbot solutions I looked at nine key points. Obviously NLU capability is key in terms of intents and entities. I was especially harsh on the extent to which entities can be applied in a compound fashion, annotated and detected contextually with decomposition.

Dialog and state development and management are also key points; ease of development is important, as is the extent to which collaboration is possible.

The other elements are self-explanatory.

Key to Ratings

For different organizations, disparate elements are important and will guide their thinking and eventually determine their judgement. For instance, even though Lex does not feature in many respects, if a company is steeped in AWS for other services, Lex might be the right choice.

The same goes for Oracle, MindMeld etc.

Chatbot Rating Matrix

Graphic Call Flow / Dialog Development Tools

For larger organizations and bigger teams, collaboration is important. Ease of sharing portions of the dialog and co-creating is paramount. Hence organizations have a need for graphic development environments. Other teams prefer a more flexible native code approach.

Rating of GUI Form Call Flow Development & Editing

IBM Watson Assistant made a big addition with the launch of Actions.

Rasa, with their tool called Rasa-X, is so unique that it is hard to accurately categorize alongside the other environments. Rasa-X is graphical, and it allows for editing and development, but it is far more comprehensive.

The Jarvis dialog development and management feature is under development and has not been released yet.

NLU

Natural Language Understanding underpins the capabilities of the chatbot. Without entity detection and intent recognition all efforts to understand the user come to naught.

On some elements of a chatbot environment, improvisation can go a long way. This is not the case with NLU. LUIS has exceptional entity categorization and functionality. This includes decomposable entities. IBM Watson Assistant can also be counted as one of the leaders, with RASA & NVIDIA Jarvis.

Natural Language Understanding Capability

I also looked at the integration of the NLU components into the other chatbot components. This is where Microsoft excels with their growing chatbot real estate.

Scalability

Maturity of any framework is tested in an enterprise environment where implementations with diverse use-cases and ever expanding scale are present.

Scalability & Enterprise Readiness

Enterprise readiness is an evaluation criterion which does not enjoy the attention it deserves. Once vulnerabilities are detected, too much money and time have already been invested in the technology.

Overall Ratings

It is impossible to compare frameworks on a one-to-one basis, hence I created the five points of consideration as seen in the image below. It must be noted that one or more of these five elements might be of higher importance to some organizations than others, and that may draw them in a certain direction.

Again, if a company is already heavily invested in Oracle Cloud or AWS, that will be a huge deciding factor for them, overriding other considerations and easing the pain of other shortcomings.

Scoring Matrix Based On 5 Elements

Cost plays a big role, and this again speaks to the accessibility of environments like Cisco MindMeld and RASA; especially for initial prototyping.

Conclusion

This is a mere overview based on a matrix with points of assessment I personally deem as important.

And again, how important a particular point on the matrix is to you or your organization will influence your judgement.

In the final analysis the software is to serve a purpose in your organization and current cloud landscape. The offering best suited for that purpose is the best choice for you.

Key Considerations In Designing A Conversational User Interface

Start Here If You Are Thinking Of Creating A Chatbot

How do we as humans have a conversation…

Introduction

The conversation you are having with a computer must not feel weird or awkward.

But also, it must not disrupt the patterns of human behavior which have evolved over time.

It is experience, rather than understanding, that influences behavior ~ Marshall McLuhan.

Instead, your conversational interface must adapt to the way of communication we all use and know the best.

We find conversation in general very intuitive and frictionless; hence conversational interfaces must follow suit.

Be Mindful Of Technical Impediments

In most respects Conversational AI and existing software frameworks are inferior to what we as humans are capable of. These technical limitations should be catered for during your design and build phase. For instance, human conversations do not come to an abrupt end, terminated due to an unrecoverable system or dialog error.

Some Development Frameworks Can Accommodate Digression. Others Not

In reality, we as humans don’t abruptly reply with “I cannot help you with that” and dismiss the conversation. Or… at least we ought not.

Hence, your conversational interface should not either.

As humans we try and explore related topics or ideas during a discourse, in an attempt to detect intent and a common ground of sorts. Your software should try and emulate this as far as possible.
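One hedged way to emulate this: when no intent clears an acceptance threshold, surface the nearest in-domain topics instead of a flat rejection. The thresholds, topic names and phrasing below are purely illustrative:

```python
# Sketch of graceful fallback: offer related topics rather than
# "I cannot help you with that". Thresholds are illustrative.
def respond(scores, accept=0.7, suggest=0.3):
    """scores: dict mapping intent name -> confidence (0..1)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top, top_c = ranked[0]
    if top_c >= accept:
        return f"handling:{top}"                     # confident: proceed
    related = [name for name, c in ranked if c >= suggest]
    if related:
        # Not confident, but some topics are plausible: explore them.
        return "Did you perhaps mean: " + ", ".join(related) + "?"
    # Nothing plausible: invite the user to elaborate, keep talking.
    return "Could you tell me a bit more about what you need?"

print(respond({"billing": 0.9, "greet": 0.1}))
print(respond({"billing": 0.5, "shipping": 0.4, "greet": 0.1}))
```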

Play To Your Technical Strengths

The inverse is also true, there are specific instances where computers exceed our human cognitive capabilities.

Sourced From Google: Different Modalities Present Different Strengths And Affordances

If I look at how our children interact with Google Home and Alexa:

  • Conversational User Interfaces do not get annoyed and irritated by repetitive and simple questions.
  • They are not offended by receiving commands and requests all the time.
  • The conversation from the device or interface does not have all the filler sounds of “uhm”, and “aaaah” etc.
  • Information is imparted quickly and fairly accurately.

If you streamline the script of the Conversational Interface, you will find many opportunities to avoid user annoyance.

Turn-Taking

When we take turns to speak, also referred to as dialog turns, interrupting each other is avoided and the conversation is generally synchronized. This is our way as humans to manage the state of the conversation. As humans we do this intuitively and effortlessly.

Amazon Echo Alexa: The Light Acts As A Conversational Cue For The User

Google, in their Voice User Interface design principles, describe it as follows:

Turn-taking is about who “has the mic”: taking the mic, holding the mic, and handing it over to another speaker. To manage this complex process, we rely on a rich inventory of cues embedded in sentence structure, intonation, eye gaze, and body language.


Unique Voices Facilitates Creation Of A Persona

Your Conversational Interface will not have access to all these rich human nuances and cues, but there are elements which can be employed. For instance, silence from the user usually indicates a readiness to cede the dialog turn.

Within the chatbot or voicebot script, you can use syntax and/or tone to signal to the user the interface is ready to receive input. Let your interface ask a question; this is the clearest way to signal dialog turn in a natural way.

Why A Persona?

You must see a persona as a design tool, and this tool assists in the writing of a conversation. Before starting to write the dialog, you need to have a fairly complete understanding of who is communicating with the user.

Different Languages For A Locale Independent Conversational Design

What constitutes a persona? A few elements are used, like tone, script and personality, and you should know what your persona will do or say in any particular conversational situation.

Whether you like it or not, users are going to project a persona onto your interface.

Context

Advances in automatic speech recognition (ASR) mean that we almost always know exactly what users said. ASR can detect the spoken word better now than we as humans can. Speech recognition is not the challenge. The challenge is understanding: extracting meaning, intent and conversational entities.

In isolation, user utterances are hard to understand; in context, they become easier. We as humans, when struggling to understand someone’s intent, would say: “give me some more context” or “but in what context?”

Follow-up Intents

Your conversational interface needs to keep track of context in order to understand follow-up intents.

Follow-Up Mode Avails User Freedom

Unless the user changes the subject, we can assume that the thread of conversation continues. This allows follow-up intents to be detected with greater ease in the customer conversation.
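The idea can be sketched as a small context store that carries slots forward across dialog turns unless the subject changes. The class and slot names below are illustrative, not any framework's API:

```python
# Minimal sketch of context tracking for follow-up intents: slots from
# the previous turn are inherited unless the user changes topic.
class DialogContext:
    def __init__(self):
        self.intent = None
        self.slots = {}

    def update(self, intent, slots):
        if intent is not None and intent != self.intent:
            self.intent = intent      # user changed the subject
            self.slots = dict(slots)
        else:
            self.slots.update(slots)  # follow-up: inherit earlier slots

ctx = DialogContext()
# Turn 1: "What's the weather in Cape Town today?"
ctx.update("weather", {"city": "Cape Town", "day": "today"})
# Turn 2 (follow-up): "And tomorrow?" -- no new intent, one new slot.
ctx.update(None, {"day": "tomorrow"})
print(ctx.intent, ctx.slots)  # weather {'city': 'Cape Town', 'day': 'tomorrow'}
```

The follow-up turn keeps the `city` slot from the previous turn, which is what lets a terse “And tomorrow?” be understood.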

One of the harder things in writing a conversational interface is making provision for digression.

This is when the user moves from one context or conversational thread to another. Read more about digression here.

Variation

A lack of variation makes the interaction feel monotonous or robotic. It might take some programmatic effort to introduce variation, but it is important.

Many development frameworks have functionality which allows you to easily randomize your bot’s output. Or at least have a sequence of utterances which breaks any monotony.
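If your framework does not offer this out of the box, a plain-language sketch is trivial (the wording of the variants is, of course, illustrative):

```python
import random

# Sketch: rotate among equivalent phrasings so the bot's confirmations
# do not feel monotonous. Most frameworks have a built-in for this;
# shown here in plain Python.
CONFIRMATIONS = [
    "Got it.",
    "Sure thing.",
    "Okay, done.",
    "No problem.",
]

def confirm():
    """Return a randomly chosen confirmation phrasing."""
    return random.choice(CONFIRMATIONS)

print(confirm())
```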

Users Are Generally Informative

Users are usually very cooperative. And this being a conversational interface, users are bound to supply more information than you might expect. This will necessitate you handling quite verbose user dialogs, especially at the start of the conversation.

You can mitigate this risk by adding an initial high-level first NLP pass. You can read more about this here.

From here you will want to detect intent, there might be multiple intents you need to tackle.

And then there are the entities; for instance, in the case of a travelbot, entities will be cities of departure and arrival, dates, times, airlines etc.

Advanced entity detection is a must. Read more about this here.
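As an illustration only, a toy entity pass over a travelbot utterance might look like this; a production system would use a trained entity model rather than regular expressions and a hard-coded city list:

```python
import re

# Illustrative regex-based entity extraction for a travelbot.
# The gazetteer and patterns are toy assumptions for the sketch.
CITY = r"(?:Berlin|Amsterdam|London|Paris)"

def extract_travel_entities(text):
    """Pull departure, arrival and date entities out of an utterance."""
    entities = {}
    m = re.search(rf"from\s+({CITY})", text)
    if m:
        entities["departure"] = m.group(1)
    m = re.search(rf"to\s+({CITY})", text)
    if m:
        entities["arrival"] = m.group(1)
    m = re.search(r"\bon\s+(\w+day)\b", text)
    if m:
        entities["date"] = m.group(1)
    return entities

print(extract_travel_entities("Book a flight from London to Paris on Friday"))
# {'departure': 'London', 'arrival': 'Paris', 'date': 'Friday'}
```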

Keep The Dialog On Track

Your conversational UI will be domain specific, hence you will need to manage the dialog in a subtle way to ensure users understand the purpose and aim of the interface. You might not always be able to handle all cooperative responses from a user. But you should always be able to use lightweight and conversational exception handling to get the dialog back on track in a way that doesn’t draw attention to the error.

Telephone Based Conversational Interface ~ VoiceBot

Move The Conversation Forward

We have all had conversations with bots which are sticky, repetitive, rude or plain unhelpful. You expect your user to be cooperative and informative, and your bot must be the same: always sharing dialog which is intentional and helpful in moving the conversation forward and to a conclusion.

Stick To Your Domain

In any conversation, saying too little or too much is equally uncooperative. You must try and facilitate your bot’s comprehension by trying, via the script, to keep the user’s responses brief and concise, with optimal relevance to the current context.

This is one of the best Voice User Interface Design Talks Available…