NVIDIA Riva Virtual Assistant With Rasa

Rasa & Riva Complement Each Other Extremely Well

Introduction

In order to develop a virtual assistant with a speech interface, four key elements are required. The first is speech recognition, also referred to as Automatic Speech Recognition (ASR) or Speech-To-Text (STT): transcribing the user’s speech into text. This is the input user touch point.

A first configuration. In this configuration Rasa is utilized for dialog management and NLU.

The second is the output user touch point: the conversion of text into speech, preferably natural-sounding speech. This is also referred to as Text-To-Speech (TTS) or speech synthesis.

These two elements need to have low latency, preferably less than 300 milliseconds. The underlying models also need to be trained.
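One way to keep an eye on that latency target is to time each speech call and compare it against the budget. The following is only a minimal sketch: `fake_transcribe` is a hypothetical stand-in for a real speech recognition request, and the helper names are my own, not part of Riva or Rasa.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Latency budget taken from the 300 millisecond figure in the text.
LATENCY_BUDGET_MS = 300.0


def timed_call(fn: Callable[..., T], *args, **kwargs) -> Tuple[T, float]:
    """Run fn and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def within_budget(elapsed_ms: float, budget_ms: float = LATENCY_BUDGET_MS) -> bool:
    """True when a measured call fits strictly inside the latency budget."""
    return elapsed_ms < budget_ms


def fake_transcribe(audio: bytes) -> str:
    # Hypothetical stand-in for a real ASR request; it only sleeps briefly.
    time.sleep(0.01)
    return "what is the weather"


if __name__ == "__main__":
    text, ms = timed_call(fake_transcribe, b"\x00\x01")
    print(f"{text!r} took {ms:.1f} ms, within budget: {within_budget(ms)}")
```

Wrapping the real ASR and TTS calls the same way would show whether the tunnel, the network or the model is eating the budget.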

The remaining two elements are synonymous with text-based conversational agents: dialog management and Natural Language Understanding (NLU). Rasa is at the forefront when it comes to these two elements.

A second configuration utilizes the NLP of Riva and only relies on Rasa for Dialog Management.

The second configuration for the Riva & Rasa demo application is where the Natural Language Processing is performed by Riva.
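The two configurations differ only in which component performs the language understanding step. The sketch below illustrates that idea with plain Python callables. Every function in it is a hypothetical stub invented for illustration; none of them is the Riva or Rasa client API.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Each stage is a plain callable so real clients could be swapped in later.
Asr = Callable[[bytes], str]
Nlu = Callable[[str], Dict[str, str]]
Dialog = Callable[[Dict[str, str]], str]
Tts = Callable[[str], bytes]


@dataclass
class VoiceAssistant:
    asr: Asr        # speech -> text
    nlu: Nlu        # text -> intent
    dialog: Dialog  # intent -> reply text
    tts: Tts        # reply text -> speech

    def respond(self, audio: bytes) -> bytes:
        text = self.asr(audio)
        parsed = self.nlu(text)
        reply = self.dialog(parsed)
        return self.tts(reply)


# Hypothetical stubs standing in for the real services.
def stub_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def stub_tts(text: str) -> bytes:
    return text.encode("utf-8")


def rasa_nlu_stub(text: str) -> Dict[str, str]:
    return {"intent": "ask_weather", "source": "rasa"}


def riva_nlp_stub(text: str) -> Dict[str, str]:
    return {"intent": "ask_weather", "source": "riva"}


def rasa_dialog_stub(parsed: Dict[str, str]) -> str:
    return f"Handling {parsed['intent']} (NLU by {parsed['source']})"


def build_assistant(nlu_backend: str) -> VoiceAssistant:
    """Dialog management stays with Rasa; only the NLU stage is switched."""
    nlu = {"rasa": rasa_nlu_stub, "riva": riva_nlp_stub}[nlu_backend]
    return VoiceAssistant(asr=stub_asr, nlu=nlu, dialog=rasa_dialog_stub, tts=stub_tts)


if __name__ == "__main__":
    for backend in ("rasa", "riva"):
        print(build_assistant(backend).respond(b"what is the weather"))
```

In both cases the speech stages and the dialog stage stay the same, which is what makes the two setups easy to compare.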

Currently ASR, NLU and TTS models are available in NVIDIA Riva, trained on thousands of hours of speech data.

On the roadmap of Riva are other cognitive elements like computer vision. The vision component includes lip activity, gaze detection, gesture detection and more.

Conversational AI Skills

I first heard about this from Fjord Design & Innovation, where they referred to some of these elements as a phenomenon called face speed.

Face speed refers to the cues and hints we pick up from gestures, facial expressions and lip activity.

By incorporating these elements in their roadmap, Riva is poised to become a true conversational agent, taking cues from the speaker’s appearance.

What makes this collaboration between NVIDIA and Rasa so compelling is that it is the combination of two technological environments that need each other as much as they complement each other.

This is an avenue to speech-enable a Rasa digital assistant.

Environment Setup

In the Medium article I wrote on getting started with your NVIDIA Riva environment you will find a step-by-step guide to set up a virtual machine instance using AWS EC2. Cost is always a consideration if you are just experimenting, especially if you are charged in a weaker currency.

The EC2 instance can also be started and stopped in order to save on costs.

SSH tunnels work wonders for accessing URLs on the VM, although latency is a problem when testing the conversational agent by voice.

Why Rasa?

Rasa is a complete chatbot framework solution for any implementation where the user input is not voice, that is, text input, which includes conversational components like buttons and links.

It needs to be noted that from a Conversational AI perspective Rasa has all the features and elements required.

Rasa Open Source architecture

Elements contributing to Rasa being a good option for the NVIDIA Riva environment:

  • Free to download and use.
  • Contained and complete chatbot framework.
  • Open architecture for integration.
  • Install anywhere.

The additions Rasa requires to be speech-enabled are:

  • Automatic Speech Recognition (aka Speech-To-Text)
  • Speech Synthesis (aka Text-To-Speech)

I would be remiss not to mention that the NLP capability of Riva is significant, hence the two architectural approaches mentioned at the start. It need not be a choice between the NLU/NLP of Riva or Rasa. The two can be used in conjunction, complementing each other.

A Voicebot sequence of events with the Riva and Rasa functionality maximized.

The basic sequence of events here shows how the power of Riva NLP and Rasa’s NLU capability can be leveraged, especially for longer input.

One last thought on why Rasa: it is currently the only industrial-strength conversational framework that employs machine learning for its dialog management, which in most other systems is still a state machine.

With Rasa’s vision of deprecating intent classification and also the dialog (or bot script), the flexibility matches the vision of Riva.

Running The Demo

To run the demo and also validate your installation, follow the step-by-step instructions found here. There are two modes to run the conversational agent, one is with Rasa NLU, and the other with Riva NLP.

The conversational agent is served on https://0.0.0.0:5555/rivaWeather and does look like a slimmed-down version of what you see in the official demo videos.

The demo can handle small talk to some degree.

Weather API integration example with the NVIDIA Riva chat demo, using Riva ASR and TTS together with Rasa NLU and state management

To run the weather bot, be sure to add the Weather API key to your Riva configuration. I had trouble with the Rasa Weather action extracting the key, so I hard-coded it in the action.
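Hard-coding works for a quick experiment, but a gentler alternative is to read the key from an environment variable and only fall back to a literal value when it is missing. The sketch below is an assumption on my part and not how the demo's weather.py is written: the variable name `WEATHER_API_KEY` and the function `load_weather_api_key` are hypothetical.

```python
import os
from typing import Mapping, Optional

# Hypothetical environment variable name; the demo may use a different one.
ENV_VAR = "WEATHER_API_KEY"


def load_weather_api_key(
    env: Optional[Mapping[str, str]] = None,
    fallback: Optional[str] = None,
) -> str:
    """Return the API key from the environment, else the fallback, else fail."""
    source = os.environ if env is None else env
    key = source.get(ENV_VAR, "").strip()
    if key:
        return key
    if fallback:
        # Fallback keeps the hard-coded route available for local experiments.
        return fallback
    raise RuntimeError(f"Set {ENV_VAR} or pass a fallback key")


if __name__ == "__main__":
    # Passing an explicit mapping avoids touching the real environment.
    print(load_weather_api_key({"WEATHER_API_KEY": "demo-key"}))
```

Failing loudly when no key is found makes a missing configuration easier to spot than a silent empty string.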

(rasa) root@156ggcbd3bg9:/workspace/samples/rasa-chatbot/rasa-weatherbot/actions# vim weather.py

You will also need to set up the network configuration for the demo to work. There are two locations in the code base that have to be configured for inter-service communication:

rasa-chatbot/rasa-weatherbot/endpoints.yml

and…

rasa-chatbot/config.py

Accessing the conversational agent via a browser on my machine is enabled with an SSH tunnel set up to port 5555 on the EC2 instance.
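For reference, a tunnel like this boils down to a single `ssh` invocation with local port forwarding. The sketch below only assembles that command. The host name, user name and key path are placeholders I made up, and it assumes the demo listens on port 5555 on the remote side as described above.

```python
import shlex
from typing import List


def build_tunnel_command(host: str, user: str, key_path: str, port: int = 5555) -> List[str]:
    """Build an ssh command that forwards a local port to the same port on the VM."""
    return [
        "ssh",
        "-i", key_path,                    # private key for the instance
        "-N",                              # no remote command, forwarding only
        "-L", f"{port}:localhost:{port}",  # local port -> port on the remote host
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own instance address, user and key.
    cmd = build_tunnel_command("ec2-host.example.com", "ubuntu", "/home/me/.ssh/my-key.pem")
    print(shlex.join(cmd))
```

With the tunnel open, the page should be reachable in the local browser at https://localhost:5555/rivaWeather, assuming nothing else is using that port locally.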

Conclusion

NVIDIA Riva has an ambitious roadmap to become an embedded voice assistant with speech and visual capabilities. A medium like a phone call will not do justice to the abilities of Riva; it belongs embedded in an application on a phone, smart device or smart home with audio and vision.

As mentioned, the Riva NLP capabilities are astute and the state management can be facilitated within Riva. Integration with existing text-based digital assistants will stand Riva in good stead.