Providing Expert Guidance In Building Conversational User Interfaces
Riva brings deep learning to the masses. The multimodal aspect of Riva is best understood in the context of where NVIDIA wants to take Riva in terms of functionality.
Within NVIDIA GPU Cloud, also known as NGC, there is a catalog of different implementations. Each of these catalog items holds step-by-step instructions and scripts for creating deep learning models, with sample performance and accuracy metrics to compare your results against.
These notebooks provide guidance on creating models for language translation, text-to-speech, text classification and more.
What is exciting about this collection of functionality is that Riva is poised to become a true Conversational Agent. As humans we communicate not only by voice, but also through cues such as the gaze of the speaker, lip activity etc.
Another key focus area of Riva is transfer learning. There is a significant cost saving in taking the advanced base models of Riva and repurposing them for specific uses.
The functionality currently available in Riva includes ASR (speech-to-text), TTS and NLU. Edge installation is a huge benefit.
Setting Up Your Environment
Access to NVIDIA software, Jupyter Notebooks and demo applications is easy, and resources are abundant. The only impediment is access to an NVIDIA GPU based on the Turing or Volta architecture.
In this article I look at one of the more cost-effective ways to access such infrastructure via an AWS EC2 instance.
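Once an instance is running, it can be useful to confirm that its GPU meets the Turing or Volta requirement. The following is a minimal sketch, not an official NVIDIA check: it assumes nvidia-smi is installed and that the driver is recent enough to support the compute_cap query field, and it relies on the published compute capabilities of 7.0 and 7.2 for Volta and 7.5 for Turing. The function names are my own.

```python
import subprocess

# Compute capabilities per architecture: Volta is 7.0 (7.2 on Jetson Xavier),
# Turing is 7.5. Anything else is treated as "not Turing or Volta".
ARCH_BY_COMPUTE_CAP = {"7.0": "Volta", "7.2": "Volta", "7.5": "Turing"}


def architecture_for(compute_cap):
    """Return 'Volta' or 'Turing' for a compute capability string, else None."""
    return ARCH_BY_COMPUTE_CAP.get(compute_cap.strip())


def parse_gpu_lines(text):
    """Parse 'name, compute_cap' CSV lines into (name, compute_cap) tuples."""
    gpus = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma in the GPU name cannot break parsing.
        name, cap = line.rsplit(",", 1)
        gpus.append((name.strip(), cap.strip()))
    return gpus


def query_gpus():
    """Ask nvidia-smi for the name and compute capability of every GPU.

    Assumption: the installed driver supports the compute_cap field;
    older drivers may reject it.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_lines(result.stdout)
```

On the instance, calling query_gpus() and passing each compute capability to architecture_for() shows whether the card qualifies. A result of None means the GPU is neither Turing nor Volta.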
NVIDIA Riva Notebooks
To get you started, NVIDIA Riva has quite a few Jupyter Notebook examples available which you can step through. These cover different speech implementations, including speech-to-text, text-to-speech, named entities, intent & slot detection and more.
When clicking on each of the catalog items, you will see a list of commands to execute in order to launch the notebook. These commands are accurate for the most part, and executing them is not a problem.
When NGC commands are used, the command line prompts for an API key, which must be obtained from the NVIDIA NGC Setup page.
In this article I explain the installation, SSH and tunneling process in detail. An SSH tunnel on port 8888 is required to launch the Jupyter Notebook in a browser on your local machine.
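The tunnel itself is a standard SSH local port forward. Below is a minimal sketch that builds the command. The key path, user name and host name are placeholders I made up for illustration, so substitute the values for your own EC2 instance.

```python
import shlex


def build_tunnel_command(host, user="ubuntu", key_path="~/.ssh/my-key.pem", port=8888):
    """Build an ssh command that forwards a local port to the same port on the instance.

    All defaults are hypothetical placeholders, not values from the Riva documentation.
    """
    return [
        "ssh",
        "-i", key_path,                    # private key for the instance
        "-N",                              # run no remote command, only forward ports
        "-L", f"{port}:localhost:{port}",  # local port -> localhost:port on the instance
        f"{user}@{host}",
    ]


def as_shell_string(command):
    """Render the argument list as a string that can be pasted into a terminal."""
    return shlex.join(command)
```

Printing as_shell_string(build_tunnel_command("your-instance-hostname")) gives a command to paste into a local terminal. While it is running, the notebook served on port 8888 of the instance is reachable in your local browser at http://localhost:8888.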
The notebook takes you through the process of defining directories, training models and exporting to a .riva file, followed by the deployment workflow that consumes the .riva file and deploys it to Riva.
My first thought was that getting through my own installation and running the demos would be very daunting, seeing that this is an NVIDIA deep learning environment.
But on the contrary, getting to grips with Riva on a demo application level was straightforward when following the documentation. After running this basic demo voicebot, what are the next steps?
The voicebot in which Rasa is integrated with Riva is a step up in complexity and a logical next step. Perusing the Jupyter Notebooks also provides good examples of how to interact with the APIs.