NVIDIA Jarvis Has Been Renamed To NVIDIA Riva
How To Get Started With NVIDIA Riva For Conversational AI Services
On 28 July 2021, NVIDIA Jarvis was rebranded as Riva. I always felt the name Jarvis was already too widely used. The good news is that the core technologies, performance and roadmap remain unchanged.
NVIDIA Riva is a GPU-accelerated SDK for developing multimodal conversational AI applications.
According to NVIDIA, the only changes from a user perspective:
- The name “Jarvis” has been replaced with “Riva” in APIs, NGC containers, and other developer resources.
- Older APIs and applications that use the term “Jarvis” will continue to work, but these APIs will be deprecated in favor of the new APIs. Hence a migration strategy needs to be planned.
- Performance achievements and optimizations remain unchanged with this change.
- A new version of Transfer Learning Toolkit that uses Riva in place of Jarvis APIs should be available soon.
NVIDIA Riva is an application framework for Multimodal Conversational AI.
The focus is on low latency (less than 300 milliseconds) and high-performance demands.
It is a high performance conversational AI solution incorporating speech and visual cues; often referred to as face-speed. Face-speed includes gaze detection, lip activity etc.
The multimodal aspect of Riva is best understood in the context of where NVIDIA wants to take Riva in terms of functionality.
- ASR (Automatic Speech Recognition) / STT (Speech To Text)
- NLU (Natural Language Understanding)
- Gesture Recognition
- Lip Activity Detection
- Object Detection
- Gaze Detection
- Sentiment Detection
What is exciting about this collection of functionality is that Riva is poised to become a true Conversational Agent.
Day to day, we humans communicate not only by voice, but also by detecting the gaze of the speaker, lip activity and so on.
Another key focus area of Riva is transfer learning.
There is significant cost saving when it comes to taking the advanced base models of Riva and repurposing them for specific uses. The functionality which is currently available in Riva 1.0 Beta includes:
- STT and
Positives & Considerations
The positives are overwhelming…
- Implementations can be cloud, or local/edge.
- Riva speaks to mission critical, industrial strength cognitive services & Conversational AI.
- A new framework for high-performance ASR, STT and NLU.
- Developers have access to transfer learning and can leverage the investment made by NVIDIA.
- The NVIDIA GPU environment addresses mission critical requirements, where latency can be negated.
- Clear roadmap for Riva in terms of the near future and imminent features.
- Riva addresses requirements for ambient ubiquitous interfaces.
- Access, development and deployment seem daunting and the framework appears complicated. In this article I want to debunk access apprehensions. However, production deployment will most certainly be complex.
- For a production environment, specific hardware considerations will most probably be paramount, especially where cloud/connectivity latency cannot be tolerated.
As per NVIDIA:
Developers at enterprises can easily fine-tune state-of-art-models on their data to achieve a deeper understanding of their specific context and optimize for inference to offer end-to-end real-time services that run in less than 300 milliseconds (ms) and delivers 7x higher throughput on GPUs compared with CPUs.
The Riva framework includes pre-trained conversational AI models, tools in the NVIDIA AI Toolkit, and optimized end-to-end services for speech, vision, and natural language understanding (NLU) tasks.
As mentioned, NVIDIA Riva is well suited for cloud or edge computing.
Edge computing is computing on localized servers and devices to facilitate speed and negate latency. Instead of relying entirely on cloud computing providers, edge computing processes data locally first.
It is easy to be overwhelmed when getting started with something like Riva.
This article is not a tutorial, but rather a guide on how to:
- Start as small and simple as possible.
- Become familiar with the environment, to some extent at least.
- Spiral outwards from this initial prototype in measured iterations, adding functionality and complexity.
Graphics Processing Unit
A requirement for experimenting and building with Riva is access to a GPU, specifically an NVIDIA GPU based on the Turing or Volta architecture.
This is the one big impediment to experimentation. In this story I am looking at one of the cost-effective options you can use to access Riva and start building amazing Conversational AI experiences.
The following are requirements for a successful Riva install:
- Access to an NVIDIA GPU based on the Turing or Volta architecture.
- Access to NVIDIA GPU Cloud (NGC), with an account you are logged into.
Let’s solve the GPU access problem first. I opted to make use of an NVIDIA Deep Learning AMI (Amazon Machine Image). This is available on AWS EC2 and can be created in a few minutes.
Once the EC2 (Elastic Compute Cloud) instance is running, be sure to stop it when not in use. The charge is per hour while the instance is running, so for prototyping and experimenting there is no need to run it 24/7.
It is also worthwhile to compare the cost of different regions; I found the cost differs significantly from one region to another.
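The saving from stopping the instance is simple arithmetic. Here is a minimal sketch, assuming a hypothetical rate of 0.50 USD per hour; that figure is a placeholder and not an actual AWS price, so substitute the rate for your instance type and region. The function name monthly_cost is my own. Storage attached to a stopped instance is still billed, which this sketch ignores.

```shell
# monthly_cost RATE_PER_HOUR HOURS_PER_DAY DAYS
# Prints the compute cost for the period, rounded to two decimals.
monthly_cost() {
  awk -v rate="$1" -v hours="$2" -v days="$3" \
    'BEGIN { printf "%.2f\n", rate * hours * days }'
}

# 0.50 is a placeholder hourly rate, not a real AWS price.
monthly_cost 0.50 24 30   # instance left running all month
monthly_cost 0.50 3 20    # three hours on each of twenty days
```

With the placeholder rate, the first call prints 360.00 and the second 30.00, which is why stopping the instance matters for a prototype.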
As your installation runs (which we will get to later), you will find the standard 32 GB storage does not suffice. I increased it to 256 GB. Storage can easily be increased via the EC2 portal on AWS; as seen in the image above.
Accessing Your Hardware
Once your EC2 Ubuntu instance is up and running, you obviously need to connect to it. The easiest way is via PuTTY. Install PuTTY on your machine…
When creating the EC2 instance, you are presented with an option to download a PEM key. Download this key file and save it somewhere on your machine.
You will need it to create a private key making use of the PuTTY key generator.
Once the private key is generated (*.ppk), you need to click within PuTTY on SSH and Auth, and select this file.
This is an effective and lightweight way to connect to your Ubuntu machine. At this stage your AWS machine is up and running and you have access to it via the command line.
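With command-line access in place, it is worth confirming that the instance really exposes an NVIDIA GPU before installing anything. A minimal sketch, assuming the NVIDIA driver (and with it the nvidia-smi tool) is already present on the Deep Learning AMI; the function name gpu_names is my own.

```shell
# Print the name of each GPU the driver can see, one per line.
# Returns 1 with a message when nvidia-smi is not installed.
gpu_names() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name --format=csv,noheader
  else
    echo "nvidia-smi not found; is the NVIDIA driver installed?" >&2
    return 1
  fi
}
```

Running gpu_names on the instance should print one line per GPU. A T4 is a Turing card and a V100 is a Volta card, so either name means the architecture requirement is met.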
Next, let’s take a look at the software requirements…
Now that we have our hardware up and running with access, we need to start installing Riva and launch some of the test and demo applications.
Your starting point is to access the NVIDIA NGC website.
NVIDIA GPU Cloud (NGC) is a GPU-accelerated cloud platform optimized for deep learning and scientific computing.
Create a user profile on NGC; this is free and quick to do. After you have created your profile, be sure to check your email for the confirmation message and click confirm.
Be sure to save the API key as seen in the image above; we will use this to authenticate on our AMI.
This process can be performed on your own machine, since at this stage we do not have any GUI or desktop access to our virtual machine.
Installing NGC On The AMI
Back on your virtual machine on AWS, from the PuTTY application command line, you can execute all the installs and actions.
Before installing Riva, you need to install the NGC command line tool (NGC CLI) with these commands:
wget -O ngccli_cat_linux.zip https://ngc.nvidia.com/downloads/ngccli_cat_linux.zip && unzip -o ngccli_cat_linux.zip && chmod u+x ngc
md5sum -c ngc.md5
echo "export PATH=\"\$PATH:$(pwd)\"" >> ~/.bash_profile && source ~/.bash_profile
ngc config set
During this installation process, you will be prompted for an API key, which you accessed while registering on NGC.
Deploying Riva On The AMI
Deploying Riva takes a while, and it is here that the installation might halt due to disk space requirements.
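A quick look at the free space before starting can save a failed run. A minimal sketch using df; the helper names free_gb and check_free_space are my own, and the 100 GB default threshold is an arbitrary assumption for illustration, not a figure published by NVIDIA.

```shell
# Print the free space, in whole GB, on the filesystem holding PATH (default /).
free_gb() {
  # df -Pk gives POSIX-format output in 1 KiB blocks; column 4 is "Available".
  df -Pk "${1:-/}" | awk 'NR==2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# check_free_space PATH MIN_GB
# Returns 1 with a warning when less than MIN_GB is free.
check_free_space() {
  path="${1:-/}"
  min_gb="${2:-100}"   # assumed threshold, adjust to taste
  avail="$(free_gb "$path")"
  if [ "$avail" -lt "$min_gb" ]; then
    echo "Only ${avail} GB free on ${path}; consider growing the volume."
    return 1
  fi
  echo "${avail} GB free on ${path}."
}
```

Calling check_free_space / 100 on the instance reports whether the root volume has at least 100 GB available.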
The option shown below is the Quick Start scripts approach to set up a local workstation and deploy the Riva services using Docker.
ngc registry resource download-version nvidia/riva/riva_quickstart:1.4.0-beta
Initialize and start Riva. The initialization step downloads and prepares Docker images and models. This step takes quite a while.
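The Quick Start download contains the scripts for this step. A minimal sketch, assuming the download unpacked into a directory called riva_quickstart_v1.4.0-beta and that it contains riva_init.sh and riva_start.sh; both the directory name and the script names are assumptions on my part, so check the directory listing if yours differs. The wrapper function start_riva is my own.

```shell
# start_riva [QUICKSTART_DIR]
# Runs the init script, then the start script, from inside the Quick Start directory.
start_riva() {
  quickstart_dir="${1:-riva_quickstart_v1.4.0-beta}"   # assumed directory name
  if [ ! -f "$quickstart_dir/riva_init.sh" ]; then
    echo "riva_init.sh not found in $quickstart_dir; run the ngc download first." >&2
    return 1
  fi
  (
    # Subshell so the caller's working directory is left unchanged.
    cd "$quickstart_dir" &&
    bash riva_init.sh &&   # downloads and prepares Docker images and models (slow)
    bash riva_start.sh     # starts the Riva services
  )
}
```

Running start_riva from the directory where the ngc download was executed stops at the first script that fails, so a disk space problem during initialization does not go unnoticed.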
Start a Jupyter Notebook instance and access it from your local machine.
From inside the client container, try the different services using the provided Jupyter notebooks by running this command:
jupyter notebook --ip=0.0.0.0 --allow-root --notebook-dir=/work/notebooks
In order to access the Jupyter Notebook on your machine…you will need to create an SSH tunnel to the AMI. This sounds more daunting than it is.
The PuTTY utility makes tunneling easy to set up. Add the desired ports under Connection > SSH > Tunnels, then click the Open button to connect to the server via SSH and tunnel those ports.
Navigate to http://localhost:8000 (or whatever port you chose) in a web browser on your local machine to connect to the Jupyter Notebook running on the AMI server.
To continue past this point, copy and paste the token from the PuTTY session where you launched the notebook instance.
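If you prefer a plain OpenSSH client over PuTTY, the same tunnel is a single command. A minimal sketch that only prints the command so you can inspect it first. It assumes the login user is ubuntu and that Jupyter listens on its default port 8888 on the AMI; the key file name, host name and the function name tunnel_command are placeholders of my own.

```shell
# tunnel_command KEY_FILE HOST [LOCAL_PORT] [REMOTE_PORT]
# Prints an ssh command that forwards LOCAL_PORT on your machine to REMOTE_PORT on the AMI.
tunnel_command() {
  key_file="$1"
  host="$2"
  local_port="${3:-8000}"    # the port you open in your browser
  remote_port="${4:-8888}"   # assumed Jupyter port on the AMI
  # -N opens the tunnel without starting a remote shell; "ubuntu" is an assumed user name.
  printf 'ssh -i %s -N -L %s:localhost:%s ubuntu@%s\n' \
    "$key_file" "$local_port" "$remote_port" "$host"
}

# Placeholder key file and host name.
tunnel_command ~/riva-key.pem ec2-host.example.com
```

If the token has scrolled out of view, running jupyter notebook list in the session that started the notebook prints the server URL together with its token.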
This view should be familiar to you, and opening the folders takes you into a good presentation of how you might go about interacting with the Riva services.
The services available now via Riva are:
- Speech recognition trained on thousands of hours of speech data with stream or batch mode.
- Speech synthesis available in batch and streaming mode.
- NLU APIs with a host of services.
The advent of Riva will surely be a jolt to the current marketplace, especially for embedded conversational AI solutions. The freedom of installation and the open architecture will stand NVIDIA in good stead. As noted, production architecture and deployment will demand careful consideration.