Getting Started With NVIDIA Jarvis For Conversational AI Services


NVIDIA Jarvis Is An Application Framework For Conversational AI


I was privileged to be selected for early access to Jarvis 1.0 Beta.

Sequence of events we will follow to get to a working prototype.

And today NVIDIA released Jarvis, which is described as an application framework for Multimodal Conversational AI.

The focus is on low latency, less than 300 milliseconds, and high performance demands.

It is a high-performance conversational AI solution incorporating speech and visual cues, often referred to as face-speed. Face-speed includes gaze detection, lip activity, etc.

The multimodal aspect of Jarvis is best understood in the context of where NVIDIA wants to take Jarvis in terms of functionality.

This includes:

  • ASR (Automatic Speech Recognition)
  • STT (Speech To Text)
  • NLU (Natural Language Understanding)
  • Gesture Recognition
  • Lip Activity Detection
  • Object Detection
  • Gaze Detection
  • Sentiment Detection

Again, what is exciting about this collection of functionality is that Jarvis is poised to become a true Conversational Agent.

The NVIDIA Jarvis Demo Weather Bot

Day to day, as humans, we communicate not only by voice but also by reading the gaze of the speaker, lip activity, etc.

Another key focus area of Jarvis is transfer learning.

There is significant cost saving when it comes to taking the advanced base models of Jarvis and repurposing them for specific uses. The functionality which is currently available in Jarvis 1.0 Beta includes:

  • ASR,
  • STT and
  • NLU.

Positives & Considerations

The positives are overwhelming…

  • Implementations can be cloud, or local/edge.
  • Jarvis speaks to mission critical, industrial strength cognitive services & Conversational AI.
  • A new framework for high-performance ASR, STT and NLU.
  • Developers have access to transfer learning and can leverage the investment made by NVIDIA.
  • The NVIDIA GPU environment addresses mission critical requirements, where latency can be negated.
  • A clear roadmap for Jarvis, with imminent features planned for the near future.
  • Jarvis addresses requirements for ambient ubiquitous interfaces.


The considerations:

  • Access, development and deployment seem daunting and the framework appears complicated. In this article I want to debunk access apprehensions. However, production deployment will most certainly be complex.
  • Most probably, for a production environment, specific hardware considerations will be paramount, especially where cloud/connectivity latency cannot be tolerated.

As per NVIDIA:

Developers at enterprises can easily fine-tune state-of-art-models on their data to achieve a deeper understanding of their specific context and optimize for inference to offer end-to-end real-time services that run in less than 300 milliseconds (ms) and delivers 7x higher throughput on GPUs compared with CPUs.

The Jarvis framework includes pre-trained conversational AI models, tools in the NVIDIA AI Toolkit, and optimized end-to-end services for speech, vision, and natural language understanding (NLU) tasks.

Getting Started

As mentioned, NVIDIA Jarvis is well suited for cloud or edge computing.

Edge computing is computing on localized servers and devices to facilitate speed and negate latency. Instead of relying entirely on cloud computing providers, edge computing processes data locally first.

It is easy to be overwhelmed when getting started with something like Jarvis.

This article is not a tutorial, but rather a guide on how to:

  • Start as small and simple as possible.
  • Become familiar with the environment, to some extent at least.
  • And spiral your prototype outwards with measured iterations from this initial prototype with increased functionality and complexity.

Graphics Processing Unit

A requirement for experimenting and building with Jarvis is access to a GPU. And specifically in the case of Jarvis, an NVIDIA GPU based on the Turing or Volta architecture.

This is the one big impediment to experimentation…in this story I am looking at one of the cost effective options you can make use of to access Jarvis and start building amazing Conversational AI experiences.

The following are requirements for a successful Jarvis install:

  • Access to an NVIDIA GPU based on the Turing or Volta architecture.
  • An NVIDIA GPU Cloud (NGC) account that you are logged in to.
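
Once you have a machine in front of you, one way to sanity-check the GPU requirement is to look at the CUDA compute capability the card reports. The sketch below is my own helper, not part of Jarvis: it assumes discrete Volta GPUs report 7.0 and Turing GPUs report 7.5, and that your driver is recent enough for nvidia-smi to support the compute_cap query field.

```shell
# Return success if a CUDA compute capability string belongs to
# Volta (7.0) or Turing (7.5); everything else is rejected.
is_volta_or_turing() {
    case "$1" in
        7.0|7.5) return 0 ;;
        *)       return 1 ;;
    esac
}

# Example with a literal value. On the instance you would read it from
# the driver instead (assumes the compute_cap query field is supported):
#   cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)
cap="7.5"
if is_volta_or_turing "$cap"; then
    echo "compute capability $cap: Volta/Turing GPU found"
else
    echo "compute capability $cap: not a Volta/Turing GPU"
fi
```

If the query field is not available on your driver, running nvidia-smi without arguments and reading the GPU name works just as well.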

Hardware Requirements

Let’s solve the GPU access problem first. I opted to make use of an NVIDIA Deep Learning AMI (Amazon Machine Image). This is available on AWS EC2 and can be created in a few minutes.

Choosing the NVIDIA Deep Learning AMI

Once the EC2 (Elastic Compute Cloud) instance is running, be sure to stop it when not in use. The charge is per hour while the instance is running, and for prototyping and experimenting there is no need to run it 24/7.
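
To get a feel for what stopping the instance saves, a rough back-of-the-envelope calculation helps. The sketch below simply multiplies an hourly rate by the hours used; the 0.50 USD/hour figure is a made-up placeholder, not a real AWS price, so substitute the rate for your instance type and region.

```shell
# Estimate a bill: hourly rate x hours per day x days used.
estimate_cost() {
    LC_ALL=C awk -v rate="$1" -v hours="$2" -v days="$3" \
        'BEGIN { printf "%.2f\n", rate * hours * days }'
}

# Hypothetical rate of 0.50 USD/hour, for illustration only.
echo "4 h/day for 20 days: $(estimate_cost 0.50 4 20) USD"
echo "24/7 for 30 days:    $(estimate_cost 0.50 24 30) USD"
```

With these placeholder numbers the part-time pattern comes to 40.00 USD against 360.00 USD for running around the clock. If you have the AWS CLI configured, aws ec2 stop-instances with your own instance ID stops the instance from a terminal instead of the portal.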

It is also worthwhile to compare the cost of different regions; I found the cost differs significantly from one region to another.

Adding Disk Space to the EC2 Instance.

As your installation runs (which we will get to later), you will find the standard 32 GB storage does not suffice. I increased it to 256 GB. Storage can easily be increased via the EC2 portal on AWS; as seen in the image above.
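
Before kicking off a long install, it is worth checking how much space is actually free. The following is a small sketch using df; the 100 GB threshold is an illustrative number of my own, not an official Jarvis requirement.

```shell
# Print the free space, in whole GB, on the filesystem holding a path.
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeed only if at least min_gb are free at the given path.
has_free_gb() {
    [ "$(free_gb "$1")" -ge "$2" ]
}

# 100 GB is an illustrative threshold, not an official figure.
if has_free_gb / 100; then
    echo "enough free space on /"
else
    echo "only $(free_gb /) GB free on /; consider growing the volume"
fi
```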

Accessing Your Hardware

Once your EC2 Ubuntu instance is up and running, you obviously need to connect to it. The easiest way is via PuTTY. Install PuTTY on your machine…

PuTTY launch screen.

When creating the EC2 instance, you are presented with an option to download a PEM key. Download this key file and save it somewhere on your machine.

You will need it to create a private key making use of the PuTTY key generator.

Once the private key is generated (*.ppk), you need to click within PuTTY on SSH and Auth, and select this file.

This is an effective and lightweight way to connect to your Ubuntu machine. At this stage your AWS machine is up and running and you have access to it via the command line.
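
If you are on macOS or Linux, or simply prefer OpenSSH over PuTTY, the PEM file can be used directly without converting it to a .ppk. The sketch below is an alternative to the PuTTY route, not what I used above. The key file name and host name are hypothetical, and "ubuntu" is the usual default user on Ubuntu-based AMIs, so check what your AMI expects.

```shell
# Restrict the key file so only the owner can read it; OpenSSH refuses
# private keys that other users can read.
prepare_key() {
    chmod 400 "$1"
}

# Print (not run) the ssh command for a key, user, and host.
ssh_command() {
    printf 'ssh -i %s %s@%s\n' "$1" "$2" "$3"
}

# Hypothetical key file and host name, for illustration only.
# Run prepare_key my-key.pem once after downloading the real key.
ssh_command my-key.pem ubuntu ec2-host.example.com
```

The function only prints the command so you can inspect it first; paste the printed line into your terminal to actually connect.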

PuTTY Startup screen after login.

Next, let’s take a look at the software requirements…

Software Requirements

Now that we have our hardware up and running with access, we need to start installing Jarvis, and launch some of the test and demo applications.

Your starting point is to access the NVIDIA NGC website.

NVIDIA GPU Cloud (NGC) is a GPU-accelerated cloud platform optimized for deep learning and scientific computing.

Create a user profile on NGC; this is free and quick to do. After you have created your profile, be sure to check your email for the confirmation email and click confirm.

Generate an API Key within NVIDIA NGC and save it.

Be sure to save the API key as seen in the image above; we will use this to authenticate on our AMI.

Obviously this process can be performed on your local machine, since at this stage we do not have any GUI or desktop access to our virtual machine.

Installing NGC On The AMI

Back on your virtual machine on AWS, from the PuTTY application command line, you can execute all the installs and actions.

Before installing Jarvis, you need to install the NGC command line tool (NGC CLI) with these commands:

wget -O && unzip -o && chmod u+x ngc
md5sum -c ngc.md5
echo "export PATH=\"\$PATH:$(pwd)\"" >> ~/.bash_profile && source ~/.bash_profile
ngc config set

During this installation process, you will be prompted for the API key you generated on NGC.

The NGC CLI installation procedure for Ubuntu.
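
After the install, a quick check that the tools are really on your PATH saves confusion later. This is a minimal sketch of my own: require_cmd is a hypothetical helper name, and the two tools checked are the ngc binary from the step above and docker, which the Jarvis Quick Start scripts rely on.

```shell
# Succeed if a command is found on the current PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# ngc comes from the install above; docker is used by the Quick Start scripts.
for tool in ngc docker; do
    if require_cmd "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing from PATH"
    fi
done
```

If ngc shows as missing right after installing, open a new login shell or source ~/.bash_profile again so the PATH change takes effect.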

Deploying Jarvis On The AMI

Deploying Jarvis takes a while, and it is here that the installation might halt due to disk space requirements.

The option shown below is the Quick Start scripts approach to set up a local workstation and deploy the Jarvis services using Docker.

Download script.

ngc registry resource download-version nvidia/jarvis/jarvis_quickstart:1.0.0-b.1

Initialize and start Jarvis. The initialization step downloads and prepares Docker images and models. This step takes quite a while.

cd jarvis_quickstart_v1.0.0-b.1

From inside the client container, start a Jupyter Notebook instance so you can try the different services using the provided notebooks, and access it from your local machine.

jupyter notebook --ip= --allow-root --notebook-dir=/work/notebooks

Copy the token from the command output to use in the browser.
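
The token appears inside a URL that Jupyter prints when it starts, and it is easy to copy a character too many or too few. As a small convenience, the sketch below pulls the token out of such a URL with sed. The URL and token shown are made up, and extract_token is a hypothetical helper of mine, not part of Jupyter.

```shell
# Pull the token value out of a Jupyter URL such as
# http://127.0.0.1:8888/?token=abc123
extract_token() {
    printf '%s\n' "$1" | sed -n 's/.*[?&]token=\([^&[:space:]]*\).*/\1/p'
}

# Made-up URL and token, for illustration only.
extract_token 'http://127.0.0.1:8888/?token=abc123'
```

If the startup output has scrolled away, running jupyter notebook list in the same environment should show the running server's URL again, token included.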

SSH Tunneling

In order to access the Jupyter Notebook on your machine…you will need to create an SSH tunnel to the AMI. This sounds more daunting than it is.

The PuTTY option on the left to set up an SSH tunnel between your workstation and the AMI.

The PuTTY utility makes tunneling easy to set up. Click the Open button to connect to the server via SSH and tunnel the desired ports.

Navigate to http://localhost:8000 (or whatever port you chose) in a web browser on your local machine to connect to Jupyter Notebook running on the AMI server.
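
For readers using OpenSSH instead of PuTTY, the same tunnel can be expressed as a single ssh command with local port forwarding. The sketch below only prints that command. The key file and host name are hypothetical, 8000 matches the local port used above, and 8888 is Jupyter's default port, which I am assuming is what the notebook server listens on.

```shell
# Print (not run) an OpenSSH local port-forward command.
#   -N  do not run a remote command, just forward ports
#   -L  local_port:localhost:remote_port
tunnel_command() {
    printf 'ssh -i %s -N -L %s:localhost:%s %s@%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Hypothetical key and host; adjust the ports to match your setup.
tunnel_command my-key.pem 8000 8888 ubuntu ec2-host.example.com
```

While the printed command is running in a terminal, http://localhost:8000 on your workstation is forwarded to the notebook server on the AMI.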

To continue past this point, copy and paste the token from the PuTTY session in which you launched the notebook instance.

To continue, enter the token which was presented from the last command sent.

This view should be familiar to you, and opening the folders takes you into a good presentation of how you might go about interacting with the Jarvis services.

Demo applications which can be run within Jupyter Notebook.


The services available now via Jarvis are:

  • Speech recognition trained on 7000 hours of speech data, in streaming or batch mode.
  • Speech synthesis available in batch and streaming mode.
  • NLU APIs with a host of services.

The advent of Jarvis will surely be a jolt to the current marketplace, especially with embedded conversational AI solutions. The freedom of installation and the open architecture will stand NVIDIA in good stead. As noted, production architecture and deployment will demand careful consideration.