A First Look At The NVIDIA Jarvis Demo Applications
And What Is Available In Jupyter Notebooks
This story is a brief overview of what is available in the NVIDIA Jarvis demo applications and what can be learned from them.
I was privileged to be selected for early access to Jarvis 1.0 Beta.
And today NVIDIA released Jarvis, which is described as an application framework for Multimodal Conversational AI.
The focus is on low latency (under 300 milliseconds) and high performance.
The multimodal aspect of Jarvis is best understood in the context of where NVIDIA wants to take Jarvis in terms of functionality.
- ASR (Automatic Speech Recognition)
- TTS (Text To Speech)
- NLU (Natural Language Understanding)
- Gesture Recognition
- Lip Activity Detection
- Object Detection
- Gaze Detection
- Sentiment Detection
What is exciting about this collection of functionality is that Jarvis is poised to become a true Conversational Agent. As humans we communicate not only with voice, but also through the gaze of the speaker, lip activity etc.
Another key focus area of Jarvis is transfer learning. There are significant cost savings in taking the advanced base models of Jarvis and repurposing them for specific uses.
The functionality which is currently available in Jarvis 1.0 Beta includes ASR, TTS and NLU.
Virtual Voice Assistant
This Virtual Assistant sample application demonstrates how to use Jarvis AI Services, specifically ASR, NLP, and TTS, to build a simple but complete conversational AI application.
It demonstrates receiving input via speech from the user, interpreting the query via intent recognition and slot filling approach, compiling a response, and speaking this back to the user in a natural voice.
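The loop described above (speech in, intent and slots, response, speech out) can be sketched as follows. Every function here is a hypothetical placeholder standing in for a Jarvis service call; this is not the actual Jarvis client API.

```python
# Illustrative sketch of the voicebot turn described above.
# All functions are hypothetical placeholders for Jarvis ASR/NLU/TTS
# calls and the weather fulfillment logic.

def recognize_speech(audio: bytes) -> str:
    # Placeholder for a Jarvis ASR request.
    return "what is the weather in tokyo"

def interpret(text: str) -> dict:
    # Placeholder for Jarvis intent recognition and slot filling.
    return {"intent": "weather", "slots": {"location": "tokyo"}}

def fulfill(intent: dict) -> str:
    # Placeholder for the weather lookup and response compilation.
    location = intent["slots"]["location"].title()
    return f"The weather in {location} is sunny."

def speak(text: str) -> bytes:
    # Placeholder for a Jarvis TTS request.
    return text.encode("utf-8")

def handle_turn(audio: bytes) -> bytes:
    text = recognize_speech(audio)
    intent = interpret(text)
    response = fulfill(intent)
    return speak(response)
```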
Read more about the installation process here.
To install and run the Jarvis Voicebot demo, start your Jarvis services:
Download the samples image from NGC.
docker pull nvcr.io/nvidia/jarvis/jarvis-speech-client:1.0.0-b.1-samples
Run the service within a Docker container.
docker run -it --rm -p 8009:8009 nvcr.io/nvidia/jarvis/jarvis-speech-client:1.0.0-b.1-samples /bin/bash
Within this directory, update config.py with the right Jarvis IP, hosting port and your weatherstack API access key (from https://weatherstack.com/). Then, start the server.
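As a rough illustration, the relevant entries in config.py might look like the following; the key names here are assumptions, so check the shipped file for the exact fields.

```python
# Hypothetical sketch of the config.py entries mentioned above;
# the actual key names in the sample app may differ.
client_config = {
    "JARVIS_SPEECH_API_URL": "<jarvis-server-ip>:50051",   # Jarvis services endpoint
    "WEATHERSTACK_ACCESS_KEY": "<your-weatherstack-key>",  # from weatherstack.com
    "PORT": 8009,                                          # demo web server port
}
```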
Getting your weatherstack API, on the free tier…
Getting your API Access Key…
Start the service…
Below you can see the NVIDIA Jarvis weather bot accessible on the URL https://127.0.0.1:8009/jarvisWeather/. Again, you will have to set up SSH tunneling from your virtual machine. Read about that here.
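If the demo is running on a remote virtual machine, the tunnel can be opened along these lines; user and vm-ip are placeholders for your own credentials.

```shell
# Forward local port 8009 to the demo running on the remote VM;
# replace user@vm-ip with your own login and host.
ssh -L 8009:localhost:8009 user@vm-ip
```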
To take a closer look at example code for ASR, TTS and NLU take a look at the Jupyter Notebook examples…
Jupyter Notebook Examples
The functionality on display via Jupyter notebook:
- Offline ASR Example
- Core NLP Service Examples
- TTS Service Example
- Jarvis NLP Service Examples
In short, here are a few extracts from the example applications.
The commands to run to start the Jarvis services and the Jupyter Notebook:
bash jarvis_start.sh
bash jarvis_start_client.sh
jupyter notebook --ip=0.0.0.0 --allow-root --notebook-dir=/work/notebooks
Add Punctuation To Text
Adding punctuation is a very useful feature, especially in the following use cases:
- When speech from a user interacting with a voicebot is transcribed and presented on a display.
- When archiving user input speech in text format.
- When generated text must be punctuated before further processing or display.
Input:
add punctuation to this sentence here please ok
do you have any red nvidia shirts
i need one cpu four gpus and lots of memory
for my new computer it's going to be very cool
Output:
Add punctuation to this sentence here, please, Ok?
Do you have any red Nvidia shirts?
I need one cpu, four gpus and lots of memory for my new computer. It's going to be very cool.
Named Entity Recognition
In NLP, a named entity is a real-world object, such as a person, place, company or product.
These named entities can be abstract or have a physical existence. Below are examples of named entities being detected by Jarvis NLU.
Jensen Huang is the CEO of NVIDIA Corporation, located in Santa Clara, California.
jensen huang (PER)
nvidia corporation (ORG)
santa clara (LOC)
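The tagged output above is easy to consume downstream. The sketch below is plain Python over the printed "entity (TYPE)" display format, not the Jarvis client API.

```python
import re

# Parse lines of the form "entity text (TYPE)" as printed above
# into (entity, label) pairs. The format is just the demo's display
# output, not a Jarvis API structure.
ENTITY_LINE = re.compile(r"^(?P<text>.+?)\s+\((?P<label>[A-Z]+)\)$")

def parse_entities(lines):
    entities = []
    for line in lines:
        match = ENTITY_LINE.match(line.strip())
        if match:
            entities.append((match.group("text"), match.group("label")))
    return entities

demo_output = [
    "jensen huang (PER)",
    "nvidia corporation (ORG)",
    "santa clara (LOC)",
]
print(parse_entities(demo_output))
# [('jensen huang', 'PER'), ('nvidia corporation', 'ORG'), ('santa clara', 'LOC')]
```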
Jarvis NLP has a default text classification feature.
Intents & Entities With Input Domain
This is a good example of how an NLU API can be implemented to extract intents and entities, or as Jarvis refers to them, slots. The input domain is defined upfront.
Intents & Entities
Here the domain is not provided; the intent and slot are shown with the score.
Is it going to rain tomorrow?
Weather Intent Batch Queries
What I particularly like about this function is that the top-level weather intent is combined with sub-intents such as cloudy, rainfall or humidity.
This can also be useful for real-time disambiguation in conversations.
"Is it currently cloudy in Tokyo?",
"What is the annual rainfall in Pune?",
"What is the humidity going to be tomorrow?"
[weather.cloudy] Is it currently cloudy in Tokyo?
[weather.rainfall] What is the annual rainfall in Pune?
[weather.humidity] What is the humidity going to be tomorrow?
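Splitting a returned label into its domain and sub-intent is then trivial. Again, this is a plain-Python sketch over the labels shown above, not the Jarvis client API.

```python
# Split "weather.cloudy"-style labels into (domain, sub_intent),
# mirroring the batch results shown above.
def split_intent(label: str):
    domain, _, sub_intent = label.partition(".")
    return domain, sub_intent

results = {
    "Is it currently cloudy in Tokyo?": "weather.cloudy",
    "What is the annual rainfall in Pune?": "weather.rainfall",
    "What is the humidity going to be tomorrow?": "weather.humidity",
}
for query, label in results.items():
    domain, sub = split_intent(label)
    print(f"{domain} / {sub}: {query}")
```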
The NVIDIA Jarvis team made sure the documentation is thorough and comprehensive. The demo and quick start applications are of great help in getting started, especially in an environment which is complex and can be very tricky to set up and prototype in.
The services available now via Jarvis are:
- Speech recognition trained on 7000 hours of speech data with stream or batch mode.
- Speech synthesis available in batch and streaming mode.
- NLU APIs with a host of services.
The advent of Jarvis will surely be a jolt to the current marketplace, especially with embedded conversational AI solutions. The freedom of installation and the open architecture will stand NVIDIA in good stead. Deployment and production architecture will demand careful consideration.