Whisperer is a Unity VR game experience using the Voice SDK for interactions. This repository includes the complete buildable Unity project.
- Unity 2021.3.9f1
- Windows or Mac
- Meta Quest 2 (standalone) or Rift (PCVR)
Ensure you have Git LFS installed:
git lfs install
Then, clone this repo using the "Code" button above, or with:
git clone [email protected]:wit-ai/voicesdk_samples_whisperer.git
All of the project files can be found in Assets/Whisperer. This folder includes all scripts and assets to run the experience, excluding those that are part of the Interaction SDK. The project includes v50 of the Voice SDK.
To run Whisperer in-editor, first configure Wit.ai (see below), then open the project in Unity 2021.3.9f1. Open the Assets/Scenes/Loader scene and press Play.
Using Whisperer requires a Wit.ai account.
- Once logged in, on wit.ai/apps, click New App and import the zipped app backup included in this repo.
- Then find the Server Access and Client Access Tokens in your app setup under Management > Settings. Enter these values in the appropriate fields on the Wit.ai App Config asset in the Unity project.
For more information on setting up an App, check out the Wit.ai Quickstart.
Note: Wit.ai will need to train its model before it's ready to use. On Wit.ai, the current status of the training is indicated by the dot next to the app name.
Whisperer's introduction uses narrative instruction and visual prompts to show you how to interact with objects using your hands and voice.
- When you raise your hands in front of you, as if to speak through them, the microphone is activated automatically and the Voice SDK listens to you. You can then speak to various objects, telling them to move, open, turn on, etc.
- The inset menu button on the left controller (☰) opens the in-game panel, which displays an instruction card and demonstration video, as well as buttons to restart the current level or return to the starting scene.
- To move the camera around, use your mouse.
- To select an object, look towards it and hold down the space key.
- To jump from one level to another, use the 1, 2, 3, and 4 keys on the keyboard.
The Loader scene contains two game objects that persist throughout the entire experience: Player Rig and Management.
The Player Rig is the XR Origin, and contains the necessary components for Unity's XR Interaction Toolkit, as well as the SpeakGestureWatcher.cs component and any UI canvases.
Attached to the Management game object are AppVoiceExperience.cs and LevelLoader.cs. LevelLoader additively loads the necessary Unity scenes for each level, unloading them when a level is completed.
Each level consists of two scenes additively loaded by the LevelLoader: a base scene containing all static geometry and non-interactable objects, and a level scene containing all scene logic, animated objects, and listenable objects for that particular level.
Every level contains a Level Manager prefab and a Listenables prefab. The Level Manager is responsible for that scene's logic and instantiates a VoiceUI prefab for any objects derived from Listenable.cs at Start().
AppVoiceExperience is the core component of the Voice SDK. It holds the reference to the Wit.ai App Config asset, sends data to Wit.ai for processing, and responds with the appropriate Unity Events. When an object derived from Listenable.cs is selected and deselected by the player, it subscribes and unsubscribes to the events on AppVoiceExperience.
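The select and deselect subscription pattern described above can be sketched in plain C#. This is a minimal illustration rather than code from the project: SketchVoiceService and SketchListenable are hypothetical names, and an ordinary C# event stands in for the Unity Events exposed by AppVoiceExperience.

```csharp
using System;

// Hypothetical stand-in for the voice service; NOT the Voice SDK's AppVoiceExperience.
public class SketchVoiceService
{
    // Raised when a response arrives (a plain C# event replaces the Unity Event here).
    public event Action<string> OnResponse;

    public void RaiseResponse(string text) => OnResponse?.Invoke(text);
}

// Hypothetical stand-in for an object derived from Listenable.cs.
public class SketchListenable
{
    private readonly SketchVoiceService _voice;

    public int ResponsesHandled { get; private set; }

    public SketchListenable(SketchVoiceService voice) { _voice = voice; }

    // Subscribe on selection. Removing first guards against double subscription.
    public void Select()
    {
        _voice.OnResponse -= Handle;
        _voice.OnResponse += Handle;
    }

    // Unsubscribe on deselection so unselected objects ignore responses.
    public void Deselect()
    {
        _voice.OnResponse -= Handle;
    }

    private void Handle(string text)
    {
        ResponsesHandled++;
    }
}
```

Only the currently selected object reacts to a response, which is why the subscription is tied to selection rather than set up once at startup.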
Whisperer uses several different methods of handling responses from Wit.ai. Depending on the type of interaction (action) we're trying to resolve, we use either intents, entities, or manual parsing of the text transcription.
To determine when to activate and deactivate Wit.ai, the SpeakGestureWatcher.cs component checks the position of the tracked hand controllers and raycasts for objects that contain the Listenable.cs class. If the player's hands are in position and an object is found, AppVoiceExperience.Activate() is called. If at any time the player breaks the pose, Wit.ai is deactivated.
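The activation rule above (hands in pose and a listenable object targeted) can be sketched without any engine dependencies. The following is an assumption-heavy illustration, not the logic in SpeakGestureWatcher.cs: the distance threshold, the class name SpeakPoseSketch, and the use of System.Numerics in place of Unity's vector types are all hypothetical.

```csharp
using System;
using System.Numerics;

// Hypothetical, engine-free sketch of a "speak pose" check.
// The threshold and names are illustrative assumptions, not project values.
public static class SpeakPoseSketch
{
    // Assumed maximum distance (in metres) between each hand and the head.
    public const float MaxHandToHeadDistance = 0.35f;

    // True when both hands are close enough to the head to count as raised to speak.
    public static bool HandsInPose(Vector3 head, Vector3 leftHand, Vector3 rightHand)
    {
        return Vector3.Distance(head, leftHand) <= MaxHandToHeadDistance
            && Vector3.Distance(head, rightHand) <= MaxHandToHeadDistance;
    }

    // Listening is active only while the pose is held AND a listenable object is targeted.
    // In the real project the second input would come from a raycast.
    public static bool ShouldListen(bool handsInPose, bool listenableTargeted)
    {
        return handsInPose && listenableTargeted;
    }
}
```

Evaluating ShouldListen every frame gives the described behaviour: listening starts when both conditions become true and stops as soon as the pose is broken.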
The AppVoiceExperience class itself is initiated in the LevelManager.cs parent class, from which all levels inherit.
If an object derived from Listenable.cs is selected and the player says something, Whisperer waits for a response from Wit.ai, then reads the WitResponseNode to determine the action to be taken.
Example: If a [ForceMovable.cs](Assets/Whisperer/Scripts/Voice/Listenable Objects/ForceMovable.cs) is selected and the utterance "Move right a lot" is detected by Wit.ai, we read the intent and entities from the WitResponseNode to determine the direction and strength of the force applied. Whisperer reads the returned intent (move), direction entity (right), and strength entity (strong) and performs the appropriate action.
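To make the "Move right a lot" example concrete, here is a rough sketch of pulling the intent and the two entities out of a Wit.ai-style JSON response. It uses System.Text.Json from plain .NET instead of the Voice SDK's WitResponseNode, and the sample payload is hand-written and trimmed. The entity keys direction:direction and move_strength:move_strength are assumptions about how the app's entities are named in the response.

```csharp
using System;
using System.Text.Json;

// Hypothetical sketch; the project itself reads a WitResponseNode instead.
public static class MoveCommandSketch
{
    // Hand-written, trimmed sample shaped like a Wit.ai response (assumed keys).
    public const string SampleResponse = @"{
      ""text"": ""Move right a lot"",
      ""intents"": [ { ""name"": ""move"", ""confidence"": 0.98 } ],
      ""entities"": {
        ""direction:direction"": [ { ""value"": ""right"" } ],
        ""move_strength:move_strength"": [ { ""value"": ""strong"" } ]
      }
    }";

    // Returns the top intent name plus the first direction and strength values.
    // Missing fields come back as empty strings.
    public static (string Intent, string Direction, string Strength) Parse(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;

        string intent = "";
        if (root.TryGetProperty("intents", out JsonElement intents)
            && intents.ValueKind == JsonValueKind.Array
            && intents.GetArrayLength() > 0
            && intents[0].TryGetProperty("name", out JsonElement name))
        {
            intent = name.GetString() ?? "";
        }

        return (intent,
                FirstEntityValue(root, "direction:direction"),
                FirstEntityValue(root, "move_strength:move_strength"));
    }

    // Reads entities[key][0].value, or "" when any part is absent.
    private static string FirstEntityValue(JsonElement root, string key)
    {
        if (root.TryGetProperty("entities", out JsonElement entities)
            && entities.ValueKind == JsonValueKind.Object
            && entities.TryGetProperty(key, out JsonElement matches)
            && matches.ValueKind == JsonValueKind.Array
            && matches.GetArrayLength() > 0
            && matches[0].TryGetProperty("value", out JsonElement value))
        {
            return value.GetString() ?? "";
        }
        return "";
    }
}
```

Returning empty strings for missing fields lets the caller fall back to defaults, for example a normal strength when the player gives only a direction.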
This codebase uses the Conduit framework from the Voice SDK.
To use Conduit, annotate the callback method with the MatchIntent attribute and annotate the assembly containing the callback method with the ConduitAssembly attribute. When callback methods are added, removed, or changed, Unity generates a new manifest file. Note that Use Conduit should be checked in your Wit config asset file as documented here.
For example, in HeroPlant.cs the ForceMove method takes two parameters: a ForceDirection enum and a WitResponseNode.
In this code ForceMove is decorated with the MatchIntent attribute, with possible intent values: move, jump, pull, and push. With the Conduit framework, these callback methods are registered automatically, with no need for manual registration.
[MatchIntent("move")]
[MatchIntent("jump")]
[MatchIntent("pull")]
[MatchIntent("push")]
public override void ForceMove(ForceDirection direction, WitResponseNode node)
{
// method implementation
}
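The idea of attribute-driven registration can be illustrated with a small reflection sketch. This is not how Conduit is implemented (the source describes a generated manifest), and every type here is hypothetical: SketchIntentAttribute stands in for MatchIntent, and SketchPlant for a class like HeroPlant.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical stand-in for an intent attribute; NOT the Voice SDK's MatchIntent.
[AttributeUsage(AttributeTargets.Method, AllowMultiple = true)]
public sealed class SketchIntentAttribute : Attribute
{
    public string Intent { get; }

    public SketchIntentAttribute(string intent) { Intent = intent; }
}

// Hypothetical callback owner with one method bound to two intents.
public class SketchPlant
{
    public string LastIntent = "";

    [SketchIntent("move")]
    [SketchIntent("push")]
    public void ForceMove(string intent) { LastIntent = intent; }
}

public static class SketchDispatcher
{
    // Scan the target's public instance methods and index them by intent name.
    public static Dictionary<string, MethodInfo> BuildTable(object target)
    {
        var table = new Dictionary<string, MethodInfo>();
        MethodInfo[] methods = target.GetType().GetMethods(BindingFlags.Instance | BindingFlags.Public);
        foreach (MethodInfo method in methods)
        {
            foreach (SketchIntentAttribute attr in method.GetCustomAttributes<SketchIntentAttribute>())
            {
                table[attr.Intent] = method;
            }
        }
        return table;
    }

    // Invoke the method registered for the intent; returns false for unknown intents.
    public static bool Dispatch(object target, Dictionary<string, MethodInfo> table, string intent)
    {
        if (!table.TryGetValue(intent, out MethodInfo method)) return false;
        method.Invoke(target, new object[] { intent });
        return true;
    }
}
```

Because the lookup table is built from the attributes, adding a new intent to a method requires no separate registration call, which is the convenience the MatchIntent approach provides.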
These intents are used to move objects in the scene. The move, pull, push, and jump intents can be used with strength and direction entities. For example, "Push away from me a little bit."
move
pull
push
jump
levitate
drop
The entity direction determines the direction an object is moved when accompanied by the move, pull, push, or jump intents.
right
left
up
toward (moves object toward the user)
away (moves object away from the user)
wall (moves object away from the center of the room)
across (moves object toward the center of the room)
The entity move_strength determines the strength of the force applied to an object when it's moved.
weak
normal
strong
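One way the direction and move_strength values could combine into a force vector is sketched below. This is an illustration under stated assumptions, not the project's ForceMovable logic: the multipliers are invented, right and left are mapped to the world X axis for simplicity, and System.Numerics replaces Unity's vector types.

```csharp
using System;
using System.Numerics;

// Hypothetical mapping from entity values to a force vector; all numbers are assumptions.
public static class ForceSketch
{
    // Assumed multipliers; unknown or missing strength falls back to normal.
    public static float StrengthMultiplier(string strength) => strength switch
    {
        "weak" => 0.5f,
        "strong" => 2.0f,
        _ => 1.0f,
    };

    // Unit direction for each entity value, following the descriptions in the list above.
    public static Vector3 Direction(string direction, Vector3 objectPos, Vector3 userPos, Vector3 roomCenter)
    {
        switch (direction)
        {
            case "right": return Vector3.UnitX;
            case "left": return -Vector3.UnitX;
            case "up": return Vector3.UnitY;
            case "toward": return SafeNormalize(userPos - objectPos);
            case "away": return SafeNormalize(objectPos - userPos);
            case "wall": return SafeNormalize(objectPos - roomCenter);
            case "across": return SafeNormalize(roomCenter - objectPos);
            default: return Vector3.Zero;
        }
    }

    // Direction scaled by strength.
    public static Vector3 Force(string direction, string strength, Vector3 objectPos, Vector3 userPos, Vector3 roomCenter)
    {
        return Direction(direction, objectPos, userPos, roomCenter) * StrengthMultiplier(strength);
    }

    // Avoids NaN when the two positions coincide.
    private static Vector3 SafeNormalize(Vector3 v)
    {
        return v.LengthSquared() > 0f ? Vector3.Normalize(v) : Vector3.Zero;
    }
}
```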
Generic intents for interacting with objects such as the radio, drawers, water hose, or treasure chest:
open
close
turn_off
turn_on
Intents for interacting with specific objects include:
turn_off_radio
turn_on_radio
change_station
ask_bird_name
bird_song
The entity color is used to identify when the user has communicated a color selection to the bird in level 3.
yellow
blue
red
The Oculus License applies to the SDK and supporting material. The MIT License applies to the files and assets in the Assets/Whisperer folder. Otherwise, if an individual file does not indicate which license it is subject to, then the Oculus License applies.