Home

Everything Jarvis

📓 Wiki updated on April 20, 2023

The demo videos were recorded in 2020 and are outdated, but they still give a basic idea of how Jarvis works.

Jarvis uses specific keywords to trigger the function that performs the corresponding task. It is not 100% foolproof, but most common errors are caught and responses are configured accordingly. If you run into an unexpected exception, please raise an issue.

Restrictions:

Jarvis is a heavy, processor-intensive package. To make sure the host machine doesn't suffer Jarvis' wrath, it runs in LIMITED mode if the machine is low on CPU cores. This disables all background processes. Although it may limit Jarvis' ability to communicate offline, Jarvis remains fully functional for voice commands (except setting alarms and reminders, as they require functional background processes).
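
As a rough illustration of the core-count check described above, here is a minimal sketch. The threshold of 4 cores and the function name are hypothetical; the wiki does not state how many cores count as "low" or how Jarvis implements the check.

```python
import os

# Hypothetical threshold: the wiki does not say how many cores count as "low".
MIN_CORES_FOR_FULL_MODE = 4


def should_run_limited(cpu_cores=None, min_cores=MIN_CORES_FOR_FULL_MODE):
    """Return True when the host has fewer cores than the assumed minimum."""
    if cpu_cores is None:
        # os.cpu_count() can return None when the count cannot be determined.
        cpu_cores = os.cpu_count() or 1
    return cpu_cores < min_cores


if __name__ == "__main__":
    print("LIMITED mode:", should_run_limited())
```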

Currently, Jarvis can run only on Linux, macOS and Windows.

Tested on macOS High Sierra, Mojave, Catalina, Big Sur, Monterey and Ventura, Windows 10, and Ubuntu 22.04 LTS.

Usage:

Jarvis works by automatically detecting the Operating System it is being run on.
Some key features require API keys, but they can be generated for free.

None of the features in Jarvis require paid subscriptions *
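
The operating system detection mentioned above can be approximated with Python's standard `platform` module. This is only a sketch under that assumption; the `detect_os` helper and the mapping are illustrative and not taken from the Jarvis source.

```python
import platform

# platform.system() reports "Darwin" on macOS; the friendly names are illustrative.
SUPPORTED = {"Linux": "Linux", "Darwin": "macOS", "Windows": "Windows"}


def detect_os(system_name=None):
    """Map the platform name to a supported OS label, or raise if unsupported."""
    name = platform.system() if system_name is None else system_name
    try:
        return SUPPORTED[name]
    except KeyError:
        raise RuntimeError(f"Unsupported operating system: {name!r}") from None


if __name__ == "__main__":
    print("Running on", detect_os())
```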

Stability

There are broad exception clauses implemented to prevent Jarvis from crashing.
To make sure Jarvis is always connected to the internet, it runs a connection checker in the background.
The connection checker uses a built-in OS-agnostic module to enable Wi-Fi and connect to a given SSID (stored as env vars).
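
A connectivity check of this kind could look roughly like the sketch below, which only tests whether a TCP connection can be opened. The default host and port (a public DNS server) are assumptions, and the Wi-Fi enabling and SSID connection steps are not shown because they depend on the module Jarvis uses.

```python
import socket


def is_connected(host="8.8.8.8", port=53, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


if __name__ == "__main__":
    print("Internet reachable:", is_connected())
```

A background task could call `is_connected()` every few seconds and trigger the reconnect logic when it returns False.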

Versioning

Jarvis' versioning follows the same pattern as Python's: major, minor and micro.

Major version change (e.g. 1.* to 2.*) means there are major changes (requires an immediate update).
Minor version change (e.g. 1.2 to 1.3) means the changes include considerable enhancements/feature updates.
Micro version change (e.g. 1.2 to 1.2.4) means the changes include very few enhancements/feature updates.
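
To make the three levels concrete, the following sketch classifies the difference between two version strings. The helper names are illustrative, and missing components are assumed to be 0 (so 1.2 is treated as 1.2.0).

```python
def parse_version(version):
    """Split 'major.minor.micro' into a 3-tuple, padding missing parts with 0."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def classify_change(old, new):
    """Return the most significant component that differs between two versions."""
    labels = ("major", "minor", "micro")
    for label, before, after in zip(labels, parse_version(old), parse_version(new)):
        if before != after:
            return label
    return "none"


if __name__ == "__main__":
    for old, new in (("1.2", "2.0"), ("1.2", "1.3"), ("1.2", "1.2.4")):
        print(old, "->", new, ":", classify_change(old, new))
```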

Offline Communication with Jarvis:

Using Telegram:
1. A Telegram bot has to be created using BotFather.
2. The token has to be added to the env var BOT_TOKEN.
3. A list of chat IDs has to be added to the env var BOT_CHAT_IDS.
4. A list of bot usernames has to be added to the env var BOT_USERS.
Using FastAPI:
1. Hosts the offline communicator on localhost.
2. Requires a port number added to the env var OFFLINE_PORT. Defaults to 4483.
3. Requires a password for authentication added to the env var OFFLINE_PASS. Defaults to OfflineComm.
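
The FastAPI defaults listed above could be read from the environment along these lines. This is a sketch only: the `load_offline_config` helper is hypothetical, and Jarvis may load and validate its settings differently.

```python
import os


def load_offline_config(env=None):
    """Read the offline communicator settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "port": int(env.get("OFFLINE_PORT", "4483")),
        "password": env.get("OFFLINE_PASS", "OfflineComm"),
    }


if __name__ == "__main__":
    config = load_offline_config()
    print("Offline communicator port:", config["port"])
```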

Conditional API Endpoints:

There are some conditional endpoints (/surveillance and /investment) in the Jarvis API.
The availability of these endpoints depends on the environment variables set up on the host machine.
Please refer to the Features section in the readme document, under the topic API Features.

OS Agnostic Voice Model

If LIMITED mode is disabled, Jarvis will automatically try to launch speech-synthesis in a docker container.

If this launch fails or the SPEECH_SYNTHESIS_TIMEOUT is set to 0, this part will be skipped.

To enable independent speech-synthesis:

docker run \
    -it \
    -p 5002:5002 \
    -e "HOME=${HOME}" \
    -v "$HOME:${HOME}" \
    -v /usr/share/ca-certificates:/usr/share/ca-certificates \
    -v /etc/ssl/certs:/etc/ssl/certs \
    -w "${PWD}" \
    --user "$(id -u):$(id -g)" \
    thevickypedia/speech-synthesis

To test speech synthesis running locally:

curl -X POST \
 -H "Content-Type: text/plain" \
 -d 'Welcome to the world of speech synthesis' \
 'http://localhost:5002/api/tts?voice=en-us_northern_english_male-glow_tts&vocoder=medium' \
 --output temp.wav && open temp.wav
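
The same request can be made from Python using only the standard library. The sketch below mirrors the curl command above (same URL, query parameters and header); the function names are illustrative, and `synthesize` assumes the container is already listening on port 5002.

```python
import urllib.parse
import urllib.request


def build_tts_request(text, host="localhost", port=5002,
                      voice="en-us_northern_english_male-glow_tts",
                      vocoder="medium"):
    """Build the POST request that the curl command above sends."""
    query = urllib.parse.urlencode({"voice": voice, "vocoder": vocoder})
    url = f"http://{host}:{port}/api/tts?{query}"
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def synthesize(text, output_path="temp.wav", timeout=30):
    """Send the text to the local speech synthesis API and save the audio."""
    with urllib.request.urlopen(build_tts_request(text), timeout=timeout) as response:
        audio = response.read()
    with open(output_path, "wb") as file:
        file.write(audio)
    return output_path


if __name__ == "__main__":
    print("Saved to", synthesize("Welcome to the world of speech synthesis"))
```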

💡 Speech synthesis can run in a docker container for better voices, but responses might be negligibly slower. If you don't have docker installed or simply don't want to use it, set the SPEECH_SYNTHESIS_TIMEOUT env var to 0. This is also done automatically if the docker container fails to launch on startup.

Features:

Lock, unlock, honk & blink, remote start/stop and set the AC temperature in any JLR (Jaguar Land Rover) vehicle.
Open and close a garage door using a MyQ garage controller.
Control any LG webOS or Roku television.
Guard the surroundings using face detection and audio recognition. Sends an email and SMS alert in case of an intruder.
Send a text message to most US-based mobile numbers.
Email any recipient using the contact name and email address mapping stored in the contacts.yaml file.
Facial recognition and detection. See the Facial Recognition section for the setup instructions.
Automatically turn on Wi-Fi and connect to a set SSID. This also happens automatically within 10 seconds of an internet disconnect.
Scale a VPN server up or down (in any region) on demand using the vpn-server module.
Monitor stocks using the stock monitor endpoint.
Run cron scheduled jobs using regular cron expressions without the need of a crontab.
Run certain tasks in the background every few seconds/minutes/hours/days.
Set up a reminder for a given time and send a message to your phone and/or email at that time.
- Reminders can also be set up for a different person if their contact details are present in the contacts.yaml file.
Set an alarm/timer at any desired time.
Wish you on events, festivals and your birthday using an env var.
Increase or decrease master volume of your machine via voice commands.
Locate, ring and enable lost mode on any of the user's Apple devices.
Control smart lights (that use the MagicHome application) in the same IP range.
Get meeting information using an ICS parser from a shared calendar with an ICS URL.
Read Outlook/Calendar and inform about meetings in the next 12 hours. macOS only
Tell the public and private IP addresses of the running device.
Swap voices on demand.
Mute on demand.
Display realtime microphone usage on a graph.
Adjust Screen Brightness.
Tell system vitals including boot time, fan speed*, CPU and GPU temperature*. macOS only
Restart the running device and suggest a restart in case of high boot time.
Scan the local IP range for connected smart devices, which acts as an IP feeder for smart lights and TVs.
Tell random facts.
Heads or Tails.
Take notes and save them to a notes.txt file.
Tell a joke.
Tell the number of repositories on your GitHub and clone a particular one.
Tell the list of Google Home devices in your IP range.
Get the distance between two places, or from your location to a particular place.
Create or remove tasks for todo lists.
Scan your Music folder for .mp3 files and play them on any smart device in the IP range.
Look for unread emails in your Gmail account.
Get the weather information at any location.
Get news updates from Fox News.
Get your investment details using the Robinhood API.
Get facts from Wikipedia using the wiki API.
Open any application installed on the running machine.
Get the current date and time.
Open a Google search for any query.
Shut down/restart the running device.
Respond to most basic conversations.

If anything apart from the above is requested, Jarvis uses Google's Places API to match the requested phrase and suggest options. If the requested phrase doesn't resolve on the Places API, Jarvis then uses ChatGPT to respond (if an API key is provided).

Telegram API

The Telegram bot integration is implemented in two ways.

Telegram Webhooks
Telegram API polling

Refer to telegram-webhook for more details.

Investment Details:

Log in to your Robinhood web app
Go to Account -> Settings
Turn on Two-Factor Authentication
- Select “Authentication App”
- Click “Can’t Scan It?” and copy the 16-character QR code
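
For context on why the 16-character code matters: authenticator apps typically treat such a code as a base32 secret and derive time-based one-time passwords from it (RFC 6238). The sketch below assumes that is the case here and uses only the standard library; it does not show how Jarvis itself generates or submits the code.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32, for_time=None, digits=6, period=30):
    """Compute an RFC 6238 one-time password (HMAC-SHA1) from a base32 secret."""
    secret = secret_b32.replace(" ", "").upper()
    secret += "=" * (-len(secret) % 8)  # restore any stripped base32 padding
    key = base64.b32decode(secret)
    now = time.time() if for_time is None else for_time
    counter = int(now // period)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte selects a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    # Placeholder secret for illustration only, not a real account code.
    print(totp("JBSWY3DPEHPK3PXP"))
```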

Facial Recognition:

For this feature to work, a bit of setup is required.
- Create a parent directory named train, then create a subdirectory within it for each person, named after that person
- Drop the images directly into the subdirectory named after each person
- Even if the subdirectories are not added, the face_recognition script is written so that it can learn from unrecognized/new faces by storing them in named subdirectories within train
- The camera id is dynamic, so the camera is chosen automatically
Try to keep the images as light as possible (in both display size and file size)
- Reduced image size will not affect accuracy, as the facial recognition script converts each image into a pixel matrix
- Reduced image size and display area can actually help with faster scanning and recognition
Make good use of the tolerance level in the face recognition script
- The learning_rate can be adjusted to match your needs
- The learning rate generally depends on the clarity of the stored images and the lighting in which they were taken
- Try to avoid images with very low resolution to maintain an adequate learning rate
- The lower the learning rate, the higher the tolerance level (that is, the lower you go on the learning rate, the stricter, or perhaps more accurate, the matching)
- Layman's terms:
  - Increased learning_rate: Exact to No match
  - Decreased learning_rate: Incorrect to close match
  - 0.5 - 0.7 should suffice in most cases though
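
The train directory layout described above (one subdirectory per person, with images placed directly inside) can be walked as in this sketch. The helper name and the list of image extensions are assumptions; the actual face_recognition script may load images differently.

```python
from pathlib import Path

# Assumed set of image extensions; the wiki does not list supported formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def collect_training_images(train_dir="train"):
    """Map each person's name (the subdirectory name) to their image paths."""
    people = {}
    root = Path(train_dir)
    if not root.is_dir():
        return people
    for person_dir in sorted(path for path in root.iterdir() if path.is_dir()):
        people[person_dir.name] = sorted(
            file for file in person_dir.iterdir()
            if file.is_file() and file.suffix.lower() in IMAGE_SUFFIXES
        )
    return people


if __name__ == "__main__":
    for person, images in collect_training_images().items():
        print(f"{person}: {len(images)} image(s)")
```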
