- support new LLM APIs
- rewrite how services are called
- handle no API selected
- rewrite prompts + service block formats
- implement new LLM API that has `HassCallService` so old models can still work
- update dataset so new models will work with the API
- make ICL examples into conversation turns
- translate ICL examples + make better ones
- areas/room support
- convert requests to aiohttp
- detection/mitigation of too many entities being exposed & blowing out the context length
- figure out DPO to improve response quality
- set up GitHub Actions to build wheels that are optimized for Raspberry Pis
- Mixtral + prompting (no fine-tuning)
- add in-context learning variables to system prompt template
- add new options to setup process for setting prompt style + picking fine-tuned/ICL
- prime kv cache with current "state" so that requests are faster
- ChatML format (actually need to add special tokens)
- Vicuna dataset merge (yahma/alpaca-cleaned)
- Phi-2 fine tuning
- Quantize w/ llama.cpp
- Make custom component use llama.cpp + ChatML
- Continued synthetic dataset improvements (there are a bunch of TODOs in there)
- Licenses + Attributions
- Finish Readme/docs for initial release
- Function calling as JSON
- Fine tune Phi-1.5 version
- make llama-cpp-python wheels for "llama-cpp-python>=0.2.24"
- build a proper evaluation framework that tests function-calling accuracy, not just loss
- add more remote backends
  - LocalAI (OpenAI-compatible)
  - Ollama
- support chat completions API (might fix Ollama + adds support for text-generation-webui characters)
- more config options for prompt template (allow other than chatml)
- publish snapshot of dataset on HF
- use varied system prompts to add behaviors
- "context requests"
  - basically just let the model decide what RAG/extra context it wants
  - the model predicts special tokens as the first few tokens of its output
  - the requested content is added to the context after the request tokens and then generation continues
  - needs more complicated training b/c multi-turn + there will be some weird masking going on for training the responses properly
- integrate with llava for checking camera feeds in home assistant
  - can check still frames to describe what is there
  - for remote backends that support images, could also support this
  - depends on context requests because we don't want to feed camera feeds into the context every time
- RAG for getting info for setting up new devices
  - set up vector DB
  - ingest Home Assistant docs
  - "context request" from above to initiate a RAG search
- train the model to respond to house events
  - present the model with an event + a "prompt" from the user describing what it should do (e.g. "turn on the lights when I get home" = the model turns on the lights when your presence entity changes to home)
  - basically lets you write automations in plain English
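
A rough sketch of the "context requests" loop described above, to make the control flow concrete. Everything here is an assumption for illustration: the `<request:NAME>` token format, the `generate` callable, and the `providers` mapping are hypothetical and not part of the current integration. The model's output is checked for a request token at the start; if one is found, the requested content is appended after the request token and generation continues.

```python
import re
from typing import Callable, Dict

# Hypothetical request-token format; the real special tokens are still undecided.
REQUEST_PATTERN = re.compile(r"^\s*<request:(\w+)>")


def generate_with_context_requests(
    prompt: str,
    generate: Callable[[str], str],
    providers: Dict[str, Callable[[], str]],
    max_requests: int = 3,
) -> str:
    """Generate a response, resolving context requests the model emits first."""
    context = prompt
    output = ""
    for _ in range(max_requests + 1):
        output = generate(context)
        match = REQUEST_PATTERN.match(output)
        if match is None or match.group(1) not in providers:
            # Plain response (or an unknown request): hand it back unchanged.
            return output
        request_token = match.group(0).strip()
        extra = providers[match.group(1)]()
        # Append the request token, then the requested content, and continue.
        context = f"{context}{request_token}\n{extra}\n"
    # Request budget exhausted: return the last output as-is.
    return output
```

For example, `providers` could map `"camera"` to a function that describes a still frame and `"docs"` to a RAG search, so that neither is added to the context unless the model asks for it.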