The Official Python Client for Galileo.
Galileo is a tool for understanding and improving the quality of your NLP and CV data.
Galileo gives you access to all of the information you need, at a UI and API level, to continuously build better and more robust datasets and models.
dataquality is your entrypoint to Galileo. It helps you start and complete the loop of data quality improvements.
Install the package.
pip install dataquality
Create an account at Galileo
Grab your token
Get your dataset and analyze it with dq.auto
(You will be prompted for your token here)
import dataquality as dq
dq.auto(
    train_data="/path/to/train.csv",
    val_data="/path/to/val.csv",
    test_data="/path/to/test.csv",
    project_name="my_first_project",
    run_name="my_first_run",
)
☕️ Wait for Galileo to train your model and analyze the results.
✨ A link to your run will be provided automatically
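If you do not have CSV files on hand yet, the sketch below shows one way to produce a tiny one to pass as train_data. The column names `text` and `label` are an assumption made for illustration (run help(dq.auto) to confirm what your task expects), and `write_split` and `run_auto` are hypothetical helpers, not part of dataquality.

```python
import csv
import tempfile
from pathlib import Path

# Assumed schema for a text classification dataset: one text column, one label column.
ROWS = [
    {"text": "I loved this movie", "label": "positive"},
    {"text": "The plot made no sense", "label": "negative"},
]


def write_split(path, rows):
    """Write rows to a CSV file with a header row and return the path."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "label"])
        writer.writeheader()
        writer.writerows(rows)
    return path


def run_auto(train_path, val_path):
    """Hand the files to dq.auto. Requires dataquality and a Galileo token."""
    import dataquality as dq  # imported lazily so the CSV helper works without it

    dq.auto(
        train_data=str(train_path),
        val_data=str(val_path),
        project_name="my_first_project",
        run_name="my_first_run",
    )


# Write a throwaway training file into a temporary directory.
train_csv = write_split(Path(tempfile.mkdtemp()) / "train.csv", ROWS)
```

Calling `run_auto(train_csv, train_csv)` would then start a run; a real project would use separate, much larger files for each split.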
If you set your token in code, you won't be prompted to log in:
import dataquality as dq
dq.config.token = 'MY-TOKEN'
For long-lived flows like CI/CD, see our docs on environment variables
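To keep the token out of source control, one option is to read it from the environment and assign it to dq.config.token yourself. This is only a sketch: `MY_GALILEO_TOKEN` is a made-up variable name, and `load_token` and `configure_dataquality` are hypothetical helpers. The environment variables that Galileo itself recognizes are described in the docs and are not reproduced here.

```python
import os

# Hypothetical variable name chosen for this example; not a name defined by Galileo.
TOKEN_ENV_VAR = "MY_GALILEO_TOKEN"


def load_token(env_var=TOKEN_ENV_VAR):
    """Return the token stored in env_var, or fail loudly if it is missing or blank."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} before running this script")
    return token


def configure_dataquality():
    """Assign the token to dq.config.token, as in the snippet above."""
    import dataquality as dq  # imported lazily; requires `pip install dataquality`

    dq.config.token = load_token()
```

Failing early on a missing token avoids an interactive login prompt hanging a CI job.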
Currently, you can analyze Text Classification and Named Entity Recognition (NER) datasets.
If you want support for other task types, reach out!
The dq.auto params train_data, val_data, and test_data can also take pandas DataFrames and Hugging Face datasets as input!
Use the hf_data param to point to a dataset on the Hugging Face Hub:
import dataquality as dq
dq.auto(hf_data="rungalileo/emotion")
Run help(dq.auto) for more information on usage.
Check out our docs for the inspiration behind this methodology.
Yes! Check out our full documentation and example notebooks on how to integrate your own model with Galileo
We have an app for that! Currently text classification only, but reach out if you want a new modality!
This is currently in development, and not an official part of the Galileo product, but rather an open source tool for the community.
We've built a bulk-labeling tool (and hosted it on streamlit) to help you generate labels quickly using semantic embeddings and text search.
For more info on how it works and how to use it, check out the open source repo.
Yes! See our docs on dq.metrics to access things like overall metrics, your analyzed dataframe, and even your embeddings.
Read our contributing doc!