[DRAFT] Webhook example with record completed #5477

Draft
wants to merge 90 commits into
base: feat/add-webhooks-feature-branch
0bdd174
feat: add rq as dependency
jfcalvo Aug 1, 2024
879e18c
feat: add rq support and first job to update dataset records status
jfcalvo Aug 27, 2024
68dad75
feat: add Redis and rq workers to argilla-hf-spaces Dockerfile
jfcalvo Aug 27, 2024
65ae689
Merge branch 'develop' into feat/add-rq
jfcalvo Aug 27, 2024
24733c2
feat: add Redis as service dependency to GitHub argilla-server workflow
jfcalvo Aug 27, 2024
684c025
feat: fix some problems with argilla-hf-spaces Dockerfile adding redis
jfcalvo Aug 27, 2024
abac3c6
feat: change docker-compose.yaml example file to use Redis and a rq w…
jfcalvo Aug 27, 2024
27b349c
feat: add ping redis function at application startup
jfcalvo Aug 27, 2024
4c30308
feat: add documentation about background jobs to argilla server READM…
jfcalvo Aug 27, 2024
f671597
feat: add Redis as dependency to developer docs
jfcalvo Aug 27, 2024
f47cb0c
Merge branch 'develop' into feat/add-rq
jfcalvo Aug 27, 2024
0755c96
chore: update CHANGELOG.md
jfcalvo Aug 27, 2024
7c939d2
feat: remove useless conditional
jfcalvo Aug 28, 2024
a7ca9ae
feat: add new job to update one single record status
jfcalvo Aug 28, 2024
689bc12
feat: update records status outside main job transaction
jfcalvo Aug 28, 2024
a8b17bd
feat: small refactors
jfcalvo Aug 28, 2024
275d6df
feat: disable job timeout for default queue
jfcalvo Aug 29, 2024
9cd7ee1
feat: move timeout definition outside of the queue
jfcalvo Aug 29, 2024
a6ed2cb
feat: add environment variable to define the number of background wor…
jfcalvo Aug 29, 2024
67c34c6
feat: increase default value of BACKGROUND_JOB_WORKERS environment va…
jfcalvo Aug 30, 2024
9be4f6d
feat: improve how number of workers is specified on docker compose an…
jfcalvo Aug 30, 2024
fc228fc
Merge branch 'develop' into feat/add-rq
jfcalvo Aug 30, 2024
cfa5219
feat: first proof of concept for webhooks using standard-webhooks
jfcalvo Aug 30, 2024
2ed5c34
feat: add webhooks table and API endpoints to support webhooks CRUD
jfcalvo Sep 2, 2024
7df6309
feat: improve notify events using database Webhooks
jfcalvo Sep 2, 2024
6268389
feat: add new endpoint to ping a webhook
jfcalvo Sep 3, 2024
c484c99
feat: add enabled and description to webhooks table
jfcalvo Sep 3, 2024
9ccfb58
feat: remove uniqueness for url and secret columns
jfcalvo Sep 3, 2024
fe228f3
feat: add a limit to the number of webhooks that can be created
jfcalvo Sep 3, 2024
fcbd310
feat: small refactor
jfcalvo Sep 3, 2024
aa678d1
Merge branch 'develop' into feat/add-webhooks
jfcalvo Sep 3, 2024
c0adb56
Merge branch 'develop' into feat/add-rq
jfcalvo Sep 4, 2024
e9f664b
✨ Invalidate progress cache
damianpumar Sep 4, 2024
2a5089c
chore: add missing file
jfcalvo Sep 4, 2024
e9d54cf
Merge branch 'feat/add-rq' into feat/add-webhooks-with-background-jobs
jfcalvo Sep 4, 2024
977f1a3
Merge branch 'develop' into feat/add-rq
jfcalvo Sep 6, 2024
a6a930d
Merge branch 'feat/add-rq' into feat/add-webhooks-with-background-jobs
jfcalvo Sep 6, 2024
12471a1
feat: add webhook integration with background jobs
jfcalvo Sep 6, 2024
1acb668
Merge branch 'develop' into feat/add-webhooks
frascuchon Sep 9, 2024
2be0e13
feat: remove admin as role that can use webhooks API
jfcalvo Sep 9, 2024
1645f6a
feat: use pydantic_v1 package to get HttpUrl class
jfcalvo Sep 9, 2024
c44422d
Merge branch 'feat/add-webhooks' into feat/add-webhooks-with-backgrou…
jfcalvo Sep 9, 2024
9d17339
chore: update CHANGELOG.md
jfcalvo Sep 9, 2024
37c8a5a
Merge branch 'feat/add-webhooks' into feat/add-webhooks-with-backgrou…
jfcalvo Sep 9, 2024
cbe5823
feat: add explicit timeout for webhooks notify event function
jfcalvo Sep 9, 2024
bfb0294
Merge branch 'feat/add-webhooks' into feat/add-webhooks-with-backgrou…
jfcalvo Sep 9, 2024
e2f671a
Merge branch 'feat/add-webhooks-feature-branch' into feat/add-webhook…
jfcalvo Sep 9, 2024
f9b2350
chore: update CHANGELOG.md
jfcalvo Sep 9, 2024
906dc0d
feat: add low queue to RQ
jfcalvo Sep 9, 2024
4a7c185
feat: move jsonable encoding of data to enqueue_notify_events function
jfcalvo Sep 10, 2024
0e424bb
feat: add some more example of invalid urls for webhooks
jfcalvo Sep 10, 2024
9b790e9
feat: retry webhooks event notifications jobs when the response statu…
jfcalvo Sep 10, 2024
ddc5a9b
feat: force jobs to return None so we don't polute Redis with useless…
jfcalvo Sep 10, 2024
c8cfd75
feat: use raise_for_status instead of manually checking for not success
jfcalvo Sep 10, 2024
71f0ffb
feat: add dataset created, updated, deleted and published webhook events
jfcalvo Sep 10, 2024
d876682
chore: update CHANGELOG.md
jfcalvo Sep 11, 2024
3ff2c07
feat: add support for response.upserted webhook events
jfcalvo Sep 11, 2024
d23894d
chore: webhook example code
frascuchon Sep 11, 2024
3a3e278
add required env variable to avoid worker priviledges errors
frascuchon Sep 11, 2024
00a4164
fix: Prevent None values for vectors and add dataset_id attribute
frascuchon Sep 11, 2024
7944148
expose WebhooksAPI in client.api
frascuchon Sep 11, 2024
5aaf551
add record.completed event
frascuchon Sep 11, 2024
027ba82
send record event with record responses and suggestions
frascuchon Sep 11, 2024
598bfe6
add server notify record event module
frascuchon Sep 11, 2024
fbaa728
prevent errors on None vector, suggestions or responses
frascuchon Sep 11, 2024
962f340
Update README.md
frascuchon Sep 11, 2024
9a981fc
Merge branch 'feat/add-webhooks-feature-branch' into feat/webhook-exa…
frascuchon Sep 11, 2024
9c735e2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 11, 2024
9a50db5
Remove dockerfile
frascuchon Sep 11, 2024
f1b584b
feat: add expanded events for webhooks
jfcalvo Sep 11, 2024
2c56e7a
feat: use awaitable_attrs to don't replace the current state of the r…
jfcalvo Sep 11, 2024
fbf8511
feat: add missing note comment
jfcalvo Sep 11, 2024
e22879e
feat: add additional resources to Webhooks DatasetEventSchema
jfcalvo Sep 12, 2024
767127c
chore: add comment
jfcalvo Sep 12, 2024
2ee36a6
chore: add comment
jfcalvo Sep 12, 2024
10c5983
Merge branch 'feat/add-webhook-expanded-events' into feat/webhook-exa…
frascuchon Sep 13, 2024
0d352b5
feat: use database query with selectinload to load resources needed b…
jfcalvo Sep 13, 2024
443d563
feat: using a different Redis database for testing
jfcalvo Sep 13, 2024
35dbdc9
Merge branch 'feat/add-webhook-expanded-events' into feat/webhook-exa…
frascuchon Sep 16, 2024
c7874f3
Merge branch 'feat/add-webhooks-feature-branch' into feat/webhook-exa…
frascuchon Sep 16, 2024
b5608fb
Merge branch 'feat/add-webhooks-feature-branch' into feat/webhook-exa…
frascuchon Sep 16, 2024
039efce
create record-related events
frascuchon Sep 16, 2024
841555a
create record.is_completed method
frascuchon Sep 16, 2024
ee3704b
feat: Notify record events
frascuchon Sep 16, 2024
3475e21
feat: Notify record events
frascuchon Sep 16, 2024
aaaa214
chore: Update changelog
frascuchon Sep 16, 2024
6ec297d
Merge branch 'feat/add-record-related-webhook-events' into feat/webho…
frascuchon Sep 16, 2024
d2ffa45
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 16, 2024
45cb194
Merge branch 'feat/add-webhooks-feature-branch' into feat/webhook-exa…
frascuchon Sep 16, 2024
aca87e8
Merge branch 'feat/add-webhooks-feature-branch' into feat/webhook-exa…
frascuchon Sep 16, 2024
2 changes: 2 additions & 0 deletions argilla-server/.env.dev
@@ -2,3 +2,5 @@ OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES # Needed by RQ to work with forked proce
ALEMBIC_CONFIG=src/argilla_server/alembic.ini
ARGILLA_AUTH_SECRET_KEY=8VO7na5N/jQx+yP/N+HlE8q51vPdrxqlh6OzoebIyko= # With this we avoid using a different key every time the server is reloaded
ARGILLA_DATABASE_URL=sqlite+aiosqlite:///${HOME}/.argilla/argilla.db?check_same_thread=False
# For mac users only https://github.com/rq/rq/issues/1418
OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
7 changes: 3 additions & 4 deletions argilla-server/src/argilla_server/contexts/distribution.py
@@ -12,14 +12,13 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-import backoff
-import sqlalchemy
-
 from typing import List
 from uuid import UUID

-from sqlalchemy.orm import selectinload
+import backoff
+import sqlalchemy
 from sqlalchemy.ext.asyncio import AsyncSession
+from sqlalchemy.orm import selectinload

 from argilla_server.api.webhooks.v1.enums import RecordEvent
 from argilla_server.api.webhooks.v1.records import notify_record_event as notify_record_event_v1
28 changes: 28 additions & 0 deletions argilla/src/argilla/_models/_record/_record.py
@@ -39,6 +39,8 @@ class RecordModel(ResourceModel):
    suggestions: Optional[Union[Tuple[SuggestionModel], List[SuggestionModel]]] = Field(default_factory=tuple)
    external_id: Optional[Any] = Field(default=None)

    dataset_id: Optional[uuid.UUID] = Field(default=None)

    @field_serializer("external_id", when_used="unless-none")
    def serialize_external_id(self, value: str) -> str:
        return str(value)
@@ -77,3 +79,29 @@ def validate_external_id(cls, external_id: Any) -> Union[str, int, uuid.UUID]:
        if external_id is None:
            external_id = uuid.uuid4()
        return external_id

    @field_validator("vectors", mode="before")
    @classmethod
    def empty_vectors_if_none(cls, vectors: Optional[List[VectorModel]]) -> Optional[List[VectorModel]]:
        """Replace a missing (None) vectors value with an empty list."""
        if vectors is None:
            return []
        return vectors

    @field_validator("responses", mode="before")
    @classmethod
    def empty_responses_if_none(cls, responses: Optional[List[UserResponseModel]]) -> Optional[List[UserResponseModel]]:
        """Replace a missing (None) responses value with an empty list."""
        if responses is None:
            return []
        return responses

    @field_validator("suggestions", mode="before")
    @classmethod
    def empty_suggestions_if_none(
        cls, suggestions: Optional[Union[Tuple[SuggestionModel], List[SuggestionModel]]]
    ) -> Optional[Union[Tuple[SuggestionModel], List[SuggestionModel]]]:
        """Replace a missing (None) suggestions value with an empty list."""
        if suggestions is None:
            return []
        return suggestions
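
The three validators above share one pattern: a collection that arrives as `None` is normalized to an empty list before the field is validated, so downstream code can iterate without a `None` check. Below is a minimal, stdlib-only sketch of that pattern. It uses a plain dataclass instead of the pydantic model in the diff, and `RecordSketch` and its fields are hypothetical names chosen for illustration, not Argilla classes.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class RecordSketch:
    """Hypothetical stand-in for RecordModel; not the Argilla class."""

    vectors: Optional[List[Any]] = field(default_factory=list)
    responses: Optional[List[Any]] = field(default_factory=list)
    suggestions: Optional[List[Any]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Normalize an explicit None to an empty list so callers can iterate safely.
        for name in ("vectors", "responses", "suggestions"):
            if getattr(self, name) is None:
                setattr(self, name, [])


# A payload with explicit None values, as a webhook event might carry.
record = RecordSketch(vectors=None, responses=None, suggestions=[{"value": "yes"}])
```

Values that are already lists pass through unchanged; only `None` is replaced.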
162 changes: 162 additions & 0 deletions examples/webhooks/distilabel_trigger/.gitignore
@@ -0,0 +1,162 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
31 changes: 31 additions & 0 deletions examples/webhooks/distilabel_trigger/README.md
@@ -0,0 +1,31 @@
<!--
This example is based on the work done by Ben on this repo https://github.com/burtenshaw/distilabel_trigger
-->


## Running the app

1. Start the Argilla server and the Argilla worker:
```bash
pdm server start
pdm worker
```

2. Add a `localhost.org` alias to the `/etc/hosts` file. Webhook URLs must include a top-level domain, so a plain `localhost` URL does not satisfy the requirement:
```
##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting. Do not change this entry.
##
127.0.0.1 localhost localhost.org
```

3. Start the app:
```bash
uvicorn webhook:server
```

## Testing the app
Annotate a record in the Argilla UI, then check the app logs to confirm that the webhook was triggered.
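
As a rough idea of what to look for in the logs, the sketch below turns a webhook request body into a one-line log message. It is a minimal illustration and not the example app's code: the `summarize_event` helper is hypothetical, and the `type`, `data`, and `id` field names are assumptions about the payload shape that this README does not confirm.

```python
import json
from typing import Any, Dict


def summarize_event(body: bytes) -> str:
    """Return a one-line log message for a webhook request body."""
    payload: Dict[str, Any] = json.loads(body)
    # "type", "data" and "id" are assumed field names, not confirmed by the README.
    event_type = payload.get("type", "<unknown>")
    record_id = (payload.get("data") or {}).get("id", "<unknown>")
    return f"received {event_type} for record {record_id}"


# Hypothetical request body for a completed record.
sample = json.dumps({"type": "record.completed", "data": {"id": "1234"}}).encode()
message = summarize_event(sample)
```

Missing fields fall back to `<unknown>` rather than raising, which keeps the log line useful while the payload shape is still changing in this draft.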
49 changes: 49 additions & 0 deletions examples/webhooks/distilabel_trigger/configure_models.py
@@ -0,0 +1,49 @@
import os
from typing import List

from distilabel.llms import InferenceEndpointsLLM
from distilabel.steps.tasks import TextGeneration, Task, UltraFeedback

LLAMA_MODEL_ID = os.environ.get(
    "LLAMA_MODEL_ID", "meta-llama/Meta-Llama-3.1-8B-Instruct"
)
GEMMA_MODEL_ID = os.environ.get("GEMMA_MODEL_ID", "google/gemma-1.1-7b-it")
ULTRAFEEDBACK_MODEL_ID = os.environ.get(
    "ULTRAFEEDBACK_MODEL_ID", "meta-llama/Meta-Llama-3.1-70B-Instruct"
)


def initialize_text_generation_models() -> List["Task"]:
    llama31 = TextGeneration(
        name="text-generation",
        llm=InferenceEndpointsLLM(
            model_id=LLAMA_MODEL_ID,
            tokenizer_id=LLAMA_MODEL_ID,
        ),
    )
    llama31.load()

    gemma_tiny = TextGeneration(
        name="text-generation",
        llm=InferenceEndpointsLLM(
            model_id=GEMMA_MODEL_ID,
            tokenizer_id=GEMMA_MODEL_ID,
        ),
    )
    gemma_tiny.load()

    return [llama31, gemma_tiny]


def initialize_ultrafeedback():
    ultrafeedback = UltraFeedback(
        aspect="overall-rating",
        llm=InferenceEndpointsLLM(
            model_id=ULTRAFEEDBACK_MODEL_ID,
            tokenizer_id=ULTRAFEEDBACK_MODEL_ID,
        ),
    )

    ultrafeedback.load()

    return ultrafeedback
24 changes: 24 additions & 0 deletions examples/webhooks/distilabel_trigger/configure_webhook.py
@@ -0,0 +1,24 @@
import os

import argilla as rg
from argilla._api._webhooks import WebhookModel
from standardwebhooks.webhooks import Webhook

WEBHOOK_BASE_URL = os.getenv("WEBHOOK_BASE_URL", "http://localhost.org:8000")


def configure_webhook(client: rg.Argilla, path: str) -> Webhook:
    # Delete any existing webhooks so that only the one created below is registered
    for wh_model in client.api.webhooks.list():
        client.api.webhooks.delete(wh_model.id)

    # Register a webhook that fires when a record is completed
    model = WebhookModel(
        url=f"{WEBHOOK_BASE_URL}{path}",
        events=["record.completed"],
        description="Webhook for record completion",
    )

    webhook_model = client.api.webhooks.create(model)
    # Build a verifier from the secret returned by the server
    webhook = Webhook(whsecret=webhook_model.secret)

    return webhook
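
The `Webhook` object returned here holds the secret that the receiving app can use to check that incoming requests were signed by the server. To make that check concrete, the sketch below reproduces the Standard Webhooks v1 signature scheme with the standard library only: an HMAC-SHA256 over `{id}.{timestamp}.{body}`, keyed with the base64-decoded secret. The `sign` and `verify` helpers and the demo values are hypothetical, and the sketch skips the timestamp tolerance check, so in the app itself the `verify` method of the `standardwebhooks` library is the better choice.

```python
import base64
import hashlib
import hmac

SECRET_PREFIX = "whsec_"


def sign(secret: str, msg_id: str, timestamp: str, body: str) -> str:
    """Compute a Standard Webhooks v1 signature for one message."""
    # Secrets are base64 text, optionally prefixed with "whsec_".
    if secret.startswith(SECRET_PREFIX):
        secret = secret[len(SECRET_PREFIX):]
    key = base64.b64decode(secret)
    signed_content = f"{msg_id}.{timestamp}.{body}".encode()
    digest = hmac.new(key, signed_content, hashlib.sha256).digest()
    return "v1," + base64.b64encode(digest).decode()


def verify(secret: str, msg_id: str, timestamp: str, body: str, signature_header: str) -> bool:
    """Check the signature header, which may hold several space-separated signatures."""
    expected = sign(secret, msg_id, timestamp, body)
    return any(
        hmac.compare_digest(expected, candidate)
        for candidate in signature_header.split()
    )


# Demo values only; a real secret comes from the webhook created on the server.
demo_secret = SECRET_PREFIX + base64.b64encode(b"demo-secret-key").decode()
demo_body = '{"type": "record.completed"}'
demo_header = sign(demo_secret, "msg_1", "1726000000", demo_body)
```

Because the signature covers the raw body, the receiver has to verify the bytes exactly as received, before any JSON parsing or re-serialization.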