Conversation

@jhcipar jhcipar commented Sep 4, 2025

This PR adds an example that uses Tetra with vLLM for inference via the instructor package.

A CPU endpoint is created to handle remote dependencies and acts as a client submitting requests to a GPU endpoint running Qwen3-0.6B.

This example requires runpod/tetra-rp#89 to work properly.
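A minimal sketch of what the CPU endpoint's client side might assemble before submitting a request to the GPU endpoint running Qwen3-0.6B. The payload shape, field names, and the UserInfo schema are illustrative assumptions, not the PR's actual code; the real example delegates structured extraction to the instructor package.

```python
import json
import os
from dataclasses import dataclass

# Hypothetical schema for the structured output that instructor would
# extract on the real endpoint; fields are illustrative, not from the PR.
@dataclass
class UserInfo:
    name: str
    age: int

def build_request(prompt: str) -> dict:
    """Assemble an OpenAI-compatible chat request for the GPU endpoint
    serving Qwen3-0.6B. The payload shape is a sketch, not the PR's code."""
    return {
        "model": "Qwen/Qwen3-0.6B",
        "messages": [{"role": "user", "content": prompt}],
        # Ask the server for JSON that should match the UserInfo schema.
        "response_format": {"type": "json_object"},
    }

# RunPod's serverless API authenticates with a bearer token; .get() keeps
# this sketch runnable even when the variable is unset.
headers = {"Authorization": f"Bearer {os.environ.get('RUNPOD_API_KEY', '')}"}

payload = build_request("Extract: John is 30 years old.")
body = json.dumps(payload)
```

In the PR itself, both sides are Tetra endpoints: the CPU endpoint carries the remote dependencies and plays the client role, while the GPU endpoint hosts the vLLM server.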

@jhcipar jhcipar requested review from deanq and pandyamarut September 4, 2025 23:56
import os

# This key is required to authenticate with RunPod's serverless API
RUNPOD_API_KEY = os.environ["RUNPOD_API_KEY"]

# Qwen3-0.6B is a compact model that's efficient for structured data extraction tasks
Please move all of these comments about design choices to the top.
Keep it clean: since we don't have a separate README for examples, an informative comment at the top works best.

@pandyamarut pandyamarut left a comment


/LGTM
