Predictions often fail on meta/llama-2-70b #259

Open
jdkanu opened this issue Mar 15, 2024 · 2 comments
Comments

jdkanu commented Mar 15, 2024

Calls to meta/llama-2-70b succeed only some of the time; the rest fail, which makes the model very unreliable.

This is the code:

import replicate

output = replicate.run(
    "meta/llama-2-70b",
    input={
        "prompt": "Q: Would a pear sink in water? A: Let's think step by step. ",
        "max_new_tokens": 10000,
        "temperature": 0.01,
    },
)

Example failure: https://replicate.com/p/x3brrjtbwq4ky6zm2z2ay27amy
Example failure: https://replicate.com/p/72pdpvtby7l7wgdzrpzzldqzne
Example failure: https://replicate.com/p/ucbimbtbhyjrw5udypzf6srsm4
Example success: https://replicate.com/p/n6hg2cdbym5ksifmlm6yfahjzm
Example success: https://replicate.com/p/j2jkwn3bfl6w2wc4q53mwju2o4
Example success: https://replicate.com/p/mtb6jcrbzjh57s2wjzfycegxoa

Contributor

mattt commented Mar 15, 2024

Hi @jdkanu. Thank you for reporting this. Looking at our telemetry, it does seem like predictions on GPUs in certain regions are failing more often due to read timeouts. We're investigating the cause, and working on a remediation.
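
Until a fix is in place, one possible client-side stopgap is to retry a failed call with exponential backoff. The following is only a minimal sketch, not part of the Replicate client: `run_with_retries` and its parameters are hypothetical names, and it catches every exception because the exact error type raised for these read timeouts is not stated in this thread.

```python
import time


def run_with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `call()` until it succeeds or `attempts` tries are used up.

    Hypothetical helper: waits base_delay, 2 * base_delay, 4 * base_delay, ...
    seconds between tries and re-raises the last error if every try fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            # Assumption: any exception is treated as retryable, since the
            # specific timeout error type is not known here.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The call from the original report could be wrapped as `run_with_retries(lambda: replicate.run("meta/llama-2-70b", input=...))`, passing the same `input` dictionary as above. Note that each retry starts a new prediction.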

Author

jdkanu commented Mar 15, 2024

Thank you
