This Docker image runs a Llama model on a serverless RunPod instance using the o…

## Set Up
1. Create a RunPod account and navigate to the [RunPod Serverless Console](https://www.runpod.io/console/serverless).
2. (Optional) Create a Network Volume to cache your model and speed up cold starts (storage incurs a small cost per hour).

   *Note: Only certain Network Volume regions are compatible with certain instance types on RunPod. If your Network Volume makes your desired instance type unavailable, try other regions for your Network Volume.*
3. Navigate to `My Templates` and click on the `New Template` button. Enter the following fields and click on the `Save Template` button:
4. Now click on `My Endpoints` and click on the `New Endpoint` button.
5. Fill in the following fields and click on the `Create` button:
| Endpoint Field | Value |
| --- | --- |
| Max Workers | `1` |
| Idle Timeout | `5` seconds |
| FlashBoot | Checked/Enabled |
| GPU Type(s) | Use the `Container Disk` section of step 3 to determine the smallest GPU that can load the entire 4-bit model. In our example's case, use a 16 GB GPU. Choose a smaller GPU if you are using a Network Volume instead. |
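As a rough sizing sketch, a 4-bit quantized model needs about half a byte per parameter plus some headroom; the 20% overhead factor below is an illustrative assumption, not a RunPod figure:

```python
def estimate_vram_gib(n_params_billion: float, overhead: float = 1.2) -> float:
    """Rough VRAM needed to load a 4-bit quantized model.

    Assumes ~0.5 bytes per parameter plus a ~20% overhead factor
    (activations, KV cache, CUDA context) -- an illustrative guess.
    """
    bytes_per_param = 0.5  # 4-bit quantization
    return n_params_billion * 1e9 * bytes_per_param * overhead / (1024 ** 3)

# A 13B model comes out to roughly 7.3 GiB, so a 16 GB GPU has headroom.
print(round(estimate_vram_gib(13), 1))
```

If the estimate lands near a GPU's full capacity, pick the next size up: the runtime also needs memory for the KV cache, which grows with context length.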
```python
parser.add_argument('-p', '--params_json', type=str, help='JSON string of generation params')

prompt = """Given the following clinical notes, what tests, diagnoses, and recommendations should I give? Provide your answer as a detailed report with labeled sections "Diagnostic Tests", "Possible Diagnoses", and "Patient Recommendations".

-fh: father had MI recently, mother has thyroid dz
-sh: non-smoker, marijuana 5-6 months ago, 3 beers on the weekend, basketball at school
-sh: no STD, no other significant medical conditions."""
```
To run with streaming enabled, use the `--stream` option. To set generation parameters, use the `--params_json` option to pass a JSON string of parameters:
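For example, the params JSON string can be built with the standard library; the parameter names below are common generation settings and an assumption about what the handler accepts:

```python
import json

# Illustrative generation parameters -- adjust to whatever the
# handler actually supports; these names are assumptions.
params = {"max_new_tokens": 256, "temperature": 0.7, "top_p": 0.9}
params_json = json.dumps(params)
print(params_json)

# Pass the resulting string on the command line, e.g.:
#   python <your_script>.py --stream --params_json '<params_json>'
```

Building the string with `json.dumps` avoids quoting mistakes that are easy to make when typing JSON by hand in a shell.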