1x Pod (1x root: 8t/96GB)
spyder@b01:~/repos/distributed-llama$ kubectl get pod -n text-gen -o wide
NAME            READY   STATUS      RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
dllama-root-0   0/1     Completed   0          72s   192.168.17.176   cloud24   <none>           <none>
spyder@b01:~/repos/distributed-llama$ kubectl logs -f -n text-gen dllama-root-0
./distributed-llama/main inference --model ./models/dllama_meta-llama_Llama-2-70b_q40/dllama_meta-llama_Llama-2-70b_q40.bin --tokenizer ./models/dllama_meta-llama_Llama-2-70b_q40/tokenizer.bin --weights-float-type q40 --buffer-float-type q80 --prompt "Hello world" --steps 16 --nthreads 8
💡 dim: 8192
💡 hiddenDim: 28672
💡 nLayers: 80
💡 nHeads: 64
💡 nKvHeads: 8
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 1
⏩ Loaded 39706066944 bytes
🔶 G 1641 ms I 1639 ms T 1 ms S 0 kB R 0 kB Hello
🔶 G 1578 ms I 1577 ms T 1 ms S 0 kB R 0 kB world
🔶 G 1698 ms I 1697 ms T 1 ms S 0 kB R 0 kB !
🔶 G 1553 ms I 1551 ms T 1 ms S 0 kB R 0 kB Welcome
🔶 G 1619 ms I 1619 ms T 0 ms S 0 kB R 0 kB to
🔶 G 1754 ms I 1753 ms T 0 ms S 0 kB R 0 kB our
🔶 G 1629 ms I 1629 ms T 0 ms S 0 kB R 0 kB website
🔶 G 1590 ms I 1590 ms T 0 ms S 0 kB R 0 kB .
🔶 G 1657 ms I 1656 ms T 1 ms S 0 kB R 0 kB
🔶 G 1610 ms I 1609 ms T 1 ms S 0 kB R 0 kB Wel
🔶 G 1590 ms I 1589 ms T 1 ms S 0 kB R 0 kB come
🔶 G 1646 ms I 1645 ms T 1 ms S 0 kB R 0 kB to
🔶 G 1678 ms I 1677 ms T 1 ms S 0 kB R 0 kB our
🔶 G 1602 ms I 1602 ms T 0 ms S 0 kB R 0 kB website
🔶 G 1576 ms I 1574 ms T 2 ms S 0 kB R 0 kB .
🔶 G 1605 ms I 1604 ms T 0 ms S 0 kB R 0 kB We
Generated tokens: 16
Avg generation time: 1626.62 ms
Avg inference time: 1625.69 ms
Avg transfer time: 0.69 ms
1x Pod (1x root: 16t/96GB)
spyder@b01:~/repos/distributed-llama$ kubectl logs -f -n text-gen dllama-root-0
./distributed-llama/main inference --model ./models/dllama_meta-llama_Llama-2-70b_q40/dllama_meta-llama_Llama-2-70b_q40.bin --tokenizer ./models/dllama_meta-llama_Llama-2-70b_q40/tokenizer.bin --weights-float-type q40 --buffer-float-type q80 --prompt "Hello world" --steps 16 --nthreads 16
💡 dim: 8192
💡 hiddenDim: 28672
💡 nLayers: 80
💡 nHeads: 64
💡 nKvHeads: 8
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 1
⏩ Loaded 39706066944 bytes
🔶 G 1041 ms I 1038 ms T 2 ms S 0 kB R 0 kB Hello
🔶 G 1187 ms I 1183 ms T 4 ms S 0 kB R 0 kB world
🔶 G 1087 ms I 1084 ms T 2 ms S 0 kB R 0 kB !
🔶 G 1072 ms I 1066 ms T 5 ms S 0 kB R 0 kB
🔶 G 1098 ms I 1096 ms T 1 ms S 0 kB R 0 kB Well
🔶 G 1006 ms I 1005 ms T 1 ms S 0 kB R 0 kB ,
🔶 G 1072 ms I 1070 ms T 1 ms S 0 kB R 0 kB it
🔶 G 1064 ms I 1063 ms T 1 ms S 0 kB R 0 kB '
🔶 G 1169 ms I 1167 ms T 1 ms S 0 kB R 0 kB s
🔶 G 1169 ms I 1169 ms T 0 ms S 0 kB R 0 kB been
🔶 G 1074 ms I 1071 ms T 2 ms S 0 kB R 0 kB a
🔶 G 1154 ms I 1150 ms T 4 ms S 0 kB R 0 kB while
🔶 G 1056 ms I 1054 ms T 2 ms S 0 kB R 0 kB since
🔶 G 1051 ms I 1050 ms T 1 ms S 0 kB R 0 kB I
🔶 G 1090 ms I 1089 ms T 0 ms S 0 kB R 0 kB '
🔶 G 1028 ms I 1025 ms T 2 ms S 0 kB R 0 kB ve
Generated tokens: 16
Avg generation time: 1088.62 ms
Avg inference time: 1086.25 ms
Avg transfer time: 1.81 ms
1x Pod (1x root: 32t/96GB)
spyder@b01:~/repos/distributed-llama$ kubectl logs -f -n text-gen dllama-root-0
./distributed-llama/main inference --model ./models/dllama_meta-llama_Llama-2-70b_q40/dllama_meta-llama_Llama-2-70b_q40.bin --tokenizer ./models/dllama_meta-llama_Llama-2-70b_q40/tokenizer.bin --weights-float-type q40 --buffer-float-type q80 --prompt "Hello world" --steps 16 --nthreads 32
💡 dim: 8192
💡 hiddenDim: 28672
💡 nLayers: 80
💡 nHeads: 64
💡 nKvHeads: 8
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 1
⏩ Loaded 39706066944 bytes
🔶 G 941 ms I 933 ms T 7 ms S 0 kB R 0 kB Hello
🔶 G 919 ms I 918 ms T 1 ms S 0 kB R 0 kB world
🔶 G 917 ms I 915 ms T 2 ms S 0 kB R 0 kB ,
🔶 G 924 ms I 921 ms T 2 ms S 0 kB R 0 kB I
🔶 G 918 ms I 918 ms T 0 ms S 0 kB R 0 kB ’
🔶 G 917 ms I 916 ms T 0 ms S 0 kB R 0 kB m
🔶 G 930 ms I 927 ms T 2 ms S 0 kB R 0 kB so
🔶 G 928 ms I 925 ms T 2 ms S 0 kB R 0 kB happy
🔶 G 926 ms I 921 ms T 4 ms S 0 kB R 0 kB to
🔶 G 928 ms I 926 ms T 1 ms S 0 kB R 0 kB announ
🔶 G 932 ms I 928 ms T 4 ms S 0 kB R 0 kB ce
🔶 G 933 ms I 928 ms T 4 ms S 0 kB R 0 kB that
🔶 G 933 ms I 929 ms T 3 ms S 0 kB R 0 kB my
🔶 G 926 ms I 923 ms T 2 ms S 0 kB R 0 kB new
🔶 G 930 ms I 928 ms T 2 ms S 0 kB R 0 kB video
🔶 G 934 ms I 933 ms T 0 ms S 0 kB R 0 kB series
Generated tokens: 16
Avg generation time: 927.25 ms
Avg inference time: 924.31 ms
Avg transfer time: 2.25 ms
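A quick read on thread scaling, using the averages printed above: doubling threads from 8 to 16 cuts the per-token time from 1626.62 ms to 1088.62 ms, but going from 16 to 32 only gets it to 927.25 ms, so the run looks memory-bandwidth bound well before 32 threads. Converted to throughput (plain awk on the reported numbers, nothing distributed-llama specific):

awk 'BEGIN {
  split("1626.62 1088.62 927.25", t, " ");   # 8t, 16t, 32t avg generation ms/token
  for (i = 1; i <= 3; i++) printf "%.2f tokens/s\n", 1000 / t[i];
}'
# prints roughly 0.61, 0.92, 1.08 tokens/s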
8x Pod (1x root: 32t/96GB, 7x 6t/8GB)
spyder@b01:~/repos/text-gen$ kubectl logs -f -n text-gen dllama-root-0
./distributed-llama/main inference --model ./models/dllama_meta-llama_Llama-2-70b_q40/dllama_meta-llama_Llama-2-70b_q40.bin --tokenizer ./models/dllama_meta-llama_Llama-2-70b_q40/tokenizer.bin --weights-float-type q40 --buffer-float-type q80 --prompt "Hello world" --steps 16 --nthreads 32 --workers 192.168.30.59:9998 192.168.31.99:9998 192.168.27.60:9998 192.168.21.49:9998 192.168.24.251:9998 192.168.25.122:9998 192.168.21.109:9998
💡 dim: 8192
💡 hiddenDim: 28672
💡 nLayers: 80
💡 nHeads: 64
💡 nKvHeads: 8
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 8
⏩ Loaded 39706066944 bytes
🔶 G 554 ms I 183 ms T 370 ms S 3569849 kB R 4016 kB Hello
🔶 G 534 ms I 178 ms T 356 ms S 28857 kB R 4016 kB world
🔶 G 546 ms I 180 ms T 365 ms S 28857 kB R 4016 kB .
🔶 G 547 ms I 179 ms T 367 ms S 28857 kB R 4016 kB This
🔶 G 578 ms I 189 ms T 389 ms S 28857 kB R 4016 kB is
🔶 G 584 ms I 203 ms T 380 ms S 28857 kB R 4016 kB my
🔶 G 557 ms I 189 ms T 368 ms S 28857 kB R 4016 kB first
🔶 G 630 ms I 215 ms T 415 ms S 28857 kB R 4016 kB post
🔶 G 566 ms I 181 ms T 385 ms S 28857 kB R 4016 kB .
🔶 G 572 ms I 165 ms T 406 ms S 28857 kB R 4016 kB I
🔶 G 605 ms I 196 ms T 408 ms S 28857 kB R 4016 kB thought
🔶 G 547 ms I 193 ms T 353 ms S 28857 kB R 4016 kB I
🔶 G 607 ms I 197 ms T 409 ms S 28857 kB R 4016 kB would
🔶 G 581 ms I 214 ms T 366 ms S 28857 kB R 4016 kB start
🔶 G 649 ms I 202 ms T 447 ms S 28857 kB R 4016 kB with
🔶 G 861 ms I 192 ms T 668 ms S 28857 kB R 4016 kB a
Generated tokens: 16
Avg generation time: 594.88 ms
Avg inference time: 191.00 ms
Avg transfer time: 403.25 ms
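Worth noting what the split buys: per-token inference time drops from 924.31 ms (single pod, 32 threads) to 191.00 ms across 8 slices, roughly a 4.8x compute speedup, but 403.25 ms of per-token transfer over the pod network takes back most of it, so end-to-end generation only improves from 927.25 ms to 594.88 ms (about 1.6x). A quick check on those ratios:

awk 'BEGIN {
  printf "inference speedup:  %.2fx\n", 924.31 / 191.00;
  printf "end-to-end speedup: %.2fx\n", 927.25 / 594.88;
}'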
spyder@b01:~/repos/distributed-llama$ kubectl logs -f -n text-gen dllama-worker-0
./distributed-llama/main worker --port 9998 --nthreads 6
Listening on 0.0.0.0:9998...
Client connected
💡 sliceIndex: 1
💡 nSlices: 8
⏩ Received 60162048 bytes for block 0 (87700 kB/s)
⏩ Received 60162048 bytes for block 1 (89261 kB/s)
⏩ Received 60162048 bytes for block 2 (86069 kB/s)
⏩ Received 60162048 bytes for block 3 (92273 kB/s)
⏩ Received 60162048 bytes for block 4 (88344 kB/s)
⏩ Received 60162048 bytes for block 5 (89660 kB/s)
⏩ Received 60162048 bytes for block 6 (92986 kB/s)
⏩ Received 60162048 bytes for block 7 (95952 kB/s)
⏩ Received 60162048 bytes for block 8 (89660 kB/s)
⏩ Received 60162048 bytes for block 9 (91571 kB/s)
⏩ Received 60162048 bytes for block 10 (93565 kB/s)
⏩ Received 60162048 bytes for block 11 (77729 kB/s)
⏩ Received 60162048 bytes for block 12 (76835 kB/s)
⏩ Received 60162048 bytes for block 13 (91710 kB/s)
⏩ Received 60162048 bytes for block 14 (92415 kB/s)
⏩ Received 60162048 bytes for block 15 (89261 kB/s)
⏩ Received 60162048 bytes for block 16 (95043 kB/s)
⏩ Received 60162048 bytes for block 17 (90198 kB/s)
⏩ Received 60162048 bytes for block 18 (94594 kB/s)
⏩ Received 60162048 bytes for block 19 (92986 kB/s)
⏩ Received 60162048 bytes for block 20 (92843 kB/s)
⏩ Received 60162048 bytes for block 21 (91850 kB/s)
⏩ Received 60162048 bytes for block 22 (89928 kB/s)
⏩ Received 60162048 bytes for block 23 (89527 kB/s)
⏩ Received 60162048 bytes for block 24 (91850 kB/s)
⏩ Received 60162048 bytes for block 25 (90333 kB/s)
⏩ Received 60162048 bytes for block 26 (92557 kB/s)
⏩ Received 60162048 bytes for block 27 (91710 kB/s)
⏩ Received 60162048 bytes for block 28 (96414 kB/s)
⏩ Received 60162048 bytes for block 29 (52960 kB/s)
⏩ Received 60162048 bytes for block 30 (94150 kB/s)
⏩ Received 60162048 bytes for block 31 (91710 kB/s)
⏩ Received 60162048 bytes for block 32 (91991 kB/s)
⏩ Received 60162048 bytes for block 33 (94893 kB/s)
⏩ Received 60162048 bytes for block 34 (91017 kB/s)
⏩ Received 60162048 bytes for block 35 (88997 kB/s)
⏩ Received 60162048 bytes for block 36 (87318 kB/s)
⏩ Received 60162048 bytes for block 37 (92700 kB/s)
⏩ Received 60162048 bytes for block 38 (91017 kB/s)
⏩ Received 60162048 bytes for block 39 (92415 kB/s)
⏩ Received 60162048 bytes for block 40 (91710 kB/s)
⏩ Received 60162048 bytes for block 41 (92986 kB/s)
⏩ Received 60162048 bytes for block 42 (94298 kB/s)
⏩ Received 60162048 bytes for block 43 (92415 kB/s)
⏩ Received 60162048 bytes for block 44 (91155 kB/s)
⏩ Received 60162048 bytes for block 45 (93710 kB/s)
⏩ Received 60162048 bytes for block 46 (90198 kB/s)
⏩ Received 60162048 bytes for block 47 (91710 kB/s)
⏩ Received 60162048 bytes for block 48 (97507 kB/s)
⏩ Received 60162048 bytes for block 49 (95952 kB/s)
⏩ Received 60162048 bytes for block 50 (91571 kB/s)
⏩ Received 60162048 bytes for block 51 (91710 kB/s)
⏩ Received 60162048 bytes for block 52 (93857 kB/s)
⏩ Received 60162048 bytes for block 53 (93130 kB/s)
⏩ Received 60162048 bytes for block 54 (69311 kB/s)
⏩ Received 60162048 bytes for block 55 (81520 kB/s)
⏩ Received 60162048 bytes for block 56 (72484 kB/s)
⏩ Received 60162048 bytes for block 57 (98144 kB/s)
⏩ Received 60162048 bytes for block 58 (87956 kB/s)
⏩ Received 60162048 bytes for block 59 (96414 kB/s)
⏩ Received 60162048 bytes for block 60 (98626 kB/s)
⏩ Received 60162048 bytes for block 61 (94743 kB/s)
⏩ Received 60162048 bytes for block 62 (93419 kB/s)
⏩ Received 60162048 bytes for block 63 (98144 kB/s)
⏩ Received 60162048 bytes for block 64 (92843 kB/s)
⏩ Received 60162048 bytes for block 65 (94893 kB/s)
⏩ Received 60162048 bytes for block 66 (96414 kB/s)
⏩ Received 60162048 bytes for block 67 (91850 kB/s)
⏩ Received 60162048 bytes for block 68 (97192 kB/s)
⏩ Received 60162048 bytes for block 69 (99114 kB/s)
⏩ Received 60162048 bytes for block 70 (90742 kB/s)
⏩ Received 60162048 bytes for block 71 (94003 kB/s)
⏩ Received 60162048 bytes for block 72 (91571 kB/s)
⏩ Received 60162048 bytes for block 73 (94298 kB/s)
⏩ Received 60162048 bytes for block 74 (90469 kB/s)
⏩ Received 60162048 bytes for block 75 (93419 kB/s)
⏩ Received 60162048 bytes for block 76 (96106 kB/s)
⏩ Received 60162048 bytes for block 77 (91850 kB/s)
⏩ Received 60162048 bytes for block 78 (95344 kB/s)
⏩ Received 60162048 bytes for block 79 (97192 kB/s)
Error receiving data: socket closed
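The worker numbers are consistent with the model header: 80 blocks match nLayers: 80, and at 60,162,048 bytes per block each worker ends up holding about 4.5 GiB of weights, a bit under an eighth of the 39,706,066,944 bytes the root loads (the non-sliced parts presumably stay on the root), which is why 8 GiB worker pods are enough. The closing "socket closed" error is most likely just the root disconnecting once its 16 steps finish. Quick arithmetic:

awk 'BEGIN {
  per_block = 60162048;                      # bytes per block, from the log above
  printf "per worker:     %.2f GiB\n", 80 * per_block / 1024^3;
  printf "share of total: %.1f%%\n", 100 * 80 * per_block / 39706066944;
}'
# roughly 4.48 GiB per worker, about 12.1% of the full model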
Provider: homelab (k8s v1.24.14)
Pod/VolcanoJob resource limits: cpu: [8, 16, 32], memory: 96Gi
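If anyone wants to double-check what a pod was actually granted, the resources stanza can be read straight from the API (pod name as in the logs above; container index 0 assumed):

kubectl -n text-gen get pod dllama-root-0 \
  -o jsonpath='{.spec.containers[0].resources}'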
Average Single Token Generation Time
Llama 2 70B Q40 Weights, Q80 Buffer

Configuration                            Avg generation   Avg inference   Avg transfer
1x Pod (1x root: 8t/96GB)                1626.62 ms       1625.69 ms      0.69 ms
1x Pod (1x root: 16t/96GB)               1088.62 ms       1086.25 ms      1.81 ms
1x Pod (1x root: 32t/96GB)               927.25 ms        924.31 ms       2.25 ms
8x Pod (1x root: 32t/96GB, 7x 6t/8GB)    594.88 ms        191.00 ms       403.25 ms
CPU Info
Intel Xeon E5-2696 v3
Intel Xeon E3-1246 v3