Llama 70B / 1 VM
n4no_com@instance-1:~/distributed-llama$ sudo nice -n -20 ./main inference --model ../dllama_llama-2-70b_q40.bin --tokenizer ../tokenizer.bin --weights-float-type q40 --buffer-float-type q80 --prompt "Hello world" --steps 16 --nthreads 16
💡 dim: 8192
💡 hiddenDim: 28672
💡 nLayers: 80
💡 nHeads: 64
💡 nKvHeads: 8
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 1
⏩ Loaded 39706066944 bytes
🔶 G 910 ms I 909 ms T 1 ms S 0 kB R 0 kB Hello
🔶 G 905 ms I 903 ms T 2 ms S 0 kB R 0 kB world
🔶 G 907 ms I 905 ms T 2 ms S 0 kB R 0 kB .
🔶 G 908 ms I 906 ms T 1 ms S 0 kB R 0 kB I
🔶 G 909 ms I 908 ms T 0 ms S 0 kB R 0 kB am
🔶 G 907 ms I 904 ms T 2 ms S 0 kB R 0 kB a
🔶 G 908 ms I 905 ms T 2 ms S 0 kB R 0 kB writer
🔶 G 912 ms I 907 ms T 4 ms S 0 kB R 0 kB .
🔶 G 910 ms I 906 ms T 3 ms S 0 kB R 0 kB
🔶 G 910 ms I 907 ms T 2 ms S 0 kB R 0 kB I
🔶 G 908 ms I 908 ms T 0 ms S 0 kB R 0 kB had
🔶 G 913 ms I 912 ms T 0 ms S 0 kB R 0 kB a
🔶 G 913 ms I 909 ms T 4 ms S 0 kB R 0 kB lot
🔶 G 913 ms I 912 ms T 0 ms S 0 kB R 0 kB of
🔶 G 909 ms I 905 ms T 4 ms S 0 kB R 0 kB people
🔶 G 913 ms I 910 ms T 1 ms S 0 kB R 0 kB tell
Generated tokens: 16
Avg generation time: 909.69 ms
Avg inference time: 907.25 ms
Avg transfer time: 1.75 ms
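A quick way to read the averages above: the per-token generation time converts directly into throughput. A minimal sketch, using the values copied from the 1-VM log:

```python
# Single-VM run: average per-token times reported by distributed-llama (ms).
avg_generation_ms = 909.69  # G: total wall time per token
avg_inference_ms = 907.25   # I: compute time per token
avg_transfer_ms = 1.75      # T: synchronization/transfer time per token

# Tokens per second is just the inverse of the per-token generation time.
tokens_per_second = 1000.0 / avg_generation_ms
print(f"{tokens_per_second:.2f} tok/s")  # roughly 1.1 tok/s on one VM
```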
Llama 70B / 2 VM
n4no_com@instance-1:~/distributed-llama$ sudo nice -n -20 ./main inference --model ../dllama_llama-2-70b_q40.bin --tokenizer ../tokenizer.bin --weights-float-type q40 --buffer-float-type q80 --prompt "Hello world" --steps 16 --nthreads 16 --workers 10.132.0.14:9998
💡 dim: 8192
💡 hiddenDim: 28672
💡 nLayers: 80
💡 nHeads: 64
💡 nKvHeads: 8
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 2
⏩ Loaded 39706066944 bytes
🔶 G 499 ms I 470 ms T 29 ms S 2026654 kB R 2295 kB Hello
🔶 G 499 ms I 470 ms T 29 ms S 3230 kB R 2295 kB world
🔶 G 498 ms I 471 ms T 26 ms S 3230 kB R 2295 kB !
🔶 G 499 ms I 476 ms T 22 ms S 3230 kB R 2295 kB
🔶 G 499 ms I 475 ms T 23 ms S 3230 kB R 2295 kB 1
🔶 G 499 ms I 481 ms T 17 ms S 3230 kB R 2295 kB 5
🔶 G 499 ms I 474 ms T 24 ms S 3230 kB R 2295 kB Jun
🔶 G 501 ms I 473 ms T 27 ms S 3230 kB R 2295 kB
🔶 G 502 ms I 475 ms T 27 ms S 3230 kB R 2295 kB 2
🔶 G 507 ms I 473 ms T 31 ms S 3230 kB R 2295 kB saved
🔶 G 502 ms I 476 ms T 25 ms S 3230 kB R 2295 kB by
🔶 G 502 ms I 477 ms T 24 ms S 3230 kB R 2295 kB Anna
🔶 G 503 ms I 476 ms T 26 ms S 3230 kB R 2295 kB Sav
🔶 G 504 ms I 478 ms T 26 ms S 3230 kB R 2295 kB ost
🔶 G 505 ms I 485 ms T 19 ms S 3230 kB R 2295 kB ina
🔶 G 504 ms I 478 ms T 25 ms S 3230 kB R 2295 kB
Generated tokens: 16
Avg generation time: 501.38 ms
Avg inference time: 475.50 ms
Avg transfer time: 25.00 ms
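Comparing against the 1-VM run, the second node nearly halves the per-token time, with the inter-node transfer eating only a small share of each step. A quick check of the speedup from the two logs:

```python
# Average per-token generation time (ms), taken from the two runs above.
one_vm_ms = 909.69
two_vm_ms = 501.38

speedup = one_vm_ms / two_vm_ms           # wall-clock speedup from adding a worker
transfer_share = 25.00 / two_vm_ms        # fraction of each step spent on transfer

print(f"speedup: {speedup:.2f}x")         # about 1.81x
print(f"transfer overhead: {transfer_share:.1%}")  # about 5% of each step
```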
n4no_com@instance-2:~/distributed-llama/distributed-llama$ sudo nice -n -20 ./main worker --port 9998 --nthreads 16
Listening on 0.0.0.0:9998...
Client connected
💡 sliceIndex: 1
💡 nSlices: 2
⏩ Received 240648192 bytes for block 0 (39270 kB/s)
⏩ Received 240648192 bytes for block 1 (99730 kB/s)
⏩ Received 240648192 bytes for block 2 (100354 kB/s)
⏩ Received 240648192 bytes for block 3 (99813 kB/s)
⏩ Received 240648192 bytes for block 4 (99813 kB/s)
⏩ Received 240648192 bytes for block 5 (99771 kB/s)
⏩ Received 240648192 bytes for block 6 (99854 kB/s)
⏩ Received 240648192 bytes for block 7 (99813 kB/s)
⏩ Received 240648192 bytes for block 8 (99771 kB/s)
⏩ Received 240648192 bytes for block 9 (99813 kB/s)
⏩ Received 240648192 bytes for block 10 (99771 kB/s)
⏩ Received 240648192 bytes for block 11 (99854 kB/s)
⏩ Received 240648192 bytes for block 12 (99813 kB/s)
⏩ Received 240648192 bytes for block 13 (99771 kB/s)
⏩ Received 240648192 bytes for block 14 (99771 kB/s)
⏩ Received 240648192 bytes for block 15 (99854 kB/s)
⏩ Received 240648192 bytes for block 16 (99813 kB/s)
⏩ Received 240648192 bytes for block 17 (99854 kB/s)
⏩ Received 240648192 bytes for block 18 (99771 kB/s)
⏩ Received 240648192 bytes for block 19 (99813 kB/s)
⏩ Received 240648192 bytes for block 20 (99771 kB/s)
⏩ Received 240648192 bytes for block 21 (99895 kB/s)
⏩ Received 240648192 bytes for block 22 (99771 kB/s)
⏩ Received 240648192 bytes for block 23 (99854 kB/s)
⏩ Received 240648192 bytes for block 24 (99813 kB/s)
⏩ Received 240648192 bytes for block 25 (99771 kB/s)
⏩ Received 240648192 bytes for block 26 (99771 kB/s)
⏩ Received 240648192 bytes for block 27 (99854 kB/s)
⏩ Received 240648192 bytes for block 28 (99813 kB/s)
⏩ Received 240648192 bytes for block 29 (97271 kB/s)
⏩ Received 240648192 bytes for block 30 (99771 kB/s)
⏩ Received 240648192 bytes for block 31 (100605 kB/s)
⏩ Received 240648192 bytes for block 32 (99895 kB/s)
⏩ Received 240648192 bytes for block 33 (99771 kB/s)
⏩ Received 240648192 bytes for block 34 (99854 kB/s)
⏩ Received 240648192 bytes for block 35 (99689 kB/s)
⏩ Received 240648192 bytes for block 36 (99895 kB/s)
⏩ Received 240648192 bytes for block 37 (99895 kB/s)
⏩ Received 240648192 bytes for block 38 (101454 kB/s)
⏩ Received 240648192 bytes for block 39 (662943 kB/s)
⏩ Received 240648192 bytes for block 40 (1162552 kB/s)
⏩ Received 240648192 bytes for block 41 (1162552 kB/s)
⏩ Received 240648192 bytes for block 42 (1162552 kB/s)
⏩ Received 240648192 bytes for block 43 (1173894 kB/s)
⏩ Received 240648192 bytes for block 44 (1168195 kB/s)
⏩ Received 240648192 bytes for block 45 (1173894 kB/s)
⏩ Received 240648192 bytes for block 46 (1173894 kB/s)
⏩ Received 240648192 bytes for block 47 (1162552 kB/s)
⏩ Received 240648192 bytes for block 48 (1168195 kB/s)
⏩ Received 240648192 bytes for block 49 (177863 kB/s)
⏩ Received 240648192 bytes for block 50 (100020 kB/s)
⏩ Received 240648192 bytes for block 51 (99813 kB/s)
⏩ Received 240648192 bytes for block 52 (99895 kB/s)
⏩ Received 240648192 bytes for block 53 (99895 kB/s)
⏩ Received 240648192 bytes for block 54 (99813 kB/s)
⏩ Received 240648192 bytes for block 55 (99854 kB/s)
⏩ Received 240648192 bytes for block 56 (99978 kB/s)
⏩ Received 240648192 bytes for block 57 (99854 kB/s)
⏩ Received 240648192 bytes for block 58 (99771 kB/s)
⏩ Received 240648192 bytes for block 59 (99854 kB/s)
⏩ Received 240648192 bytes for block 60 (99771 kB/s)
⏩ Received 240648192 bytes for block 61 (99854 kB/s)
⏩ Received 240648192 bytes for block 62 (99771 kB/s)
⏩ Received 240648192 bytes for block 63 (99854 kB/s)
⏩ Received 240648192 bytes for block 64 (99813 kB/s)
⏩ Received 240648192 bytes for block 65 (99771 kB/s)
⏩ Received 240648192 bytes for block 66 (99854 kB/s)
⏩ Received 240648192 bytes for block 67 (99771 kB/s)
⏩ Received 240648192 bytes for block 68 (99895 kB/s)
⏩ Received 240648192 bytes for block 69 (99813 kB/s)
⏩ Received 240648192 bytes for block 70 (99854 kB/s)
⏩ Received 240648192 bytes for block 71 (99771 kB/s)
⏩ Received 240648192 bytes for block 72 (99771 kB/s)
⏩ Received 240648192 bytes for block 73 (99854 kB/s)
⏩ Received 240648192 bytes for block 74 (99730 kB/s)
⏩ Received 240648192 bytes for block 75 (99895 kB/s)
⏩ Received 240648192 bytes for block 76 (99771 kB/s)
⏩ Received 240648192 bytes for block 77 (99854 kB/s)
⏩ Received 240648192 bytes for block 78 (99771 kB/s)
⏩ Received 240648192 bytes for block 79 (94261 kB/s)
Error receiving data: socket closed
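The worker log shows an identical 240648192-byte slice arriving for each of the 80 transformer layers. Summing those and comparing to the full model size suggests the worker holds just under half the weights; presumably the remainder (embeddings, output head, and the root's own slice) stays on the root node, though that is an inference from the numbers, not from the code:

```python
block_bytes = 240648192    # per-layer slice received by this worker (from the log)
n_layers = 80
model_bytes = 39706066944  # full model size loaded by the root node

worker_bytes = block_bytes * n_layers
print(worker_bytes)                   # 19251855360 bytes, about 17.9 GiB
print(worker_bytes / model_bytes)     # ~0.485: just under half the model
```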
Llama 70B / 4 VM
n4no_com@instance-1:~/distributed-llama$ sudo nice -n -20 ./main inference --model ../dllama_llama-2-70b_q40.bin --tokenizer ../tokenizer.bin --weights-float-type q40 --buffer-float-type q80 --prompt "Hello world" --steps 16 --nthreads 16 --workers 10.132.0.14:9998 10.132.0.15:9998 10.132.0.16:9998
💡 dim: 8192
💡 hiddenDim: 28672
💡 nLayers: 80
💡 nHeads: 64
💡 nKvHeads: 8
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 4
⏩ Loaded 39706066944 bytes
🔶 G 290 ms I 269 ms T 21 ms S 3046611 kB R 3442 kB Hello
🔶 G 288 ms I 254 ms T 34 ms S 11475 kB R 3442 kB world
🔶 G 295 ms I 256 ms T 38 ms S 11475 kB R 3442 kB !
🔶 G 290 ms I 254 ms T 35 ms S 11475 kB R 3442 kB -
🔶 G 291 ms I 261 ms T 29 ms S 11475 kB R 3442 kB Autom
🔶 G 291 ms I 267 ms T 24 ms S 11475 kB R 3442 kB atic
🔶 G 293 ms I 255 ms T 37 ms S 11475 kB R 3442 kB B
🔶 G 293 ms I 267 ms T 25 ms S 11475 kB R 3442 kB log
🔶 G 293 ms I 268 ms T 25 ms S 11475 kB R 3442 kB W
🔶 G 294 ms I 265 ms T 29 ms S 11475 kB R 3442 kB riter
🔶 G 294 ms I 267 ms T 26 ms S 11475 kB R 3442 kB
🔶 G 295 ms I 262 ms T 32 ms S 11475 kB R 3442 kB Hello
🔶 G 295 ms I 267 ms T 27 ms S 11475 kB R 3442 kB world
🔶 G 295 ms I 270 ms T 25 ms S 11475 kB R 3442 kB !
🔶 G 296 ms I 271 ms T 25 ms S 11475 kB R 3442 kB /
🔶 G 296 ms I 271 ms T 24 ms S 11475 kB R 3442 kB December
Generated tokens: 16
Avg generation time: 293.06 ms
Avg inference time: 264.00 ms
Avg transfer time: 28.50 ms
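Putting the three runs side by side shows how well the setup scales: near-linear at 2 VMs, with efficiency dropping as transfer overhead grows relative to the shrinking compute time. A sketch using the averages from the logs:

```python
# Average generation time per token (ms) for each cluster size, from the logs.
times_ms = {1: 909.69, 2: 501.38, 4: 293.06}

base = times_ms[1]
for n, t in times_ms.items():
    speedup = base / t
    # Parallel efficiency = speedup / number of machines.
    print(f"{n} VM: {speedup:.2f}x speedup, {speedup / n:.0%} efficiency")
```

This prints roughly 1.00x/100%, 1.81x/91%, and 3.10x/78%.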
n4no_com@instance-4:~/distributed-llama$ sudo nice -n -20 ./main worker --port 9998 --nthreads 16
Listening on 0.0.0.0:9998...
Client connected
💡 sliceIndex: 3
💡 nSlices: 4
⏩ Received 120324096 bytes for block 0 (35907 kB/s)
⏩ Received 120324096 bytes for block 1 (49906 kB/s)
⏩ Received 120324096 bytes for block 2 (49886 kB/s)
⏩ Received 120324096 bytes for block 3 (49886 kB/s)
⏩ Received 120324096 bytes for block 4 (49906 kB/s)
⏩ Received 120324096 bytes for block 5 (49906 kB/s)
⏩ Received 120324096 bytes for block 6 (49844 kB/s)
⏩ Received 120324096 bytes for block 7 (50010 kB/s)
⏩ Received 120324096 bytes for block 8 (49886 kB/s)
⏩ Received 120324096 bytes for block 9 (49927 kB/s)
⏩ Received 120324096 bytes for block 10 (49865 kB/s)
⏩ Received 120324096 bytes for block 11 (49948 kB/s)
⏩ Received 120324096 bytes for block 12 (49906 kB/s)
⏩ Received 120324096 bytes for block 13 (49865 kB/s)
⏩ Received 120324096 bytes for block 14 (49927 kB/s)
⏩ Received 120324096 bytes for block 15 (49906 kB/s)
⏩ Received 120324096 bytes for block 16 (49906 kB/s)
⏩ Received 120324096 bytes for block 17 (49886 kB/s)
⏩ Received 120324096 bytes for block 18 (49906 kB/s)
⏩ Received 120324096 bytes for block 19 (49927 kB/s)
⏩ Received 120324096 bytes for block 20 (49906 kB/s)
⏩ Received 120324096 bytes for block 21 (49906 kB/s)
⏩ Received 120324096 bytes for block 22 (49906 kB/s)
⏩ Received 120324096 bytes for block 23 (49906 kB/s)
⏩ Received 120324096 bytes for block 24 (49906 kB/s)
⏩ Received 120324096 bytes for block 25 (49906 kB/s)
⏩ Received 120324096 bytes for block 26 (49927 kB/s)
⏩ Received 120324096 bytes for block 27 (49886 kB/s)
⏩ Received 120324096 bytes for block 28 (49886 kB/s)
⏩ Received 120324096 bytes for block 29 (49927 kB/s)
⏩ Received 120324096 bytes for block 30 (49927 kB/s)
⏩ Received 120324096 bytes for block 31 (50324 kB/s)
⏩ Received 120324096 bytes for block 32 (49906 kB/s)
⏩ Received 120324096 bytes for block 33 (49906 kB/s)
⏩ Received 120324096 bytes for block 34 (49906 kB/s)
⏩ Received 120324096 bytes for block 35 (49906 kB/s)
⏩ Received 120324096 bytes for block 36 (97904 kB/s)
⏩ Received 120324096 bytes for block 37 (518638 kB/s)
⏩ Received 120324096 bytes for block 38 (505563 kB/s)
⏩ Received 120324096 bytes for block 39 (509848 kB/s)
⏩ Received 120324096 bytes for block 40 (520884 kB/s)
⏩ Received 120324096 bytes for block 41 (523148 kB/s)
⏩ Received 120324096 bytes for block 42 (523148 kB/s)
⏩ Received 120324096 bytes for block 43 (520884 kB/s)
⏩ Received 120324096 bytes for block 44 (523148 kB/s)
⏩ Received 120324096 bytes for block 45 (520884 kB/s)
⏩ Received 120324096 bytes for block 46 (523148 kB/s)
⏩ Received 120324096 bytes for block 47 (520884 kB/s)
⏩ Received 120324096 bytes for block 48 (520884 kB/s)
⏩ Received 120324096 bytes for block 49 (495161 kB/s)
⏩ Received 120324096 bytes for block 50 (514206 kB/s)
⏩ Received 120324096 bytes for block 51 (516412 kB/s)
⏩ Received 120324096 bytes for block 52 (525433 kB/s)
⏩ Received 120324096 bytes for block 53 (518638 kB/s)
⏩ Received 120324096 bytes for block 54 (520884 kB/s)
⏩ Received 120324096 bytes for block 55 (518638 kB/s)
⏩ Received 120324096 bytes for block 56 (520884 kB/s)
⏩ Received 120324096 bytes for block 57 (520884 kB/s)
⏩ Received 120324096 bytes for block 58 (512017 kB/s)
⏩ Received 120324096 bytes for block 59 (509848 kB/s)
⏩ Received 120324096 bytes for block 60 (514206 kB/s)
⏩ Received 120324096 bytes for block 61 (518638 kB/s)
⏩ Received 120324096 bytes for block 62 (516412 kB/s)
⏩ Received 120324096 bytes for block 63 (516412 kB/s)
⏩ Received 120324096 bytes for block 64 (523148 kB/s)
⏩ Received 120324096 bytes for block 65 (523148 kB/s)
⏩ Received 120324096 bytes for block 66 (525433 kB/s)
⏩ Received 120324096 bytes for block 67 (520884 kB/s)
⏩ Received 120324096 bytes for block 68 (525433 kB/s)
⏩ Received 120324096 bytes for block 69 (520884 kB/s)
⏩ Received 120324096 bytes for block 70 (523148 kB/s)
⏩ Received 120324096 bytes for block 71 (523148 kB/s)
⏩ Received 120324096 bytes for block 72 (523148 kB/s)
⏩ Received 120324096 bytes for block 73 (520884 kB/s)
⏩ Received 120324096 bytes for block 74 (523148 kB/s)
⏩ Received 120324096 bytes for block 75 (523148 kB/s)
⏩ Received 120324096 bytes for block 76 (523148 kB/s)
⏩ Received 120324096 bytes for block 77 (520884 kB/s)
⏩ Received 120324096 bytes for block 78 (355988 kB/s)
⏩ Received 120324096 bytes for block 79 (225326 kB/s)
Error receiving data: socket closed
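Comparing this worker log with the 2-VM one above: the per-layer slice is exactly half the size once nSlices doubles, consistent with layer weights being split evenly across slices (again an inference from the numbers, not from the source):

```python
two_vm_block = 240648192   # per-layer bytes received by a worker with nSlices = 2
four_vm_block = 120324096  # per-layer bytes received by a worker with nSlices = 4

# Doubling nSlices halves each worker's share of every layer.
assert two_vm_block == 2 * four_vm_block

print(four_vm_block * 80)  # 9625927680 bytes (~9 GiB) per worker in the 4-VM run
```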
CPU Info
n4no_com@instance-1:~/distributed-llama$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 52 bits physical, 57 bits virtual
CPU(s): 30
On-line CPU(s) list: 0-29
Thread(s) per core: 2
Core(s) per socket: 15
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 25
Model: 17
Model name: AMD EPYC 9B14
Stepping: 1
CPU MHz: 2600.000
BogoMIPS: 5200.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 480 KiB
L1i cache: 480 KiB
L2 cache: 15 MiB
L3 cache: 32 MiB
NUMA node0 CPU(s): 0-29
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Mitigation; safe RET, no microcode
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext invpcid_single ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 clzero xsaveerptr wbnoinvd arat avx512vbmi umip avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm
Provider: Google Cloud
VM: c3d-highcpu-30 (30 vCPU, 15 core, 59 GB memory) europe-west1, AMD Genoa
Distributed Llama version: 0.1.1
Each VM used 16 threads.
[Charts: Average Single Token Generation Time — Llama 7B / Q40 Weights Q80 Buffer; Llama 13B / Q40 Weights Q80 Buffer; Llama 70B / Q40 Weights Q80 Buffer]