1 VM
n4no_com@instance-3:~/distributed-llama$ sudo nice -n -20 ./main inference --model ../dllama_llama-2-7b_q40.bin --tokenizer ../tokenizer.bin --weights-float-type q40 --buffer-float-type q80 --prompt "Hello world" --steps 16 --nthreads 16
💡 dim: 4096
💡 hiddenDim: 11008
💡 nLayers: 32
💡 nHeads: 32
💡 nKvHeads: 32
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 1
⏩ Loaded 4242882560 bytes
🔶 G 133 ms I 133 ms T 0 ms S 0 kB R 0 kB Hello
🔶 G 131 ms I 130 ms T 0 ms S 0 kB R 0 kB world
🔶 G 130 ms I 129 ms T 1 ms S 0 kB R 0 kB !
🔶 G 131 ms I 129 ms T 1 ms S 0 kB R 0 kB I
🔶 G 131 ms I 130 ms T 0 ms S 0 kB R 0 kB am
🔶 G 131 ms I 130 ms T 0 ms S 0 kB R 0 kB so
🔶 G 131 ms I 130 ms T 0 ms S 0 kB R 0 kB glad
🔶 G 131 ms I 131 ms T 0 ms S 0 kB R 0 kB to
🔶 G 131 ms I 131 ms T 0 ms S 0 kB R 0 kB have
🔶 G 132 ms I 132 ms T 0 ms S 0 kB R 0 kB found
🔶 G 132 ms I 131 ms T 0 ms S 0 kB R 0 kB you
🔶 G 131 ms I 131 ms T 0 ms S 0 kB R 0 kB !
🔶 G 132 ms I 132 ms T 0 ms S 0 kB R 0 kB I
🔶 G 133 ms I 132 ms T 1 ms S 0 kB R 0 kB ’
🔶 G 136 ms I 135 ms T 1 ms S 0 kB R 0 kB m
🔶 G 132 ms I 131 ms T 1 ms S 0 kB R 0 kB really
Generated tokens: 16
Avg generation time: 131.75 ms
Avg inference time: 131.06 ms
Avg transfer time: 0.31 ms
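As a sanity check, the three averages above can be recomputed from the per-token rows. The sketch below assumes that the G, I and T columns are the generation, inference and transfer times in milliseconds, which is how the summary lines label them. The `parse_row` and `averages` helpers and the regular expression are illustrative names, not part of distributed-llama.

```python
import re

# Assumed row layout, taken from the log above:
#   "G <ms> ms I <ms> ms T <ms> ms S <kB> kB R <kB> kB <token>"
ROW = re.compile(r"G (\d+) ms I (\d+) ms T (\d+) ms S (\d+) kB R (\d+) kB")

# The 16 per-token rows of the 1 VM run, copied from the log.
LOG = """\
🔶 G 133 ms I 133 ms T 0 ms S 0 kB R 0 kB Hello
🔶 G 131 ms I 130 ms T 0 ms S 0 kB R 0 kB world
🔶 G 130 ms I 129 ms T 1 ms S 0 kB R 0 kB !
🔶 G 131 ms I 129 ms T 1 ms S 0 kB R 0 kB I
🔶 G 131 ms I 130 ms T 0 ms S 0 kB R 0 kB am
🔶 G 131 ms I 130 ms T 0 ms S 0 kB R 0 kB so
🔶 G 131 ms I 130 ms T 0 ms S 0 kB R 0 kB glad
🔶 G 131 ms I 131 ms T 0 ms S 0 kB R 0 kB to
🔶 G 131 ms I 131 ms T 0 ms S 0 kB R 0 kB have
🔶 G 132 ms I 132 ms T 0 ms S 0 kB R 0 kB found
🔶 G 132 ms I 131 ms T 0 ms S 0 kB R 0 kB you
🔶 G 131 ms I 131 ms T 0 ms S 0 kB R 0 kB !
🔶 G 132 ms I 132 ms T 0 ms S 0 kB R 0 kB I
🔶 G 133 ms I 132 ms T 1 ms S 0 kB R 0 kB ’
🔶 G 136 ms I 135 ms T 1 ms S 0 kB R 0 kB m
🔶 G 132 ms I 131 ms T 1 ms S 0 kB R 0 kB really
"""

def parse_row(line):
    """Return (generation_ms, inference_ms, transfer_ms), or None if no match."""
    m = ROW.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), int(m.group(3))

def averages(lines):
    """Average each timing column over all rows that parse."""
    rows = [r for r in map(parse_row, lines) if r is not None]
    return tuple(sum(col) / len(rows) for col in zip(*rows))

avg_g, avg_i, avg_t = averages(LOG.splitlines())
print(f"generation {avg_g:.2f} ms, inference {avg_i:.2f} ms, transfer {avg_t:.2f} ms")
```

The unrounded results are 131.75, 131.0625 and 0.3125 ms, which agree with the reported averages.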
2 VMs
n4no_com@instance-3:~/distributed-llama$ sudo nice -n -20 ./main inference --model ../dllama_llama-2-7b_q40.bin --tokenizer ../tokenizer.bin --weights-float-type q40 --buffer-float-type q80 --prompt "Hello world" --steps 16 --nthreads 16 --workers 10.164.0.3:9998
💡 dim: 4096
💡 hiddenDim: 11008
💡 nLayers: 32
💡 nHeads: 32
💡 nKvHeads: 32
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 2
⏩ Loaded 4242882560 bytes
🔶 G 80 ms I 76 ms T 4 ms S 1779278 kB R 522 kB Hello
🔶 G 79 ms I 63 ms T 16 ms S 590 kB R 522 kB world
🔶 G 80 ms I 64 ms T 16 ms S 590 kB R 522 kB ,
🔶 G 81 ms I 66 ms T 14 ms S 590 kB R 522 kB I
🔶 G 80 ms I 70 ms T 9 ms S 590 kB R 522 kB '
🔶 G 81 ms I 72 ms T 8 ms S 590 kB R 522 kB m
🔶 G 81 ms I 72 ms T 8 ms S 590 kB R 522 kB Chris
🔶 G 81 ms I 73 ms T 8 ms S 590 kB R 522 kB .
🔶 G 84 ms I 75 ms T 6 ms S 590 kB R 522 kB prü
🔶 G 81 ms I 73 ms T 7 ms S 590 kB R 522 kB m
🔶 G 81 ms I 76 ms T 5 ms S 590 kB R 522 kB .
🔶 G 82 ms I 74 ms T 7 ms S 590 kB R 522 kB
🔶 G 81 ms I 76 ms T 5 ms S 590 kB R 522 kB I
🔶 G 82 ms I 73 ms T 8 ms S 590 kB R 522 kB '
🔶 G 82 ms I 74 ms T 7 ms S 590 kB R 522 kB m
🔶 G 82 ms I 74 ms T 8 ms S 590 kB R 522 kB an
Generated tokens: 16
Avg generation time: 81.12 ms
Avg inference time: 71.94 ms
Avg transfer time: 8.50 ms
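To put the two runs side by side, the reported averages can be turned into throughput and speedup figures. This is a minimal sketch that uses only the averages printed above; the variable names are illustrative, and "efficiency" here simply means speedup divided by the number of VMs.

```python
# Averages copied from the two runs above (milliseconds per token).
one_vm = {"generation": 131.75, "inference": 131.06, "transfer": 0.31}
two_vm = {"generation": 81.12, "inference": 71.94, "transfer": 8.50}

def tokens_per_second(ms_per_token):
    """Convert a per-token latency in milliseconds to tokens per second."""
    return 1000.0 / ms_per_token

# End-to-end speedup from adding the second VM.
speedup = one_vm["generation"] / two_vm["generation"]
# Speedup of the compute part alone, ignoring network transfer.
inference_speedup = one_vm["inference"] / two_vm["inference"]
# Fraction of ideal linear scaling achieved with 2 VMs.
efficiency = speedup / 2
# Share of each token's generation time spent on transfer in the 2 VM run.
transfer_share = two_vm["transfer"] / two_vm["generation"]

print(f"1 VM:  {tokens_per_second(one_vm['generation']):.2f} tokens/s")
print(f"2 VMs: {tokens_per_second(two_vm['generation']):.2f} tokens/s")
print(f"speedup: {speedup:.2f}x (inference only: {inference_speedup:.2f}x)")
print(f"efficiency: {efficiency:.0%}, transfer share: {transfer_share:.1%}")
```

With these inputs the second VM gives roughly a 1.62x speedup in generation time (about 7.6 to 12.3 tokens/s), with around a tenth of each token's time spent on transfer.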
n4no_com@instance-4:~/distributed-llama$ sudo nice -n -20 ./main worker --port 9998 --nthreads 16
Listening on 0.0.0.0:9998...
Client connected
💡 sliceIndex: 1
💡 nSlices: 2
⏩ Received 56918016 bytes for block 0 (1459436 kB/s)
⏩ Received 56918016 bytes for block 1 (1459436 kB/s)
⏩ Received 56918016 bytes for block 2 (1422950 kB/s)
⏩ Received 56918016 bytes for block 3 (1459436 kB/s)
⏩ Received 56918016 bytes for block 4 (1422950 kB/s)
⏩ Received 56918016 bytes for block 5 (1459436 kB/s)
⏩ Received 56918016 bytes for block 6 (1459436 kB/s)
⏩ Received 56918016 bytes for block 7 (1422950 kB/s)
⏩ Received 56918016 bytes for block 8 (1422950 kB/s)
⏩ Received 56918016 bytes for block 9 (1459436 kB/s)
⏩ Received 56918016 bytes for block 10 (1422950 kB/s)
⏩ Received 56918016 bytes for block 11 (1459436 kB/s)
⏩ Received 56918016 bytes for block 12 (1459436 kB/s)
⏩ Received 56918016 bytes for block 13 (1459436 kB/s)
⏩ Received 56918016 bytes for block 14 (1422950 kB/s)
⏩ Received 56918016 bytes for block 15 (1459436 kB/s)
⏩ Received 56918016 bytes for block 16 (1459436 kB/s)
⏩ Received 56918016 bytes for block 17 (1422950 kB/s)
⏩ Received 56918016 bytes for block 18 (1459436 kB/s)
⏩ Received 56918016 bytes for block 19 (1459436 kB/s)
⏩ Received 56918016 bytes for block 20 (1422950 kB/s)
⏩ Received 56918016 bytes for block 21 (1422950 kB/s)
⏩ Received 56918016 bytes for block 22 (1459436 kB/s)
⏩ Received 56918016 bytes for block 23 (1459436 kB/s)
⏩ Received 56918016 bytes for block 24 (1422950 kB/s)
⏩ Received 56918016 bytes for block 25 (1459436 kB/s)
⏩ Received 56918016 bytes for block 26 (1459436 kB/s)
⏩ Received 56918016 bytes for block 27 (1459436 kB/s)
⏩ Received 56918016 bytes for block 28 (1422950 kB/s)
⏩ Received 56918016 bytes for block 29 (1459436 kB/s)
⏩ Received 56918016 bytes for block 30 (1422950 kB/s)
⏩ Received 56918016 bytes for block 31 (1422950 kB/s)
Error receiving data: socket closed
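The worker log also lines up with the large S value on the first token of the 2 VMs run. The arithmetic below assumes that the log's kB means 1024 bytes; under that assumption, the 32 blocks received by the worker plus the steady-state 590 kB add up exactly to the 1779278 kB reported for the first token. This suggests, but does not prove, that the first-token S figure includes the one-time upload of the worker's weight slice.

```python
BLOCK_BYTES = 56_918_016   # "Received 56918016 bytes for block N"
N_BLOCKS = 32              # blocks 0..31 in the worker log
STEADY_SENT_KB = 590       # S column for every token after the first
FIRST_SENT_KB = 1_779_278  # S column for the first token

BYTES_PER_KB = 1024        # assumption: the log's kB is 1024 bytes

slice_bytes = BLOCK_BYTES * N_BLOCKS
slice_kb = slice_bytes // BYTES_PER_KB

print(f"weights sent to worker: {slice_bytes} bytes = {slice_kb} kB")
print(f"plus per-token traffic: {slice_kb + STEADY_SENT_KB} kB")
print(f"matches first-token S:  {slice_kb + STEADY_SENT_KB == FIRST_SENT_KB}")
```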
CPU Info
CPU:
n4no_com@instance-4:~/distributed-llama$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 52 bits physical, 57 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 25
Model: 17
Model name: AMD EPYC 9B14
Stepping: 1
CPU MHz: 2599.998
BogoMIPS: 5199.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 256 KiB
L1i cache: 256 KiB
L2 cache: 8 MiB
L3 cache: 32 MiB
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Mitigation; safe RET, no microcode
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext invpcid_single ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 clzero xsaveerptr wbnoinvd arat avx512vbmi umip avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm
Provider: Google Cloud
VM: c3d-standard-16 (16 vCPU, 8 cores, 64 GB memory)
europe-west4, AMD Genoa
Average Single Token Generation Time
Llama 7B Q40 Weights Q80 Buffer