
feat: windows support #63

Merged: 34 commits, May 27, 2024
Conversation

DifferentialityDevelopment (Contributor)

I was able to get it working on Windows; I still need to do some cleanup and more thorough testing.

DifferentialityDevelopment (Contributor Author)

I do need to make some alterations, as it doesn't currently build on Linux, but the changes should be minor.
I've confirmed that Windows seems to work correctly, at least.

b4rtaz (Owner) commented May 24, 2024

Nice job @DifferentialityDevelopment!

First thoughts:

  • we should not commit binary files,
  • I don't think we need the pthreads-w32 dependency; this project uses just two functions from pthread.h. You can easily replace them with CreateThread and WaitForSingleObject from windows.h. You can check how it's done in llama.cpp (a rough sketch follows this list).
  • I'm not sure we need the allocateAndLockMemory function. I didn't observe any significant improvement from using posix_memalign, so maybe it could be removed. Maybe calloc would be enough? Or maybe we need some tests here.
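
For reference, the pthread_create/pthread_join pair maps onto the Win32 calls roughly as below. This is only a hedged sketch with hypothetical names (dl_thread, threadStart, threadJoin), not the code that ended up in the project:

#ifdef _WIN32
#include <windows.h>

typedef HANDLE dl_thread;

// CreateThread expects a DWORD WINAPI routine(LPVOID), so on Windows the worker
// function has to use (or be wrapped in) that signature.
static int threadStart(dl_thread* thread, LPTHREAD_START_ROUTINE routine, void* arg) {
    *thread = CreateThread(NULL, 0, routine, arg, 0, NULL);
    return (*thread == NULL) ? 1 : 0;
}

static void threadJoin(dl_thread thread) {
    WaitForSingleObject(thread, INFINITE); // equivalent of pthread_join
    CloseHandle(thread);
}
#else
#include <pthread.h>

typedef pthread_t dl_thread;

static int threadStart(dl_thread* thread, void* (*routine)(void*), void* arg) {
    return pthread_create(thread, NULL, routine, arg);
}

static void threadJoin(dl_thread thread) {
    pthread_join(thread, NULL);
}
#endif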

DifferentialityDevelopment (Contributor Author)

Yeah, the changes in utils.cpp were messy; I haven't yet gotten around to working my way backwards so that it changes the least amount of code while still working.
Malloc just wouldn't work with the size of what needed to be allocated, and I wasn't having much luck with the Windows API functions for allocating large amounts of memory, so I ended up just using the simple vector approach, which worked like a charm (sketched below).

It's a first working draft; I can keep refining it to remove the dependency on pthreads-win32.
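
The vector approach essentially means letting std::vector own the large allocation instead of calling malloc or VirtualAlloc directly. A minimal sketch of the idea, with illustrative names rather than the actual utils.cpp code:

#include <cstddef>
#include <vector>

// The vector owns the multi-gigabyte allocation, zero-initializes it and frees it
// automatically; a raw pointer into it can be handed to the rest of the code.
struct LargeBuffer {
    std::vector<char> storage;
    char* data;
    LargeBuffer(size_t bytes) : storage(bytes), data(storage.data()) {}
};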

DifferentialityDevelopment (Contributor Author)

I still need to do some more testing and refinement, but it at least builds again now on both Linux and Windows; I also removed the pthreads-win32 dependency.

[Resolved review comments on src/socket.cpp (×3) and Makefile]
b4rtaz (Owner) commented May 25, 2024

Can we add Windows to .github/workflows/main.yml?

DifferentialityDevelopment (Contributor Author)

> Can we add Windows to .github/workflows/main.yml?

That would be great; then we'd know if a change breaks either platform :)
I'll do some refactoring on the code tonight to clean it up a bit.

DifferentialityDevelopment (Contributor Author)

I think the main things left to refactor are the changes in transformers.cpp & utils.cpp

DifferentialityDevelopment (Contributor Author) commented May 25, 2024

I refactored utils.cpp. gracefullyAllocateBuffer acts as a fallback for allocating the memory buffer, which also sorts out the weirdness that happens if you don't run distributed-llama as sudo.

I was able to run dllama on Linux without sudo using this approach:

./dllama inference --model /mnt/d/Meta-Llama-3-8B-Instruct-Distributed/dllama_original_q40.bin --tokenizer /mnt/d/Meta-Llama-3-8B-Instruct-Distributed/dllama-llama3-tokenizer.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Hello world"
💡 arch: llama2
💡 dim: 4096
💡 hiddenDim: 14336
💡 nLayers: 32
💡 nHeads: 32
💡 nKvHeads: 8
💡 vocabSize: 128256
💡 seqLen: 2048
💡 nSlices: 1
💡 ropeTheta: 500000.0
📄 bosId: 128000
📄 eosId: 128001
mmap succeeded. data = 0x7f8c7260a000
weights = 0x7f8c7260a060
🕒 ropeCache: 32768 kB
⏩ Loaded 6175568 kB
🔶 G 421 ms I 421 ms T 0 ms S 0 kB R 0 kB Hello
🔶 G 382 ms I 382 ms T 0 ms S 0 kB R 0 kB world
🔶 G 421 ms I 420 ms T 0 ms S 0 kB R 0 kB !
🔶 G 385 ms I 384 ms T 0 ms S 0 kB R 0 kB This
🔶 G 390 ms I 389 ms T 0 ms S 0 kB R 0 kB is
🔶 G 377 ms I 377 ms T 0 ms S 0 kB R 0 kB a
🔶 G 389 ms I 387 ms T 1 ms S 0 kB R 0 kB test
🔶 G 395 ms I 395 ms T 0 ms S 0 kB R 0 kB of
🔶 G 381 ms I 380 ms T 1 ms S 0 kB R 0 kB the
🔶 G 376 ms I 374 ms T 1 ms S 0 kB R 0 kB emergency
🔶 G 453 ms I 451 ms T 2 ms S 0 kB R 0 kB broadcast
🔶 G 421 ms I 420 ms T 1 ms S 0 kB R 0 kB system
🔶 G 423 ms I 421 ms T 1 ms S 0 kB R 0 kB .
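
For context, the usual reason locked allocations misbehave without sudo is the locked-memory limit, so a fallback that drops the lock rather than failing is enough. A rough sketch of that fallback idea, with hypothetical names rather than the actual gracefullyAllocateBuffer implementation:

#include <cstdio>
#include <cstdlib>
#ifndef _WIN32
#include <sys/mman.h>
#endif

static void* allocateBuffer(size_t bytes) {
#ifndef _WIN32
    void* ptr = NULL;
    if (posix_memalign(&ptr, 4096, bytes) == 0) {
        if (mlock(ptr, bytes) != 0) {
            // Without root (or a raised RLIMIT_MEMLOCK), mlock fails for large buffers;
            // keep the allocation anyway, just not pinned in RAM.
            fprintf(stderr, "mlock failed, continuing with unlocked memory\n");
        }
        return ptr;
    }
#endif
    return calloc(bytes, 1); // plain zeroed allocation as the last resort
}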

DifferentialityDevelopment (Contributor Author)

I've updated the readme as well.

b4rtaz (Owner) commented May 27, 2024

Please let me know when I can review again (for example, main.yml is still not updated).

DifferentialityDevelopment (Contributor Author)

> Please let me know when I can review again (for example, main.yml is still not updated).

You're welcome to review again.

I don't have much experience with GitHub workflows, but I'll try to update main.yml to include a Windows build 👍

[Resolved review comment on .github/workflows/main.yml]

DifferentialityDevelopment (Contributor Author)

Don't know how I swapped those two around! Must have been by accident.

b4rtaz changed the title from "feat: Windows support" to "feat: windows support" on May 27, 2024
b4rtaz (Owner) commented May 27, 2024

I reverted .github/workflows/main.yml to the previous approach. The new approach didn't test all Linux CPUs.

b4rtaz merged commit 2fa9d9f into b4rtaz:main on May 27, 2024 (3 checks passed)
DifferentialityDevelopment deleted the windows-support branch on May 28, 2024
DifferentialityDevelopment (Contributor Author) commented May 28, 2024

@b4rtaz It seems you removed a bit of code that was necessary in transformers.cpp

#ifdef _WIN32
#define ftell(fp) _ftelli64(fp)
#define fseek(fp, offset, origin) _fseeki64(fp, offset, origin)
#endif

Without it I am unable to load the model files (on Windows, long is 32-bit, so the standard ftell/fseek cannot handle offsets in files larger than 2 GiB).

./dllama-api.exe --model D:\openchat-3.6-8b-20240522-distributed\dllama_model_openchat-3.6-8b-20240522_q40.m --tokenizer D:\openchat-3.6-8b-20240522-distributed\dllama_tokenizer_llama3.t --weights-float-type q40 --buffer-float-type q80 --nthreads 8 --chat-template openchat3 --port 10111
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 4096
💡 hiddenDim: 14336
💡 nLayers: 32
💡 nHeads: 32
💡 nKvHeads: 8
💡 vocabSize: 128256
💡 seqLen: 8192
💡 nSlices: 1
💡 ropeTheta: 500000.0
terminate called after throwing an instance of 'std::runtime_error'
what(): Error determining model file size

If I add it back in then it works.

b4rtaz (Owner) commented May 28, 2024

@DifferentialityDevelopment ah, sorry! But how is this able to compile now? If #define ftell(fp) _ftelli64(fp) overrides anything, I think it would be better to create our own function, like seekToEnd or similar (with #ifdef _WIN32 ... #else).

DifferentialityDevelopment (Contributor Author) commented May 28, 2024

> @DifferentialityDevelopment ah, sorry! But how is this able to compile now? If #define ftell(fp) _ftelli64(fp) overrides anything, I think it would be better to create our own function, like seekToEnd or similar (with #ifdef _WIN32 ... #else).

It compiles fine because the arguments of both function pairs are the same. On Windows it's usually better to use _ftelli64 and _fseeki64 instead of the standard C functions, since their offsets are 64-bit.

That said, if anything else happened to use the name ftell it would be rewritten by the macro, which I guess is where you're coming from.

Maybe something like:

#include <cstdio>
#include <cstdint>

#ifdef _WIN32
// _ftelli64/_fseeki64 take 64-bit offsets; plain ftell/fseek use a 32-bit long on Windows.
static inline int64_t fileGetLength(FILE* fp) {
    return _ftelli64(fp);
}
static inline int fileSeekEnd(FILE* fp, int64_t offset, int origin) {
    return _fseeki64(fp, offset, origin);
}
#else
// On 64-bit Linux/macOS, long is 64-bit, so the standard functions already handle large files.
static inline int64_t fileGetLength(FILE* fp) {
    return ftell(fp);
}
static inline int fileSeekEnd(FILE* fp, int64_t offset, int origin) {
    return fseek(fp, offset, origin);
}
#endif
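
With wrappers like these, getting the model file size would then look something like the following (a sketch only, not the actual model-loading code):

fileSeekEnd(fp, 0, SEEK_END);     // jump to the end of the model file
int64_t size = fileGetLength(fp); // 64-bit position works for files larger than 2 GiB
fileSeekEnd(fp, 0, SEEK_SET);     // rewind before reading the weights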

b4rtaz (Owner) commented May 28, 2024

Does this PR solve the problem?

DifferentialityDevelopment (Contributor Author)

> Does this PR solve the problem?

I'll give it a try, but it looks like it should work just fine 👍

DifferentialityDevelopment (Contributor Author)

> Does this PR solve the problem?
>
> I'll give it a try, but it looks like it should work just fine 👍

Not sure what's going on now; I've just tried it out, but I'm getting a "Cannot open file" error. I'll check if I did something wrong.

DifferentialityDevelopment (Contributor Author) commented May 28, 2024

@b4rtaz I figured it out: you forgot to add fclose at the end of loadSpecFromFile.

Other than that it works perfectly, thank you!
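
In other words, the fix is just closing the FILE* before the function returns. A simplified sketch of the shape of the function, not the actual loadSpecFromFile body:

#include <cstdio>
#include <stdexcept>

void loadSpec(const char* path) {
    FILE* fp = fopen(path, "rb");
    if (fp == NULL) throw std::runtime_error("Cannot open file");
    // ... read the spec/header fields ...
    fclose(fp); // the missing call; releases the file handle once the spec is loaded
}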

b4rtaz (Owner) commented May 28, 2024

@DifferentialityDevelopment thanks for the help, and sorry for the problem. Probably I need a Windows environment.
