feat: windows support #63
Conversation
I do need to make some alterations as it doesn't currently build on Linux, but they should be minor changes.
Nice job @DifferentialityDevelopment! First thoughts:
Yeah, the changes in utils.cpp were messy; I haven't yet gotten around to working my way backwards so that it changes the least amount of code while still working. It's a first working draft; I can modify it from here to remove the dependency on pthreads-win32.
Still need to do some more testing and more refinement, but it does at least build again now on both Linux and Windows. I also removed the pthreads-win32 dependency.
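The comments here don't say how the pthreads-win32 dependency was removed; one common portable route (an assumption for illustration, not necessarily what this PR did) is to replace raw pthread calls with std::thread:

```cpp
#include <thread>
#include <vector>

// Hypothetical sketch: spawn and join worker threads with std::thread,
// which works on both Linux and Windows without a pthreads-win32 shim.
void runWorkers(int nThreads, void (*worker)(int threadIndex)) {
    std::vector<std::thread> threads;
    threads.reserve(nThreads);
    for (int i = 0; i < nThreads; i++)
        threads.emplace_back(worker, i);  // replaces pthread_create
    for (std::thread& t : threads)
        t.join();                         // replaces pthread_join
}
```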
Can we add Windows to the CI workflow in main.yml?
That would be great, then we know if a change breaks either platform :)
I think the main things left to refactor are the changes in transformers.cpp & utils.cpp.
I refactored utils.cpp. gracefullyAllocateBuffer acts as a fallback for allocating the memory buffer, which also sorts out the weirdness that happens if you don't run distributed-llama as sudo. Using this approach I was able to run dllama on Linux without sudo:

```
./dllama inference --model /mnt/d/Meta-Llama-3-8B-Instruct-Distributed/dllama_original_q40.bin --tokenizer /mnt/d/Meta-Llama-3-8B-Instruct-Distributed/dllama-llama3-tokenizer.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Hello world"
```
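For context on why sudo mattered: locked memory is capped by RLIMIT_MEMLOCK, which is small for non-root users, so an allocation that insists on locking pages fails without sudo. A minimal sketch of the fallback idea (the function name comes from the comment above, but the body is a hypothetical illustration, not the PR's actual code):

```cpp
#include <cstdio>
#include <cstdlib>
#ifdef _WIN32
    #include <windows.h>
#else
    #include <sys/mman.h>
#endif

// Hypothetical sketch: prefer a locked (non-swappable) buffer, but fall
// back gracefully instead of aborting when locking is not permitted,
// e.g. when RLIMIT_MEMLOCK is low because the process is not run as sudo.
void* gracefullyAllocateBuffer(size_t bytes) {
#ifdef _WIN32
    void* ptr = VirtualAlloc(NULL, bytes, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (ptr != NULL) {
        if (!VirtualLock(ptr, bytes))
            fprintf(stderr, "VirtualLock failed, continuing with an unlocked buffer\n");
        return ptr;
    }
#else
    void* ptr = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ptr != MAP_FAILED) {
        if (mlock(ptr, bytes) != 0)
            fprintf(stderr, "mlock failed, continuing with an unlocked buffer\n");
        return ptr;
    }
#endif
    // Last resort: a plain heap allocation.
    return malloc(bytes);
}
```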
I've updated the readme as well. |
Please let me know when I can review again (for example, main.yml is still not updated).
You're welcome to review again. I don't have much experience with GitHub workflows, but I'll try to update main.yml to include a Windows build 👍
Don't know how I swapped those two around! Must have been by accident.
I reverted it.
@b4rtaz It seems you removed a bit of code that was necessary in transformers.cpp (the #ifdef _WIN32 block). Without it I am unable to load the model files:

```
./dllama-api.exe --model D:\openchat-3.6-8b-20240522-distributed\dllama_model_openchat-3.6-8b-20240522_q40.m --tokenizer D:\openchat-3.6-8b-20240522-distributed\dllama_tokenizer_llama3.t --weights-float-type q40 --buffer-float-type q80 --nthreads 8 --chat-template openchat3 --port 10111
```

If I add it back in then it works.
@DifferentialityDevelopment ah, sorry! But how is this able to compile now? If …
It compiles fine because the arguments for both functions are the same, and on Windows it's usually better to use _ftelli64 and _fseeki64 instead of the standard C functions anyway. That said, if some other identifier happens to be named ftell it would be rewritten by the macro, which I guess is where you're coming from. Maybe something like: #ifdef _WIN32 …
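The suggested snippet above appears truncated. A minimal sketch of the kind of guard being described (reconstructed from the discussion, not taken from the PR) might be:

```cpp
// Hypothetical sketch: on Windows, long is 32-bit, so the standard
// ftell/fseek cannot address offsets past 2 GB in large model files;
// _ftelli64/_fseeki64 take the same arguments but use 64-bit offsets.
#ifdef _WIN32
    #define ftell(stream) _ftelli64(stream)
    #define fseek(stream, offset, origin) _fseeki64(stream, offset, origin)
#endif
```

A nice property of the function-like form is that it only expands when the name is followed by a parenthesis, so a plain identifier named ftell elsewhere in the code would be left alone, which addresses the overwrite concern raised above.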
Does this PR solve the problem?
I'll give it a try, but it looks like it should work just fine 👍
Not sure what's going on exactly now. I've just tried it out, but I'm getting a "Cannot open file" error; I'll check if I did something wrong.
@b4rtaz Other than that it works perfectly, thank you!
@DifferentialityDevelopment thanks for the help, and sorry for the problem. Probably I need a Windows environment.
Was able to get it working on Windows; still need to do some cleaning up and more thorough testing.