# MACOS

## CPU

Choose a way to install Rust:

* Native Rust:

  Install Rust:
  ```bash
  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  ```
  Enter a new shell and test: `rustc --version`

  When running a Mac with Intel hardware (not M1), you may run into `clang: error: the clang compiler does not support '-march=native'` during pip install. If so, set your archflags during pip install, e.g.:
  ```bash
  ARCHFLAGS="-arch x86_64" pip3 install -r requirements.txt
  ```
  If you encounter an error while building a wheel during the pip install, you may need to install a C++ compiler on your computer.

  Set up the environment:
  ```bash
  conda create -n h2ogpt python=3.10
  conda activate h2ogpt
  pip install -r requirements.txt
  ```
* Conda Rust:

  If native Rust does not work, try the conda way by creating a conda environment with Python 3.10 and Rust:
  ```bash
  conda create -n h2ogpt python=3.10 rust
  conda activate h2ogpt
  pip install -r requirements.txt
  ```

To run CPU mode with the default model, do:
```bash
python generate.py --base_model='llama' --prompt_type=wizard2 --score_model=None --langchain_mode='UserData' --user_path=user_path
```
For the above, ignore the CLI output saying 0.0.0.0, and instead point your browser at http://localhost:7860 (for Windows/Mac) or at the public live URL printed by the server (disable the shared link with `--share=False`). To support document Q/A, jump to Install Optional Dependencies.
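If you want to confirm from a script that the UI came up (here or in the MPS steps below), here is a minimal sketch. It assumes the default port 7860 mentioned above and that `generate.py` is already running in another shell:
```python
# Sketch: check that the h2oGPT web UI is reachable on the default port.
# Assumes generate.py is already running; adjust the URL if you changed the port.
import urllib.request

with urllib.request.urlopen("http://localhost:7860", timeout=5) as resp:
    print("h2oGPT UI responded with HTTP", resp.status)
```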


## GPU (MPS Mac M1)

Note: A few fixes required for MPS support are not available in torch 2.0, so make sure to install torch 2.1 from the nightly build as shown below.

1. Create a conda environment with Python 3.10 and Rust:
   ```bash
   conda create -n h2ogpt python=3.10 rust
   conda activate h2ogpt
   ```
2. Install torch dependencies from the nightly build to get the latest MPS support:
   ```bash
   pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
   ```
3. Verify whether torch uses MPS by running the Python script below (see also the optional timing sketch at the end of this section):
   ```python
   import torch
   if torch.backends.mps.is_available():
       mps_device = torch.device("mps")
       x = torch.ones(1, device=mps_device)
       print(x)
   else:
       print("MPS device not found.")
   ```
   Output:
   ```
   tensor([1.], device='mps:0')
   ```
4. Install the other h2ogpt requirements:
   ```bash
   pip install -r requirements.txt
   ```
5. Run h2oGPT (without document Q/A):
   ```bash
   python generate.py --base_model=h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b --cli=True
   ```

For the above, ignore the CLI output saying 0.0.0.0, and instead point your browser at http://localhost:7860 (for Windows/Mac) or at the public live URL printed by the server (disable the shared link with `--share=False`).

To support document Q/A, jump to Install Optional Dependencies.
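As an optional follow-up to the MPS check in step 3, the sketch below times a matrix multiply on CPU vs. MPS. It is purely illustrative: it assumes the nightly torch build installed above (which provides `torch.mps.synchronize`), and exact sizes and timings will vary by machine.
```python
# Sketch: rough CPU vs. MPS timing of a matrix multiply to confirm the GPU is used.
import time
import torch

def bench(device, n=2048, iters=20):
    x = torch.randn(n, n, device=device)
    for _ in range(3):          # warm-up
        x @ x
    if device == "mps":
        torch.mps.synchronize() # MPS ops are async; sync before/after timing
    t0 = time.time()
    for _ in range(iters):
        x @ x
    if device == "mps":
        torch.mps.synchronize()
    return time.time() - t0

print("cpu :", bench("cpu"))
if torch.backends.mps.is_available():
    print("mps :", bench("mps"))
```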


## Document Q/A dependencies

```bash
# Required for Doc Q/A: LangChain:
pip install -r reqs_optional/requirements_optional_langchain.txt
# Required for CPU: LLaMa/GPT4All:
pip install -r reqs_optional/requirements_optional_gpt4all.txt
# Optional: PyMuPDF/ArXiv:
pip install -r reqs_optional/requirements_optional_langchain.gpllike.txt
# Optional: Selenium/PlayWright:
pip install -r reqs_optional/requirements_optional_langchain.urls.txt
# Optional: for supporting unstructured package
python -m nltk.downloader all
```
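As a quick sanity check after the installs above, the sketch below simply tries to import a few packages those requirement files are expected to pull in (the exact package set is an assumption based on the file names, so a missing optional package is not necessarily an error):
```python
# Sketch: verify the optional document Q/A stack imports cleanly.
import importlib

for pkg in ("langchain", "gpt4all", "fitz", "nltk"):  # fitz is PyMuPDF's import name
    try:
        mod = importlib.import_module(pkg)
        print(pkg, "OK", getattr(mod, "__version__", ""))
    except ImportError as err:
        print(pkg, "MISSING:", err)
```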

To support Word and Excel documents, download LibreOffice: https://www.libreoffice.org/download/download-libreoffice/ . To support OCR, install tesseract and other dependencies:

```bash
brew install libmagic
brew link libmagic
brew install poppler
brew install tesseract --all-languages
```
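To confirm the brew-installed tesseract and libmagic are usable from Python, a minimal sketch follows. It assumes the `pytesseract`, `Pillow`, and `python-magic` packages are available in your environment (they are not named explicitly in this README, so install them first if the imports fail):
```python
# Sketch: confirm tesseract and libmagic work from Python after the brew installs.
import magic          # python-magic, wraps libmagic
import pytesseract    # wraps the tesseract binary
from PIL import Image, ImageDraw

print("tesseract version:", pytesseract.get_tesseract_version())

# Render a tiny test image and OCR it back (output quality depends on the default font).
img = Image.new("RGB", (220, 60), color="white")
ImageDraw.Draw(img).text((10, 20), "hello h2oGPT", fill="black")
print("OCR result:", pytesseract.image_to_string(img).strip())

# libmagic check: identify a file type from its leading bytes.
print("magic says:", magic.from_buffer(b"%PDF-1.4", mime=True))
```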

Then for document Q/A with the UI using CPU:
```bash
python generate.py --base_model='llama' --prompt_type=wizard2 --score_model=None --langchain_mode='UserData' --user_path=user_path
```
or MPS:
```bash
python generate.py --base_model=h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b --langchain_mode=MyData --score_model=None
```
For the above, ignore the CLI output saying 0.0.0.0, and instead point your browser at http://localhost:7860 (for Windows/Mac) or at the public live URL printed by the server (disable the shared link with `--share=False`).

See CPU and GPU for some other general aspects about using h2oGPT on CPU or GPU, such as which models to try.


## GPU with LLaMa

Note: Currently llama-cpp-python only supports v3 ggml 4-bit quantized models for MPS, so use LLaMa models whose names end with `ggmlv3` and `q4_x.bin`.

1. Install dependencies:
   ```bash
   # Required for Doc Q/A: LangChain:
   pip install -r reqs_optional/requirements_optional_langchain.txt
   # Required for CPU: LLaMa/GPT4All:
   pip install -r reqs_optional/requirements_optional_gpt4all.txt
   ```
2. Install the LATEST llama-cpp-python, which happily supports MacOS Metal GPU as of version 0.1.62 (you should end up with llama-cpp-python v0.1.62 or higher installed). A sketch for verifying the Metal build follows this list.
   ```bash
   pip uninstall llama-cpp-python -y
   CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install -U llama-cpp-python --no-cache-dir
   ```
3. Edit the settings below in `.env_gpt4all`:
   * Uncomment the line with `n_gpu_layers=20`
   * Change the model name to your preferred model in the line with `model_path_llama=WizardLM-7B-uncensored.ggmlv3.q8_0.bin`
4. Run the LLaMa model:
   ```bash
   python generate.py --base_model='llama' --cli=True
   ```
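To check that llama-cpp-python was really built with Metal and can offload layers (as referenced in step 2), here is a minimal sketch. It assumes you have downloaded a v3 ggml 4-bit model file locally; the filename below is only an example, and the llama.cpp load output printed on startup indicates whether GPU offload happened:
```python
# Sketch: load a local ggmlv3 q4 model with GPU offload via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="WizardLM-7B-uncensored.ggmlv3.q4_0.bin",  # example filename; point at a file you actually have
    n_gpu_layers=20,  # mirrors the n_gpu_layers setting in .env_gpt4all
)
out = llm("Q: Name one planet in the solar system. A:", max_tokens=16)
print(out["choices"][0]["text"])
```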