Run other Models
Already have a model file? Skip to Run models manually.
To load models into LocalAI, you can either install them manually or have LocalAI pull a model from an external source, such as Hugging Face, and configure it.
To do that, point LocalAI to the URL of a YAML configuration file. LocalAI also embeds the configurations of some popular models in its binary. The pre-built model configurations are listed below; see Model customization for how to configure models from URLs.
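As a rough sketch of the URL-based approach, the helper below prints (it does not run) a docker command that passes a configuration URL where a model name would normally go. The function name `localai_cmd_from_url` and the example.com URL are hypothetical placeholders, the default image is the CPU tag used in the tables on this page, and the sketch assumes a URL is accepted in the same position as an embedded model name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a docker command that starts LocalAI with a
# model configuration fetched from a URL. Nothing is executed here, so the
# command can be reviewed before it is run.
# Usage: localai_cmd_from_url CONFIG_URL [IMAGE]
localai_cmd_from_url() {
  local config_url="$1"
  # Assumption: default to the CPU image tag listed on this page.
  local image="${2:-localai/localai:v2.23.0-ffmpeg-core}"
  printf 'docker run -ti -p 8080:8080 %s %s\n' "$image" "$config_url"
}

# Placeholder URL, not a real configuration file.
localai_cmd_from_url "https://example.com/my-model.yaml"
```

Printing the command instead of executing it keeps the sketch safe to try on a machine without Docker installed.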
There are different categories of models: LLM, Multimodal LLM, Embeddings, Audio to Text, Text to Audio, and Text to Image, depending on the backend being used and the model architecture.
💡 To customize the models, see Model customization. For more model configurations, visit the Examples Section; the configurations for the models below are available here.
💡 Don't need GPU acceleration? Use the CPU images, which are lighter and do not have Nvidia dependencies.
Model | Category | Docker command |
---|---|---|
phi-2 | LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core phi-2 |
🌋 bakllava | Multimodal LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core bakllava |
🌋 llava-1.5 | Multimodal LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core llava-1.5 |
🌋 llava-1.6-mistral | Multimodal LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core llava-1.6-mistral |
🌋 llava-1.6-vicuna | Multimodal LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core llava-1.6-vicuna |
mistral-openorca | LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core mistral-openorca |
bert-cpp | Embeddings | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core bert-cpp |
all-minilm-l6-v2 | Embeddings | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg all-minilm-l6-v2 |
whisper-base | Audio to Text | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core whisper-base |
rhasspy-voice-en-us-amy | Text to Audio | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core rhasspy-voice-en-us-amy |
🐸 coqui | Text to Audio | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg coqui |
🐶 bark | Text to Audio | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg bark |
🔊 vall-e-x | Text to Audio | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg vall-e-x |
mixtral-instruct (Mixtral-8x7B-Instruct-v0.1) | LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core mixtral-instruct |
tinyllama-chat (original model) | LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core tinyllama-chat |
dolphin-2.5-mixtral-8x7b | LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core dolphin-2.5-mixtral-8x7b |
🐍 mamba | LLM | GPU-only |
animagine-xl | Text to Image | GPU-only |
transformers-tinyllama | LLM | GPU-only |
codellama-7b (with transformers) | LLM | GPU-only |
codellama-7b-gguf (with llama.cpp) | LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core codellama-7b-gguf |
hermes-2-pro-mistral | LLM | docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core hermes-2-pro-mistral |
To find out which version of CUDA is available on your system, run
nvidia-smi
or
nvcc --version
See also GPU acceleration.
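The version check can be turned into a small script. The sketch below parses the "CUDA Version" field from the header of the `nvidia-smi` output and maps the major version to one of the image tags used in the tables on this page. Both function names are hypothetical. It also assumes that a driver reporting CUDA 12 or newer can run the CUDA 12 images, and that falling back to the CPU image is acceptable when no CUDA version is detected.

```shell
#!/usr/bin/env bash
# Extract the CUDA major version from `nvidia-smi` output read on stdin.
# The header of that output contains a field such as "CUDA Version: 12.2".
cuda_major_version() {
  sed -n 's/.*CUDA Version: \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Map a CUDA major version to one of the image tags listed on this page.
localai_image_for_cuda() {
  local major="$1"
  case "$major" in
    ''|*[!0-9]*) major=0 ;;  # empty or non-numeric: treat as "no CUDA found"
  esac
  if [ "$major" -ge 12 ]; then
    # Assumption: drivers reporting CUDA 12 or newer can run the CUDA 12 images.
    echo "localai/localai:v2.23.0-cublas-cuda12-core"
  elif [ "$major" -eq 11 ]; then
    echo "localai/localai:v2.23.0-cublas-cuda11-core"
  else
    # Assumption: fall back to the CPU image when no usable CUDA is detected.
    echo "localai/localai:v2.23.0-ffmpeg-core"
  fi
}

# Usage (requires an Nvidia driver to be installed):
#   localai_image_for_cuda "$(nvidia-smi | cuda_major_version)"
```

Note that `nvidia-smi` reports the highest CUDA version the installed driver supports, which is not necessarily the toolkit version that `nvcc --version` prints.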
Model | Category | Docker command |
---|---|---|
phi-2 | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core phi-2 |
🌋 bakllava | Multimodal LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core bakllava |
🌋 llava-1.5 | Multimodal LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core llava-1.5 |
🌋 llava-1.6-mistral | Multimodal LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core llava-1.6-mistral |
🌋 llava-1.6-vicuna | Multimodal LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core llava-1.6-vicuna |
mistral-openorca | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core mistral-openorca |
bert-cpp | Embeddings | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core bert-cpp |
all-minilm-l6-v2 | Embeddings | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11 all-minilm-l6-v2 |
whisper-base | Audio to Text | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core whisper-base |
rhasspy-voice-en-us-amy | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core rhasspy-voice-en-us-amy |
🐸 coqui | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11 coqui |
🐶 bark | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11 bark |
🔊 vall-e-x | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11 vall-e-x |
mixtral-instruct (Mixtral-8x7B-Instruct-v0.1) | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core mixtral-instruct |
tinyllama-chat (original model) | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core tinyllama-chat |
dolphin-2.5-mixtral-8x7b | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core dolphin-2.5-mixtral-8x7b |
🐍 mamba | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11 mamba-chat |
animagine-xl | Text to Image | docker run -ti -p 8080:8080 -e COMPEL=0 --gpus all localai/localai:v2.23.0-cublas-cuda11 animagine-xl |
transformers-tinyllama | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11 transformers-tinyllama |
codellama-7b | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11 codellama-7b |
codellama-7b-gguf | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core codellama-7b-gguf |
hermes-2-pro-mistral | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda11-core hermes-2-pro-mistral |
To find out which version of CUDA is available on your system, run
nvidia-smi
or
nvcc --version
See also GPU acceleration.
Model | Category | Docker command |
---|---|---|
phi-2 | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core phi-2 |
🌋 bakllava | Multimodal LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core bakllava |
🌋 llava-1.5 | Multimodal LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core llava-1.5 |
🌋 llava-1.6-mistral | Multimodal LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core llava-1.6-mistral |
🌋 llava-1.6-vicuna | Multimodal LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core llava-1.6-vicuna |
mistral-openorca | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core mistral-openorca |
bert-cpp | Embeddings | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core bert-cpp |
all-minilm-l6-v2 | Embeddings | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12 all-minilm-l6-v2 |
whisper-base | Audio to Text | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core whisper-base |
rhasspy-voice-en-us-amy | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core rhasspy-voice-en-us-amy |
🐸 coqui | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12 coqui |
🐶 bark | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12 bark |
🔊 vall-e-x | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12 vall-e-x |
mixtral-instruct (Mixtral-8x7B-Instruct-v0.1) | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core mixtral-instruct |
tinyllama-chat (original model) | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core tinyllama-chat |
dolphin-2.5-mixtral-8x7b | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core dolphin-2.5-mixtral-8x7b |
🐍 mamba | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12 mamba-chat |
animagine-xl | Text to Image | docker run -ti -p 8080:8080 -e COMPEL=0 --gpus all localai/localai:v2.23.0-cublas-cuda12 animagine-xl |
transformers-tinyllama | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12 transformers-tinyllama |
codellama-7b | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12 codellama-7b |
codellama-7b-gguf | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core codellama-7b-gguf |
hermes-2-pro-mistral | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:v2.23.0-cublas-cuda12-core hermes-2-pro-mistral |
💡 Tip: You can specify multiple models to start an instance with all of them loaded. For example, to have both llava and phi-2 configured:
docker run -ti -p 8080:8080 localai/localai:v2.23.0-ffmpeg-core llava phi-2
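The same pattern can be generalized to any image and any number of models. The following is a minimal sketch, assuming the command layout used in the tables above; the function name `localai_cmd` is hypothetical, and adding `--gpus all` whenever the image tag contains `cublas-cuda` is an assumption drawn from the GPU rows on this page. The helper only prints the command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a docker command that starts LocalAI with
# one or more models preloaded. Nothing is executed.
# Usage: localai_cmd IMAGE MODEL [MODEL...]
localai_cmd() {
  local image="$1"
  shift
  local gpu_flag=""
  # Assumption: CUDA images need the GPUs exposed to the container.
  case "$image" in
    *cublas-cuda*) gpu_flag="--gpus all " ;;
  esac
  # "$*" joins the remaining arguments (the model names) with spaces.
  printf 'docker run -ti -p 8080:8080 %s%s %s\n' "$gpu_flag" "$image" "$*"
}

# CPU image with two models, as in the tip above.
localai_cmd localai/localai:v2.23.0-ffmpeg-core llava phi-2

# CUDA 12 image with two models from the tables above.
localai_cmd localai/localai:v2.23.0-cublas-cuda12-core phi-2 whisper-base
```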
Last updated 18 Jul 2024, 11:25 +0200