# MLX-VLM

## Docs

- [Model conversion](https://mintlify.wiki/yocxy2/mlx-vlm/advanced/model-conversion.md): Convert Hugging Face vision-language models to MLX format, with optional quantization.
- [Quantization](https://mintlify.wiki/yocxy2/mlx-vlm/advanced/quantization.md): Reduce model memory usage and speed up inference by converting model weights to lower bit widths.
- [Thinking models](https://mintlify.wiki/yocxy2/mlx-vlm/advanced/thinking-models.md): Use chain-of-thought reasoning models with MLX-VLM, including token budget control.
- [convert()](https://mintlify.wiki/yocxy2/mlx-vlm/api/convert.md): Convert a Hugging Face vision-language model to MLX format, with optional quantization and Hugging Face Hub upload.
- [generate() and friends](https://mintlify.wiki/yocxy2/mlx-vlm/api/generate.md): Functions for generating text from vision-language models: single-turn, streaming, and batched generation.
- [load()](https://mintlify.wiki/yocxy2/mlx-vlm/api/load.md): Load a vision-language model and its processor from a local path or Hugging Face repository.
- [Prompt utilities](https://mintlify.wiki/yocxy2/mlx-vlm/api/prompt-utils.md): Functions for formatting prompts and messages in the correct structure for each model type.
- [REST API endpoints](https://mintlify.wiki/yocxy2/mlx-vlm/api/server-endpoints.md): HTTP endpoints exposed by the MLX-VLM FastAPI server, including OpenAI-compatible chat completions, responses, model listing, health check, and model unloading.
- [Dataset Preparation](https://mintlify.wiki/yocxy2/mlx-vlm/fine-tuning/dataset-preparation.md): Structure your Hugging Face dataset with the correct image and message format for your target model before fine-tuning.
- [LoRA & QLoRA Training](https://mintlify.wiki/yocxy2/mlx-vlm/fine-tuning/lora.md): Run parameter-efficient fine-tuning on vision-language models using LoRA or QLoRA with the MLX trainer backend.
- [Fine-Tuning Overview](https://mintlify.wiki/yocxy2/mlx-vlm/fine-tuning/overview.md): Adapt pre-trained vision-language models to your specific tasks and domains using LoRA, QLoRA, or full fine-tuning on Apple Silicon.
- [CLI reference](https://mintlify.wiki/yocxy2/mlx-vlm/inference/cli.md): Run inference from your terminal using mlx_vlm.generate, mlx_vlm.chat_ui, mlx_vlm.convert, and mlx_vlm.server.
- [Multi-image analysis](https://mintlify.wiki/yocxy2/mlx-vlm/inference/multi-image.md): Pass multiple images in a single request to compare, analyse, or reason across them simultaneously.
- [Python API](https://mintlify.wiki/yocxy2/mlx-vlm/inference/python-api.md): Use load, apply_chat_template, and generate to run inference from Python code.
- [REST API server](https://mintlify.wiki/yocxy2/mlx-vlm/inference/server.md): Serve MLX-VLM models over HTTP with an OpenAI-compatible API using mlx_vlm.server.
- [Video understanding](https://mintlify.wiki/yocxy2/mlx-vlm/inference/video.md): Analyse, caption, and summarise video files using mlx_vlm.video_generate.
- [Installation](https://mintlify.wiki/yocxy2/mlx-vlm/installation.md): Install MLX-VLM on your Mac with pip. Requires Python 3.10+ and macOS with Apple Silicon for best performance.
- [Introduction](https://mintlify.wiki/yocxy2/mlx-vlm/introduction.md): MLX-VLM is a Python library for inference and fine-tuning of Vision Language Models (VLMs) and Omni Models on Mac using Apple's MLX framework.
- [Model-specific guides](https://mintlify.wiki/yocxy2/mlx-vlm/models/model-specific-guides.md): Special prompt formats, capabilities, and usage examples for models that differ from the standard workflow.
- [Supported models](https://mintlify.wiki/yocxy2/mlx-vlm/models/supported-models.md): All model architectures supported by MLX-VLM, grouped by family, with capability notes.
- [Quickstart](https://mintlify.wiki/yocxy2/mlx-vlm/quickstart.md): Run your first vision-language model inference on Mac in minutes using the CLI or Python API.

## OpenAPI Specs

- [openapi](https://mintlify.wiki/yocxy2/mlx-vlm/api-reference/openapi.json)