# MLX-VLM

## Docs

- [Model conversion](https://mintlify.wiki/yocxy2/mlx-vlm/advanced/model-conversion.md): Convert Hugging Face vision-language models to MLX format, with optional quantization.
- [Quantization](https://mintlify.wiki/yocxy2/mlx-vlm/advanced/quantization.md): Reduce model memory usage and speed up inference by converting model weights to lower bit widths.
- [Thinking models](https://mintlify.wiki/yocxy2/mlx-vlm/advanced/thinking-models.md): Use chain-of-thought reasoning models with MLX-VLM, including token budget control.
- [convert()](https://mintlify.wiki/yocxy2/mlx-vlm/api/convert.md): Convert a Hugging Face vision-language model to MLX format, with optional quantization and Hugging Face Hub upload.
- [generate() and friends](https://mintlify.wiki/yocxy2/mlx-vlm/api/generate.md): Functions for generating text from vision-language models: single-turn, streaming, and batched generation.
- [load()](https://mintlify.wiki/yocxy2/mlx-vlm/api/load.md): Load a vision-language model and its processor from a local path or Hugging Face repository.
- [Prompt utilities](https://mintlify.wiki/yocxy2/mlx-vlm/api/prompt-utils.md): Functions for formatting prompts and messages in the correct structure for each model type.
- [REST API endpoints](https://mintlify.wiki/yocxy2/mlx-vlm/api/server-endpoints.md): HTTP endpoints exposed by the MLX-VLM FastAPI server, including OpenAI-compatible chat completions, responses, model listing, health check, and model unloading.
- [Dataset Preparation](https://mintlify.wiki/yocxy2/mlx-vlm/fine-tuning/dataset-preparation.md): Structure your Hugging Face dataset with the correct image and message format for your target model before fine-tuning.
- [LoRA & QLoRA Training](https://mintlify.wiki/yocxy2/mlx-vlm/fine-tuning/lora.md): Run parameter-efficient fine-tuning on vision language models using LoRA or QLoRA with the MLX trainer backend.
- [Fine-Tuning Overview](https://mintlify.wiki/yocxy2/mlx-vlm/fine-tuning/overview.md): Adapt pre-trained vision language models to your specific tasks and domains using LoRA, QLoRA, or full fine-tuning on Apple Silicon.
- [CLI reference](https://mintlify.wiki/yocxy2/mlx-vlm/inference/cli.md): Run inference from your terminal using mlx_vlm.generate, mlx_vlm.chat_ui, mlx_vlm.convert, and mlx_vlm.server.
- [Multi-image analysis](https://mintlify.wiki/yocxy2/mlx-vlm/inference/multi-image.md): Pass multiple images in a single request to compare, analyse, or reason across them simultaneously.
- [Python API](https://mintlify.wiki/yocxy2/mlx-vlm/inference/python-api.md): Use load, apply_chat_template, and generate to run inference from Python code.
- [REST API server](https://mintlify.wiki/yocxy2/mlx-vlm/inference/server.md): Serve MLX-VLM models over HTTP with an OpenAI-compatible API using mlx_vlm.server.
- [Video understanding](https://mintlify.wiki/yocxy2/mlx-vlm/inference/video.md): Analyse, caption, and summarise video files using mlx_vlm.video_generate.
- [Installation](https://mintlify.wiki/yocxy2/mlx-vlm/installation.md): Install MLX-VLM on your Mac with pip. Requires Python 3.10+ and macOS with Apple Silicon for best performance.
- [Introduction](https://mintlify.wiki/yocxy2/mlx-vlm/introduction.md): MLX-VLM is a Python library for inference and fine-tuning of Vision Language Models (VLMs) and Omni Models on Mac using Apple's MLX framework.
- [Model-specific guides](https://mintlify.wiki/yocxy2/mlx-vlm/models/model-specific-guides.md): Special prompt formats, capabilities, and usage examples for models that differ from the standard workflow.
- [Supported models](https://mintlify.wiki/yocxy2/mlx-vlm/models/supported-models.md): All model architectures supported by MLX-VLM, grouped by family, with capability notes.
- [Quickstart](https://mintlify.wiki/yocxy2/mlx-vlm/quickstart.md): Run your first Vision Language Model inference on Mac in minutes using the CLI or Python API.

## OpenAPI Specs

- [openapi](https://mintlify.wiki/yocxy2/mlx-vlm/api-reference/openapi.json)