When should I pick fast-axolotl over stock Axolotl?

You hit OOM or long stalls on large Parquet / Arrow / JSONL datasets Your dedupe step is the long pole — parallel SHA256 is a 1.9x win at 100k rows You want fewer custom YAML knobs and prefer drop-in over forking the trainer You are on Python 3.10–3.13 with the standard wheel matrix

When should I pick stock Axolotl over fast-axolotl?

Your dataset is small enough that the FFI overhead on packing / padding hurts You can't install a Rust-built wheel for policy reasons You want zero new third-party dependencies in the trainer

← back home · compare

fast-axolotl vs stock Axolotl

Baseline — Axolotl with no acceleration shim

This is the no-op comparison: fast-axolotl is meant to replace pieces of stock Axolotl, not the whole thing. Most rows here are simply "Python path" vs "Rust path".

Feature	fast-axolotl	stock Axolotl	Advantage
Streaming reader	Rust streaming_dataset_reader (77x at 50k rows, per README)	HuggingFace datasets Python streaming	fast-axolotl
Dedupe / hashing	Multi-threaded SHA256, 1.9x at 100k rows	hashlib in a Python loop	fast-axolotl
Token packing (10k seqs benchmark)	Rust pack_sequences, 0.42x in README benchmark	Python torch.cat loop	stock Axolotl
Batch padding (10k seqs benchmark)	Rust pad_sequences, 0.53x in README benchmark	Python padding loop	stock Axolotl
Compression support	ZSTD + Gzip transparent	Depends on backing library	fast-axolotl
Install shape	pip / uv add fast-axolotl + import	No-op (already installed)	stock Axolotl
Source changes to Axolotl	None — sys.modules shim at import	None	Even
Python / OS matrix	3.10–3.13 on Linux, macOS, Windows	Whatever your Axolotl install supports	Even
License	MIT	Axolotl upstream license	Even

Pick fast-axolotl when

▸You hit OOM or long stalls on large Parquet / Arrow / JSONL datasets
▸Your dedupe step is the long pole — parallel SHA256 is a 1.9x win at 100k rows
▸You want fewer custom YAML knobs and prefer drop-in over forking the trainer
▸You are on Python 3.10–3.13 with the standard wheel matrix

Pick stock Axolotl when

▸Your dataset is small enough that the FFI overhead on packing / padding hurts
▸You can't install a Rust-built wheel for policy reasons
▸You want zero new third-party dependencies in the trainer

Still deciding?

Most fine-tune teams use more than one accelerator at once. Pin fast-axolotl on the data pipeline, keep stock Axolotl wherever its strengths actually move the wall-clock number.

View on GitHub Read the engineering notes