

fast-axolotl vs stock Axolotl

Baseline — Axolotl with no acceleration shim

This is the baseline comparison, against Axolotl with no shim installed. fast-axolotl is meant to replace pieces of stock Axolotl, not the whole thing, so most rows here are simply "Python path" vs "Rust path".

| Feature | fast-axolotl | stock Axolotl | Advantage |
|---|---|---|---|
| Streaming reader | Rust streaming_dataset_reader (77x at 50k rows, per README) | HuggingFace datasets Python streaming | fast-axolotl |
| Dedupe / hashing | Multi-threaded SHA256, 1.9x at 100k rows | hashlib in a Python loop | fast-axolotl |
| Token packing (10k seqs benchmark) | Rust pack_sequences, 0.42x in README benchmark | Python torch.cat loop | stock Axolotl |
| Batch padding (10k seqs benchmark) | Rust pad_sequences, 0.53x in README benchmark | Python padding loop | stock Axolotl |
| Compression support | ZSTD + Gzip transparent | Depends on backing library | fast-axolotl |
| Install shape | pip / uv add fast-axolotl + import | No-op (already installed) | stock Axolotl |
| Source changes to Axolotl | None (sys.modules shim at import) | None | Even |
| Python / OS matrix | 3.10–3.13 on Linux, macOS, Windows | Whatever your Axolotl install supports | Even |
| License | MIT | Axolotl upstream license | Even |
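
The "sys.modules shim at import" row describes a general Python technique: a replacement module is registered in sys.modules, so later imports of that name resolve to the replacement without any source edits. The sketch below shows the mechanism only. The module name stock_trainer_utils and the functions fast_pad and install_shim are hypothetical, invented for illustration, and are not fast-axolotl's actual internals.

```python
import importlib
import sys
import types

# Hypothetical module name, used only to illustrate the mechanism.
TARGET = "stock_trainer_utils"


def fast_pad(seqs, pad_id=0):
    """Stand-in for an accelerated function with the same contract as the original."""
    width = max(len(s) for s in seqs)
    return [list(s) + [pad_id] * (width - len(s)) for s in seqs]


def install_shim():
    """Register a replacement module under TARGET before anything imports it."""
    shim = types.ModuleType(TARGET)
    shim.pad = fast_pad
    shim.__accelerated__ = True  # marker so callers can check which path is active
    sys.modules[TARGET] = shim


install_shim()

# The import system checks sys.modules first, so this returns the shim.
mod = importlib.import_module(TARGET)
padded = mod.pad([[1, 2, 3], [4]])
```

Because the swap happens at import time, order matters in a setup like this: the shim has to be installed before the code that imports the target module runs.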

Pick fast-axolotl when

  • You hit OOM or long stalls on large Parquet / Arrow / JSONL datasets
  • Your dedupe step is the long pole — parallel SHA256 is a 1.9x win at 100k rows
  • You want fewer custom YAML knobs and prefer drop-in over forking the trainer
  • You are on Python 3.10–3.13 with the standard wheel matrix
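
To check whether dedupe really is the long pole, it can help to time the baseline first. The following is a minimal sketch of the "hashlib in a Python loop" path from the table, assuming rows are plain strings and that exact-match deduplication is what you need. The function name dedupe_rows is hypothetical and is not part of either project.

```python
import hashlib
import time


def dedupe_rows(rows):
    """Keep the first occurrence of each row, comparing by SHA-256 digest."""
    seen = set()
    unique = []
    for row in rows:
        digest = hashlib.sha256(row.encode("utf-8")).digest()
        if digest not in seen:
            seen.add(digest)
            unique.append(row)
    return unique


rows = ["a b c", "d e f", "a b c", "g h i", "d e f"]

start = time.perf_counter()
unique = dedupe_rows(rows)
elapsed = time.perf_counter() - start  # compare this against total preprocessing time
```

If the elapsed time on your real dataset is a small fraction of preprocessing, a faster hasher is unlikely to change the wall-clock number much.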

Pick stock Axolotl when

  • Your dataset is small enough that the FFI overhead on packing / padding hurts
  • You can't install a Rust-built wheel for policy reasons
  • You want zero new third-party dependencies in the trainer
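
Whether packing and padding overhead matters depends on your data, so measuring the pure-Python path on your own sequences is a reasonable first step. The sketch below uses plain lists rather than tensors, and py_pad, py_pack, and time_stage are hypothetical stand-ins written for illustration. They are not the stock Axolotl loops or the Rust pad_sequences / pack_sequences functions.

```python
import time


def py_pad(seqs, pad_id=0):
    """Right-pad every sequence to the length of the longest one."""
    width = max(len(s) for s in seqs)
    return [list(s) + [pad_id] * (width - len(s)) for s in seqs]


def py_pack(seqs, max_len):
    """Greedy in-order packing: concatenate sequences until max_len would be exceeded."""
    packed, current = [], []
    for s in seqs:
        if len(s) > max_len:
            raise ValueError("sequence longer than max_len")
        if current and len(current) + len(s) > max_len:
            packed.append(current)
            current = []
        current = current + list(s)
    if current:
        packed.append(current)
    return packed


def time_stage(fn, *args, **kwargs):
    """Return (result, seconds) for a single call of fn."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


seqs = [[1, 2], [3], [4, 5, 6], [7]]
packed, pack_seconds = time_stage(py_pack, seqs, max_len=4)
padded, pad_seconds = time_stage(py_pad, [[1, 2, 3], [4]])
```

Running the same harness against an accelerated implementation on the same input gives a like-for-like number for your dataset size instead of relying on a benchmark taken at a different scale.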

Still deciding?

Most fine-tuning teams use more than one accelerator at once. Pin fast-axolotl on the data pipeline, and keep stock Axolotl wherever its strengths actually move the wall-clock number.