← back home · compare
fast-axolotl vs stock Axolotl
Baseline — Axolotl with no acceleration shim
This is the no-op comparison: fast-axolotl is meant to replace pieces of stock Axolotl, not the whole thing. Most rows here are simply "Python path" vs "Rust path".
| Feature | fast-axolotl | stock Axolotl | Advantage |
|---|---|---|---|
| Streaming reader | Rust streaming_dataset_reader (77x at 50k rows, per README) | HuggingFace datasets Python streaming | fast-axolotl |
| Dedupe / hashing | Multi-threaded SHA256, 1.9x at 100k rows | hashlib in a Python loop | fast-axolotl |
| Token packing (10k seqs benchmark) | Rust pack_sequences, 0.42x in README benchmark | Python torch.cat loop | stock Axolotl |
| Batch padding (10k seqs benchmark) | Rust pad_sequences, 0.53x in README benchmark | Python padding loop | stock Axolotl |
| Compression support | ZSTD + Gzip transparent | Depends on backing library | fast-axolotl |
| Install shape | pip / uv add fast-axolotl + import | No-op (already installed) | stock Axolotl |
| Source changes to Axolotl | None — sys.modules shim at import | None | Even |
| Python / OS matrix | 3.10–3.13 on Linux, macOS, Windows | Whatever your Axolotl install supports | Even |
| License | MIT | Axolotl upstream license | Even |
Pick fast-axolotl when
- ▸You hit OOM or long stalls on large Parquet / Arrow / JSONL datasets
- ▸Your dedupe step is the long pole — parallel SHA256 is a 1.9x win at 100k rows
- ▸You want fewer custom YAML knobs and prefer drop-in over forking the trainer
- ▸You are on Python 3.10–3.13 with the standard wheel matrix
Pick stock Axolotl when
- ▸Your dataset is small enough that the FFI overhead on packing / padding hurts
- ▸You can't install a Rust-built wheel for policy reasons
- ▸You want zero new third-party dependencies in the trainer
Still deciding?
Most fine-tune teams use more than one accelerator at once. Pin fast-axolotl on the data pipeline, keep stock Axolotl wherever its strengths actually move the wall-clock number.