hyperfine
a benchmarking utility
hyperfine is a shell benchmarking tool that provides statistics of a command’s execution time. It iterates on the work of previous tools like bench by allowing for pre-run and pre-benchmark commands to reset the state of the system. It’s cross-platform and provides both human- and machine-readable output.
I’m covering hyperfine early in cli.fan so subsequent posts can use it to compare similar tools.
| Quick Facts | |
|---|---|
| Version Control | git on github |
| Author | David Peter (sharkdp) |
| Language | Rust |
| Binary Size | 3.5M |
Installation
hyperfine is distributed as a statically built binary for x86 Windows, Linux, and macOS systems. Yes, it runs on Windows! No, I haven’t tried it.
It’s also available in the following package managers:
| Platform | Installation Command |
|---|---|
| Homebrew | brew install hyperfine |
| Cargo | cargo install hyperfine |
| AUR | yaourt -S hyperfine |
| Void Linux | xbps-install -S hyperfine |
Usage
The simplest usage of hyperfine is timing a single command. Here’s the output of hyperfine ls on the source of this blog.
$ hyperfine ls
Benchmark #1: ls
Time (mean ± σ): 0.7 ms ± 0.1 ms [User: 0.6 ms, System: 0.1 ms]
Range (min … max): 0.4 ms … 2.7 ms 2125 runs
Warning: Command took less than 5 ms to complete. Results might be inaccurate.
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
A couple of things to note here. First, you’re only seeing the final output of this command; a progress bar is displayed while the results are being measured.
Second, the two warnings at the bottom tell you about hyperfine’s statistical methodology. hyperfine calculates the mean execution time and the standard deviation. When a command completes very quickly (like ls in a directory with only 12 entries), these results may not represent the command’s timing very well. Likewise, outliers can substantially skew these statistics, which are not robust against them.
Before discussing how to reduce the influence of outliers, I want to draw your attention to the run count in the snippet above, at the end of the Range line. For this timing run, hyperfine decided to run ls 2125 times. hyperfine automatically adjusts the number of runs based on how long each individual run takes. By default, it chooses ten runs as the lower bound and adjusts upward if the command completes quickly. You can override this behavior with the -r/--runs flag, which specifies an exact number of runs to execute. The -m and -M flags let you specify a minimum and maximum run count to express a range.
$ hyperfine -r 2 ls
Benchmark #1: ls
Time (mean ± σ): 0.8 ms ± 0.1 ms [User: 0.9 ms, System: 0.0 ms]
Range (min … max): 0.7 ms … 0.9 ms 2 runs
Warning: Command took less than 5 ms to complete. Results might be inaccurate.
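If you’d rather bound the run count than pin it exactly, here’s a minimal sketch (the bounds of 20 and 100 are arbitrary choices of mine):
$ hyperfine -m 20 -M 100 ls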
Benchmarking
This ls command is a poor example for hyperfine, since it’s not doing any computationally expensive work. Let’s construct a command which opens a lot of files on our disk.
$ hyperfine "find /usr/share/man -type f -name '*.gz' | xargs -L 1 -P 8 zcat"
Benchmark #1: find /usr/share/man -type f -name '*.gz' | xargs -L 1 -P 8 zcat
Time (mean ± σ): 2.592 s ± 0.071 s [User: 6.333 s, System: 1.786 s]
Range (min … max): 2.523 s … 2.705 s 10 runs
That’s more like it! This snippet introduces another feature of hyperfine: support for arbitrary shell pipelines. Every command is evaluated in its own spawned instance of sh -c. You can use other shells that support -c with hyperfine’s -S flag.
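As a quick sketch of when -S matters: brace expansion is a bashism that the default sh may not support, so a command like this needs to be pointed at bash (the loop is just a placeholder workload):
$ hyperfine -S bash 'for i in {1..1000}; do :; done'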
This is cool, but hey, I have a modern Unix machine and there are probably disk caches getting in the way of an accurate measurement. We can run a command before each timing run to reset the state of the system with the -p/--prepare flag.
$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' "find /usr/share/man/ -type f -name '*.gz' | xargs -L 1 -P 8 zcat"
Benchmark #1: find /usr/share/man/ -type f -name '*.gz' | xargs -L 1 -P 8 zcat
Time (mean ± σ): 3.384 s ± 0.496 s [User: 6.807 s, System: 2.184 s]
Range (min … max): 2.981 s … 4.364 s 10 runs
Looks like the disk’s caches were really doing their job: the fastest run with cold caches was slower than the slowest warm iteration. There’s a much larger standard deviation on this run, though, possibly due to the state of the rest of my system. If you’re looking to do “real science” with hyperfine, it would be prudent to run benchmarks on a system with very little else running.
In addition to clearing caches with a command before each benchmarking run, hyperfine allows you to warm caches by running throw-away warmup runs of each command. You can specify the number of warmup runs with -w/--warmup.
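For example, here’s a sketch of the earlier pipeline measured with warm caches instead of cold ones; three warmup runs is an arbitrary choice:
$ hyperfine --warmup 3 "find /usr/share/man -type f -name '*.gz' | xargs -L 1 -P 8 zcat"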
The shell pipeline we’ve been timing isn’t very useful, unless you want to dump the contents of all the man pages stored in /usr/share/man. Let’s write a shitty man -k replacement instead:
$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' "find /usr/share/man/ -type f -name '*.gz' | xargs -L 1 -P 8 zgrep syslog"
Benchmark #1: find /usr/share/man/ -type f -name '*.gz' | xargs -L 1 -P 8 zgrep syslog
Error: Command terminated with non-zero exit code. Use the '-i'/'--ignore-failure' option if you want to ignore this. Alternatively, use the '--show-output' option to debug what went wrong.
Whoops! I forgot that grep returns a non-zero exit code when it doesn’t match anything. We could run hyperfine with the --ignore-failure option, but that could mask other issues with the rest of the pipeline. Since hyperfine supports arbitrary shell pipelines, I’ll just tack on a || true:
$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' "find /usr/share/man/ -type f -name '*.gz' | xargs -L 1 -P 8 zgrep syslog || true"
Benchmark #1: find /usr/share/man/ -type f -name '*.gz' | xargs -L 1 -P 8 zgrep syslog || true
Time (mean ± σ): 8.422 s ± 0.456 s [User: 22.107 s, System: 3.823 s]
Range (min … max): 7.844 s … 9.152 s 10 runs
Cool, we just created a really naive man -k implementation. How fast is it compared to the real thing? hyperfine can compare multiple pipelines passed as additional arguments, so let’s pass man -k syslog too:
$ hyperfine -p 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' "man -k syslog" "find /usr/share/man/ -type f -name '*.gz' | xargs -L 1 -P 8 zgrep syslog || true"
Benchmark #1: man -k syslog
Time (mean ± σ): 84.6 ms ± 13.2 ms [User: 35.4 ms, System: 12.4 ms]
Range (min … max): 70.3 ms … 107.9 ms 14 runs
Benchmark #2: find /usr/share/man/ -type f -name '*.gz' | xargs -L 1 -P 8 zgrep syslog || true
Time (mean ± σ): 7.401 s ± 0.056 s [User: 20.915 s, System: 3.334 s]
Range (min … max): 7.315 s … 7.459 s 10 runs
Summary
'man -k syslog' ran
87.51 ± 13.64 times faster than 'find /usr/share/man/ -type f -name '*.gz' | xargs -L 1 -P 8 zgrep syslog || true'
Since these two commands had vastly different timings, the confidence interval is a bit wide. A good thing to note here is that the --prepare command is run for every benchmarked command; you don’t have to pass --prepare multiple times when you pass multiple commands.
Output Formats
hyperfine supports outputting benchmarks in both machine- and human-readable formats. With --export-markdown, you can produce markdown tables from benchmarking runs. The following table (and the rest of the examples in this section) corresponds to the last benchmark shown above.
| Command | Mean [ms] | Min…Max [ms] |
|---|---|---|
| man -k syslog | 82.4 ± 8.9 | 70.1…99.6 |
| find /usr/share/man/ -type f -name '*.gz' \| xargs -L 1 -P 8 zgrep syslog \|\| true | 7177.2 ± 297.1 | 6943.1…7918.3 |
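For reference, here’s a sketch of the invocation that would produce a table like the one above; the output filename results.md is just my choice:
$ hyperfine -p 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' --export-markdown results.md "man -k syslog" "find /usr/share/man/ -type f -name '*.gz' | xargs -L 1 -P 8 zgrep syslog || true"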
The same basic summary information is available in the CSV output, with --export-csv:
command,mean,stddev,user,system,min,max
man -k syslog,0.07754731863714286,0.00718055888822199,0.03314085214285714,0.008886727857142855,0.06651294728,0.08886895328000001
find /usr/share/man/ -type f -name '*.gz' | xargs -L 1 -P 8 zgrep syslog || true,7.723293876680001,0.3837451768952995,21.587729494999998,3.5906936849999993,7.35156860228,8.62358789528
If you’re looking to dig into the statistics yourself, the --export-json option is the most useful. The JSON output format includes the timing of each individual run along with the summary statistics.
$ jq .results[].times hyperfine_results.json
[
0.08253860528000001,
0.08350332128,
0.08733465628,
0.08886895328000001,
0.08629350028,
0.07690511928,
0.07379255728,
0.07013369328,
0.06651294728,
0.07266715928,
0.07053606928,
0.07827986928000001,
0.07743288128,
0.07086312828
]
[
8.62358789528,
7.9401124772800005,
7.51673595728,
7.45856522628,
7.48676654828,
7.761782671280001,
7.41736586028,
7.69780088228,
7.9786526462800005,
7.35156860228
]
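If you want to derive your own numbers from those raw timings, here’s a rough sketch with jq; it grabs the upper median of each command’s sorted times, which is good enough for eyeballing:
$ jq '.results[] | .times | sort | .[(length/2|floor)]' hyperfine_results.json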
Parameterization
One interesting feature I haven’t found much of a use for is -P/--parameter-scan. It lets you specify an integer range; each value in the range is substituted into the command, producing one benchmark per value.
For a contrived example, let’s say we wanted to find the man section which returns results the slowest. My hypothesis is that the section with the most results will take the longest to run. A common-enough regex is probably .*tool.*, as many man page descriptions include the string “tool”.
$ for n in {1..8}; do echo -n "$n "; apropos -s "$n" '.*tool.*' 2>/dev/null | wc -l; done
1 152
2 0
3 181
4 0
5 5
6 0
7 3
8 32
If my hypothesis is correct, section 3 should run the slowest. Let’s set up the command line using the {SECTION} placeholder, like so: hyperfine -P SECTION 1 8 --export-markdown apropos.md -i 'apropos -s {SECTION} ".*tool.*"'
| Command | Mean [ms] | Min…Max [ms] |
|---|---|---|
| apropos -s 1 ".*tool.*" | 25.5 ± 3.9 | 23.7…64.3 |
| apropos -s 2 ".*tool.*" | 18.1 ± 0.9 | 17.3…21.9 |
| apropos -s 3 ".*tool.*" | 31.7 ± 1.3 | 30.5…37.6 |
| apropos -s 4 ".*tool.*" | 16.8 ± 0.8 | 16.0…20.6 |
| apropos -s 5 ".*tool.*" | 18.1 ± 1.0 | 17.2…23.9 |
| apropos -s 6 ".*tool.*" | 17.2 ± 2.8 | 15.9…44.9 |
| apropos -s 7 ".*tool.*" | 19.4 ± 3.1 | 17.3…38.8 |
| apropos -s 8 ".*tool.*" | 19.6 ± 0.4 | 19.0…23.0 |
The benchmarks match our hypothesis: the slowest sections are 3, 1, 8, and 7, exactly in order of result count.
I’m interested in hearing your use case for -P if you have one; it’s clearly powerful but difficult to think about in the abstract.
Edit: reader bdesham pointed out that -P is useful for adjusting the amount of parallelism in a command. Programs like make, GNU parallel, and GHC take a -j-style parameter which specifies the number of tasks to run simultaneously. hyperfine can increment this parameter to produce a parallelism profile. Thanks bdesham!
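A minimal sketch of that idea, assuming a project with a Makefile and a clean target (the 1..8 range and the targets are placeholders):
$ hyperfine -P threads 1 8 --prepare 'make clean' 'make -j{threads}'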
The other notion of “parameterization” I’ve seen used with hyperfine is the use of shell variables in the commands you’re running. This is easy to do, but the value won’t change from run to run.
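A minimal sketch of what I mean; the variable is expanded by your interactive shell before hyperfine ever sees the command, so it stays fixed for every run of the benchmark:
$ dir=/usr/share/man
$ hyperfine "ls $dir"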
Debugging Failing Commands
The other two flags I’ll mention here help with debugging failing commands. You can view the standard output of the command with --show-output, which slows down execution but can pinpoint exactly what input broke your command. If a failure occurs deep in a sequence of runs, the -i/--ignore-failure flag will stop hyperfine from exiting upon a non-zero exit code.
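As a sketch, here’s the failing zgrep pipeline from earlier run with both flags so you can see which invocation is misbehaving (expect noisy output, which is the point):
$ hyperfine -i --show-output "find /usr/share/man/ -type f -name '*.gz' | xargs -L 1 -P 8 zgrep syslog"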
Personally, I recommend running the command you intend to benchmark manually before running it under hyperfine. If all went well beforehand and failures start cropping up during benchmarking, the -i and --show-output flags are useful to know about.
tl;dr
hyperfine provides a nice balance of features against binary size and ease of installation. The JSON output format and the --prepare feature make it worth trying for your next benchmarking use case.