Java Microbenchmark Harness (JMH) is a toolkit designed and implemented by the OpenJDK team for running accurate and valid Java microbenchmarks. For now it is available as a set of OpenJDK libraries, but it is planned to become part of the JDK starting with Java 9.
Why do I need an external testing tool? Can’t I just use System.currentTimeMillis() or System.nanoTime() methods?
Well, no. Actually, you can, but those are not going to give you accurate values. And System.currentTimeMillis(), at least, is not precise enough to measure code performance reliably.
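For comparison, the naive measurement most of us start with looks more or less like this (a minimal sketch; the measured expression is just an example):

public class NaiveTiming {
    public static void main(String[] args) {
        long start = System.nanoTime();
        String result = Integer.toBinaryString(123456789); // the code under test
        long elapsed = System.nanoTime() - start;
        System.out.println(result + " took " + elapsed + " ns");
    }
}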
Why?
Apart from your code (the program/application), the JVM by default also runs a JIT compiler that continuously optimizes the bytecode. Set the -Djava.compiler=NONE JVM startup property to see what happens and how slowly code runs with this little amigo disabled. So, to get more accurate results, it is recommended to use JMH, which runs some warmup iterations before the proper, measured ones.
We should start with a little theory
As I said before, the JIT optimizes our code, which means that just after one startup your code can be executed in x ms, and after another startup in x+10 ms or x-10 ms. To some extent this depends on the startup mode (server, client, mixed) and the compilation levels.
And, simply put, measuring the time of a single execution is not the best way to learn about our code's performance. It is much better to test how many iterations the code can go through in a nanosecond/microsecond/…/day.
Instead of experimenting with wild separate startups, wearing out your keyboard or mouse on the “run” button and getting angry because the results vary too much, it is more valuable to delegate this task to the computer (and we will get much more precise scores).
JMH is an annotation-configured microbenchmark tool. Generally speaking, it measures how many times our code executes within a specified amount of time, then divides that time by the number of executions and gives us the result. For example, if a method executes roughly 27.7 million times within one second, its throughput score is roughly 27.7 million ops/s.
Creating a project
To run some benchmarks, obviously, we need a project.
1. Create a Maven project.
2. Add these configuration lines to your pom.xml file:
<groupId>benchmarking</groupId>
<artifactId>benchmark</artifactId>
<version>1.0</version>

<dependencies>
    <dependency>
        <groupId>org.openjdk.jmh</groupId>
        <artifactId>jmh-core</artifactId>
        <version>${jmh.version}</version>
    </dependency>
    <dependency>
        <groupId>org.openjdk.jmh</groupId>
        <artifactId>jmh-generator-annprocess</artifactId>
        <version>${jmh.version}</version>
        <scope>provided</scope>
    </dependency>
</dependencies>

<properties>
    <jmh.version>1.17.5</jmh.version>
    <javac.target>1.8</javac.target>
    <!-- final name of the shaded jar; set so that ${uberjar.name} below resolves
         and matches the java -jar benchmark-1.0.jar command used later -->
    <uberjar.name>benchmark-1.0</uberjar.name>
</properties>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
                <compilerVersion>${javac.target}</compilerVersion>
                <source>${javac.target}</source>
                <target>${javac.target}</target>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.2</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <finalName>${uberjar.name}</finalName>
                        <transformers>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <mainClass>org.openjdk.jmh.Main</mainClass>
                            </transformer>
                        </transformers>
                        <filters>
                            <filter>
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
    <pluginManagement>
        <plugins>
            <plugin>
                <artifactId>maven-clean-plugin</artifactId>
                <version>2.5</version>
            </plugin>
            <plugin>
                <artifactId>maven-install-plugin</artifactId>
                <version>2.5.1</version>
            </plugin>
            <plugin>
                <artifactId>maven-jar-plugin</artifactId>
                <version>2.4</version>
            </plugin>
        </plugins>
    </pluginManagement>
</build>
3. Now that you have everything set up, create a class.
Be aware that this class has to be in some package. Otherwise, the project will not be built.
Mine looks as follows:
package com.goyello;

import org.openjdk.jmh.annotations.Benchmark;

public class FirstBenchmark {

    @Benchmark
    public String firstBenchmark() {
        int dec = 123456789;
        return Integer.toBinaryString(dec);
    }
}
Running a project
Now all you need to do is run the Maven build in the project directory:
mvn install
After this, go into the target directory and run the benchmark jar:
cd target
java -jar benchmark-1.0.jar
You should see some output now:
# JMH 1.17.5 (released 22 days ago)
# VM version: JDK 1.8.0_92, VM 25.92-b14
# VM invoker: C:\Program Files\Java\jre1.8.0_92\bin\java.exe
# VM options: <none>
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.goyello.FirstBenchmark.firstBenchmark
After it is finished you will see some stats:
Result "com.goyello.FirstBenchmark.firstBenchmark":
  27755669,342 ±(99.9%) 987027,288 ops/s [Average]
  (min, avg, max) = (17285337,443, 27755669,342, 36147589,932), stdev = 4179135,343
  CI (99.9%): [26768642,054, 28742696,630] (assumes normal distribution)

# Run complete. Total time: 00:06:46

Benchmark                       Mode  Cnt         Score        Error  Units
FirstBenchmark.firstBenchmark  thrpt  200  27755669,342 ± 987027,288  ops/s
It is also possible to run benchmark tests directly in your favourite IDE.
You just have to add this code to a main method:

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public static void main(String[] args) throws RunnerException {
    Options opt = new OptionsBuilder()
            .include(FirstBenchmark.class.getSimpleName()) // regex selecting which benchmarks to run
            .forks(1)
            .build();
    new Runner(opt).run();
}
Warmup
Those are iterations which are run just to trigger all the standard JVM processes, e.g. JIT optimisations and GC activity. Their results are ignored in the overall, final score. The number of warmup iterations, and the time each of them takes, can be customized with the @Warmup annotation:
@Warmup(iterations = 20, time = 1, timeUnit = TimeUnit.SECONDS)
iterations – sets the number of warmup iterations for each benchmark
time – sets how long every warmup iteration takes, in the specified timeUnit
timeUnit – almost every possible unit, from TimeUnit.NANOSECONDS up to TimeUnit.DAYS
This annotation is applicable for methods and for classes.
Measurement
Those are the iterations which really do matter in the final results. Their results are collected and summarized over all runs.
The parameter configuration is the same as for warmup iterations, except that we use the @Measurement annotation here.
This annotation is applicable for methods and for classes.
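Both annotations can also be placed on the benchmark class to set defaults for every method inside, with method-level annotations taking precedence. A minimal sketch (the class and method names are illustrative):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Warmup;

import java.util.concurrent.TimeUnit;

@Warmup(iterations = 20, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 20, time = 1, timeUnit = TimeUnit.SECONDS)
public class ClassLevelDefaultsBenchmark {

    @Benchmark
    public String usesClassDefaults() {
        return Integer.toBinaryString(123456789);
    }

    // a method-level annotation overrides the class-level default for this method only
    @Warmup(iterations = 5)
    @Benchmark
    public String shorterWarmup() {
        return Integer.toBinaryString(987654321);
    }
}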
Timeout
Sets the timeout for each benchmark iteration (no matter if it is a warmup or a measurement one) in the specified timeUnit.
Customized by annotation:
@Timeout(time = 10, timeUnit = TimeUnit.SECONDS)
This annotation is applicable for methods and for classes.
If the time is set very low, or lower than the iteration time, JMH will warn you:
# Timeout: 10 ns per iteration, ***WARNING: The timeout might be too low!
Threads
Sets how many threads execute benchmarks concurrently. Iterations are synchronized across the threads.
Customized by annotation:
@Threads(value = 4)
This annotation is applicable for methods and for classes.
Benchmark mode
Generally speaking – it defines what we actually measure.
Customized by annotation:
@BenchmarkMode(Mode.AverageTime)
Allows multiple arguments:
@BenchmarkMode({Mode.AverageTime, Mode.SampleTime, Mode.SingleShotTime, Mode.Throughput, Mode.All})
- Mode.AverageTime – calculates the average running time
- Mode.SampleTime – calculates how long it takes for a method to run, including percentiles
- Mode.SingleShotTime – just runs a method once (useful for cold-testing mode); or more than once if you have specified a batch size for your iterations (see the @Measurement annotation above) – in this case JMH will calculate the batch running time (total time for all invocations in a batch)
- Mode.Throughput – calculates the number of operations in a time unit
- Mode.All – all of the above, together, one after another

This annotation is applicable for methods and for classes.
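As a sketch of the batch-size case mentioned under Mode.SingleShotTime (the method name is illustrative; batchSize is a real parameter of @Warmup and @Measurement), a single-shot batch measurement could look like this:

@BenchmarkMode(Mode.SingleShotTime)
@Warmup(iterations = 2, batchSize = 1000)
@Measurement(iterations = 5, batchSize = 1000) // each iteration times 1000 invocations as one batch
@Benchmark
public String singleShotBatch() {
    return Integer.toBinaryString(123456789);
}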
Output Time Unit
If you expect that your code will be really fast or extremely slow, it is possible to change the output time unit (the one visible in the results section):
@OutputTimeUnit(TimeUnit.NANOSECONDS)
This annotation is applicable for methods and for classes.
Pre-run prepared data
There are two possible ways to inject data into your benchmark:
@Param annotation
This way is suitable for simple data, especially when you know the desired values before the run. You can even inject an array of values into this annotation if you want. Keep in mind that the number of benchmark runs grows with the number of param values – and with several @Param fields it is multiplied.
Setting the value looks as follows:
// @Param fields must live in a @State class (see the Dependency Injection section below)
@State(Scope.Benchmark)
public class ParamsBenchmark {

    @Param("123456789")
    int dec;

    @Param({"123456789", "987654321", "1234", "4321"})
    int decMultiple;

    @Benchmark
    public String decToBinSDKWay() {
        return Integer.toBinaryString(dec);
    }
}
Why should I use this way instead of just assigning a value to a class field?
Because this annotation is able to gather multiple arguments and will run a new benchmark for each data entry.
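To illustrate the multiplication (the field names and values are made up), two parameterized fields combine like this:

@State(Scope.Benchmark)
public static class MultiParams {

    @Param({"10", "1000"})              // 2 values
    int size;

    @Param({"fast", "safe", "simple"})  // 3 values
    String variant;
}
// JMH runs each benchmark using this state for every combination: 2 × 3 = 6 runs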
Dependency Injection
It is possible to prepare data before benchmark startup and inject it via dependency injection.
Available scopes:
- Scope.Benchmark – an instance will be shared across all threads running the same test; could be used to test the multithreaded performance of a state object (or just mark your benchmark with this scope)
- Scope.Group – an instance will be allocated per thread group (see the Groups section down below)
- Scope.Thread – the default state; an instance will be allocated for each thread running the given test
It is allowed to write a separate class for data preparation, but generally you have to stick to these two rules:
#1 – the class has to be declared ‘public’ (and ‘static’, if it is a nested class)
#2 – the class has to have a default constructor, also with public access
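A minimal sketch of such a state class and its injection (the names are illustrative):

@State(Scope.Benchmark)
public static class PreparedData {      // rule #1: public (and static, as it is nested)

    int testValue;

    public PreparedData() {             // rule #2: public default constructor
        testValue = new java.util.Random().nextInt();
    }
}

@Benchmark
public String usesPreparedData(PreparedData data) { // JMH injects the state instance
    return Integer.toBinaryString(data.testValue);
}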
For a full example in practice, see the code below the Blackhole section.
Blackhole
Blackhole objects injected into benchmarks are used to avoid dead-code elimination performed by the JIT.
For example, code:
int doSomeCalculations(int x) {
    return 2 * x;
}

@Benchmark
public int deadCodeEliminationBenchmarkExample() {
    int result = 0; // initialized so the code compiles; only the last assignment is observable
    for (int x = 0; x < 100; ++x) {
        result = doSomeCalculations(x);
    }
    return result;
}
would probably be optimized so that only the last iteration is actually executed (in this case doSomeCalculations(99)).
To avoid this, it is recommended to use the Blackhole::consume method, which forces the JVM to really execute every iteration inside the benchmark.
It is also important to be sure that the benchmark method returns something.
The Blackhole class also offers a method for burning CPU time in a controlled way (Blackhole::consumeCPU).
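A small sketch of that helper (the benchmark method name is made up; consumeCPU is a static JMH method):

import org.openjdk.jmh.infra.Blackhole;

@Benchmark
public void burnSomeCpu() {
    Blackhole.consumeCPU(1000); // burns roughly 1000 "time tokens" worth of CPU work
}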
So how exactly should this benchmark look?
import org.openjdk.jmh.infra.Blackhole;

int doSomeCalculations(int x) {
    return 2 * x;
}

@Benchmark
public int noDeadCodeEliminationBenchmarkExample(Blackhole hole) {
    int result = 0;
    for (int x = 0; x < 100; ++x) {
        result = doSomeCalculations(x);
        hole.consume(result); // every iteration's result is consumed, so none can be eliminated
    }
    return result;
}
If you are curious what happens inside the Blackhole::consume method, you can look it up in the JMH sources.
Benchmark class:
package com.goyello;

import org.openjdk.jmh.annotations.*;

import java.util.Random;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class DecToBinBenchmark {

    @State(Scope.Benchmark)
    public static class Data {

        Integer testValue;

        public Data() {
            this.testValue = new Random().nextInt();
        }
    }

    @Warmup(iterations = 5)
    @Measurement(iterations = 5)
    @Benchmark
    public String decToBinSDKWay(Data data) {
        return Integer.toBinaryString(data.testValue);
    }

    @Warmup(iterations = 5)
    @Measurement(iterations = 5)
    @Benchmark
    public String decToBinBitwiseWay(Data data) {
        // work on a local copy so the shared state is not zeroed out between invocations
        int value = data.testValue;
        char[] bin = new char[32];
        for (int i = 31; i > -1; --i) {
            bin[i] = (char) ((value & 1) + 48);
            value >>= 1;
        }
        return new String(bin);
    }
}

> java -jar benchmark-1.0.jar

# JMH 1.17.5 (released 23 days ago)
# VM version: JDK 1.8.0_92, VM 25.92-b14
# VM invoker: C:\Program Files\Java\jre1.8.0_92\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.goyello.DecToBinBenchmark.decToBinBitwiseWay
And the results:
# Run complete. Total time: 00:03:29

Benchmark                             Mode  Cnt   Score   Error  Units
DecToBinBenchmark.decToBinBitwiseWay  avgt   50  58,140 ± 4,748  ns/op
DecToBinBenchmark.decToBinSDKWay      avgt   50  74,641 ± 6,310  ns/op
Results analysis
Score
The average (in this case) time a single operation needed to execute, in the units configured by @OutputTimeUnit and shown in the last column.
Error
The margin of error of the score – the half-width of the 99.9% confidence interval (assuming a normal distribution).
Comments
It is best to leave your PC alone during benchmark tests.
- No Spotify
- No YouTube
- No project deployment
- No StackOverflow
- No Reddit
- No Hackerrank challenges
- And so on…
I think 5 iterations for warmup and for measurement are not enough when testing complex code. The default is 20, but with 4-5 benchmarks (and more) a run can take hours.
It is all about the balance (e.g. if you see that the error is significant, try increasing the number of warmup iterations). Although it is possible, I do not think we should go below 10 iterations in cases more complex than my example class, just for the sake of the results.
If you are working on a laptop, remember to set it to maximum performance mode.