Testing your code's performance with the JMH tool

Java Microbenchmark Harness (JMH) is a toolkit designed and implemented by the OpenJDK team for running accurate and valid Java microbenchmarks. For now it is distributed as a separate OpenJDK library, with plans to bundle it with the JDK in a future release.

Why do I need an external testing tool? Can’t I just use System.currentTimeMillis() or System.nanoTime() methods?

Well, no. Actually, you can, but they are not going to give you accurate values. And System.currentTimeMillis(), at least, is simply not precise enough to measure code performance fairly.
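To see the problem yourself, try a naive measurement like the sketch below (my own illustration, not part of JMH). Run it a few times – the numbers will jump around, because early executions are interpreted, later ones are JIT-compiled, and a GC pause can land anywhere in between:

public static void main(String[] args) {
      long start = System.nanoTime();
      String bin = Integer.toBinaryString(123456789);
      long elapsed = System.nanoTime() - start;
      // prints a different, unreliable value on almost every run
      System.out.println("Took " + elapsed + " ns (" + bin + ")");
}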

Why?

Apart from your code (your program/application), by default the JVM also runs a JIT compiler that continuously optimizes the bytecode. Set the -Djava.compiler=NONE JVM startup property to see what happens and how slowly code runs with this little amigo disabled. So, to get more accurate results, it is recommended to use JMH, which runs some warmup iterations before the proper, measured ones.
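On modern HotSpot JVMs the same experiment can be run with the -Xint flag, which forces interpreter-only execution (yourapp.jar is a placeholder):

java -Xint -jar yourapp.jar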

We should start with a little theory

As I said before, the JIT optimizes our code, which means that after one startup your code may execute in x ms, and after another startup in x+10 ms or x-10 ms. To some extent this depends on the startup mode (server, client, mixed) and the compilation levels.

Simply put, measuring the time of a single execution is not the best way to learn anything about our code's performance. It is much better to test how many iterations the code can complete in a given unit of time (nanoseconds/microseconds/…/days).

Instead of experimenting with wildly varying separate startups, destroying your keyboard or mouse on the “run” button and getting angry because the results vary too much, it is far more valuable to delegate this task to the computer (and we will get much more precise scores).

JMH is an annotation-configured microbenchmark tool. Generally speaking, it measures how many times our code executes within a specified amount of time, then divides that time by the number of executions and gives us the result.

Creating a project

To run some benchmarks, obviously, we need a project.

1. Create a Maven project.

2. Add these configuration lines to your pom.xml file:

<groupId>benchmarking</groupId>
<artifactId>benchmark</artifactId>
<version>1.0</version>

<dependencies>
    <dependency>
        <groupId>org.openjdk.jmh</groupId>
        <artifactId>jmh-core</artifactId>
        <version>${jmh.version}</version>
    </dependency>
    <dependency>
        <groupId>org.openjdk.jmh</groupId>
        <artifactId>jmh-generator-annprocess</artifactId>
        <version>${jmh.version}</version>
        <scope>provided</scope>
    </dependency>
</dependencies>

<properties>
    <jmh.version>1.17.5</jmh.version>
    <javac.target>1.8</javac.target>
    <!-- name of the runnable jar built by the shade plugin (referenced below) -->
    <uberjar.name>benchmark-1.0</uberjar.name>
</properties>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
                <compilerVersion>${javac.target}</compilerVersion>
                <source>${javac.target}</source>
                <target>${javac.target}</target>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.2</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <finalName>${uberjar.name}</finalName>
                        <transformers>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <mainClass>org.openjdk.jmh.Main</mainClass>
                            </transformer>
                        </transformers>
                        <filters>
                            <filter>
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
    <pluginManagement>
        <plugins>
            <plugin>
                <artifactId>maven-clean-plugin</artifactId>
                <version>2.5</version>
            </plugin>
            <plugin>
                <artifactId>maven-install-plugin</artifactId>
                <version>2.5.1</version>
            </plugin>
            <plugin>
                <artifactId>maven-jar-plugin</artifactId>
                <version>2.4</version>
            </plugin>
        </plugins>
    </pluginManagement>
</build>

3. Now that you have everything set up, create a class.

Be aware that this class has to be in some package; otherwise, the project will not build.
Mine looks as follows:

package com.goyello;

import org.openjdk.jmh.annotations.Benchmark;

public class FirstBenchmark {
      @Benchmark
      public String firstBenchmark() {
            int dec = 123456789;
            return Integer.toBinaryString(dec);
      }
}

Running a project

Now all you need to do is run the Maven build in the project directory:

mvn install

After this, go into the target directory and run the benchmark jar:

cd target
java -jar benchmark-1.0.jar

You should see some output now:

# JMH 1.17.5 (released 22 days ago)
# VM version: JDK 1.8.0_92, VM 25.92-b14
# VM invoker: C:\Program Files\Java\jre1.8.0_92\bin\java.exe
# VM options: <none>
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.goyello.FirstBenchmark.firstBenchmark

After it finishes, you will see some stats:

Result "com.goyello.FirstBenchmark.firstBenchmark":
  27755669,342 ±(99.9%) 987027,288 ops/s [Average]
  (min, avg, max) = (17285337,443, 27755669,342, 36147589,932), stdev = 4179135,343
  CI (99.9%): [26768642,054, 28742696,630] (assumes normal distribution)
# Run complete. Total time: 00:06:46

Benchmark                       Mode  Cnt         Score        Error  Units
FirstBenchmark.firstBenchmark  thrpt  200  27755669,342 ± 987027,288  ops/s

It is also possible to run benchmark tests directly in your favourite IDE.
You just have to put this code into a main method:

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

// include() takes a regex matching benchmark class names;
// note that Runner.run() throws RunnerException
Options opt = new OptionsBuilder()
                .include(FirstBenchmark.class.getSimpleName())
                .forks(1)
                .build();
new Runner(opt).run();

Warmup

Warmup iterations are run just to trigger all the standard JVM machinery – JIT optimisations, GC activity, and so on. Their results are ignored in the overall, final score. The number of warmup iterations, and how long each of them takes, can be customized with the @Warmup annotation:

@Warmup(iterations = 20, time = 1, timeUnit = TimeUnit.SECONDS)

iterations – sets the number of warmup iterations for each benchmark
time – sets how long every warmup iteration takes, in the specified timeUnit
timeUnit – almost any unit, from TimeUnit.NANOSECONDS up to TimeUnit.DAYS

This annotation is applicable to methods and to classes.

Measurement

Measurement iterations are the ones that really matter for the final results; their measurements are collected and summarized across all runs.
The parameter configuration is the same as for warmup iterations, except that we use the @Measurement annotation here.
This annotation is applicable to methods and to classes.
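As an illustration (the class and method names are mine), a class-level configuration that applies to every benchmark inside the class could look like this:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
public class ConfiguredBenchmark {

      @Benchmark
      public String inheritsClassLevelConfig() {
            // runs with the 10 warmup + 10 measurement iterations declared above
            return Integer.toBinaryString(42);
      }
}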

Timeout

Sets the timeout for each benchmark iteration (whether a warmup or a measurement one) in the specified timeUnit.
Customized by the annotation:

@Timeout(time = 10, timeUnit = TimeUnit.SECONDS)

This annotation is applicable to methods and to classes.
If the time is set very low – lower than the iteration time – JMH will warn you:

# Timeout: 10 ns per iteration, ***WARNING: The timeout might be too low!

Threads

Sets how many threads execute the benchmarks concurrently; JMH synchronizes iterations across the threads.
Customized by the annotation:

@Threads(value = 4)

This annotation is applicable to methods and to classes.
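For example (a hypothetical sketch of mine), four threads hammering a single shared counter – the @State annotation used here is explained in the "Pre-run prepared data" section below:

import java.util.concurrent.atomic.AtomicLong;

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)   // one instance shared by all benchmark threads
public class ContendedCounterBenchmark {

      AtomicLong counter = new AtomicLong();

      @Threads(4)
      @Benchmark
      public long increment() {
            return counter.incrementAndGet();
      }
}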

Benchmark mode

Generally speaking, it defines what we actually measure.
Customized by the annotation:

@BenchmarkMode(Mode.AverageTime)

It also accepts multiple arguments:

@BenchmarkMode({Mode.AverageTime, Mode.SampleTime, Mode.SingleShotTime, Mode.Throughput, Mode.All})
  • Mode.AverageTime
    Calculates the average running time
  • Mode.SampleTime
    Calculates how long it takes for a method to run, including percentiles
  • Mode.SingleShotTime
    Just runs a method once (useful for cold-start testing). Or more than once, if you have specified a batch size for your iterations (see the @Measurement annotation above) – in this case JMH calculates the batch running time (the total time of all invocations in a batch)
  • Mode.Throughput
    Calculates the number of operations in a time unit
  • Mode.All
    All of the above, together, one after another

This annotation is applicable to methods and to classes.

Output Time Unit

If you expect your code to be really fast or extremely slow, you can change the output time unit (the one visible in the results section):

@OutputTimeUnit(TimeUnit.NANOSECONDS)

This annotation is applicable to methods and to classes.

Pre-run prepared data

There are two possible ways to inject data into your benchmark:

@Param annotation

This way is suitable for simple data, especially when you know the desired values before the run. You can even inject an array of values into this annotation. Keep in mind that the number of benchmark runs grows with the number of param values – in fact, it gets multiplied, because JMH runs the benchmark once for every combination.

Setting the value looks as follows:

// note: @Param fields have to live in a class annotated with @State
@Param("123456789")
int dec;

@Param({"123456789", "987654321", "1234", "4321"})
int decMultiple;

@Benchmark
public String decToBinSDKWay() {
      return Integer.toBinaryString(dec);
}

Why should I use this instead of just assigning a value to a class field?
Because this annotation can gather multiple arguments and will run a new benchmark for each data entry, as the sketch below shows.
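For instance (a made-up sketch), two @Param fields with two values each make JMH run the benchmark 2 × 2 = 4 times, once for every combination:

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class ParamMatrixBenchmark {

      @Param({"10", "1000"})
      int value;

      @Param({"2", "16"})
      int radix;

      @Benchmark
      public String convert() {
            // executed as (10,2), (10,16), (1000,2), (1000,16)
            return Integer.toString(value, radix);
      }
}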

Dependency Injection

It is possible to prepare data before the benchmark starts and have JMH inject it as a state object, using the @State annotation.

Available scopes:

  • Scope.Benchmark
    An instance is shared across all threads running the same test. Can be used to test the multithreaded performance of a state object (or just to mark your benchmark with this scope).
  • Scope.Group
    An instance is allocated per thread group (thread groups are beyond the scope of this article).
  • Scope.Thread
    This is the default scope. An instance is allocated for each thread running the given test.

You are allowed to write a separate class for data preparation, but generally you have to stick to these rules:
#1 – the class has to be declared public (and, if it is a nested class, static) and annotated with @State
#2 – the class has to have a public no-argument constructor

For an explanation in practice, see the code below the Blackhole section.

Blackhole

Blackhole objects injected into benchmarks are used to avoid dead-code elimination, an optimization performed by the JIT.
For example, this code:

int doSomeCalculations(int x) {
      return 2*x;
}

@Benchmark
public int deadCodeEliminationBenchmarkExample() {
      int result = 0;  // must be initialized, or the code will not compile
      for(int x = 0; x < 100; ++x) {
            result = doSomeCalculations(x);
      }
      return result;
}

would probably be optimized so that only the last iteration is actually executed (in this case doSomeCalculations(99)).
To avoid this, it is recommended to use the Blackhole::consume method, which forces the JVM to really execute every iteration inside the benchmark.
It is also important to make sure that the benchmark method returns something (or consumes all of its results).

The Blackhole class also offers a method that burns CPU cycles in a controlled way (Blackhole::consumeCPU).
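For example, Blackhole.consumeCPU(tokens) is a static method that burns a roughly fixed amount of CPU work per token, which can be handy when a benchmark needs to simulate some load (a sketch of mine):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;

@Benchmark
public void simulateSomeWork() {
      // burns ~1000 "time tokens" in a way the JIT cannot optimize away
      Blackhole.consumeCPU(1000);
}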

So what exactly should such a benchmark look like?

int doSomeCalculations(int x) {
      return 2*x;
}

@Benchmark
public int noDeadCodeEliminationBenchmarkExample(Blackhole hole) {
      int result = 0;
      for(int x = 0; x < 100; ++x) {
            result = doSomeCalculations(x);
            hole.consume(result);
      }
      return result;
}
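Since every intermediate value is already handed to the Blackhole, the method could just as well be declared void (a minor variation of the example above):

@Benchmark
public void noDeadCodeEliminationVoidExample(Blackhole hole) {
      for(int x = 0; x < 100; ++x) {
            hole.consume(doSomeCalculations(x));
      }
}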

If you are curious about what happens inside the Blackhole::consume method, you can look it up in the JMH sources (the org.openjdk.jmh.infra.Blackhole class).

And here is the practical example promised earlier – a benchmark class with an injected state object:

package com.goyello;

import org.openjdk.jmh.annotations.*;

import java.util.concurrent.TimeUnit;
import java.util.Random;


@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class DecToBinBenchmark {

      @State(Scope.Benchmark)
      public static class Data {
            Integer testValue;

            public Data() {
                  this.testValue = new Random().nextInt();
            }
      }

      @Warmup(iterations = 10)
      @Measurement(iterations = 10)
      @Benchmark
      public String decToBinSDKWay(Data data) {
            return Integer.toBinaryString(data.testValue);
      }

      @Warmup(iterations = 10)
      @Measurement(iterations = 10)
      @Benchmark
      public String decToBinBitwiseWay(Data data) {
            // copy the value into a local variable – shifting data.testValue
            // directly would corrupt the shared state for later invocations
            int value = data.testValue;
            char[] bin = new char[32];

            for(int i = 31; i > -1; --i) {
                  bin[i] = (char) ((value&1)+48);
                  value >>= 1;
            }

            return new String(bin);
      }
}
> java -jar benchmark-1.0.jar

# JMH 1.17.5 (released 23 days ago)
# VM version: JDK 1.8.0_92, VM 25.92-b14
# VM invoker: C:\Program Files\Java\jre1.8.0_92\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.goyello.DecToBinBenchmark.decToBinBitwiseWay

And the results:

# Run complete. Total time: 00:03:29

Benchmark                             Mode  Cnt   Score   Error  Units
DecToBinBenchmark.decToBinBitwiseWay  avgt   50  58,140 ± 4,748  ns/op
DecToBinBenchmark.decToBinSDKWay      avgt   50  74,641 ± 6,310  ns/op

Results analysis

Score

The average (in this case) time a single operation takes, expressed in the units configured by @OutputTimeUnit and shown in the last column.

Error

The measurement error – the half-width of the 99.9% confidence interval shown in the detailed output.

Comments

It is best to leave your PC alone during benchmark tests.

  • No Spotify
  • No YouTube
  • No project deployment
  • No StackOverflow
  • No Reddit
  • No Hackerrank challenges
  • And so on…

I think 5 warmup and 5 measurement iterations are not enough when testing complex code. The default is 20, but with 4–5 benchmarks (or more) a full run can take hours.

It is all about balance (e.g. if you see that the error is significant, try increasing the number of warmup iterations). Although it is possible, I do not think we should go below 10 iterations in cases more complex than my example class, for the sake of reliable results.
If you are working on a laptop, remember to set it to maximum performance mode.

 
