Java Micro Benchmark with JMH

Last modified on July 27th, 2015 by Joe.

This Java tutorial explains what a micro benchmark is and how to run one using JMH. Java Microbenchmark Harness (JMH) is a Java tool from OpenJDK for creating benchmarks. We can use JMH to benchmark programs written in Java and in other languages that target the JVM. Benchmarking is tricky and goes wrong more often than it goes right. It is better to use an existing tool for micro benchmarks than to do our own calculations and statistics. In this tutorial, we will see how to get started with JMH using an example micro benchmark.

What is a Micro Benchmark?

Benchmarking is the process of recording the performance of a system. (Macro) benchmarks are done across different platforms to compare their efficiency. Micro benchmarks are done within the same platform for a small snippet of code.

Micro benchmarks are generally done for two reasons.

To compare different approaches of code which implements the same logic and choose the best one to use.
To identify any bottlenecks in a suspected area of code during performance optimization.

I would like to cite a couple of resources for learning about micro (or nano, or whatever) benchmarks.

The talk The Art of Java Benchmarking by Aleksey Shipilev is worth watching if you want to learn about micro benchmarking. This presentation made me wary of benchmarks and showed me the iceberg beneath the surface. He is the person behind JMH and has great credentials, and I strongly recommend that you watch the talk. Slides for that talk are available here.
Anatomy of a flawed micro benchmark, a paper by Brian Goetz, uses an example to explain the many ways a micro benchmark can fail. At first look, the example is simple and seems fine. The explanation then shows how easily we dismiss critical things as trivial. This also helps us understand why we should use a tool like JMH for benchmarks.

How to do a Micro Benchmark?

We often do not worry about performance requirements. We start by building functionality and concentrate on making things work, without focusing on how. Once the software goes to production, we face the inevitable. It is good to have the performance objectives written down before writing the code. We should weigh the importance of performance against the functional requirements and strike a balance between them.

Stages of a micro benchmark:

Every stage of a micro benchmark is important. The stages are benchmark design, benchmark code implementation, benchmark test execution and results interpretation. Each of these stages is critical and many things can go wrong in them. At the design stage, iron out the objective of the benchmark. There are too many parameters to consider when designing the benchmark methodology, so choose the ones to focus on based on the objective. The following key points highlight the essentials to remember when doing a micro benchmark.

Key Points to Remember

Be aware of compiler optimizations (mostly performed by the JIT) such as dead code elimination, loop unrolling, lock coalescing and inlining. You might be benchmarking different code from what you think.
The compiler may decide that the code written for benchmarking is not required for the program’s result (dead code) and prune or optimize it. We have to trick the compiler into treating the benchmarked code as part of the program, so that it doesn’t optimize it away.
Know your hardware platform, whether it is single-core, multi-core or hyper-threaded, and how that affects the program you benchmark.
Be aware of your benchmark environment. Comparisons of two code snippets should be done in the same environment. Even a minor change in the configuration can affect the results significantly.
If you are not comparing code snippets and are just trying to measure elapsed time, you should mimic the production environment exactly. It should be an exact replica; if not, the benchmark will not give the right results. You should pay attention even to things like the power profile of the OS. This is one example, and you should think along similar lines.
When running in the same environment, remember to switch off all other programs. The machine should be quiet. Background processes can compete for resources and cause delays. These delays may not be constant, so they cannot simply be subtracted. The best option is to close all other programs, keep the system fresh and then run the benchmark.
Before recording numbers, run the code snippet multiple times to warm up the environment. The JIT takes time to analyze and optimize the code during the initial runs. We should allow enough iterations for it to stabilize; otherwise we end up including JIT overhead in the measurement. Similarly, we may miss the benefits of the caching that happens at different levels.
How many times should the benchmark be run? A single run does not tell us anything definitive. But how many runs do we need, and for which parameters and configurations? Should we average all the run results, or omit an initial set of runs?
Understanding the benchmark results is important. For example, should we consider time per iteration or iterations per unit of time as the statistic? This essentially depends on our benchmark objective.
JVM arguments. Arguments such as -server change the behavior of the JVM drastically. Be aware of the arguments you use in the benchmarks.
Learn about the JVM, the compilation process and JIT optimizations.
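
To make a few of these points concrete, here is a minimal hand-rolled sketch of warm-up, repeated measurement and result consumption. The names (WarmupSketch, work, measure, sink) and the workload are made up for illustration. It is not a substitute for a real harness: it ignores forking, statistics and most of the JIT pitfalls listed above.

```java
import java.util.Arrays;

public class WarmupSketch {

    // Every result is written here so the computation has an observable
    // effect and is not simply discarded as dead code.
    static volatile long sink;

    // Hypothetical workload: sum of the squares of 0 .. n-1.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    // Runs 'warmup' untimed rounds first, then returns the elapsed
    // nanoseconds of each of the 'rounds' timed rounds.
    static long[] measure(int warmup, int rounds, int n) {
        for (int i = 0; i < warmup; i++) {
            sink = work(n);
        }
        long[] nanos = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink = work(n);
            nanos[r] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = measure(10_000, 10, 100_000);
        System.out.println("ns per round: " + Arrays.toString(nanos));
    }
}
```

Running it a few times usually shows the timed rounds varying from run to run, which is exactly the kind of noise a proper tool is built to manage and report.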

Java Micro Benchmark Tools

So if you are considering writing your own micro or nano benchmark implementation, my recommendation is a BIG NO. Having discussed all the above, we know how difficult it is to get a micro benchmark right. How do we warm up the environment? Managing the OS power profile, giving threads equal priority, repeating the task cycle, recording observations, reporting statistics and so on: all these steps must be done in an accurate and repeatable way. Our stopwatch benchmarking methodology is surely not sufficient for this.

A better way is to use an existing tool for the benchmark implementation. We can still go wrong in the other stages of a micro benchmark, but at least we get the implementation part right, depending on the tool we use. Caliper by Google and JMH by OpenJDK are two popular tools for Java micro benchmarks. I have not done any comparison study between these two tools. In this tutorial I have taken JMH, and in some future tutorial I will write about how to use Caliper. If you know of other good tools for Java micro benchmarks, add them in the comments and I will explore them too.

How to Benchmark with JMH

I will help you get started with JMH. Let me take some example Java code and show you how to benchmark it using JMH. It is better to run JMH benchmarks from the command line than from an IDE, as the IDE may influence the benchmark results.

I am going to use Maven and Eclipse to create a project to prepare the program / benchmark code. Then run it from the command line.

JMH Hello World:

Create a new Maven project with the following pom.xml. There are two dependency jars, jmh-core and jmh-generator-annprocess; their names are self-explanatory.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.javapapers.java.benchmark.jmh</groupId>
  <artifactId>java-benchmark-jmh</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <name>Java Benchmark JMH</name>
	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
		<jmh.version>1.10.3</jmh.version>
		<javac.target>1.7</javac.target>
	</properties>

	<dependencies>
		<dependency>
			<groupId>org.openjdk.jmh</groupId>
			<artifactId>jmh-core</artifactId>
			<version>${jmh.version}</version>
		</dependency>
		<dependency>
			<groupId>org.openjdk.jmh</groupId>
			<artifactId>jmh-generator-annprocess</artifactId>
			<version>${jmh.version}</version>
			<scope>provided</scope>
		</dependency>
	</dependencies>
	<build>
		<plugins>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-compiler-plugin</artifactId>
				<version>3.3</version>
				<configuration>
					<compilerVersion>${javac.target}</compilerVersion>
					<source>${javac.target}</source>
					<target>${javac.target}</target>
				</configuration>
			</plugin>
			<plugin>
				<groupId>org.codehaus.mojo</groupId>
				<artifactId>exec-maven-plugin</artifactId>
				<version>1.4.0</version>
				<configuration>
					<executable>java</executable>
					<arguments>
						<argument>-classpath</argument>
						<classpath />
						<argument>com.javapapers.java.benchmark.jmh.JMHHelloWorld</argument>
					</arguments>
				</configuration>
			</plugin>
		</plugins>
	</build>  
</project>

Benchmark Code

@Benchmark is the annotation that tells JMH to benchmark the method. This example just verifies our setup and checks that JMH is doing the benchmarking. We will see a practical example next.

package com.javapapers.java.benchmark.jmh;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class JMHHelloWorld {

	@Benchmark
	public void helloWorld() {
		// a dummy method to check the overhead
	}

	/*
	 * It is better to run the benchmark from command-line instead of IDE.
	 * 
	 * To run, in command-line: $ mvn clean install exec:exec
	 */

	public static void main(String[] args) throws RunnerException {
		Options options = new OptionsBuilder()
				.include(JMHHelloWorld.class.getSimpleName()).forks(1).build();

		new Runner(options).run();
	}
}

JMH Benchmark Results

Run the above program with the command mvn clean install exec:exec > result.log. I am just redirecting the benchmark output to a file. I have truncated the middle part of the results below, as it is of no interest in this context.

# JMH 1.10.3 (released 9 days ago)
# VM version: JDK 1.8.0_25, VM 25.25-b02
# VM invoker: C:\Program Files\Java\jdk1.8.0_25\jre\bin\java.exe
# VM options: 
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.javapapers.java.benchmark.jmh.JMHHelloWorld.helloWorld

# Run progress: 0.00% complete, ETA 00:06:40
# Fork: 1 of 10
# Warmup Iteration   1: 3485776570.432 ops/s
# Warmup Iteration   2: 3526139956.183 ops/s
# Warmup Iteration   3: 3389091879.262 ops/s
# Warmup Iteration   4: 3414813217.901 ops/s
# Warmup Iteration   5: 3410897785.865 ops/s
# Warmup Iteration   6: 3385717543.102 ops/s
# Warmup Iteration   7: 3396022350.057 ops/s
# Warmup Iteration   8: 3411431621.017 ops/s
# Warmup Iteration   9: 3319938155.273 ops/s
# Warmup Iteration  10: 3332124042.687 ops/s
# Warmup Iteration  11: 3347645883.268 ops/s
# Warmup Iteration  12: 3423214141.285 ops/s
# Warmup Iteration  13: 3441253028.054 ops/s
# Warmup Iteration  14: 3365322186.656 ops/s
# Warmup Iteration  15: 3343972077.089 ops/s
# Warmup Iteration  16: 3415458444.971 ops/s
# Warmup Iteration  17: 3402251665.801 ops/s
# Warmup Iteration  18: 3400978674.044 ops/s
# Warmup Iteration  19: 3413349037.553 ops/s
# Warmup Iteration  20: 3403361543.746 ops/s
Iteration   1: 3399995763.701 ops/s
Iteration   2: 3409391709.668 ops/s
Iteration   3: 3415116214.577 ops/s
Iteration   4: 3414934176.124 ops/s
Iteration   5: 3409917182.415 ops/s
Iteration   6: 3412647929.193 ops/s
Iteration   7: 3388220513.650 ops/s
Iteration   8: 3410504958.106 ops/s
Iteration   9: 3383328646.166 ops/s
Iteration  10: 3406810945.651 ops/s
Iteration  11: 3403383369.919 ops/s
Iteration  12: 3404488881.459 ops/s
Iteration  13: 3407958675.669 ops/s
Iteration  14: 3301403587.174 ops/s
Iteration  15: 3269620734.668 ops/s
Iteration  16: 3295182745.114 ops/s
Iteration  17: 3346192654.216 ops/s
Iteration  18: 3384530003.789 ops/s
Iteration  19: 3414951427.705 ops/s
Iteration  20: 3307876501.996 ops/s

# Run progress: 10.00% complete, ETA 00:06:01

-- result truncated here --

# Fork: 10 of 10
# Warmup Iteration   1: 3465349583.394 ops/s
# Warmup Iteration   2: 3450778631.282 ops/s
# Warmup Iteration   3: 3404258231.034 ops/s
# Warmup Iteration   4: 3218690517.538 ops/s
# Warmup Iteration   5: 3388580053.688 ops/s
# Warmup Iteration   6: 3442469173.353 ops/s
# Warmup Iteration   7: 3331060205.288 ops/s
# Warmup Iteration   8: 3359455385.673 ops/s
# Warmup Iteration   9: 3365960924.293 ops/s
# Warmup Iteration  10: 3349136633.682 ops/s
# Warmup Iteration  11: 3305239679.302 ops/s
# Warmup Iteration  12: 3281576970.936 ops/s
# Warmup Iteration  13: 3334584303.352 ops/s
# Warmup Iteration  14: 3397047063.204 ops/s
# Warmup Iteration  15: 3446585405.562 ops/s
# Warmup Iteration  16: 3462052399.469 ops/s
# Warmup Iteration  17: 3469669282.673 ops/s
# Warmup Iteration  18: 3467353571.025 ops/s
# Warmup Iteration  19: 3432096727.100 ops/s
# Warmup Iteration  20: 3408804942.829 ops/s
Iteration   1: 3419076834.457 ops/s
Iteration   2: 3370802657.673 ops/s
Iteration   3: 3333528017.715 ops/s
Iteration   4: 3279100042.079 ops/s
Iteration   5: 3193301031.193 ops/s
Iteration   6: 3206499353.954 ops/s
Iteration   7: 3210598380.827 ops/s
Iteration   8: 3291320676.502 ops/s
Iteration   9: 3259501914.847 ops/s
Iteration  10: 3290153063.104 ops/s
Iteration  11: 3352878666.533 ops/s
Iteration  12: 3457643740.172 ops/s
Iteration  13: 3327667070.281 ops/s
Iteration  14: 3318658287.072 ops/s
Iteration  15: 3427033859.142 ops/s
Iteration  16: 3430039114.839 ops/s
Iteration  17: 3358692258.583 ops/s
Iteration  18: 3298970252.008 ops/s
Iteration  19: 3324452158.535 ops/s
Iteration  20: 3375736482.747 ops/s


Result "helloWorld":
  3367784618.613 (99.9%) 15478552.094 ops/s [Average]
  (min, avg, max) = (3193301031.193, 3367784618.613, 3488743801.638), stdev = 65537158.782
  CI (99.9%): [3352306066.519, 3383263170.707] (assumes normal distribution)


# Run complete. Total time: 00:06:41

Benchmark                  Mode  Cnt           Score          Error  Units
JMHHelloWorld.helloWorld  thrpt  200  3367784618.613   15478552.094  ops/s
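
The summary line can be cross-checked against the log above. The count of 200 corresponds to the 10 forks shown in the log times 20 measurement iterations each, and the Error column matches the half-width of the 99.9% confidence interval printed in the Result block. The sketch below only redoes that arithmetic with the numbers from this run; the class and method names are made up.

```java
public class ResultCheck {

    // Values copied from the summary line of the run above (ops/s).
    static final double SCORE = 3367784618.613;
    static final double ERROR = 15478552.094;

    // Number of samples behind the score: each fork contributes all of
    // its measurement iterations (warm-up iterations are not counted).
    static int sampleCount(int forks, int measurementIterations) {
        return forks * measurementIterations;
    }

    // Bounds of the confidence interval: score minus/plus error.
    static double lowerBound() {
        return SCORE - ERROR;
    }

    static double upperBound() {
        return SCORE + ERROR;
    }

    // Error expressed as a percentage of the score.
    static double relativeErrorPercent() {
        return 100.0 * ERROR / SCORE;
    }

    public static void main(String[] args) {
        System.out.println("samples = " + sampleCount(10, 20));
        System.out.printf("CI = [%.3f, %.3f]%n", lowerBound(), upperBound());
        System.out.printf("relative error = %.2f%%%n", relativeErrorPercent());
    }
}
```

The bounds agree with the CI line in the log, and the error comes to under half a percent of the score, so the measurement of this empty method was fairly stable across forks.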

JMH Benchmark for Java API’s Sort

Let us now see a practical example. Assume that we need to sort a list of integers and must decide which data structure and API to use. We are not going to write our own sort algorithm, as Java already provides an API. There are two options: use primitive int with Arrays.sort, or use Integer with Collections.sort. Now let us do a micro benchmark of these two options and analyze the results. Before going into benchmarking we should understand the following terms.

Throughput:

The rate at which processing is done. @BenchmarkMode({Mode.Throughput}) measures operations per unit of time (per second by default). The time bound can be configured. It is simply a count of how many times the method is executed in the given duration.

Average Time:

Measures the average execution time. @BenchmarkMode({Mode.AverageTime}) measures the time per operation. The time bound can be configured. It is the reciprocal of throughput.
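
Because the two modes are reciprocal in principle, a throughput score can be converted to an approximate time per operation by hand. A small sketch, with a made-up class name, using scores reported in this tutorial. JMH measures each mode separately, so a real AverageTime run will not match this arithmetic to the last digit.

```java
public class ModeConversion {

    static final double NANOS_PER_SECOND = 1_000_000_000.0;

    // Throughput (ops/s) to average time (ns/op).
    static double nanosPerOp(double opsPerSecond) {
        return NANOS_PER_SECOND / opsPerSecond;
    }

    // Average time (ns/op) back to throughput (ops/s).
    static double opsPerSecond(double nanosPerOp) {
        return NANOS_PER_SECOND / nanosPerOp;
    }

    public static void main(String[] args) {
        // helloWorld score from the run above: roughly 0.3 ns per call.
        System.out.printf("helloWorld: %.3f ns/op%n", nanosPerOp(3367784618.613));
        // collectionsSort score for 1000 integers: a little over a microsecond.
        System.out.printf("collectionsSort: %.1f ns/op%n", nanosPerOp(853014.250));
    }
}
```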

JMH has lots of options with which we can tune our benchmarks. This tutorial is meant to help you get started with running the tests, and I recommend that you go through the JMH documentation for the complete set of APIs and their usage.

package com.javapapers.java.benchmark.jmh;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@State(Scope.Thread)
public class JMHSortBenchmark {

	List<Integer> arrayList;
	int[] array;
	Random random;

	@Setup(Level.Trial)
	public void init() {
		random = new Random();
		array = new int[25];
		arrayList = new ArrayList<Integer>();
		for (int i = 0; i < 25; i++) {
			int randomNumber = random.nextInt();
			array[i] = randomNumber;
			arrayList.add(Integer.valueOf(randomNumber));
		}
	}

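	@Benchmark
	public void arraysSort() {
		// Sorts in place: the array is filled once per trial (see init()),
		// so every invocation after the first receives already sorted input.
		Arrays.sort(array);
	}

	@Benchmark
	public void collectionsSort() {
		// Same caveat as arraysSort(): the list stays sorted after the first call.
		Collections.sort(arrayList);
	}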

	public static void main(String[] args) throws RunnerException {

		Options options = new OptionsBuilder()
				.include(JMHSortBenchmark.class.getSimpleName()).threads(1)
				.forks(1).shouldFailOnError(true).shouldDoGC(true)
				.jvmArgs("-server").build();
		new Runner(options).run();

	}
}

Benchmark results:

I ran the test with two different input sizes, first sorting 25 integers and then 1000 integers, and found the following interesting benchmark results.

JMH Benchmark to Sort 25 integers

Benchmark                          Mode  Cnt         Score        Error  Units
JMHSortBenchmark.arraysSort       thrpt   20  28779058.746   446412.014  ops/s
JMHSortBenchmark.collectionsSort  thrpt   20  26070145.869   206077.160  ops/s

JMH Benchmark to Sort 1000 integers

Benchmark                          Mode  Cnt        Score       Error  Units
JMHSortBenchmark.arraysSort       thrpt   20  3795959.757   43679.226  ops/s
JMHSortBenchmark.collectionsSort  thrpt   20   853014.250    6256.061  ops/s

Above is only the summary of the benchmark results. With 25 integers the two APIs are close: Arrays.sort is about 10% ahead (28.8 million vs 26.1 million ops/s). With 1000 integers, throughput drops for both, but far more sharply for Collections.sort (to about 0.85 million ops/s, roughly a 30-fold drop) than for Arrays.sort (to about 3.8 million ops/s, roughly a 7.6-fold drop), which leaves the primitive array about 4.5 times faster. This benchmark is meant to showcase JMH and how to get started with it. It is not meant to analyze the Java sort APIs, so the results may not exactly reflect the truth.
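
One reason to treat these numbers with caution is that the benchmark above sorts the same array and list in place on every call, so most invocations see input that is already sorted. A possible variant is sketched below, assuming the same pom.xml as before. It sorts a fresh copy on each call, returns the result so that JMH consumes it, and uses @Param to cover both sizes in one run. The class name, the fixed seed and the choice of AverageTime mode are my assumptions, not part of the original project, and the cost of making the copy is included in what gets measured.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class JMHSortParamBenchmark {

    // JMH repeats every benchmark once for each listed value.
    @Param({"25", "1000"})
    public int size;

    int[] source;
    List<Integer> sourceList;

    // Default level is Level.Trial: the unsorted source data is built once.
    @Setup
    public void init() {
        Random random = new Random(42); // fixed seed (an assumption) for repeatable input
        source = new int[size];
        sourceList = new ArrayList<Integer>(size);
        for (int i = 0; i < size; i++) {
            source[i] = random.nextInt();
            sourceList.add(source[i]);
        }
    }

    @Benchmark
    public int[] arraysSort() {
        int[] copy = source.clone(); // unsorted input on every call
        Arrays.sort(copy);
        return copy;                 // returned so that the result is consumed
    }

    @Benchmark
    public List<Integer> collectionsSort() {
        List<Integer> copy = new ArrayList<Integer>(sourceList);
        Collections.sort(copy);
        return copy;
    }
}
```

It can be launched with the same Runner and OptionsBuilder code as before, passing JMHSortParamBenchmark.class.getSimpleName() to include(). I have not rerun the comparison with this variant, so the figures above still come from the original code.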

Download the JMH example Eclipse project and complete benchmark results java-benchmark-jmh
