Traveling the road from Idea all the way to OpenJDK

Donald Raab
11 min readJul 20, 2022

--

See something. Say something. Prove something. Do something.

The Tower Bridge on the River Thames, London in May 2013

Starting with an Idea

In May 2013, I was in traveling on business in London. The GS Collections library had been released to open source on GitHub a year and a half earlier (January 2012). I was a member of the JSR 335 Expert Group, and would post in the lambda-lib-spec-experts mail list when I wanted to share an opinion on a topic being discussed.

See Something

While I was in London, I was actively testing GS Collections with the Java Stream library using a binary snapshot of the pre-Java 8 release. I recall spending several hours in my hotel room, across from St. Paul’s Cathedral, puzzling over a peculiar performance difference I was seeing between stream and parallelStream when used with GS Collections’ FastList class.

My hotel in London, May 2013

The Java Microbenchmark Harness project (JMH) did not yet exist in open source at this point in time. The first publicly available version of JMH that I could see in GitHub was tagged as a release in November, 2013. I had written and used my own microbenchmark harness in Java for several years. The tool was fairly simple, but I often questioned the accuracy of results. The differences I was seeing between stream and parallelStream for FastList were so stark and surprising, I believed I had discovered something that needed to be addressed.

I knew that access to Stream would be introduced in Java as default methods named stream() and parallelStream() on the java.util.Collection interface. Any interface or class that extended or implemented java.util.Collection would inherit the default implementation of the stream() and parallelStream() methods. Both stream and parallelStream depended on another default method named spliterator. I knew that FastList and the other mutable containers in GS Collections would immediately be able to leverage Java Streams because they inherited from Collection.

I ran some benchmarks comparing stream and parallelStream with eager iteration methods provided by GS Collections for both FastList and ArrayList. The benchmarks were measuring the average time in milliseconds for each operation, so smaller is better in the charts.

Here are the performance numbers I saw for GS Collections’ FastList.

Filtering 1,000,000 element FastList using stream, parallelStream and GSC eager — Smaller is Better

Here are the performance numbers I saw for ArrayList.

Filtering 1,000,000 element ArrayList using stream, parallelStream and GSC eager methods — Smaller is Better

Eager filtering in GS Collections (green bars) performed better than stream/parallelStream (blue bars) filtering in all cases. This met my expectations, as eager methods will almost always outperform for a single operation like filter. The difference in performance between parallelStream filtering with FastList and parallelStream filtering with ArrayList is the thing that confused and concerned me. I expected FastList performance to be very similar to ArrayList.

The benchmarks were run using pre-release Java 8 binaries on a dual core Windows machine. Seeing close to a 2x performance increase for the parallel cases is what I would have hoped for with filtering. The parallelStream case for FastList, ran slower than the stream (serial) case. This made no sense to me, so I went and looked at the code for stream and parallelStream to try and figure out why I was seeing this unexpected difference.

Say Something

The problem seemed to center around the default implementation of spliterator on Collection that was returning an IteratorSpliterator. ArrayList returns a specialized spliterator implementation called ArrayListSpliterator. There was no spliterator provided for non-JDK RandomAccess List implementations, like FastList. I would propose in the following thread that a RandomAccessSpliterator should be added to the JDK, so non-JDK RandomAccess List implementations would not suffer significant performance penalties when using parallelStream. I reported my findings in the lambda-lib-spec-experts group mailing list. Prior to Java 8, this was the group to communicate anything around Java Streams. The following link shows the thread I started in the mail list the week that I was in London running my tests.

The response to my proposal of a RandomAccessSpliterator was quite receptive. However, just because I shared an idea, didn’t mean I didn’t have A LOT more work to do. I was planning to attend the JVM Language Summit (JVMLS) in July of 2013 and discuss my findings in person with Brian Goetz and other members of the JSR 335 Expert Group. Unfortunately, a month after I initially reported my findings, my life was thrown into absolute turmoil, and I wound up abandoning the idea for several years.

An unexpected detour and a very thoughtful gift

A week before I was going to go to JVMLS 2013, my wife was admitted to the hospital and diagnosed with AML (Leukemia). I wrote about this story for the first time a couple of years ago.

With my wife in the hospital receiving invasive chemo for months at a time, and ultimately a stem cell transplant, I was set into autopilot mode just trying to help family survive. Any findings I had shared a couple of months prior were forgotten (but not lost) in the tumultuous times I was now going through.

My friend and colleague Vlad attended the JVMLS in 2013 without me, and returned with a special gift in the form of a JVMLS 2013 coffee thermos.

JVMLS 2013 Thermos

I keep this incredibly thoughtful gift on my office shelf in front of me where I can see it every day.

Another detour — Eclipse Collections

After Java 8 was released in March 2014, I decided after many discussions that the best path forward for GS Collections was to move to the Eclipse Foundation, where it would become Eclipse Collections. The actual migration work for Eclipse Collections was completed by Hiroshi who would become the first Eclipse Collections Project Lead. Hiroshi Ito is also one of the heroes in this story of RandomAccessSpliterator.

Me and Hiroshi Ito at JavaOne 2015 in San Francisco, where we would present Eclipse Collections by Example

Prove Something

Once Eclipse Collections was migrated to the Eclipse Foundation at the end of 2015, I finally revisited my idea from two years prior. I set out to reconfirm my concerns of the parallel performance of Java Streams with FastList. The initial release of Eclipse Collections 7.0 was feature identical to GS Collections 7.0. Eclipse Collections 7.0 and GS Collections 7.0 were both compiled with Java 7. This would mean neither library would have overrides for stream(), parallelStream() or spliterator() and would inherit any default implementations from Collection and List. The default implementation of spliterator() was still IteratorSpliterator in Java 8. IteratorSpliterator was woefully suboptimal for parallelStream for List implementations that were RandomAccess.

I wrote some benchmarks using JMH (it was now available), and worked with Hiroshi Ito who wrote a first implementation of RandomAccessSpliterator that I used to test with. IIRC, I ran the benchmarks on a quad core Windows machine (apologies, I can’t recall the exact machine/software specs and didn’t write them down). The following were the results of filtering the evens from a list from 1 to 1,000,000 with the default implementation of IteratorSpliterator used for FastList. The results were in operations per second, so bigger is better.

Filtering 1,000,000 elements from ArrayList/FastList using stream/parallelStream — Bigger is Better

When we plugged in the custom RandomAccessSpliterator implementation that Hiroshi was working on, this was the change we saw in performance.

Filtering 1,000,000 elements from ArrayList/FastList using stream/parallelStream — Bigger is Better

This is exactly what we had hoped to see. Once we were confident that the performance differences we were seeing between IteratorSpliterator and RandomAccessSpliterator were both compelling and reproducible, Hiroshi began communicating with the core-libs-dev mail group.

Onward to OpenJDK

Do Something

This was the initial email with patch sent to the core-libs-dev mail group by Hiroshi Ito. The email was dated May 11, 2016 which was three years and two days after I originally shared the idea with the lambda-lib-spec-experts mail group.

The implementation work on RandomAccessSpliterator was a joint effort between Hiroshi Ito and Paul Sandoz. I was just the idea guy and performance testing person. Hiroshi and Paul did all of the real heavy lifting and unit testing for the implementation of RandomAccessSpliterator. I am extremely thankful for the efforts of both of them.

The following bug exists as a reference in the OpenJDK bug system for this feature.

Now that OpenJDK is on GitHub, the source for RandomAccessSpliterator can be viewed in the AbstractList source where it resides as a static class.

Java 9 and RandomAccessSpliterator

The RandomAccessSpliterator code was delivered in time to be released with Java 9 (September 2017). RandomAccessSpliterator would become the default spliterator for FastList and any other non-JDK RandomAccess List in any other library, that did not provide their own spliterator implementation.

The Long Wait

While RandomAccessSpliterator made it in time for Java 9, many companies will have waited for Java 11 before they would have seen any benefits of the optimization. It’s only been fairly recently that a large number of applications have made the migration effort from Java 8 to Java 11 or Java 17.

By the time Java 9 was released, Eclipse Collections had upgraded FastList (as of EC 8.1) and ImmutableArrayList (as of EC 9.0) to have custom spliterator overrides. However there are still classes today in Eclipse Collections that will wind up using the default implementation of spliterator that returns RandomAccessSpliterator, including a class named ArrayListAdapter.

I decided to run the same benchmarks I ran over 6 years ago only this time using Java 17 with java.util.ArrayList and Eclipse Collections’ ArrayListAdapter to see if a parallelStream has reasonably similar performance characteristics.

Filtering evens from a 1,000,000 element list using stream / parallelStream — Bigger is Better

The results were pretty much as I expected. The ArrayListAdapter is wrapping the ArrayList instance, so I did expect it to have some extra cost due to having to delegate all calls to the underlying ArrayList. That said, RandomAccessSpliterator appears to be doing its job pretty well. ArrayList has a specialized spliterator override which returns an ArrayListSpliterator and it is not that much different from RandomAccessSpliterator in this benchmark run.

I ran these benchmarks on a 2.7 Ghz quad core Intel Core I7 MacBook Pro (Early 2013) running with the following JDK and version of JMH.

# JMH version: 1.35
# VM version: JDK 17, OpenJDK 64-Bit Server VM, 17+35–2724

The source code for the benchmarks is here.

Reflections on a road less traveled

Be prepared

Proposing and getting a change into the OpenJDK requires a serious commitment of patience, time and effort. This story demonstrates that it is entirely possible to contribute to the OpenJDK, but may not be on the timeframe you would expect. I’ve also had ideas rejected several times. It’s not personal. The JDK is used by millions of developers for over a quarter century to write and run millions of applications. The developers maintaining this amazing code base are experts and we should be careful not to waste their time.

You should submit your ideas as a proposal to the core-libs-dev mail list. Search first to see if your idea has already been proposed and/or implemented.

It’s possible you may have a great idea, and it still might get rejected. There is a cost/benefit tradeoff that has to be understood and weighed by the OpenJDK developers. Your amazing idea may be deemed not practical and too expensive.

Many ideas don’t make it, and that is perfectly okay

Here is an example of an idea that I proposed, that has not been implemented yet, and may never be, as it may just not be practical or worth the cost. The developer who responded to my initial proposal, Claes Redestad, was extremely thoughtful and helpful in his responses in the mail list. He even built a proof of concept of the method in a pull request with some benchmarks. I am very thankful for the time he took to look into my idea.

There are perspectives that folks who work on the OpenJDK codebase all the time have that regular developers coding in Java may not fully understand or appreciate. Be prepared to have your idea challenged, and to possibly learn that there is something more complex or complicated hiding in that simple code example you provided.

I learned in the String.join case that the code I was looking at in the JDK 17 release was not actually the code that was currently in the OpenJDK repository being prepared for JDK 18. You can now see the difference for String.join between JDK 18 and JDK 17.

Start small and stay focused

My recommendation for proposing changes, improvements, bug fixes to the JDK is to start small and stay focused. If you see something, and then work up the courage to say something, be prepared to prove something, and if you get past that point get ready to commit yourself to doing something. Once you are successful with a few small wins, you may be able to move on to bigger changes.

The road might take longer than you expect, but the final destination may be worth it for you. I can tell you after nine years of waiting, and finally experimenting with the performance of RandomAccessSpliterator in Java 17 and seeing the results I expected, the time I spent and waited was well worth it.

Thank you, Hiroshi, Paul, Brian, Doug, Claes, the entire OpenJDK team and all the members of the JSR 335 Expert Group.

I am the creator of and a Committer for the Eclipse Collections OSS project which is managed at the Eclipse Foundation. Eclipse Collections is open for contributions.

--

--

Donald Raab

Java Champion. Creator of the Eclipse Collections OSS Java library (https://github.com/eclipse/eclipse-collections). Inspired by Smalltalk. Opinions are my own.