The slides that didn’t make the 50-minute time limit for our talk.
No time? No problem.
While working on a performance talk for QCon New York, my co-speaker Rustam Mehmandarov and I had more material than we had time for during our presentation. Our solution was simple. Don’t delete the slides. Move them to the Appendix.
The slides are available as AsciiDoc in this GitHub repo. The talk was about memory-efficiency, and the Appendix contains some more examples folks might find interesting.
I also wrote a prequel blog, which goes into much more detail about the historical context for the talk. The following link is to that prequel, titled “Sweating the small stuff in Java.”
Sweating the small stuff in Java
The story of small FixedSizeCollection types in Eclipse Collections
Writing the prequel blog saved about 15 minutes from the talk.
Does anyone ever look at the Appendix?
I know I do occasionally. Here’s the Appendix for our talk. You will find some links to resources on the first page, but there is more. The following sections of the blog show the slides as they would appear in IntelliJ, which is what we used along with AsciiDoc in the live presentation.
Appendix 0 — Resources
The first page has some useful links to resources we used or referenced in the talk.
- Eclipse Collections (creator: Donald Raab)
- DataFrame-EC (creator: Vladimir Zakharov)
- Jackson Dataformat CSV (creator: @cowtowncoder )
- Jackson Datatypes Collections
We referenced the Java Object Layout (JOL) tool earlier in our talk; it is the tool we used to measure memory footprints. Here’s a link to the slide with the JOL references, which will help explain how we came up with some of the example slides that follow. The following image shows the slide as it appeared in our talk.
Appendix 1 — Boxed vs. Primitive Lists
We didn’t have time to show every memory cost comparison during the talk, so here’s the one where we compared a
Integer with an
List contains integer values 1 through 10.
Note that the extra cost here of 160 bytes for ArrayList is due to the boxing of int values as Integer objects.
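To make the 160-byte figure concrete, here is a minimal JDK-only sketch. It uses a plain int[] as a stand-in for a primitive list type, and the 16 bytes per Integer is simply the 160 bytes above divided by the 10 values, not something this code measures. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class BoxingCostSketch {
    // Assumption: 16 bytes per Integer, derived from 160 bytes / 10 values in the text.
    static final int BYTES_PER_INTEGER = 16;

    // Boxed: every int added to the list is stored as a reference to an Integer object.
    static List<Integer> boxedOneToTen() {
        List<Integer> list = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            list.add(i); // autoboxing: int -> Integer
        }
        return list;
    }

    // Primitive: the values live directly in the array, with no per-element object.
    static int[] primitiveOneToTen() {
        int[] values = new int[10];
        for (int i = 0; i < values.length; i++) {
            values[i] = i + 1;
        }
        return values;
    }

    // Back-of-the-envelope boxing overhead for a given number of elements.
    static long estimatedBoxingBytes(int elementCount) {
        return (long) elementCount * BYTES_PER_INTEGER;
    }

    public static void main(String[] args) {
        System.out.println("boxed size:     " + boxedOneToTen().size());
        System.out.println("primitive size: " + primitiveOneToTen().length);
        System.out.println("estimated boxing bytes: " + estimatedBoxingBytes(10));
    }
}
```

For real measurements, a tool like JOL is the right choice; this sketch only shows where the extra bytes come from.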
Appendix 2 — Mutable vs. Immutable Lists
The JDK provides both Mutable and Immutable
List implementations now. They both implement the
List interface. Most folks won’t realize that the Immutable List implementations are more memory efficient than their Mutable counterparts. This is because they are trimmed-to-size since they don’t change. There are
ImmutableCollections$List12 implementations. The latter should be read as
ListTwelve, which is how I read it when I first saw the class. This class contains either one or two elements.
In this example, we created a
List with two Integer instances. The first class we used is
ArrayList and then we created a copy of the
ArrayList into an Immutable List using
The boxing cost is the same between the Mutable and Immutable List implementations in the JDK, but the
List12 instance does not have a default sized array of size 10 like the
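The comparison above can be sketched with the JDK alone. This example assumes List.copyOf (Java 10+) as the copy method, which may not be the exact call used on the slide. The class name it prints is an implementation detail of the JDK you run it on, so no output is listed. The class and method names in the sketch are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ImmutableCopySketch {
    // A mutable list with two boxed Integer elements.
    static List<Integer> mutableTwo() {
        List<Integer> list = new ArrayList<>();
        list.add(1);
        list.add(2);
        return list;
    }

    // Assumption: List.copyOf is used to create the immutable copy.
    static List<Integer> immutableCopy(List<Integer> source) {
        return List.copyOf(source);
    }

    // Returns true if the list refuses structural modification.
    static boolean rejectsAdd(List<Integer> list) {
        try {
            list.add(3);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> mutable = mutableTwo();
        List<Integer> immutable = immutableCopy(mutable);

        // Both implement List and hold the same elements.
        System.out.println("equal contents: " + mutable.equals(immutable));

        // The implementation classes differ; the names are JDK internals.
        System.out.println(mutable.getClass().getName());
        System.out.println(immutable.getClass().getName());

        System.out.println("immutable rejects add: " + rejectsAdd(immutable));
    }
}
```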
Appendix 3 — Boxed vs. Primitive Map of Long → Set of Long
I was asked on Twitter if there was a more efficient way of creating a
Long for 200,000
Long keys using Eclipse Collections. The short answer is yes, as long as you don’t box the
24 bytes for each
Long object. These can add up quickly depending on your use cases. Don’t box!
Appendix 4 — Caching vs. Pooling
We discussed pooling in our talk, and described some of the pools built into the JDK like
String.intern() and the boxed Number pools available through
valueOf methods on the integral value types
Long. Caching is subtly different in that lookups for an object are usually provided via some index. Pooling provides uniquing and lookup is based on the instance you are looking for.
Country is implemented as a
record, and we keep a cache of
Country instances indexed by the country name in a
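A minimal sketch of the difference between pooling and caching, using only the JDK. The Country record here has a single hypothetical name component, and the ConcurrentHashMap-based cache is an assumption made for illustration; the version in the talk may look different.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class CachingVsPoolingSketch {
    // Hypothetical record; the real Country may have more components.
    record Country(String name) {}

    // Cache: instances are looked up by an index (the country name).
    private static final Map<String, Country> COUNTRY_CACHE = new ConcurrentHashMap<>();

    static Country countryByName(String name) {
        return COUNTRY_CACHE.computeIfAbsent(name, Country::new);
    }

    public static void main(String[] args) {
        // Pooling: you hand in an instance and get back the unique pooled one.
        String a = new String("QCon").intern();
        String b = new String("QCon").intern();
        System.out.println(a == b); // true, both refer to the interned instance

        // valueOf pools small integral values (-128 to 127 is guaranteed).
        System.out.println(Integer.valueOf(127) == Integer.valueOf(127)); // true
        System.out.println(Long.valueOf(100L) == Long.valueOf(100L));     // true

        // Caching: the lookup key is the name, not a Country instance.
        System.out.println(countryByName("Norway") == countryByName("Norway")); // true
    }
}
```

The identity comparisons with == are deliberate here, since the point is to show that the same instance comes back each time.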
Appendix 5 — Scaling Conferences x50
In the talk, we covered an example that scaled from 1 million
Conference instances to 25 million. A few days before the talk, we tried it again with 50 million and 100 million instances, with the memory tuning done for one of the four row based solutions (Eclipse Collections
ImmutableList). The attempt to load 100 million instances failed with
OutOfMemoryError. I did not have time to research what the cause of the
OutOfMemoryError was or to see if it was fixable.
Here is the slide with 50 million instances of
The intent here is to show how scaling impacts total memory savings. By manually tuning one of the row based solutions with a savings of 16 bytes per
Conference, we were able to save over 800MB of memory. If you target the multipliers in your data, even small memory savings can become significant.
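The multiplier effect is easy to check with back-of-the-envelope arithmetic. The sketch below multiplies the 16 bytes saved per instance by the instance counts mentioned above. It is plain arithmetic rather than a measurement, so the figures on the slide may differ somewhat. The class and method names are hypothetical.

```java
class ScalingSavingsSketch {
    // Total bytes saved when each instance saves a fixed number of bytes.
    static long bytesSaved(long instanceCount, long bytesSavedPerInstance) {
        return Math.multiplyExact(instanceCount, bytesSavedPerInstance);
    }

    public static void main(String[] args) {
        long perInstance = 16L; // savings per Conference, from the text
        long[] counts = {1_000_000L, 25_000_000L, 50_000_000L};
        for (long count : counts) {
            long saved = bytesSaved(count, perInstance);
            // Decimal megabytes (1 MB = 1,000,000 bytes).
            System.out.println(count + " instances -> " + (saved / 1_000_000L) + " MB saved");
        }
    }
}
```

At 1 million instances the savings are 16 MB, which is easy to dismiss. At 50 million the same 16 bytes becomes roughly 800 MB.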
Thank you and Enjoy!
Rustam and I had a blast presenting at QCon New York this year, and we want to thank the conference organizers, our track host Neha Sardana, and everyone who attended our talk! I hope you enjoy the bonus slides I shared here that didn’t make the cut for the talk.
Thank you for reading, and Happy Father’s Day!