Bonus Slides from QCon NY 2023
The slides that didn’t make the 50-minute time limit for our talk.
No time? No problem.
While working on a performance talk for QCon New York, my co-speaker Rustam Mehmandarov and I had more material than we had time for during our presentation. Our solution was simple. Don’t delete the slides. Move them to the Appendix.
The slides are available as AsciiDoc in this GitHub repo. The talk was about memory-efficiency, and the Appendix contains some more examples folks might find interesting.
I also wrote a prequel blog for the talk, titled “Sweating the small stuff in Java,” which goes into much more detail about the historical context for the talk.
Writing the prequel blog saved about 15 minutes from the talk.
Does anyone ever look at the Appendix?
I know I do occasionally. Here’s the Appendix for our talk. You will find some links to resources on the first page, but there is more. The following sections of the blog show the slides as they would appear in IntelliJ, which is what we used, along with AsciiDoc, in the live presentation.
Appendix 0 — Resources
The first page has some useful links to resources we used or referenced in the talk.
GitHub Repos
- Eclipse Collections (creator: Donald Raab)
- DataFrame-EC (creator: Vladimir Zakharov)
- Jackson Dataformat CSV (creator: @cowtowncoder)
- Jackson Datatypes Collections
Kata Repos
Articles
We referenced the Java Object Layout (JOL) tool earlier in our talk; it is the tool we used for measuring memory footprints. Here’s a link to the slide with the references to JOL, which will help explain how we came up with some of the example slides that follow. The following image shows the slide as it appeared in our talk.
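For readers who want to reproduce the footprint measurements, here is a minimal sketch of how JOL can be used; the class and variable names are my own, and the only real API assumed here is GraphLayout from the org.openjdk.jol:jol-core dependency. Depending on your JDK and VM options, JOL may print a warning about how it obtains object sizes.

import org.openjdk.jol.info.GraphLayout;

import java.util.ArrayList;
import java.util.List;

public class FootprintExample
{
    public static void main(String[] args)
    {
        // Build a small object graph to measure.
        List<Integer> list = new ArrayList<>();
        for (int i = 1; i <= 10; i++)
        {
            list.add(i);
        }

        // GraphLayout walks the reachable object graph and reports its footprint.
        GraphLayout layout = GraphLayout.parseInstance(list);
        System.out.println(layout.toFootprint());   // per-class breakdown
        System.out.println(layout.totalSize());     // total bytes retained
    }
}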
Appendix 1 — Boxed vs. Primitive Lists
We didn’t have time to show every memory cost comparison we had prepared, so here’s one where we compared a java.util.ArrayList of Integer with an IntArrayList. Each List contains the integer values 1 through 10.
Note that the extra cost of 160 bytes for ArrayList is due to boxing the int values as Integer instances.
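If you want to reproduce the comparison, here is a hedged sketch using JOL again; the exact byte counts will vary with JDK version, compressed oops, and object alignment, so treat the 160-byte delta as representative rather than guaranteed.

import org.eclipse.collections.impl.list.mutable.primitive.IntArrayList;
import org.openjdk.jol.info.GraphLayout;

import java.util.ArrayList;
import java.util.List;

public class BoxedVsPrimitiveList
{
    public static void main(String[] args)
    {
        // Boxed: ArrayList<Integer> holding the values 1 through 10.
        List<Integer> boxed = new ArrayList<>();
        // Primitive: Eclipse Collections IntArrayList holding the same values.
        IntArrayList primitive = new IntArrayList();
        for (int i = 1; i <= 10; i++)
        {
            boxed.add(i);
            primitive.add(i);
        }

        // Each Integer instance adds object-header and alignment overhead
        // that the int[] inside IntArrayList does not pay.
        System.out.println("ArrayList<Integer>: " + GraphLayout.parseInstance(boxed).totalSize() + " bytes");
        System.out.println("IntArrayList:       " + GraphLayout.parseInstance(primitive).totalSize() + " bytes");
    }
}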
Appendix 2 — Mutable vs. Immutable Lists
The JDK now provides both Mutable and Immutable List implementations. They both implement the List interface. Most folks won’t realize that the Immutable List implementations are more memory efficient than their Mutable counterparts. This is because they are trimmed to size, since they don’t change. There are ImmutableCollections$ListN and ImmutableCollections$List12 implementations. The latter should be read as ListOneTwo, not ListTwelve, which is how I read it when I first saw the class. This class contains either one or two elements.
In this example, we created a List with two Integer instances. First we used an ArrayList, and then we copied the ArrayList into an Immutable List using List.copyOf().
The boxing cost is the same between the Mutable and Immutable List implementations in the JDK, but the List12 instance does not have a default-sized array of size 10 like the ArrayList.
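Here is a minimal sketch of that comparison, again measured with JOL; ImmutableCollections$List12 is an internal JDK class, so only its behavior through List.copyOf() is guaranteed.

import org.openjdk.jol.info.GraphLayout;

import java.util.ArrayList;
import java.util.List;

public class MutableVsImmutableList
{
    public static void main(String[] args)
    {
        // Mutable: ArrayList grows a default-sized backing array of length 10 once populated.
        List<Integer> mutable = new ArrayList<>();
        mutable.add(1);
        mutable.add(2);

        // Immutable: List.copyOf() returns a trimmed-to-size implementation.
        List<Integer> immutable = List.copyOf(mutable);

        System.out.println(immutable.getClass());  // java.util.ImmutableCollections$List12
        System.out.println("ArrayList:     " + GraphLayout.parseInstance(mutable).totalSize() + " bytes");
        System.out.println("List.copyOf(): " + GraphLayout.parseInstance(immutable).totalSize() + " bytes");
    }
}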
Appendix 3 — Boxed vs. Primitive Map of Long → Set of Long
I was asked on Twitter if there was a more efficient way of creating a Map of Long to Set of Long for 200,000 Long keys using Eclipse Collections. The short answer is yes, as long as you don’t box the Long values.
Each Long object costs 24 bytes. Those bytes can add up quickly depending on your use cases. Don’t box!
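Here is a hedged sketch of the two approaches. The boxed version keys a HashMap with Long and stores HashSet<Long> values; the Eclipse Collections version keeps the keys and the set elements as primitive longs (the map values are still objects, but they are primitive LongHashSets rather than sets of boxed Longs).

import org.eclipse.collections.api.map.primitive.MutableLongObjectMap;
import org.eclipse.collections.api.set.primitive.MutableLongSet;
import org.eclipse.collections.impl.factory.primitive.LongObjectMaps;
import org.eclipse.collections.impl.set.mutable.primitive.LongHashSet;

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class LongToLongSetMaps
{
    public static void main(String[] args)
    {
        // Boxed: each key and each set element is a Long instance (24 bytes outside the small-value cache).
        Map<Long, Set<Long>> boxed = new HashMap<>();
        // Primitive: keys are longs and set elements are longs; no Long boxes.
        MutableLongObjectMap<MutableLongSet> primitive = LongObjectMaps.mutable.empty();

        for (long key = 0; key < 200_000; key++)
        {
            boxed.computeIfAbsent(key, k -> new HashSet<>()).add(key * 2);
            primitive.getIfAbsentPut(key, LongHashSet::new).add(key * 2);
        }
    }
}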
Appendix 4 — Caching vs. Pooling
We discussed pooling in our talk, and described some of the pools built into the JDK, like String.intern() and the boxed Number pools available through the valueOf methods on the integral value types Byte, Short, Integer, and Long. Caching is subtly different in that lookups for an object are usually provided via some index. Pooling provides uniquing, and lookup is based on the instance you are looking for.
Country is implemented as a record, and we keep a cache of Country instances indexed by the country name in a Map.
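A minimal sketch of the distinction, assuming a hypothetical Country record with just a name field (the real record in the talk may have more fields): the cache looks instances up by a key, while the JDK pools shown below unify by the instance’s own value.

import java.util.HashMap;
import java.util.Map;

public class CachingVsPooling
{
    // Hypothetical record; the single field is an assumption for illustration.
    record Country(String name) {}

    // Caching: lookup goes through an index (the country name).
    private static final Map<String, Country> COUNTRY_CACHE = new HashMap<>();

    static Country countryFor(String name)
    {
        return COUNTRY_CACHE.computeIfAbsent(name, Country::new);
    }

    public static void main(String[] args)
    {
        // The same key always returns the same cached instance.
        System.out.println(countryFor("Norway") == countryFor("Norway"));  // true

        // Pooling: lookup is by the value of the instance itself.
        String pooled = new String("QCon").intern();
        System.out.println(pooled == "QCon");                              // true
        System.out.println(Integer.valueOf(127) == Integer.valueOf(127));  // true (small-value pool)
    }
}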
Appendix 5 — Scaling Conferences x50
In the talk, we covered an example that scaled from 1 million Conference instances to 25 million. A few days before the talk, we tried it again with 50 million and 100 million instances, with the memory tuning done for one of the four row-based solutions (Eclipse Collections ImmutableList). The attempt to load 100 million instances failed with an OutOfMemoryError. I did not have time to research the cause of the OutOfMemoryError and see whether it was fixable.
Here is the slide with 50 million instances of Conference.
The intent here is to show how scaling impacts total memory savings. By manually tuning one of the row-based solutions to save 16 bytes per Conference, we were able to save over 800MB of memory: 16 bytes times 50 million instances is 800 million bytes. If you target the multipliers in your data, even small memory savings can become significant.
Thank you and Enjoy!
Rustam and I had a blast presenting at QCon New York this year, and wanted to thank the conference organizers, our track host Neha Sardana, and everyone who attended our talk! I hope you enjoy the bonus slides I shared here that didn’t make the cut for the talk.
Thank you for reading, and Happy Father’s Day!
I am the creator of and committer for the Eclipse Collections OSS project, which is managed at the Eclipse Foundation. Eclipse Collections is open for contributions.