Map-Oriented Programming in Java
Using MOP may be convenient sometimes, but it can also be messy.
To the Batpoles!
I created polls in Twitter(!X) and LinkedIn asking if developers use a Bag/Multiset type or if they just use Java Map. Unsurprisingly, java.util.Map is dominating both polls.
(Apologies to anyone who hasn’t seen the 1960s TV version of Batman… the Batpoles were two poles hidden behind a sliding bookcase, and Batman and Robin would use them to slide down to the Batcave.)
The question is not meant to determine whether Bag or Map is a better type. Both interfaces serve different purposes, and have different behaviors. A Map associates keys to values. A Bag is an unordered non-unique collection that makes it easy to track counts of things, and is usually backed by a Map to speed up lookups. A Map can be used to keep track of counts, the same as a hammer can be used to punch a hole through concrete instead of using a jackhammer. No one I know would argue that a hammer is a better tool to punch a hole through concrete, but if you don’t have a jackhammer on hand, you whack the heck out of that concrete with the hammer that you do have.
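To make the hammer concrete, here is a minimal sketch of counting with a plain Map, using only the JDK. The class and method names (MapAsBag, countOccurrences, occurrencesOf) are made up for this illustration and are not from any library.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class: a Map standing in for a Bag.
final class MapAsBag
{
    // The counting logic lives in the caller, not in the collection type.
    static Map<String, Integer> countOccurrences(List<String> items)
    {
        Map<String, Integer> counts = new HashMap<>();
        for (String item : items)
        {
            // merge stores 1 for a new key, otherwise adds 1 to the existing count
            counts.merge(item, 1, Integer::sum);
        }
        return counts;
    }

    // A missing key yields null from get, so the caller has to ask for a default of 0.
    static int occurrencesOf(Map<String, Integer> counts, String item)
    {
        return counts.getOrDefault(item, 0);
    }

    public static void main(String[] args)
    {
        Map<String, Integer> counts = countOccurrences(List.of("apple", "pear", "apple"));
        System.out.println(occurrencesOf(counts, "apple")); // 2
        System.out.println(occurrencesOf(counts, "kiwi"));  // 0
        System.out.println(counts.get("kiwi"));             // null
    }
}
```

It works, but every caller has to remember the merge and getOrDefault idioms. A Bag type would typically keep that knowledge in one place.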
When your only tool is a Map, everything else is a key-value pair
Anyway, back to the poll and the ultimate question… There are three libraries named in the poll that provide a Bag/Multiset type in Java: Google Guava, Apache Commons Collections, and Eclipse Collections. Then there is Map from Java. The question is really: are you willing to take a third-party dependency in your application to get a Bag/Multiset type, or are you happy to stick with Map-Oriented Programming (MOP)? Most developers these days try to limit their third-party dependencies for a variety of reasons (binary size, version conflict resolution, potential vulnerabilities, etc.). This leaves most developers in a position to either leverage the Map-Oriented Programming alternatives sometimes provided by the JDK or build their own Bag/Multiset solution. Most developers I know go with the Map-Oriented Programming solution.
The following quote is the essence of Map-Oriented Programming (MOP).
We need to get $h!t done now! We’ve got a Map and we’re not afraid to use it!
The problem with MOP is that while Map is super flexible with the data it can contain in the key and value slots it provides, its behavior stays the same. It’s just a Map. You can put things in, and get things out… including null or any other random type. Over the years, the Java Map interface has added new Map-specific behavior that enhances its flexibility, like being able to merge elements, compute them, or get a default value if a key is absent. More specialized behavior, like counting or adding to/removing from a collection in one of the value slots, is not part of the Map contract, so it leaks out into your code, or into algorithms tacked on to Stream and Collector. You lose the ability to have types and structures that provide augmented behavior on top of the Map.
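As a rough illustration of behavior leaking out of the Map contract, here is a sketch of what adding to and removing from a collection in a value slot tends to look like with only the JDK. The class and method names (MapAsMultimap, put, remove) are hypothetical, and removing a key once its list is empty is a design choice I am assuming, not something Map requires.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class: the multimap rules live here, outside the Map itself.
final class MapAsMultimap
{
    static void put(Map<String, List<String>> map, String key, String value)
    {
        // Create the value list on first use, then add to it
        map.computeIfAbsent(key, ignored -> new ArrayList<>()).add(value);
    }

    static boolean remove(Map<String, List<String>> map, String key, String value)
    {
        List<String> values = map.get(key);
        if (values == null)
        {
            return false;
        }
        boolean removed = values.remove(value);
        // Assumed policy: drop the key when its list becomes empty
        if (values.isEmpty())
        {
            map.remove(key);
        }
        return removed;
    }

    public static void main(String[] args)
    {
        Map<String, List<String>> map = new HashMap<>();
        put(map, "fruit", "apple");
        put(map, "fruit", "pear");
        remove(map, "fruit", "apple");
        System.out.println(map.get("fruit"));
    }
}
```

Nothing stops another piece of code from calling map.put directly with a null or shared list, which is exactly the kind of rule a dedicated Multimap type can enforce.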
I’ve enjoyed the benefits of using both dynamic type systems and static type systems having worked professionally in both Smalltalk and Java. Data structures like Map sometimes give you the feeling of the benefits of a dynamic type system, without the benefits of the static type system. I like having a static type system for a lot of reasons, even though it sometimes slows me down when I am developing on my own. I’m not a fan of Map-Oriented Programming, but I will confess to having used it on occasion when it provided a short-term convenience where adding new types was a hassle. When I discover the need for a new type, I usually add it. This is sometimes the hard path, but it is usually the right path. There is a cost for every new type we add to our applications, but there are also the benefits of communication, clarity, encapsulation, reduction in code duplication, increased safety, and improved performance.
Shortcuts aren’t
There are three types the JDK is missing that have been represented with Map as an alternative. Using Map is leveraging existing type flexibility to avoid cost, in this case for the framework developers. Any incurred cost is moved instead to the application developers that use Map as a return type. The following Map return types used in Collectors can never be changed. These Map return types were introduced on Collectors when Java 8 was released.
// Map<Boolean, List<T>> -> Pair<T, T>
Collectors.partitioningBy()
// Map<T, Long> -> Bag<T>
Collectors.groupingBy(Function.identity(), Collectors.counting())
// Map<K, Collection<V>> -> Multimap<K, V>
Collectors.groupingBy()
Pair, Bag, and Multimap are just some of the types missing from the JDK. We can call Pair something more specific in the case of partitioningBy, like Partition, but it’s still essentially a Pair of two things of the same type.
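For reference, here is a small sketch of the three Collectors calls in question and the Map types they hand back. The wrapper class and its method names (CollectorsReturningMaps, partitionByLength, countEach, groupByFirstLetter) are invented for this example. The Collectors methods themselves are standard JDK.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical wrapper class showing the Map return types from Collectors.
final class CollectorsReturningMaps
{
    // Map<Boolean, List<T>> standing in for a Pair/Partition
    static Map<Boolean, List<String>> partitionByLength(List<String> words, int maxLength)
    {
        return words.stream()
                .collect(Collectors.partitioningBy(word -> word.length() <= maxLength));
    }

    // Map<T, Long> standing in for a Bag
    static Map<String, Long> countEach(List<String> words)
    {
        return words.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Map<K, List<V>> standing in for a Multimap
    static Map<Character, List<String>> groupByFirstLetter(List<String> words)
    {
        return words.stream()
                .collect(Collectors.groupingBy(word -> word.charAt(0)));
    }

    public static void main(String[] args)
    {
        List<String> words = List.of("ant", "bee", "ant", "bison");
        System.out.println(partitionByLength(words, 3));
        System.out.println(countEach(words));
        System.out.println(groupByFirstLetter(words));
    }
}
```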
We don’t need a Pair type!
A well-reasoned decision was made to not add a generic Pair type or support for generic tuples in Java. Instead, the use of named types created via Java Records has been recommended for Java developers since Java 16 was released. This is a reasoned decision that I fully support, even if the open source framework I created (Eclipse Collections) has Pair and Triple types. I appreciate creating specialized types for things, and being able to do so with very little ceremony using Java Records is awesome.
Please fasten your seatbelts for what comes next.
We’ve got a Map!
Using a Map as a generic Pair type is arguably worse than adding a generic Pair type. How do you use a Map as a Pair? There is an example in the Stream and Collectors code in the JDK with partitioningBy.
Let’s look at the following example of partitioningBy, where we will filter a Stream of Integer in one pass into separate List instances of evens and odds.
@Test
public void partitioningBy()
{
    Map<Boolean, List<Integer>> map =
            IntStream.rangeClosed(1, 10)
                    .boxed()
                    .collect(Collectors.partitioningBy(each -> each % 2 == 0));

    List<Integer> evens = map.get(true);
    List<Integer> odds = map.get(false);
    // Map.get accepts any Object, so these lookups compile
    List<Integer> ummm = map.get(null);
    List<Integer> ohno = map.get(new Object());

    Assertions.assertEquals(List.of(2, 4, 6, 8, 10), evens);
    Assertions.assertEquals(List.of(1, 3, 5, 7, 9), odds);
    // Keys that are not true or false quietly return null
    Assertions.assertNull(ummm);
    Assertions.assertNull(ohno);

    // getOrDefault also accepts any Object as the key
    ummm = map.getOrDefault(null, evens);
    Assertions.assertEquals(List.of(2, 4, 6, 8, 10), ummm);
    ohno = map.getOrDefault(new Object(), odds);
    Assertions.assertEquals(List.of(1, 3, 5, 7, 9), ohno);
}
This code takes a Stream of Integer from 1 to 10 and filters the even values into one List and the odd values into another List using partitioningBy. The result is a Map<Boolean, List<Integer>>. The true values in the Map are the ones that filter inclusively. The false values in the Map are the ones that filter exclusively. The null values in the Map are the ones that… wait, why are there null values in the Map? Why is there a new Object() lookup in a Map<Boolean, List<Integer>>? What is happening here!?! Recall that Map existed before Java 5 when generics were added to Java, and the get method on Map is not generic and accepts any Object. Ahhhh… Map.
If you’ve never dug into the result of partitioningBy before, it returns an instance of a type named Partition that is a nested class in Collectors. I knew the partitioningBy method returned a Map<Boolean, List<Type>>, but wasn’t aware of the actual implementation until I looked today. The Partition type is immutable, but still behaves like a Map as I illustrate above. The get method on Map is not generic, so it accepts ANY object of ANY type. The Partition class does not throw on non-boolean access via get, but instead returns null. Lookups with potentially any type will result in null. The getOrDefault method and the other read-only Map methods behave consistently with other Map types. Mutating methods like put throw exceptions.
What about using a primitive BooleanObjectMap?
For those wondering if I would propose using a primitive version of a BooleanObjectMap from Eclipse Collections to solve the generic get problem with Map… I wouldn’t, and I can’t. A BooleanObjectMap type doesn’t exist in Eclipse Collections. When we designed the primitive Map hierarchy, we made a conscious decision to remove ALL combinations of primitive maps with boolean as a key. There are no boolean to anything maps in Eclipse Collections.
Why?
Having a Map of boolean to anything had a design smell of using Map as a hammer. If you need two values, one for true and one for false, then use two variables to hold the values and put those values in a specific type. The variables in this new type can have intention-revealing names (e.g. selected and rejected, in Eclipse Collections PartitionIterable) instead of less meaningful names like ifTrue and ifFalse, or Boolean values in a Map. If you want to pass these values around together in a single generic instance of something, because you can’t or don’t want to add a new type, then use a generic Pair. Buyer beware. You will get less meaningful names for your contained values with a Pair as well (one and two, or left and right).
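As a sketch of what "two variables in a specific type" could look like with a Java Record (Java 16 or later), consider the following. The record name PartitionRecord and its factory method of are hypothetical names chosen for this example. This is not the JDK's private Partition class and not the Eclipse Collections PartitionIterable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical named type with intention-revealing component names.
record PartitionRecord<T>(List<T> selected, List<T> rejected)
{
    // Splits the items in one pass: matches go to selected, the rest to rejected.
    static <T> PartitionRecord<T> of(Iterable<T> items, Predicate<? super T> predicate)
    {
        List<T> selected = new ArrayList<>();
        List<T> rejected = new ArrayList<>();
        for (T item : items)
        {
            (predicate.test(item) ? selected : rejected).add(item);
        }
        return new PartitionRecord<>(selected, rejected);
    }

    public static void main(String[] args)
    {
        PartitionRecord<Integer> partition =
                PartitionRecord.of(List.of(1, 2, 3, 4, 5), each -> each % 2 == 0);
        System.out.println(partition.selected()); // [2, 4]
        System.out.println(partition.rejected()); // [1, 3, 5]
    }
}
```

There is no get(Object) to misuse here. The only ways in are selected() and rejected(), and the compiler checks both.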
What if we used an Enum for the key type instead of Boolean?
Another option using a Map to represent a pair of the same type would be to use an Enum for the key, where the values in the Enum have intention-revealing names (e.g. Filter.SELECTED, Filter.REJECTED). Then we could write map.get(Filter.SELECTED) instead of map.get(true).
Why not?
This solution requires a new Enum type to be created to hold these key names. If we already have to add a new type, then it would be better to just define the specific type we need with the named variables and types (e.g. a Partition type with the selected and rejected variables). The better names in the Enum also wouldn’t solve the generic problems with the get method on Map. In fact, you could still write map.get(true) and it would return null.
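Here is a minimal sketch of the Enum-keyed alternative, to show that the untyped get problem survives. The Filter enum and the EnumKeyedPartition class are hypothetical names for this illustration. I am assuming groupingBy with an EnumMap factory as one reasonable way to build such a map. Note that, unlike partitioningBy, groupingBy leaves a key out entirely when no element maps to it.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical class showing an Enum-keyed Map used as a pair.
final class EnumKeyedPartition
{
    // Hypothetical enum with intention-revealing names
    enum Filter { SELECTED, REJECTED }

    static Map<Filter, List<Integer>> evensAndOdds(List<Integer> numbers)
    {
        return numbers.stream().collect(Collectors.groupingBy(
                each -> each % 2 == 0 ? Filter.SELECTED : Filter.REJECTED,
                () -> new EnumMap<Filter, List<Integer>>(Filter.class),
                Collectors.toList()));
    }

    public static void main(String[] args)
    {
        Map<Filter, List<Integer>> map = evensAndOdds(List.of(1, 2, 3, 4));
        System.out.println(map.get(Filter.SELECTED)); // [2, 4]
        // Still compiles, because Map.get takes Object
        System.out.println(map.get(true));            // null
    }
}
```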
Stop Hammer time!
I think it is better for the JDK to leverage its static typing benefits and return specific types instead of returning Map, whenever possible. I think returning the Partition type for partitioningBy would have made more sense than returning a Map. This would have meant exposing a new public type. The Partition type is private static. The new public type wouldn’t need to be a completely generic type like Pair. The Eclipse Collections partition method on RichIterable returns a PartitionIterable type. There is a cost to adding/maintaining this type and all of its subtypes that is handled by the Eclipse Collections developers. This gives developers who use the library the safest, most specific alternative at various levels in the type hierarchy.
@Test
public void partition()
{
    PartitionMutableList<Integer> partition =
            Interval.oneTo(10)
                    .partition(each -> each % 2 == 0);

    MutableList<Integer> selected = partition.getSelected();
    MutableList<Integer> rejected = partition.getRejected();

    Assertions.assertEquals(List.of(2, 4, 6, 8, 10), selected);
    Assertions.assertEquals(List.of(1, 3, 5, 7, 9), rejected);
}
There are two other places where Collectors returns Map as a type that would have been better off as more specific types. The problem is convenience and cost. It was more convenient in the Java 8 release to return a Map instead of a Bag or Multimap, because the alternative would have meant introducing Bag and Multimap types and implementations, which would have delayed the Java 8 release, potentially significantly. Having seen these types created in Eclipse Collections many years ago, I can confirm they are both expensive to build and to test. Unfortunately, we are stuck forever with the decision to go with convenience and a return type of Map in Collectors.
I have blogged previously about Map vs. Bag and Map vs. Multimap. Read the blogs at the links below if you would like to understand more.
For some other examples and details of partition in Eclipse Collections, there is a blog for that below. The most interesting thing for some developers in this blog may be the PartitionIterable hierarchy that was implemented to support the covariant nature of the partition method.
Whither Map-Oriented Programming
Map is a hammer. It’s a very useful and convenient tool, but we fall back on Map as an all-purpose tool and flexible return type too much. Java Records give us a new level of convenience with the benefit of static typing. Additional Collection types like Bag and Multimap augment the capabilities of Map with different specialized behaviors for developers to leverage.
In the data-oriented programming space, I prefer solutions like Dataframe libraries, which are much more specific than a row-based Collection of Map about their features and purpose. I think Java Record gives a nice low-ceremony alternative for creating a Collection of Record types that can be statically type checked. This helps provide type safety, memory efficiency, and performance.
I hope to see additional Collection types incorporated in the JDK, instead of continuing to use Map as a convenient but messy alternative. I believe the JDK should have Partition, Bag, and Multimap types. Partition is already there as an implementation. Partition just needs to stop pretending to be a Map and be made public or represented by a more specific and constrained interface. Unfortunately, since partitioningBy already returns a Map, this method will likely never be changed, but it could instead be deprecated and replaced by an alternative with a better return type.
I hope this blog made you think about the cost/benefit of using Map as an all-purpose return type. My recommendation: don’t! Use Map as a return type only when it is the absolute best option available for a method. If another type would be a better option as a return type, then create it or use it if it already exists.
Thank you for reading!
I am the creator of and committer for the Eclipse Collections OSS project, which is managed at the Eclipse Foundation. Eclipse Collections is open for contributions.