Lazy and inexhaustible
Laziness is a virtue. Sometimes you want it to be repeatable.
In programming, laziness can be a very good thing. Lazy initialization allows us to avoid creating expensive resources until they are needed. Lazy iteration allows us to avoid creating temporary data structures and potentially to reduce the total amount of work needed in order to perform a computation. Two built-in paradigms existed for lazy iteration in Java before Java 8. They are Iterable and Iterator. An Iterable can be used over and over again, as it can continue to give you a brand new Iterator. An Iterator can only be used once, as there is no way to reset it once you’ve gotten to the last element via next().
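The difference between the two can be seen in a minimal JDK-only sketch. The class name IterableVersusIterator and the drain helper are illustrative assumptions, not part of the kata.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IterableVersusIterator
{
    // Hypothetical helper: copies whatever the iterator has left into a new list.
    static List<Integer> drain(Iterator<Integer> iterator)
    {
        List<Integer> result = new ArrayList<>();
        while (iterator.hasNext())
        {
            result.add(iterator.next());
        }
        return result;
    }

    public static void main(String[] args)
    {
        Iterable<Integer> ages = Arrays.asList(1, 2, 3);

        // Each call to iterator() starts a fresh traversal.
        System.out.println(drain(ages.iterator())); // [1, 2, 3]
        System.out.println(drain(ages.iterator())); // [1, 2, 3]

        // A single Iterator has nothing left after one full traversal.
        Iterator<Integer> once = ages.iterator();
        System.out.println(drain(once)); // [1, 2, 3]
        System.out.println(drain(once)); // []
    }
}
```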
In Java 8, Streams were added with methods that are lazy (e.g. map, filter, etc.). A Stream is like an Iterator, in that it can only be used once with a terminal operation like forEach or collect. This means you have to be careful not to exhaust a Stream and then try to use it again.
This is what will happen at runtime if you try to use a Stream more than once.
java.lang.IllegalStateException: stream has already been operated upon or closed
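The exception can be reproduced in a few lines with only the JDK. This is a small sketch, and the class and method names are illustrative assumptions rather than kata code.

```java
import java.util.stream.IntStream;

class StreamReuse
{
    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondUseFails()
    {
        IntStream ages = IntStream.of(1, 2, 3);
        ages.sum(); // the first terminal operation consumes the stream
        try
        {
            ages.max(); // a second terminal operation on the same instance
            return false;
        }
        catch (IllegalStateException e)
        {
            return true;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(secondUseFails()); // true
    }
}
```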
I am going to use the getAgeStatisticsOfPets test in Exercise 4 of the Eclipse Collections Pet Kata to illustrate how you can work with a Stream without getting an IllegalStateException. I will also show you some other alternatives that are lazy using Eclipse Collections.
First, here’s the code I would like to write for the test in Exercise 4 of the Pet Kata. I am using an IntStream (obtained via mapToInt) in order to avoid boxing int as Integer. This code compiles but will fail upon execution.
@Test
public void getAgeStatisticsOfPets()
{
    IntStream petAges = this.people
            .stream()
            .flatMap(person -> person.getPets().stream())
            .mapToInt(Pet::getAge);

    Set<Integer> uniqueAges =
            petAges.boxed().collect(Collectors.toSet());

    IntSummaryStatistics stats = petAges.summaryStatistics();

    Assert.assertEquals(Sets.mutable.with(1, 2, 3, 4), uniqueAges);
    Assert.assertEquals(stats.getMin(), petAges.min().getAsInt());
    Assert.assertEquals(stats.getMax(), petAges.max().getAsInt());
    Assert.assertEquals(stats.getSum(), petAges.sum());
    Assert.assertEquals(stats.getAverage(),
            petAges.average().getAsDouble(), 0.0);
    Assert.assertEquals(stats.getCount(), petAges.count());
    Assert.assertTrue(petAges.allMatch(i -> i > 0));
    Assert.assertFalse(petAges.anyMatch(i -> i == 0));
    Assert.assertTrue(petAges.noneMatch(i -> i < 0));
}
The code will run until this line attempts to execute.
IntSummaryStatistics stats = petAges.summaryStatistics();
That’s when the IllegalStateException is thrown. The call to collect in the previous line caused the Stream to become exhausted.
One option I have to make the code work is to pre-calculate the pets as a flattened List and then recreate the IntStream for the ages as I need them.
@Test
public void getAgeStatisticsOfPets()
{
    List<Pet> petAges = this.people
            .stream()
            .flatMap(person -> person.getPets().stream())
            .collect(Collectors.toList());

    Set<Integer> uniqueAges =
            petAges.stream()
                    .mapToInt(Pet::getAge)
                    .boxed()
                    .collect(Collectors.toSet());

    IntSummaryStatistics stats =
            petAges.stream()
                    .mapToInt(Pet::getAge)
                    .summaryStatistics();

    Assert.assertEquals(Sets.mutable.with(1, 2, 3, 4), uniqueAges);
    Assert.assertEquals(stats.getMin(), petAges.stream()
            .mapToInt(Pet::getAge).min().getAsInt());
    Assert.assertEquals(stats.getMax(), petAges.stream()
            .mapToInt(Pet::getAge).max().getAsInt());
    Assert.assertEquals(stats.getSum(), petAges.stream()
            .mapToInt(Pet::getAge).sum());
    Assert.assertEquals(stats.getAverage(), petAges.stream()
            .mapToInt(Pet::getAge).average().getAsDouble(),
            0.0);
    Assert.assertEquals(stats.getCount(), petAges.size());
    Assert.assertTrue(
            petAges.stream()
                    .mapToInt(Pet::getAge)
                    .allMatch(i -> i > 0));
    Assert.assertFalse(
            petAges.stream()
                    .mapToInt(Pet::getAge)
                    .anyMatch(i -> i == 0));
    Assert.assertTrue(
            petAges.stream()
                    .mapToInt(Pet::getAge)
                    .noneMatch(i -> i < 0));
}
This works, but I had to write a lot of duplicate code. I have to call this code over and over again to recreate the IntStream of pet ages.
petAges.stream().mapToInt(Pet::getAge)
Since I do not like duplicating code, I want to find a solution for this problem. One solution would be to put this duplicate code in a Supplier and calculate it on demand by calling the get() method on the Supplier.
@Test
public void getAgeStatisticsOfPets()
{
    List<Pet> pets = this.people
            .stream()
            .flatMap(person -> person.getPets().stream())
            .collect(Collectors.toList());

    Supplier<IntStream> petAges =
            () -> pets.stream().mapToInt(Pet::getAge);

    Set<Integer> uniqueAges =
            petAges.get().boxed().collect(Collectors.toSet());

    IntSummaryStatistics stats =
            petAges.get().summaryStatistics();

    Assert.assertEquals(Sets.mutable.with(1, 2, 3, 4), uniqueAges);
    Assert.assertEquals(stats.getMin(),
            petAges.get().min().getAsInt());
    Assert.assertEquals(stats.getMax(),
            petAges.get().max().getAsInt());
    Assert.assertEquals(stats.getSum(),
            petAges.get().sum());
    Assert.assertEquals(stats.getAverage(),
            petAges.get().average().getAsDouble(),
            0.0);
    Assert.assertEquals(stats.getCount(),
            petAges.get().count());
    Assert.assertTrue(petAges.get().allMatch(i -> i > 0));
    Assert.assertFalse(petAges.get().anyMatch(i -> i == 0));
    Assert.assertTrue(petAges.get().noneMatch(i -> i < 0));
}
This reduces the amount of duplicate code I had to write. I can go one step further and avoid collecting the result of flatMap into a List, by having the Supplier do more of the work.
@Test
public void getAgeStatisticsOfPets()
{
    Supplier<IntStream> petAges =
            () -> this.people
                    .stream()
                    .flatMap(person -> person.getPets().stream())
                    .mapToInt(Pet::getAge);

    Set<Integer> uniqueAges =
            petAges.get().boxed().collect(Collectors.toSet());

    IntSummaryStatistics stats =
            petAges.get().summaryStatistics();

    Assert.assertEquals(Sets.mutable.with(1, 2, 3, 4), uniqueAges);
    Assert.assertEquals(stats.getMin(),
            petAges.get().min().getAsInt());
    Assert.assertEquals(stats.getMax(),
            petAges.get().max().getAsInt());
    Assert.assertEquals(stats.getSum(),
            petAges.get().sum());
    Assert.assertEquals(stats.getAverage(),
            petAges.get().average().getAsDouble(),
            0.0);
    Assert.assertEquals(stats.getCount(),
            petAges.get().count());
    Assert.assertTrue(petAges.get().allMatch(i -> i > 0));
    Assert.assertFalse(petAges.get().anyMatch(i -> i == 0));
    Assert.assertTrue(petAges.get().noneMatch(i -> i < 0));
}
This almost feels like creating a lazy Iterable, where each time we need to do something, we create an Iterator to perform an additional function. In Eclipse Collections, there is a LazyIterable type that can be created from any RichIterable. A LazyIterable can be used safely as many times as you want. It may be expensive to recalculate the functions over and over again, but it will allow you to do so and will not become exhausted after the first time you use it.
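To make the idea of a reusable lazy view concrete, here is a rough JDK-only sketch. It is not how Eclipse Collections implements LazyIterable; the lazyMap helper, the class name, and the sample pet names are assumptions made up for illustration. A counter shows that nothing runs until iteration starts, and that the function is applied again on every traversal.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class LazyViewSketch
{
    // Hypothetical helper: a view that applies the function during each traversal
    // instead of storing the results.
    static <T, R> Iterable<R> lazyMap(Iterable<T> source, Function<? super T, ? extends R> function)
    {
        return () ->
        {
            Iterator<T> iterator = source.iterator();
            return new Iterator<R>()
            {
                @Override
                public boolean hasNext()
                {
                    return iterator.hasNext();
                }

                @Override
                public R next()
                {
                    return function.apply(iterator.next());
                }
            };
        };
    }

    public static void main(String[] args)
    {
        AtomicInteger calls = new AtomicInteger();
        List<String> names = Arrays.asList("Tabby", "Rex", "Spot");
        Iterable<Integer> lengths = lazyMap(names, name ->
        {
            calls.incrementAndGet();
            return name.length();
        });

        System.out.println(calls.get()); // 0, nothing has been computed yet

        int total = 0;
        for (int length : lengths)
        {
            total += length;
        }
        int max = 0;
        for (int length : lengths)
        {
            max = Math.max(max, length);
        }

        // The view was traversed twice, so the function ran twice per element.
        System.out.println(total + " " + max + " " + calls.get()); // 12 5 6
    }
}
```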
The following shows how you can solve this problem using a LazyIntIterable with Eclipse Collections.
@Test
public void getAgeStatisticsOfPets()
{
    LazyIntIterable petAges = this.people
            .asLazy()
            .flatCollect(Person::getPets)
            .collectInt(Pet::getAge);

    IntSet uniqueAges = petAges.toSet();

    IntSummaryStatistics stats = petAges.summaryStatistics();

    Assert.assertEquals(
            IntSets.mutable.with(1, 2, 3, 4),
            uniqueAges);
    Assert.assertEquals(stats.getMin(), petAges.min());
    Assert.assertEquals(stats.getMax(), petAges.max());
    Assert.assertEquals(stats.getSum(), petAges.sum());
    Assert.assertEquals(stats.getAverage(), petAges.average(), 0.0);
    Assert.assertEquals(stats.getCount(), petAges.size());
    Assert.assertTrue(petAges.allSatisfy(i -> i > 0));
    Assert.assertFalse(petAges.anySatisfy(i -> i == 0));
    Assert.assertTrue(petAges.noneSatisfy(i -> i < 0));
}
Once I have a LazyIntIterable, I do not need to box the unique ages into a Set of Integer. I can instead store them in an IntSet as I have above, simply by calling toSet() on the LazyIntIterable.
Because LazyIntIterable is lazy, it does not pre-calculate and store the pet ages. It has to execute flatCollect() and collectInt() each time you call a terminal method like toSet, summaryStatistics, min, max, sum, average, size, anySatisfy, allSatisfy, or noneSatisfy. If I want the code to be more efficient, I can pre-calculate the pet ages and store them in an IntList or IntBag. I will use an IntBag here, as there are duplicate ages but order doesn’t matter.
@Test
public void getAgeStatisticsOfPets()
{
    IntBag petAges = this.people
            .asLazy()
            .flatCollect(Person::getPets)
            .collectInt(Pet::getAge)
            .toBag();

    IntSet uniqueAges = petAges.toSet();

    IntSummaryStatistics stats = petAges.summaryStatistics();

    Assert.assertEquals(
            IntSets.mutable.with(1, 2, 3, 4),
            uniqueAges);
    Assert.assertEquals(stats.getMin(), petAges.min());
    Assert.assertEquals(stats.getMax(), petAges.max());
    Assert.assertEquals(stats.getSum(), petAges.sum());
    Assert.assertEquals(stats.getAverage(), petAges.average(), 0.0);
    Assert.assertEquals(stats.getCount(), petAges.size());
    Assert.assertTrue(petAges.allSatisfy(i -> i > 0));
    Assert.assertFalse(petAges.anySatisfy(i -> i == 0));
    Assert.assertTrue(petAges.noneSatisfy(i -> i < 0));
}
All I had to change in the code to make this work was to call the method toBag() after calling collectInt() and change the type of petAges from LazyIntIterable to IntBag. No other code needed to change. This is because our primitive collections and primitive lazy iterables in Eclipse Collections have good symmetry. Notice how there is no boxing of int to Integer objects in either the LazyIntIterable or IntBag solution.
I can easily change the type from IntBag to IntList, just by changing the toBag() method call to toList().
@Test
public void getAgeStatisticsOfPets()
{
    IntList petAges = this.people
            .asLazy()
            .flatCollect(Person::getPets)
            .collectInt(Pet::getAge)
            .toList();

    IntSet uniqueAges = petAges.toSet();

    IntSummaryStatistics stats = petAges.summaryStatistics();

    Assert.assertEquals(
            IntSets.mutable.with(1, 2, 3, 4),
            uniqueAges);
    Assert.assertEquals(stats.getMin(), petAges.min());
    Assert.assertEquals(stats.getMax(), petAges.max());
    Assert.assertEquals(stats.getSum(), petAges.sum());
    Assert.assertEquals(stats.getAverage(), petAges.average(), 0.0);
    Assert.assertEquals(stats.getCount(), petAges.size());
    Assert.assertTrue(petAges.allSatisfy(i -> i > 0));
    Assert.assertFalse(petAges.anySatisfy(i -> i == 0));
    Assert.assertTrue(petAges.noneSatisfy(i -> i < 0));
}
Once again, nothing else needs to change.
When you call min, max and average on an IntStream, you will get an OptionalInt or OptionalDouble. This is a good thing if you have the potential to have an empty result. OptionalInt and OptionalDouble will allow you to handle the cases where the result is empty. With Eclipse Collections, there is a different option for these three methods to help in the case where the Iterable or Collection is empty.
Assert.assertEquals(stats.getMin(), petAges.minIfEmpty(0));
Assert.assertEquals(stats.getMax(), petAges.maxIfEmpty(0));
Assert.assertEquals(stats.getSum(), petAges.sum());
Assert.assertEquals(stats.getAverage(), petAges.averageIfEmpty(0.0), 0.0);
The methods minIfEmpty, maxIfEmpty and averageIfEmpty allow you to specify a default value to use in the case of an empty result. In the future, we may also add minOptional, maxOptional and averageOptional if there is a need for them.
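For comparison, a similar default-value behavior can be had on the Streams side by calling orElse on the OptionalInt or OptionalDouble. The sketch below uses only the JDK; the class and helper method names are assumptions chosen for illustration.

```java
import java.util.stream.IntStream;

class EmptyStreamDefaults
{
    // Hypothetical helper: minimum of the ages, or the default when there are none.
    static int minOrDefault(int[] ages, int defaultValue)
    {
        return IntStream.of(ages).min().orElse(defaultValue);
    }

    // Hypothetical helper: average of the ages, or the default when there are none.
    static double averageOrDefault(int[] ages, double defaultValue)
    {
        return IntStream.of(ages).average().orElse(defaultValue);
    }

    public static void main(String[] args)
    {
        int[] none = new int[0];
        int[] ages = {1, 2, 3, 4};

        System.out.println(minOrDefault(none, 0));       // 0
        System.out.println(minOrDefault(ages, 0));       // 1
        System.out.println(averageOrDefault(none, 0.0)); // 0.0
        System.out.println(averageOrDefault(ages, 0.0)); // 2.5
    }
}
```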
If you use Streams and want them to be reusable, then consider using them in conjunction with a Supplier. This will reduce the amount of duplicate code you have to write. If you want inexhaustible laziness out of the box, then consider using Eclipse Collections, as it gives you many additional options that you can use alongside Streams.
I hope this blog was useful and informative and showed some options for using Streams and Eclipse Collections LazyIterables effectively to solve the same problems. I also hope that you try out the Eclipse Collections katas on your own. I often teach the katas using both Streams and Eclipse Collections so developers can learn both APIs and understand what options they have available to them.
I am a Project Lead and Committer for the Eclipse Collections OSS project at the Eclipse Foundation. Eclipse Collections is open for contributions. If you like the library, you can let us know by starring it on GitHub.