Streams are the key abstraction in Java 8 for processing collections of values: you specify what you want done and leave the scheduling of operations to the implementation. Streams can also take advantage of multi-core architectures without you having to write a single line of multithreaded code, and they simplify the description of aggregate computations, exposing opportunities for optimisation. In short, streams let us write collection-processing code at a higher level of abstraction.
You can think of a stream as a pipeline. Data comes from what we call the source and passes through zero or more intermediate operations; each intermediate operation takes an input stream and produces an output stream, so the output of one operation can be fed to the next. Once all the intermediate processing is done, the stream is terminated with a terminal operation. A terminal operation takes an input stream and does not produce another stream; instead it produces either an explicit result, such as a value or a collection, or a side effect, such as printing a message. This pipeline is depicted in the following image:
The following example illustrates the image above with an aggregate operation over a collection of Cars:
int sum = cars.stream()
.filter(c -> c.getBrand().equals("TOYOTA"))
.mapToInt(Car::getPrice)
.sum();
A stream pipeline, like the example above, can be viewed as a query on the stream source. The following SQL select would be equivalent:
SELECT SUM(PRICE) FROM CARS WHERE BRAND = 'TOYOTA'
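To try the Toyota query end to end, here is a minimal, self-contained sketch. The Car class, its fields and the sample prices are assumptions made up for illustration; the original example does not show them.

```java
import java.util.Arrays;
import java.util.List;

public class CarQuery {

    // Hypothetical domain class; the post does not show its definition.
    static final class Car {
        private final String brand;
        private final int price;

        Car(String brand, int price) {
            this.brand = brand;
            this.price = price;
        }

        String getBrand() { return brand; }
        int getPrice() { return price; }
    }

    // Sums the prices of all cars of the given brand.
    static int totalPrice(List<Car> cars, String brand) {
        return cars.stream()
                .filter(c -> c.getBrand().equals(brand)) // WHERE BRAND = ?
                .mapToInt(Car::getPrice)                 // SELECT PRICE
                .sum();                                  // SUM(...)
    }

    public static void main(String[] args) {
        // Sample data invented for this sketch.
        List<Car> cars = Arrays.asList(
                new Car("TOYOTA", 20000),
                new Car("FORD", 18000),
                new Car("TOYOTA", 25000));
        System.out.println(totalPrice(cars, "TOYOTA")); // prints 45000
    }
}
```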
Nevertheless, keep in mind that unless the source was explicitly designed for concurrent modification, unpredictable or erroneous behavior may result from modifying the stream source while it is being queried.
Keep in mind that intermediate operations do not perform any processing until a terminal operation is invoked on the pipeline; in other words, they are evaluated lazily. This lets the implementation merge operations and process them in a single pass, avoiding multiple redundant passes over the data. For example, filter and map operations are very often merged into the same pass, a technique called loop fusion, and the limit operation can stop processing early, a technique called short-circuiting. Because operations do not use explicit loops, streams can easily be made parallel.
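Laziness, single-pass processing and short-circuiting can be observed by logging each lambda call. The following is a small sketch; the class name and the log format are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyPipeline {

    // Builds filter -> map -> limit(2) over 1..5 and records every lambda call.
    static List<String> trace() {
        List<String> log = new ArrayList<>();

        Stream<Integer> pipeline = Stream.of(1, 2, 3, 4, 5)
                .filter(n -> { log.add("filter " + n); return n % 2 == 1; })
                .map(n -> { log.add("map " + n); return n * 10; })
                .limit(2);

        // No lambda has run yet: intermediate operations are lazy.
        log.add("pipeline built");

        // The terminal operation pulls elements through all stages one at a time.
        List<Integer> result = pipeline.collect(Collectors.toList());
        log.add("result " + result);
        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

On a sequential stream the log reads: pipeline built, filter 1, map 1, filter 2, filter 3, map 3, result [10, 30]. Each element travels through filter and map before the next one is touched, and elements 4 and 5 are never examined because limit(2) is already satisfied.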
Most stream operations accept lambda expressions or method references as parameters. Unless otherwise specified, these parameters must be non-null. Stream elements are objects, so when the values are primitives, conversion between the primitive and object representations is needed and is handled by auto-boxing and unboxing. These conversions are inefficient, so to improve stream efficiency JDK 8 offers three primitive stream types: IntStream, DoubleStream and LongStream. It also offers methods such as mapToInt(), mapToDouble() and mapToLong() to obtain them.
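The difference between an object stream and a primitive stream can be sketched as follows. The class and method names are illustrative only; both sums give the same result, but the second avoids boxing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PrimitiveStreams {

    // Boxed version: map produces a Stream<Integer>, so each length is boxed
    // and Integer::sum unboxes and re-boxes the partial results.
    static int boxedSum(List<String> words) {
        return words.stream()
                .map(String::length)
                .reduce(0, Integer::sum);
    }

    // Primitive version: mapToInt produces an IntStream of plain ints.
    static int primitiveSum(List<String> words) {
        return words.stream()
                .mapToInt(String::length)
                .sum();
    }

    // boxed() converts back to Stream<Integer> when objects are required,
    // for example to collect into a List.
    static List<Integer> squares(int n) {
        return IntStream.rangeClosed(1, n)
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("stream", "lambda", "java");
        System.out.println(boxedSum(words));     // prints 16
        System.out.println(primitiveSum(words)); // prints 16
        System.out.println(squares(4));          // prints [1, 4, 9, 16]
    }
}
```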
Stream sources
There are many new methods in Java 8 that return a Stream. Some of them are:
Collection interface:
- stream(): Returns a sequential Stream with the collection as its source.
- parallelStream(): Returns a possibly parallel Stream with the collection as its source; parallel execution uses the fork-join framework. An existing stream can also be switched to parallel mode by calling parallel() on it.
Files class:
- find(Path start, int maxDepth, BiPredicate matcher, FileVisitOption... options): Return a Stream that is lazily populated with Path by searching for files in a file tree rooted at a given starting file.
- lines(Path path): Read all lines from a file as a Stream. Bytes from the file are decoded into characters using the UTF-8 charset.
- list(Path dir): Return a lazily populated Stream, the elements of which are the entries in the directory.
- walk(Path start, FileVisitOption... options): Return a Stream that is lazily populated with Path by walking the file tree rooted at a given starting file.
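- walkFileTree(Path start, FileVisitor visitor): Walks a file tree, invoking the visitor for each entry. Unlike the methods above, it returns the starting Path rather than a Stream.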
Stream interface:
- concat(Stream a, Stream b): Creates a lazily concatenated stream whose elements are all the elements of the first stream followed by all the elements of the second stream.
- generate(Supplier s): Returns an infinite sequential unordered stream where each element is generated by the provided Supplier.
- iterate(T seed, UnaryOperator f): Returns an infinite sequential ordered Stream produced by iterative application of a function f to an initial element seed, producing a Stream consisting of seed, f(seed), f(f(seed)), etc.
- of(T... values): Returns a sequential ordered stream whose elements are the specified values
IntStream interface:
- range(int startInclusive, int endExclusive): Returns a sequential ordered IntStream from startInclusive (inclusive) to endExclusive (exclusive) by an incremental step of 1.
- rangeClosed(int startInclusive, int endInclusive): Returns a sequential ordered IntStream from startInclusive (inclusive) to endInclusive (inclusive) by an incremental step of 1.
- Arrays.stream(int[] array): Defined in the Arrays class rather than in IntStream; returns a sequential IntStream with the specified array as its source. Overloads exist for long[], double[] and object arrays, returning LongStream, DoubleStream and Stream respectively.
General streams:
- BufferedReader.lines(): Returns a Stream, the elements of which are lines read from this BufferedReader.
- Pattern.splitAsStream(): Creates a stream from the given input sequence around matches of this pattern.
- BitSet.stream(): Returns a stream of indices for which this BitSet contains a bit in the set state.
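A few of the sources listed above can be exercised with in-memory data only. This is a sketch; the helper method names are invented for illustration, and the file-based sources are left out because they depend on the local file system.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class StreamSources {

    // Stream.iterate is infinite, so it must be bounded, here with limit.
    static List<Integer> powersOfTwo(int count) {
        return Stream.iterate(1, n -> n * 2)
                .limit(count)
                .collect(Collectors.toList());
    }

    // IntStream.range excludes the upper bound.
    static int sumRange(int startInclusive, int endExclusive) {
        return IntStream.range(startInclusive, endExclusive).sum();
    }

    // Pattern.splitAsStream splits the input around matches of the pattern.
    static List<String> splitOnCommas(String line) {
        return Pattern.compile(",")
                .splitAsStream(line)
                .collect(Collectors.toList());
    }

    // BitSet.stream yields the indices of the bits that are set.
    static int[] setBits(BitSet bits) {
        return bits.stream().toArray();
    }

    // Stream.concat appends the second stream to the first.
    static List<String> concatenated(String[] first, String... second) {
        return Stream.concat(Arrays.stream(first), Stream.of(second))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(powersOfTwo(5));        // prints [1, 2, 4, 8, 16]
        System.out.println(sumRange(1, 5));        // prints 10
        System.out.println(splitOnCommas("a,b,c")); // prints [a, b, c]
    }
}
```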
Intermediate operations
Common intermediate methods are:
- distinct(): Returns a stream with no duplicate elements.
- filter(Predicate predicate): Returns a stream consisting of the elements of this stream that match the given predicate.
- map(Function f): Returns a stream consisting of the results of applying the given function to the elements of this stream. There are other methods that produce streams of primitives rather than objects in order to improve performance, such as mapToInt(), mapToDouble() and mapToLong().
- flatMap(Function mapper): Returns a stream consisting of the results of replacing each element of this stream with the contents of a mapped stream produced by applying the provided mapping function to each element. In other words, flatMap lets you replace each value of a stream with another stream and then concatenates all the generated streams into a single stream.
- skip(long n): Returns a stream consisting of the remaining elements of this stream after discarding the first n elements of the stream.
- limit(long maxSize): Returns a stream consisting of the elements of this stream, truncated to be no longer than maxSize in length.
- peek(Consumer action): Returns a stream consisting of the elements of this stream, additionally performing the provided action on each element as elements are consumed from the resulting stream. It is useful for debugging and for doing more than one thing with a stream.
- sorted(Comparator comparator): Returns a stream consisting of the elements of this stream, sorted according to the provided Comparator. Without an argument, it returns a stream consisting of the elements of this stream, sorted according to natural order.
- unordered(): Returns an equivalent stream that is unordered. May return itself, either because the stream was already unordered, or because the underlying stream state was modified to be unordered. In parallel pipelines it can improve the efficiency of operations like distinct() and groupingBy().
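Several of the intermediate operations listed above can be combined as in the following sketch. The method names and the sample inputs are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class IntermediateOps {

    // flatMap flattens the nested lists, distinct drops duplicates,
    // sorted orders the remaining values by natural order.
    static List<Integer> flattenDistinctSorted(List<List<Integer>> nested) {
        return nested.stream()
                .flatMap(List::stream)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    // skip and limit together select one page of results (pageIndex is 0-based).
    static List<String> page(List<String> items, int pageIndex, int pageSize) {
        return items.stream()
                .skip((long) pageIndex * pageSize)
                .limit(pageSize)
                .collect(Collectors.toList());
    }

    // peek records every element that enters the pipeline, before filtering.
    static List<String> upperCaseLongWords(List<String> words, List<String> seen) {
        return words.stream()
                .peek(seen::add)
                .filter(w -> w.length() > 3)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(flattenDistinctSorted(Arrays.asList(
                Arrays.asList(3, 1), Arrays.asList(2, 3), Arrays.asList(1)))); // prints [1, 2, 3]
        System.out.println(page(Arrays.asList("a", "b", "c", "d", "e"), 1, 2)); // prints [c, d]

        List<String> seen = new ArrayList<>();
        System.out.println(upperCaseLongWords(
                Arrays.asList("java", "is", "streams"), seen)); // prints [JAVA, STREAMS]
        System.out.println(seen);                               // prints [java, is, streams]
    }
}
```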
Terminal operations
Common terminal methods are:
- reduce(BinaryOperator accumulator): Performs a reduction on the elements of this stream, using an associative accumulation function, and returns an Optional describing the reduced value, if any. The accumulator takes a partial result and the next element, and returns a new partial result. There are two more overloads: one that takes an initial value (and therefore does not return an Optional), and one that takes an initial value, a BiFunction accumulator and a BinaryOperator combiner (equivalent to a fused map and reduce).
- collect(Collector collector): Performs a mutable reduction operation on the elements of this stream using a Collector. It is typically used in combination with the factory methods of the Collectors class. For example:
// Accumulate names into a List
List<String> list = people.stream()
    .map(Person::getName)
    .collect(Collectors.toList());
// Convert elements to strings and join them, separated by commas
String joined = things.stream()
    .map(Object::toString)
    .collect(Collectors.joining(", "));
// Group employees by department
Map<Department, List<Employee>> byDept = employees.stream()
    .collect(Collectors.groupingBy(Employee::getDepartment));
// Partition students into passing and failing
Map<Boolean, List<Student>> passingFailing = students.stream()
    .collect(Collectors.partitioningBy(s -> s.getGrade() >= PASS_THRESHOLD));
// Compute the sum of the salaries
int total = employees.stream()
    .collect(Collectors.summingInt(Employee::getSalary));
- toArray(): Returns an array containing the elements of this stream.
- Numerical results: count(), max(Comparator c), min(Comparator c), average(), sum(). Some of these methods return an Optional, since the stream may be empty. Note that average() and sum() are available only on the primitive streams.
- Iteration: forEach(Consumer c) and forEachOrdered(Consumer c) perform an action for each element of this stream; in the case of the latter, the action is performed in the encounter order of the stream if the stream has a defined encounter order.
- Search and matching: findFirst(), findAny(), allMatch(Predicate p), anyMatch(Predicate p), noneMatch(Predicate p). findFirst() and findAny() take no arguments and return an Optional; the match methods return a boolean.
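The terminal operations above can be tried out with a sketch like the following. The method names are made up for illustration; note how the one-argument reduce returns an Optional while the other overloads do not.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.OptionalDouble;

public class TerminalOps {

    // One-argument reduce: no identity, so the result is an Optional.
    static Optional<Integer> product(List<Integer> values) {
        return values.stream().reduce((a, b) -> a * b);
    }

    // Three-argument reduce: identity, accumulator (a fused map and reduce)
    // and a combiner that merges partial results in parallel execution.
    static int sumOfLengths(List<String> words) {
        return words.stream()
                .reduce(0, (acc, w) -> acc + w.length(), Integer::sum);
    }

    // average() exists only on the primitive streams and returns an OptionalDouble.
    static OptionalDouble average(int[] values) {
        return Arrays.stream(values).average();
    }

    // findFirst takes no argument; filtering is done by a preceding filter.
    static Optional<String> firstStartingWith(List<String> words, String prefix) {
        return words.stream()
                .filter(w -> w.startsWith(prefix))
                .findFirst();
    }

    // allMatch short-circuits on the first element that fails the predicate.
    static boolean allPositive(List<Integer> values) {
        return values.stream().allMatch(v -> v > 0);
    }

    public static void main(String[] args) {
        System.out.println(product(Arrays.asList(2, 3, 4)));         // prints Optional[24]
        System.out.println(sumOfLengths(Arrays.asList("ab", "cde"))); // prints 5
        System.out.println(average(new int[] {1, 2, 3, 4}));          // prints OptionalDouble[2.5]
        System.out.println(allPositive(Arrays.asList(1, -2, 3)));     // prints false
    }
}
```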
Optional is a container object which may or may not contain a non-null value. If a value is present, isPresent() returns true and get() returns the value. It helps to avoid NullPointerException and can be used in powerful ways to provide complex conditional handling.
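As a small illustration of Optional in combination with a stream, here is a sketch with invented method names. It shows both the isPresent()/get() style and the more compact map/orElse style.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class OptionalDemo {

    // max returns an Optional because the stream may be empty.
    static Optional<String> longest(List<String> words) {
        return words.stream().max(Comparator.comparingInt(String::length));
    }

    // Explicit style: check isPresent() before calling get().
    static int longestLength(List<String> words) {
        Optional<String> result = longest(words);
        if (result.isPresent()) {
            return result.get().length();
        }
        return 0;
    }

    // Fluent style: transform the value if present, otherwise use a fallback.
    static String longestUpperCase(List<String> words, String fallback) {
        return longest(words)
                .map(String::toUpperCase)
                .orElse(fallback);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "abc", "ab");
        System.out.println(longestLength(words));                                        // prints 3
        System.out.println(longestUpperCase(words, "none"));                             // prints ABC
        System.out.println(longestUpperCase(Collections.<String>emptyList(), "none")); // prints none
    }
}
```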
You can find more information about streams in Java 8 in the books Java 8 in Action and Java 8 Lambdas, in the Oracle Learning Library, and in Java 8 Stream API. This post is partially based on these sources.