[infinispan-dev] Fwd: [concurrency-interest] ConcurrentHashMap bulk parallel operations
Galder Zamarreño
galder at redhat.com
Tue Aug 14 07:38:31 EDT 2012
Interesting functional programming additions for concurrent collections (i.e. caches) coming up in JDK8 too.
Begin forwarded message:
> From: Doug Lea <dl at cs.oswego.edu>
> Subject: [concurrency-interest] ConcurrentHashMap bulk parallel operations
> Date: August 13, 2012 6:08:50 PM GMT+02:00
> To: "Concurrency-interest at cs.oswego.edu" <Concurrency-interest at cs.oswego.edu>
>
>
> As most of you know, lambda-based bulk operations with optional
> parallelism are planned for JDK8. We've been working with JSR335 folks
> on these. However, in addition we plan to offer access to bulk
> operations designed for use on truly concurrent data structures, in
> particular ConcurrentHashMaps. These methods are designed for use on
> "live" data, registeries, caches, in-memory data stores, chunks of
> Hadoop-style "big data" sets. and so on that might be concurrently
> updated during the course of computations. The form and style of this
> API are targeted to support operations that can be sensibly applied in
> such contexts. This boils down to only three basic forms: forEach,
> search, and reduce (each with multiple variations), expressed in an
> imperative style -- no fluency, stateful Streams, etc that are planned
> for the JDK8 java.util-based framework. There is no sense in
> compromising support for either of these kinds of target usages, so we
> won't. However, the functionality coverage is essentially identical
> for those operations that do apply, so we anticipate that the JDK8
> java.util-based framework will be able to layer on top of this when
> applicable.
>
> The API is in two layers. Nested (static) class "ForkJoinTasks"
> returns task objects that, when invoked, provide the given
> functionality, but may also be used in other ways. Nested (inner)
> class "Parallel" provides an API for using them with a given
> ForkJoinPool. The class-level javadocs for CHM(V8).Parallel are pasted
> below. There will surely be some further API changes in the course of
> JDK8 integration. However, in the mean time, we are releasing a
> stand-alone form, intended to be usable by both current
> ConcurrentHashMapV8 users running JDK7, as well as those experimenting
> with current JDK8 preview lambda builds (at
> http://jdk8.java.net/lambda/) The current javadocs don't have any
> usage examples, because they look vastly different in JDK7 vs JDK8.
>
> Doing this forces a bit of disruption on everyone though.
>
> 1. To avoid FJ version mismatches, the current jsr166y FJ classes are
> duplicated into jsr166e.
>
> 2. To avoid JDK version mismatches, the j.u.c version (plain
> "ConcurrentHashMap" without the "V8") is committed in main repository,
> while keeping its "V8" in package jsr166e. (This also required an
> initial merge of jsr166e.LongAdder and related classes.)
>
> 3. To avoid current and future naming problems, a set of function
> interfaces are nested within ConcurrentHashMap, with names
> intentionally different than those currently used in JDK8 previews
> (for example "Action" instead of "Block"). For lambda-enabled
> JDK8-preview users, this won't much matter because lambda expressions
> will still match as expected. However, others tediously using this
> with emulated-lambdas via static instances of classes implementing the
> interfaces will have to bear with future name changes of these
> interfaces. This forbearance starts immediately, because the
> previously named nested MappingFunction and RemappingFunction are
> already changed so as to be applicable across the extended
> APIs. Sorry.
>
>
> ... pasting from
> http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166edocs/jsr166e/ConcurrentHashMapV8.Parallel.html
>
>
> public class ConcurrentHashMapV8.Parallel
>
>
> An extended view of a ConcurrentHashMap supporting bulk parallel operations. These operations are designed to be be safely, and often sensibly, applied even with maps that are being concurrently updated by other threads; for example, when computing a snapshot summary of the values in a shared registry. There are three kinds of operation, each with four forms, accepting functions with Keys, Values, Entries, and (Key, Value) arguments and/or return values. Because the elements of a ConcurrentHashMap are not ordered in any particular way, and may be processed in different orders in different parallel executions, the correctness of supplied functions should not depend on any ordering, or on any other objects or values that may transiently change while computation is in progress; and except for forEach actions, should ideally be side-effect-free.
>
> * forEach: Perform a given action on each element. A variant form applies a given transformation on each element before performing the action.
> * search: Return the first available non-null result of applying a given function on each element; skipping further search when a result is found.
> * reduce: Accumulate each element. The supplied reduction function cannot rely on ordering (more formally, it should be both associative and commutative). There are five variants:
> o Plain reductions. (There is not a form of this method for (key, value) function arguments since there is no corresponding return type.)
> o Mapped reductions that accumulate the results of a given function applied to each element.
> o Reductions to scalar doubles, longs, and ints, using a given basis value.
>
> The concurrency properties of the bulk operations follow from those of ConcurrentHashMap: Any non-null result returned from get(key) and related access methods bears a happens-before relation with the associated insertion or update. The result of any bulk operation reflects the composition of these per-element relations (but is not necessarily atomic with respect to the map as a whole unless it is somehow known to be quiescent). Conversely, because keys and values in the map are never null, null serves as a reliable atomic indicator of the current lack of any result. To maintain this property, null serves as an implicit basis for all non-scalar reduction operations. For the double, long, and int versions, the basis should be one that, when combined with any other value, returns that other value (more formally, it should be the identity element for the reduction). Most common reductions have these properties; for example, computing a sum with basis 0 or a minimum with basis MAX_VALUE.
>
> Search and transformation functions provided as arguments should similarly return null to indicate the lack of any result (in which case it is not used). In the case of mapped reductions, this also enables transformations to serve as filters, returning null (or, in the case of primitive specializations, the identity basis) if the element should not be combined. You can create compound transformations and filterings by composing them yourself under this "null means there is nothing there now" rule before using them in search or reduce operations.
>
> Methods accepting and/or returning Entry arguments maintain key-value associations. They may be useful for example when finding the key for the greatest value. Note that "plain" Entry arguments can be supplied using new AbstractMap.SimpleEntry(k,v).
>
> Bulk operations may complete abruptly, throwing an exception encountered in the application of a supplied function. Bear in mind when handling such exceptions that other concurrently executing functions could also have thrown exceptions, or would have done so if the first exception had not occurred.
>
> Parallel speedups compared to sequential processing are common but not guaranteed. Operations involving brief functions on small maps may execute more slowly than sequential loops if the underlying work to parallelize the computation is more expensive than the computation itself. Similarly, parallelization may not lead to much actual parallelism if all processors are busy performing unrelated tasks.
>
> All arguments to all task methods must be non-null.
>
> jsr166e note: During transition, this class uses nested functional interfaces with different names but the same forms as those expected for JDK8.
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache
More information about the infinispan-dev
mailing list