Monday, August 2, 2021

How to use ConcurrentHashMap in Java - Example Tutorial and Working

ConcurrentHashMap in Java is introduced as an alternative of Hashtable in Java 1.5 as part of the Java concurrency package. Prior to Java 1.5 if you need a Map implementation, which can be safely used in a concurrent and multi-threaded Java program, then, you only have Hashtable or synchronized Map because HashMap is not thread-safe. With ConcurrentHashMap, now you have a better choice; because not only it can be safely used in the concurrent multi-threaded environment but also provides better performance over Hashtable and synchronizedMap.

ConcurrentHashMap performs better than the earlier two because it only locks a portion of Map, instead of the whole Map, which is the case with Hashtable and synchronized Map. CHM allows concurred to read operations and at the same time maintains integrity by synchronizing write operations.

 We have seen the basics of ConcurrentHashMap on Top 5 Java Concurrent Collections from JDK 5 and 6 and in this Java tutorial, we will learn:

Ø       How ConcurrentHashMap works in Java or how it is implemented in Java.
Ø       When to use ConcurrentHashMap in Java
Ø       ConcurrentHashMap examples in Java
Ø       And some important properties of CHM.


And, If you are new to the Java world then I also recommend you go through these free online Java courses from Udemy to learn Java in a better and more structured way. This is one of the best and up-to-date courses to learn Java online.

How ConcurrentHashMap is implemented in Java

ConcurrentHashMap is introduced as an alternative of Hashtable and provided all functions supported by Hashtable with an additional feature called "concurrency level", which allows ConcurrentHashMap to partition Map. ConcurrentHashMap allows multiple readers to read concurrently without any blocking

This is achieved by partitioning Map into different parts based on concurrency level and locking only a portion of Map during updates. The default concurrency level is 16, and accordingly, Map is divided into 16 part and each part is governed with a different lock. 

This means, 16 threads can operate on Map simultaneously until they are operating on a different part of Map. This makes ConcurrentHashMap high performance despite keeping thread-safety intact.  Though, it comes with a caveat. Since update operations like put(), remove(), putAll() or clear() is not synchronized, concurrent retrieval may not reflect most recent change on Map.





In the case of putAll() or clear(), which operates on the whole Map, the concurrent read may reflect insertion and removal of only some entries. Another important point to remember is iteration over CHM, Iterator returned by keySet of ConcurrentHashMap are weekly consistent and they only reflect the state of ConcurrentHashMap and certain point and may not reflect any recent change. Iterator of ConcurrentHashMap's keySet area also fail-safe and doesn’t throw ConcurrentModificationExceptoin..

Default concurrency level is 16 and can be changed, by providing a number that makes sense and works for you while creating ConcurrentHashMap. Since concurrency level is used for internal sizing and indicates a number of concurrent update without contention, so, if you just have few writers or thread to update Map keeping it low is much better. ConcurrentHashMap also uses ReentrantLock to internally lock its segments.

Internal implementation of ConcurrentHashMap in Java

And, If you are new to the Java world then I also recommend you go through these best Java online courses from Udemy and Coursera to learn Java in a better and more structured way. This is one of the best and up-to-date courses to learn Java online.



ConcurrentHashMap putifAbsent example in Java

ConcurrentHashMap examples are similar to Hashtable examples, we have seen earlier,  but worth knowing is the use of putIfAbsent() method. Many times we need to insert entry into Map if it's not present already, and we wrote following kind of code:

synchronized(map){
  if (map.get(key) == null){
      return map.put(key, value);
  } else{
      return map.get(key);
  }
}

Though this code will work fine in HashMap and Hashtable, This won't work in ConcurrentHashMap; because, during put operation whole map is not locked, and while one thread is putting value, other thread's get() call can still return null which results in one thread overriding value inserted by other thread. 

Of course, you can wrap the whole code in the synchronized block and make it thread-safe but that will only make your code single-threaded. ConcurrentHashMap provides putIfAbsent(key, value) which does the same thing but atomically and thus eliminates the above race condition. 


See Core Java for Inpatients for more details about how to use this method effectively:


ConcurrentHashMap putIfAbsent() example in Java



When to use ConcurrentHashMap in Java

Java ConcurrentHashMap Example Tutorial and internal working
ConcurrentHashMap is best suited when you have multiple readers and a few writers. If writers outnumber the reader, or the writer is equal to the reader, then the performance of ConcurrentHashMap effectively reduces to synchronized map or Hashtable. Performance of CHM drops, because you got to lock all portions of Map, and effectively each reader will wait for another writer, operating on that portion of Map. ConcurrentHashMap is a good choice for caches, which can be initialized during application startup and later accessed by many request processing threads. 

As JavaDoc states, CHM is also a good replacement of Hashtable and should be used whenever possible, keeping in mind, that CHM provides a slightly weaker form of synchronization than Hashtable.



Summary

Now we know What is ConcurrentHashMap in Java and when to use ConcurrentHashMap, it’s time to know and revise some important points about CHM in Java.

1. ConcurrentHashMap allows concurrent read and thread-safe update operation.

2. During the update operation, ConcurrentHashMap only locks a portion of Map instead of whole Map.

3. The concurrent update is achieved by internally dividing Map into the small portion which is defined by concurrency level.

4. Choose concurrency level carefully as a significantly higher number can be a waste of time and space and the lower number may introduce thread contention in case writers over number concurrency level.

5. All operations of ConcurrentHashMap are thread-safe.

6. Since ConcurrentHashMap implementation doesn't lock the whole Map, there is chance of read overlapping with update operations like put() and remove(). In that case, result returned by get() method will reflect the most recently completed operation from there start.

7. Iterator returned by ConcurrentHashMap is weekly consistent, fail-safe and never throw ConcurrentModificationException. In Java.

8. ConcurrentHashMap doesn't allow null as key or value.

9. You can use ConcurrentHashMap in place of Hashtable but with caution as CHM doesn't lock the whole Map.

10. During putAll() and clear() operations, the concurrent read may only reflect the insertion or deletion of some entries.

That’s all on What is ConcurrentHashMap in Java and when to use it. We have also seen a little bit about the internal working of ConcurrentHashMap and how it achieves it’s thread-safety and better performance over Hashtable and synchronized Map. Use ConcurrentHashMap in Java program, when there will be more reader than writers and it’s a good choice for creating cache in Java as well.


Related Java Concurrency and Collection Tutorials from this blog

17 comments :

SARAL SAXENA said...

one more thing I want to share Javin...

interviewer was possibly expecting a simple answer, such as:
•if the whole map is synchronized for get/put operations, adding threads won't improve throughput because the bottleneck will be the synchronized blocks. You can then write a piece of code with a synchronizedMap that shows that adding threads does not help
•because the map uses several locks, and assuming you have more than one core on your machine, adding threads will improve throughput

The example below outputs the following:


Synchronized one thread: 30
Synchronized multiple threads: 96
Concurrent one thread: 219
Concurrent multiple threads: 142

So you can see that the synchronized version is more than 3 times slower under high contention (16 threads) whereas the concurrent version is almost twice as fast with multiple threads as with a single thread.

It is also interesting to note that the ConcurrentMap has a non-negligible overhead in a single threaded situation.

This is a very contrived example, with all the possible problems due to micro-benchmarking (first results should be discarded anyway). But it gives a hint at what happens.


public class Test1 {
static final int SIZE = 1000000;
static final int THREADS = 16;
static final ExecutorService executor = Executors.newFixedThreadPool(THREADS);

public static void main(String[] args) throws Exception{

for (int i = 0; i < 10; i++) {
System.out.println("Concurrent one thread");
addSingleThread(new ConcurrentHashMap ());
System.out.println("Concurrent multiple threads");
addMultipleThreads(new ConcurrentHashMap ());
System.out.println("Synchronized one thread");
addSingleThread(Collections.synchronizedMap(new HashMap ()));
System.out.println("Synchronized multiple threads");
addMultipleThreads(Collections.synchronizedMap(new HashMap ()));
}
executor.shutdown();
}

private static void addSingleThread(Map map) {
long start = System.nanoTime();
for (int i = 0; i < SIZE; i++) {
map.put(i, i);
}
System.out.println(map.size()); //use the result
long end = System.nanoTime();
System.out.println("time with single thread: " + (end - start) / 1000000);
}

private static void addMultipleThreads(final Map map) throws Exception {
List runnables = new ArrayList<> ();
for (int i = 0; i < THREADS; i++) {
final int start = i;
runnables.add(new Runnable() {

@Override
public void run() {
//Trying to have one runnable by bucket
for (int j = start; j < SIZE; j += THREADS) {
map.put(j, j);
}
}
});
}
List futures = new ArrayList<> ();
long start = System.nanoTime();
for (Runnable r : runnables) {
futures.add(executor.submit(r));
}
for (Future f : futures) {
f.get();
}
System.out.println(map.size()); //use the result
long end = System.nanoTime();
System.out.println("time with multiple threads: " + (end - start) / 1000000);
}
}



Javin @ Java Classloder Working said...

@Saral Saxena, great comment. Yes, you are right on single threaded environment, ConcurrentHashMap is tough slow than Hashtable and HashMap. On scalability front, it achieves using concurrency level by dividing Map into segment, with each segment having its own lock.

SARAL SAXENA said...

Thanks Javin,..As I have shown in the above example the two methods simply add 1 million entries to the map. the first one is run in the main thread, the second one is run with an executor service that has 16 threads. To try make to help the CHM, I count with a step of 16 (so one thread will put 0, 15, 31 etc, the second thread 1, 16, 32 etc) which should reduce lock contention and associate each thread with one lock (there are 16 locks, based on the hashcode of the key modulo 16). Note that the effect is more pronounced with more threads (with 100 threads, the sychronized version really suffers).

Anonymous said...

In your code snippet, doesn't the "synchronized(map)" part actually lock the map? If you want your example to be relevant, shouldn't the if {} else {} be without the synchronized() block?

Javin @ String to int in Java said...

Hi @mattnguyen, if you don't synchronized then that code snippet will create data race, where one thread calling get() getting null, just before another thread putting value, which result in overriding that value. synchronized is required to get the desired functionality, i.e. only insert,if not present.

suresh said...

Did any one really faced questions like How ConcurrentHashMap works or internal implementation of ConcurrentHashMap in Java interview? is that a real question or hypothetical?

Arsh said...

8. ConcurrentHashMap doesn't allow null as key or value.!!
Then how the putIfAbsent will work, the example you gave..if(map.get(key)==null)

Anonymous said...

@suresh, yes I faced these questions in an interview recently

Anonymous said...

Does ConcurrentHashMap uses ReadWriteLock internally for allowing multiple thread to read and block only for writers? I am not sure, but thought this may be a good question here.

Gauri said...

@Anonymous, ConcurrentHashMap doesn't use ReadWriteLock, instead it uses multiple tables (arrays) known as Segment, each with there own lock. This concept is known as lock-stripping, this means, you don't need to lock whole map during write operation, only relevant segment gets locked. This is optimized for more reading than updating activity. Even a call to size() may perform poor because, in worst case a call to size() may lock all segment for calculating entries in each segment and then adding them up. As I said, ConcurrentHashMap is optimized for multiple readers with few writers.

Suman said...

Just want to high light one important limitation of ConcurrentHashMap, client side locking is not possible as there is no lock, which can guard whole map. Though this limitation is somewhat addressed by adding ConcurrentMap interface and providing atomic function for compound operations e.g. putIfAbsent(), remove() or replace(key, oldValue, newValue)

Anonymous said...

Don't use ConcurrentHashMap, instead create your own concurrent Map by using ReentrantReadWriteLock and high performance Map based on your need e.g. EnumMap, primitive maps from trove library.

Anonymous said...

"concurrent retrieval may not reflect most recent change on Map"

That's imprecise and misleading.

From the Javadoc:

"Retrievals reflect the results of the most recently completed update operations" (for put() and remove()).

"For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries"

Unknown said...

on one hand, you have said that CHM has concurrent reads and THREAD SAFE writes, on the other hand you said "Since update operations like put(), remove(), putAll() or clear() is not synchronized".
if updates are not synchronized then how come it has thread safe writes?

Ashish Magar said...

Hi, I have littlebit confusion about this.
You mentioned that only portion of map is locked while put operation is going on. I assume the segment in which map is putting entry is getting locked.
Does it allow get() (read) operation on that segment?
Because you mentioned:

"because, during put operation whole map is not locked, and while one thread is putting value, other thread's get() call can still return null"

this means get operation is allowed on locked region?

you also mentioned:

"because you got to lock all portion of Map, and effectively each reader will wait for another writer, operating on that portion of Map"

which means read operation is blocked by write operation.

I found these two statements ambiguous and I started getting confused.

Unknown said...

hey can you please explain how it actually work. What is tree been node, how it allow multiple updates, how it increase size, why we need to declare initial capacity for performance.

javin paul said...

Hello Navrattan, rest of things about ConcurrentHashMap is same as HashMap except that it divides the map into segments and lock those segments instead of locking the whole Map. You specify size at the start to prevent resizing because that is expensive as it involves creating a new array and copying content from old array to new one.

Post a Comment