Parallel processing with Virtual Threads - A comparative analysis
Background
My previous article compared solutions for parallel execution built on Spring Core Reactor and on JDK 21. This follow-up provides a comparative analysis of Virtual Thread based execution for the Spring Core Reactor and JDK 21 based implementations.
Keeping the same use case that we used for comparing the Spring Core Reactor and JDK based implementations, this article focuses on:
- Limitations of OS / Platform threads
- What are Virtual Threads
- When to use Virtual Threads
- Solutions with Virtual Thread based implementation
- Comparative analysis of Virtual Thread based implementations
1. Limitations of OS / Platform threads
Constraints of OS / Platform threads:
- The lifecycle of Platform threads is resource intensive
- Only a limited number of Platform threads can run concurrently without degrading performance, as they rely on the capabilities of the underlying OS
- Blocking operations hold up OS threads, limiting the scalability of the application
- Platform thread management incurs expensive context switching, which in turn reduces the efficiency of the application
Contemporary applications that are throughput and latency sensitive demand high concurrency without overwhelming system resources.
2. What are Virtual Threads
Virtual Threads, delivered by Project Loom and finalized in JDK 21, are lightweight, scalable threads that are managed by the JVM. Unlike OS threads, they are cheap to create, block, and manage, making them ideal for high-concurrency applications.
Salient features from a performance engineering standpoint:
- Low cost - Millions of virtual threads can be created without significant memory / CPU overhead
- Blocking friendly - Blocking a virtual thread generally doesn't block the underlying OS thread, allowing more efficient resource utilization
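The two features above can be illustrated with a minimal sketch. It is not taken from the article's code base: the class name VirtualThreadBlockingSketch, the method runBlockingTasks and the task counts are assumptions chosen for illustration. It starts one virtual thread per task, lets each one block briefly, and counts how many finished.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; names and numbers are assumptions, not the article's code.
final class VirtualThreadBlockingSketch {

    // Starts one virtual thread per task, blocks each for `blockFor`,
    // and returns the number of tasks that ran to completion.
    static int runBlockingTasks(int taskCount, Duration blockFor) {
        AtomicInteger completed = new AtomicInteger();
        // close() at the end of the try block waits for all submitted tasks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < taskCount; i++) {
                executor.submit(() -> {
                    try {
                        // Sleeping parks the virtual thread; its carrier OS thread
                        // is released to run other virtual threads meanwhile.
                        Thread.sleep(blockFor);
                        completed.incrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = runBlockingTasks(10_000, Duration.ofMillis(100));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " tasks completed in " + elapsedMs + " ms");
    }
}
```

Running main on JDK 21 should show the elapsed time staying far below the sum of the individual sleep times, although the exact figure depends on the machine.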
3. When to use Virtual Threads
The primary task of any thread within an application is to perform CPU bound or I/O bound operations. While Virtual Threads can be used for both types of operations, they are better suited to I/O intensive operations. They can still be used for CPU intensive operations, but the gain is likely to be smaller than the gain seen for I/O bound operations.
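One way to act on this guidance is to pick the executor by workload type. The sketch below is an assumption for illustration, not part of the article's code: the Workload enum and the method names are hypothetical, and using a fixed pool sized to the number of available processors for CPU bound work is a common convention rather than something measured here.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.IntStream;

// Illustrative sketch only; Workload is a hypothetical classification, not a JDK type.
final class ExecutorChoiceSketch {

    enum Workload { IO_BOUND, CPU_BOUND }

    // I/O bound work gets one virtual thread per task; CPU bound work gets
    // a platform thread pool sized to the number of available processors.
    static ExecutorService executorFor(Workload workload) {
        return switch (workload) {
            case IO_BOUND -> Executors.newVirtualThreadPerTaskExecutor();
            case CPU_BOUND -> Executors.newFixedThreadPool(
                    Runtime.getRuntime().availableProcessors());
        };
    }

    // Computes 1^2 + 2^2 + ... + n^2, one task per term, on the chosen executor.
    static long sumOfSquares(Workload workload, int n) {
        List<Callable<Long>> tasks = IntStream.rangeClosed(1, n)
                .<Callable<Long>>mapToObj(i -> () -> (long) i * i)
                .toList();
        try (ExecutorService executor = executorFor(workload)) {
            long total = 0;
            for (Future<Long> future : executor.invokeAll(tasks)) {
                total += future.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(Workload.IO_BOUND, 10));
        System.out.println(sumOfSquares(Workload.CPU_BOUND, 10));
    }
}
```

Both calls return the same result; only the kind of thread that runs the tasks differs.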
4. Solutions with Virtual Threads based implementation
4.1 JDK Based implementation with Virtual Threads
Excerpt from actual code
public static void withFlatMapUsingJDK() {
    ...
    var virtualThreadExecutor = Executors.newThreadPerTaskExecutor(
        Thread
            .ofVirtual()
            .name("jdk21-vt-", 0)
            .factory()
    );

    try (virtualThreadExecutor) {
        // Submit tasks for parallel processing
        List<CompletableFuture<Void>> futures =
            users
                .stream()
                .map(user -> CompletableFuture.runAsync(() -> {
                    try {
                        log.info("Processing user: {}", user);
                        processSomeBizLogic(user);
                        successCount.incrementAndGet();
                    } catch (Exception e) {
                        log.error("Error occurred while processing user {}: {}", user, e.getMessage());
                        failureCount.incrementAndGet();
                    }
                }, virtualThreadExecutor))
                .toList(); // Collect CompletableFuture<Void> for each user

        // Wait for all tasks to complete
        CompletableFuture<Void> allOf = CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
        try {
            allOf.join();
        } catch (Exception e) {
            log.error("Error waiting for all tasks to complete: {}", e.getMessage());
        }
    }
    ...
}
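Because the excerpt above elides its setup, here is a self-contained sketch of the same pattern that can be run as-is. Everything that is not in the excerpt is an assumption: users are plain integers, processSomeBizLogic is a hypothetical stand-in that sleeps for 5 ms and fails for every tenth user, and the Result record only exists to return both counters.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

// Illustrative sketch only; a runnable approximation of the excerpt, not the author's full code.
final class JdkVirtualThreadSketch {

    record Result(int successes, int failures) {}

    // Hypothetical stand-in for the article's processSomeBizLogic:
    // blocks briefly and fails for every tenth user.
    static void processSomeBizLogic(int userId) throws InterruptedException {
        Thread.sleep(5);
        if (userId % 10 == 0) {
            throw new IllegalStateException("simulated failure for user " + userId);
        }
    }

    static Result processAll(List<Integer> users) {
        AtomicInteger successCount = new AtomicInteger();
        AtomicInteger failureCount = new AtomicInteger();
        ExecutorService executor = Executors.newThreadPerTaskExecutor(
                Thread.ofVirtual().name("jdk21-vt-", 0).factory());

        try (executor) {
            // One CompletableFuture per user, each running on its own virtual thread.
            List<CompletableFuture<Void>> futures = users.stream()
                    .map(user -> CompletableFuture.runAsync(() -> {
                        try {
                            processSomeBizLogic(user);
                            successCount.incrementAndGet();
                        } catch (Exception e) {
                            failureCount.incrementAndGet();
                        }
                    }, executor))
                    .toList();

            // Wait for every task before reading the counters.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        }
        return new Result(successCount.get(), failureCount.get());
    }

    public static void main(String[] args) {
        List<Integer> users = IntStream.rangeClosed(1, 1_000).boxed().toList();
        System.out.println(processAll(users));
    }
}
```

Failures are counted inside each task rather than propagated, which mirrors the excerpt: the joined future completes normally even when individual users fail.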
4.2 Spring Core Reactor Based implementation with Virtual Threads
Excerpt from actual code
public static void withFlatMapUsingJDK() {
    ...
    // Custom executor with virtual threads
    var virtualThreadExecutor = Executors.newThreadPerTaskExecutor(
        Thread
            .ofVirtual()
            .name("rx-vt-", 0)
            .factory()
    );

    try (virtualThreadExecutor) {
        Flux
            .fromIterable(objectList)
            .flatMap(obj ->
                Mono
                    .fromCallable(() -> {
                        log.info("Entering processUser in virtual thread: {}", obj);
                        processSomeBizLogic(obj);
                        log.info("Leaving processUser in virtual thread: {}", obj);
                        successCount.incrementAndGet();
                        return obj;
                    })
                    .doOnError(error -> {
                        log.error("Error occurred while processing user {}: {}", obj, error.getMessage());
                        failureCount.incrementAndGet();
                    })
                    .onErrorResume(error -> {
                        log.info("Skipping user due to error: {}", obj);
                        return Mono.empty(); // Skip errored objects
                    })
                    .subscribeOn(Schedulers.fromExecutor(virtualThreadExecutor)) // Use virtual threads
            )
            .doOnComplete(() -> {
                log.info("Processing completed");
                log.info("Success count: {}", successCount.get());
                log.info("Failure count: {}", failureCount.get());
            })
            .blockLast();
    }
    ...
}
5. Comparative analysis of Virtual Thread based implementations
To ensure that the comparison between Virtual Thread based solutions and Platform thread based solutions is fair and holistic, we will keep the same criteria as in my previous article: concurrent processing of the following numbers of objects from a list
- 100,000 objects
- 250,000 objects
- 500,000 objects
5.1 Comparing Virtual Thread based solution with Spring Core Reactor & JDK constructs
5.1.1 Total time spent in processing the entire list
As we can see in the above graph, the JDK based implementation with Virtual Threads is much faster than the Spring Core Reactor based one. Also, the time the Spring Core Reactor based application takes to process the entire list rises steeply as the number of items in the list grows.
5.1.2 Memory footprint
From the above graph one can infer:
- For 500,000 objects, the JVM is required to allocate 33 times more memory in Old Gen for the JDK based implementation than for the Spring Core Reactor based implementation
- For 500,000 objects, the peak memory utilized within Old Gen is 81 times more for the JDK based implementation than for the Spring Core Reactor based implementation
5.1.3 GC Metrics
As we have seen in my previous blogs, GC has a significant impact on the performance of an application. Here's how the two implementations compare:
5.1.4 GC Pauses
This mainly indicates the amount of time consumed by stop-the-world (STW) GCs.
In most cases, GC pauses are longer for the JDK based implementation. Despite these longer pauses, there is no significant impact on the latency of the application.
5.1.5 CPU Time
This shows the total CPU time consumed by the garbage collector.
It is evident from the above graph that even though the JDK based implementation requires more CPU time for its GC activities, this does not have any negative effect on application performance.
5.1.6 Object Metrics
This shows the rate at which objects are created within the JVM heap and the rate at which they are promoted from the Young to the Old region.
An interesting behavior can be inferred: even though the object creation rate and promotion rate are significantly higher for the JDK based implementation, they have minimal impact on the performance of the application.
5.2 Comparing Spring Core Reactor & JDK based solutions with / without Virtual Threads
With JDK 21, developers have two options, Platform threads and Virtual threads, for implementing asynchronous / concurrent workflows. Hence it is important to understand which fares better from a performance engineering standpoint. The outcome is intuitive, but the graph below confirms with data that the Virtual thread based implementations are much faster than the Platform thread based implementations.
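The difference between the two options can be reproduced on a small scale with the sketch below. It is not the benchmark behind the article's graphs: the class and method names, the pool size of 10 and the use of Thread.sleep as a stand-in for blocking work are all assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch only; a toy timing comparison, not the article's benchmark.
final class PlatformVsVirtualSketch {

    // Submits `taskCount` tasks that each block for `sleepMillis`,
    // waits for all of them, and returns the elapsed wall-clock time in ms.
    static long timeBlockingTasks(ExecutorService executor, int taskCount, long sleepMillis) {
        long start = System.nanoTime();
        try (executor) {
            for (int i = 0; i < taskCount; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(sleepMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        } // close() blocks here until every submitted task has finished
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Platform threads: an arbitrarily chosen fixed pool of 10 OS threads.
    static long platformMillis(int taskCount, long sleepMillis) {
        return timeBlockingTasks(Executors.newFixedThreadPool(10), taskCount, sleepMillis);
    }

    // Virtual threads: one new virtual thread per task.
    static long virtualMillis(int taskCount, long sleepMillis) {
        return timeBlockingTasks(Executors.newVirtualThreadPerTaskExecutor(), taskCount, sleepMillis);
    }

    public static void main(String[] args) {
        System.out.println("Platform threads: " + platformMillis(200, 25) + " ms");
        System.out.println("Virtual threads:  " + virtualMillis(200, 25) + " ms");
    }
}
```

With 200 tasks of 25 ms each, the fixed pool needs at least 20 rounds of 25 ms because only 10 tasks can block at a time, while the virtual threads can all block concurrently. The gap shrinks for CPU bound tasks, so the numbers from this sketch should not be read as a general speed-up factor.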
Conclusion
After objectively comparing the solutions from this and the previous article, it is quite apparent that:
- For a Virtual thread based implementation, JDK should be one's obvious choice as it is significantly faster than Spring Core Reactor
- For a Platform thread based implementation, Spring Core Reactor is relatively faster than the JDK based implementation
P.S. - All the memory and GC related graphs shown above were prepared using data from the GC report generated by GCeasy