How to reduce CPU usage for Java 11 with G1GC on GCP
I have:
- a multi-threaded, low-latency application
- OpenJDK 64-Bit Server VM version 11.0.7+10
- heap configuration: -Xms6g -Xmx12g
- a machine with 12 CPUs
- deployment in Kubernetes on GCP (Google Cloud Platform)
The problem:
- After migrating from Java 8 to Java 11, the application consumes much more CPU and memory than the same application did on Java 8. I suspect the GC, because everything looks better (more stable) when I use ParallelGC (deprecated)
-XX:+UseParallelGC -Xmx12g -Xms6g -XX:ParallelGCThreads=23
than when I use G1GC (recommended):
-XX:+UseG1GC -Xmx12g -Xms6g -XX:ParallelGCThreads=23 -XX:ConcGCThreads=4
As far as I can see, the response time and the number of processed messages are similar (or almost the same). JConsole says ParallelGC spends much more time in GC than G1GC.
The question is: how can I reduce CPU usage with G1GC on GCP Kubernetes?
Maybe it is not an issue with the GC and it is something else. Any advice?
Here are the stats from JConsole: ParallelGC is on the left side and G1GC is on the right side (watch the CPU usage; this is the same application with similar traffic: ParallelGC processed 22k messages vs. 18k messages for G1GC).
Solution 1:[1]
Parallel GC is not deprecated, even in JDK 15 (are you perhaps confusing it with CMS GC?). So if you are satisfied with Parallel GC performance, it's fine to keep using it.
It's quite expected that CPU usage with G1 GC is higher. That's the price for shorter stop-the-world pauses.
-XX:MaxGCPauseMillis is the main tuning option for G1 GC. The larger the target GC pause, the lower the overall GC overhead. The default pause time target is 200 ms.
It's unusual (and presumably inefficient) to have more GC threads than available CPUs. To begin with, I'd suggest leaving the number of threads at its default.
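To check the thread-count point on a running service, a small sketch like the following compares the GC thread flags the JVM actually ended up with against the CPUs it can see. It assumes a HotSpot JVM (the com.sun.management API is HotSpot-specific), and the names GcThreadCheck, oversubscribed and intFlag are made up for illustration. Inside a Kubernetes pod, availableProcessors() usually reflects the container's CPU limit rather than the node's core count, so the numbers are worth printing from within the pod itself.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class GcThreadCheck {
    /** True when more GC worker threads are configured than CPUs are available. */
    static boolean oversubscribed(int gcThreads, int availableCpus) {
        return gcThreads > availableCpus;
    }

    /** Reads a numeric HotSpot flag such as ParallelGCThreads from the running JVM. */
    static int intFlag(String name) {
        // Assumption: running on HotSpot, where this platform MXBean is available.
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return Integer.parseInt(hotspot.getVMOption(name).getValue());
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        int parallel = intFlag("ParallelGCThreads");
        int concurrent = intFlag("ConcGCThreads");
        System.out.printf("CPUs=%d ParallelGCThreads=%d ConcGCThreads=%d%n",
                cpus, parallel, concurrent);
        if (oversubscribed(parallel, cpus)) {
            System.out.println("More parallel GC threads than CPUs; consider the default.");
        }
    }
}
```

With the settings from the question (23 GC threads on 12 CPUs), the check would flag the configuration as oversubscribed.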
From my experience, manually choosing -XX:InitiatingHeapOccupancyPercent (IHOP) together with disabling adaptive IHOP (-XX:-G1UseAdaptiveIHOP) can make G1 GC run less often and use fewer resources. However, this requires careful selection of the -XX:InitiatingHeapOccupancyPercent value, which depends heavily on the particular application; usually, something between 40 and 80 works.
If you are not sure where the JVM spends most of its CPU time, use a profiler. For example, async-profiler measures not only the application code, but also the JVM internals and native code.
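Before reaching for a profiler, a quick first check is to read the GC counters exposed through the standard java.lang.management API. The sketch below does that; GcOverhead and overheadPercent are illustrative names, not part of any library. Bear in mind that, as far as I know, these counters mainly cover stop-the-world collection time, so the CPU used by G1's concurrent threads may not show up here. That is one plausible reason why G1 can report less GC time and still use more CPU.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcOverhead {
    /** Share of JVM uptime spent in reported collections, as a percentage. */
    static double overheadPercent(long gcMillis, long uptimeMillis) {
        return uptimeMillis <= 0 ? 0.0 : 100.0 * gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        long totalGcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() returns -1 when the collector does not report it.
            long millis = Math.max(gc.getCollectionTime(), 0);
            totalGcMillis += millis;
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), millis);
        }
        System.out.printf("Reported GC time: %.2f%% of uptime%n",
                overheadPercent(totalGcMillis, uptimeMillis));
    }
}
```

Running the same logic inside the application (for example from a scheduled task) under both collectors gives a like-for-like comparison of reported GC time. The remaining CPU difference is what a profiler would then need to explain.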
Solution 2:[2]
To add to apangin's excellent answer:
You wrote that your machine has 12 cores, and you have ParallelGCThreads=23. I'm assuming this was done because your processor has Simultaneous Multithreading (SMT) capabilities. Consider lowering ParallelGCThreads to 12 or even 9 and measuring again. In garbage collection, the memory access pattern may trigger many cache misses, which means that running multiple GC threads per physical core under SMT increases CPU utilization and may actually hurt your overall performance rather than improve it.
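When picking a lower value, it can help to know roughly what HotSpot would choose by itself. The sketch below reproduces the default heuristic as I understand it from the HotSpot sources: all CPUs up to 8, then 5/8 of each additional CPU, and about a quarter of that for G1's concurrent threads. Treat it as an approximation, since the real defaults can differ by platform and JDK version, and confirm the actual values with -XX:+PrintFlagsFinal. GcThreadDefaults and its methods are hypothetical names.

```java
final class GcThreadDefaults {
    /** Approximate HotSpot default for ParallelGCThreads given the visible CPU count. */
    static int parallelGcThreads(int cpus) {
        return cpus <= 8 ? cpus : 8 + (cpus - 8) * 5 / 8;
    }

    /** Approximate G1 default for ConcGCThreads, derived from ParallelGCThreads. */
    static int concGcThreads(int parallelGcThreads) {
        return Math.max((parallelGcThreads + 2) / 4, 1);
    }

    public static void main(String[] args) {
        // Defaults to 12 CPUs, as in the question; pass another count as the first argument.
        int cpus = args.length > 0 ? Integer.parseInt(args[0]) : 12;
        int parallel = parallelGcThreads(cpus);
        System.out.printf("cpus=%d ParallelGCThreads~%d ConcGCThreads~%d%n",
                cpus, parallel, concGcThreads(parallel));
    }
}
```

Under this heuristic, 12 CPUs would give about 10 parallel and 3 concurrent GC threads, which sits between the 9 and 12 suggested above.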
G1 is designed for situations that require a balance between throughput and latency. If latency is less of a concern, or if ParallelGC happens to give you superior latency performance, then you should use it instead of G1.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | apangin |
Solution 2 | dchristle |