- Garbage Collection Tuning
- GC Attacks By Linux
- I/O Issues
- Memory Issues
- Page Cache
- Debugging memory usage issues and GC pauses
- Getting Heap Dump on Out of Memory Errors
- Detailed Garbage Collection stats
- FlightRecorder Settings
Below are example JVM configurations for applications that generate large numbers of temporary objects and therefore trigger long pauses due to garbage collection activity.
JVMs in a cluster should be monitored constantly and tuned once a profile has been gathered. GC tuning depends heavily on the application and the Ignite usage pattern.
For JDK 1.8 we recommend using the G1 garbage collector. Below is a 10GB heap example for a machine with 64 CPUs with G1 turned on:
-server -Xms10g -Xmx10g -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC
Use the latest version of Oracle JDK 8 or OpenJDK 8 if you decide to use the G1 collector, since it is constantly being improved.
If G1 does not suit your case, or you are using JDK 7, you can use the following CMS-based settings as a good starting point for JVM tuning (10GB heap example for a machine with 64 CPUs):
-server -Xms10g -Xmx10g -XX:+AlwaysPreTouch -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled -XX:+ScavengeBeforeFullGC -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC
Please note that these settings might not always be ideal, so always test rigorously prior to production deployment.
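The two option sets above can be kept in one launch helper so that the choice between G1 and CMS is made in a single place. The following is a minimal sketch, not part of Ignite: the function name `gc_opts_for_jdk` and the mapping "Java 8 selects G1, Java 7 selects CMS" are assumptions made for illustration, and the flags are copied from the examples above.

```shell
#!/bin/sh
# Hypothetical helper: prints the example JVM options from this page
# for a given Java major version (8 -> G1 example, 7 -> CMS example).
gc_opts_for_jdk() {
    common="-server -Xms10g -Xmx10g -XX:+AlwaysPreTouch"
    case "$1" in
        8)
            echo "$common -XX:+UseG1GC -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC"
            ;;
        7)
            echo "$common -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled -XX:+ScavengeBeforeFullGC -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC"
            ;;
        *)
            # This page gives no example for other versions.
            echo "unsupported Java major version: $1" >&2
            return 1
            ;;
    esac
}

# Example use (commented out so that sourcing this file has no side effects):
# JVM_OPTS="$(gc_opts_for_jdk 8)"
# java $JVM_OPTS -cp "$APP_CLASSPATH" com.example.Main   # hypothetical main class
```

The heap size is hard-coded to match the 10GB examples; adjust it to your own profiling results.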
In a Linux environment, an application may face long GC pauses or lose performance due to I/O or memory starvation caused by kernel-specific settings. This section gives some guidelines on how to modify kernel settings in order to overcome long GC pauses.
All the shell commands given below were tested under RedHat 7. They may differ for your Linux distribution.
Before applying any kernel-level settings, check system statistics and logs to confirm that the problem really applies to your case.
Finally, it's advisable to consult with your IT department before making changes at the Linux kernel level in production.
If the GC log shows “low user time, low system time, long GC pause”, then a likely reason is GC threads stuck in the kernel waiting for I/O. This usually happens due to journal commits or to file system flushes of changes made when rolled logs are gzipped.
As a solution, you can increase the frequency of flushing pages to disk from the default of once every 30 seconds to once every 5 seconds:
sysctl -w vm.dirty_writeback_centisecs=500
sysctl -w vm.dirty_expire_centisecs=500
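To spot this pattern in a log, the `[Times: user=... sys=..., real=... secs]` entries written by -XX:+PrintGCDetails can be classified with a small script. The sketch below is only a heuristic: the function name `classify_gc_times` and every threshold in it (a pause counts as long at 1 second or more, "low" means below 20% of the wall-clock time, "high" means at least 50%) are assumptions chosen for illustration, not values defined by the JVM or by Ignite. It also reports the "high system time" pattern that is covered under memory issues.

```shell
#!/bin/sh
# classify_gc_times LINE [LONG_PAUSE_SECONDS]
# Prints one of: ok, io-suspect, memory-suspect, other, no-times.
# All thresholds are illustrative assumptions.
classify_gc_times() {
    printf '%s\n' "$1" | awk -v long_pause="${2:-1.0}" '
    {
        # Extract the three timings from the "[Times: ...]" entry.
        if (!match($0, /user=[0-9.]+/)) { print "no-times"; next }
        u = substr($0, RSTART + 5, RLENGTH - 5) + 0
        if (!match($0, /sys=[0-9.]+/)) { print "no-times"; next }
        s = substr($0, RSTART + 4, RLENGTH - 4) + 0
        if (!match($0, /real=[0-9.]+/)) { print "no-times"; next }
        r = substr($0, RSTART + 5, RLENGTH - 5) + 0

        if (r < long_pause) {
            print "ok"                 # pause is not long
        } else if (u < 0.2 * r && s < 0.2 * r) {
            print "io-suspect"         # low user, low system, long pause
        } else if (u < 0.2 * r && s >= 0.5 * r) {
            print "memory-suspect"     # low user, high system, long pause
        } else {
            print "other"              # long pause, CPU time spent in GC work
        }
    }'
}

# Example use against a whole log (commented out; the path is a placeholder):
# grep 'Times:' /path/to/gc/logs/log.txt | while IFS= read -r line; do
#     classify_gc_times "$line"
# done | sort | uniq -c
```

A result of `io-suspect` or `memory-suspect` is only a hint to look at system statistics next; it does not prove the cause.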
If the GC log shows “low user time, high system time, long GC pause”, then most likely memory pressure is triggering swapping or scanning for free memory.
- Check and decrease the 'swappiness' setting to protect heap and anonymous memory
sysctl -w vm.swappiness=10
- Add -XX:+AlwaysPreTouch to JVM settings on startup
- Turn off NUMA zone-reclaim optimization
sysctl -w vm.zone_reclaim_mode=0
- Turn off Transparent Huge Pages if a RedHat distribution is used
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
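Before and after changing the Transparent Huge Pages setting, it helps to confirm which mode is active. The kernel marks the active mode with square brackets in the `enabled` file, for example `always madvise [never]`. The sketch below parses that format; the function names `thp_active_mode` and `thp_report` are hypothetical, and checking the non-RedHat path `/sys/kernel/mm/transparent_hugepage` as well is an assumption made so that the script also works on other distributions.

```shell
#!/bin/sh
# thp_active_mode CONTENT
# Prints the bracketed (active) mode from the content of a THP "enabled"
# or "defrag" file, e.g. "always madvise [never]" -> "never".
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

# thp_report
# Prints the active mode for whichever THP "enabled" files are readable.
thp_report() {
    for f in /sys/kernel/mm/redhat_transparent_hugepage/enabled \
             /sys/kernel/mm/transparent_hugepage/enabled; do
        if [ -r "$f" ]; then
            echo "$f: $(thp_active_mode "$(cat "$f")")"
        fi
    done
}

# Example use (commented out so that sourcing this file prints nothing):
# thp_report
```

Note that values written with `echo` under /sys do not survive a reboot, so the check is worth repeating after a restart.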
When an application interacts heavily with the underlying file system, RAM can become highly utilized by the page cache. If the kswapd daemon does not keep up with reclaiming pages used by the page cache in the background, the application can face high latencies due to direct reclaim when it needs a new page. This situation can affect not only the performance of the application but may also lead to long GC pauses.
To get over long GC pauses caused by direct page memory reclaim on Linux with the latest kernel versions, you can use the /proc/sys/vm/extra_free_kbytes setting to keep extra free memory between the threshold at which kswapd starts background reclaim and the threshold at which direct reclaim starts, in order to avoid the aforementioned latencies.
sysctl -w vm.extra_free_kbytes=1240000
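Values set with `sysctl -w` are lost on reboot. One way to keep them is a drop-in file under /etc/sysctl.d, which is applied at boot and can be loaded on demand with `sysctl --system`. The sketch below writes the values used on this page to such a file; the function name `write_gc_sysctl_conf` and the file name `99-gc-tuning.conf` are hypothetical, and the `extra_free_kbytes` line is added only when the running kernel exposes that setting, since not every kernel does.

```shell
#!/bin/sh
# write_gc_sysctl_conf [TARGET_FILE]
# Writes the kernel settings from this page to a sysctl drop-in file.
# The default file name is a hypothetical choice.
write_gc_sysctl_conf() {
    target="${1:-/etc/sysctl.d/99-gc-tuning.conf}"
    cat > "$target" <<'EOF'
# Flush dirty pages every 5 seconds (values are in centiseconds)
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 500
# Lower the tendency to swap out heap and anonymous memory
vm.swappiness = 10
# Turn off NUMA zone reclaim
vm.zone_reclaim_mode = 0
EOF
    # extra_free_kbytes is not available on every kernel; add it only if present.
    if [ -e /proc/sys/vm/extra_free_kbytes ]; then
        echo "vm.extra_free_kbytes = 1240000" >> "$target"
    fi
}

# Example use as root (commented out so that sourcing this file changes nothing):
# write_gc_sysctl_conf
# sysctl --system
```

The Transparent Huge Pages switches live under /sys rather than /proc/sys, so they are not covered by this file and need a separate boot-time mechanism.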
For more insight into the topics discussed in this section, please refer to the following slides.
This section contains information that may be helpful when you need to debug and troubleshoot issues related to memory usage or long GC pauses.
If your JVM throws an OutOfMemoryError and the JVM process should be restarted, you may add the following properties to your JVM configuration:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/heapdump -XX:OnOutOfMemoryError="kill -9 %p" -XX:+ExitOnOutOfMemoryError
In order to capture detailed information about garbage collection and its performance add the following parameters to the JVM configuration:
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -Xloggc:/path/to/gc/logs/log.txt
For G1, it's recommended to set the property below, which provides many ergonomic details that are purposefully kept out of the -XX:+PrintGCDetails output.
Make sure you modify the path and file names accordingly, and use a different file name for each invocation to avoid overwriting log files from multiple processes.
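One way to get a different log file name for each invocation is to build the -Xloggc path in the launch script. The sketch below is an assumption-laden example rather than a recommended layout: the function name `gc_log_opts` and the `gc-<host>-<timestamp>-<pid>.log` naming scheme are hypothetical, and the PID in the name is that of the launching shell, not of the JVM.

```shell
#!/bin/sh
# gc_log_opts LOG_DIR
# Prints the GC logging options from this page with a log file name that is
# unique per host, start time and launching shell. The naming scheme is a
# hypothetical choice.
gc_log_opts() {
    log_dir="$1"
    stamp="$(date +%Y%m%d-%H%M%S)"
    # uname -n prints the host name; $$ is the PID of this shell.
    log_file="$log_dir/gc-$(uname -n)-$stamp-$$.log"
    echo "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -Xloggc:$log_file"
}

# Example use (commented out; the directory is a placeholder):
# JVM_OPTS="$JVM_OPTS $(gc_log_opts /path/to/gc/logs)"
```

The directory must exist and be writable by the JVM user before the process starts.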
When you need to debug performance or memory issues, you can rely on the Java Flight Recorder tool, which continuously collects low-level, detailed runtime information, enabling after-the-fact incident analysis. To enable Flight Recorder, use the following settings:
-XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints
To start a recording for a particular Java process, use a command like this:
jcmd <PID> JFR.start name=<recording_name> duration=60s filename=/var/recording/recording.jfr settings=profile
For complete details on Java Flight Recorder, refer to the official Oracle documentation.
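Besides JFR.start, jcmd also offers JFR.check, JFR.dump and JFR.stop for inspecting, saving and ending a recording. The sketch below only prints the command lines so they can be reviewed before being run; the function name `jfr_command` and its argument order are hypothetical, and the `duration` and `settings` values are copied from the example above.

```shell
#!/bin/sh
# jfr_command ACTION PID [NAME] [FILE]
# Prints (does not run) the jcmd command line for a Flight Recorder action.
jfr_command() {
    action="$1"; pid="$2"; name="$3"; file="$4"
    case "$action" in
        start) echo "jcmd $pid JFR.start name=$name duration=60s filename=$file settings=profile" ;;
        check) echo "jcmd $pid JFR.check" ;;
        dump)  echo "jcmd $pid JFR.dump name=$name filename=$file" ;;
        stop)  echo "jcmd $pid JFR.stop name=$name" ;;
        *)
            echo "unknown action: $action" >&2
            return 1
            ;;
    esac
}

# Example use (commented out; the PID and recording name are placeholders):
# jfr_command start 12345 my_recording /var/recording/recording.jfr
# jfr_command check 12345
```

To execute a printed command, pass it to the shell, for example with `sh -c "$(jfr_command check 12345)"`.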