Solving Java GC Pause Outages in Production

Just thinking about how to configure HAProxy with two backend Java servers for high availability (HA).

Java programs pause for garbage collection; these pauses are known as “GC pauses.”

The description “Stop the World” (STW) illustrates their true severity – they are a slow-motion train wreck for incoming requests.

If you’re new to this topic, please read:

the excellent blog on how Netflix visualizes GC pauses.
Willy Tarreau’s comments on “Graceful handling of garbage collecting servers?”

Willy: “I work with people who use a lot of Java applications, and I’ve seen them spend as much time on tuning the JVM as they spend writing the code, and the result is really worth it.” Anybody have some extra time?

My operational requirements for Java in production are:

understand GC pause activity for my application servers
control GC pause activity to a reasonable and bounded extent
configure the HAProxy load balancer to not send requests to servers undergoing GC pauses (i.e. don’t lose requests)
use an affordable amount of RAM to accomplish the above, preferably 8 or 16 GB in a shared VM environment.

1. Understand GC pause activity for my application servers

Detailed GC logging can be enabled with:

-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps

and you can specify a separate GC log with:

-verbose:gc -Xloggc:/tmp/gc.log

See “Understanding Garbage Collection Logs.”
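As a complement to the log file, the cumulative collection counts and times can also be read from inside the JVM. The following is a minimal sketch using the standard java.lang.management API; the class name GcTotals is made up for illustration, and for concurrent collectors the accumulated collection time is only an approximation of actual pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcTotals {
    /** Sums collection counts over all collectors; -1 ("undefined") values are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Sums approximate accumulated collection time in milliseconds; -1 values are skipped. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // One line per collector, e.g. the young and old generation collectors.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Sampling these totals periodically and graphing the deltas gives a rough picture of GC frequency without parsing the log.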

2. Control GC pause activity to a reasonable and known extent

One of the biggest challenges is to control the frequency and duration of GC pauses …

Some configuration approaches:

set the heap size and compaction percentage only somewhat above what the application needs. GCs will then be more frequent, but also shorter. Or do the opposite …
set the heap size to a large amount and compaction to 100%, then trigger a GC after hours
investigate alternate JVMs.

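An example of some of the tuning options (note that -XX:MaxPermSize applies to Java 7 and earlier; Java 8 replaced PermGen with Metaspace and ignores the flag with a warning):

java -Xms512m -Xmx1152m -XX:MaxPermSize=256m -XX:MaxNewSize=256m MyClass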

JRockit JVM: Tuning For a Small Memory Footprint
Tuning Java Virtual Machines (JVMs)
Weblogic Tuning JVM Garbage Collection for Production Deployments

Some programming approaches:

use streaming file IO with Files.lines() instead of reading into a String or hashmap, or use memory-mapped files
rewrite portions of your application to build strings with StringBuilder (or StringBuffer where thread safety is needed) instead of repeated String concatenation
reduce object copies – if you do not have a problem with thread safety, then you don’t need immutable objects
call the dispose() method when available, such as on the SWT Image class
for HashMaps, call clear() to empty the map and reuse it later, or set the reference to null so the map itself can be garbage collected
split java server into real-time and batch servers where possible with appropriate heap sizes.
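Two of the programming approaches above can be sketched in a few lines. This is only an illustration: the class and method names are hypothetical, and StringBuilder is used rather than StringBuffer because the method is single-threaded.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class LowGarbage {
    /** Counts lines containing a token without loading the whole file onto the heap. */
    static long countMatching(Path file, String token) throws IOException {
        // try-with-resources closes the underlying file handle when the stream is done.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.filter(line -> line.contains(token)).count();
        }
    }

    /** Joins parts with a single StringBuilder instead of creating a new String per step. */
    static String join(Iterable<String> parts, char separator) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String part : parts) {
            if (!first) {
                sb.append(separator);
            }
            sb.append(part);
            first = false;
        }
        return sb.toString();
    }
}
```

With Files.lines() only the current line has to be live at any moment, so even a large file produces short-lived garbage that a young-generation collection can reclaim cheaply.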

3. Configure HAProxy load balancer requests to not be sent to servers undergoing GC pause events

This is tricky for several reasons:

health checks can be passive or active. Both have check gaps that won’t notice a GC starting before a request is sent
even if GC notifications are enabled and the server health check is red, HAProxy will not know (see above)
even if GC notifications are enabled and the server health check is now green, HAProxy will not know (see above)
the HAProxy options log-health-checks and redispatch may be helpful

a) I think the only 100% reliable way is to coordinate from the HAProxy side:

understand your GC pattern
use HAProxy socket interface to drain, then disable one backend
wait for zero connections
force a GC (easier said than done in Oracle Java since System.gc() is only a request for GC), or restart the Java server
use HAProxy socket interface to enable the Java server.

This method would be risky with two Java servers, since during maintenance on one server, the other could GC pause. (facepalm)
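The drain and enable steps above could be driven from Java roughly as follows. This is a sketch under assumptions, not a tested controller: it assumes the HAProxy stats socket is bound to a TCP address with admin level (for example "stats socket ipv4@127.0.0.1:9999 level admin"), that the HAProxy version supports the "set server … state" socket command, and the class name, backend name and server name are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class HaproxyAdmin {
    /** Builds a socket command such as "set server app/java1 state drain". */
    static String setState(String backend, String server, String state) {
        if (!state.equals("ready") && !state.equals("drain") && !state.equals("maint")) {
            throw new IllegalArgumentException("state must be ready, drain or maint");
        }
        return "set server " + backend + "/" + server + " state " + state + "\n";
    }

    /** Sends one command to a TCP-bound stats socket and returns whatever HAProxy replies. */
    static String send(String host, int port, String command) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(command.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // In non-interactive mode HAProxy closes the connection after one command,
            // so reading until end of stream collects the full reply.
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            StringBuilder reply = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                reply.append(line).append('\n');
            }
            return reply.toString();
        }
    }
}
```

Draining one server would then be send("127.0.0.1", 9999, HaproxyAdmin.setState("app", "java1", "drain")), followed by polling until its connection count reaches zero, and a final call with "ready" once the GC or restart is done.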

b) Another possible approach would be to handle MemoryPoolMXBean MEMORY_THRESHOLD_EXCEEDED events. Maybe these can be used to update the health check on the server side and send a drain request to the HAProxy socket, provided you reliably get advance notice and can force a GC right then, perhaps by trying the JVM Tool Interface (JVMTI) function ForceGarbageCollection()?
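Approach b) might start from something like the sketch below, which arms a usage threshold on each heap pool that supports one and runs a callback on MEMORY_THRESHOLD_EXCEEDED. The class name and the callback are hypothetical; what the callback does (flip the health check, send a drain request) is left open, and whether the notification arrives early enough to be useful is exactly the open question.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

class HeapThresholdWatcher {
    /**
     * Sets a usage threshold at the given fraction of each heap pool's maximum size
     * and registers the callback. Returns the names of the pools that were armed.
     */
    static List<String> arm(double fraction, Runnable onExceeded) {
        if (fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        List<String> armed = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            // Skip invalid pools, non-heap pools, pools without threshold support,
            // and pools whose maximum size is undefined (-1).
            if (usage == null || pool.getType() != MemoryType.HEAP
                    || !pool.isUsageThresholdSupported() || usage.getMax() <= 0) {
                continue;
            }
            pool.setUsageThreshold((long) (usage.getMax() * fraction));
            armed.add(pool.getName());
        }
        // The platform MemoryMXBean is the emitter for threshold notifications.
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        NotificationListener listener = (notification, handback) -> {
            if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(notification.getType())) {
                onExceeded.run();
            }
        };
        ((NotificationEmitter) memory).addNotificationListener(listener, null, null);
        return armed;
    }
}
```

Which pools support a usage threshold depends on the collector in use, so the returned list is worth logging at startup.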

c) Another idea is to write a sentinel file every 250 ms; if its age reaches 750 ms, assume a GC is happening and drain the server in HAProxy. Unfortunately the JVMTI events GarbageCollectionStart and GarbageCollectionFinish are sent while the VM is stopped, so you’re limited in what you can do when you need the most flexibility.
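The sentinel idea in c) splits into two halves: the application server refreshes a file, and a separate watcher process checks its age. A minimal sketch, assuming the file's modification time is used as the heartbeat and that the filesystem stores timestamps with sub-second precision (not all do); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

class Sentinel {
    /** Application side: refresh the sentinel's modification time (call every 250 ms). */
    static void touch(Path file, long nowMs) throws IOException {
        if (Files.notExists(file)) {
            Files.createFile(file);
        }
        Files.setLastModifiedTime(file, FileTime.fromMillis(nowMs));
    }

    /** Watcher side: true when the sentinel has not been refreshed for thresholdMs or longer. */
    static boolean isStale(Path file, long thresholdMs, long nowMs) throws IOException {
        long ageMs = nowMs - Files.getLastModifiedTime(file).toMillis();
        return ageMs >= thresholdMs;
    }
}
```

The writer would call touch(file, System.currentTimeMillis()) from a scheduled task, and the watcher would drain the server when isStale(file, 750, System.currentTimeMillis()) returns true. The watcher has to run outside the application JVM, since any thread inside it is stopped during the pause as well.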

Some Java 8 Classes related to GC notifications:

MemoryPoolMXBean – “The memory usage monitoring mechanism is intended for load-balancing or workload distribution use. For example, an application would stop receiving any new workload when its memory usage exceeds a certain threshold. It is not intended for an application to detect and recover from a low memory condition.”
GarbageCollectionNotificationInfo
GarbageCollectorMXBean
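For reference, subscribing to GC notifications with these classes looks roughly like this. It is a sketch with a hypothetical class name and callback; note that GarbageCollectionNotificationInfo lives in the JDK-specific com.sun.management package, and that notifications are delivered after a collection has completed, so they report pauses rather than predict them.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.LongConsumer;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

class GcNotifications {
    /**
     * Registers a listener on every collector bean that can emit notifications and
     * passes each collection's duration in milliseconds to the callback.
     * Returns the number of collectors the listener was registered on.
     */
    static int listen(LongConsumer onGcDurationMs) {
        NotificationListener listener = (Notification n, Object handback) -> {
            if (GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION.equals(n.getType())) {
                GarbageCollectionNotificationInfo info =
                        GarbageCollectionNotificationInfo.from((CompositeData) n.getUserData());
                onGcDurationMs.accept(info.getGcInfo().getDuration());
            }
        };
        int registered = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc instanceof NotificationEmitter) {
                ((NotificationEmitter) gc).addNotificationListener(listener, null, null);
                registered++;
            }
        }
        return registered;
    }
}
```

A call such as GcNotifications.listen(ms -> System.out.println("GC took " + ms + " ms")) is enough to see how long collections take under real load.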

Also, investigate mod_jk and AJP. Tomcat uses the same heap as your application, so tuning is very important here too.

4. Use an affordable amount of RAM to accomplish the above, preferably 8 or 16 GB in a shared VM environment

If you work in a VM consolidation environment, it’s important to minimize the footprint of your applications. Requesting an entire server to run a bloated app just isn’t going to cut it. See above for rewriting applications to minimize heap and GCs.

Garbage Collection JMX Notifications Example Code
Blade: A Data Center Garbage Collector
How to Tame Java GC Pauses? Surviving 16GiB Heap and Greater
SO: Garbage Collection Notifications
Letting the Garbage Collector Do Callbacks
HAProxyController.java
How to force garbage collection in Java?
SSL Termination, Load Balancers & Java
Github: Measuring Java Memory Consumption – sample code
Java is not “angry” with you.
Set State to DRAIN vs set weight 0
Scalable web applications [with Java]
Examples of forcing freeing of native memory direct ByteBuffer has allocated, using sun.misc.Unsafe?
Lucene ByteBuffer sample code
Improve availability in Java enterprise applications
The Four Month Bug: JVM statistics cause garbage collection pauses
Memory management when failure is not an option

Cassandra-related

CASSANDRA-5345: Potential problem with GarbageCollectorMXBean
