Our production Openfire instances crash with a java.lang.OutOfMemoryError: Metaspace after a few days of operation. We run Openfire 3.10.2 with Java 8 on CentOS 6.8 servers. Our tests show that the issue is also reproducible in Openfire 4.0.4 as well as 4.1.1. Strangely, we do not see this error when running the same Openfire versions under Java 7.
To replicate this issue, I wrote a simple client application that opens a number of simultaneous connections to Openfire via ordinary sockets; each connection joins a MUC room, posts a message and then sits idle. Once all the client connections have joined the room, and after a few seconds, all of them are closed. When this process is repeated over a period of time, Openfire throws a Metaspace OOM error. I monitored Openfire's Metaspace via VisualVM and noticed that usage grows continuously without any GC. I limited the amount of Metaspace memory allocated to the JVM to reproduce the error sooner, but left sufficient space for normal operation. With the same amount of memory given to PermGen under Java 7, GC occurs as expected and there is no OOM error.
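The cycle described above (open many sessions at once, wait until all have joined, hold for a few seconds, close everything, repeat) can be sketched roughly as follows. This is not the harness attached to this post: ReconnectCycle, Session and runOnce are hypothetical names, and the actual XMPP work (socket, login, MUC join, message) is assumed to sit behind the Session interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class ReconnectCycle {

    /** Hypothetical stand-in for one client connection; not an Openfire or Smack type. */
    interface Session extends AutoCloseable {
        /** Assumed to open the socket, log in, join the MUC room and post one message. */
        void connectAndJoinRoom() throws Exception;

        @Override
        void close();
    }

    /**
     * Runs one iteration: starts sessionCount sessions in parallel, waits until
     * every one has joined or failed, idles for holdMillis, then closes them all.
     * sessionCount must be at least 1. Returns how many sessions joined.
     */
    static int runOnce(Supplier<Session> factory, int sessionCount, long holdMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(sessionCount);
        CountDownLatch finished = new CountDownLatch(sessionCount);
        List<Session> sessions = new ArrayList<>();
        AtomicInteger joined = new AtomicInteger();
        try {
            for (int i = 0; i < sessionCount; i++) {
                Session session = factory.get();
                sessions.add(session);
                pool.execute(() -> {
                    try {
                        session.connectAndJoinRoom();
                        joined.incrementAndGet();
                    } catch (Exception e) {
                        System.err.println("join failed: " + e);
                    } finally {
                        finished.countDown();
                    }
                });
            }
            finished.await();          // all sessions have joined or failed
            Thread.sleep(holdMillis);  // sit in the room for a moment
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Close every session before the next iteration starts.
            for (Session session : sessions) {
                session.close();
            }
            pool.shutdown();
        }
        return joined.get();
    }
}
```

Calling runOnce in a loop with a session count of 200 would correspond to the repeated connect/disconnect pattern that triggers the growth in my tests.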
I have attached the harness to reproduce. Please rebuild this Maven project and run as follows:
java -jar target/openfire-memory-leak-trigger-1.0.jar <host> <port> <xmpp-domain> <muc-service> <muc-room> <session-count>
where <session-count> is the number of clients that connect and join the room. Make sure <muc-service> and <muc-room> exist before running this code. The login credentials are hard-coded as admin/admin.
I executed the code with the following arguments:
java -jar target/openfire-memory-leak-trigger-1.0.jar chat20.cloud.internal 5222 chat conference.chat beach 200
The Openfire configuration used in my tests is:
1. CentOS 6.8 / Oracle Java HotSpot 64-bit Server VM - 1.8.0_92
2. Openfire 4.1.1 installed from the official RPM, with the "jre" folder removed after installation. Fresh database on MySQL, with a room called "beach" created. Room properties are:
3. No plugins (I removed the "search" plugin)
4. Added these JVM options to the OPENFIRE_OPTS environment variable:
export OPENFIRE_OPTS="-Xmx320M -Xms320M -XX:MaxMetaspaceSize=38500K -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+PrintPromotionFailure -XX:+HeapDumpOnOutOfMemoryError -Xloggc:/opt/openfire/logs/gc.log -Dcom.sun.management.jmxremote.port=4444 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
Steps to reproduce are:
1. Launch Openfire after configuring OPENFIRE_OPTS as above
2. Connect VisualVM via JMX port 4444 (optional)
3. Log in to the Openfire Admin console.
4. Run the attached test harness with the appropriate command line arguments as noted above
5. Leave the harness running until Openfire throws OutOfMemoryError: Metaspace. In my environment this took about 15 minutes with the Metaspace limit set to 38500K. Ideally, do not navigate to any other pages of the Admin console, as that requires more Metaspace memory than the limit specified above allows. The error should look like this:
Dumping heap to java_pid22090.hprof ...
Heap dump file created [30859822 bytes in 0.334 secs]
Here are some screenshots of VisualVM taken during my tests:
Unfortunately, due to a limitation of VisualVM, we are unable to look at the classes that accumulate in the Metaspace area. Perhaps another tool such as AppDynamics would be more useful in this regard.
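For anyone who wants numbers rather than VisualVM graphs, the standard java.lang.management MXBeans expose Metaspace usage and class load/unload counters. Below is a minimal sketch; MetaspaceProbe and its method names are made up for illustration, and it assumes a HotSpot JVM where the memory pool is named "Metaspace". It does not show which classes accumulate, only whether classes are ever unloaded.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import javax.management.MBeanServerConnection;

class MetaspaceProbe {

    /** Bytes used by the pool named "Metaspace" (HotSpot naming assumed), or -1 if not found. */
    static long metaspaceUsedBytes(MBeanServerConnection conn) throws IOException {
        for (MemoryPoolMXBean pool
                : ManagementFactory.getPlatformMXBeans(conn, MemoryPoolMXBean.class)) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                return usage == null ? -1L : usage.getUsed();
            }
        }
        return -1L;
    }

    /** One-line snapshot of Metaspace usage and class loading counters. */
    static String sample(MBeanServerConnection conn) throws IOException {
        ClassLoadingMXBean classes =
                ManagementFactory.getPlatformMXBean(conn, ClassLoadingMXBean.class);
        return String.format(
                "metaspaceUsed=%d loadedNow=%d loadedTotal=%d unloadedTotal=%d",
                metaspaceUsedBytes(conn),
                classes.getLoadedClassCount(),
                classes.getTotalLoadedClassCount(),
                classes.getUnloadedClassCount());
    }

    /** Same snapshot for the JVM this code runs in. */
    static String sampleLocal() {
        try {
            return sample(ManagementFactory.getPlatformMBeanServer());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sampleLocal());
    }
}
```

To sample Openfire itself, the MBeanServerConnection would have to come from a remote JMX connection to the port configured in OPENFIRE_OPTS (4444 above), for example via JMXConnectorFactory with a service URL of the form service:jmx:rmi:///jndi/rmi://<host>:4444/jmxrmi. If unloadedTotal stays at zero while loadedTotal keeps climbing across harness iterations, that would be consistent with the growth seen in VisualVM.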
I have attached two zip files. One contains the test harness code, which is easily built using Maven. The other is a heap dump taken at the time the OOM occurred, if that's helpful.
We would greatly appreciate it if someone could look into this issue or provide some insights. We are very keen to run our production Openfire servers with Java 8.
UPDATE: Fixed a tiny issue in the test harness where the session was not cleared before moving on to the next iteration. Please download the latest zip.