I am building a messaging system for my application using Openfire. We recently started running load tests on the application and observed that Openfire became unresponsive after handling around 2000 messages. On analysing the VM profiles afterwards, we saw these issues:
- Openfire's old-generation heap usage was high (~90%).
- Perm space grew to ~99%, and we saw a lot of BOSH (7071) connections in the CLOSE_WAIT state. (We reran the tests after raising the max perm space to 64M; perm usage then stayed under 60% and the CLOSE_WAIT connections went away, but the other issues remained.)
- Sending messages often failed with SASL authentication exceptions and server timeouts, preceded by stray null pointer exceptions (from the PEP service and SASLAuthentication classes).
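For anyone who wants to compare numbers, the pool percentages above are the kind of figures the JVM exposes through the standard `java.lang.management` API. Below is a minimal sketch of reading them. The class name `PoolUsage` is hypothetical, and the code reports on the JVM it runs in, so to observe Openfire it would have to run inside the Openfire process or be adapted to a remote JMX connection (both are assumptions about the setup, not something we have done here).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists every memory pool of the current JVM
// (old gen, perm gen, etc., depending on the JVM and collector in use).
public class PoolUsage {

    /** Returns one line per memory pool: name, used bytes, and percent of max. */
    public static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long max = usage.getMax(); // -1 when the maximum is undefined
            String pct = max > 0
                    ? String.format("%.1f%%", 100.0 * usage.getUsed() / max)
                    : "n/a";
            lines.add(pool.getName() + " used=" + usage.getUsed() + " pct=" + pct);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : snapshot()) {
            System.out.println(line);
        }
    }
}
```

Sampling this periodically during a load test would show whether old gen climbs steadily or jumps at a particular point.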
In this scenario we have one message generator and 3 message receivers, with as many nodes as receivers. Each receiver listens on a BOSH tunnel, logs out of the tunnel after receiving around 10 events, and then logs in again. The message sender uses the Smack library for connection, node creation, subscription, and sending messages. The Openfire version is 3.7.1 for Linux.
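To give a rough sense of the session churn this pattern produces, here is a small arithmetic sketch. The class and method names are hypothetical, and how the ~2000 messages are distributed across the receivers is an assumption, so the event count per receiver is left as a parameter.

```java
// Hypothetical helper: estimates how many BOSH login/logout cycles
// the receivers go through before the server stops responding.
public class SessionChurn {

    /** Sessions one receiver opens if it re-logs in after every eventsPerSession events. */
    public static int sessionsPerReceiver(int eventsPerReceiver, int eventsPerSession) {
        if (eventsPerSession <= 0) {
            throw new IllegalArgumentException("eventsPerSession must be positive");
        }
        // Integer ceiling division: a partially filled last session still counts.
        return (eventsPerReceiver + eventsPerSession - 1) / eventsPerSession;
    }

    /** Total sessions across all receivers, assuming each sees the same number of events. */
    public static int totalSessions(int receivers, int eventsPerReceiver, int eventsPerSession) {
        return receivers * sessionsPerReceiver(eventsPerReceiver, eventsPerSession);
    }

    public static void main(String[] args) {
        // Assumption A: every receiver sees all 2000 events.
        System.out.println(totalSessions(3, 2000, 10)); // prints 600
        // Assumption B: the 2000 events are split evenly (about 667 each).
        System.out.println(totalSessions(3, 667, 10));  // prints 201
    }
}
```

Under either assumption, the server has handled a few hundred BOSH session setups and teardowns by the time it hangs.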
After restarting Openfire, things look good again, but it becomes unresponsive again after handling around 2000 messages.
We have hit a roadblock with this. Any help on how to analyse it further or fix these issues would be greatly appreciated.
Thanks and Regards,