Issues with 2 node cluster + S2S to other domain (git HEAD)

Question asked by Daniel Hams on Jul 6, 2017

Dear Devs,


We are attempting to set up a test cluster of two nodes with a third host talking to the cluster via S2S.


When running the two nodes as a standalone cluster, XMPP clients talking to the cluster behave as expected when the node they are connected to goes down: they receive a disconnect, and on rejoin everything works.


We've encountered issues when a separate XMPP host talks to the cluster via S2S: communication is left dangling or lost when the cluster node that terminates the S2S connection goes down. When we then attempt to send further "groupchat" messages (which causes new S2S connections to be created), we are in a bad state.


Example scenario:


  • Client 1 Spark connects as dan@xmpp.domain to "testroom@conference.xmpp.domain" -> directed to cluster node lh01.xmpp.domain
  • Client 2 Spark connects as test@dh01.standalone.domain to "testroom@conference.xmpp.domain" -> direct connection to host, S2S connection created to lh01.xmpp.domain via the load balancer.


At this point, both clients see each other in the room and can exchange group chat messages.


  • Halt of lh01.xmpp.domain node


The server shuts down, the cluster promotes the junior to senior (lh02) and Client 1 Spark is forced to reconnect - and reconnects successfully to the room. No other participants are visible in the room.


Client 2 Spark does not receive any notice or visible indication that an error has occurred. The logs of "dh01.standalone.domain" show the disconnection of the S2S connection.


When typing further messages in Client 2 Spark, the following is received:


<message id="62YYc-88" to="test@dh01.standalone.domain/Spark" from="testroom@conference.xmpp.domain" type="error">
  <error code="406" type="MODIFY">
    <not-acceptable xmlns="urn:ietf:params:xml:ns:xmpp-stanzas"/>
  <x xmlns="jabber:x:event">
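For anyone triaging similar logs, here is a minimal Python sketch that pulls the error type, code and condition out of an error stanza like the one above. The stanza as pasted is cut off, so the `SAMPLE` string is a hypothetical well-formed version with the closing tags assumed; `summarize_error` is an illustrative helper, not part of Openfire or Spark. As far as I understand XEP-0045, a `not-acceptable` error in reply to a groupchat message usually means the room no longer treats the sender as an occupant, which would fit the scenario described.

```python
import xml.etree.ElementTree as ET

STANZAS_NS = "urn:ietf:params:xml:ns:xmpp-stanzas"


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree adds to a tag."""
    return tag.rsplit("}", 1)[-1]


def summarize_error(stanza_xml):
    """Return a dict describing an error stanza, or None if it is not one."""
    message = ET.fromstring(stanza_xml)
    if message.get("type") != "error":
        return None
    # Match on local name so this also works when the stanza carries a
    # default namespace such as jabber:client.
    error = next((c for c in message if local_name(c.tag) == "error"), None)
    if error is None:
        return None
    # The defined condition is the child in the xmpp-stanzas namespace.
    condition = None
    for child in error:
        if child.tag.startswith("{" + STANZAS_NS + "}"):
            condition = local_name(child.tag)
            break
    return {
        "from": message.get("from"),
        "to": message.get("to"),
        "error_type": error.get("type"),
        "error_code": error.get("code"),
        "condition": condition,
    }


# Hypothetical completed stanza: the closing tags are assumed, not quoted.
SAMPLE = """<message id="62YYc-88" to="test@dh01.standalone.domain/Spark"
    from="testroom@conference.xmpp.domain" type="error">
  <error code="406" type="MODIFY">
    <not-acceptable xmlns="urn:ietf:params:xml:ns:xmpp-stanzas"/>
  </error>
  <x xmlns="jabber:x:event"/>
</message>"""

if __name__ == "__main__":
    print(summarize_error(SAMPLE))
```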








Version / Setup information:


Openfire version: Git checkout of 3f66493850

Platform: Linux CentOS 6.8

Database: Oracle 12.1

Load balancer: HAProxy for 5222, 5269
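A quick way to confirm that both balanced ports accept TCP connections after a node is halted is a plain socket probe. This is only a sketch: `port_open` is a hypothetical helper, and a successful connect only shows that the load balancer accepted the connection, not that a healthy Openfire node sits behind it.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


# Example usage against the load-balanced ports listed in this post:
#
#     for port in (5222, 5269):
#         print(port, port_open("xmpp.domain", port))
```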


Cluster Node1 host: lh01.xmpp.domain

Cluster Node1 XMPP domain: xmpp.domain


Cluster Node2 host: lh02.xmpp.domain

Cluster Node2 XMPP domain: xmpp.domain


Node3 host: dh01.standalone.domain

Node3 XMPP domain: dh01.standalone.domain


Relevant DNS entries (others like the oracle host are not shown):


lh01.xmpp.domain.   IN  A

lh02.xmpp.domain.   IN  A


xmpp.domain.        IN  A

conference.xmpp.domain. IN  CNAME   xmpp.domain


dh01.standalone.domain. IN  A

conference.dh01.standalone.domain. IN  CNAME    dh01.standalone.domain.


_xmpp-client._tcp.xmpp.domain.      IN  SRV 0   0   5222    xmpp.domain.

_xmpp-server._tcp.xmpp.domain.      IN  SRV 0   0   5222    xmpp.domain.

_xmpp-server._tcp.conference.xmpp.domain.      IN  SRV 0   0   5222    conference.xmpp.domain.
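As a sanity check on the SRV records, the sketch below compares each record's port with the ports conventionally registered for XMPP (5222 for client connections, 5269 for server-to-server). As posted, both `_xmpp-server` records point at 5222; that may just be a transcription slip in this post, so treat the result as a hint rather than a diagnosis. `check_srv_line` and `EXPECTED_PORTS` are hypothetical names used for illustration, and the parser only handles the simple one-line form shown above.

```python
# Conventional ports for the two XMPP SRV service labels.
EXPECTED_PORTS = {"_xmpp-client": 5222, "_xmpp-server": 5269}


def check_srv_line(line):
    """Parse 'name IN SRV priority weight port target' and check the port.

    Returns None for lines that are not in that simple SRV form.
    """
    fields = line.split()
    if len(fields) != 7 or fields[1:3] != ["IN", "SRV"]:
        return None
    name, port, target = fields[0], int(fields[5]), fields[6]
    service = name.split(".", 1)[0]  # e.g. "_xmpp-server"
    expected = EXPECTED_PORTS.get(service)
    return {
        "name": name,
        "port": port,
        "target": target,
        "expected": expected,
        "ok": expected is None or port == expected,
    }


# The SRV records exactly as listed in this post.
RECORDS = [
    "_xmpp-client._tcp.xmpp.domain.      IN  SRV 0   0   5222    xmpp.domain.",
    "_xmpp-server._tcp.xmpp.domain.      IN  SRV 0   0   5222    xmpp.domain.",
    "_xmpp-server._tcp.conference.xmpp.domain.      IN  SRV 0   0   5222    conference.xmpp.domain.",
]

if __name__ == "__main__":
    for record in RECORDS:
        result = check_srv_line(record)
        if result and not result["ok"]:
            print("%s: port %d, conventionally %d"
                  % (result["name"], result["port"], result["expected"]))
```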


NOTE: The dh01 IP address as listed above is the HA proxy IP address - so that incoming connections to dh01 look like they are coming from the "xmpp.domain" IP address rather than individual cluster nodes.


I have generated trusted certs that have all appropriate alternate names and imported them into the necessary nodes.


I realise I'm potentially asking for a world of pain using HEAD from git - if there's a specific version I should be trying this with, please let me know.


I have the lab cluster still up and running for further investigations / testing.


Thanks for any pointers that can be given - even if it's "add more debugging to the server connections _here_ and show us the logs".


Kind regards,