Guus der Kinderen

Cluster member node doesn't join the cluster senior node

Discussion created by Guus der Kinderen on Nov 28, 2007
Latest reply on Nov 29, 2007 by Guus der Kinderen

One particularly nasty problem has been keeping us busy all day. I'm hoping that someone can help us out, because we're running out of ideas on how to proceed.

 

Our current Openfire setup (two cluster nodes running Openfire 3.4.1 with a couple of patches) runs fine in my test environment. Moving the exact same code to two other machines (which form another XMPP domain) gives us a strange problem: the first node starts up fine. The second node, however, fails to join the cluster, giving us this exception:

 

2007-11-28 12:30:08.651 Oracle Coherence 3.3/387 <Info> (thread=pool-12-thread-1, member=n/a): Loaded operational configuration from resource "jar:file:/srv/openfire/plugins/enterprise/lib/coherence.jar!/tangosol-coherence.xml"
2007-11-28 12:30:08.662 Oracle Coherence 3.3/387 <Info> (thread=pool-12-thread-1, member=n/a): Loaded operational overrides from resource "jar:file:/srv/openfire/plugins/enterprise/lib/coherence.jar!/tangosol-coherence-override-dev.xml"
2007-11-28 12:30:08.665 Oracle Coherence 3.3/387 <Info> (thread=pool-12-thread-1, member=n/a): Loaded operational overrides from resource "file:/srv/openfire/enterprise/tangosol-coherence-override.xml"
 
Oracle Coherence Version 3.3/387
 Grid Edition: Development mode
Copyright (c) 2000-2007 Oracle. All rights reserved.
 
0.1627200 secs]
2007-11-28 12:30:09.537 Oracle Coherence GE 3.3/387 <Warning> (thread=pool-12-thread-1, member=n/a): UnicastUdpSocket failed to set receive buffer size to 1428 packets (2096304 bytes); actual size is 89 packets (131071 bytes). Consult your OS documentation regarding increasing the maximum socket buffer size. Proceeding with the actual value may cause sub-optimal performance.
2007-11-28 12:30:09.876 Oracle Coherence GE 3.3/387 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a
2007-11-28 12:30:09.892 Oracle Coherence GE 3.3/387 <Error> (thread=Cluster, member=n/a): Assertion failed:
        at com.tangosol.coherence.component.net.Member.configure(Member.CDB:6)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.service.ClusterService$NewMemberAnnounceReply.onReceived(ClusterService.CDB:66)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.onMessage(Service.CDB:9)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.onNotify(Service.CDB:123)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.service.ClusterService.onNotify(ClusterService.CDB:3)
        at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:35)
        at java.lang.Thread.run(Thread.java:619)
 
2007-11-28 12:30:09.892 Oracle Coherence GE 3.3/387 <Error> (thread=Cluster, member=n/a): Terminating ClusterService due to unhandled exception: com.tangosol.util.AssertionException
2007-11-28 12:30:09.892 Oracle Coherence GE 3.3/387 <Error> (thread=Cluster, member=n/a):
com.tangosol.util.AssertionException:
        at com.tangosol.coherence.Component._assertFailed(Component.CDB:12)
        at com.tangosol.coherence.Component._assert(Component.CDB:3)
        at com.tangosol.coherence.component.net.Member.configure(Member.CDB:6)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.service.ClusterService$NewMemberAnnounceReply.onReceived(ClusterService.CDB:66)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.onMessage(Service.CDB:9)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.onNotify(Service.CDB:123)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.service.ClusterService.onNotify(ClusterService.CDB:3)
        at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:35)
        at java.lang.Thread.run(Thread.java:619)
2007-11-28 12:30:09.894 Oracle Coherence GE 3.3/387 <D5> (thread=Cluster, member=n/a): Service Cluster left the cluster
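
The log flags the receive buffer size only as a performance warning, but the same check can be reproduced outside Coherence to see what the OS actually grants. The following is a minimal sketch, not Coherence's own code: the class name ReceiveBufferCheck and its helper are hypothetical, the requested size is taken from the warning above, and the per-packet size of 1468 bytes is an assumption derived from the log's own numbers (2096304 bytes / 1428 packets).

```java
import java.net.DatagramSocket;
import java.net.SocketException;

// Hypothetical helper class; not part of Openfire or Coherence.
class ReceiveBufferCheck {

    // Number of whole packets of the given size that fit in a buffer.
    static int packetsThatFit(int bufferBytes, int packetBytes) {
        return bufferBytes / packetBytes;
    }

    public static void main(String[] args) throws SocketException {
        int requestedBytes = 2096304; // value reported in the Coherence warning
        int packetBytes = 1468;       // assumption: 2096304 bytes / 1428 packets

        // Bind a UDP socket to any free port and ask for the large buffer.
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setReceiveBufferSize(requestedBytes); // only a hint to the OS
            int grantedBytes = socket.getReceiveBufferSize();

            System.out.printf("requested: %d bytes (%d packets)%n",
                    requestedBytes, packetsThatFit(requestedBytes, packetBytes));
            System.out.printf("granted:   %d bytes (%d packets)%n",
                    grantedBytes, packetsThatFit(grantedBytes, packetBytes));
        }
    }
}
```

If the granted value is far below the requested one, the OS-level maximum socket buffer size is the limit, which is what the warning text suggests consulting the OS documentation about.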

 

 

We've checked just about everything we could think of, including:

  • Firewalls are disabled;

  • We used udpcast to make sure that multicasting between the hosts works;

  • We've verified that (one) UDP packet arrives at the 'master' node every time we try to join the second node to the cluster;

  • JVMs are identical.
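
For anyone wanting to repeat the multicast check from the JVM itself rather than with udpcast, a rough sketch follows. It is only an illustration: the class name McastProbe, the group 239.1.1.1 and the port 40000 are placeholders I made up, not the values Coherence uses, so substitute whatever the cluster configuration specifies. Run it with "listen" on one host and "send" on the other.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

// Hypothetical probe; group and port defaults below are placeholders.
class McastProbe {

    // True only if the given literal address is in the multicast range.
    static boolean isMulticast(String address) {
        try {
            return InetAddress.getByName(address).isMulticastAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String mode = args.length > 0 ? args[0] : "listen";
        String groupName = args.length > 1 ? args[1] : "239.1.1.1"; // placeholder
        int port = args.length > 2 ? Integer.parseInt(args[2]) : 40000; // placeholder

        if (!isMulticast(groupName)) {
            throw new IllegalArgumentException(groupName + " is not a multicast address");
        }
        InetAddress group = InetAddress.getByName(groupName);
        boolean sending = mode.equals("send");

        // The sender uses any free port; the listener binds the group port.
        try (MulticastSocket socket = new MulticastSocket(sending ? 0 : port)) {
            if (sending) {
                byte[] payload = "probe".getBytes(StandardCharsets.UTF_8);
                socket.send(new DatagramPacket(payload, payload.length, group, port));
                System.out.println("sent probe to " + groupName + ":" + port);
            } else {
                // null lets the OS pick the default multicast interface.
                socket.joinGroup(new InetSocketAddress(group, port), null);
                socket.setSoTimeout(10000); // give up after 10 seconds
                byte[] buffer = new byte[1500];
                DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
                try {
                    socket.receive(packet);
                    System.out.println("received " + packet.getLength()
                            + " bytes from " + packet.getAddress());
                } catch (SocketTimeoutException e) {
                    System.out.println("nothing received within 10 seconds");
                }
            }
        }
    }
}
```

Note that the default multicast TTL in Java is 1, so this only exercises hosts on the same subnet, and it should be run in both directions since a join needs traffic to flow both ways.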

 

Does anyone have suggestions?
