16 Replies Latest reply on Feb 16, 2012 6:47 PM by Danman56

    Cluster nodes can't see each other

      Hello guys

       

      I've set up two nodes on different Linux machines. The machines can see each other, and each can access port 8088 on the other.

      I managed to find Coherence 3.3.0, and I also managed to handle the "Unable to access backing cache for Routing User Sessions" exception.

      But I can't get the nodes to see each other. I've enabled clustering on both machines, but each one sees only itself in the node list.

       

      I wasn't able to find the conditions under which it should work, so I decided to ask.

      Should both nodes be in the same subnet?

      Do any ports other than 8088 need to be open?

      Does ICMP or any other protocol need to be allowed?

      What could be the reasons the nodes can't see each other?

       

      I've also tried this configuration: http://community.igniterealtime.org/docs/DOC-1260

       

      but no luck.

      Could you please help me?

        • Cluster nodes can't see each other
          David

          You need to open port 32386 for multicast traffic. The systems don't need to be on the same subnet, but they do need to be able to communicate via multicast. The simplest configuration for this is to have them on the same VLAN/subnet.

           

          If you have a firewall between the nodes, you should verify that it is not dropping the multicast traffic.

          • Cluster nodes can't see each other

            Guys, one more question:

             

            Should Openfire on each cluster node be configured for the same domain?

            If so, what domain should it be?

            E.g. node1 is hosted on node1_hostname and node2 on node2_hostname.

            Which domain name should be specified on both Openfire instances?

            And if I use cluster_hostname as the domain for both, how will users be able to access it?

            Which IP should be bound to the cluster_hostname domain?

              • Cluster nodes can't see each other
                David

                Assuming you are talking about DNS domain, not XMPP domain:

                 

                1) The domain name of the system running Openfire does not matter. They require unique hostnames, however.

                2) You will need a load balancer or other mechanism to route users to an available cluster member. This is not functionality provided by the clustering plugin.

                 

                Both systems will need the same Openfire configuration, either by using the same database (ideal) or replicating the database between nodes (never tried it). The XMPP domain on both will therefore be identical.

                  • Cluster nodes can't see each other

                    David, thanks for your reply.

                     

                    So the domain name that OF asks for in the second setup step is the XMPP domain, and it does not have to be the same as the domain name of the machine OF runs on?

                    And this XMPP domain should be the same on all cluster nodes?

                     

                    Am I right?

                      • Cluster nodes can't see each other
                        David

                        The XMPP domain can be the same as the DNS domain of the hosts, but it does not have to be. The XMPP domain has to be identical within Openfire across all cluster nodes. You really need the entire Openfire configuration to be the same on all cluster nodes. We accomplish this by pointing all the systems to the same database.

                          • Cluster nodes can't see each other

                            No luck

                             

                            Everything seems fine, but the nodes can't see each other.

                            I get the following exception on node2:

                             

                            2011.12.13 16:51:32 org.jivesoftware.openfire.cluster.ClusterManager - No scheme for cache: "PEPServiceManager"
                            java.lang.IllegalArgumentException: No scheme for cache: "PEPServiceManager"
                                at com.tangosol.net.DefaultConfigurableCacheFactory.findSchemeMapping(DefaultConfigurableCacheFactory.java:476)
                                at com.tangosol.net.DefaultConfigurableCacheFactory.ensureCache(DefaultConfigurableCacheFactory.java:270)
                                at com.tangosol.net.CacheFactory.getCache(CacheFactory.java:689)
                                at com.tangosol.net.CacheFactory.getCache(CacheFactory.java:667)
                                at com.jivesoftware.util.cache.ClusteredCache.<init>(ClusteredCache.java:58)
                                at com.jivesoftware.util.cache.CoherenceClusteredCacheFactory.createCache(CoherenceClusteredCacheFactory.java:177)
                                at org.jivesoftware.util.cache.CacheFactory.joinedCluster(CacheFactory.java:674)
                                at org.jivesoftware.openfire.cluster.ClusterManager$1.run(ClusterManager.java:61)

                             

                             

                             

                            I've set it up as described here:

                             

                            Oracle coherence 3.3 jar files for openfire 3.7.0

                            http://community.igniterealtime.org/message/213184#213184

                             

                            - clustering.jar installed from the openfire admin panel

                             

                            - Oracle coherence jars put inside openfire lib dir

                             

                            - locate coherence-cache-config.xml inside the clustering.jar

                            and put it inside coherence.jar (replacing the default one)

                             

                            - make sure all servers that are being clustered connect to the same external database (same xmpp domain)

                             

                            - start the servers

                             

                             

                            Ports 32386 and 8088 are open.

                              • Cluster nodes can't see each other
                                David

                                So both nodes are running, but as individual clusters? Can you post the nohup.out from each node? That is where Coherence logs all the cluster member information.

                                If you use tcpdump or another packet capture tool, do you see the multicast traffic from the other node on each box?

                      • Cluster nodes can't see each other

                        Here is the nohup.out from node 1:

                         

                        Openfire 3.7.1 [Dec 13, 2011 5:24:51 PM]

                        Admin console listening at:

                          http://cluster1.mytransfire.com:9090

                          https://cluster1.mytransfire.com:9091

                        Starting Clustering Plugin

                        2011-12-13 17:24:53.944 Oracle Coherence 3.3.1/389 <Info> (thread=pool-8-thread-1, member=n/a): Loaded operational configuration from resource "jar:file:/opt/openfire/lib/coherence.jar!/tangosol-coherence.xml"

                        2011-12-13 17:24:53.948 Oracle Coherence 3.3.1/389 <Info> (thread=pool-8-thread-1, member=n/a): Loaded operational overrides from resource "jar:file:/opt/openfire/lib/coherence.jar!/tangosol-coherence-override-dev.xml"

                        2011-12-13 17:24:53.949 Oracle Coherence 3.3.1/389 <D5> (thread=pool-8-thread-1, member=n/a): Optional configuration override "/tangosol-coherence-override.xml" is not specified

                         

                        Oracle Coherence Version 3.3.1/389

                        Grid Edition: Development mode

                        Copyright (c) 2000-2007 Oracle. All rights reserved.

                         

                        2011-12-13 17:24:54.516 Oracle Coherence GE 3.3.1/389 <Warning> (thread=pool-8-thread-1, member=n/a): UnicastUdpSocket failed to set receive buffer size to 1428 packets (2096304 bytes); actual size is 89 packets (131071 bytes). Consult your OS documentation regarding increasing the maximum socket buffer size. Proceeding with the actual value may cause sub-optimal performance.

                        2011-12-13 17:24:54.683 Oracle Coherence GE 3.3.1/389 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a

                        2011-12-13 17:24:57.887 Oracle Coherence GE 3.3.1/389 <Info> (thread=Cluster, member=n/a): Created a new cluster with Member(Id=1, Timestamp=2011-12-13 17:24:54.523, Address=10.34.158.216:8088, MachineId=46808, Location=process:4742@ip-10-34-158-216, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=1) UID=0x0A229ED8000001343874A07BB6D81F98

                        2011-12-13 17:24:57.935 Oracle Coherence GE 3.3.1/389 <Info> (thread=pool-8-thread-1, member=1): Loaded cache configuration from resource "jar:file:/opt/openfire/lib/coherence.jar!/coherence-cache-config.xml"

                        2011-12-13 17:24:58.069 Oracle Coherence GE 3.3.1/389 <D5> (thread=ReplicatedCache, member=1): Service ReplicatedCache joined the cluster with senior service member 1

                        2011-12-13 17:24:58.109 Oracle Coherence GE 3.3.1/389 <D5> (thread=Invocation:OpenFire Cluster Service, member=1): Service OpenFire Cluster Service joined the cluster with senior service member 1

                        2011-12-13 17:24:58.299 Oracle Coherence GE 3.3.1/389 <D5> (thread=DistributedCache, member=1): Service DistributedCache joined the cluster with senior service member 1

                         

                         

                        And here is node 2:

                         

                        Openfire 3.7.1 [Dec 13, 2011 5:30:37 PM]

                        Admin console listening at:

                          http://cluster1.mytransfire.com:9090

                          https://cluster1.mytransfire.com:9091

                        Starting Clustering Plugin

                        2011-12-13 17:30:39.015 Oracle Coherence 3.3.1/389 <Info> (thread=pool-8-thread-1, member=n/a): Loaded operational configuration from resource "jar:file:/opt/openfire/lib/coherence.jar!/tangosol-coherence.xml"

                        2011-12-13 17:30:39.019 Oracle Coherence 3.3.1/389 <Info> (thread=pool-8-thread-1, member=n/a): Loaded operational overrides from resource "jar:file:/opt/openfire/lib/coherence.jar!/tangosol-coherence-override-dev.xml"

                        2011-12-13 17:30:39.020 Oracle Coherence 3.3.1/389 <D5> (thread=pool-8-thread-1, member=n/a): Optional configuration override "/tangosol-coherence-override.xml" is not specified

                         

                        Oracle Coherence Version 3.3.1/389

                        Grid Edition: Development mode

                        Copyright (c) 2000-2007 Oracle. All rights reserved.

                         

                        2011-12-13 17:30:39.499 Oracle Coherence GE 3.3.1/389 <Warning> (thread=pool-8-thread-1, member=n/a): UnicastUdpSocket failed to set receive buffer size to 1428 packets (2096304 bytes); actual size is 89 packets (131071 bytes). Consult your OS documentation regarding increasing the maximum socket buffer size. Proceeding with the actual value may cause sub-optimal performance.

                        2011-12-13 17:30:39.699 Oracle Coherence GE 3.3.1/389 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a

                        2011-12-13 17:30:42.902 Oracle Coherence GE 3.3.1/389 <Info> (thread=Cluster, member=n/a): Created a new cluster with Member(Id=1, Timestamp=2011-12-13 17:30:39.567, Address=10.80.177.200:8088, MachineId=60360, Location=process:4121@ip-10-80-177-200, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=1) UID=0x0A50B1C8000001343879E44FEBC81F98

                        2011-12-13 17:30:42.947 Oracle Coherence GE 3.3.1/389 <Info> (thread=pool-8-thread-1, member=1): Loaded cache configuration from resource "jar:file:/opt/openfire/lib/coherence.jar!/coherence-cache-config.xml"

                        2011-12-13 17:30:43.076 Oracle Coherence GE 3.3.1/389 <D5> (thread=ReplicatedCache, member=1): Service ReplicatedCache joined the cluster with senior service member 1

                        2011-12-13 17:30:43.117 Oracle Coherence GE 3.3.1/389 <D5> (thread=Invocation:OpenFire Cluster Service, member=1): Service OpenFire Cluster Service joined the cluster with senior service member 1

                        2011-12-13 17:30:43.296 Oracle Coherence GE 3.3.1/389 <D5> (thread=DistributedCache, member=1): Service DistributedCache joined the cluster with senior service member 1

                         

                        I will look for the multicast traffic.
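Both logs above contain a "Created a new cluster with Member(Id=1, ...)" line, which is the visible symptom of each node forming its own cluster. A rough Python sketch for spotting this in a nohup.out follows; the regular expression is written against the log lines quoted in this thread only, and the function names are made up for illustration.

```python
import re

# Matches the "Created a new cluster" line as it appears in the logs above.
CREATED = re.compile(
    r"Created a new cluster with Member\(Id=(\d+),.*?Address=([\d.]+):(\d+)"
)


def created_clusters(log_text):
    """Return (member id, address, port) for every cluster this log reports creating."""
    return [(int(m.group(1)), m.group(2), int(m.group(3)))
            for m in CREATED.finditer(log_text)]


def each_node_alone(*logs):
    """True when every supplied log shows its node creating its own cluster."""
    return len(logs) > 1 and all(created_clusters(text) for text in logs)
```

Feeding it the text of both nohup.out files, `each_node_alone(log1, log2)` returning `True` would suggest that neither node ever found the other.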

                          • Re: Cluster nodes can't see each other
                            David

                            Are you sure the two systems can communicate via multicast? They are on totally different networks (unless it's a /8, which I doubt).
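The two addresses in the logs (10.34.158.216 and 10.80.177.200) can be checked against a few prefix lengths to see this. A small Python sketch using only the standard library; the prefix lengths tried are just examples, since the real netmask of these hosts is not stated in the thread.

```python
import ipaddress


def same_subnet(a: str, b: str, prefix: int) -> bool:
    """True if both addresses fall into the same network for the given prefix length."""
    net_a = ipaddress.ip_interface(f"{a}/{prefix}").network
    net_b = ipaddress.ip_interface(f"{b}/{prefix}").network
    return net_a == net_b


NODE1 = "10.34.158.216"  # address from the node 1 log
NODE2 = "10.80.177.200"  # address from the node 2 log

for prefix in (8, 16, 24):  # example prefix lengths, not the hosts' actual netmask
    print(f"/{prefix}: same subnet = {same_subnet(NODE1, NODE2, prefix)}")
# /8: same subnet = True
# /16: same subnet = False
# /24: same subnet = False
```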

                             

                            If you start up Openfire on one node and tcpdump on the other, you should see something like this:

                             

                            12:36:56.730222 IP 169.254.200.2.32386 > 224.3.7.0.32386: UDP, length 109

                             

                            If you don't after a few minutes, I would assume multicast is not being routed properly.
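If tcpdump is not at hand, a small listener can give a similar answer. This is a minimal Python sketch, assuming the group and port from the tcpdump line above (224.3.7.0:32386). The function names and the 60-second timeout are arbitrary choices, and binding may fail if another process on the box already holds the port without address reuse.

```python
import socket

GROUP = "224.3.7.0"  # multicast group from the tcpdump line above
PORT = 32386         # multicast port mentioned in this thread


def membership_request(group: str, iface: str = "0.0.0.0") -> bytes:
    """Pack the 8-byte ip_mreq structure expected by IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def listen(group: str = GROUP, port: int = PORT, timeout: float = 60.0) -> bool:
    """Join the group and wait for one datagram; return True if one arrives."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        sock.settimeout(timeout)
        data, sender = sock.recvfrom(2048)
        print(f"received {len(data)} bytes from {sender[0]}:{sender[1]}")
        return True
    except socket.timeout:
        print("no multicast packet seen before the timeout")
        return False
    finally:
        sock.close()
```

Running `listen()` on one node while Openfire starts on the other should report a packet if multicast reaches the box. A timeout points the same way as an empty tcpdump capture.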

                             

                            If multicast is not supported in your environment, take a look at this documentation.

                             

                            http://community.igniterealtime.org/docs/DOC-1260

                             

                            Btw, you need to copy tangosol-coherence-override.xml from the plugin into / on both boxes (or symlink it).

                          • Cluster nodes can't see each other

                            Yes, they are on different networks, which is why I asked whether they need to be on the same one.

                             

                            I will check the multicast routing and be back, thanks for the tips!

                              • Cluster nodes can't see each other

                                It seems there is some issue with multicast.

                                 

                                I've configured the nodes as described in http://community.igniterealtime.org/docs/DOC-1260, but still no luck.

                                 

                                Here is the nohup.out from both servers:

                                 

                                 

                                Admin console listening at:

                                  http://cluster1.mytransfire.com:9090

                                  https://cluster1.mytransfire.com:9091

                                Starting Clustering Plugin

                                2011-12-14 15:57:17.521 Oracle Coherence 3.3.1/389 <Info> (thread=pool-8-thread-1, member=n/a): Loaded operational configuration from resource "jar:file:/opt/openfire/lib/coherence.jar!/tangosol-coherence.xml"

                                2011-12-14 15:57:17.526 Oracle Coherence 3.3.1/389 <Info> (thread=pool-8-thread-1, member=n/a): Loaded operational overrides from resource "jar:file:/opt/openfire/lib/coherence.jar!/tangosol-coherence-override-dev.xml"

                                2011-12-14 15:57:17.527 Oracle Coherence 3.3.1/389 <Info> (thread=pool-8-thread-1, member=n/a): Loaded operational overrides from resource "file:/opt/openfire/lib/tangosol-coherence-override.xml"

                                 

                                Oracle Coherence Version 3.3.1/389

                                Grid Edition: Development mode

                                Copyright (c) 2000-2007 Oracle. All rights reserved.

                                 

                                2011-12-14 15:57:18.082 Oracle Coherence GE 3.3.1/389 <Warning> (thread=pool-8-thread-1, member=n/a): UnicastUdpSocket failed to set receive buffer size to 1428 packets (2096304 bytes); actual size is 89 packets (131071 bytes). Consult your OS documentation regarding increasing the maximum socket buffer size. Proceeding with the actual value may cause sub-optimal performance.

                                2011-12-14 15:57:18.270 Oracle Coherence GE 3.3.1/389 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a

                                2011-12-14 15:57:21.473 Oracle Coherence GE 3.3.1/389 <Info> (thread=Cluster, member=n/a): Created a new cluster with Member(Id=1, Timestamp=2011-12-14 15:57:18.087, Address=10.34.158.216:8088, MachineId=46808, Location=process:5874@ip-10-34-158-216.ec2.intern, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=1) UID=0x0A229ED8000001343D4AC787B6D81F98

                                2011-12-14 15:57:21.523 Oracle Coherence GE 3.3.1/389 <Info> (thread=pool-8-thread-1, member=1): Loaded cache configuration from resource "jar:file:/opt/openfire/lib/coherence.jar!/coherence-cache-config.xml"

                                2011-12-14 15:57:21.650 Oracle Coherence GE 3.3.1/389 <D5> (thread=ReplicatedCache, member=1): Service ReplicatedCache joined the cluster with senior service member 1

                                2011-12-14 15:57:21.687 Oracle Coherence GE 3.3.1/389 <D5> (thread=Invocation:OpenFire Cluster Service, member=1): Service OpenFire Cluster Service joined the cluster with senior service member 1

                                2011-12-14 15:57:21.870 Oracle Coherence GE 3.3.1/389 <D5> (thread=DistributedCache, member=1): Service DistributedCache joined the cluster with senior service member 1

                                 

                                 

                                Admin console listening at:

                                  http://cluster1.mytransfire.com:9090

                                  https://cluster1.mytransfire.com:9091

                                Starting Clustering Plugin

                                2011-12-14 15:57:26.672 Oracle Coherence 3.3.1/389 <Info> (thread=pool-8-thread-1, member=n/a): Loaded operational configuration from resource "jar:file:/opt/openfire/lib/coherence.jar!/tangosol-coherence.xml"

                                2011-12-14 15:57:26.676 Oracle Coherence 3.3.1/389 <Info> (thread=pool-8-thread-1, member=n/a): Loaded operational overrides from resource "jar:file:/opt/openfire/lib/coherence.jar!/tangosol-coherence-override-dev.xml"

                                2011-12-14 15:57:26.676 Oracle Coherence 3.3.1/389 <Info> (thread=pool-8-thread-1, member=n/a): Loaded operational overrides from resource "file:/opt/openfire/lib/tangosol-coherence-override.xml"

                                 

                                Oracle Coherence Version 3.3.1/389

                                Grid Edition: Development mode

                                Copyright (c) 2000-2007 Oracle. All rights reserved.

                                 

                                2011-12-14 15:57:27.319 Oracle Coherence GE 3.3.1/389 <Warning> (thread=pool-8-thread-1, member=n/a): UnicastUdpSocket failed to set receive buffer size to 1428 packets (2096304 bytes); actual size is 89 packets (131071 bytes). Consult your OS documentation regarding increasing the maximum socket buffer size. Proceeding with the actual value may cause sub-optimal performance.

                                2011-12-14 15:57:27.574 Oracle Coherence GE 3.3.1/389 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a

                                2011-12-14 15:57:30.776 Oracle Coherence GE 3.3.1/389 <Info> (thread=Cluster, member=n/a): Created a new cluster with Member(Id=1, Timestamp=2011-12-14 15:57:27.323, Address=10.80.177.200:8088, MachineId=60360, Location=process:4982@ip-10-80-177-200.ec2.intern, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=1) UID=0x0A50B1C8000001343D4AEB9BEBC81F98

                                2011-12-14 15:57:30.823 Oracle Coherence GE 3.3.1/389 <Info> (thread=pool-8-thread-1, member=1): Loaded cache configuration from resource "jar:file:/opt/openfire/lib/coherence.jar!/coherence-cache-config.xml"

                                2011-12-14 15:57:30.888 Oracle Coherence GE 3.3.1/389 <D5> (thread=ReplicatedCache, member=1): Service ReplicatedCache joined the cluster with senior service member 1

                                2011-12-14 15:57:30.989 Oracle Coherence GE 3.3.1/389 <D5> (thread=Invocation:OpenFire Cluster Service, member=1): Service OpenFire Cluster Service joined the cluster with senior service member 1

                                2011-12-14 15:57:31.109 Oracle Coherence GE 3.3.1/389 <D5> (thread=DistributedCache, member=1): Service DistributedCache joined the cluster with senior service member 1

                                 

                                 

                                Could you please help?

                                  • Cluster nodes can't see each other

                                    I just need a list of conditions under which clustering should work.

                                    Or a list of reasons why it won't work.

                                     

                                     

                                      • Cluster nodes can't see each other
                                        David

                                        What specifically did you configure? If you are doing unicast with the nodes hard-coded in the config, are you seeing packets on 8088 between the nodes? Did you update the XML file inside the jar or something else? I've never configured both nodes with unicast before, so I'm not sure what the logs are supposed to look like.

                                         

                                        Can you move both nodes to be on the same subnet and just do it using multicast?

                                  • Re: Cluster nodes can't see each other

                                    Guys, David,

                                     

                                    Here is what I found:

                                    I've put together the XML as described in DOC-1260. Here it is:

                                     

                                    <coherence>

                                      <cluster-config>

                                        <unicast-listener>

                                          <well-known-addresses>

                                            <socket-address id="1">

                                              <address system-property="tangosol.coherence.wka1">xx.xx.158.216</address>

                                              <port system-property="tangosol.coherence.wka1.port">8088</port>

                                            </socket-address>

                                            <socket-address id="2">

                                              <address system-property="tangosol.coherence.wka2">xx.xx.177.200</address>

                                              <port system-property="tangosol.coherence.wka2.port">8088</port>

                                            </socket-address>

                                          </well-known-addresses>

                                        </unicast-listener>

                                      </cluster-config>

                                    </coherence>

                                    Coherence sees that file and logs a message in nohup.out on both nodes:

                                    Loaded operational overrides from resource "file:/opt/openfire/lib/tangosol-coherence-override.xml"

                                     

                                    But the nodes can't see each other.

                                     

                                    I've also used the clustering test built into the Coherence library, as described here:

                                    http://docs.oracle.com/cd/E14447_01/coh.330/coh33ug/instalcoherence.htm

                                     

                                    I've run

                                    java -Dtangosol.coherence.override=tangosol-coherence-override.xml -jar coherence.jar

                                     

                                    on both machines, and that console test reports that the nodes have found each other.

                                     

                                    So that means Coherence itself is able to locate the node. The problem seems to be in Openfire.

                                    UPD:

                                    Sorry, the Coherence test application can see the running Openfire instance, which is why I concluded that it could see the other application on the second node.

                                    It seems that Coherence can't see the other node either.

                                    But why? Port 8088 is accessible on both machines, iptables is stopped, and the nodes can ping each other, yet Coherence can't see the other node?

                                    • Cluster nodes can't see each other

                                      Hello community

                                       

                                      Finally I've managed to set it up.

                                      The short answer is: UDP.

                                      My recommendation is to look at the Coherence docs rather than the Openfire ones, since the Openfire clustering feature is based entirely on Coherence. Find the links at the end of the post.

                                      I wasn't able to find any information on the infrastructure requirements for clustering (what ports, what protocols, wtf).

                                      Probably I was just searching badly, but this is a point for the OF documentation.

                                      Also, I was too much of a dummy to understand the answers you gave me here.

                                      I've put together a how-to (unicast) for dummies like I was. Here it is:

                                       

                                      Openfire cluster setup

                                      We are going to use a manual unicast configuration for Openfire, as it is faster and safer than multicast.

                                      This approach requires pre-defining all servers in the tangosol-coherence-override.xml file. That prevents unauthorized servers from joining the cluster. It also reduces multicast flooding and takes load off the network.

                                      This assumes Oracle Java and Openfire have already been installed.

                                      Make sure that port 8088 on all cluster servers is open for UDP traffic.
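A successful ping or a TCP connection to 8088 says nothing about UDP, which is the point this thread ended up at. One way to check UDP directly is a throwaway echo responder on one node and a probe on the other. The Python sketch below is an illustration, not part of Coherence or Openfire: the function names are invented, and it should be run with Openfire stopped so that port 8088 is free.

```python
import socket
import threading  # used when the responder runs in a background thread


def serve_once(sock: socket.socket) -> None:
    """Answer a single datagram by echoing it back to the sender."""
    data, sender = sock.recvfrom(2048)
    sock.sendto(data, sender)


def udp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if an echo responder at host:port answers over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(b"ping", (host, port))
            data, _ = sock.recvfrom(2048)
            return data == b"ping"
        except OSError:  # covers timeouts and network errors
            return False
```

On the first node, bind a UDP socket to `("", 8088)` and call `serve_once(sock)`. On the second node, `udp_probe("host1", 8088)` returning `False` suggests that UDP on that port is being filtered somewhere between the two machines.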

                                       

                                      To test server accessibility, the Oracle Coherence library has a built-in testing feature.

                                      1 Unpack the Coherence distribution into some %COHERENCE_HOME% folder.

                                      unzip coherence.zip

                                      2 Prepare the tangosol-coherence-override.xml

                                      <coherence>
                                        <logging-config>
                                          <destination>stdout</destination>
                                          <severity-level system-property="tangosol.coherence.log.level">9</severity-level>
                                        </logging-config>
                                        <cluster-config>
                                          <unicast-listener>
                                            <well-known-addresses>
                                              <socket-address id="1">
                                                <address>host1</address>
                                                <port>8088</port>
                                              </socket-address>
                                              <socket-address id="2">
                                                <address>host2</address>
                                                <port>8088</port>
                                              </socket-address>
                                            </well-known-addresses>
                                          </unicast-listener>
                                          <authorized-hosts>
                                            <host-address>host1</host-address>
                                            <host-address>host2</host-address>
                                          </authorized-hosts>
                                        </cluster-config>
                                      </coherence>
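With more than two servers, writing this file by hand gets repetitive. A minimal Python sketch that generates an override of the same shape from a list of hosts is given here. It assumes the `<host-address>` entries belong inside `<authorized-hosts>`, as the element descriptions later in this post suggest, and the function name and defaults are illustrative only.

```python
import xml.etree.ElementTree as ET


def build_override(hosts, port=8088, log_level=9):
    """Return tangosol-coherence-override.xml content for the given WKA hosts."""
    root = ET.Element("coherence")

    logging = ET.SubElement(root, "logging-config")
    ET.SubElement(logging, "destination").text = "stdout"
    level = ET.SubElement(logging, "severity-level")
    level.set("system-property", "tangosol.coherence.log.level")
    level.text = str(log_level)

    cluster = ET.SubElement(root, "cluster-config")
    listener = ET.SubElement(cluster, "unicast-listener")
    wka = ET.SubElement(listener, "well-known-addresses")
    # Assumption: host-address entries are grouped under authorized-hosts.
    authorized = ET.SubElement(cluster, "authorized-hosts")

    for index, host in enumerate(hosts, start=1):
        entry = ET.SubElement(wka, "socket-address", id=str(index))
        ET.SubElement(entry, "address").text = host
        ET.SubElement(entry, "port").text = str(port)
        ET.SubElement(authorized, "host-address").text = host

    return ET.tostring(root, encoding="unicode")


print(build_override(["host1", "host2"]))
```

The output is a single line of XML. The same file has to be placed on every node, since each node needs the full list of well-known addresses.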

                                       

                                      To diagnose problems you can also use the

                                      <logging-config>
                                        <destination>stdout</destination>
                                        <severity-level system-property="tangosol.coherence.log.level">9</severity-level>
                                      </logging-config>

                                      inside the <coherence> element.

                                      Here 9 is the most verbose logging level. Levels range from -1 (no log messages) to 9 (maximum); the default is 3.

                                       

                                      <well-known-addresses>

                                      By default Coherence uses a multicast protocol to discover other nodes when forming a cluster. If multicast networking is undesirable, or unavailable in your environment, the Well Known Addresses feature may be used to eliminate the need for multicast traffic. When in use the cluster is configured with a relatively small list of nodes which are allowed to start the cluster, and which are likely to remain available over the cluster lifetime. There is no requirement for all WKA nodes to be simultaneously active at any point in time. This list is used by all other nodes to find their way into the cluster without the use of multicast, thus at least one well known node must be running for other nodes to be able to join.

                                       

                                      <socket-address> is a required element. It specifies a list of "well known" addresses (WKA) that are used by the cluster discovery protocol in place of multicast broadcast. If one or more WKA is specified, for a member to join the cluster it will either have to be a WKA or there will have to be at least one WKA member running. Additionally, all cluster communication will be performed using unicast. If empty or unspecified, multicast communications will be used.

                                       

                                       

                                      <authorized-hosts>

                                      If specified, restricts cluster membership to the cluster nodes specified in the collection of unicast addresses, or address range. The unicast address is the address value from the authorized cluster nodes' unicast-listener element. Any number of host-address and host-range elements may be specified.

                                       

                                      <host-address>

                                      Specifies an IP address or hostname. If any are specified, only hosts with specified host-addresses or within the specified host-ranges will be allowed to join the cluster.

                                      The content override attribute id can optionally be used to fully or partially override the contents of this element with an XML document that is external to the base document.

                                       

                                      You can also use host-range instead of host-address.

                                       

                                      <host-range>

                                      Specifies a range of IP addresses. If any are specified, only hosts with specified host-addresses or within the specified host-ranges will be allowed to join the cluster.

                                      The content override attribute id can optionally be used to fully or partially override the contents of this element with an XML document that is external to the base document.

                                       

                                      3 Installing the clustering plugin.

                                      The clustering plugin adds support for running multiple redundant Openfire servers together in a cluster. By running Openfire in a cluster, you can distribute the load amongst a number of servers, as well as having some form of redundancy in the event that one of your servers dies. This plugin requires a valid Oracle Coherence license.

                                      Follow steps 1 through 4 for adding Oracle Coherence libraries to Openfire. Step 5 explains how to add this plugin to your Openfire setup.

                                      1). Get Oracle Coherence for Java.

                                      2). Unzip the coherence file and locate coherence.jar, coherence-work.jar and tangosol.jar in folder coherence/lib.

                                      3). Copy coherence.jar, coherence-work.jar and tangosol.jar to [openfire_home]/lib.

                                      4). Restart Openfire server.

                                      5). Navigate to openfire_url/available-plugins.jsp in a browser.

                                      In the list of plugins find the clustering plugin and install it.

                                       

                                      4 Add additional parameters to the Openfire JVM configuration on host A.

                                      sudo nano /etc/sysconfig/openfire

                                       

                                      -Djava.net.preferIPv4Stack=true

                                      -Dtangosol.coherence.localhost=<localhost> -Dtangosol.coherence.machineid=10

                                      -Dtangosol.coherence.wka2=<localhost> -Dtangosol.coherence.wka1=<hostB>

                                       

                                      Substitute <localhost> with the hostname. The same file with substitutions should be present on the second host (host B).
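
                                      As a rough sketch of that substitution, the helper below prints the flags from this step for one host. The function name coherence_flags and the host names hostA and hostB are made up for illustration. The guide does not say whether machineid should differ on host B, so it is left as a parameter.

```shell
#!/bin/sh
# Hypothetical helper: prints the JVM flags from this step for one host.
#   $1 = this host's name (replaces <localhost>)
#   $2 = the other host's name (replaces <hostB>)
#   $3 = machine id (the guide uses 10 on host A)
coherence_flags() {
  printf '%s\n' \
    "-Djava.net.preferIPv4Stack=true" \
    "-Dtangosol.coherence.localhost=$1" \
    "-Dtangosol.coherence.machineid=$3" \
    "-Dtangosol.coherence.wka2=$1" \
    "-Dtangosol.coherence.wka1=$2"
}

# Flags for host A, with hostA and hostB as placeholder names.
coherence_flags hostA hostB 10
```

                                      Running it again with the two host names swapped gives the flags for the second host.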

                                       

                                      5 Increasing the socket buffer size.

                                      To help minimize packet loss, the OS socket buffers need to be large enough to handle the incoming network traffic while your Java application is paused during garbage collection. By default Coherence will attempt to allocate a socket buffer of 2MB. If your OS is not configured to allow for large buffers, Coherence will use smaller buffers. Most versions of Unix have a very low default buffer limit, which should be increased to at least 2MB.

                                       

                                      In the openfire_home/logs/nohup.out file you will see the following warning if the OS failed to allocate the full size buffer.

                                      UnicastUdpSocket failed to set receive buffer size to 1428 packets (2096304 bytes); actual size is 89 packets (131071 bytes). Consult your OS documentation regarding increasing the maximum socket buffer size. Proceeding with the actual value may cause sub-optimal performance.

                                      Though it is safe to operate with the smaller buffers, it is recommended that you configure your OS to allow for larger buffers.

                                      On Linux execute (as root):

                                      sysctl -w net.core.rmem_max=2096304

                                      sysctl -w net.core.wmem_max=2096304
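
                                      To see whether the change is needed, or whether it took effect, you can compare the current limits with the 2096304 bytes from the warning above. This is only a sketch: the helper name below_required is made up, and it assumes a Linux host that exposes the limits under /proc/sys/net/core.

```shell
#!/bin/sh
# Buffer size Coherence asks for, taken from the warning message above.
REQUIRED=2096304

# Hypothetical helper: succeeds when the given limit is below REQUIRED.
below_required() {
  [ "$1" -lt "$REQUIRED" ]
}

for key in rmem_max wmem_max; do
  file="/proc/sys/net/core/$key"
  # Skip quietly on systems that do not expose this file.
  [ -r "$file" ] || continue
  current=$(cat "$file")
  if below_required "$current"; then
    echo "net.core.$key=$current is below $REQUIRED"
  else
    echo "net.core.$key=$current is large enough"
  fi
done
```

                                      Values set with sysctl -w are lost on reboot. Adding the same two settings to /etc/sysctl.conf keeps them.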

                                      6 Put the WKA file into the openfire_home/lib folder.

                                      Copy the tangosol-coherence-override.xml file prepared in step 2 into the openfire_home/lib folder.

                                       

                                      7 Start openfire server.

                                      Start both openfire servers.

                                      sudo /etc/init.d/openfire start

                                       

                                      Check openfire_home/logs/nohup.out for possible errors.

                                      Navigate to the Server-Clustering menu in the admin panel and check that you see all nodes.

                                       

                                      8 Troubleshooting.

                                       

                                      1) Navigate to the lib folder and run the accessibility test

                                      java -Dtangosol.coherence.override=tangosol-coherence-override.xml -Dtangosol.coherence.ttl=255 -jar coherence.jar

                                      This runs the test embedded in coherence.jar

                                      using the unicast configuration and tangosol-coherence-override.xml as the config file. The test starts in the dev configuration, which is intended for developers who have a couple of cluster nodes on the same machine, so the TTL of frames is set to 0 in this configuration. To override this setting we use the -Dtangosol.coherence.ttl=255 parameter. Actually you can use a TTL value just large enough to reach from one node to another. In our case it is 61.

                                      Pay attention to ActualMemberSet; it shows the cluster members.

                                      Then run the same command on another node and you should see ActualMemberSet Size=2.

                                      If you do not, then the servers have a problem communicating with each other.
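
                                      If you save the test output to a file, the member count can be pulled out with sed. This is only a sketch. It assumes the line looks like ActualMemberSet=MemberSet(Size=2, ...), which may differ between Coherence versions, and the function name member_set_size and the sample line are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads test output on stdin and prints the Size value
# from a line of the assumed form "ActualMemberSet=MemberSet(Size=N, ...".
member_set_size() {
  sed -n 's/.*ActualMemberSet=MemberSet(Size=\([0-9][0-9]*\).*/\1/p'
}

# Sample line in the assumed format, not real captured output.
sample='  ActualMemberSet=MemberSet(Size=2, BitSetCount=2'
size=$(printf '%s\n' "$sample" | member_set_size)

if [ "$size" = "2" ]; then
  echo "both nodes joined the cluster"
else
  echo "unexpected member count: $size"
fi
```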

                                      2) Coherence datagram test

                                       

                                      Included with Coherence is a Datagram Test utility which can be used to test and tune network performance between two or more machines. The Datagram test operates in one of three modes, either as a packet publisher, a packet listener, or both. When run, a publisher will transmit UDP packets to the listener, which will measure the throughput, success rate, and other statistics.

                                       

                                      Syntax

                                      The Datagram test supports a large number of configuration options, though only a few are required for basic operation. To run the Datagram Test utility use the following syntax from the command line:

                                       

                                      java com.tangosol.net.DatagramTest <command value ...> <addr:port ...>

                                       

                                      Run

                                      java -server com.tangosol.net.DatagramTest -local localhost:8088 -packetSize 1468

                                       

                                      on one of the nodes to listen for UDP packets on port 8088

                                       

                                      And on another node

                                       

                                      java -server com.tangosol.net.DatagramTest -local localhost:8088 -packetSize 1468 <node1host>:8088

                                       

                                       

                                      to produce the UDP packets.

                                       

                                      If everything is ok, the producer will send UDP packets and show console output like

                                      oooooooooOoooooooooO

                                       

                                      The series of "o" and "O" tick marks appear as data is (O)utput on the network. Each "o" represents 1000 packets, with "O" indicators at every 10,000 packets.
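
                                      As a quick worked example of that arithmetic, the sketch below counts the tick marks in a line of publisher output. It assumes every tick, lower or upper case, stands for 1000 packets, as the sample pattern suggests. The function name packets_from_ticks is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: estimates packets sent from a string of o/O tick marks,
# assuming each tick (either case) stands for 1000 packets.
packets_from_ticks() {
  ticks=$(printf '%s' "$1" | tr -cd 'oO' | wc -c)
  echo $(( ticks * 1000 ))
}

# The sample line above has 20 ticks, so this prints 20000.
packets_from_ticks 'oooooooooOoooooooooO'
```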

                                       

                                       

                                       

                                      On Node2 you should see a corresponding set of "i" and "I" tick marks, representing network (I)nput. This indicates that the two test instances are communicating.

                                       

                                      If you see only ooooO and no iiiiIii then the servers could not see each other, and you should check for networking issues. For more information refer to

                                      http://coherence.oracle.com/display/COH33UG/Datagram+Test

                                       

                                       

                                       

                                      9 Useful links.

                                       

                                      In case of any problems use the Coherence user guide:

                                      http://coherence.oracle.com/display/COH33UG/Coherence+3.3+Home

                                       

                                      WKA tuning

                                      http://coherence.oracle.com/display/COH33UG/well-known-addresses

                                       

                                      Production check-list

                                      http://coherence.oracle.com/display/COH33UG/Production+Checklist

                                        • Cluster nodes can't see each other

                                          Thanks for this in-depth instruction guide, I think I can use it for my servers. I am running 2 servers with Windows 2008 R2 and an external MySQL DB. I have been trying to get clustering set up between these two servers for a few days. Has anyone been able to get clustering to work on Windows?