The MultiNICA agent represents a set of network  interfaces and provides failover capabilities between them.
  Each interface in a MultiNICA resource has a base IP  address, which can be the same or different. The MultiNICA agent configures one  interface at a time. If it does not detect activity on the configured interface,  it configures a new interface and migrates IP aliases to it.
  It's always a good idea to test the resource after  configuring it and before putting the cluster into a production  environment.
  To illustrate the correct and incorrect way to test a  MultiNICA resource, let's consider the following example with a simple service  group:
  group test_multiNIC (
          SystemList = { node1 = 1, node2  }
          )
          IPMultiNIC IPMNIC (
                  Address =  "192.200.99.5"
                  MultiNICResName =  MNIC
                  )
          MultiNICA MNIC (
                  Device @node1 = { hme0 =  "192.200.100.1", qfe0 = "192.200.100.1" }
                  Device @node2 = { hme0 =  "192.200.100.2", qfe0 = "192.200.100.2" }
                  )
          IPMNIC requires MNIC
          // resource dependency  tree
          //
          //      group  test_multiNIC
          //      {
          //      IPMultiNIC  IPMNIC
          //          {
          //          MultiNICA  MNIC
          //          }
          //      }
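  Before bringing the group online, you can check the syntax of the configuration with the hacf utility (a quick sanity check, assuming the default configuration directory):
  # hacf -verify /etc/VRTSvcs/conf/config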
  Correct way to test the NIC  failover:
  1.  Bring the service group ONLINE:
  # hagrp -online test_multiNIC -sys  node1
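  Optionally, confirm the group state before moving on (hagrp -state is a standard VCS command; the group name is the one defined above):
  # hagrp -state test_multiNIC -sys node1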
  2. Verify that the primary NIC (first NIC in the Device  attribute) is properly set up:
  # ifconfig -a (on node1)
  The output should include two entries for hme0 (the physical base IP address and the virtual IP address), looking like:
  hme0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 12
          inet 192.200.100.1 netmask ffffff00 broadcast 192.200.100.255
          ether 8:0:20:b0:a8:1f
  hme0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 12
          inet 192.200.99.5 netmask ffffff00 broadcast 192.200.99.255
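  Optionally, from another host on the 192.200.99.x network, confirm that the virtual IP address answers (a simple reachability check, assuming such a host exists and ICMP is not filtered):
  # ping 192.200.99.5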
  3. In a shell window, launch a command to monitor the  state of the nodes, service groups and resources:
  # hastatus
  4. In another shell window, launch a command to monitor  the main VCS log file:
  # tail -f  /var/VRTSvcs/log/engine_A.log
  5. Pull the cable from hme0, or power off the hub or switch that this NIC is attached to (as long as the other NIC in the Device attribute is not connected to that same network component).
  The MultiNICA agent will perform the NIC failover (from  hme0 to qfe0) after a 2-3 minute interval. This delay occurs because the  MultiNICA agent tests the failed NIC several times before doing the NIC  failover.
  6. Check the engine_A.log file to see the failover occur. You should see lines like:
  TAG_C 2002/01/31 09:26:27 (node1)  VCS:136502:monitor:MNIC:MultiNICA: Device hme0 FAILED 
  TAG_C 2002/01/31 09:26:27 (node1)  VCS:136503:monitor:MNIC:MultiNICA: Acquired a WRITE Lock
  TAG_C 2002/01/31 09:26:27 (node1)  VCS:136504:monitor:MNIC:MultiNICA: Bringing down IP  addresses
  TAG_C 2002/01/31 09:26:27 (node1)  VCS:136505:monitor:MNIC:MultiNICA: Trying to online Device qfe0  
  TAG_C 2002/01/31 09:26:29 (node1)  VCS:136506:monitor:MNIC:MultiNICA: Sleeping 5 seconds
  TAG_C 2002/01/31 09:26:34 (node1)  VCS:136507:monitor:MNIC:MultiNICA: Pinging Broadcast address 192.200.100.255 on  Device qfe0, iteration 1
  TAG_C 2002/01/31 09:26:34 (node1)  VCS:136514:monitor:MNIC:MultiNICA: Migrated to Device qfe0
  TAG_C 2002/01/31 09:26:34 (node1)  VCS:136515:monitor:MNIC:MultiNICA: Releasing Lock 
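  At this point, running ifconfig -a on node1 again should show the base and virtual IP addresses configured on qfe0 instead of hme0, along the lines of the sketch below (the exact flags, index and interface state will differ on a real system):
  qfe0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 13
          inet 192.200.100.1 netmask ffffff00 broadcast 192.200.100.255
  qfe0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 13
          inet 192.200.99.5 netmask ffffff00 broadcast 192.200.99.255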
  7. In the meantime, verify that the hastatus output has not changed. The test_multiNIC group should still be ONLINE on node1 and no resources should be affected. That is the expected behavior.
  Incorrect way to test the NIC  failover:
  Some people may be tempted to unplumb the NIC from the command line to test the MultiNICA failover.
  If you unplumb a NIC manually (for example: "ifconfig hme0 down unplumb"), VCS notices that the MultiNICA and IPMultiNIC resources went down without VCS itself taking them offline. In other words, the resources go OFFLINE through an action that was not initiated by the agent monitor procedures.
  The engine_A.log file shows:
  TAG_D 2002/01/31 09:32:53 (node1) VCS:13067:Agent is  calling clean for resource(IPMNIC) because the resource became OFFLINE  unexpectedly, on its own.
  TAG_D 2002/01/31 09:32:54 (node1)  VCS:13068:Resource(IPMNIC) - clean completed successfully.
  TAG_E 2002/01/31 09:32:55 VCS:10307:Resource IPMNIC  (Owner: unknown Group: test_multiNIC) is offline on node1
          (Not initiated by VCS.)
  The IPMultiNIC resource then becomes FAULTED on node1 and, as it is a critical resource, the whole service group fails over. That is obviously not the expected behavior.
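  If this happens during a test, you can plumb hme0 back up, clear the fault and switch the group back with the standard VCS commands (a sketch; the resource and group names are the ones from the example above):
  # hares -clear IPMNIC -sys node1
  # hagrp -switch test_multiNIC -to node1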
  A second sample service group in main.cf containing MultiNICA and IPMultiNIC resources, with additional tuning attributes, is shown below:
  group mnic_test (
          SystemList = { csvcs3 = 1, csvcs4 }
          )
          IPMultiNIC mIP (
                  Address = "166.98.21.173"
                  MultiNICResName = mnic
                  )
          MultiNICA mnic (
                  Device @csvcs3 = { qfe2 = "166.98.21.197", qfe3 = "166.98.21.197" }
                  Device @csvcs4 = { qfe2 = "166.98.21.198", qfe3 = "166.98.21.198" }
                  ArpDelay = 5
                  IfconfigTwice = 1
                  PingOptimize = 0
                  HandshakeInterval = 10
                  )
          mIP requires mnic
          // resource dependency tree
          //
          //      group mnic_test
          //      {
          //      IPMultiNIC mIP
          //          {
          //          MultiNICA mnic
          //          }
          //      }
  Plumb qfe2 on each machine with its respective base IP. In the example above, the base IP on csvcs3 is 166.98.21.197, while that on csvcs4 is 166.98.21.198. The virtual IP is 166.98.21.173, as shown in the IPMultiNIC resource. Then create the mnic_test group as shown above.
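  As a minimal sketch of these steps on csvcs3 (assuming Solaris ifconfig syntax and a 255.255.255.0 netmask, which is not stated in the configuration above; repeat with 166.98.21.198 on csvcs4):
  # ifconfig qfe2 plumb
  # ifconfig qfe2 166.98.21.197 netmask 255.255.255.0 broadcast + up
  # hagrp -online mnic_test -sys csvcs3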
  In the sample configuration above, the following additional attributes are set on the MultiNICA resource for a very active network. ArpDelay is set to 5 seconds to induce a 5-second sleep between configuring an interface and sending out a broadcast to inform routers about the base IP address; the default is 1 second. IfconfigTwice is set to cause the IP address to be plumbed twice, using an ifconfig up-down-up sequence; this increases the probability of the gratuitous ARPs (local broadcast) reaching the clients. The default is 0 (not set).
  The following MultiNICA attributes can be set to decrease the agent's detection and failover time:
  PingOptimize is set to 0 to perform a broadcast ping in each monitor cycle and detect an inactive interface within that cycle. The default value of 1 requires 2 monitor cycles.
  HandshakeInterval is set to its lowest value of 10, down from the default of 90. When failing over to a new NIC, this makes the agent attempt 1 ping (as opposed to 9 with the default), either to a host from the NetworkHosts attribute or to the default broadcast address, depending on which attribute is configured.
  Note that setting PingOptimize and HandshakeInterval to the above values improves the response time, but it also increases the chance of spurious failovers. Essentially, it is a tradeoff between performance and reliability.
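  If the group is already running, these attributes can also be changed on a live cluster with the standard VCS commands (a sketch; the values simply match the configuration above):
  # haconf -makerw
  # hares -modify mnic PingOptimize 0
  # hares -modify mnic HandshakeInterval 10
  # haconf -dump -makero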
  To test the configuration, pull the cable from qfe2 on csvcs3. The agent fails over to qfe3, along with the virtual IP, on the first node. Then pull the cable from qfe3. After a 2-3 minute interval, the mIP resource on csvcs3 becomes faulted and the whole mnic_test group goes online on csvcs4. This delay occurs because the MultiNICA agent tests the NIC several times before marking the resource offline.
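  While running this test, the state of the mnic_test group and of the mIP resource on both systems can be followed with the standard VCS status commands:
  # hastatus -sum
  # hares -state mIP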
 
