Standardize IPMP Network Configuration


One of the biggest issues in maintaining a huge production plant is that you need a number of administrators to manage it, and each administrator comes on board with his own naming conventions, tastes and past experience. That is not good for the overall maintenance of the plant. We saw this most clearly with the IPMP configuration on our boxes. All of the configurations worked, but one administrator would name the IPs IP-1, IP-2, IP-3 and another IP-A, IP-B, IP-C.
During outages it became a daunting task to figure this out, and we lost precious time. So we decided that the problem needed to be fixed at the root.

  1. We came up with a clean naming convention.
  2. Next, we created a script that sources a file called net.txt (generated for our plant by the network administrators) to set up IPMP on individual boxes.
  3. We ensured that all administrators use only this script to configure networks on a system, and that they provide the necessary artifacts.
Here are the script ipmp.sh and a sample net.txt:
$ cat ipmp.sh
#!/usr/bin/ksh
# IPMP Auto Configuration
# net.txt is Generated by Network Engineers
usage() {
echo "usage: $0 [-h] [-n|-nocheck] [-f|-force] [-clean_net [hostname]] [-check_net]"
exit 1
}
cont() {
[ $MODE = force ] && return
echo "Continue? yes/[no] \c"
read answer
[ "$answer" != yes ] && exit 0
}
netmasks() {
cat <<EOF
  # table of netmasks
  # 32       255.255.255.255         ffffffff
  # 31       255.255.255.254         fffffffe
  # 30       255.255.255.252         fffffffc
  # 29       255.255.255.248         fffffff8
  # 28       255.255.255.240         fffffff0
  # 27       255.255.255.224         ffffffe0
  # 26       255.255.255.192         ffffffc0
  # 25       255.255.255.128         ffffff80
  # 24       255.255.255.0           ffffff00     "CLASS C"
  # 23       255.255.254.0           fffffe00
  # 22       255.255.252.0           fffffc00
  # 21       255.255.248.0           fffff800
  # 20       255.255.240.0           fffff000
  # 19       255.255.224.0           ffffe000
  # 18       255.255.192.0           ffffc000
  # 17       255.255.128.0           ffff8000
  # 16       255.255.0.0             ffff0000     "CLASS B"
EOF
}
clean_net() {
[ -z $1 ] && myhost=`uname -n` || myhost=$1
grep "^$myhost" $NET |
   sed -e 's#non_tcp/ip#n/a#g' |
   sed -e 's#N/A#n/a#g' |
   grep -v NET_MGT \
  >$NET.local.$myhost
for i in `cut -f3 $NET.local.$myhost`; do
grep "^$i " $NET >>$NET.local.$myhost
done
NET=$NET.local.$myhost
}
check_net_txt() {
# a few basic checks of the net.txt file
# check that net.txt exists
if [ ! -a $NET ]; then
   echo "ERROR: can't find $NET"
   exit 1
fi
# check net.txt for valid data
nettxterror=0
cat $NET |
while read line
do
   cols=`echo $line |wc -w`
   if [ $cols -ne 14 -a $cols -ne 4 ]; then
      echo "WARNING: the following line of $NET has `echo $line |wc -w` columns:"
      echo "  $line"
      nettxterror=1
   elif [ $cols -eq 14 ]; then
      echo $line |read ibmhostdiscard telhostdiscard vlanid vip if1name if1addr if1mode if1port if2name if2addr if2mode if2port primaryipdiscard defroutdiscard
      syntaxerror=0
      case $vip-$if1addr-$if2addr in
      # error codes:
      # 1 - interface wrong
      # 21 - mode wrong (VCS)
      # 22 - mode wrong (no_IPMP)
      # 23 - mode wrong (IPMP)
      # 3 - IP address wrong
      # 4 - switch port wrong
      # 10 - can't interpret, VCS, IPMP or non-IPMP.
         n/a-n/a-n/a )
            # VCS vlan
            echo $if1name |egrep "^(ce|e1000g|nxge)[0-9][0-9]*$" >/dev/null || syntaxerror=1
            [ "$if1mode" = n/a -o "$if1mode" = active ] || syntaxerror=21
            echo $if1port |grep "^CPOD[0-9][0-9]*_Access_*[0-9][0-9]*,_[0-9][0-9]*/[0-9][0-9]*$" >/dev/null || syntaxerror=4
            [ "$if2name" = n/a ] || syntaxerror=10
            [ "$if2addr" = n/a ] || syntaxerror=10
            [ "$if2mode" = n/a ] || syntaxerror=10
            [ "$if2port" = n/a ] || syntaxerror=10
            ;;
         n/a-* )
            # no-ipmp group
            echo $if1name |egrep "^(ce|e1000g|nxge)[0-9][0-9]*$" >/dev/null || syntaxerror=1
            echo $if1addr |grep "^[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*$" >/dev/null || syntaxerror=3
            [ "$if1mode" = active -o "$if1mode" = passive ] || syntaxerror=22
            echo $if1port |grep "^CPOD[0-9][0-9]*_Access_*[0-9][0-9]*,_[0-9][0-9]*/[0-9][0-9]*$" >/dev/null || syntaxerror=4
            [ "$if2name" = n/a ] || syntaxerror=10
            [ "$if2addr" = n/a ] || syntaxerror=10
            [ "$if2mode" = n/a ] || syntaxerror=10
            [ "$if2port" = n/a ] || syntaxerror=10
            ;;
         * )
            # ipmp group
            echo $if1name |egrep "^(ce|e1000g|nxge)[0-9][0-9]*$" >/dev/null || syntaxerror=1
            echo $if1addr |grep "^[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*$" >/dev/null || syntaxerror=3
            [ "$if1mode" = active -o "$if1mode" = passive ] || syntaxerror=23
            echo $if1port |grep "^CPOD[0-9][0-9]*_Access_*[0-9][0-9]*,_[0-9][0-9]*/[0-9][0-9]*$" >/dev/null || syntaxerror=4
            echo $if2name |egrep "^(ce|e1000g|nxge)[0-9][0-9]*$" >/dev/null || syntaxerror=1
            echo $if2addr |grep "^[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*$" >/dev/null || syntaxerror=3
            [ "$if2mode" = active -o "$if2mode" = passive ] || syntaxerror=23
            echo $if2port |grep "^CPOD[0-9][0-9]*_Access_*[0-9][0-9]*,_[0-9][0-9]*/[0-9][0-9]*$" >/dev/null || syntaxerror=4
            [ "${if1mode}/${if2mode}" = "active/passive" -o "${if1mode}/${if2mode}" = "passive/active" ] || syntaxerror=23
            ;;
      esac
      if [ $syntaxerror -ne 0 ]; then
         echo "WARNING: the following line of net.txt contains syntax errors:"
         case $syntaxerror in
            1 ) message="cannot decode an interface name" ;;
            21 ) message="cannot interpret active/passive mode of an interface (VCS group)" ;;
            22 ) message="cannot interpret active/passive mode of an interface (no-IPMP group)" ;;
            23 ) message="cannot interpret active/passive mode of an interface (IPMP group)" ;;
            3 ) message="cannot interpret an IP number" ;;
            4 ) message="cannot interpret a switch port" ;;
            10 ) message="does not fit into category of VCS, IPMP or non-IPMP group" ;;
         esac
         echo "  $line <-- $message"
         nettxterror=1
      fi
   fi
done
# check that all the VLAN lines are present
for i in $(grep "^`uname -n`" $NET |cut -f3 |sort -u); do
   if ! grep "^$i " $NET >/dev/null; then
      echo "WARNING: cannot find a VLAN line for VLAN $i"
      nettxterror=1
   fi
done
if [ $nettxterror -eq 1 ]; then
   echo "ERROR: the $NET file is corrupt"
   exit 1
fi
# extract expected UMI hostname, Telstra hostname, default gateway & primary IP address
if grep ^$IBMHOST $NET >/dev/null 2>&1; then
   set -- `grep ^$IBMHOST $NET |head -1`
   ibmhost=$1
   telhost=$2
   telhost=`echo $telhost |tr "[:upper:]" "[:lower:]"`
   primaryip=${13}
   defrout=${14}
else
   echo "WARNING: cannot find local hostname in $NET"
   exit 1
fi
# check that a primary IP address & default gateway was extracted
while ! echo $primaryip |egrep "^[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*$" >/dev/null; do
   echo "WARNING: the primary IP address \"$primaryip\" is invalid"
   echo "Please enter a valid primary IP address for this host: \c"
   read primaryip
done
while ! echo $defrout |egrep "^[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*$" >/dev/null; do
   echo "WARNING: the default router address \"$defrout\" is invalid"
   echo "Please enter a valid default router address for this host: \c"
   read defrout
done
}
check_input_data() {
kstat -p 2>/dev/null >$KSTAT
# present the data & summarise proposed changes
clear
 (
echo "Please check the following input data against the latest release of the buildsheet:\n"
count=1
hostinfoprinted=0
grep ^$IBMHOST $NET |
while read \
   ibmhostdiscard \
   telhostdiscard \
   vlanid \
   vip \
   if1name \
   if1addr \
   if1mode \
   if1port \
   if2name \
   if2addr \
   if2mode \
   if2port \
   primaryipdiscard \
   defroutdiscard
do
   if1port=`echo $if1port |sed -e 's/_/ /g'`
   if2port=`echo $if2port |sed -e 's/_/ /g'`
   set -- `grep "^$vlanid " $NET` # this line contains a tab
   ntwid=$2
   ntmsk=$3
   ntmsk=`netmasks |grep $ntmsk |awk '{print $3}'`
   [ -z $ntmsk ] && ntmsk=n/a
   descr=$4
   descr=`echo $descr |sed -e 's/_/ /g'`
   if [ $hostinfoprinted -eq 0 ]; then
# these global vars are set in the main section
cat <<EOF
UMI Hostname = $ibmhost
Telstra Hostname = $telhost
Default Router = $defrout
Primary IP Address = $primaryip
EOF
      hostinfoprinted=1
   fi
cat <<EOF |grep -v n/a
$count.
VIP address = $vip
Interface 1 name = $if1name
Interface 1 address = $if1addr
Interface 1 mode = $if1mode
Interface 1 port = $if1port
Interface 2 name = $if2name
Interface 2 address = $if2addr
Interface 2 mode = $if2mode
Interface 2 port = $if2port
Network information for this interface or IPMP group:
   VLAN ID = $vlanid
   Network Address = $ntwid
   Netmask = $ntmsk
   Network Description = $descr
EOF
   (( count = count + 1 ))
done
echo "\nPress 'q' to continue."
 ) |less
cont
}
check_link_speeds() {
# check the link speeds
echo "\nChecking link speeds and duplex settings:\n"
grep ^$IBMHOST $NET |
while read \
   ibmhostdiscard \
   telhostdiscard \
   vlanid \
   vip \
   if1name \
   if1addr \
   if1mode \
   if1port \
   if2name \
   if2addr \
   if2mode \
   if2port \
   primaryipdiscard \
   defroutdiscard
do
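   # driver name without the trailing instance number, e.g. e1000g1 -> e1000g
   if1type=`echo $if1name |sed -e 's/[0-9][0-9]*$//'`
   # whole trailing instance number (a bare ".*" is greedy and would keep only the last digit)
   if1num=`echo $if1name |sed -e 's/^.*[^0-9]\([0-9][0-9]*\)$/\1/'`
   if2type=`echo $if2name |sed -e 's/[0-9][0-9]*$//'`
   if2num=`echo $if2name |sed -e 's/^.*[^0-9]\([0-9][0-9]*\)$/\1/'`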
   if [ "$vip" = n/a ]; then
      echo "Checking $if1name (VLAN $vlanid)"
      egrep "^${if1type}.*link_(speed|duplex)" $KSTAT |grep "$if1num:"
   else
      echo "Checking $if1name (VLAN $vlanid)"
      egrep "^${if1type}.*link_(speed|duplex)" $KSTAT |grep "$if1num:"
      echo "Checking $if2name (VLAN $vlanid)"
      egrep "^${if2type}.*link_(speed|duplex)" $KSTAT |grep "$if2num:"
   fi
done
cont
}
unplumb_all_interfaces() {
ifs=`cat $KSTAT |egrep "^(ce|e1000g|nxge):" |awk -F: '{print $1$2}' |uniq`
if [ $MODE != force ]; then
   echo "\nAbout to unplumb the following interfaces:"
   echo "$ifs"
   echo "\nWARNING: this is your LAST chance to abort without making changes!"
   cont
fi
for i in $ifs; do
if ! grep "mtu  *9000" /etc/hostname.$i >/dev/null 2>&1; then
   [ -a /etc/hostname.$i ] && cp -ip /etc/hostname.$i /var/tmp/hostname.$i.$DATE
   [ -a /etc/hostname.$i ] && rm /etc/hostname.$i
   ifconfig $i unplumb 2>/dev/null
else
   echo "Skipping EBR interface /etc/hostname.$i..."
fi
done
}
configure_interfaces() {
echo "\nConfiguring ethernet interfaces ..."
grep ^$IBMHOST $NET |
while read \
   ibmhostdiscard \
   telhostdiscard \
   vlanid \
   vip \
   if1name \
   if1addr \
   if1mode \
   if1port \
   if2name \
   if2addr \
   if2mode \
   if2port \
   primaryipdiscard \
   defroutdiscard
do
   shost=`echo $ibmhost |sed 's/..\(........\)...../\1/'`
   group=${shost}-${vlanid}
   if1port=`echo $if1port |sed -e 's/_/ /g'`
   if2port=`echo $if2port |sed -e 's/_/ /g'`
   set -- `grep "^$vlanid " $NET` # this line contains a tab
   ntwid=$2
   [ $ntwid = non_tcp/ip ] && ntwid=n/a
   [ $ntwid = N/A ] && ntwid=n/a
   ntmsk=$3
   ntmsk=`netmasks |grep $ntmsk |awk '{print $2}'`
   descr=$4
   descr=`echo $descr |sed -e 's/_/ /g'`
   case $vip-$if1addr-$if2addr in
      n/a-n/a-n/a )
         echo "Skipping VCS interface $if1name ..."
         ;;
      n/a-* )
         echo "Configuring $if1name with $if1addr/$ntmsk ..."
         no_ipmp $if1name $if1addr $ntmsk
         ;;
      * )
         echo "Configuring $if1name with $if1addr/$ntmsk [$if1mode] and $if2name with $if2addr/$ntmsk [$if2mode] and assigning VIP address $vip/$ntmsk [group name is $group] ..."
         ipmp $if1name $if1addr $if1mode $if2name $if2addr $if2mode $vip $group $ntmsk
         ;;
   esac
done
}
no_ipmp() {
if=$1
ip=$2
ntmsk=$3
   if ! grep "mtu  *9000" /etc/hostname.$if >/dev/null 2>&1; then
cat <<EOF >/etc/hostname.$if
$ip/$ntmsk broadcast + mtu 1500 up
EOF
ifconfig $if plumb
ifconfig $if $ip/$ntmsk broadcast + mtu 1500 up
   else
echo "WARNING: interface /etc/hostname.$if appears to be configured for EBR."
echo "Skipping interface /etc/hostname.$if..."
   fi
}
ipmp() {
if1name=$1
if1addr=$2
if1mode=$3
if2name=$4
if2addr=$5
if2mode=$6
vipip=$7
group=$8
ntmsk=$9
   if ! egrep "mtu  *9000" /etc/hostname.$if1name >/dev/null 2>&1 && \
      ! egrep "mtu  *9000" /etc/hostname.$if2name >/dev/null 2>&1; then
   case $if1mode in
   active )
cat <<EOF >/etc/hostname.$if1name
$if1addr/$ntmsk broadcast + group $group deprecated -failover mtu 1500 up
addif $vipip/$ntmsk broadcast + failover mtu 1500 up
EOF
cat <<EOF >/etc/hostname.$if2name
$if2addr/$ntmsk broadcast + group $group deprecated -failover mtu 1500 up
EOF
ifconfig $if1name plumb
ifconfig $if1name $if1addr/$ntmsk broadcast + group $group deprecated -failover mtu 1500 up addif $vipip/$ntmsk broadcast + failover mtu 1500 up
ifconfig $if2name plumb
ifconfig $if2name $if2addr/$ntmsk broadcast + group $group deprecated -failover mtu 1500 up
   ;;
   passive )
cat <<EOF >/etc/hostname.$if2name
$if2addr/$ntmsk broadcast + group $group deprecated -failover mtu 1500 up
addif $vipip/$ntmsk broadcast + failover mtu 1500 up
EOF
cat <<EOF >/etc/hostname.$if1name
$if1addr/$ntmsk broadcast + group $group deprecated -failover mtu 1500 up
EOF
ifconfig $if2name plumb
ifconfig $if2name $if2addr/$ntmsk broadcast + group $group deprecated -failover mtu 1500 up addif $vipip/$ntmsk broadcast + failover mtu 1500 up
ifconfig $if1name plumb
ifconfig $if1name $if1addr/$ntmsk broadcast + group $group deprecated -failover mtu 1500 up
   ;;
   esac
   else
echo "WARNING: one of /etc/hostname.$if1name or /etc/hostname.$if2name appears to be configured for EBR."
echo "Skipping IPMP group /etc/hostname.$if1name and /etc/hostname.$if2name..."
   fi
}
hup_mpath() {
pid=`ps -ef |grep /usr/lib/inet/in.mpathd |grep -v grep |awk '{print $2}'`
if [ -n "$pid" ]; then
   echo "Sending HUP signal to /usr/lib/inet/in.mpathd ..."
   kill -HUP $pid
else
   echo "WARNING: in.mpathd does NOT appear to be running, skipping ..."
fi
}
recreate_hosts_file() {
# build the hosts file
echo "Reconfiguring /etc/inet/hosts ..."
cp -p /etc/inet/hosts /var/tmp/hosts.$DATE
header=0
grep ^$IBMHOST $NET |
while read \
   ibmhostdiscard \
   telhostdiscard \
   vlanid \
   vip \
   if1name \
   if1addr \
   if1mode \
   if1port \
   if2name \
   if2addr \
   if2mode \
   if2port \
   primaryipdiscard \
   defroutdiscard
do
   if1port=`echo $if1port |sed -e 's/_/ /g'`
   if2port=`echo $if2port |sed -e 's/_/ /g'`
   set -- `grep "^$vlanid " $NET` # this line contains a tab
   ntwid=$2
   [ $ntwid = non_tcp/ip ] && ntwid=n/a
   [ $ntwid = N/A ] && ntwid=n/a
   ntmsk=$3
   ntmsk=`netmasks $ntmsk`
   descr=$4
   descr=`echo $descr |sed -e 's/_/ /g'`
   # abbreviated IBM hostname
   shost=`echo $ibmhost |sed 's/..\(........\)...../\1/'`
   if [ $header -eq 0 ]; then
cat <<EOF >/etc/inet/hosts
#
# Internet host table
#
127.0.0.1$(echo \\t)localhost
$primaryip$(echo \\t)$ibmhost ${ibmhost}.in.telstra.com.au $telhost $shost loghost
EOF
      header=1
   fi
   case $vip-$if1addr-$if2addr in
      n/a-n/a-n/a )
         echo "# $if1name is connected to $if1port on VLAN $vlanid ($descr)" >>/var/tmp/hosts.tmp
         ;;
      n/a-* )
         echo "$if1addr\\t$shost-$vlanid"                                    >>/etc/inet/hosts
         echo "# $if1name is connected to $if1port on VLAN $vlanid ($descr)" >>/var/tmp/hosts.tmp
         ;;
      * )
         echo "$vip\\t$shost-$vlanid"                                        >>/etc/inet/hosts
         echo "# $if1name is connected to $if1port on VLAN $vlanid ($descr)" >>/var/tmp/hosts.tmp
         echo "# $if2name is connected to $if2port on VLAN $vlanid ($descr)" >>/var/tmp/hosts.tmp
         ;;
   esac
done
cat /var/tmp/hosts.tmp >>/etc/inet/hosts
rm /var/tmp/hosts.tmp
cat <<EOF >>/etc/inet/hosts
# administration servers
192.74.189.172$(echo \\t)nus808.in.telstra.com.au nus808
146.132.8.23$(echo \\t)nus721.in.telstra.com.au nus721
172.15.12.5$(echo \\t)nus022.telecom.com.au nus022
EOF
}
recreate_netmasks_file() {
echo "Reconfiguring /etc/inet/netmasks ..."
cp -ip /etc/inet/netmasks /var/tmp/netmasks.$DATE
# configure netmasks file
cat <<EOF >/etc/inet/netmasks
#
# The netmasks file associates Internet Protocol (IP) address
# masks with IP network numbers.
#
#       network-number  netmask
#
# The term network-number refers to a number obtained from the Internet Network
# Information Center.
#
# Both the network-number and the netmasks are specified in
# "decimal dot" notation, e.g:
#
#               128.32.0.0 255.255.255.0
#
EOF
}
configure_defaultrouter() {
if [ -z "$defrout" ]; then
   echo "WARNING: no default route specified in input data, skipping ..."
   return
fi
echo "Reconfiguring /etc/defaultrouter ..."
if netstat -rn |grep ^default >/dev/null; then
   defaultroutes=`netstat -rn |awk '$1 == "default" {print $2}'`
   for gateway in $defaultroutes; do
      echo "Deleting default route $gateway ..."
      route delete default -gateway $gateway >/dev/null 2>&1
   done
fi
cp -ip /etc/defaultrouter /var/tmp/defaultrouter.$DATE
echo $defrout >/etc/defaultrouter
echo "Adding $defrout as the default route ..."
route add default -gateway $defrout >/dev/null 2>&1
}
check_nodename() {
echo "Checking /etc/nodename ..."
if [ "`cat /etc/nodename`" != "$ibmhost" ]; then
echo "/etc/nodename not correct, set this manually ..."
fi
}
update_permissions() {
echo "Resetting permissions on"
for i in /etc/inet/hosts /etc/inet/netmasks /etc/defaultrouter /etc/hostname.*
do
echo "  $i ..."
chmod 444 $i
chown root:root $i
done
}
###
###
### MAIN
###
###
MODE=normal
[ "$1" = -h ] && usage
[ "$1" = -n -o "$1" = -nocheck ] && MODE=nocheck
[ "$1" = -f -o "$1" = -force ] && MODE=force
[ "$1" = -clean_net ] && MODE=clean_net
[ "$1" = -check_net ] && MODE=check_net
[ ! -a ./net.txt ] && {
   echo "ERROR: can't find 'net.txt'. Please save the file 'net.txt'"
   echo "   in the current directory and re-run $0."
   echo "Goodbye."
   exit 1
}
IBMHOST=`uname -n`
NET=net.txt
KSTAT=/tmp/kstat.txt
DATE=`date +%Y%m%d%H%M%S`
case $MODE in
clean_net )
clean_net $2
# clean_net sets NET to the file it wrote, which is correct even when no hostname was given
echo "Saved $NET ..."
echo "Goodbye."
exit 0
;;
check_net )
check_net_txt
exit 0
;;
* )
clean_net
check_net_txt
[ $MODE = normal ] && check_input_data
[ $MODE = normal ] && check_link_speeds
unplumb_all_interfaces
configure_interfaces
hup_mpath
configure_defaultrouter
recreate_hosts_file
recreate_netmasks_file
check_nodename
update_permissions
echo "\nNOTE: Original files were saved as /var/tmp/*.$DATE files."
echo "All done.\n"
;;
esac
# end of script

$ cat net.txt
unix-server-1 unix-server-1 2300_9/10 n/a e1000g1 130.103.248.110 active CPOD1_Access9,_6/15 n/a n/a n/a n/a n/a -
unix-server-1 unix-server-1 3994_7/8 n/a NET_MGT 10.0.3.177 active CPOD1_Access7,_12/11 n/a n/a n/a n/a n/a -
unix-server-2 unix-server-2 2300_9/10 n/a e1000g1 130.103.248.111 active CPOD1_Access9,_6/16 n/a n/a n/a n/a n/a -
unix-server-2 unix-server-2 3994_7/8 n/a NET_MGT 10.0.3.178 active CPOD1_Access7,_12/12 n/a n/a n/a n/a n/a -
...
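
The 14-column host lines above fall into three categories, which ipmp.sh tells apart by looking at the VIP (column 4) and the two interface addresses (columns 6 and 10). The following is only a sketch of that classification, pulled out into a function of its own so it can be tried on a single line. The name classify_net_line is mine and does not exist in the script.

```shell
# Classify one 14-column host line from net.txt the same way the case
# statements in ipmp.sh do. classify_net_line is a hypothetical helper.
classify_net_line() {
    # Split the line on whitespace into positional parameters.
    set -- $1
    if [ $# -ne 14 ]; then
        echo "invalid"
        return 1
    fi
    vip=$4
    if1addr=$6
    if2addr=${10}
    case "$vip-$if1addr-$if2addr" in
        n/a-n/a-n/a) echo "vcs" ;;      # no addresses at all: VCS interface
        n/a-*)       echo "no-ipmp" ;;  # no VIP: single interface, no IPMP group
        *)           echo "ipmp" ;;     # VIP plus two interfaces: IPMP group
    esac
}

# First sample line from net.txt above; prints "no-ipmp".
classify_net_line "unix-server-1 unix-server-1 2300_9/10 n/a e1000g1 130.103.248.110 active CPOD1_Access9,_6/15 n/a n/a n/a n/a n/a -"
```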

I am sure you can set up something similar for your plant.
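
One thing you may want to change when adapting the script: the netmasks() lookup table in ipmp.sh only covers /16 through /32. As a possible alternative, the dotted netmask can be computed from the prefix length with plain shell arithmetic. This is a sketch, not part of our script, and the function name prefix_to_netmask is my own.

```shell
# Convert a CIDR prefix length (0-32) to a dotted-decimal netmask.
# prefix_to_netmask is a hypothetical helper, not a function of ipmp.sh.
prefix_to_netmask() {
    prefix=$1
    mask=""
    for octet in 1 2 3 4; do
        # Work out how many of the remaining prefix bits fall in this octet.
        if [ "$prefix" -ge 8 ]; then
            bits=8
        elif [ "$prefix" -gt 0 ]; then
            bits=$prefix
        else
            bits=0
        fi
        prefix=$((prefix - bits))
        # An octet with N leading one-bits has the value 256 - 2^(8-N).
        value=$((256 - (1 << (8 - bits))))
        mask="${mask}${mask:+.}${value}"
    done
    echo "$mask"
}

prefix_to_netmask 22   # prints 255.255.252.0, matching the table in ipmp.sh
```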

How To Find port_wwn And node_wwn Of A Storage Device


Once a storage device is provisioned to a UNIX box, the storage engineer typically asks for the port_wwn and node_wwn numbers of the HBA so that the storage can be sliced and diced correctly for your server's needs. The problem is that most UNIX engineers expect to see the device first and then figure out its details. However, the device is not visible at run level 3 until the storage engineers have configured it correctly.
But there is something a UNIX engineer can do to find the port_wwn and node_wwn numbers. Once the HBA is connected, these identification numbers can be read at the OpenBoot ok prompt, before the operating system is up.

{0} ok probe-scsi-all
/pci@0/pci@0/pci@9/SUNW,emlxs@0
Cannot Init Link.
/pci@0/pci@0/pci@8/pci@0/pci@8/SUNW,emlxs@0
Cannot Init Link.
/pci@0/pci@0/pci@2/scsi@0
MPT Version 1.05, Firmware Version 1.27.00.00
Target 0
Unit 0   Disk     SEAGATE ST914603SSUN146G0768    286739329 Blocks, 146 GB
  SASAddress 5000c5000aca0491  PhyNum 0
Target 1
Unit 0   Disk     SEAGATE ST914603SSUN146G0768    286739329 Blocks, 146 GB
  SASAddress 5000c5000aca06b1  PhyNum 1
/pci@0/pci@0/pci@1/pci@0/pci@1/pci@0/usb@0,2/storage@2
  Unit 0   Removable Read Only device    TSSTcorpCD/DVDW TS-T632ASR03
{0} ok cd /pci@0/pci@0/pci@9/SUNW,emlxs@0
{0} ok pwd
/pci@0/pci@0/pci@9/SUNW,emlxs@0
{0} ok .properties
assigned-addresses       82120010 00000000 0b000000 00000000 00002000
                         82120018 00000000 0b002000 00000000 00002000
                         81120020 00000000 00004000 00000000 00000100
                         82120030 00000000 0b040000 00000000 00040000
port_wwn                 10 00 00 00 c9 96 ad 38
node_wwn                 20 00 00 00 c9 96 ad 38
alternate-reg            01120020 00000000 00000000 00000000 00000100
reg                      00120000 00000000 00000000 00000000 00000000
                         03120010 00000000 00000000 00000000 00001000
                         03120018 00000000 00000000 00000000 00000100
                         02120030 00000000 00000000 00000000 00020000
compatible               pci10df,fc20
clock-frequency          02625a00
#size-cells              00000000
#address-cells           00000002
copyright                Copyright (c) 2000-2006 Emulex
model                    LPe11000-S
name                     SUNW,emlxs
device_type              scsi-fcp
manufacturer             Emulex
fcode-version            1.50a9
fcode-rom-offset         0000c000
interrupts               00000001
cache-line-size          00000010
class-code               000c0400
subsystem-id             0000fc21
subsystem-vendor-id      000010df
revision-id              00000002
device-id                0000fc20
vendor-id                000010df
{0} ok
And there you go. It's right in front of your eyes. Pass the numbers on to the storage engineers and make their life a little easier.
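
Storage teams often want the WWN as colon-separated bytes rather than the space-separated form that .properties prints. A small sketch for the conversion follows; format_wwn is a name I made up, and the colon-separated layout is an assumption about what your storage engineers expect.

```shell
# Turn the space-separated bytes printed by .properties into a
# lower-case, colon-separated WWN. format_wwn is a hypothetical helper.
format_wwn() {
    echo "$*" | tr '[A-F]' '[a-f]' | tr -s ' ' ':'
}

# port_wwn from the output above; prints 10:00:00:00:c9:96:ad:38
format_wwn 10 00 00 00 c9 96 ad 38
```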

IO Wait Issues With Oracle Database


The following log illustrates a typical hung server or hung database state, when multiple reads and writes are in progress on the database. A simple command, lsof (list open files), can help you figure it out. You will observe that the same process ID is reading from or writing to several huge files while the system overall runs at a snail's pace.
unix_server1# lsof /ora/data
COMMAND   PID   USER   FD   TYPE    DEVICE   SIZE/OFF  NODE NAME
oracle    701 oracle  256u  VREG 273,58002    3727360 12302 /ora/data/RPRO1P/RPRO1Pcntl01.dbf
oracle    701 oracle  257u  VREG 273,58002    3727360 12303 /ora/data/RPRO1P/RPRO1Pcntl02.dbf
oracle    701 oracle  258u  VREG 273,58002    3727360 12304 /ora/data/RPRO1P/RPRO1Pcntl03.dbf
oracle    701 oracle  259u  VREG 273,58002  262152192 12308 /ora/data/RPRO1P/RPRO1Psystem01.dbf
oracle    701 oracle  260u  VREG 273,58002  157294592 12309 /ora/data/RPRO1P/RPRO1Ptools01.dbf
oracle    701 oracle  261u  VREG 273,58002 1048584192 12307 /ora/data/RPRO1P/RPRO1Preqpro_tbl01.dbf
oracle    701 oracle  262u  VREG 273,58002 1048584192 12306 /ora/data (/dev/vx/dsk/app-dg/vol3)
oracle    701 oracle  263u  VREG 273,58002   52436992 12305 /ora/data/RPRO1P/RPRO1Pgenuser.dbf
oracle    795 oracle  256u  VREG 273,58002  524296192 12297 /ora/data/CQST1P/CQST1Psystem01.dbf
oracle    795 oracle  257u  VREG 273,58002  209723392 12298 /ora/data/CQST1P/CQST1Ptools01.dbf
oracle    795 oracle  258u  VREG 273,58002   52436992 12293 /ora/data/CQST1P/CQST1Pcqmaster_tbl01.dbf
oracle    795 oracle  259u  VREG 273,58002   52436992 12292 /ora/data/CQST1P/CQST1Pcqmaster_idx01.dbf
oracle    795 oracle  260u  VREG 273,58002   52436992 12296 /ora/data/CQST1P/CQST1Pgenuser.dbf
oracle    795 oracle  261u  VREG 273,58002    1662976 12289 /ora/data/CQST1P/CQST1Pcntl01.dbf
oracle    795 oracle  262u  VREG 273,58002    1662976 12290 /ora/data/CQST1P/CQST1Pcntl02.dbf
oracle    795 oracle  263u  VREG 273,58002    1662976 12291 /ora/data/CQST1P/CQST1Pcntl03.dbf
oracle   1971 oracle  256u  VREG 273,58002  524296192  6260 /ora/data/CADT1D/CADT1Dsystem01.dbf
oracle   1971 oracle  257u  VREG 273,58002  209723392  6263 /ora/data/CADT1D/CADT1Dtools01.dbf
oracle   1971 oracle  258u  VREG 273,58002  524296192  6261 /ora/data/CADT1D/CADT1Dcqmaster_tbl01.dbf
Now you know when it's time to call your DBA.
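
With a long listing it helps to total things up per process before making that call. The sketch below assumes the column layout shown above (PID in column 2, SIZE/OFF in column 7) and that SIZE/OFF holds a size rather than an offset; summarise_lsof is my own name for it.

```shell
# Summarise lsof output per PID: number of open files and total bytes.
# Reads lsof output on stdin and skips the header line.
# Assumes PID is column 2 and SIZE/OFF (column 7) is a plain byte count.
summarise_lsof() {
    awk 'NR > 1 { files[$2]++; bytes[$2] += $7 }
         END { for (pid in files) printf "%s %d %.0f\n", pid, files[pid], bytes[pid] }' |
    sort -n
}
```

Used as lsof /ora/data | summarise_lsof, it prints one line per process: the PID, the number of open files and the total size in bytes.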

Must PreCheck Before Every Change


Before any turnover it is very important to prepare for every eventuality, which is why you should have backout steps and procedures ready. However, to back out successfully you must know the current state of the system. I wrote a small, handy script with a bunch of commands that grabs the current state of the system every time before I go in for a change. Put all of them in a script, precheck.sh:

mkdir -p /var/tmp/PRE_CHANGE_data.`hostname`
# stop here if the directory is not usable, so nothing is written elsewhere
cd /var/tmp/PRE_CHANGE_data.`hostname` || exit 1
/usr/sbin/metastat > metastat.txt
/usr/sbin/metastat -p  > metastat-p.txt
/usr/sbin/metadb > metadb.txt
df -k > df-k.txt
for x in `/usr/sbin/metastat d0 | grep No | awk '{print $1}'`; do  /usr/sbin/prtvtoc /dev/dsk/$x > prtvtoc.$x.txt; done
/usr/sbin/eeprom | sort > eeprom.txt
/usr/local/bin/inq > inq.txt
cp -ip /etc/path_to_inst path_to_inst.txt
/usr/bin/netstat -rnv > netstat-rnv.txt
/usr/platform/`uname -i`/sbin/prtdiag -v > prtdiag-v.txt
cp -ip /etc/system system.txt
cp -ip /etc/vfstab vfstab.txt
/usr/sbin/mount > mount.txt
/usr/sbin/dumpadm > dumpadm.txt
ifconfig -a > ifconfig-a.txt
swap -l > swap-l.txt
kstat -p ce:::"/^link_/" > kstat-p_ce.txt
ps -ef > ps-ef.txt
netstat -a > netstat-a.txt
netstat -an > netstat-an.txt
ptree > ptree.txt
showrev -p > showrev-p.txt
egrep "^TZ" /etc/default/init > TZ.txt
zdump -v `egrep "^TZ" /etc/default/init | cut -d= -f2` | grep 2008 > zdump_2008.txt
uptime > uptime.txt
who -r > who-r.txt
cp -p /var/spool/cron/crontabs/root crontab_root.txt
/usr/sbin/vxprint > vxprint.txt
/usr/sbin/vxdisk list > vxdisk_list.txt
/usr/sbin/vxdg list > vxdg_list.txt
cp -p /kernel/drv/scsi_vhci.conf kernel-drv-scsi_vhci.conf
ps -ef|grep pmon > pmon.txt
metastat -p d0 > d0.txt
prtconf -pv |grep -i boot > prtconf_boot.txt
echo|format > format.txt
/opt/VRTSvcs/bin/hastatus -summ > hastatus.txt

Feel free to add your own commands to the list, and please do share them with us all. The commands will differ between flavors of UNIX.
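
The snapshot is most useful when you take a second one after the change and compare the two. Here is a rough sketch of that comparison. It assumes you ran the same commands again into a second directory (the POST_CHANGE_data name in the usage note is my own invention), and compare_snapshots is not part of precheck.sh.

```shell
# Compare two snapshot directories file by file.
# Prints one line for every file that changed or is missing afterwards.
# compare_snapshots is a hypothetical helper, not part of precheck.sh.
compare_snapshots() {
    pre_dir=$1
    post_dir=$2
    for pre_file in "$pre_dir"/*; do
        [ -e "$pre_file" ] || continue      # empty directory: nothing to compare
        name=$(basename "$pre_file")
        if [ ! -f "$post_dir/$name" ]; then
            echo "MISSING $name"
        elif ! cmp -s "$pre_file" "$post_dir/$name"; then
            echo "CHANGED $name"
        fi
    done
}
```

For example: compare_snapshots /var/tmp/PRE_CHANGE_data.`hostname` /var/tmp/POST_CHANGE_data.`hostname`. Files such as uptime.txt and ps-ef.txt will always show up as changed, so look at the ones you did not expect.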

Steps To Verify Netbackup Master Client Setup


NetBackup setup can be tedious because it has to be done on a per-client basis. However, it is fairly straightforward. You just have to take care of a few basic steps to ensure that your NetBackup client farm is set up successfully.

First, check the version of NetBackup you are running.
# cat /usr/openv/netbackup/bin/version
Next, ensure that you have connectivity between the master server and the client, with proper DNS entries, for all the daemons and processes involved in the handshake.

# nslookup client_name
# nslookup ipaddress_of_client
# ping -s client_name
# telnet client_name bpcd
# telnet client_name 13782
# telnet client_name vnetd
# telnet client_name 13724

If not already done, create a NetBackup client:
# sudo /usr/openv/netbackup/bin/admincmd/bpclient -client <client_name> -add -connect_options 2 2 3

Check the SERVER entries in the client's bp.conf file, ensuring they match what is listed in the activation guide for that operating system. The list of required entries varies with the data center and the VLAN the EBR interface is assigned to.

If the entries don't match what is needed, contact the system administrator assigned to the project.

On client_name, check the entries in the following file:


# cat /usr/openv/netbackup/bp.conf
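
Comparing the file against the activation guide by eye gets error-prone with many clients. The sketch below lists the SERVER entries and reports any expected server that is missing. It assumes the usual "SERVER = hostname" line format with spaces around the equals sign; both function names and the host names in the usage note are made up for illustration.

```shell
# List the SERVER entries of a bp.conf file, one host name per line.
# Assumes lines of the form "SERVER = hostname".
list_bp_servers() {
    sed -n 's/^SERVER[ ]*=[ ]*\([^ ]*\).*$/\1/p' "$1"
}

# Print every expected server that has no SERVER entry in the file.
# Usage: missing_bp_servers <bp.conf> <expected_server>...
missing_bp_servers() {
    conf=$1
    shift
    for expected in "$@"; do
        found=no
        for actual in $(list_bp_servers "$conf"); do
            [ "$actual" = "$expected" ] && found=yes
        done
        [ "$found" = yes ] || echo "$expected"
    done
}
```

For example, missing_bp_servers /usr/openv/netbackup/bp.conf master1 media1 prints nothing when both hypothetical servers are listed.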


On client_name, within the NetBackup GUI, expand NetBackup Management → Host Properties → Clients.

1. Select the client client_name in the right-hand pane.
2. Right-click on this entry and select Properties.
3. In the Properties dialog box that is displayed, select “Servers” on the left-hand side and check the list of servers displayed against the list of servers that should be present for this environment.

To ensure that the NetBackup master server can communicate correctly with the clients, run the following command on the respective cluster:

client_name> /usr/openv/netbackup/bin/bpclntcmd -pn

Send the output of this command to the backup engineer to verify that it was successful.

Run this from the command prompt on the relevant client and verify that the “bpcd”, “bprestore”, “user_ops”, “bprd”, “vnetd”, “bphdb”, “bpdbsbora” and “dbclient” directories exist:

ls -la "/usr/openv/netbackup/logs"

If the directories are not in place, create them. And you are done here.
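
Creating the missing directories can be scripted as well. This is a minimal sketch that takes the log directory as an argument, so you can try it on a scratch path first; ensure_nbu_log_dirs is a name of my own, and it does not set any particular ownership or permissions.

```shell
# Create any of the NetBackup log directories named above that is missing.
# The base directory defaults to the path used in this post.
# ensure_nbu_log_dirs is a hypothetical helper.
ensure_nbu_log_dirs() {
    base=${1:-/usr/openv/netbackup/logs}
    for dir in bpcd bprestore user_ops bprd vnetd bphdb bpdbsbora dbclient; do
        if [ ! -d "$base/$dir" ]; then
            mkdir -p "$base/$dir" && echo "created $base/$dir"
        fi
    done
}
```

Running it a second time prints nothing, since every directory already exists.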