Friday, March 22, 2013

Solaris: Fibre Channel - device LUN cleanup on Solaris



Procedure to assist in removing Fibre Channel devices (on Solaris 10)
Gathering the device information for failing devices and formulating the appropriate commands can be cumbersome.  We have an in-house tool to assist in the process. The tool is non-destructive: it only shows the commands that you should use.  To use the tool, simply run it as root:    /install/veritas/vxvm/fc_show

 

Prereq for Veritas:    Remove the device from VxFS/VxVM first.

    If devices are under Veritas control:
  • first unmount the affected underlying VxFS filesystems.
  • then remove the associated disk from the VxVM volume manager:   "vxdisk rm" on the device

 

General Procedure to remove SAN-attached storage devices

Description
When storage devices that present multiple luns to Solaris[TM] through a Storage Area Network (SAN) have some of those luns removed or made unavailable, Solaris device entries will still exist for those luns.
However, Solaris may report "missing" or "failing" states for those luns.
This document explains how to clean up the device entries and hence remove the error condition which is 
caused by Solaris trying to access the unavailable luns. 
This applies to Solaris 8 and Solaris 9 using the Sun StorEdge[TM] San Foundation Kit (SFK), 
also known as the Leadville driver stack. This document is specific to SAN-attached fibre channel storage and 
does not apply to direct-attached fibre channel storage.
Steps to Follow
The following commands will be presented:
- cfgadm -c configure [ap_id]
- cfgadm -al -o show_FCP_dev
- cfgadm -o unusable_FCP_dev -c unconfigure [ap_id]
- devfsadm -C
- ("luxadm -e offline" on the device path may also be needed)
The following output shows a system with 4 dual-pathed luns which are SAN-attached to a Solaris host:
cfgadm -al -o show_FCP_dev
Ap_Id Type Receptacle Occupant Condition
c2 fc-fabric connected configured unknown
c2::50060e8004274d20,0 disk connected configured unknown
c2::50060e8004274d20,1 disk connected configured unknown
c2::50060e8004274d20,2 disk connected configured unknown
c2::50060e8004274d20,3 disk connected configured unknown
c3 fc-fabric connected configured unknown
c3::50060e8004274d30,0 disk connected configured unknown
c3::50060e8004274d30,1 disk connected configured unknown
c3::50060e8004274d30,2 disk connected configured unknown
c3::50060e8004274d30,3 disk connected configured unknown
format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
(output omitted for clarity)
4. c2t50060E8004274D20d0
/pci@23c,600000/SUNW,qlc@1/fp@0,0/ssd@w50060e8004274d20,0
5. c2t50060E8004274D20d1
/pci@23c,600000/SUNW,qlc@1/fp@0,0/ssd@w50060e8004274d20,1
6. c2t50060E8004274D20d2
/pci@23c,600000/SUNW,qlc@1/fp@0,0/ssd@w50060e8004274d20,2
7. c2t50060E8004274D20d3
/pci@23c,600000/SUNW,qlc@1/fp@0,0/ssd@w50060e8004274d20,3
8. c3t50060E8004274D30d0
/pci@23c,600000/SUNW,qlc@1,1/fp@0,0/ssd@w50060e8004274d30,0
9. c3t50060E8004274D30d1
/pci@23c,600000/SUNW,qlc@1,1/fp@0,0/ssd@w50060e8004274d30,1
10. c3t50060E8004274D30d2
/pci@23c,600000/SUNW,qlc@1,1/fp@0,0/ssd@w50060e8004274d30,2
11. c3t50060E8004274D30d3
/pci@23c,600000/SUNW,qlc@1,1/fp@0,0/ssd@w50060e8004274d30,3
In this example, using the native tools of the storage device, we will remove all of the odd numbered luns. Here the storage is a Sun StorEdge[TM] 9990, so we used Storage Navigator to remove the lun mappings from the host.
The following output shows the same system after the luns have been removed:
cfgadm -al -o show_FCP_dev
Ap_Id Type Receptacle Occupant Condition
c2 fc-fabric connected configured unknown
c2::50060e8004274d20,0 disk connected configured unknown
c2::50060e8004274d20,1 disk connected configured failing
c2::50060e8004274d20,2 disk connected configured unknown
c2::50060e8004274d20,3 disk connected configured failing
c3 fc-fabric connected configured unknown
c3::50060e8004274d30,0 disk connected configured unknown
c3::50060e8004274d30,1 disk connected configured failing
c3::50060e8004274d30,2 disk connected configured unknown
c3::50060e8004274d30,3 disk connected configured failing
format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
(output omitted for clarity)
4. c2t50060E8004274D20d0
/pci@23c,600000/SUNW,qlc@1/fp@0,0/ssd@w50060e8004274d20,0
5. c2t50060E8004274D20d1
/pci@23c,600000/SUNW,qlc@1/fp@0,0/ssd@w50060e8004274d20,1
6. c2t50060E8004274D20d2
/pci@23c,600000/SUNW,qlc@1/fp@0,0/ssd@w50060e8004274d20,2
7. c2t50060E8004274D20d3
/pci@23c,600000/SUNW,qlc@1/fp@0,0/ssd@w50060e8004274d20,3
8. c3t50060E8004274D30d0
/pci@23c,600000/SUNW,qlc@1,1/fp@0,0/ssd@w50060e8004274d30,0
9. c3t50060E8004274D30d1
/pci@23c,600000/SUNW,qlc@1,1/fp@0,0/ssd@w50060e8004274d30,1
10. c3t50060E8004274D30d2
/pci@23c,600000/SUNW,qlc@1,1/fp@0,0/ssd@w50060e8004274d30,2
11. c3t50060E8004274D30d3
/pci@23c,600000/SUNW,qlc@1,1/fp@0,0/ssd@w50060e8004274d30,3
In the output above, "cfgadm -al -o show_FCP_dev" now shows the removed luns in the "failing" state and format shows the device as "".
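On a host with many luns, the affected entries can be picked out of the cfgadm listing with a small filter. The following is only a sketch: the function name list_luns_in_state is made up for illustration, and it assumes a POSIX shell (ksh or bash rather than the legacy /bin/sh) and the five-column layout shown above.

```shell
# Print the Ap_Id of every "disk" entry whose Condition column matches the
# requested state. Reads "cfgadm -al -o show_FCP_dev" output on stdin.
# (list_luns_in_state is a hypothetical helper name, not a Solaris command.)
list_luns_in_state() {
    want="$1"
    while read -r ap_id type receptacle occupant condition; do
        if [ "$type" = "disk" ] && [ "$condition" = "$want" ]; then
            echo "$ap_id"
        fi
    done
}

# Example, as root on the Solaris host:
#   cfgadm -al -o show_FCP_dev | list_luns_in_state failing
```

The same filter can be called with "unusable" later in the procedure to confirm that the state change has taken place.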
The first step in removing these devices is to change the state shown in the cfgadm output from "failing" to "unusable". This is done with the following command:
cfgadm -c configure c2 c3
cfgadm -al -o show_FCP_dev
Ap_Id Type Receptacle Occupant Condition
c2 fc-fabric connected configured unknown
c2::50060e8004274d20,0 disk connected configured unknown
c2::50060e8004274d20,1 disk connected configured unusable
c2::50060e8004274d20,2 disk connected configured unknown
c2::50060e8004274d20,3 disk connected configured unusable
c3 fc-fabric connected configured unknown
c3::50060e8004274d30,0 disk connected configured unknown
c3::50060e8004274d30,1 disk connected configured unusable
c3::50060e8004274d30,2 disk connected configured unknown
c3::50060e8004274d30,3 disk connected configured unusable
Possible extra step:
If the devices remain in a "failing" state according to the above output from cfgadm, and they do not move to an "unusable" state after running "cfgadm -c configure" as shown above, then the following command can also be tried:
luxadm -e offline /dev/dsk/c3t50060E8004274D30d3s2
(i.e. "luxadm -e offline" followed by the device path of the failing LUN)
Then re-run the previous cfgadm command (cfgadm -al -o show_FCP_dev) to check that the LUN state has changed from "failing" to "unusable". This luxadm command should then be repeated for each LUN which was previously shown in the "failing" state by cfgadm. Then carry on with the process below.
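Because the luxadm command has to be repeated per LUN, it can help to generate the command lines from the Ap_Ids rather than typing each path. The sketch below only prints the commands and does not run them. The helper names are hypothetical, and the path mapping assumes the non-multipathed cXtWWNdNs2 naming used in this example. Devices under scsi_vhci use a different name and would need to be looked up separately.

```shell
# Convert a cfgadm LUN Ap_Id such as "c3::50060e8004274d30,3" into the
# slice-2 path "/dev/dsk/c3t50060E8004274D30d3s2".
# Assumption: non-MPxIO device naming, as in the listing above.
ap_id_to_dsk() {
    ctrl="${1%%::*}"                 # controller, e.g. c3
    rest="${1#*::}"                  # WWN and LUN, e.g. 50060e8004274d30,3
    wwn=$(printf '%s' "${rest%,*}" | tr '[a-f]' '[A-F]')
    lun="${rest##*,}"                # LUN number, e.g. 3
    echo "/dev/dsk/${ctrl}t${wwn}d${lun}s2"
}

# Read LUN Ap_Ids on stdin and print (not execute) one luxadm command each.
print_luxadm_offline() {
    while read -r ap_id; do
        echo "luxadm -e offline $(ap_id_to_dsk "$ap_id")"
    done
}

# Example:
#   echo 'c3::50060e8004274d30,3' | print_luxadm_offline
```

Review the printed commands before running any of them, since offlining the wrong path affects a live device.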
--oOo--
Now that the state of the inaccessible luns has been changed to "unusable" in the output from cfgadm, we can remove those entries from the list with the following command:
cfgadm -o unusable_FCP_dev -c unconfigure c2::50060e8004274d20
cfgadm -o unusable_FCP_dev -c unconfigure c3::50060e8004274d30
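Note that the unconfigure is issued against the target Ap_Id (controller and WWN) without the LUN suffix. As a rough sketch, the list of commands can be derived from the cfgadm listing as follows. The function name is hypothetical, a POSIX shell is assumed, and the commands are printed rather than executed.

```shell
# Read "cfgadm -al -o show_FCP_dev" output on stdin and print one
# unconfigure command per target that has at least one "unusable" LUN.
# (print_unconfigure_cmds is an illustrative name, not a Solaris tool.)
print_unconfigure_cmds() {
    while read -r ap_id type receptacle occupant condition; do
        if [ "$type" = "disk" ] && [ "$condition" = "unusable" ]; then
            echo "${ap_id%,*}"       # drop the ",LUN" suffix
        fi
    done | sort -u | while read -r target; do
        echo "cfgadm -o unusable_FCP_dev -c unconfigure $target"
    done
}

# Example, as root on the Solaris host:
#   cfgadm -al -o show_FCP_dev | print_unconfigure_cmds
```

Run against the listing above, this prints the same two commands shown in this step.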

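- If the unconfigure fails with an error like the one below (here because the device is still in use by VxVM), retry it with the -f (force) option, as shown after the error output: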
# cfgadm -o unusable_FCP_dev -c unconfigure c3::50060e8004274d30
cfgadm: Library error: failed to offline: /devices/scsi_vhci/ssd@g600015d00005cc00000000000000f166
                    Resource                             Information    
------------------------------------------------  -------------------------
/dev/dsk/c6t600015D00005CC00000000000000F166d0s2  Device being used by VxVM
cfgadm -f -o unusable_FCP_dev -c unconfigure c3::50060e8004274d30

cfgadm -la -o show_FCP_dev
Ap_Id Type Receptacle Occupant Condition
c2 fc-fabric connected configured unknown
c2::50060e8004274d20,0 disk connected configured unknown
c2::50060e8004274d20,2 disk connected configured unknown
c3 fc-fabric connected configured unknown
c3::50060e8004274d30,0 disk connected configured unknown
c3::50060e8004274d30,2 disk connected configured unknown
format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
(output omitted for clarity)
4. c2t50060E8004274D20d0
/pci@23c,600000/SUNW,qlc@1/fp@0,0/ssd@w50060e8004274d20,0
5. c2t50060E8004274D20d2
/pci@23c,600000/SUNW,qlc@1/fp@0,0/ssd@w50060e8004274d20,2
6. c3t50060E8004274D30d0
/pci@23c,600000/SUNW,qlc@1,1/fp@0,0/ssd@w50060e8004274d30,0
7. c3t50060E8004274D30d2
/pci@23c,600000/SUNW,qlc@1,1/fp@0,0/ssd@w50060e8004274d30,2
Now we see that the luns are no longer displayed in the format listing.
Even though the output of the format command looks good, there are still entries for the removed devices in /dev/dsk and /dev/rdsk. These can be removed, if desired, by using the devfsadm command.
ls /dev/dsk/c2t50060E8004274D20d*
/dev/dsk/c2t50060E8004274D20d0s0 /dev/dsk/c2t50060E8004274D20d2s4
/dev/dsk/c2t50060E8004274D20d0s1 /dev/dsk/c2t50060E8004274D20d2s5
/dev/dsk/c2t50060E8004274D20d0s2 /dev/dsk/c2t50060E8004274D20d2s6
/dev/dsk/c2t50060E8004274D20d0s3 /dev/dsk/c2t50060E8004274D20d2s7
/dev/dsk/c2t50060E8004274D20d0s4 /dev/dsk/c2t50060E8004274D20d3s0
/dev/dsk/c2t50060E8004274D20d0s5 /dev/dsk/c2t50060E8004274D20d3s1
/dev/dsk/c2t50060E8004274D20d0s6 /dev/dsk/c2t50060E8004274D20d3s2
/dev/dsk/c2t50060E8004274D20d0s7 /dev/dsk/c2t50060E8004274D20d3s3
/dev/dsk/c2t50060E8004274D20d1s0 /dev/dsk/c2t50060E8004274D20d3s4
/dev/dsk/c2t50060E8004274D20d1s1 /dev/dsk/c2t50060E8004274D20d3s5
/dev/dsk/c2t50060E8004274D20d1s2 /dev/dsk/c2t50060E8004274D20d3s6
/dev/dsk/c2t50060E8004274D20d1s3 /dev/dsk/c2t50060E8004274D20d3s7
/dev/dsk/c2t50060E8004274D20d1s4 /dev/dsk/c2t50060E8004274D20d4s0
/dev/dsk/c2t50060E8004274D20d1s5 /dev/dsk/c2t50060E8004274D20d4s1
/dev/dsk/c2t50060E8004274D20d1s6 /dev/dsk/c2t50060E8004274D20d4s2
/dev/dsk/c2t50060E8004274D20d1s7 /dev/dsk/c2t50060E8004274D20d4s3
/dev/dsk/c2t50060E8004274D20d2s0 /dev/dsk/c2t50060E8004274D20d4s4
/dev/dsk/c2t50060E8004274D20d2s1 /dev/dsk/c2t50060E8004274D20d4s5
/dev/dsk/c2t50060E8004274D20d2s2 /dev/dsk/c2t50060E8004274D20d4s6
/dev/dsk/c2t50060E8004274D20d2s3 /dev/dsk/c2t50060E8004274D20d4s7
devfsadm -C
ls /dev/dsk/c2t50060E8004274D20d*
/dev/dsk/c2t50060E8004274D20d0s0 /dev/dsk/c2t50060E8004274D20d2s0
/dev/dsk/c2t50060E8004274D20d0s1 /dev/dsk/c2t50060E8004274D20d2s1
/dev/dsk/c2t50060E8004274D20d0s2 /dev/dsk/c2t50060E8004274D20d2s2
/dev/dsk/c2t50060E8004274D20d0s3 /dev/dsk/c2t50060E8004274D20d2s3
/dev/dsk/c2t50060E8004274D20d0s4 /dev/dsk/c2t50060E8004274D20d2s4
/dev/dsk/c2t50060E8004274D20d0s5 /dev/dsk/c2t50060E8004274D20d2s5
/dev/dsk/c2t50060E8004274D20d0s6 /dev/dsk/c2t50060E8004274D20d2s6
/dev/dsk/c2t50060E8004274D20d0s7 /dev/dsk/c2t50060E8004274D20d2s7
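With eight slices per disk the ls output gets long, so a quick way to confirm the cleanup is to reduce the listing to distinct disk names. This is a minimal sketch with a hypothetical helper name, assuming ls is piped (one name per line) rather than printed in columns to a terminal.

```shell
# Reduce a list of /dev/dsk slice names (one per line on stdin) to the
# distinct disk names by dropping the trailing slice suffix "sN".
# (distinct_disks is an illustrative helper, not a Solaris command.)
distinct_disks() {
    sed 's/s[0-9][0-9]*$//' | sort -u
}

# Example:
#   ls /dev/dsk/c2t50060E8004274D20d* | distinct_disks
```

After devfsadm -C, only the d0 and d2 disks should remain in this example.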



source: http://xteams.oit.ncsu.edu/iso/lun_removal
