Mirror Disk Action

DR Exercise - Mirrored Disks

Here is a document I wrote one evening for a friend to use in their Disaster Recovery Weekend Docs as a L1 section.



(1) Test Boot - Mirrored Disk

With a pair of mirrored root disks, you have some fail-over reliability. If one fails, the other carries on. If you dont boot, HPUX knows which is the good one, and that it is good.

(1.1) Finding which disk is which

You dont want to pull the wrong one!

(1.1.1) showboot

On the running system, run 'showboot' to see what is set as the PRIMARY path, and what is probably the ALT-PATH (not necessarily). Use ioscan -kfn to convert numbers into device names.

Dont worry, if you ever had to do this, the HP engineer would do it at the hardware level.

(1.1.2) ioscan -kfn

On the running system, (or from an archived printout), obtain the output from ioscan -kfn . It helps you convert device names to and from device numbers.

(1.1.3) Disk Activity LED

Supposing c2t2d0 is the primary path, you want to know which disk that is. Simply READ a lot from the disk, and see which light comes on, and label it with a sticky tab. Be certain though.


dd if=/dev/c2t2d0 of=/dev/null

(1.1.4) Repeat for Secondary

There is no need to do this, as you shall only remove the primary.

It is worth remembering, that the entire disk is NOT primary or secondary, but that each part of the first disk is repeated or mirrored on the second disk. We call them that for sanity, but HPUX has no such concept. It has a selector variable to pick a disk to boot from, and LVM mirroring.

(1.1.5) Find an LVOL

If you follow this test script, (on a different system), you will need to know the name of an LVOL, so that you can check the status of its PE's (Physical Extent = 4 M track, with independent versioning).

You may wish to get a list of all the LVOL's on the disk that you are removing (and also check everything on that PV is mirrored!)

vgdisplay -v /dev/vg00 will show you a list.

(1.2) Running with one disk

(1.2.1) Check for (no) messages

Run lvdisplay on a mirrored LVOL (on the disk you are looking at), and check that it says 'current' on both disks. Check syslog.

(1.2.2) Remove either disk

You can ONLY remove a disk if it is 'hot-pluggable'. ULN5 has its disks in the external Jamaica cabinets, with those grey/blue levers to pull and reseat disks.

Now remove one of the two disks, by pulling it out by 1/2"

(1.2.3) POWERFAIL messages

Check syslog, dmesg and lvdisplay. You should see error messages, saying POWERFAIL (on the disk address) but the system still runs. lvdisplay should show errors.

(1.2.4) Replace Disk

Now firmly/gently put the disk back, and check syslog/dmesg until you see the POWERFAIL/RECOVERED message. That indicates that the disk has been seen and checked by the OS, as being there and functioning correctly.

(1.2.5) PE becomes CURRENT

Quickly check lvdisplay (rerunning several times) and you will see some physical extents change from ERROR/STALE to current. That is the LVM bringing both sides up-to-date.

(1.2.6) Now Normal

When that is finished, the system is fully back to normal.

(1.3) Booting with one disk

(1.3.1) Quorum check at boot time

If you reboot, with a disk missing, HPUX cannot be certain that the disk that works, has the proper data on it, so the LVM refuses to activate the LV's. When you boot via the alternate path, it suceeds in finding, loading and running the kernel, but when it then activates LVM, it fails.

If you had three equally mirrored disks (what HP calls two mirror copies), and two disks vouched for each-other (ie showed the same data revisions), then HPUX would believe the two, and boot without the third. HPUX calls that quorum (more than half).

Since you have two equally mirrored disks (one mirror copy), when one is down, the system wont boot, unless you tell it to avoid the quorum check (using hpux -lq).

It is worth trying to see what happens when you dont type anything (ie dont boot with hpux -lq), and it only takes 5 minutes, as a full reboot is not necessary, it quickly drops back into IPL mode.

(1.3.2) Alternate Boot Path

As well as two mirrors, (of each of the LVOLS), there are two boot tracks. If you boot without the alternate disk, hpux might not notice until it tries to actives the LVM. If you boot without the primary path, hpux will notice straight away.

Find out which is the primary path by running 'showboot'. Find out which is that physical disk by reading from it and checking the LED. (See above).

(1.3.3) shutdown -h

Shutdown the machine, using shutdown -h, and remove the primary path disk, and boot. Booting takes at least 20 minutes, because the system wants to check everything.

(1.3.4) let autoboot fail

First let it try its own thing, to see what happens. It will detect the missing pri-path disk and should return to the BIOS command line.

(1.3.5) BIOS Prompt

To get here without removing that disk, press the space-bar during booting, when it says Press-Any-Key within 10 seconds. Options include:


HELP
MENU

Its important to realise that there are two levels. There is the machine BIOS, which does not come from any disk, and there is an IPL-BIOS which comes from the disk. You have not read the disk yet. (Which is just as well, since you have removed it/one of them).

You can set the PRI and ALT PAths from here, or you can leave them and boot from a named (numbered) path. You can also do that from the second stage loader, or from UNIX.

(1.3.6) BIOS Commands

SEA or 'search', will tell you the list of devices which the bios allows you to attempt to boot from. A recent copy of ioscan helps, so that you KNOW which is the tape, cdrom or disk. To boot an ignite tape, boot from that tape device.

(DO NOT RUN YET) BO ALT or 'boot alternate', will attempt to boot from the configured alternative path. BO 8/8.8.0 will attempt to boot from that SCSI controller, that LUN device. (Dont do that, now, unless you want to)

(1.3.7) ALT 8/8.8.0

The setting of the ALT PAth is not important, and may be wrong. You can change it if you wish, from here, from the IPL prompt or from a running UNIX.

(1.3.8) BO

BO or 'boot', will probably offer you a choice of PRI/ALT bootable paths (disks), and also a chance to 'interact with the IPL'. Say 'Y' if you want to specify the -lq option later. If you said BO ALT , you wont see the first two options.


Boot from primary - N
Boot from alternate - Y
Interact with IPL? - Y

(1.3.9) ISL> IPL Commands

That leaves you in the ISL BIOS, loaded from the disk. Again try HELP and MENU. hpux show autofile is like showboot, but from BIOS not from UNIX. If the ALTPATH is already set, life is a tiny bit easier, but it is often set to the tape (which you want to boot from for ignite), or the CDROM (which you want to reinstall). Commands include:


help

hpux		   # to boot as normally
hpux show autofile # like lifcp to screen
hpux -is boot      # boot to single user state
hpux -lq           # boot without quorum check
hpux -lm           # 3-38 - NO SWAP NO LVM # AVOID 
hpux /stand/vmunix.BCKUP # for the old kernel 
hpux .... 	   # see 2-7 for other kernels
hpux ....          # also combinations of options
primpath 8/8.8.0  # permanently use different boot path

DO NOT RUN: hpux -lm It will take you to maintenance mode, but you will have to reboot, so its mostly for when you think you can recover the disk and immediately reboot.

hpux -lq will boot with no quorum check, which will get your entire machine working. You can also add the -is to go into single-user state, then do an init 4 , but that might be confusing if you have no reason to do so.

If you have a second machine, up and running, try 'man hpux'

Why is ISL also called IPL ? The Initial Program Loader, loads the Initial System Loader from disk, and then runs the ISL. Its similar to the way HP-CDROMS have one printed label on the media, and another label on the carton.

(1.3.10) hpux -lq

Boot using 'hpux -lq', and the machine will come up cleanly (presuming that disk was the only one missing, and other LVOLs wont have difficulties). Otherwise add the '-is' option and figure it from there.

There will still be messages about the disk not being there, but the other mirrored parts make it non-fatal.

(1.3.11) System Now Up -- on one disk

Quickly test that Informix is functioning at all, check syslog.

(1.3.12) Replace Primary Disk

Now return the missing disk, by gently pushing the lever (over a catch). The LED will blink, and there may even be a SCSI bus reset as the controller detects it.

(1.3.13) ioscan detect disk

HPUX will not know its back, or even that it exists! You need to run ioscan , without the -k option (from kernel memory), but probably with -fn and through pg.

The HPUX kernel now knows that the device exists, but the LVM is still running without it. (I'm presuming the replacement disk is the original disk, with the old LVM info on it. If not, you will need to check vgcfgrestore in the manual pages and the Admin-Tasks-Guide -- again -- if that ever really happens the HP engineer will do that with you).

(1.3.14) vgsync

vgsync is required to tell the LVM to look-for, find and re-sync with the disk. Notice that you didn't do that when you didn't boot. You must do it this time, because the LVM has completely forgotten about the absent disk, and is not polling to find it.

(1.3.15) stale/error becomes current

lvdisplay -v /dev/vg00/lvol11 will show you lots of physical extents in error or stale, but repeated running will show you the LVM bringing each PE back into CURRENT status.

(1.3.16) System Now Up - on both mirrors

You can now proceed, or if superstitious, you can do a full normal, unattended reboot, to be sure.