From: www.itworld.com
October 31, 2001 —
Running into someone who understands the
intricacies of Unix device drivers is no longer the awe-inspiring
experience retold from the days of yore. If you were impressed by Unix
gurus who professed to write drivers using cat as a text
editor, it's time to join the real world and enjoy improvements in
kernel configuration, device mapping, and installation that have made
low-level kernel knowledge less of a necessity for the average Unix
system manager. But why dedicate a column to device numbering and
mapping in Solaris?
While installation has become much more automated, troubleshooting
remains a labor-intensive process. What do you do when you add a new
disk drive, and it begins using a device number for which your database isn't
prepared? How do you prevent device numbers from changing across
reboots, and how do you get them to change when you need to remove
hardware or replace failed components? Do you have high-availability
configurations that require identical disk device names on both
machines, even though the SCSI host adaptors are not quite identically
installed and cabled? How do you fix older or third-party applications
with hard-wired device names that fail in the brave new world of
tongue-twisting geographical device names?
This month, we're going to
put you back in charge of the hardware configuration with a tour of
the device identification and numbering process. We'll start with a
look back at how device numbers have been assigned and managed by
Unix, and how the Solaris kernel makes the process much more dynamic
-- and less deterministic at times. We'll then dig more deeply into
device autoconfiguration and numbering under Solaris,
followed by a look at persistence in device numbering and how to
override the defaults and fix some common problems.
Land of 1,000 devices
The late jazz bassist Charles Mingus said that taking something
complex and making it simple showed true creativity. One of the
elegant simplicities of the Unix operating system is the way in which
it presents physical device interfaces to the system programmer.
Devices, such as disk drives, framebuffers, pseudo-terminals, and real
serial ports appear as filesystem entries, allowing the usual set of
file manipulation system calls to be used as the application
programming interface. There's no need to learn a separate device
liturgy for each new type of hardware. Reducing the API suite to a
single set of interfaces makes it easier to port a database, for
example, that may use raw disk devices or a filesystem.
However, the output of ls shows you that device entries
in the filesystem aren't quite identical to those of regular files or
directories:
luey% ls -l sd@3,0:a*
brw-r----- 1 root sys 32, 24 Oct 14 12:17 sd@3,0:a
crw-r----- 1 root sys 32, 24 Oct 14 12:17 sd@3,0:a,raw
The first character in the mode tells you if this is a character (c)
or block (b) device; character devices are read a byte at a time, like
normal files, while block devices can only be accessed in multiples of
the block size. Disks are the most common block devices, while network
interfaces, terminal devices of all flavors, and tape drives are
character devices. Device files, also called special files, sport a pair of
numbers in place of a size; the numbers are the major and minor
identifiers, respectively. Major numbers are indexes into the kernel's
table of device drivers, associating routines to manipulate the device
with the user-visible name for the hardware. Minor numbers are simply
instance numbers for the device -- they tell you how many you have, and
which particular unit of the device family you're addressing. The
difficult problem is telling the kernel about a new device, and making
sure it creates the appropriate associations between filesystem
entries and its own configuration tables.
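As a rough illustration of reading these numbers programmatically, the sketch below classifies a device file from its mode bits and pulls out the major and minor identifiers. It rebuilds the sd@3,0:a entry from the listing above rather than opening a real device. The function names are made up for this example, and the split of a disk's minor number into an instance and a partition letter (eight partitions per instance) is an assumption chosen because it agrees with the minor number 24 shown for partition "a" of sd@3,0; it is not something the listing itself states.

```python
import os
import stat

# Assumption for illustration only: the disk driver reserves eight minor
# numbers (partitions a through h) for each disk instance.
PARTITIONS_PER_DISK = 8


def describe_device(mode, rdev):
    """Return (kind, major, minor) given a device file's st_mode and st_rdev."""
    if stat.S_ISBLK(mode):
        kind = "block"
    elif stat.S_ISCHR(mode):
        kind = "character"
    else:
        raise ValueError("not a device file")
    return kind, os.major(rdev), os.minor(rdev)


def split_disk_minor(minor):
    """Split a disk minor number into (instance, partition letter)."""
    instance, partition = divmod(minor, PARTITIONS_PER_DISK)
    return instance, "abcdefgh"[partition]


# Rebuild the block entry for sd@3,0:a (major 32, minor 24) from the listing.
kind, major, minor = describe_device(stat.S_IFBLK | 0o640, os.makedev(32, 24))
print(kind, major, minor, split_disk_minor(minor))
```

On a live system the same function could be fed the st_mode and st_rdev fields returned by os.stat() for a device path.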
SunOS 4.x and its Berkeley heritage embedded the problem of device
numbering in the kernel configuration file. If you wanted to add a new
device or increase the largest device minor number in use, you had to
reconfigure and rebuild the kernel. Even simple tasks, such as telling
the kernel that the SCSI disk on target 4 was to be known as sd4,
required hand-crafting configuration files and a kernel rebuild. SunOS
devices live in the /dev directory of the root filesystem, a flat
namespace for all device types and instances.
Solaris 2.x introduced dynamic kernel configuration, removing kernel
configuration, builds, and links from the repertoire of regular system
care and feeding. The Solaris kernel identifies the drivers it needs,
links them in while building a table of major numbers, and then
assigns minor numbers to devices it finds after booting. Add a new
disk device, and Solaris assigns it the next available minor number.
When you add a new type of device such as a quad Ethernet controller,
the major number table gets updated and the board's devices are
identified starting with minor number 0. The /dev directory
is now just a directory of links to the actual mapping of filesystem
entries to geographic device descriptions in /devices.
Robbins 8th & Walnut: Our Name Is Our Address*
File names in the /devices hierarchy reflect the
machine's physical connections and logical bus layout: the type of I/O
interface, any address and slot or unit number, and a device name and
minor number or other identifier:
brw-rw-rw- 1 root sys 36, 0 Oct 14 12:17 ./obio/SUNW,fdtwo@0,700000:a
brw-r----- 1 root sys 32, 26 Oct 14 12:17 ./iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0:c
crw------- 1 stern 11010 39, 0 Oct 14 12:17 ./iommu@f,e0000000/sbus@f,e0001000/cgsix@2,0:cgsix0
The first example is the floppy drive on my SPARCstation 10. It's
attached to the on-board I/O controller (obio), and the device name is
SUNW,fdtwo. It's at location 0, address 700000, and this
device refers to the "a" partition of the disk. The second and third
examples are for SBus-based devices. The second is a SCSI disk
attached to the on-board SCSI controller. It's connected to the main
system bus through the IOMMU (I/O memory management unit), which has a
control address associated with it. Most on-board SBus-connected
devices live in slot "f" -- including the control
units. The next element in the pathname shows you it's an SBus device,
also controlled through slot "f". The "esp" elements that follow are
the ESP SCSI host adaptor's DMA channel, and the ESP SCSI host
interface unit, also with control information. The final pathname
component is the SCSI disk definition: it's at target 3, logical unit
(LUN) 0, and this device refers to the "c" partition. The final
example is for the frame buffer, a cgsix device, sitting in SBus slot
2 as indicated by the cgsix@2,0 element. While these
pathnames are quite complex, they provide you a detailed view of how
hardware is plugged into the machine, and what has been discovered by
the boot PROM. On server machines with multiple SBus interfaces,
you'll see more variation in the IOMMU and SBus addressing.
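To make the naming scheme concrete, here is a minimal sketch that splits a /devices pathname into its components, treating each one as name@address:minor-name the way the walkthrough above reads them. The function name is hypothetical, and the parsing rule is inferred only from the three example paths in this column, so other path shapes may need different handling.

```python
def parse_device_path(path):
    """Split a /devices-style path into (name, address, minor_name) tuples."""
    nodes = []
    for component in path.lstrip("./").split("/"):
        # Anything after ':' names the partition or unit; '@' separates the
        # device name from its bus address.
        node, _, minor_name = component.partition(":")
        name, _, address = node.partition("@")
        nodes.append((name, address, minor_name))
    return nodes


disk = "./iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0:c"
for name, address, minor_name in parse_device_path(disk):
    print(name, address, minor_name)
```

For the disk path, the last tuple carries the driver name sd, the address 3,0 (target 3, LUN 0), and the partition letter c.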
Building the device tree, and creating the symbolic links to it, is a
complex process that is part of every system boot. The subtle
hand-offs and dependencies involved in adding a new device would tax
the skills of the American Ballet Theater or the Dallas Cowboys.
Before we get into diagnostics and fine-tuning device configurations,
let's walk through the boot process to see how the configuration
files, minor numbers, and links are assembled.
Building it from memory: Constructing the device landscape
After a power-on self test, every current Sun/SPARC system uses its
OpenBoot PROM (OBP, see
"Open boot secrets revealed",
Unix Insider October 1995) to probe out attached hardware,
building a machine topology that is kept in memory and handed off to
the nascent kernel. If the reconfigure -r flag was passed
to the boot program, the system will rebuild the /dev and
/devices directories, adding new devices or renumbering and
re-assigning those that have moved within the system.
A system device reconfiguration occurs in three major steps:
1. [...] add_drv utility from within [...] ls -l listing of the filesystem entries. Device [...] add_drv and noted in [...]
2. [...] drvconfig utility takes the [...] drvconfig does its work, it sets [...]
3. [...] devlinks utility is [...]

If you feel that a small sleight-of-hand is going on somewhere
between locating devices and building a consistent view of the
world, you're either remarkably perceptive, or you've experienced
that sinking feeling that comes from realizing that you are now
swapping to the disk that had your database on it and that a major
customer's order file is now represented by the swap pages underlying
a rude JPEG image.
Consistency is everything: Retaining device state across reboots
If everything is done dynamically, how do you ensure that life remains
the same across reboots? The answer comes from step 2 above, where
drvconfig builds the /devices tree and assigns
minor numbers. As drvconfig does its work, it is charged
with maintaining a sense of history between boots -- it notes the
mapping of physical, geographic addresses to minor numbers in the
/etc/path_to_inst file, and updates this file if needed with
new device information. Essentially, drvconfig's use of
/etc/path_to_inst ensures that once you put root on sd3, it
stays there, and that the data and log segments of your database on
sd2 don't get mixed up with sd3 after a reboot to add a third disk
drive.
If drvconfig can find a match between a device in the
in-memory tree and an entry in /etc/path_to_inst, it
continues using the minor number previously assigned. If a new device
appears, it is given the next available minor number. The full
geographic path to the device is noted in /etc/path_to_inst
as shown by this excerpt for the sd3 and fd0 devices from the example
above:
"/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0" 3 "sd"
"/obio/SUNW,fdtwo@0,700000" 0 "fd"
Note that a device minor number isn't re-used if the device once
existed and then doesn't respond at boot time -- you don't want to
renumber your disks if one dies, for example, and you're counting on
your disk mirroring to get you through the failure. Smoldering disks
shouldn't lead to a melting database as the hardware failure is
communicated to you through a software disaster.
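If you want to inspect these mappings with a script, the sketch below reads entries in the three-field shape shown in the excerpt above: a quoted physical path, a number, and a quoted driver name. The function name is made up for the example, and treating lines that begin with # as comments is an assumption about the rest of the file.

```python
import shlex

SAMPLE = '''
"/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0" 3 "sd"
"/obio/SUNW,fdtwo@0,700000" 0 "fd"
'''


def parse_path_to_inst(text):
    """Return a list of (physical_path, number, driver) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        # shlex honors the double quotes around the path and driver name.
        path, number, driver = shlex.split(line)
        entries.append((path, int(number), driver))
    return entries


for path, number, driver in parse_path_to_inst(SAMPLE):
    print("%s%d -> %s" % (driver, number, path))
```

Run against a copy of the real file, this gives a quick way to see which physical path sits behind a name like sd3.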
The implications of the "no re-use" policy can lead to unintentional
renumbering, however. Let's say you have a quad Ethernet controller in
board 1, SBus slot 1 of a server, and you want to move it to a
different I/O board. Physically moving the card doesn't change a thing
as far as available hardware, but you've modified the geographic
description of the machine. As drvconfig scans the
in-memory device tree, it will believe that the "old" quad Ethernet
card is dead, and that a new one has appeared in a previously unused
slot. As a result, your network interfaces are assigned the next
available minor numbers and show up as qe4, qe5, qe6 and qe7. If you
haven't taken the time to modify your /etc/hostname.*
configurations, you'll have trouble using the network.
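The renumbering can be mimicked with a toy model of the policy described above: a path that has been seen before keeps its number, an unknown path gets the next number for its driver, and old entries are never dropped. This is only a sketch of the behavior as the column describes it, not the real drvconfig logic; the function name and the board and slot paths are hypothetical, and "next available" is modeled as one more than the highest number already recorded for that driver.

```python
def assign_instances(known, discovered):
    """known maps path -> (number, driver); discovered lists (path, driver)."""
    table = dict(known)  # old entries stay, so their numbers are never re-used
    for path, driver in discovered:
        if path in table:
            continue  # matched an existing entry: keep its number
        used = [number for number, drv in table.values() if drv == driver]
        table[path] = (max(used) + 1 if used else 0, driver)
    return table


# Hypothetical paths: a quad Ethernet card recorded in one slot, then moved.
old = {"/board1/slot1/qe@%d" % port: (port, "qe") for port in range(4)}
found = [("/board2/slot0/qe@%d" % port, "qe") for port in range(4)]

table = assign_instances(old, found)
for path, (number, driver) in sorted(table.items(), key=lambda item: item[1][0]):
    print("%s%d %s" % (driver, number, path))
```

The four ports at the new location come out as qe4 through qe7, while the entries for the old slot still hold qe0 through qe3.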
To work around this dynamic derailment of your desired configuration,
edit /etc/path_to_inst by hand. You might want to do this if
you add a second network interface and want to switch their minor
numbers, changing the physical interface that is qe0 or le0 and
therefore becomes the default route. To implement a change in minor
device numbering, either correct the minor numbers you find in
/etc/path_to_inst, or remove the entries for the devices you
want renumbered and let drvconfig start from ground zero
on a reboot. You must do a boot -r to get the changes to
take effect. The manual page has more information but fails to put the
following warning in huge flashing lights: do not remove
/etc/path_to_inst, or you won't be able to find somewhat
important devices like the root disk and the swap device. As with all
key configuration files, make a backup, and preferably copy the file
to another machine so you can inspect your handiwork later if
required.
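For the case of switching two interfaces, a hand edit can be rehearsed on a copy of the file first. The sketch below works purely on text and never touches /etc; it exchanges the numbers of two entries that belong to the same driver and leaves every other line alone. The function name and the le paths are hypothetical, and the three-field line layout is taken from the path_to_inst excerpt shown earlier.

```python
import shlex


def swap_instances(text, driver, first, second):
    """Return the text with two numbers of one driver exchanged."""
    swap = {first: second, second: first}
    result = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            fields = shlex.split(stripped)
            if len(fields) == 3 and fields[2] == driver and int(fields[1]) in swap:
                # Rewrite the entry in the same quoted layout as the original.
                line = '"%s" %d "%s"' % (fields[0], swap[int(fields[1])], driver)
        result.append(line)
    return "\n".join(result) + "\n"


# Hypothetical entries for two le interfaces plus the floppy from the excerpt.
COPY = (
    '"/hypothetical/first/le@0" 0 "le"\n'
    '"/hypothetical/second/le@1" 1 "le"\n'
    '"/obio/SUNW,fdtwo@0,700000" 0 "fd"\n'
)
print(swap_instances(COPY, "le", 0, 1), end="")
```

Comparing the output against the backup copy before installing it is a cheap check, and the change still needs a boot -r to take effect.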
Small cordless devices: How to play with your hardware and not get toasted
Device configuration is yet another area where things go subtly
wrong when you are under the most pressure. Here are some of the
more useful tips and tricks to help you play with your devices:
- [...] prtconf -p, displaying [...]
- [...] prtconf; if it doesn't show up there it's not responding [...]
- [...] -P option and you'll see pseudo devices in the mix as [...]
- [...] drvconfig invoked out of [...]
- [...] -d option and [...]

name="sd" class="scsi" target=0 lun=0; # target 0 LUN 0, default
name="sd" class="scsi" target=0 lun=1; # target 0 LUN 1
name="sd" class="scsi" target=0 lun=2; # target 0 LUN 2
name="sd" class="scsi" target=0 lun=3; # target 0 LUN 3
You'll see the multiple units show up as sd@0,0 through sd@0,3
in /devices.
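Entries like the ones above are repetitive enough to generate. The sketch below prints one line per LUN in the same name/class/target/lun layout; the function name and its defaults are made up for the example, and it only produces text for you to review, without writing to any configuration file.

```python
def lun_entries(target, luns, driver="sd", bus_class="scsi"):
    """Build one configuration line per logical unit on a SCSI target."""
    return [
        'name="%s" class="%s" target=%d lun=%d;' % (driver, bus_class, target, lun)
        for lun in luns
    ]


# Four logical units on target 0, matching the entries shown above.
for entry in lun_entries(0, range(4)):
    print(entry)
```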
Knowing how the system assembles the software representation of its
hardware configuration may represent the closest thing to a computer's
mind-body problem. It's up to you and your managerial devices to coax
it through times of crisis.
Unix Insider