Marc2_AdminGuide/3_Installation

Installation Procedures

This chapter explains the installation procedure for the different types of nodes in the system. After reading this, the administrator should be able to install a node of a certain class after some component has been replaced.

SystemImager

SystemImager is a software package which eases the installation of a set of computers with similar properties and hardware. The basic idea is to set up one machine according to the desired specifications (the “golden client”) and then clone it to the rest of the machines. Cloning should keep the common configuration (e.g. the disk layout) the same while changing the individual features of each clone (e.g. IP addresses and host names). SystemImager does this by rsyncing the hard disk contents of the golden client onto the installation server, creating a “golden client image” which can then be deployed to the rest of the machines. The necessary per-node changes are performed by two mechanisms: scripts and overrides. Exclude lists are another important element: they keep unwanted files, e.g. log files or the /proc file system, out of the cloning process.

ParaStation pscluster

The ParaStation pscluster suite is a set of scripts written around the low-level SystemImager commands. It saves you the trouble of learning the right options for the different installation steps, which can sometimes involve operations on several nodes. The following scripts are available, listed in the order of the steps in the installation process:

  • psnodes-getimage: sets up the model machine as golden client and retrieves the image into the installation server.
  • psnodes-addnew: collects MAC address information and updates the DHCP server's database, then distributes the image onto the target nodes.
  • psnodes-admin: displays information about configuration details.
  • psnodes-reinstall: reinstalls a node.
  • psnodes-replace: reinstalls a node whose MAC address has changed (e.g. after main board exchange).
  • psnodes-update: propagates changes in the golden client image into the nodes, e.g. after adding software or upgrading versions.

Some more commands dealing with power management and monitoring are:

  • psconsole: starts a serial over LAN session so that the screen output of a machine can be monitored on a terminal window.
  • pspowercycle: does a hard power cycling of a node via IPMI.
  • pspowerbutton: has the same effect as pushing the power button of a node.

For a detailed description and more available commands, see the pscluster documentation and manual pages.

Installation boot concept

The installation boot concept is entirely network based. The installation server (= master) is set up as DHCP and TFTP boot server, and the nodes are configured to perform a PXE boot. This requires that the BIOS is configured appropriately. Additional BIOS settings have to ensure that the IPMI address is entered correctly and that serial over LAN capability is available. For easy installation of the nodes, the IP address of their BMC has to be set up correctly to allow troubleshooting via the serial console. During installation, the servers are also power cycled automatically using their BMC. The MAC address of the Ethernet interface used for installation determines the node name.

Provided that all of the above prerequisites are met, powering up a client node triggers the following actions:

  1. On the node, the network card will send a DHCP request onto the network. This request contains the MAC address of the card.
  2. The DHCP server will receive the MAC address, consult its database and assign the IP address matched to the MAC address.
  3. The PXE boot will ask the TFTP boot server for a boot file whose filename represents the IP address assigned before. This file can contain instructions to boot from local disk or to load an installation kernel and ramdisk provided by the boot server. While a node is being (re-)installed, the file returned to the node tells it where to load the installation kernel and ramdisk from.
  4. The client loads the kernel and ramdisk using TFTP, and they start the installation process.

It is important that the boot file switches back from installation to boot from local disk after the node is installed.
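
As an illustration of step 3: if the boot server uses PXELINUX (an assumption, since this chapter does not name the boot loader), the per-node boot file is named after the node's IPv4 address written as eight uppercase hexadecimal digits. The following Python sketch computes that name. The function name and the address are examples only.

```python
import ipaddress


def pxelinux_config_name(ip: str) -> str:
    """Return the per-host boot file name for an IPv4 address.

    Assumes the PXELINUX convention: the 32-bit address written as
    eight uppercase hexadecimal digits.
    """
    return format(int(ipaddress.IPv4Address(ip)), "08X")


if __name__ == "__main__":
    # 192.0.2.91 is a documentation address, not a Marc2 node address.
    print(pxelinux_config_name("192.0.2.91"))  # prints C000025B
```

Switching a node between "install" and "boot from local disk" then amounts to changing the content of this one file on the TFTP server.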

It is essential that the DHCP server hands out the correct addresses according to the MAC addresses present in the cluster. In the end, this determines the relation between the assigned IP address and the physical location of a node. Several methods of collecting the MAC addresses are supported:

  1. Assigning IP addresses in the order in which machines are powered up.
  2. Using a list of MAC addresses provided by the hardware vendor. The list is represented as a file in which each line contains the pair <nodename> <MAC>.
  3. If the hardware permits, the MAC addresses can be read from service processors, e.g. in the case of a blade system.

The first method is practical only for systems on the order of 100 nodes. Marc2 uses method 2.
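
Before a vendor list is handed to the installation tools, it can be worth checking it for malformed or duplicate entries. The following Python sketch parses the `<nodename> <MAC>` format described in method 2. The comment handling, the acceptance of "-" as a separator, and the rendering in ISC dhcpd syntax are assumptions made for illustration. In practice psnodes-addnew maintains the DHCP database itself.

```python
import re

MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def parse_mac_list(text):
    """Parse lines of '<nodename> <MAC>' into a dict {nodename: mac}."""
    nodes = {}
    seen_macs = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        # Skipping blank lines and '#' comments is an assumption.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected '<nodename> <MAC>'")
        name = fields[0]
        mac = fields[1].lower().replace("-", ":")
        if not MAC_RE.match(mac):
            raise ValueError(f"line {lineno}: malformed MAC {fields[1]!r}")
        if name in nodes or mac in seen_macs:
            raise ValueError(f"line {lineno}: duplicate node name or MAC")
        nodes[name] = mac
        seen_macs.add(mac)
    return nodes


def dhcpd_host_entry(name, mac):
    """Render one host declaration in ISC dhcpd syntax (assumed server)."""
    return f"host {name} {{ hardware ethernet {mac}; fixed-address {name}; }}"


if __name__ == "__main__":
    # The MAC addresses below are made up for the example.
    sample = "node001 02:00:00:00:00:01\nnode002 02-00-00-00-00-0A\n"
    for name, mac in parse_mac_list(sample).items():
        print(dhcpd_host_entry(name, mac))
```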

Scripts, excludes and overrides

The pscluster scripts use several configuration files and scripts. There are pre- and post-install scripts as well as files which select the files seen by SystemImager. It is useful to remember that the rsync process always runs in “pull” mode.

Excludes

Excludes determine which files are not considered in the process of getting and distributing images.

The following files govern the exclusions when getting images or distributing them to the clients:

  • /etc/parastation/general/excludes-si_getimage: determines what is skipped when the image is fetched from the golden client. The psnodes-getimage script passes this file to SystemImager via the --exclude-file option. You may want to modify it, e.g. to keep directories on shared file systems or log files out of the image.
  • /etc/parastation/general/updateclient.local.exclude: determines what is skipped when fetching the image from the server onto the client.

Overrides

According to the SystemImager manual, overrides can be used to manage differences from images. A typical use is to apply different configuration files to different nodes if needed. Override files are stored in /var/lib/systemimager/overrides. Currently, the following overrides are used in the setup of Marc2:

  • marc2-h2

Pre- and Postinstall scripts

Pre- and postinstall scripts provide a further way of customising the installation. They are placed in the directories /var/lib/systemimager/scripts/pre-install and /var/lib/systemimager/scripts/post-install. The name of each script has to start with a two-digit number, which determines the order in which the scripts are processed. The pscluster package installs a default set which usually does not require modifications.
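
The effect of the numeric prefix can be sketched in a few lines of Python. The script names below are hypothetical, and the assumption is that scripts are processed in plain lexical order of their file names, which coincides with numeric order as long as every name carries a two-digit prefix.

```python
import re

# Exactly two leading digits, not followed by a third digit.
TWO_DIGIT_PREFIX = re.compile(r"^\d{2}(?!\d)")


def processing_order(script_names):
    """Return the script names in the order they would be processed.

    Raises ValueError if a name lacks the two-digit prefix.
    """
    bad = [name for name in script_names if not TWO_DIGIT_PREFIX.match(name)]
    if bad:
        raise ValueError(f"missing two-digit prefix: {bad}")
    return sorted(script_names)


if __name__ == "__main__":
    # Hypothetical script names, for illustration only.
    print(processing_order(["20-network", "95-cleanup", "10-hostname"]))
```

Such a check shows why a single-digit prefix is a problem: a name like "5-foo" would sort after "10-hostname" in lexical order.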

Autoinstall scripts

The autoinstall script contains the knowledge about the system layout. Most important here are the file system configuration and the boot loader configuration. It can be found under /var/lib/systemimager/scripts/<golden client image name>.master.

For the file system layout, the underlying SystemImager script records the sizes of the disk and partitions of the machine installed as golden client. Clearly, this is undesirable for clusters with varying disk sizes: either space is wasted, or the partitioning of a client fails because its disk is smaller than expected. To avoid this, use the option psnodes-getimage -a, which allows the corresponding file to be edited and then re-generates the autoinstall script. The two places to change are the size of the last partition on the disk and the size of the last logical volume (e.g. the scratch volume of a node).

<disk dev="/dev/sda" label_type="msdos" unit_of_measurement="MB">
	<!--
	This disk's output was brought to you by the partition tool "parted",
	and by the numbers 4 and 5 and the letter Q.
	-->
	<part num="1" size="98.67" p_type="primary" p_name="-" flags="boot"/>
	<part num="2" size="*" p_type="primary" p_name="-" flags="lvm" lvm_group="system"/>
</disk>

In the LVM section, make sure that the scratch partition takes the remaining space:

<lvm version="2">
	<lvm_group name="system" max_log_vols="0" max_phys_vols="0" phys_extent_size="32768K">
	<lv name="root" size="10485760K" />
	<lv name="swap" size="8388608K" />
	<lv name="tmp" size="10485760K" />
	<lv name="var" size="10485760K" />
	<lv name="local" size="*" />
	</lvm_group>
</lvm>

The “*” entry will be translated appropriately into the statements of the shell script generated from this XML file.
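
To get a feeling for how much space the “*” volume receives on a given node, the arithmetic can be sketched as follows. The Python code reads the LVM section shown above and subtracts the fixed sizes from a volume group size. The 200 GiB volume group is a hypothetical value, and the calculation ignores LVM metadata and extent rounding, so the real result will be slightly smaller.

```python
import xml.etree.ElementTree as ET

LVM_XML = """
<lvm version="2">
  <lvm_group name="system" max_log_vols="0" max_phys_vols="0" phys_extent_size="32768K">
    <lv name="root" size="10485760K" />
    <lv name="swap" size="8388608K" />
    <lv name="tmp" size="10485760K" />
    <lv name="var" size="10485760K" />
    <lv name="local" size="*" />
  </lvm_group>
</lvm>
"""


def remaining_space_kib(lvm_xml, group_size_kib):
    """Return (name, size in KiB) of the volume marked with size="*"."""
    root = ET.fromstring(lvm_xml.strip())
    fixed_kib = 0
    star_volumes = []
    for lv in root.iter("lv"):
        size = lv.get("size")
        if size == "*":
            star_volumes.append(lv.get("name"))
        else:
            fixed_kib += int(size.rstrip("K"))  # sizes are given as "<n>K"
    if len(star_volumes) != 1:
        raise ValueError("exactly one volume should use size='*'")
    return star_volumes[0], group_size_kib - fixed_kib


if __name__ == "__main__":
    group_kib = 200 * 1024 * 1024  # hypothetical 200 GiB volume group
    name, kib = remaining_space_kib(LVM_XML, group_kib)
    print(f"{name}: {kib / (1024 * 1024):.0f} GiB")  # prints local: 162 GiB
```

The fixed volumes add up to 38 GiB (10 + 8 + 10 + 10), so on any node the scratch volume is the volume group size minus roughly 38 GiB.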

The images used for the installation of Marc2

The software base for the Marc2 setup is CentOS 6.2. It shares the software baseline of Red Hat Enterprise Linux Server 6.2 and offers the drivers needed to access the relatively new hardware used. Moreover, it packages the OpenFabrics Enterprise Distribution (OFED) and a newer gcc version (4.4.6) which supports OpenMP.

The use of SystemImager suggests preparing a separate image from a golden client of each node type. The imaging concept allows images to be defined for all types of nodes in the cluster, including service nodes such as the master node or file server nodes. The images follow a naming convention: each image name consists of the cluster name (marc2) and a suffix denoting the type. The following types are currently used:

  • L: login node
  • C: compute node
  • M: master node
  • F: file server node
  • SM: subnet manager node (not used)

Image usage

To get an overview of which images are currently available and in use, run

marc2-h1:~ # psnodes-getimage 

Image                     RetrievalTime    GoldenClient    Nodes
Marc2.CentOS62.C2         2012.02.24_16:00 node001         node[002-088]
Marc2.RHEL62.M            2012.02.24_11:03 marc2-h1        marc2-h2
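
The Nodes column uses a compact range notation such as node[002-088]. If the individual node names are needed, e.g. to loop over them in a script, the notation can be expanded as in the following Python sketch. The syntax handled here is inferred from the sample output above only. Comma-separated lists or multiple ranges are not covered.

```python
import re

# prefix[first-last], e.g. node[002-088]
RANGE_RE = re.compile(r"^([A-Za-z0-9_-]*?)\[(\d+)-(\d+)\]$")


def expand_nodes(expr):
    """Expand 'node[002-088]' into ['node002', ..., 'node088'].

    A plain name such as 'marc2-h2' is returned as a one-element list.
    """
    match = RANGE_RE.match(expr)
    if match is None:
        return [expr]
    prefix, first, last = match.groups()
    width = len(first)  # keep the zero padding of the first number
    return [f"{prefix}{n:0{width}d}" for n in range(int(first), int(last) + 1)]


if __name__ == "__main__":
    nodes = expand_nodes("node[002-088]")
    print(len(nodes), nodes[0], nodes[-1])  # prints 87 node002 node088
```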

Updating an existing image

To update an existing image from a pre-defined golden client, use

marc2-h1:~ # psnodes-getimage node001
image update reason: for testing only

This program will get the "Marc2.CentOS62.C2" system image from "node001"



An image named "Marc2.CentOS62.C2" already exists...
Update existing image? ([y]/n): y
...

Use the golden client listed previously to update the corresponding image.

Updating nodes

To update all nodes currently installed with a particular image, use

marc2-h1:~ # psnodes-update -i Marc2.CentOS62.C2
updating image Marc2.CentOS62.C2 only
update all nodes (y|[n])? y

All appropriate nodes will be synchronized with the selected image.

Re-installing a node

To (re-)install a node, use psnodes-reinstall:

marc2-h1:~ # psnodes-reinstall n42
NodeClass: dell
re-installing node042 with image Marc2.CentOS62.C2 ok ([y]|n)? y
...

The image to be used may be changed by selecting n. The node will be prepared to boot an installation environment using PXE at the time of the next reboot. After loading this installation environment, the node's disk will be wiped out and the selected image will be copied to the newly formatted disk. All system-dependent configuration will be done and all override files will be installed. At the end, the node will be rebooted from the newly installed local disk.

Replacing a node

If a repair has changed the MAC address of the Ethernet interface (e.g. after a motherboard replacement or the replacement of the complete node), the system-internal node database has to be modified to include the new MAC address of NIC2. This is done with

psnodes-replace -m {new MAC address} {node}

The command updates the node database and starts the re-installation by power cycling the node. If power cycling fails (because the iDRAC IP address is not configured or IPMI is not working), the node can also be switched on manually.

Preliminaries for replacing a node

  • PXE must be activated on NIC2 (restart node, press F2 for System Setup, Integrated Devices: Disable PXE on NIC1, Enable PXE on NIC2; restart node, press F2 again, Boot Order: 1. Embedded NIC2, 2. SATA Optical Drive, 3. Hard Drive C:)
  • System Setup (F2): Embedded Server Management: User-defined LCD string: node-xxx (e.g. node-068)
  • System Setup (F2): Memory Settings: System Memory Testing: Enabled (may be cancelled once by Esc key during each node startup)
  • System Setup (F2): iDRAC settings: IP address 172.26.8.x (e.g. 172.26.8.68 on node068), Subnet mask 255.255.255.0, Gateway 172.26.8.250
  • On the iDRAC front LCD: Setup: Home screen: User string
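
The per-node values in the list above follow from the node number. The following Python sketch derives them from a node name so that they can be looked up quickly while standing at the machine. The mapping "last octet = node number" is inferred from the single example given above (172.26.8.68 on node068), and the function name is hypothetical.

```python
import ipaddress
import re

IDRAC_NETWORK = ipaddress.IPv4Network("172.26.8.0/24")
IDRAC_GATEWAY = ipaddress.IPv4Address("172.26.8.250")


def idrac_settings(node_name):
    """Return the LCD string and iDRAC network settings for e.g. 'node068'."""
    match = re.fullmatch(r"node(\d{3})", node_name)
    if match is None:
        raise ValueError(f"unexpected node name: {node_name!r}")
    number = int(match.group(1))
    address = IDRAC_NETWORK.network_address + number
    # Reject numbers that do not map to a usable host address.
    if not 1 <= number <= 254 or address == IDRAC_GATEWAY:
        raise ValueError(f"no iDRAC address available for {node_name!r}")
    return {
        "lcd_string": f"node-{number:03d}",
        "ip": str(address),
        "netmask": str(IDRAC_NETWORK.netmask),
        "gateway": str(IDRAC_GATEWAY),
    }


if __name__ == "__main__":
    for key, value in idrac_settings("node068").items():
        print(f"{key}: {value}")
```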