Configuration

The UMD HEP T3 cluster is composed of one head node (HN), one grid node (GN), two interactive nodes (INs), and eight worker nodes (WNs). After RAID and formatting, we have ~9TB of disk space for large datasets, ~400GB for network-mounted software such as CMSSW, and ~400GB for users' network-mounted /home. Our cluster is managed by Rocks and is designed to have full T3 capability, including a storage element. It is on the Open Science Grid (OSG) and affiliated with the CMS virtual organization (VO).

Last edited August 17, 2009

Node Roles

The OSG Site Planning guide played an important role in the design of our cluster. Our head node (HN) distributes the OS and basic configuration to all other nodes via Rocks Kickstart files and runs the Squid web proxy for accessing CMSSW's Frontier database. The grid node (GN) runs the OSG computing element (CE), storage element (SE), PhEDEx, and CMSSW. Users log in to and run interactive jobs on the two interactive nodes (INs), which have locally installed gLite-UI & CRAB software. The eight worker nodes (WNs) are members of the condor pool and service batch jobs submitted either by local users or by grid users within our supported VOs (primarily CMS).

Head node:

external name: hepcms-hn.umd.edu
internal name: HEPCMS-0 (for historical reasons)

Grid node:

external name: hepcms-0.umd.edu
internal name: grid-0-0

Having one node fulfill the four important roles of CE, SE, PhEDEx service, and CMSSW network mount is not a scalable solution. We do this because splitting the roles is not practical on such a small cluster.

Some implementations of PhEDEx run atop gLite-UI, which may cause problems with the Rocks frontend, OSG CE or SE. Additionally, some CRAB installations (such as ours) can run atop gLite-UI, which may need to be configured differently for CRAB vs. PhEDEx. Our PhEDEx installation uses simple srm commands instead of the specialized file transfer service (FTS), which requires gLite-UI. A PhEDEx installation which uses gLite-UI should not be on the OSG CE or SE, a Rocks frontend, or on a node with gLite-UI configured for CRAB.

Two interactive nodes:

external names: hepcms.umd.edu points to hepcms-in1.umd.edu & hepcms-in2.umd.edu
internal names: interactive-0-0 & interactive-0-1

Note that gLite-UI does not behave well on a Rocks frontend (some tarball installations of gLite-UI seem better behaved). Our CRAB installation, which is based on gLite-UI, therefore cannot be installed on the HN, nor on the GN, where it has similar problems with the OSG CE & SE. However, CRAB does support job submission to European sites using Condor GlideIn to some CrabServers, which does not require gLite-UI.

Eight worker nodes:

external names: hepcms-1.umd.edu -> hepcms-8.umd.edu
internal names: compute-0-0 -> compute-0-7
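
The worker node naming convention can be written out as a short Python sketch. It assumes the numbering shown in the network table later on this page (external names counted from 1, Rocks names counted from 0); the function name worker_names is made up for illustration and is not part of Rocks.

```python
def worker_names(count=8):
    """Return (external hostname, Rocks name) pairs for the worker nodes."""
    pairs = []
    for i in range(count):
        external = f"hepcms-{i + 1}.umd.edu"  # external names start at 1
        internal = f"compute-0-{i}"            # Rocks names start at 0
        pairs.append((external, internal))
    return pairs


if __name__ == "__main__":
    for external, internal in worker_names():
        print(f"{external:20s} {internal}")
```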

 


 

Hardware

HN: Dell PowerEdge 2950

GN: Dell PowerEdge 2950

INs: Dell PowerEdge 1950

WNs: Dell PowerEdge 1950

PowerVault MD1000 (aka big disk)

PowerConnect 6224

APS 2200 VA

PowerEdge 2160AS KVM switch

 


 

Partitions

Head node:

/dev/sda  69374, RAID-1 67.75 GB physical disks 0:0:0, 0:0:1 :
root/      8189 /sda1 ext3
swap       8189 /sda2 swap
/var       4095 /sda3 ext3
/sda4 is the extended partition which includes /sda5
/scratch  48901 /sda5 ext3

/dev/sdb 418168, RAID-5 408.38 GB physical disks 0:0:2, 0:0:3, 1:0:4, 1:0:5 :
/export  418168  /sdb1 ext3

Grid node:

/dev/sda   69374, RAID-1 67.75 GB physical disks 0:0:0, 0:0:1 :
root/       8189 /sda1 ext3
/tmp        8189 /sda2 ext3
swap        4094 /sda3 swap
/sda4 is the extended partition which includes the rest
swap        4095 /sda5 swap
/var        4095 /sda6 ext3
/localsoft 40712 /sda7 ext3

/dev/sdb 418168, RAID-5 408.38 GB physical disks 0:0:2, 0:0:3, 1:0:4, 1:0:5 :
/scratch 418168  /sdb1 ext3

/dev/sdc 9744877, RAID-6, 8.9 TB 15 physical disks
(Logical volume)
/data    9744877 /dev/mapper/datastore-cmsdata0 xfs

Interactive nodes:

/dev/sda         134.8 GB :
root/              7.9 GB /sda1 ext3
swap               7.9 GB /sda2 swap
/var               4.0 GB /sda3 ext3
/sda4 is the extended partition which includes the rest
/scratch           115 GB /sda5 ext3

/dev/sdb         134.8 GB :
/tmp             134.8 GB /sdb1 ext3

Worker nodes:

/dev/sda          76293 :
root/              8192 /sda1 ext3
swap               8192 /sda2 swap
/var               4096 /sda3 ext3
/scratch          55813 /sda4 ext3

/dev/sdb         238418 :
/tmp             238418 /sdb1 ext3
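
As a sanity check on the tables above, the partition sizes on each system disk should add up to the disk total. The following Python sketch copies the numbers from the HN, GN and WN /dev/sda tables (in whatever unit those tables use, presumably MB) and reports any unallocated remainder. The names LAYOUTS and unallocated are illustrative only.

```python
# (disk total, [partition sizes]) copied from the partition tables above
LAYOUTS = {
    "HN /dev/sda": (69374, [8189, 8189, 4095, 48901]),
    "GN /dev/sda": (69374, [8189, 8189, 4094, 4095, 4095, 40712]),
    "WN /dev/sda": (76293, [8192, 8192, 4096, 55813]),
}


def unallocated(total, partitions):
    """Space on the disk that is not covered by any listed partition."""
    return total - sum(partitions)


if __name__ == "__main__":
    for disk, (total, partitions) in LAYOUTS.items():
        print(f"{disk}: {unallocated(total, partitions)} unallocated")
```

The extended partition (/sda4 on the HN and GN) is deliberately left out of the lists, since it only contains the logical partitions that follow it.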

Big disk array:

The entire disk array is treated as a single drive in the OS. We use RAID-6 so that a single disk failure does not cause a significant performance loss and so that our data survives a dual disk failure. This disk is treated as a logical volume in the OS. Our disk array can be connected to up to two additional arrays in a daisy chain. By using LVM, we can install additional arrays and simply extend the logical volume over the newly available space. We use the XFS file system, which is designed to handle large disk volumes and has been documented to perform well with BeStMan. While we do not use BeStMan in a pure storage resource manager (SRM) capacity, the ability to do so later may become necessary as the size of the volume increases. The disk array, at the present time, is managed by the OS and is network mounted as /data on all nodes. This makes the array much more accessible to users, but is not a scalable solution. After RAID-6 and formatting, our disk array is roughly 9TB in size.
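
To illustrate the RAID trade-off described above, here is a rough Python sketch of how many disks' worth of capacity each of our RAID levels leaves for data. It uses only the standard definitions (RAID-1 mirrors, RAID-5 spends one disk on parity, RAID-6 spends two) and the disk counts listed in the partition tables. It does not assume any particular disk size, and the function name data_disks is hypothetical.

```python
def data_disks(level, n_disks):
    """Number of disks' worth of usable capacity for a RAID set."""
    if level == "RAID-1":
        return 1              # mirror: capacity of a single disk
    if level == "RAID-5":
        return n_disks - 1    # one disk's worth of parity
    if level == "RAID-6":
        return n_disks - 2    # two disks' worth of parity
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    # (RAID level, number of physical disks) as listed in the partition tables
    arrays = [("RAID-1", 2), ("RAID-5", 4), ("RAID-6", 15)]
    for level, n in arrays:
        print(f"{level} over {n} disks: {data_disks(level, n)} data disks")
```

For the big disk array this gives 13 of the 15 disks as usable capacity, with the other two disks' worth spent on the parity that lets the array survive a dual disk failure.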

 


 

Network

For security purposes, port information is not listed here. It can be read (by the root user only) in the file ~root/network-ports.txt on the HN.

external IP  : external hostname  : internal IP    : Rocks name 
--------------------------------------------------------------------
    N/A      :   N/A (switch)     : 10.255.255.254 : network-0-0
128.8.164.11 : hepcms-hn.umd.edu  : 10.1.1.1       : HEPCMS-0 
128.8.164.12 : hepcms-0.umd.edu   : 10.255.255.237 : grid-0-0 
128.8.164.13 : hepcms-1.umd.edu   : 10.255.255.253 : compute-0-0 
128.8.164.14 : hepcms-2.umd.edu   : 10.255.255.252 : compute-0-1 
128.8.164.15 : hepcms-3.umd.edu   : 10.255.255.251 : compute-0-2 
128.8.164.16 : hepcms-4.umd.edu   : 10.255.255.250 : compute-0-3 
128.8.164.17 : hepcms-5.umd.edu   : 10.255.255.249 : compute-0-4 
128.8.164.18 : hepcms-6.umd.edu   : 10.255.255.248 : compute-0-5 
128.8.164.19 : hepcms-7.umd.edu   : 10.255.255.247 : compute-0-6 
128.8.164.20 : hepcms-8.umd.edu   : 10.255.255.246 : compute-0-7 
128.8.164.21 : hepcms-in1.umd.edu : 10.255.255.236 : interactive-0-0 
128.8.164.22 : hepcms-in2.umd.edu : 10.255.255.235 : interactive-0-1

internal network always on eth0
external network always on eth1

External Gateway: 128.8.164.1
Netmask for external internet: 255.255.255.0
Netmask for internal network (on HN): 255.0.0.0
DNS for external internet: 128.8.74.2, 128.8.76.2
DNS for internal network (on HN): 10.1.1.1

The command 'dbreport dhcpd' issued from the HN can provide much of this information, including MAC addresses.
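
The addresses in the table can also be cross-checked against the netmasks listed above. The sketch below uses Python's standard ipaddress module and a handful of rows copied from the table. The external network 128.8.164.0 is inferred from the gateway and netmask rather than stated on this page, and the names HOSTS and misplaced are illustrative only.

```python
import ipaddress

# External network inferred from gateway 128.8.164.1 and netmask 255.255.255.0
EXTERNAL_NET = ipaddress.ip_network("128.8.164.0/255.255.255.0")
# Internal network from the HN netmask 255.0.0.0
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/255.0.0.0")

# (external IP, internal IP, Rocks name): a subset of the table above
HOSTS = [
    ("128.8.164.11", "10.1.1.1", "HEPCMS-0"),
    ("128.8.164.12", "10.255.255.237", "grid-0-0"),
    ("128.8.164.13", "10.255.255.253", "compute-0-0"),
    ("128.8.164.20", "10.255.255.246", "compute-0-7"),
    ("128.8.164.21", "10.255.255.236", "interactive-0-0"),
    ("128.8.164.22", "10.255.255.235", "interactive-0-1"),
]


def misplaced(hosts):
    """Return (Rocks name, address) pairs that fall outside their network."""
    bad = []
    for external, internal, name in hosts:
        if ipaddress.ip_address(external) not in EXTERNAL_NET:
            bad.append((name, external))
        if ipaddress.ip_address(internal) not in INTERNAL_NET:
            bad.append((name, internal))
    return bad


if __name__ == "__main__":
    problems = misplaced(HOSTS)
    print("all addresses consistent" if not problems else problems)
```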