Open Science Grid

Description Install, configure, and run an Open Science Grid compute and storage element using BeStMan-Gateway with Hadoop.
Notes

Page being actively updated March 2014

These instructions are for OSG 3.0. Our configuration is based primarily on historical choices that were optimized for older versions of OSG and for the T3_US_UMD hardware, and these choices are occasionally difficult to use in an OSG3 setup.
You are better off following the official OSG3 Release documentation for design and setup of your cluster.
We configured a combined CE and SE as described here.
The OSG 1.2 version of this web page is archived for local documentation purposes only.

Warning Never blindly follow commands from this or any other guide. Some may be optional, some must be run on different machines as different users, and some may be valid only during initial setup and not during a re-install.
Last modified March 11, 2014

Set up yum:

Description Install the EPEL and OSG repositories and set up yum priorities
Dependencies None
Notes Each of the OSG3 guides instructs you to do this as the first step when installing that particular client
Guides OSG3 Using yum and RPM

As root (su -) on all nodes, follow the guide above (this can be done using Rocks commands on the HN instead).
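One way to confirm the result on a node is to check that every repository file carries a priority. This is a minimal sketch, not part of the official procedure: the function name repos_without_priority is hypothetical, and it assumes that repository definitions are .repo files in one directory (normally /etc/yum.repos.d) and that the yum priorities plugin reads a priority= key from them.

```shell
# List the .repo files in a directory that contain no "priority=" line.
# Hypothetical helper; the default directory is an assumption.
repos_without_priority() {
    repo_dir="${1:-/etc/yum.repos.d}"
    for f in "$repo_dir"/*.repo; do
        [ -e "$f" ] || continue        # directory has no .repo files at all
        # -q quiet, -E extended regex; allow optional spaces around "="
        if ! grep -qE '^[[:space:]]*priority[[:space:]]*=' "$f"; then
            echo "$f"
        fi
    done
}
```

Calling repos_without_priority with no argument prints nothing when every repository file has at least one priority line. A file with several sections is only reported if none of them sets a priority.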

Install OSG Client on the Interactive Nodes:

Description Install the OSG Client on the Interactive Nodes
Dependencies None.
- Steps shown assume you set up yum above
Notes These are only additional notes, follow the official OSG guide. We choose to have certificates update automatically using osg-ca-scripts.
Guides

- OSG guide to installing the OSG Client
- OSG guide to installing the certificate authorities

Request host certificates:

Description Get site and service certificates.
Dependencies

- Request comes from CMS T3 sysadmin with grid-admin privileges
- Machine you request the certificates on must have the OSG client installed
- Steps shown assume you have configured RPM above

Notes These are only additional notes, follow the official OSG guide to get certificates.
Guides - OSG guide to getting host and service certificates
- OSG guide for certificate request as GridAdmin

If you choose to install GUMS, you will also need host and http certificates for your GUMS host. We install GUMS on our SE node, so place two more requests using hepcms-1.umd.edu instead of hepcms-0.umd.edu. Configure the certificates as tomcat-owned http certificates (with appropriate permissions, as above) following the GUMS guide.

If the authority that provides the site certificates changes, GUMS must be updated accordingly for the http and rsv services.

If you are doing a yearly replacement of certificates, be sure to stop and start the OSG services and then test the grid services. Test again after the date on which the old certificates expire, in case you forgot to replace something.
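Because a forgotten certificate only fails once the old one expires, it can help to check expiry dates directly. This is a minimal sketch, assuming the openssl command-line tool is installed and the certificates are PEM files. The function name cert_expires_within is hypothetical.

```shell
# Report whether a PEM certificate expires within the next N days.
# Hypothetical helper built on "openssl x509 -checkend".
cert_expires_within() {
    cert="$1"
    days="$2"
    [ -r "$cert" ] || { echo "cannot read $cert" >&2; return 2; }
    # -checkend takes seconds and exits non-zero if the certificate
    # will have expired by then
    if openssl x509 -noout -checkend $((days * 86400)) -in "$cert" >/dev/null; then
        echo "ok: $cert is valid for at least $days more days"
    else
        echo "WARNING: $cert expires within $days days"
        return 1
    fi
}
```

For example, cert_expires_within /etc/grid-security/hostcert.pem 30 warns a month ahead. That path is the conventional host certificate location and is an assumption here, so adjust it for each host, http, and rsv certificate you maintain.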

Install and configure the CE, BeStMan, and the WN client

Description Install all OSG core software, specifically the compute element, a BeStMan-Gateway storage element, and the worker node client.
Dependencies - Site certificates
- RPM setup on the grid node
- Disk array network mounted on all nodes
Notes We install the worker-node client, the CE, and SE all on the same node (the grid node) and the CE & SE in the same directory.
Guides

- OSG release documentation
- VDT documentation
- OSG Tier-3 guide
- OSG ports guide

Prepare the environment:

Description Prepare to install OSG by creating the appropriate directories, network mounting, installing xinetd, and changing the output of hostname.
Dependencies - Base installation directory (we use /sharesoft) network mounted on all nodes
Notes We install the worker-node client, the CE, and SE all on the same node (the grid node) and the CE & SE in the same directory.
Guides - How to change the output of hostname
  1. Create some directories that are needed only for historical OSG reasons (for the empty setup.(c)sh scripts). As root on the GN (su -):
    mkdir /scratch/osg
    mkdir /scratch/osg/ce

    touch /scratch/osg/ce/setup.sh
    touch /scratch/osg/ce/setup.csh
  2. Create directories needed for test jobs:
    mkdir /hadoop/osg
    chown root:users /hadoop/osg
    chmod 775 /hadoop/osg
  3. Have all nodes (including the GN) mount /scratch/osg on the GN as /sharesoft/osg. Edit /etc/auto.sharesoft on the HN as root (su -) and add the line:
    osg grid-0-0.local:/scratch/osg
  4. We use /tmp on the WNs as the temporary working directory for OSG jobs. If you haven't done so already, configure cron to garbage collect /tmp on all of the nodes.
  5. Either Rocks 5.4 or SL 5.4 doesn't install the xinetd service on the GN, which is needed by OSG services. Install it, start it, and add it to the boot sequence (not certain this is needed in Release3 of OSG):
    yum install xinetd
    /etc/rc.d/init.d/xinetd restart
    chkconfig --add xinetd
    chkconfig xinetd on
  6. On a Rocks appliance, the command hostname outputs the local name (in our case, grid-0-0.local) instead of the FQHN. OSG needs hostname to output the FQHN, so we modify our configuration such that hostname prints hepcms-0.umd.edu following these instructions. Specifically (note that it is probably better to do this with Rocks tools than to hard-code it):
    1. In /etc/sysconfig/network, replace:
      HOSTNAME=grid-0-0.local
      with
      HOSTNAME=hepcms-0.umd.edu
    2. In /etc/hosts, add:
      128.8.164.12 hepcms-0.umd.edu
    3. Then tell hostname to print the true FQHN:
      hostname hepcms-0.umd.edu
    4. And restart the network:
      service network restart
    5. Important: log out from the GN and log back in again before proceeding; otherwise your CE install below may not pick up the correct hostname.
  7. Follow the OSG ports guide to open up necessary ports.
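After logging back in (step 6.5), it is worth confirming that hostname really prints the public name before installing the CE. This is a minimal sketch under simple assumptions: a usable name contains a dot and does not end in .local. The function name is_public_fqhn is hypothetical.

```shell
# Succeed only if the given name looks like a public FQHN:
# it must contain a dot and must not end in ".local".
# Hypothetical helper; pass "$(hostname)" to test the live system.
is_public_fqhn() {
    case "$1" in
        *.local) return 1 ;;   # Rocks private name, e.g. grid-0-0.local
        *.*)     return 0 ;;   # contains a dot
        *)       return 1 ;;   # short name without a domain
    esac
}
```

For example, on the GN: is_public_fqhn "$(hostname)" || echo "hostname still prints a local name". The check is purely textual and does not verify that the name resolves to the address in /etc/hosts.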

Install the compute element:

Description Needs rewriting for Release3! Install the CE, set Condor as the job manager, install ManagedFork, handle port conflicts, download certs, and configure rsvuser.
Dependencies - Site certificates obtained
Notes These are only additional notes, follow the official OSG CE release docs, consulting our notes for details. We install the worker-node client, the CE, and SE all on the same node (the grid node) and the CE & SE in the same directory.
Guides - OSG release docs to install the CE
- Using condor as the OSG jobmanager
- Installing ManagedFork

Configure the CE:

Description Configure which services the CE will run and other various settings in config.ini.
Dependencies - OSG CE installed and CE environment sourced
Notes Follow the official OSG CE release docs; our config.ini is available here for reference. In OSG 1.2, config.ini is placed in the $VDT_LOCATION/osg/etc directory instead of $VDT_LOCATION/monitoring. After editing config.ini, run configure-osg -v to verify your syntax and configure-osg -c to actually configure OSG. Because we install the CE & SE software in the same directory, the config.ini that comes with the SE will overwrite the config.ini from the CE. This may also happen on subsequent updates, so keep a backup copy of config.ini in a different location.
Guides - OSG release docs to configure the CE
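Since config.ini can be overwritten by the SE install or by later updates, a timestamped copy outside the install directory makes recovery easy. This is a minimal sketch and the function name backup_config is hypothetical. The backup directory is whatever you choose.

```shell
# Copy a config file into a backup directory with a timestamp suffix
# and print the path of the copy. Hypothetical helper.
backup_config() {
    src="$1"
    dest_dir="$2"
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    stamp=$(date +%Y%m%d-%H%M%S)
    target="$dest_dir/$(basename "$src").$stamp"
    # -p preserves mode, ownership, and timestamps where possible
    cp -p "$src" "$target" && echo "$target"
}
```

For example, backup_config $VDT_LOCATION/osg/etc/config.ini /root/osg-backups before installing the SE or applying an update, where /root/osg-backups is only an illustrative location.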

Get the OSG environment:

Description Edit the .bashrc & .cshrc skeleton files so new users get the OSG environment on login.
Dependencies None
Notes Highly optional - this is just how we do it at UMD. Existing users (such as cmssoft) will have to add the source commands to their ~/.bashrc & ~/.cshrc files.
Guides

As root (su -) on the HN:

  1. Add to /etc/skel/.bashrc:
    . /sharesoft/osg/ce/setup.sh
  2. Add to /etc/skel/.cshrc:
    source /sharesoft/osg/ce/setup.csh
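If this setup is repeated (for example after a re-install), the same line should not be added twice. This is a minimal sketch of an idempotent append and the function name append_once is hypothetical.

```shell
# Append a line to a file only if that exact line is not already present,
# so re-running the setup does not duplicate the source command.
# Hypothetical helper.
append_once() {
    line="$1"
    file="$2"
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

For example: append_once ". /sharesoft/osg/ce/setup.sh" /etc/skel/.bashrc and append_once "source /sharesoft/osg/ce/setup.csh" /etc/skel/.cshrc. Existing users can use the same call on their own ~/.bashrc and ~/.cshrc.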

Edit vomses:

Description Edit the vomses file to use one proxy server for cms.
Dependencies - OSG CE installed
Notes This is optional; it is not needed if users let CRAB obtain the proxy. Do NOT remove the mis or ops entries.
Guides

As root (su -) on the GN:

  1. Remove the following line from /sharesoft/osg/ce/glite/etc/vomses:
    "cms" "voms.cern.ch" "15002" "/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch" "cms"

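The removal can also be scripted so that a backup is kept and other entries are left alone. This is a minimal sketch, assuming each vomses line begins with the quoted VO alias followed by the quoted server host, as in the line shown above. The function name remove_vomses_entry is hypothetical.

```shell
# Remove every vomses line that contains the given VO alias immediately
# followed by the given server host, keeping a backup next to the original.
# Hypothetical helper; entries for other VOs (such as mis and ops) and for
# other servers of the same VO are left untouched.
remove_vomses_entry() {
    file="$1"
    vo="$2"
    host="$3"
    cp -p "$file" "$file.bak" || return 1
    # -v invert match, -F fixed string; the pattern is: "vo" "host"<space>
    grep -vF "\"$vo\" \"$host\" " "$file.bak" > "$file"
    # grep exits 1 when it prints nothing; only a status above 1 is an error
    [ $? -le 1 ]
}
```

For example: remove_vomses_entry /sharesoft/osg/ce/glite/etc/vomses cms voms.cern.ch. The original file remains available as vomses.bak.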
Install and configure GUMS:

Description Install the service which maps a user's distinguished name in their certificate to an account on your cluster. We install GUMS on the HN.
Dependencies - OSG CE installed (for /etc/grid-security/certificates).
Notes These are only additional notes, follow the official OSG release docs, consulting our notes for details. The alternative to GUMS is the grid-mapfile service, which is a simpler way to get started. However, GUMS is highly recommended as the permanent authentication mechanism.
Guides - OSG guide to getting site certificates
- OSG GUMS guide

Follow the OSG GUMS guide to install and configure. Notes and additions:

Install & configure the gridftp-hdfs service:

Description Install the GridFTP service that transfers files to and from Hadoop (HDFS) storage.
Dependencies - GUMS installed, configured, and running
- Hadoop installed and running
Notes These are only additional notes, follow the official OSG release docs, consulting our notes for details. Full testing may only be available after the SE is configured below.
Guides - Hadoop docs for gridftp-hdfs

Install & configure the storage element:

Description Install BeStMan-Gateway to provide access to disk array via grid utilities. (NOT fixed for OSG Release3!)
Dependencies - OSG CE configured
- GUMS installed, configured, and running
- Disk array network mounted on all nodes
Notes These are only additional notes, follow the official OSG release docs, consulting our notes for details. We install the worker-node client, the CE, and SE all on the same node (the grid node) and the CE & SE in the same directory. We run BeStMan as user "best" instead of as "daemon" because we do not allow world-readable access to files on our SE. The daemon user is not in the "users" group, but the best user is.
Guides - OSG release docs for BeStMan-Gateway

Install the worker node client:

Description Install the worker node client to give worker nodes access to certificate information, system configuration, and software binaries.
Dependencies - OSG CE configured
Notes These are only additional notes, follow the official OSG release docs, consulting our notes for details.
Guides - OSG release docs for worker node client

Start the CE & SE

Description Start the compute and storage elements, check RSV tests and service logs, publish CMSSW software pins if applicable.
Dependencies - OSG CE configured
- OSG SE installed and configured
- OSG WN client installed
Notes
Guides - Globus 2.0 error codes (for debugging)

As root (su -) on the GN:

  1. Start the OSG CE & SE:
    cd /sharesoft/osg/ce
    . setup.sh
    vdt-control --on

    This starts all the services for both the CE & SE because we installed them in the same directory.
    • If you are running gridftp-hdfs, make sure gsiftp is disabled.
  2. You can perform a series of simple tests to see if your CE has basic functionality. Log in to any user account and run:
    source /sharesoft/osg/ce/setup.csh
    grid-proxy-init
    cd /sharesoft/osg/ce/verify
    ./site_verify.pl
  3. The CEmon log is kept at $VDT_LOCATION/glite/var/log/glite-ce-monitor.log.
  4. The GIP logs are kept at $VDT_LOCATION/gip/var/logs.
  5. globus & gridftp logs are kept in $GLOBUS_LOCATION/var and $GLOBUS_LOCATION/var/log.
  6. The BeStMan log is kept in $VDT_LOCATION/vdt-app-data/bestman/logs/event.srm.log.
  7. Results of the RSV probes will be visible at https://hepcms-0.umd.edu:7443/rsv in 15-30 mins. Further information can be found in the CE $VDT_LOCATION/osg-rsv/logs/probes.
  8. You can force RSV probes to run immediately (as rsvuser on GN) following these instructions: rsv-control --run --all-enabled.
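When debugging, a quick first step is to see which of the log locations listed above actually exist on the node. This is a minimal sketch and the function name check_osg_logs is hypothetical. The two root directories are passed as arguments, so nothing is assumed about where $VDT_LOCATION and $GLOBUS_LOCATION point.

```shell
# Print each expected log location and whether it currently exists.
# The relative paths are the ones listed in the steps above.
# Hypothetical helper.
check_osg_logs() {
    vdt="$1"
    globus="$2"
    for p in \
        "$vdt/glite/var/log/glite-ce-monitor.log" \
        "$vdt/gip/var/logs" \
        "$globus/var/log" \
        "$vdt/vdt-app-data/bestman/logs/event.srm.log" \
        "$vdt/osg-rsv/logs/probes"
    do
        if [ -e "$p" ]; then echo "found   $p"; else echo "missing $p"; fi
    done
}
```

For example, after sourcing setup.sh: check_osg_logs "$VDT_LOCATION" "$GLOBUS_LOCATION". A missing entry usually just means the corresponding service has not started or has not written anything yet.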

After starting the CE for the first time, the file /sharesoft/osg/app/etc/grid3-locations.txt is created. This file is used to publish VO software pins and should be edited every time a VO software release is installed or removed. If CMSSW is installed (the instructions below are repeated in the CMSSW installation):

  1. Create a cmssoft directory in the OSG app directory (the link to the CMSSW installation is created inside it in step 3):
    cd /sharesoft/osg/app
    mkdir cmssoft
    chmod 777 cmssoft
    chown cmssoft:users cmssoft
  2. Give cmssoft ownership of the release file:
    chown cmssoft:users /sharesoft/osg/app/etc/grid3-locations.txt
  3. As cmssoft (su - cmssoft), create the needed symlink in the OSG APP directory to CMSSW:
    cd /sharesoft/osg/app/cmssoft
    ln -s /sharesoft/cmssw cms
  4. As cmssoft (su - cmssoft), inform BDII which versions of CMSSW are installed and that we have the slc4_ia32_gcc345 environment. Edit /sharesoft/osg/app/etc/grid3-locations.txt to include the lines:
    VO-cms-slc4_ia32_gcc345 slc4_ia32_gcc345 /sharesoft/cmssw
    VO-cms-CMSSW_X_Y_Z CMSSW_X_Y_Z /sharesoft/cmssw
    (modify X_Y_Z and add a new line for each release of CMSSW installed)
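Because the file has to be edited for every release, the edit in step 4 can be scripted. This is a minimal sketch that follows the "VO-cms-<release> <release> <path>" layout of the lines above. The function name publish_cmssw_release is hypothetical, and the default path is the /sharesoft/cmssw location used on this page.

```shell
# Append the tag line for one CMSSW release to a grid3-locations.txt file,
# unless that exact line is already there. Hypothetical helper.
publish_cmssw_release() {
    file="$1"
    release="$2"                       # e.g. CMSSW_X_Y_Z
    sw_path="${3:-/sharesoft/cmssw}"
    entry="VO-cms-$release $release $sw_path"
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    grep -qxF -- "$entry" "$file" 2>/dev/null || printf '%s\n' "$entry" >> "$file"
}
```

For example, as cmssoft: publish_cmssw_release /sharesoft/osg/app/etc/grid3-locations.txt CMSSW_X_Y_Z, with X_Y_Z replaced by the installed release. Removing a release still has to be done by deleting its line by hand.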

Register with the Grid Operations Center (GOC):

Description Register site with OSG.
Dependencies None
Notes We used a much older registration process, but provide the options we selected for reference. Follow the OIM registration instructions guide for details as the new registration page has changed substantially.
Guides - OIM web portal
- OIM registration instructions
- BDII information about your site, once registered