Installing an iRODS slave server

From BeSTGRID

The BeSTGRID DataFabric has been deployed as a single iRODS zone. Institutions are welcome to join the BeSTGRID DataFabric by providing a local storage resource. To link the resource into the BeSTGRID single iRODS zone, the institution would have to deploy a local iRODS server acting as a slave server in the BeSTGRID zone - and handling requests for that resource. The institution can then install a local copy of the Davis web-and-webDAV interface on top of the local iRODS server. Optionally, the institution could also install a GridFTP server as an additional entry point into the DataFabric.

This guide covers the following:

  • Deployment considerations for the local iRODS server.
  • Step-by-step guide for installing the local iRODS server.
  • Step-by-step guide for a local Davis installation (optional but recommended).
  • Step-by-step guide for installing Griffin (optional, the GridFTP-to-iRODS interface).

Deployment considerations

We recommend that each site first deploy a test server, to test the whole installation process, polish off the rough edges, and be prepared for the real deployment. The test server would aim to test software interoperability and network settings - so all of the prerequisites apply as well, except for the amount of storage and the hardware performance requirements.

The recommended naming conventions:

  • For a test server: irodsdev.your.site.name (TBD)
  • For a production server: irods.your.site.name (if the system is not running Davis)
  • An additional CNAME for a production server if the site is running Davis: bestgrid-df.your.site.name

Host system selection

The deployment is primarily dependent on the location and connectivity of the actual storage resource to be made available. For performance reasons, iRODS should be as close to the storage resource as possible. Also, the network connections should be as fast as possible (dedicated 1Gbps).

The implication of this is that iRODS should not be installed on a standard virtual machine mounting the storage resource over NFS. NFS writes are painfully slow and this would make storing a large file into the system unacceptably slow. Ideally, iRODS should be installed on a system "as close to the data as possible". E.g., at Canterbury, iRODS is running on an IBM p520 running AIX, directly linked into the GPFS filesystem where the storage resource resides. For a deployment in a virtual machine, a direct fibre-channel HBA dedicated to the VM is a good solution. iSCSI over a fast network connection (see below) would also work.

If deploying inside a virtual machine, it's highly recommended to use a dedicated Ethernet adapter, used exclusively by the virtual machine. In our experience with Xen, high-throughput data traffic over the bridged network was straining the CPU in the dom0 host too much. Using a dedicated Ethernet adapter (allocated to the VM at the PCI level) removed the excessive CPU load. Similar rules apply to other virtualization platforms.

Note that these recommendations are the "ideal case". If you can't meet them at your site, do your best to get as close as possible. It is still better to have a slightly slower resource linked into the BeSTGRID DataFabric than to have no resource linked at all - these recommendations are not dogma.


For a test server, installing iRODS in a standard virtual machine, using only a local directory as a storage resource is perfectly fine.
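
One way to check whether a candidate storage path sits on a network filesystem is to look at its filesystem type. The sketch below assumes GNU coreutils stat (as found on Linux). The function names and the list of types treated as network filesystems are illustrative assumptions, not part of iRODS.

```shell
# Classify a filesystem type name as network-mounted or not.
# The list of types is an assumption; extend it for your site.
is_network_fs() {
    case "$1" in
        nfs|nfs3|nfs4|cifs|smb) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the filesystem type of a path (GNU stat only).
fs_type_of() {
    stat -f -c %T "$1"
}

# Example: warn if the intended storage directory is on a network filesystem.
# /tmp stands in for your storage resource path here.
vault=/tmp
if is_network_fs "$(fs_type_of "$vault")"; then
    echo "WARNING: $vault is on a network filesystem"
fi
```

A GPFS or locally attached filesystem would pass this check; a path mounted over NFS would trigger the warning.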

Software selection

Each institution's iRODS server should be running:

  • iRODS server. All sites must run the same version; at the time of writing, this is version 2.3 (with a selection of patches, see below). The purpose of running a local iRODS server is two-fold: (i) provide access to a local storage resource and (ii) provide a local entry point into the DataFabric. The iRODS server will be configured as a slave server without support for database interaction or the metadata catalogue (these functions will be handled by the master server).
  • Davis (version 0.9.1 at the time of writing). Davis provides the web and webDAV interface into iRODS - both to files on the local storage resource, and to the whole contents of the BeSTGRID DataFabric. Installing Davis is optional - local users can still use the central Davis instance. For performance and easier accessibility, an institution can choose to deploy a local copy of Davis and recommend it to local users as the preferred entry point into the DataFabric.
  • Griffin (optional, version 0.7.1 at the time of writing). Griffin provides a GridFTP interface into the iRODS virtual filesystem. Deploying Griffin is optional, and probably unnecessary at each institution. The reason to deploy Griffin would be to use the DataFabric from within the computational grid (and to have fast local access instead of going through the central server).
  • Note: it is possible to run Davis and iRODS (and Griffin) on separate systems - please interpret the relevant sections of the guide accordingly.
  • Note also that theoretically, it would be possible to install only Davis and point it to the central master iRODS server - but, there would be little benefit from doing the whole exercise, and this guide does not recommend it.

Prerequisites

OS requirements

iRODS itself is supported on a number of systems, and the selection is more a matter of what integrates well with the storage system and with the institutional ICT infrastructure. iRODS is known to run well on a number of Linux distributions, as well as other POSIX systems (AIX,...). Unless there's another reason, the first recommendation would be to go for Linux, and within that, for CentOS 5. But other choices would be very well acceptable for iRODS itself.

Davis is a web application, and should be installed together with the Shibboleth Service Provider software to facilitate Shibboleth login into Davis. Installing (and possibly compiling) Shibboleth on some exotic architectures (such as AIX) could be a very challenging task - and that could be a reason for splitting Davis and iRODS into separate hosts (as it was done at Canterbury).

Griffin is a Java application and would probably run anywhere Java runs. Linux / CentOS 5 is again recommended.

A 64-bit OS is preferred if available.

Hardware requirements

  • RAM (minimum/recommended):
    • iRODS only: 1024MB / 2048MB
    • iRODS+Davis: 2048MB / 4096MB
    • Davis only: 1024MB / 2048MB

(The memory requirements can be halved for a test server).

  • CPU: a reasonably up-to-date dual-core system.
  • Swap: twice the RAM (recommended)
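
The RAM and swap figures above can be checked on a Linux host by reading /proc/meminfo. The following is a minimal sketch under that assumption; the helper names are made up for this example. Note that MemTotal excludes memory reserved by the kernel, so a machine with exactly the minimum installed reports slightly less, and the result should be treated as a hint only.

```shell
# Read a field (in kB) from /proc/meminfo-style text on stdin.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }'
}

# Succeed if total RAM ($1, in kB) meets the minimum ($2, in MB).
ram_ok() {
    [ "$1" -ge $(( $2 * 1024 )) ]
}

# Succeed if swap ($2, in kB) is at least twice the RAM ($1, in kB).
swap_ok() {
    [ "$2" -ge $(( $1 * 2 )) ]
}

if [ -r /proc/meminfo ]; then
    ram=$(meminfo_kb MemTotal < /proc/meminfo)
    swap=$(meminfo_kb SwapTotal < /proc/meminfo)
    # 1024 is the "iRODS only" minimum; use 2048 for iRODS+Davis.
    ram_ok "$ram" 1024 || echo "RAM is below the 1024MB minimum for iRODS only"
    swap_ok "$ram" "$swap" || echo "swap is less than twice the RAM"
fi
```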

Data resource integration

Your iRODS server needs to have direct filesystem access to the storage resource (as discussed above, performance matters and NFS mounts are highly discouraged). All of the files stored on the resource will be owned by a single unix account (rods), and the permission setup should be simple (no need to grant root access to the filesystem).

Network requirements

  • The server needs a public (and static) IP address.
  • The hostname must resolve to this IP address and the IP address must resolve back to the system's hostname.
  • The server needs to be able to open INcoming and OUTgoing TCP connections to ports 80(http), 443(https), 1247(irods), 5432(postgres), 2811(gridftp), 7512(myproxy), and 40000-41000 (a range of 1001 ports).
  • In addition to that, it requires INcoming + OUTgoing UDP to ports 40000-41000 (again a range of 1001 ports).
  • In addition, OUTgoing UDP to port 4810 (Globus usage statistics packets)
  • Note: The outgoing TCP traffic to ports 80 and 443 MAY go through a proxy (if the http_proxy environment variable is properly set), but all other traffic must be a direct connection.
  • Note: the port 7512 (myproxy) can be OUTgoing only (used by Davis to fetch a proxy certificate for a user).
  • Note: the port 5432 (postgres) is for future use (if we move to database replication across sites)

Note: Remember to check the firewall settings on the server itself, as your default installation may restrict the use of these ports.
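
As an illustration of the local firewall side, the sketch below prints (but does not apply) iptables rules for the incoming ports listed above. It assumes the host uses iptables and that inserting at the top of the INPUT chain suits your rule set; chain names and rule ordering differ between installations, so review the output before running any of it. Port 7512 is left out because it can be outgoing only. The function name is hypothetical.

```shell
# Print (do not apply) iptables rules for the incoming ports listed above.
# Assumption: rules are inserted at the top of the default INPUT chain.
df_firewall_rules() {
    # Fixed TCP service ports: http, https, irods, postgres, gridftp
    echo "iptables -I INPUT -p tcp -m multiport --dports 80,443,1247,5432,2811 -j ACCEPT"
    # Data-transfer port range (1001 ports), TCP and UDP
    echo "iptables -I INPUT -p tcp --dport 40000:41000 -j ACCEPT"
    echo "iptables -I INPUT -p udp --dport 40000:41000 -j ACCEPT"
}

df_firewall_rules
```

Outgoing traffic is usually unrestricted on a default host firewall; if yours filters it, the OUTPUT chain needs matching rules, including UDP port 4810.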

Other requirements

  • The system is set up to send outgoing email (i.e., typically, the default SMTP relay would be set to the site's local SMTP server).
    • Note: it is a requirement that the SMTP server does not overwrite the sender domain (in the From: address) - the domain must stay as the full hostname.
  • The system is configured for time synchronization with a reliable time source.

Certificates

Before proceeding, obtain a grid host certificate for this system from the APACGrid CA. The name in this certificate should be how other systems on the grid call your system (and must be the same as what your IP address resolves back to). If installing Davis on a separate system, also get a separate grid certificate for that system (again with the CN matching the reverse lookup of the system's IP).

For Davis, get a "commercial" certificate that is trusted by major browsers. This may depend on your site's policies and supplier preferences - just follow them, there's nothing special about this certificate, it only has to be trusted by browsers. The name in this certificate should be how your users will call this system. This may be the same as the iRODS system, or it can be a CNAME alias, or it can be a different hostname if Davis is installed on a separate system.
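
To confirm that a certificate's CN matches the name you expect, the subject line printed by openssl can be parsed. The sketch below is one way to do it; the helper names are invented for this example, and the pattern handles the two subject layouts commonly printed by openssl (slash-separated and comma-separated).

```shell
# Extract the CN from an "openssl x509 -subject" output line on stdin.
# Handles both "/C=../CN=host" and "C = .., CN = host" layouts.
cn_from_subject() {
    sed -n 's/.*CN *= *\([^/,]*\).*/\1/p'
}

# Succeed if the CN of certificate file $1 equals hostname $2.
cert_matches_host() {
    cn=$(openssl x509 -noout -subject -in "$1" | cn_from_subject)
    [ "$cn" = "$2" ]
}

# Usage (once a certificate file is available):
#   cert_matches_host /etc/grid-security/hostcert.pem "$(hostname -f)" \
#       || echo "certificate CN does not match the hostname"
```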


Installing Globus

iRODS will be built with support for Globus and GSI (Grid Security Infrastructure). Hence, the first part is to install Globus, including the development headers and libraries. Besides installing Globus, it will also be necessary to properly set up GSI (host certificate, CA certificates, CRLs...).

The easier path is probably to install Globus from VDT, as it comes with tools for handling CA certificates and all the related tasks. An alternative is to install Globus from source and install CA certificates separately.

Installing Globus from VDT

This is the easier path, suitable for systems supported by VDT.

Installing host certificate

  • Install your host certificate into /etc/grid-security as hostcert.pem / hostkey.pem
mkdir -p /etc/grid-security
chown -R root:root /etc/grid-security
chmod 755 /etc/grid-security
...
chmod 644 /etc/grid-security/hostcert.pem
chmod 600 /etc/grid-security/hostkey.pem
chown root.root /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem
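
After running the commands above, the resulting permissions can be verified with a small helper. This is a sketch assuming GNU stat (Linux); the function name is made up for this example.

```shell
# Succeed if the octal permission bits of file $1 equal the mode $2.
# Assumes GNU stat (Linux).
has_mode() {
    [ "$(stat -c %a "$1")" = "$2" ]
}

# Usage after installing the certificate:
#   has_mode /etc/grid-security/hostcert.pem 644 || echo "hostcert.pem: unexpected mode"
#   has_mode /etc/grid-security/hostkey.pem  600 || echo "hostkey.pem: unexpected mode"
```

GSI tools generally refuse a private key that is readable by anyone other than its owner, so the 600 mode on hostkey.pem is the one that matters most.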


Installing the VDT Globus-Base-SDK package

  • Install the VDT Globus-Base-SDK package
  • As root, download and set up pacman:
mkdir /opt/vdt
cd /opt/vdt
wget http://physics.bu.edu/pacman/sample_cache/tarballs/pacman-latest.tar.gz
tar xf pacman-*.tar.gz
cd pacman-*/ && source setup.sh && cd ..
  • Set the environment:
    export VDTMIRROR=http://vdt.cs.wisc.edu/vdt_200_cache
  • Run the pacman installer to install Globus-Base-SDK:
    pacman -get $VDTMIRROR:Globus-Base-SDK
  • Wait about a minute or two for the installer to prompt you to agree to licenses.
  • Have a cup of coffee - the download and installation may take 15-30 minutes.
  • Make the environment variable setup script created by VDT load in the default profile
ln -s /opt/vdt/setup.sh /etc/profile.d/vdt.sh
. /etc/profile

Configure VDT certificate distribution

VDT comes with a tool to download and update a certificate distribution, but requires the user to make an (informed) choice on which certificate distribution to trust. The VDT team is also creating a convenient distribution based on IGTF - but we do need to configure this tool to point to this distribution.

  • Run the following command to select the VDT distribution and install it into /etc/grid-security/certificates
    vdt-ca-manage setupca --location root --url vdt
    • Note: behind the scenes, the tool adds the following line to $VDT_LOCATION/vdt/etc/vdt-update-certs.conf:
      cacerts_url = http://vdt.cs.wisc.edu/software/certificates/vdt-igtf-ca-certs-version


  • Install the ARCS SLCS1 CA bundle: based on the instructions at http://wiki.arcs.org.au/bin/view/Main/SLCS, do the following steps:
    • Get the SLCS1 CA bundle, extract it into /etc/grid-security (creates arcs-slcs-ca subdirectory) and copy the files into /etc/grid-security/certificates
cd /etc/grid-security  
wget --no-check-certificate https://slcs1.arcs.org.au/arcs-slcs-ca.tar.gz -O - | tar xvz  
cd arcs-slcs-ca 
cp * /etc/grid-security/certificates  
    • Also tell the VDT certificates updater to include the files in the next certificates update: edit /opt/vdt/vdt/etc/vdt-update-certs.conf and add:
include=/etc/grid-security/arcs-slcs-ca/1ed4795f.0 
include=/etc/grid-security/arcs-slcs-ca/1ed4795f.namespaces 
include=/etc/grid-security/arcs-slcs-ca/1ed4795f.signing_policy
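
If this step is scripted or repeated, the include lines should not be added twice. A minimal sketch of an idempotent append follows; append_once is a hypothetical helper name, and the file paths in the usage note are the ones given in the steps above.

```shell
# Append line $2 to file $1 only if that exact line is not already present.
append_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage (as root):
#   conf=/opt/vdt/vdt/etc/vdt-update-certs.conf
#   for f in 1ed4795f.0 1ed4795f.namespaces 1ed4795f.signing_policy; do
#       append_once "$conf" "include=/etc/grid-security/arcs-slcs-ca/$f"
#   done
```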

Enable VDT services

Start the VDT services - in particular, this enables cron jobs that (i) download CRLs and (ii) update the IGTF root certificate distribution.

vdt-control --enable fetch-crl vdt-rotate-logs vdt-update-certs
vdt-control --on

Installing Globus from source

This is the more difficult path - but likely to work on a broader range of systems.

Download and build Globus

  • Create a local globus user account. This account must be created beforehand (as a system account with a home directory created):
adduser globus -r --create-home
  • Create /opt/globus owned by globus
mkdir /opt/globus
chown globus.globus /opt/globus
  • As the globus user, run:
./configure --prefix=/opt/globus
make
# note: compiling Globus may take several hours
make install

Configure CA certificates for a Globus from source installation

Note that the VDT installation method effectively creates an identical CA certificates setup.

  • Install IGTF CA certificates from VDT-RPM
wget -P /etc/yum.repos.d/ http://vdt.cs.wisc.edu/vdt_rpms/vdt-ca-certs/vdt-ca-certs.repo
yum install vdt-ca-certs
  • Note: the CA certificates will not get automatically updated (as they would with the VDT CA Updater). To update the IGTF CA certificates distribution, explicitly run:
    yum --disablerepo=\* --enablerepo=vdt-ca-certs update


  • Install Fetch-CRL (2.8.5 at time of writing) from RPM
  • Run the following command regularly from a cron job:
    /usr/sbin/fetch-crl --loc /etc/grid-security/certificates --out /etc/grid-security/certificates --quiet
    • Put the following line into root's crontab (run crontab -e):
      3 1,7,13,19 * * * /usr/sbin/fetch-crl --loc /etc/grid-security/certificates --out /etc/grid-security/certificates --quiet >/dev/null 2>&1


  • Install the ARCS SLCS1 CA bundle: based on the instructions at http://wiki.arcs.org.au/bin/view/Main/SLCS, do the following steps:
    • Get the SLCS1 CA bundle, extract it into /etc/grid-security (creates arcs-slcs-ca subdirectory) and copy the files into /etc/grid-security/certificates
cd /etc/grid-security  
wget --no-check-certificate https://slcs1.arcs.org.au/arcs-slcs-ca.tar.gz -O - | tar xvz  
cd arcs-slcs-ca 
cp * /etc/grid-security/certificates  
    • As we are not using the VDT CA certificate updater, we are not registering the SLCS CA bundle files with the updater....
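
The fetch-crl crontab entry from the steps above can also be installed non-interactively. The sketch below is one possible approach, not part of the original procedure: a filter that reads the current crontab and adds the entry only if it is missing. The function name is hypothetical.

```shell
# Read an existing crontab on stdin and print it with the given entry
# added, unless an identical line is already there.
crontab_with_entry() {
    awk -v entry="$1" '
        { print }
        $0 == entry { found = 1 }
        END { if (!found) print entry }
    '
}

# Usage (as root); "crontab -l" prints nothing useful if no crontab exists yet:
#   entry='3 1,7,13,19 * * * /usr/sbin/fetch-crl --loc /etc/grid-security/certificates --out /etc/grid-security/certificates --quiet >/dev/null 2>&1'
#   crontab -l 2>/dev/null | crontab_with_entry "$entry" | crontab -
```

Running the usage lines a second time leaves the crontab unchanged.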

Setup Globus environment variables for a Globus from source installation

Note that these environment variables are supplied for you if Globus is installed by VDT

  • Create /etc/profile.d/globus.sh with the following contents:
GLOBUS_LOCATION=/opt/globus
GLOBUS_TCP_PORT_RANGE=40000,41000
export GLOBUS_LOCATION
export GLOBUS_TCP_PORT_RANGE

. $GLOBUS_LOCATION/etc/globus-user-env.sh
PATH=$GLOBUS_LOCATION/prima/bin:$PATH
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$GLOBUS_LOCATION/prima/lib
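
A quick sanity check of the port range setting can catch typos before they cause transfer failures. The following sketch validates a "min,max" value and prints how many ports it covers; the function name is made up for this example, and the fallback value is the one from the file above.

```shell
# Validate a GLOBUS_TCP_PORT_RANGE value of the form "min,max" and
# print the number of ports it covers. Fails on malformed input.
port_range_size() {
    case "$1" in
        *,*) ;;
        *) return 1 ;;
    esac
    min=${1%%,*}
    max=${1##*,}
    case "$min$max" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$min" -le "$max" ] 2>/dev/null || return 1
    echo $(( max - min + 1 ))
}

port_range_size "${GLOBUS_TCP_PORT_RANGE:-40000,41000}" \
    || echo "invalid GLOBUS_TCP_PORT_RANGE"
```

For the range used in this guide the result agrees with the 1001 ports quoted in the network requirements.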

Installing iRODS

Credits: This section is based around the various documents in the ARCS iRODS wiki

  • Create rods group and user (provide custom uid/gid as suitable for your environment)
groupadd rods
useradd -g rods -m -d /home/rods -c "iRODS" rods
  • Note: the full Globus environment must be set up while running irodssetup (otherwise finishSetup breaks with "Cannot scramble password")
  • Create /opt/iRODS owned by the rods user:
mkdir /opt/iRODS
chown rods.rods /opt/iRODS/
  • Extract the downloaded tarball into /opt/iRODS as the rods user:
cd /opt/iRODS
tar xzf ~rods/inst/irods2.3.tgz
mv iRODS iRODS-2.3
ln -s iRODS-2.3 iRODS
cd iRODS
  • IMPORTANT: for iRODS 2.3, patch the source code with the following patches:
    • irods-2.3-gsiauth-slave.patch (needed to get GSI authentication working on a slave server)
    • irods-2.3-gsiauth-iexecmdenv+arr.diff (needed to fix some issues with invocation of the createUser script in GSI authentication)
    • leak_server_misc_obj.patch (fix a leak in rodsServer.c)
wget -O - http://projects.arcs.org.au/svn/systems/trunk/dataFabricScripts/iRODS/BugFix/patches-2.3/irods-2.3-gsiauth-slave.patch | patch -p 1
wget -O - http://projects.arcs.org.au/svn/systems/trunk/dataFabricScripts/iRODS/BugFix/patches-2.3/irods-2.3-gsiauth-iexecmdenv+arr.diff | patch -p 1
wget -O - http://projects.arcs.org.au/svn/systems/trunk/dataFabricScripts/iRODS/BugFix/patches-2.3/leak_server_misc_obj.patch | patch -p 0
  • For iRODS 2.5:
    • the ACL policy patch listed above for iRODS 2.4 is still needed
    • fix a bug with data replication:
      • See discussion on this issue for full information
      • Add this line at the beginning of subroutine dataObjCopy() (around line #886 of rsDataObjRepl.c):
        bzero (&dataCopyInp, sizeof (dataCopyInp));
  • Prepare answers to the questions asked by the iRODS installer. Namely:
    • Host running iCAT-enabled iRODS server. For a test server, this is gridgwtest.canterbury.ac.nz. For a production server, this is ngdata.canterbury.ac.nz
    • Resource name for the resource created on your host. A recommended convention is to use your hostname as the name of the resource.
    • Resource storage area directory - this should be the filesystem path to your storage resource. On a test server, it's acceptable to use the Vault directory created within the iRODS tree - but it is recommended to create the directory elsewhere (so that files will not get lost in an iRODS upgrade).
    • Existing iRODS admin login name (and password). The login name is rods. The author of this manual can tell you the password (which is different for the test server and for the production server).
    • iRODS zone name. This is BeSTGRID-DEV for a test server and BeSTGRID for a production server.
    • GLOBUS_LOCATION. This is /opt/vdt/globus if installing Globus from VDT and /opt/globus if installing from source. If your environment is set up correctly, irodssetup can provide the correct default value. If it cannot, go back and fix your environment - otherwise the installation would break at a later stage.
    • GSI Install Type to use. Pick the globus flavour that matches your system. On a GNU/Linux 64-bit system, it would be gcc64dbg. On a 32-bit system, it would be gcc32dbg. On a non-GNU/Linux system (not using gcc), it would be vendorcc64dbg or vendorcc32dbg. You can get the list of available flavours with:
       $GPT_LOCATION/sbin/gpt-query globus_gssapi_gsi
  • Run irodssetup and enter the following non-default answers:
    ./irodssetup
   Include additional prompts for advanced settings [no]? yes
   Build an iRODS server [yes]? (leave the default as yes)
   Make this Server ICAT-Enabled [yes]? no
   Host running iCAT-enabled iRODS server? gridgwtest.canterbury.ac.nz
   Existing iRODS admin login name [rods]? (leave the default as rods)
   Password [rods]? irods-admin-password
   zone: BeSTGRID (or BeSTGRID-DEV)
   Starting Server Port [20000]? 40000
   Ending Server Port [20199]? 40199
   Include GSI [no]? yes
   GLOBUS_LOCATION [/opt/vdt/globus]? 
   GSI Install Type to use? gcc64dbg
   Save configuration (irods.config) [yes]? 
   Start iRODS build [yes]? yes
  • Create a copy of the host certificate as irodscert/key.pem, readable by the rods user:
cd /etc/grid-security/
cp hostcert.pem irodscert.pem
cp hostkey.pem irodskey.pem
chown rods.rods irods{cert,key}.pem
  • Configure the iRODS environment in /etc/profile.d/irods.sh:
GLOBUS_LOCATION=/opt/globus
export GLOBUS_LOCATION
. $GLOBUS_LOCATION/etc/globus-user-env.sh
IRODS_HOME=/opt/iRODS/iRODS
PATH=$IRODS_HOME/clients/icommands/bin:$PATH
export LD_LIBRARY_PATH IRODS_HOME PATH
MYPROXY_SERVER=myproxy.arcs.org.au
export MYPROXY_SERVER
  • Install iupdate into $IRODS_HOME/clients/icommands/bin/
cd $IRODS_HOME/clients/icommands/bin/
wget http://projects.arcs.org.au/svn/systems/trunk/dataFabricScripts/iRODS/utils/iupdate
  • Reload the environment
. /etc/profile
  • iRODS should have already been started by the setup script. As rods, first check if iRODS is already running with:
    /opt/iRODS/iRODS/irodsctl status
    • Note: on a slave server, you only want an iRODS server running (not a database server)
  • And if not running, start irods with:
    /opt/iRODS/iRODS/irodsctl start
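
For the "GSI Install Type" prompt, the rule of thumb given in the list of prepared answers can be written as a small helper. This is only a sketch: the function name is hypothetical, it guesses the word size from the presence of "64" in the architecture string (which works for names such as x86_64 or ppc64 but not for every platform), and the flavours actually installed should still be confirmed with gpt-query.

```shell
# Suggest a Globus flavour name from the machine architecture and compiler.
# Follows the rule of thumb in this guide; verify against gpt-query output.
gsi_flavour() {
    arch=$1      # e.g. the output of "uname -m" on Linux
    compiler=$2  # "gcc", or anything else for a vendor compiler
    case "$arch" in
        *64*) bits=64 ;;
        *)    bits=32 ;;
    esac
    if [ "$compiler" = gcc ]; then
        echo "gcc${bits}dbg"
    else
        echo "vendorcc${bits}dbg"
    fi
}

gsi_flavour "$(uname -m)" gcc
```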

iRODS post-configuration

  • Install default iRODS rules for BeSTGRID
    • Download bestgrid.irb and install it as $IRODS_HOME/server/config/reConfigs/bestgrid.irb
    • Edit the file and replace all occurrences of griddata.canterbury.ac.nz with the name of your local iRODS resource.
    • Activate the file by editing $IRODS_HOME/server/config/server.config: add bestgrid to the reRuleSet line:
reRuleSet   bestgrid,core

Note: at a later stage, we will be setting up automatic updates of rule files. But not now.
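
The hostname replacement in bestgrid.irb can be done with sed rather than by hand. The sketch below assumes the local resource name contains only letters, digits, dots and hyphens (so it needs no escaping in the sed replacement); localize_rules is a made-up helper name and irods.example.org is a placeholder.

```shell
# Print rule file $1 with every occurrence of the Canterbury resource
# name replaced by the local resource name $2.
# Assumption: $2 contains no characters special to sed (/, &, \).
localize_rules() {
    sed "s/griddata\.canterbury\.ac\.nz/$2/g" "$1"
}

# Usage (as rods):
#   cd $IRODS_HOME/server/config/reConfigs
#   localize_rules bestgrid.irb irods.example.org > bestgrid.irb.new \
#       && mv bestgrid.irb.new bestgrid.irb
```

Writing to a temporary file and renaming it avoids truncating the rule file if sed fails part-way.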

  • Install the BeSTGRID createUser script into $IRODS_HOME/server/bin/cmd/createUser
  • Install createInbox.sh as $IRODS_HOME/server/bin/cmd/createInbox.sh
  • Install the BeSTGRID createUser.config file into $IRODS_HOME/server/config/createUser.config and customize this file accordingly:
    • change the path to the iCommands and to the createInbox script if you installed iRODS into a different location (and had a good reason for doing so)
    • configure the email address where account creation notifications should go. Direct it to yourself for a test server and leave help@bestgrid.org for a production server.
    • if your server's hostname is not suitable to be used as the From: domain in the notifications (rods@`hostname`), provide an alternative hostname in the H directive (and uncomment it)
    • configure the iRODS master server hostname in the M directive (gridgwtest.canterbury.ac.nz for a test server, ngdata.canterbury.ac.nz for a production server) (and uncomment the directive)


  • If your server has alternative hostnames it may be referred to by, configure all of them as aliases to localhost by listing them (with the proper one first) on a single line in server/config/irodsHost. Example (use your own iRODS hostname here):
    localhost hpcgrid1.canterbury.ac.nz hpcgrid1 hpcgrid1-c

Turning iRODS on

  • Download, from ARCS, and install the irods service control script into /etc/rc.d/init.d as irods
  • Add irods as a service automatically started with:
chkconfig --add irods
chkconfig irods on
  • This script starts irods as the rods user and sets the X509_USER_CERT / X509_USER_KEY variables to the host certificate/key.
  • If running Postgres (this would be the case only on a master server, or in a replicated database setup), do the same for the postgres service control script (chkconfig --add postgres ; chkconfig postgres on)
  • If running Postgres, then also tweak your /etc/rc.d/init.d/irods to point to the correct location (should be /opt/iRODS/Postgres/pgsql/lib) in the LD_LIBRARY_PATH settings.
  • If your server is having a problem initializing GSI (cannot find the certificate), set this variable also in the internal perl startup script: edit $IRODS_HOME/scripts/perl/irodsctl.pl and add (around line 258, after section "Overrides", before section "Check usage")
$ENV{'X509_USER_CERT'} = '/etc/grid-security/irodscert.pem';
$ENV{'X509_USER_KEY'} =  '/etc/grid-security/irodskey.pem';
  • Restart iRODS so that it's running with the proper environment:
    • First as rods, stop the already running iRODS server:
      /opt/iRODS/iRODS/irodsctl stop
    • Then as root, start iRODS via the service control script:
      service irods start
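
Once the service is started, a simple way to confirm that iRODS is accepting connections is to try opening its TCP port. The sketch below relies on bash's /dev/tcp pseudo-device and on the coreutils timeout command; both are assumptions about your system (older coreutils releases do not ship timeout), and the function name is made up for this example.

```shell
# Succeed if a TCP connection to host $1, port $2 opens within 5 seconds.
# Needs bash (for /dev/tcp) and the coreutils "timeout" command.
port_open() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage (1247 is the iRODS port listed in the network requirements):
#   port_open localhost 1247 && echo "iRODS is accepting connections"
```

Running the same check from another host exercises the firewall rules as well as the server.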

Deploying Davis

Davis is deployed as a standalone web application (it comes with the Jetty web applications container). It is recommended to deploy Davis behind Apache - in this case, Apache can take care of the https socket, and also of providing the Shibboleth login on the plain http interface. Hence, as a prerequisite to this section, the system where Davis will be installed should have Apache (and mod_ssl) already installed. Also, one of the first steps will be installing the Shibboleth SP software and registering the system into AAF - this is covered in a separate guide linked below. It is possible to skip this part and go without Shibboleth. Davis would still provide the web and webDAV interface on https (authenticating via a MyProxy username and password) - and the Shibboleth interface can be added later.

This section is based on the Davis install guide, http://projects.arcs.org.au/trac/davis/wiki/HowTo/Install

Before proceeding with this section, agree on the hostname users will use to access this system (e.g., bestgrid-df.site.domain). For the rest of this section, the name is referred to as DAVIS-HOSTNAME. Please substitute accordingly.

You will need X509 certificates issued in the name of DAVIS-HOSTNAME before proceeding. For a production server, these certificates MUST be issued by a CA trusted by the major web browsers.

Note: To upgrade Davis to a newer release, follow the notes on upgrading davis - which extract the steps that need to be re-applied to a new Davis installation.

  • Shibboleth: set up the system running Davis as a Shibboleth 2.x SP, using DAVIS-HOSTNAME in the URLs and in the entityID.
  • Configure Apache:
    • SSL: use DAVIS-HOSTNAME:443 as the ServerName in the SSL virtual host and use the proper certificates
--- ssl.conf.dist	2009-11-13 12:47:25.000000000 +1300
+++ ssl.conf	2010-03-26 14:53:43.000000000 +1300
@@ -85,2 +85,3 @@
 #ServerName www.example.com:443
+ServerName DAVIS-HOSTNAME:443
 
@@ -111,3 +112,3 @@
 # certificate can be generated using the genkey(1) command.
-SSLCertificateFile /etc/pki/tls/certs/localhost.crt
+SSLCertificateFile /etc/pki/tls/certs/DAVIS-HOSTNAME.crt
 
@@ -118,3 +119,3 @@
 #   both in parallel (to also allow the use of DSA ciphers, etc.)
-SSLCertificateKeyFile /etc/pki/tls/private/localhost.key
+SSLCertificateKeyFile /etc/pki/tls/private/DAVIS-HOSTNAME.key
 
@@ -128,2 +129,3 @@
 #SSLCertificateChainFile /etc/pki/tls/certs/server-chain.crt
+SSLCertificateChainFile /etc/pki/tls/certs/CA-CERTIFICATE-CHAIN.crt
 
  • Rather than modifying the default httpd.conf configuration file, we create a separate configuration file for the DataFabric-specific Apache configuration directives. By putting the file into /etc/httpd/conf.d, this file will be automatically included in the Apache configuration.
    • Create /etc/httpd/conf.d/df.conf with the following contents (more local configuration will follow later)
ServerName DAVIS-HOSTNAME
  • Create a davis account for Davis to run under
groupadd davis
useradd -g davis -m -d /home/davis -c "Davis webDAV" davis
  • Create /opt/davis and make it owned by davis
mkdir /opt/davis
chown davis.davis /opt/davis
cd /opt/davis
tar xzf /tmp/davis-0.9.1.tar.gz
ln -s davis-0.9.1 davis
  • Note: earlier versions of Davis required updating Jargon to a more recent version (2.3) to make GSI login work. With Davis 0.9.1 and later, this is no longer needed.
  • Link /opt/davis/davis/bin/jetty.sh into /etc/rc.d/init.d as davis
ln -s /opt/davis/davis/bin/jetty.sh /etc/rc.d/init.d/davis
chmod +x /opt/davis/davis/bin/jetty.sh
  • Install OpenJDK (Davis runs well under OpenJDK, but would also run under Sun JDK)
yum install java-1.6.0-openjdk java-1.6.0-openjdk-devel
  • Create /etc/default/jetty with configuration for Jetty (the web apps container Davis runs in). Change values as needed for your environment.
JETTY_HOME=/opt/davis/davis
JETTY_USER=davis
JAVA_HOME=/usr/lib/jvm/java
JAVA_OPTIONS="-server -Xms512m -Xmx768m"
  • Configure jetty: make sure SSL is disabled and AJP enabled in /opt/davis/davis/etc/jetty.xml
  • Configure jetty to only listen on the localhost interface (127.0.0.1) to protect it from remote connections. Edit /opt/davis/davis/etc/jetty.xml and set the host option for the AJP connector:
--- jetty.xml.dist	2010-03-25 15:39:39.000000000 +1300
+++ jetty.xml	2010-03-30 12:22:47.000000000 +1300
@@ -65,4 +65,5 @@
        <New class="org.mortbay.jetty.ajp.Ajp13SocketConnector">
          <Set name="port">8009</Set>
+         <Set name="host">127.0.0.1</Set>
          <Set name="ThreadPool">
            <New class="org.mortbay.thread.BoundedThreadPool">
  • Make Apache pass requests for /BeSTGRID (on a production server) or /BeSTGRID-DEV (on a test server): add the following to /etc/httpd/conf.d/df.conf (also require Shibboleth for /BeSTGRID on http frontend - skip the Location snippet if not installing Shibboleth yet)
ProxyRequests Off
ProxyPreserveHost On

ProxyPass /BeSTGRID ajp://localhost:8009/BeSTGRID flushpackets=on

<VirtualHost *:80>
  ServerName DAVIS-HOSTNAME
  DocumentRoot "/var/www/html"

  <Location /BeSTGRID>
  AuthType shibboleth
  ShibRequireSession On
  ShibUseHeaders On
  require shibboleth
  </Location>
</VirtualHost>
  • Note: because we are configuring ProxyPass only for the iRODS zone, we must make the dojoroot, images, and include directories visible to Apache
ln -s /opt/davis/davis/webapps/dojoroot /var/www/html
ln -s /opt/davis/davis/webapps/images /var/www/html
ln -s /opt/davis/davis/webapps/include /var/www/html
  • Davis (or rather the Jetty web service container it's running in) wants to store the web server request log into $HOME/logs. To have all the logs in one place, make this a symbolic link into the davis log directory. Run the following command as davis:
ln -s /opt/davis/davis/logs ~davis/logs
  • Enable auto startup (start later after configuring):
chkconfig --add davis
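
Before starting Davis, it is worth confirming that the jetty.xml edit restricting the AJP connector to localhost is in place. The check below is a plain text match rather than an XML parse, so it only confirms that the host setting is present somewhere in the file; the function name is hypothetical.

```shell
# Succeed if jetty.xml file $1 contains a host setting of 127.0.0.1.
# A plain text match: it does not check which connector the <Set>
# element belongs to.
ajp_bound_to_localhost() {
    grep -q '<Set name="host">127\.0\.0\.1</Set>' "$1"
}

# Usage:
#   ajp_bound_to_localhost /opt/davis/davis/etc/jetty.xml \
#       || echo "jetty.xml: AJP connector is not restricted to localhost"
```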

Customize Davis

  • Create /opt/davis/davis/webapps/root/WEB-INF/davis-host.properties with the contents shown below.
  • Modify the configuration accordingly:
    • zone-name - BeSTGRID for a production server and BeSTGRID-DEV for a test server
    • server-name: your iRODS server hostname
    • default-resource: your preferred default iRODS resource (typically your local iRODS resource)
    • favicon: replace DAVIS-HOSTNAME with proper value
    • insecureConnection: change to shib if you have successfully installed Shibboleth
webdavis.Log.threshold=WARNING
#webdavis.Log.threshold=DEBUG

shared-token-header-name=shared-token
cn-header-name=cn
admin-cert-file=/etc/grid-security/daviscert.pem
admin-key-file=/etc/grid-security/daviskey.pem

organisation-name=BeSTGRID
authentication-realm=BeSTGRID Data Fabric
# use new logo from BeSTGRID branding
organisation-logo=/images/lg_BeSTGRID-DataFabric.gif
organisation-logo-geometry=400x70
#organisation-logo=/images/bestgrid-logo-32x32.gif
#organisation-logo-geometry=32x32
favicon=http://DAVIS-HOSTNAME/favicon.ico
organisation-support=BeSTGRID technical staff at help@bestgrid.org
helpURL=http://technical.bestgrid.org/index.php/Using_the_DataFabric

anonymousCredentials=irods\\anonymous:anything
anonymousCollections=/ARCS/projects/public,/ARCS/projects/open,/BeSTGRID/projects/public,/BeSTGRID/projects/open,/BeSTGRID-DEV/projects/public,/BeSTGRID-DEV/projects/open

myproxy-server=myproxy.arcs.org.au
server-type=irods
server-port=1247
#default-idp=arcs idp
default-idp=myproxy
zone-name=BeSTGRID

# e.g. ngdata.canterbury.ac.nz
server-name=iRODS-HOSTNAME
# e.g. griddata.canterbury.ac.nz
default-resource=iRODS-RESOURCE-NAME
insecureConnection=block

# new options in Davis 0.9.3
shib-init-path=/Shibboleth.sso/Login
disable-replicas-button=true
ui-include-head=<!-- Google Analytics code -->    \
<!-- put the code snippet here  with -->          \
<!-- trailing backslashes on all but last line -->          

administrators=firstname.lastname,joe.otheradmin

# to allow Mac OS computers to connect using Finder
webdavUserAgents=WebDAVFS
browserUserAgents=

login-image=https://df.bestgrid.org/images/bestgrid-logo.gif

# configure later: QuickShare

# not loading: PID objects
  • Disable ARCS default configuration:
cd /opt/davis/davis/webapps/root/WEB-INF
mv davis-organisation.properties davis-organisation.properties.disabled
  • Make the host certificate available to the davis user as daviscert.pem and daviskey.pem (this is only needed if installing Shibboleth - Davis then uses its own certificate to talk to iRODS)
cd /etc/grid-security
cp hostcert.pem daviscert.pem
cp hostkey.pem daviskey.pem
chown davis:davis daviscert.pem daviskey.pem
  • For Shibboleth login to work, Davis must be able to authenticate to iRODS as the rods user, via GSI with the davis grid host certificate. Add the DN from the davis grid host certificate as an additional DN for the rods user with:
iadmin aua rods '/C=NZ/O=BeSTGRID/OU=University of Canterbury/CN=gridgwtest.canterbury.ac.nz'
    • replacing the example DN with the actual DN of your host certificate. You can get the DN with:
openssl x509 -subject -noout -in /etc/grid-security/daviscert.pem
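
Before starting Davis, it may be worth checking that none of the placeholders from the davis-host.properties template above (DAVIS-HOSTNAME, iRODS-HOSTNAME, iRODS-RESOURCE-NAME) were left unreplaced. The following is only a minimal sketch and not part of Davis; find_placeholders is a hypothetical helper name, and lines where a placeholder appears only after a # are ignored.

```shell
# Hypothetical helper: list lines of a properties file that still contain
# one of the template placeholders used in this guide.
# Returns 0 (and prints the offending lines with line numbers) if any are
# found, non-zero otherwise. Text following a '#' is not considered.
find_placeholders() {
    grep -n -E '^[^#]*(DAVIS-HOSTNAME|iRODS-HOSTNAME|iRODS-RESOURCE-NAME)' "$1"
}

# Usage (path as used in this guide):
#   if find_placeholders /opt/davis/davis/webapps/root/WEB-INF/davis-host.properties; then
#       echo "davis-host.properties still contains template placeholders" >&2
#   fi
```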

Monitoring Davis use with Google Analytics

To monitor Davis use (via a browser) with Google Analytics, use the Davis ui-include-head configuration directive in davis-host.properties to insert the JavaScript code snippet into the UI HTML just before the closing </head> tag (note that this is preferred over modifying /opt/davis/davis/webapps/root/WEB-INF/ui.html).

Example:

ui-include-head=<!-- Google Analytics -->       \n\
<script type="text/javascript">                 \n\
  var _gaq = _gaq || [];                        \n\
  _gaq.push(['_setAccount', 'UA-1896366-16']);  \n\
  _gaq.push(['_trackPageview']);                \n\
  (function() {                                 \n\
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; \n\
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';    \n\
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);    \n\
  })();                                         \n\
</script>
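
Adding the trailing \n\ to every line by hand is error-prone. As a rough sketch (to_properties_value is a hypothetical helper, not part of Davis), a snippet kept in a plain HTML file can be converted into the form shown above with sed. It assumes the snippet itself contains no backslashes, which would need doubling in a properties file.

```shell
# Hypothetical helper: read an HTML snippet on stdin and print it as a
# multi-line properties value for the given key.
# The first line gets "KEY=" prepended; every line except the last gets
# " \n\" appended (a literal \n escape followed by a line continuation).
to_properties_value() {
    key=$1
    sed -e "1s/^/${key}=/" -e '$!s/$/ \\n\\/'
}

# Usage (ga-snippet.html is an assumed file name):
#   to_properties_value ui-include-head < ga-snippet.html
```

The output can then be pasted into davis-host.properties in place of the existing ui-include-head entry.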

Davis GSI configuration

Davis needs proper GSI configuration for authenticating to the iRODS server - not only the host certificate configured above, but also the full set of trusted CAs, and CRLs. If you are installing Davis on the same system as iRODS, this has already been done when installing Globus.

If you are installing Davis on a standalone system, the following must be done:

  • Install IGTF CA certificates from VDT-RPM
wget -P /etc/yum.repos.d/ http://vdt.cs.wisc.edu/vdt_rpms/vdt-ca-certs/vdt-ca-certs.repo
yum install vdt-ca-certs
  • Run the following command regularly from a cron job:
    /usr/sbin/fetch-crl --loc /etc/grid-security/certificates --out /etc/grid-security/certificates --quiet
    • Put the following line into root's crontab (run crontab -e):
      3 1,7,13,19 * * * /usr/sbin/fetch-crl --loc /etc/grid-security/certificates --out /etc/grid-security/certificates --quiet >/dev/null 2>&1
  • Install the ARCS SLCS1 CA bundle: based on the instructions at http://wiki.arcs.org.au/bin/view/Main/SLCS, do the following steps:
    • Get the SLCS1 CA bundle, extract it into /etc/grid-security (creates arcs-slcs-ca subdirectory) and copy the files into /etc/grid-security/certificates
cd /etc/grid-security  
wget --no-check-certificate https://slcs1.arcs.org.au/arcs-slcs-ca.tar.gz -O - | tar xvz  
cd arcs-slcs-ca 
cp * /etc/grid-security/certificates  
    • As we are not using the VDT CA certificate updater, we are not registering the SLCS CA bundle files with the updater.
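
The fetch-crl crontab entry above can also be installed non-interactively instead of via crontab -e. A minimal sketch, assuming a POSIX shell: add_line_once is a hypothetical helper that copies an existing crontab from standard input and appends the given line only if it is not already present, so the command can be re-run safely.

```shell
# Hypothetical helper: copy stdin to stdout and append LINE unless an
# identical line is already present.
add_line_once() {
    line=$1
    input=$(cat)
    # Re-emit the existing content (if any) unchanged.
    if [ -n "$input" ]; then
        printf '%s\n' "$input"
    fi
    # Append the new line only when no exact match exists.
    if ! printf '%s\n' "$input" | grep -Fxq -- "$line"; then
        printf '%s\n' "$line"
    fi
}

# Usage as root, with the entry from this guide:
#   CRON_LINE='3 1,7,13,19 * * * /usr/sbin/fetch-crl --loc /etc/grid-security/certificates --out /etc/grid-security/certificates --quiet >/dev/null 2>&1'
#   crontab -l 2>/dev/null | add_line_once "$CRON_LINE" | crontab -
```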


Battling the firewall connection drop problem for iRODS/Davis

The University of Auckland firewall presents a problem for iRODS because of the way connections are terminated: the buckets for them are silently removed, so neither end of the TCP connection is notified that the connection has been terminated. Anything sent across afterwards is swallowed by the firewall, and both sides are left hanging without any means of finding out the real status of the connection. To counteract this behaviour, TCP keepalives can be used to keep the TCP connections to the CAT server active.

The following settings are to be added to /etc/sysctl.conf on the slave side:

net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_intvl=5
net.ipv4.tcp_keepalive_probes=5

and then the sysctl -p command is to be issued to apply the changes. Unfortunately, keepalives are only sent when a socket has been explicitly configured for them by the application, using the setsockopt interface. To make use of keepalives, it was decided to use the libkeepalive [1] library, which is registered in /etc/ld.so.preload so that it is loaded into every application as it starts. The library wraps the call to socket() and automatically enables the SO_KEEPALIVE option via setsockopt.
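
As a rough guide to how the three kernel settings above combine: the first probe is sent after tcp_keepalive_time seconds of idleness (this regular traffic is what should keep the firewall from dropping the connection), and an unresponsive peer is given up on after roughly a further tcp_keepalive_intvl x tcp_keepalive_probes seconds. The sketch below only does that arithmetic; keepalive_detect_seconds is a hypothetical helper name.

```shell
# Hypothetical helper: approximate number of seconds before an idle
# connection to a dead peer is dropped.
# Arguments: tcp_keepalive_time, tcp_keepalive_intvl, tcp_keepalive_probes
keepalive_detect_seconds() {
    echo $(( $1 + $2 * $3 ))
}

# With the values from this guide:
#   keepalive_detect_seconds 60 5 5     # prints 85
# With the values currently active on a Linux host:
#   keepalive_detect_seconds \
#       "$(cat /proc/sys/net/ipv4/tcp_keepalive_time)" \
#       "$(cat /proc/sys/net/ipv4/tcp_keepalive_intvl)" \
#       "$(cat /proc/sys/net/ipv4/tcp_keepalive_probes)"
```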

The /etc/ld.so.preload file may look like this:

/usr/lib/libkeepalive.so

provided the libkeepalive.so file is installed in the /usr/lib/ directory. In short, /etc/ld.so.preload contains the list of shared libraries that are to be preloaded for every application. Effectively, it is the same as having the library listed in the LD_PRELOAD environment variable.

The iRODS server should be restarted in order to force libkeepalive into its address space:

/opt/iRODS/iRODS/irodsctl restart
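
To confirm after the restart that the library was actually preloaded, the memory map of a running server process can be inspected. This is a minimal sketch for Linux only; lib_in_maps is a hypothetical helper, and the process name irodsServer in the usage comment is an assumption about how the iRODS server appears in the process list.

```shell
# Hypothetical helper: succeed if the given /proc/<pid>/maps file shows
# libkeepalive mapped into the process.
lib_in_maps() {
    grep -q 'libkeepalive' "$1"
}

# Usage (process name is an assumption):
#   for pid in $(pgrep irodsServer); do
#       if lib_in_maps "/proc/$pid/maps"; then
#           echo "pid $pid: libkeepalive loaded"
#       else
#           echo "pid $pid: libkeepalive NOT loaded" >&2
#       fi
#   done
```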


User database update

The main DataFabric server (df.bestgrid.org) hosts a local MySQL database that stores additional information about users on top of what iRODS stores in the user table. This information covers, in particular, the affiliation and contact details of the users, and is collected from the Shibboleth headers when the user accesses the web interface to the DataFabric - see [Administering_the_DataFabric#Registering_DataFabric_users] for more information.

On a slave server, install the same script as documented at that link, but instead of creating a local database on the slave server, configure the script to access the remote MySQL database at df.bestgrid.org directly. Talk to the master server administrator about getting a MySQL account set up for that.

Deploying Griffin

Griffin is the GridFTP to iRODS interface developed by ARCS's Shunde Zhang.

Installing Griffin is optional: do so only if you really need a local GridFTP interface into iRODS; otherwise, use the central one - gsiftp://df.bestgrid.org:2811/BeSTGRID

Hence, this section is a bit terser.

Follow http://projects.arcs.org.au/trac/griffin

Plan: run as user davis, install griffin in /opt/griffin

  • As root:
mkdir /opt/griffin
chown davis:davis /opt/griffin
  • As davis:
wget http://projects.arcs.org.au/trac/griffin/raw-attachment/wiki/releasenotes/0.7.0/griffin-0.7.2-jargon.tar.gz
tar xzf griffin-0.7.2-jargon.tar.gz
cd griffin-0.7.2-jargon
./install.sh /opt/griffin
  • In addition to the steps in the Griffin manual:
  • Create /etc/default/griffin
APP_HOME=/opt/griffin
JAVA_OPTIONS="-Dlog4j.configuration=file:/opt/griffin/log4j.properties -server -Xms128m -Xmx384m"
APP_USER=davis
  • Copy griffin-0.7.2-jargon/griffin into /etc/rc.d/init.d - as root:
cp griffin-0.7.2-jargon/griffin /etc/rc.d/init.d/
  • Edit /opt/griffin/griffin-ctx.xml
    • Point service.cert and service.key to files readable by the user (davis): daviscert.pem and daviskey.pem, respectively
    • Set serverName to ngdata.canterbury.ac.nz (your local iRODS server)
    • Set defaultResource to "griddata.canterbury.ac.nz" (your local iRODS resource)
    • Comment out mapfile (not needed if the slave server is properly patched)
    •  ??set mapFile and updates??
  • Modify /opt/griffin/log4j.properties
    • Reduce iRODS logging:
log4j.logger.edu.sdsc.grid.io.irods=WARN
    • comment out
#log4j.logger.au.org.arcs.griffin=DEBUG
  • Register & start griffin service
chkconfig --add griffin
service griffin start
  • Note: when testing Griffin with UberFTP, disable data channel authentication with "dcau n". Otherwise, all directory listing and file transfer commands fail with:
536 Specified protection level not supported. 
  • Note: in order for Griffin to work without a mapfile, BOTH the client and the server must have the gsi-auth patch applied.
  • Note: in order for account creation to work on a slave (with setting a password), createUser must issue the iadmin moduser password command to the Master server:
irodsHost=ngdata.canterbury.ac.nz iadmin moduser vladimir.mencl password test
  • Note: in order for cd with absolute paths to work, the root of the filesystem must be readable to all:
ichmod read public /
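
The two log4j.properties changes described above (lowering the iRODS logger to WARN and commenting out the Griffin DEBUG logger) can also be scripted. This is only a sketch: set_logger_level and comment_out_logger are hypothetical helpers, and whether the shipped log4j.properties already contains these logger lines is an assumption.

```shell
# Hypothetical helper: set a log4j logger to LEVEL in FILE, replacing any
# existing (active or commented-out) line for that logger.
set_logger_level() {
    file=$1; key="log4j.logger.$2"; level=$3
    [ -f "$file" ] || return 1
    tmp=$(mktemp) || return 1
    # Keep every line that is not about this logger, then append the new setting.
    grep -v "^#*[[:space:]]*${key}=" "$file" > "$tmp" || true
    printf '%s=%s\n' "$key" "$level" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Hypothetical helper: comment out the active line for a log4j logger in FILE.
comment_out_logger() {
    file=$1; key="log4j.logger.$2"
    [ -f "$file" ] || return 1
    tmp=$(mktemp) || return 1
    # Prefix the matching line with '#'; other lines pass through unchanged.
    sed "s/^${key}=/#&/" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Usage as davis, with the path and logger names from this guide:
#   set_logger_level /opt/griffin/log4j.properties edu.sdsc.grid.io.irods WARN
#   comment_out_logger /opt/griffin/log4j.properties au.org.arcs.griffin
```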