Administering the DataFabric

From BeSTGRID

This page documents tasks that system administrators may be required to perform on the DataFabric.

An additional resource one should consult when seeking more information beyond what's documented here is:

User administration

Linking DN and sharedToken in a single account

The DataFabric automatically creates an account for a user on first access - and if a user has multiple identities (such as a Shibboleth login AND an APACGrid certificate), the DataFabric creates two separate accounts.

It is possible to link the two identities to a single account. Ideally, the user should request this before the second iRODS account is created - but this can still be worked around by deleting the other account and adding the authentication information for both identities to the user's primary account.

Before proceeding, gather information on both user identities and make sure these two identities represent the same person.

The information to be gathered is:

  1. User's full DN from the X509 APACGrid certificate.
    • Can be retrieved with
      openssl x509 -subject -noout -in $HOME/.globus/usercert.pem
  2. User's Common Name (CN) and the Shared Token from the Shibboleth login.
    • Can be gathered by asking the user to access http://df.bestgrid.org/shared-token/
      • After visiting this page, the Shibboleth attributes received by the DataFabric are stored in ngdata.canterbury.ac.nz:/var/www/html/shared-token/.htlog/sharedtoken-sso.log
  3. User's iRODS username (displayed in the web interface, can be looked up in iRODS)

The following steps would link the user's two identities together - and must be performed with an iRODS administrator login (typically the rods account).

  • If the user's iRODS username is not known, look it up by listing all user accounts:
    iadmin lu
  • Display detailed information about the user with:
    iadmin lu <username>
  • List all authentication information associated with the user:
    iadmin lua <username>
  • If two accounts were accidentally created for the user, agree with the user on which account to delete - and delete it with:
    iadmin rmuser <username>
    • Note: the account must not own any files in order to be deleted. Make sure all files are deleted (or moved to a different account) and also empty the trash for the user - can be done with:
      irmtrash -M -u <username>
  • To add a user's DN to an existing account (typically created for a Shibboleth login), run:
    iadmin aua <username> <userDN>
    • The user DN would have to be quoted to be passed as a single argument. Example:
      iadmin aua vladimir.mencl '/C=NZ/O=BeSTGRID/OU=University of Canterbury/CN=Vladimir Mencl'
  • To add a user's Shibboleth identity to an existing account (typically created for an X509 certificate):
    • Add the user's SLCS-based DN to the irods account with
      iadmin aua <username> <userDN>
      • Example:
        iadmin aua vladimir.mencl '/DC=nz/DC=org/DC=bestgrid/DC=slcs/O=University of Canterbury/CN=Vladimir Mencl -2vdKb_4CoiSg1P_uGfB9YTRJLo'
      • Note: to construct the exact DN from the CN and SharedToken (and the user's institution), refer to https://slcs1.arcs.org.au/idp-acl.txt for the institution-specific prefix. The SLCS DN is then constructed as "$institutionPrefix/CN=$cn $sharedToken"
    • Add the user's shared token to the account information with:
      iadmin moduser <username> info '<ST>shared-token-value</ST>'
      • Example:
        iadmin moduser stuart.charters info '<ST>sEKLKTK5obsy6qGN4GsF1PFJy3w</ST>'

Now the user should be able to login with either of the two identities and both should map to the same DataFabric/iRODS account.
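
The SLCS DN construction described above ("$institutionPrefix/CN=$cn $sharedToken") can be sketched as a small shell function. This is only an illustration: build_slcs_dn is a hypothetical helper name, not an iRODS command, and the institution prefix still has to be looked up in idp-acl.txt by hand.

```shell
#!/bin/sh
# Build an SLCS DN as "$institutionPrefix/CN=$cn $sharedToken".
# build_slcs_dn is a hypothetical helper name, not part of iRODS.
build_slcs_dn() {
    institution_prefix="$1"   # institution-specific prefix (from idp-acl.txt)
    cn="$2"                   # Common Name from the Shibboleth login
    shared_token="$3"         # Shared Token from the Shibboleth login
    printf '%s/CN=%s %s\n' "$institution_prefix" "$cn" "$shared_token"
}

# Values taken from the example above.
build_slcs_dn '/DC=nz/DC=org/DC=bestgrid/DC=slcs/O=University of Canterbury' \
    'Vladimir Mencl' '-2vdKb_4CoiSg1P_uGfB9YTRJLo'
```

The printed DN can then be quoted and passed to iadmin aua as a single argument.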

Setting up a project

Note: Mounted collections (an iRODS 2.3+ feature that allows, e.g., linking a project directory from /BeSTGRID/home to /BeSTGRID/projects) are not yet supported by the Jargon library (and consequently by Davis; checked with Davis 0.9.2 as of 2010-08-10). Hence, users should be instructed to access their project solely via /BeSTGRID/home, and the collections under /BeSTGRID/projects should not be used.

The DataFabric is suitable for hosting the data for collaborative projects. In this setting, the project data should be stored in a collection under BeSTGRID/home, named after the project. All of the users collaborating on the project would be members of a project group, and the group membership would give them access to the project directory.

Setting up a project consists of:

  • Creating the project group (this also creates a home directory for the group, which will be the project directory).
  • Adding all the users to the project group
  • Giving the project full control over the directory (and setting the inherit flag to make the same permissions apply to newly created files and folders).

Before setting up the project, get the following information from the project leader:

  • Project acronym/codename: this would be used both for the iRODS group and for the project collection (directory).
  • List of project members.
    • Ask the project leader and all project members to login to the DataFabric at least once - so that their account gets created.
  • Optionally: get an estimate of the total space used for the project.

Record the project in List of DataFabric projects

The following commands should be run as an iRODS administrator (typically the rods user):

  1. Create the project group:
    iadmin mkgroup <project group name>
  2. Add the users to the project group: run this command for each user working on the project:
    iadmin atg <project group name> <username>
    • Note: if not all project members have an iRODS account yet, add them to the group later - no problem with that.
  3. The project group got a home collection created as /BeSTGRID/home/<project group name>, with the group having ownership of that collection. Make the permissions propagate to all subfolders / files:
    ichmod inherit /BeSTGRID/home/<project group name>
    • If this fails, with an error message saying rods user does not have access to that directory, tell ichmod to do the operation as one of the users already in that group:
      clientUserName=<username> ichmod inherit /BeSTGRID/home/<project group name>
  4. Check the permissions with
    ils -A /BeSTGRID/home/<project group name>
    • The output should list the project group as having the own privilege (all member user accounts will be listed too) and should say: Inheritance - Enabled

Example: setting up the BCprognosis project for Mik Black, so far making Mik Black the only member of the group:

iadmin mkgroup BCprognosis
iadmin atg BCprognosis mik.black
clientUserName=mik.black ichmod inherit /BeSTGRID/home/BCprognosis
ils -A /BeSTGRID/home/BCprognosis

The output from ils -A /BeSTGRID/home/BCprognosis is:

/BeSTGRID/home/BCprognosis:
        ACL - BCprognosis#BeSTGRID:own   mik.black#BeSTGRID:own   
        Inheritance - Enabled
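
The project setup steps above could be collected into a helper that only prints the commands for review, rather than running them. This is a sketch: project_setup_commands is a hypothetical name, and it assumes the ichmod step is run as the first listed member (the fallback described above).

```shell
#!/bin/sh
# Print (not run) the commands for setting up a project group.
# project_setup_commands is a hypothetical helper; review its output
# before running the commands as an iRODS administrator.
project_setup_commands() {
    group="$1"; shift
    coll="/BeSTGRID/home/$group"
    echo "iadmin mkgroup $group"
    for user in "$@"; do
        echo "iadmin atg $group $user"
    done
    # Assumption: run ichmod as the first member, as in the fallback above.
    if [ "$#" -gt 0 ]; then
        echo "clientUserName=$1 ichmod inherit $coll"
    else
        echo "ichmod inherit $coll"
    fi
    echo "ils -A $coll"
}

project_setup_commands BCprognosis mik.black
```

For the BCprognosis example, this prints the same four commands as listed above.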

Making a project publicly accessible

To make the files and directories in a project publicly readable:

  • give the group public and the user anonymous read access to the project collection:
    clientUserName=<username> ichmod read public /BeSTGRID/home/<project collection name> ; clientUserName=<username> ichmod read anonymous /BeSTGRID/home/<project collection name>
  • add the project collection to anonymously accessible collections: add this project home directory to the list of collections in anonymousCollections (comma separated) in /opt/davis/davis/webapps/root/WEB-INF/davis-host.properties. Example:
anonymousCollections=/BeSTGRID/home/GeoFabric,/ARCS/projects/public,/ARCS/projects/open,/BeSTGRID/projects/public,/BeSTGRID/projects/open,/BeSTGRID-DEV/projects/public,/BeSTGRID-DEV/projects/open
  • configure Shibboleth not to require a session when accessing the project collection directly: add the following snippet into the <VirtualHost *:80> section in /etc/httpd/conf.d/df.conf:
 <Location /BeSTGRID/home/GeoFabric>
 ShibRequireSession Off
 </Location>
  • Note: these changes require reloading Apache and restarting Davis. We apologize for the inconvenience and hope Jargon+Davis will support mounted collections soon.
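
Editing the comma-separated anonymousCollections value by hand is easy to get wrong. As a rough sketch, the following function appends a collection to an existing line unless it is already listed. add_anonymous_collection is a hypothetical helper; it only transforms text and does not modify davis-host.properties itself.

```shell
#!/bin/sh
# Append a collection to a comma-separated anonymousCollections line,
# unless it is already listed. Hypothetical helper: it only transforms
# text and never touches davis-host.properties.
add_anonymous_collection() {
    line="$1"    # existing "anonymousCollections=..." line
    coll="$2"    # collection to add
    value="${line#anonymousCollections=}"
    case ",$value," in
        *",$coll,"*) printf '%s\n' "$line" ;;                        # already present
        ",,")        printf 'anonymousCollections=%s\n' "$coll" ;;   # list was empty
        *)           printf 'anonymousCollections=%s,%s\n' "$value" "$coll" ;;
    esac
}

add_anonymous_collection \
    'anonymousCollections=/ARCS/projects/public,/BeSTGRID/projects/public' \
    /BeSTGRID/home/GeoFabric
```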


Upgrading iRODS

On a slave server, this process is quite simple: stop iRODS, build the new iRODS version, start iRODS.

On a master server, the process also has to include updates to the ICAT database. Updates on the master server should be started first, with the slave servers updated either afterwards or in parallel.

For more information, please see:

Stop iRODS and make backup

As rods:

  • Stop the iRODS server
cd /opt/iRODS/iRODS
./irodsctl istop
  • On a master server:
    • make a full database backup:
/opt/iRODS/Postgres/pgsql/bin/pg_dump ICAT > /tmp/irods-backup.sql
    • and stop the database server:
./irodsctl dbstop
  • Backup iRODS configuration
cp /opt/iRODS/iRODS/config/irods.config /tmp/irods.config-backup


Create new iRODS directory & unpack

  • the iRODS tarball will extract all files into "iRODS" - let's make that a symlink to an "iRODS-2.4" directory.
cd /opt/iRODS
mkdir iRODS-2.4
ln -snf iRODS-2.4 iRODS
tar xzf ~/inst/irods2.4.tgz
    • Patch the newly extracted source tree as needed
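
The versioned-directory symlink switch used here (and again in the later steps) can be tried out safely in a scratch directory. The sketch below mirrors the directory names above but touches nothing under /opt; it shows why -n matters when the link already points at a directory.

```shell
#!/bin/sh
# Demonstrate the versioned-directory symlink switch in a scratch directory.
# The directory names mirror the ones above; nothing under /opt is touched.
scratch="$(mktemp -d)"
cd "$scratch" || exit 1
mkdir iRODS-2.3 iRODS-2.4
ln -snf iRODS-2.3 iRODS     # "iRODS" points at the old version
ln -snf iRODS-2.4 iRODS     # -n replaces the link itself instead of
                            # creating a new link inside iRODS-2.3
readlink iRODS              # shows the current target of the link
```

Without -n, the second ln would follow the existing link and create iRODS-2.3/iRODS-2.4 instead of switching versions.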

Run irodsconfig in upgrade mode

  • Go into the NEW iRODS directory
cd /opt/iRODS/iRODS
  • Run
./irodssetup --upgrade

You may get a prompt asking whether you have already installed the database patches. Answering "no" terminates the process, but tells us which patches to apply. For upgrading from 2.3 to 2.4, the patch is: psg-patch-v2.3tov2.4.sql.

  • If upgrading a master server, install the patches now (next section)
  • If upgrading a slave server, make sure the master has been already upgraded (and the database patched) and skip the next section and continue building iRODS.

Apply database patch

This is only relevant when upgrading a master server.

  • Switch the iRODS symlink back to previous version
cd /opt/iRODS
ln -snf iRODS-2.3 iRODS
  • Bring up database server for old iRODS
cd /opt/iRODS/iRODS
./irodsctl dbstart
  • Patch the database by running psql with input redirected from a patch in the NEW iRODS directory:
/opt/iRODS/Postgres/pgsql/bin/psql ICAT < /opt/iRODS/iRODS-2.4/server/icat/patches/psg-patch-v2.3tov2.4.sql
    • Note: this may produce an error message. For the 2.3 to 2.4 upgrade, the error message does not really matter: that was for dropping a table that does not exist yet:
ERROR:  table "r_rule_base_map" does not exist
  • Stop database again
./irodsctl dbstop
  • Switch the iRODS symlink again to the new version
cd /opt/iRODS
ln -snf iRODS-2.4 iRODS

Upgrade - take 2

Copy irods.config from the old version into the new one

cp /opt/iRODS/iRODS-2.3/config/irods.config /opt/iRODS/iRODS/config/ 
  • Run the installer again
cd /opt/iRODS/iRODS
./irodssetup --upgrade
  • Answer the three questions with default "yes" answers:
   Have you run one of those? [yes]? yes
   Use the existing iRODS configuration without changes [yes]? yes
   Start iRODS build [yes]? yes

This completes the setup and starts iRODS.

Reapply local iRODS changes

Reapply all the steps from iRODS post-configuration

  • rule file
  • server config to load the rule
  • createUser script + config + createinboxsh
  • server/config/irodsHost - local alias names
  • X509_USER_CERT/KEY
cd /opt/iRODS
cp iRODS-2.3/server/config/createUser.config iRODS/server/config/
cp iRODS-2.3/server/bin/cmd/createUser iRODS/server/bin/cmd/        
cp iRODS-2.3/server/bin/cmd/createInbox.sh iRODS/server/bin/cmd/     
cp iRODS-2.3/server/config/reConfigs/bestgrid.irb iRODS/server/config/reConfigs/
  • edit iRODS/server/config/server.config and change
reRuleSet   core

to

reRuleSet   bestgrid,core
  • if needed (multiple hostnames), edit iRODS/server/config/irodsHost
  • optionally, re-edit $IRODS_HOME/scripts/perl/irodsctl.pl to add (around line 258, after section "Overrides", before section "Check usage")
$ENV{'X509_USER_CERT'} = '/etc/grid-security/irodscert.pem';
$ENV{'X509_USER_KEY'} =  '/etc/grid-security/irodskey.pem';

Restoring database from backup

If it becomes necessary to restore from backup during the database upgrade, drop the database, re-create it, and restore from the backup:

/opt/iRODS/Postgres/pgsql/bin/dropdb ICAT
/opt/iRODS/Postgres/pgsql/bin/createdb ICAT
/opt/iRODS/Postgres/pgsql/bin/psql ICAT < /tmp/irods-backup.sql 
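
Since dropdb is destructive, it may be worth checking the backup file before running the commands above. The following is a heuristic sketch only: backup_looks_usable is a hypothetical helper, and it assumes the backup is a plain-text pg_dump file, which ends with a "PostgreSQL database dump complete" comment when the dump was written out in full.

```shell
#!/bin/sh
# Refuse to continue unless a plain-text pg_dump backup looks usable.
# backup_looks_usable is a hypothetical helper; the check is a heuristic
# and does not prove the backup can be restored.
backup_looks_usable() {
    backup="$1"
    [ -s "$backup" ] || return 1     # must exist and be non-empty
    # Plain-text pg_dump output ends with this comment when complete.
    grep -q 'PostgreSQL database dump complete' "$backup"
}

if backup_looks_usable /tmp/irods-backup.sql; then
    echo "backup looks usable; proceed with dropdb/createdb/psql"
else
    echo "backup missing or incomplete; do NOT drop the database"
fi
```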


Setting up an outage notice

To make Apache display an outage notice while Davis is not running (e.g., because Davis or iRODS are being upgraded):

  • Put the outage notice text into: /var/www/html/SystemOutage.html
  • Add the following into /etc/httpd/conf.d/df.conf:
    ErrorDocument 503 /SystemOutage.html
  • Reload Apache configuration:
    service httpd reload

The outage notice activates once Davis becomes unreachable - i.e., after running service davis stop.
  • An ARCS extension to this: temporarily apply the Outage notice also while Davis is running but exclude sys-admin boxes and let them access Davis directly (using *Deny* to block access to Davis and setting up the same outage document as the error document for HTTP 403):
#
# Enables 'Outage' mode where normal users see an outage notice but admins in the list below can see
# the website.
#
<Location />
    Order deny,allow
# Enable this line during outage so that users will see outage page but IPs listed can see DF
    Deny from all
# Put all admin addresses here. These addresses will see the website rather than an outage message.
    Allow from scad.hpsc.csiro.au obstler.ivec.org
</Location>


ErrorDocument 503 /SystemError.html
ErrorDocument 403 /SystemOutage.html
ProxyPass /SystemError.html !
ProxyPass /SystemOutage.html !
<Location /SystemOutage.html>
  Allow from all
</Location>
<Location /SystemError.html>
  Allow from all
</Location>

Emailing out an outage notice

When there is a need to contact all DataFabric users, we can use the information collected in the MySQL database - which includes email addresses.

A convenient script for that is ngdata.canterbury.ac.nz:/home/rods/bin/emailUsers.sh

  • Invocation:
    emailUsers.sh email-text.txt [SQL expression]
    • The email-text.txt file would include the Subject: header (separated by a blank line from the body).
    • The SQL expression can be used instead of the default (all users) and I've used it for testing what the mailing would look like:
      emailUsers.sh outage-2011-08-24.txt "select duEmail from dfUser WHERE duCN='Vladimir Mencl';"
      We might possibly use it also to select users from a particular institution or users who have logged in in the last 3 months...
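
As an illustration of the message file layout and of alternative SQL expressions, the sketch below writes a sample message (Subject: header, blank line, body) and prints two candidate invocations. The file location, wording and institution name are placeholders; the column names come from the dfUser table used on this page, and the queries have not been run against the real database.

```shell
#!/bin/sh
# Write a message file in the layout emailUsers.sh expects:
# a Subject: header, a blank line, then the body.
# File location and wording are placeholders.
msg="${TMPDIR:-/tmp}/outage-notice.txt"
cat > "$msg" <<'EOF'
Subject: DataFabric outage notice

The DataFabric will be unavailable during the maintenance window.
EOF

# Candidate SQL expressions (untested assumptions based on the dfUser columns):
# users from one institution, and users seen in the last 3 months.
by_org="select duEmail from dfUser where duOrgName='University of Canterbury';"
recent="select duEmail from dfUser where duLastAccess >= NOW() - INTERVAL 3 MONTH;"

# Print the invocations for review instead of sending any mail.
echo "emailUsers.sh $msg \"$by_org\""
echo "emailUsers.sh $msg \"$recent\""
```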

Registering DataFabric users

The DataFabric automatically creates user accounts on first access - via Shibboleth or GSI. As a temporary workaround before we get a BeSTGRID user management tool in place, the following service has been set up as an extension of Davis to collect additional information about users - namely, their institutional affiliation and email address.

The service runs on the same host as Davis, and lives at a separate Shibboleth-protected URL. The URL is loaded from within the Davis UI, and by "touching" the URL, the service gets to collect the attributes present in the Davis session.

The (PHP-based) service stores the user database in a local MySQL table, so first:

  • Install MySQL and the PHP MySQL module:
yum install mysql-server
yum install php-mysql
service mysqld start
chkconfig mysqld on
  • Create the MySQL table:
CREATE DATABASE dfUsers;
CREATE USER 'dfTracker'@'localhost' IDENTIFIED BY 'DB-PASSWORD';
GRANT ALL PRIVILEGES ON dfUsers.* TO 'dfTracker'@'localhost';
ALTER DATABASE dfUsers DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

use dfUsers;

create table dfUser (
 duSharedToken VARCHAR(50) NOT NULL,
 duCN varchar(100),
 duEmail varchar(100),
 duIdP varchar(100),	
 duUsername varchar(100),
 duOrgName varchar(100),
 duAffiliation varchar(100),
 duFirstAccess timestamp,
 duLastAccess timestamp,
 PRIMARY KEY  (duSharedToken)
);
  • Note: replace DB-PASSWORD with the database password
  • Note: adding duAffiliation later: modify structure with:
    alter table dfUser add column duAffiliation varchar(100) after duOrgName;
  • As for the database schema: we may hold DNs in the future in a separate table (linked via... shared token?)
  • Add Shibboleth protection for this service (not requiring a session): add the following to /etc/httpd/conf.d/davis.conf (outside a <VirtualHost> section!)
<Location /dfusers>
  AuthType shibboleth
  ShibRequestSetting requireSession 0
  require shibboleth
</Location>
  • Use the Davis ui-include-head configuration directive in davis-host.properties to load the URL from the Davis UI (this directive injects the script just before the closing </head> tag).
    • Note that this is preferred over modifying /opt/davis/davis/webapps/root/WEB-INF/ui.html
    • Note also that if this directive is also used for Google Analytics, both scriptlets need to be specified in a single ui-include-head value:
ui-include-head=<!-- DataFabric user registration -->           \n\
<script language="javascript">                  \n\
  if (typeof XMLHttpRequest != "undefined") {   \n\
    var userreg_client = new XMLHttpRequest();  \n\
    userreg_client.open("POST", "/dfusers/userreg.php");   \n\
    userreg_client.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");   \n\
    userreg_client.send("username=<parameter account/>");  \n\
  };                                            \n\
</script>
  • Note: the username itself is not available via Shibboleth and is passed to the service via the username attribute. Davis substitutes <parameter account/> with the account name when rendering the UI to the user. This processing is also done on the value of ui-include-head, so relying on this substitution inside the snippet is fine.
  • Request the following attributes from AAF:
    • SharedToken (required)
    • CommonName (required)
    • Email address (required)
    • Organization Name
    • Affiliation

Making list of users available

This section talks about browsing the MySQL database with BeSTGRID users through a very simple HTML interface, protected by Shibboleth and accessible only to the DataFabric administrators.

  • Make this directory protected by Shibboleth, allowing only the administrators in (based on their shared token values). Add the following to /etc/httpd/conf.d/davis.conf (outside a <VirtualHost> section!) and reload Apache:
<Location /dfuseradmin>
  AuthType shibboleth
  ShibRequireSession On
  require shared-token "Y7rpGFpSV8z7TRK288wcQo9Eo_M" "-2vdKb_4CoiSg1P_uGfB9YTRJLo" "FxreQkk5UID8ZzwxpKR9tB7Tw1Q"
  # shared token values: Nick Jones | Vlad | Gene (UoA)
</Location>

Upgrading Davis

To upgrade Davis (do these steps preferably as user davis):

  • unpack the new distribution into a version-specific directory in /opt/davis (e.g, /opt/davis/davis-0.9.0b)
cd /opt/davis
tar xzf /home/davis/inst/davis-0.9.0b-vlad.tar.gz
cd davis-0.9.0b
  • Reapply all of the changes done to the current Davis installation. These instructions assume:
    • The currently running version of Davis is in /opt/davis/davis (a symbolic link)
    • The new version of Davis is in the current directory (e.g, /opt/davis/davis-0.9.0b)
  • (NO LONGER NEEDED as of Davis 0.9.5.dev-r700) Re-apply changes to etc/jetty.xml - feed the following patch to patch -p 0
--- etc/jetty.xml.dist	2010-03-25 15:39:39.000000000 +1300
+++ etc/jetty.xml	2010-03-30 12:22:47.000000000 +1300
@@ -65,4 +65,5 @@
        <New class="org.mortbay.jetty.ajp.Ajp13SocketConnector">
          <Set name="port">8009</Set>
+         <Set name="host">127.0.0.1</Set>
          <Set name="ThreadPool">
            <New class="org.mortbay.thread.BoundedThreadPool">
  • Make bin/jetty.sh executable again
chmod +x bin/jetty.sh
  • again, disable ARCS specific configuration
( cd webapps/root/WEB-INF ; mv davis-organisation.properties davis-organisation.properties.disabled )
  • copy the Davis configuration file from the current Davis installation
cp /opt/davis/davis/webapps/root/WEB-INF/davis-host.properties webapps/root/WEB-INF/davis-host.properties
  • Copy over BeSTGRID logos
cp /opt/davis/davis/webapps/images/bestgrid-logo* webapps/images
  • Re-apply BeSTGRID branding (davis.css and images):
( cd webapps ; tar xzf ~/inst/davis-bestgrid-branding.tar.gz )
  • (NOT NEEDED as of Davis 0.9.5.dev.r700) Reapply changes to /opt/davis/davis/webapps/root/WEB-INF/ui.html if applicable (such as Google Analytics monitoring or Registering DataFabric users)
  • Make sure all the files are owned by davis.davis (e.g., if the above steps were done as root)
chown -R davis.davis .
  • Shut the old version down, switch the symbolic link, bring the new version up (do these as root):
service davis stop
ln -snf davis-0.9.0b /opt/davis/davis
service davis start

Reloading Davis configuration

  • Use a Davis URL with the ?reload-config query string appended - like http://df.bestgrid.org/BeSTGRID?reload-config
    • Note that your iRODS user name must be listed in the Davis administrators directive to get access to this feature

Replicating files across multiple resources

In iRODS, a file can have multiple replicas across multiple resources. In the default configuration, a file is created with only one replica on the default resource. It may later be replicated (with the irepl command), or iRODS rules may be configured to automatically replicate the file upon creation (either synchronously or via delayExec). Due to issues with the instantaneous replication, the safer solution is to replicate files explicitly afterwards. This section documents configuring a replication script that scans the whole DataFabric (or a subtree) and replicates files that do not have the minimum number (2) of replicas.

  • To get the script going:
    • install timeout from (CentOS5) /usr/share/doc/bash-3.2/scripts/timeout
cp /usr/share/doc/bash-3.2/scripts/timeout /usr/local/bin/
chmod +x /usr/local/bin/timeout 
  • Create a REPLISET resource group, both for the production and the test DataFabric:
    • As rods#BeSTGRID-DEV, run:
iadmin atrg BeSTGRID-DEV-REPLISET gridgwtest.canterbury.ac.nz
iadmin atrg BeSTGRID-DEV-REPLISET ngdata.vuw.ac.nz
    • As rods#BeSTGRID, run:
iadmin atrg BeSTGRID-REPLISET griddata.canterbury.ac.nz
iadmin atrg BeSTGRID-REPLISET irods.ceres.auckland.ac.nz


  • Invocation:
./replicator.sh [-s] [-n] Resource Collection
    • Example: replicate rods' home directory on test DataFabric:
./replicator.sh -s BeSTGRID-DEV-REPLISET /BeSTGRID-DEV/home/rods
  • Replicate all of the BeSTGRID DataFabric
./replicator.sh -s BeSTGRID-REPLISET /BeSTGRID
  • Note: this script is so far (as of Dec 10, 2010) being used for one-off replication and not yet for continuous operation.
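
The core selection step (finding files with fewer than 2 replicas) can be sketched independently of iRODS. This is not the replicator.sh script itself: under_replicated is a hypothetical helper, and it assumes that some listing step has already produced one line per replica containing the file's full logical path; how that listing is obtained is left open here.

```shell
#!/bin/sh
# Read one logical path per replica on stdin and print the paths that
# have fewer than the required number of replicas (default 2).
# under_replicated is a hypothetical helper, not part of replicator.sh.
under_replicated() {
    min="${1:-2}"
    # uniq -c prefixes each distinct path with its number of occurrences.
    sort | uniq -c | while read -r count path; do
        if [ "$count" -lt "$min" ]; then
            printf '%s\n' "$path"
        fi
    done
}

# Sample input: a.dat has two replicas, b.dat only one.
printf '%s\n' \
    /BeSTGRID/home/rods/a.dat \
    /BeSTGRID/home/rods/b.dat \
    /BeSTGRID/home/rods/a.dat | under_replicated 2
```

Each path printed would then be a candidate for irepl to the replication resource group.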

Configuring QuickShare

The QuickShare feature allows each DataFabric user to easily share files they own (i.e., files they have uploaded) by generating a link (containing a unique hash) that can be accessed externally by anyone who has it. The link can be accessed without a DataFabric/iRODS account - which makes it much easier to share files with individual collaborators who do not have or want DataFabric accounts, or to make a file public.

The QuickShare feature is implemented by a separate servlet running on the same server as Davis. The servlet connects to iRODS separately, under its own account (typically called QuickShare).

To get QuickShare going (in Davis 0.9.3+):

  • Create the iRODS QuickShare user - as an ordinary rodsuser user, picking a random password
iadmin mkuser QuickShare rodsuser
iadmin moduser QuickShare password QUICKSHARE-PASSWORD
  • Add the following attributes to the Davis configuration (davis-host.properties)
# QuickShare: metadata attribute name
sharing-key=QuickShare

# QuickShare: iRODS account and server connection details
sharing-user=QuickShare
sharing-password=QUICKSHARE-PASSWORD
sharing-host=irods.institution.ac.nz
sharing-port=1247
sharing-zone=BeSTGRID

# QuickShare - URL prefix for the Davis server
sharing-URL-prefix=https://df.bestgrid.org/quickshare
  • Patch your iRODS server so that it can properly expand the userNameClient variable in the acAclPolicy rule (used below). Apply the aclpolicy_username.patch patch to server/api/src/rsGenQuery.c, then recompile and restart your iRODS server.
  • Add a rule exempting the QuickShare user from the acAclPolicy STRICT policy to bestgrid.irb (replacing the existing acAclPolicy rule)
    acAclPolicy|"$userNameClient" != "QuickShare"|msiAclPolicy(STRICT)|nop
  • Add a ProxyPass directive for /quickshare into /etc/httpd/conf.d/df.conf:
    ProxyPass /quickshare ajp://localhost:8009/quickshare flushpackets=on
  • Restart Davis and reload Apache
service davis restart
service httpd reload
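
For the "picking a random password" step at the start of this list, one possible way to generate the value is to read random bytes from /dev/urandom and print them as hexadecimal. This is a sketch: random_password is a hypothetical helper name, and the 32-character length is an arbitrary choice.

```shell
#!/bin/sh
# Generate a random 32-character hexadecimal password from /dev/urandom.
# random_password is a hypothetical helper name; the length is arbitrary.
random_password() {
    # 16 random bytes, dumped as hex, with spaces and newlines removed.
    head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
}

pw="$(random_password)"
# Print the command for review; the same value goes into sharing-password.
echo "iadmin moduser QuickShare password $pw"
```

A hex-only password avoids characters that would need quoting on the command line or escaping in davis-host.properties.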

Setting up GeoIP redirect

The GeoIP-enabled landing page provides users with links from the landing page (http://df.bestgrid.org/) to the "home" URL (/BeSTGRID/home/) on the nearest DataFabric server (Auckland or Canterbury at the moment).

The original instructions are at https://wiki.auckland.ac.nz/display/BeSTGRID/Using+geolocation+information+for+choosing+the+nearest+Davis+server+%28GeoDavis%29; this is the short, up-to-date version to get it all going.

  • Get and build GeoIP
svn co https://subversion.ceres.auckland.ac.nz/BeSTGRID/geoip/ geoip
cd geoip
make
  • Install geoip.cgi into /opt/geoip/bin/geoip
mkdir -p /opt/geoip/bin
cp geoip.cgi /opt/geoip/bin/geoip
$df_servers = "gridgwtest.canterbury.ac.nz:ngdata.vuw.ac.nz";
$df_path = "/BeSTGRID-DEV/home/";
$df_title = "BeSTGRID TEST DataFabric";

Disaster Recovery

Recovering from a failure of a storage resource - notes from DS4200 double disk failure at Canterbury.

  • Deleting a resource is only permitted when there are no replicas on that resource.
  • Delete all replicas from that resource: recursively across /BeSTGRID, remove all replicas from griddata.canterbury.ac.nz, in admin mode, keeping at least 1 replica:
    itrim -r -M -N 1 -S griddata.canterbury.ac.nz /BeSTGRID/home
  • If this ends up straining the iCAT too much, do it user by user:
iquest "%s" "select USER_NAME where USER_ZONE = 'BeSTGRID'" | sort |
while read DFUSER; do
   echo "Running itrim for $DFUSER" ; 
   itrim -r -M -N 1 -S griddata.canterbury.ac.nz /BeSTGRID/home/$DFUSER 2>> log-autotrim-BeSTGRID-home-$DFUSER.log >&2 ; 
done
    • Works even when resource is marked as *down*
  • Find files that are still staying on the resource: these may be files that have zero length and therefore did not get replicated:
    iquest "select DATA_NAME where RESC_NAME = 'griddata.canterbury.ac.nz'"
    • Or directly from iCAT (psql ICAT):
       select coll_name, data_name, data_path from r_data_main,r_coll_main where r_data_main.coll_id=r_coll_main.coll_id and resc_name = 'griddata.canterbury.ac.nz';
    • Manually replicate these files onto another resource (with *irepl* - for zero-length files, works even when the resource is offline) and then trim their replica at the failed resource.
  • Finally, delete and re-create the resource.
    • Remember to add the new resource into any groups the old one was in.