DataFabric Improvements

This page outlines suggested improvements to the BeSTGRID DataFabric. Some are ideas already used by ARCS; others come from within the BeSTGRID community. The page also serves as a wish-list maintained by the BeSTGRID operators and, in particular, as a list of things that Vladimir Mencl has had on his TODO list for a long time.

Rule-based file replication

So far, replication is done by a shell script that periodically scans the iCAT for files that need replication (i.e., files that have only one fresh replica). The shell script has limitations: for example, it cannot handle files that have backslashes in their names.
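The selection criterion described above (exactly one fresh replica) can be sketched in Python. This is an illustrative sketch only, not the existing shell script and not the ARCS replicate.py: the row format, the function names, and the resource name are assumptions made for the example. The one real command it refers to is the iRODS icommand irepl with its -R (destination resource) option. Passing the path as a separate list element, rather than through a shell, is one way to sidestep quoting problems with backslashes.

```python
from collections import defaultdict


def objects_needing_replication(replica_rows):
    """Return the logical paths that have exactly one fresh replica.

    replica_rows is an iterable of (logical_path, resource_name, is_fresh)
    tuples. This is a hypothetical in-memory stand-in for rows read from
    the iCAT; the real catalog query is not shown here.
    """
    fresh_counts = defaultdict(int)
    for logical_path, _resource, is_fresh in replica_rows:
        fresh_counts[logical_path] += int(bool(is_fresh))
    return sorted(path for path, count in fresh_counts.items() if count == 1)


def irepl_argv(logical_path, target_resource):
    """Build an argument list for the irepl icommand.

    The list is meant to be handed to subprocess.run() without shell=True,
    so backslashes and spaces in the path need no escaping.
    """
    return ["irepl", "-R", target_resource, logical_path]
```

A caller would loop over the returned paths and run each argument list with subprocess.run(argv, check=True); the target resource name would come from local configuration.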

ARCS have developed an alternative implementation (in Python) that:

  1. reliably replicates files that are missing replicas
  2. reliably replicates files as soon as the file is uploaded (via a rule)

Use replicate.py and replicateBacklog.py from http://projects.arcs.org.au/svn/systems/trunk/dataFabricScripts/iRODS/utils/

Put the following into bestgrid.irb to activate the scripts:

#Replication Rules
acPostProcForPut||delayExec(<PLUSET>30s</PLUSET>,msiExecCmd(replicate.py,$dataId,null,null,null,*REPLI_OUT),nop)|nop
acPostProcForCopy||delayExec(<PLUSET>30s</PLUSET>,msiExecCmd(replicate.py,$dataId,null,null,null,*REPLI_OUT),nop)|nop

Additional documentation:

Proper iCAT backups

Instead of just doing periodic iCAT backups (a full SQL dump stored offsite), we should be using Postgres support for archiving Write-Ahead-Log (WAL) files offsite.

Instructions at http://wiki.arcs.org.au/foswiki/bin//view/Main/ChangeNote201003-003

Additional information:

Make sure WAL archiving is enabled in PostgreSQL as shown below, so that the IcatBackup.sh script works.

    * archive_mode = on
    * archive_command = 'ssh arcs-df.ac3.edu.au test ! -f /data/DataFabric_Backups/Current_Wal_Archives/%f && rsync -az %p arcs-df.ac3.edu.au:/data/DataFabric_Backups/Current_Wal_Archives/%f'
    * checkpoint_timeout = 1h
    * archive_timeout = 12h
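The archive_command above does two things: it refuses to overwrite a segment that already exists in the archive, and otherwise it copies the segment across. PostgreSQL treats a zero exit status as success and retries the segment later on any non-zero status. The same logic can be sketched in Python for a local archive directory. This is a sketch under assumptions, not the command in use: the function name is hypothetical, the copy is local rather than over ssh/rsync, and the copy-then-rename step is an addition that keeps a partial copy from ever appearing under the final name.

```python
import os
import shutil
import tempfile


def archive_wal_segment(segment_path, archive_dir):
    """Copy one WAL segment into archive_dir and return a shell-style status.

    Returns 0 on success and 1 on any failure, matching what PostgreSQL
    expects from an archive_command.
    """
    target = os.path.join(archive_dir, os.path.basename(segment_path))
    if os.path.exists(target):
        # Mirrors `test ! -f ...`: never overwrite an archived segment.
        return 1
    # Copy under a temporary name first, then rename into place.
    fd, tmp_path = tempfile.mkstemp(dir=archive_dir, prefix=".incoming-")
    os.close(fd)
    try:
        shutil.copyfile(segment_path, tmp_path)
        os.replace(tmp_path, target)
    except OSError:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        return 1
    return 0
```

Wrapped in a small script that calls sys.exit() with the returned status, such a function could be invoked with %p and the archive directory as arguments.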

IcatBackup.sh takes a backup of the whole database cluster.
pgdump.sh is a basic pg_dump backup script.
Let me know if you need any further information.

Additional links:

iCAT streaming replication

PostgreSQL 9 supports streaming replication (with a synchronous mode as of 9.1). Implement this and deploy slave database servers at other sites. iRODS would then use the local database replica for read-only operations.

iRODS Resource monitoring

  • iRODS can monitor whether a resource server is available and automatically mark the resource as down (avoiding timeouts in accessing the server).
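The kind of availability check involved can be illustrated with a plain TCP probe. This is a minimal sketch, not the monitoring built into iRODS: the function name is hypothetical, and it only tests whether a TCP connection to the server port can be opened within a timeout (1247 is the default iRODS port). It does not perform an iRODS handshake or mark any resource as down.

```python
import socket


def resource_server_reachable(host, port=1247, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout.

    A refused connection, an unreachable host, a failed name lookup, or a
    timeout all count as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A short timeout is the point of such a probe: clients find out quickly that a server is unavailable instead of waiting on a stalled connection.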

Nagios monitoring of DataFabric services

WebDAV over Shibboleth

Experiment with setting up IdP-specific URLs at the ShibSP for WebDAV over Shibboleth: see https://wiki.shibboleth.net/confluence/display/SHIB2/WebDAV

BeSTGRID frontpage

  • Additional features for the HTML frontpage at http://df.bestgrid.org/
    • Instead of linking only to the closest server (as determined by GeoIP), also give links to all other available servers (test first at gridgwtest.canterbury.ac.nz)
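One possible way to produce such a list is to sort all servers by great-circle distance from the client's GeoIP position, so the closest server comes first and the rest follow. The sketch below is an assumption about how this could be done, not a description of the current frontpage: the function names are hypothetical, the caller supplies the server names and coordinates, and the client coordinates are assumed to come from a GeoIP lookup that is not shown.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def order_servers(client_lat, client_lon, servers):
    """Return all server names, closest to the client first.

    servers maps a server name to its (latitude, longitude) in degrees.
    """
    return sorted(
        servers,
        key=lambda name: haversine_km(client_lat, client_lon, *servers[name]),
    )
```

The frontpage could then highlight the first entry as the recommended server and render the remaining entries as ordinary links.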