Running a Grid Job using GANGA
Ganga is a frontend tool for job definition and management with access to all grid infrastructure supported by ATLAS.
Detailed information about GANGA can be found at:
http://documentation.hepcg.org/res/ap3/w_301106.pdf
How to set up and use GANGA
(a) Set up Grid environment and GANGA
The following two lines set up the Grid interface and GANGA using the newest version available on AFS:
- source /afs/cern.ch/project/gd/LCG-share/current/etc/profile.d/grid_env.sh
- source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh
(b) Set up Athena
Set up your Athena environment as usual, for example under 12.0.6:
* source ~/cmthome/setup.sh -tag=12.0.6
(c) Run GANGA
Start GANGA from the cmt or run directory of the Athena working area that was set up in the previous step by simply typing: ganga
To execute a script that submits a job, type the following at the GANGA command line (not the GUI version):
execfile('/home/bernius/testarea/11.0.42/PhysicsAnalysis/AnalysisCommon/ttHHbb/run/mygangajob.py')
(This script can be found here: mygangajob.py)
The job can also be submitted directly from the shell by typing: ganga mygangajob.py
For more information about submitting your own jobs see the GANGA tutorial:
https://twiki.cern.ch/twiki/bin/view/Atlas/GangaTutorial427
By default GANGA submits your jobs to the LCG. Since GANGA version 4.3.0, you can also submit your jobs to NorduGrid using the new NG backend. More information on how to switch your jobs from LCG to NG can be found here:
https://twiki.cern.ch/twiki/bin/view/Atlas/GangaNGTutorial430
Some GANGA commands and things to know
- exit GANGA: ctrl-D
- get online help: help (exit help: ctrl-D)
- repository for input/output files for every job is located by default at: $HOME/gangadir/workspace/Local
- view job repository: jobs
- view subjobs with: subjobs
- to get info about specific jobs: jobs[jobid]
- remove job: jobs[jobid].remove()
- view the output directory of a finished job, once it has been retrieved to the job repository: jobs[jobid].peek()
- export job configuration to a file: export(jobs[jobid], '~/jobconf.py')
Sandbox fun
* Input Sandbox:
   - GANGA keeps the input sandbox for all jobs in $HOME/gangadir/workspace, so there might be quota problems
   - The size limit is 10MB by default. If it is exceeded, submission fails with "JobSizeException: Job Size exceeds limits."; check how big the tarfile in $HOME/gangadir/workspace/Local/jobid is
* Output Sandbox:
   - the output can be found by default in $HOME/gangadir/workspace/Local/jobid/output (setting j.outputdata.local_location='/home/bernius/outputGanga' is not working for me)
   - to specify which files you want to receive: j.outputsandbox=['*.dat','*.txt','*.root'] or j.outputsandbox=['*'] (to receive all)
* there are more options for the Input and Output Sandboxes, see https://twiki.cern.ch/twiki/bin/view/Atlas/GangaUpdates420
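The tarfile size check can be scripted with plain Python outside GANGA. The following is only a minimal sketch: it assumes that "10MB" means 10 * 1024 * 1024 bytes and that the sandbox archives end in .tar, .tgz or .tar.gz. The function name, the suffix list and the job id 0 are illustrative and not part of GANGA.

```python
import os

# Assumption: the 10MB limit quoted above means 10 * 1024 * 1024 bytes.
SANDBOX_LIMIT_BYTES = 10 * 1024 * 1024

# Assumption: sandbox archives carry one of these suffixes.
ARCHIVE_SUFFIXES = ('.tar', '.tgz', '.tar.gz')


def oversized_archives(job_dir, limit=SANDBOX_LIMIT_BYTES):
    """Return sorted (path, size_in_bytes) pairs for archives below job_dir larger than limit."""
    hits = []
    for root, _dirs, files in os.walk(job_dir):
        for name in files:
            if name.endswith(ARCHIVE_SUFFIXES):
                path = os.path.join(root, name)
                size = os.path.getsize(path)
                if size > limit:
                    hits.append((path, size))
    return sorted(hits)


if __name__ == '__main__':
    # Hypothetical job id 0; adjust to the job you want to inspect.
    job_dir = os.path.expanduser('~/gangadir/workspace/Local/0')
    for path, size in oversized_archives(job_dir):
        print('%s: %.1f MB' % (path, size / (1024.0 * 1024.0)))
```

An empty result means no archive below the job directory is over the assumed limit.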
When you submit a job, GANGA will try to tar up your whole testarea to send with the job, which will inevitably be much larger than the 10MB limit for most sites. If it's only a little bit over, you can try to delete some things, but a more useful strategy is to create a separate testarea just for GANGA. The only things you need to run your job successfully are the job options and your testarea/InstallArea folder, so if you copy just those into the fake testarea, your job should still run fine and fit under the size limit.
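Building such a fake testarea can be sketched in a few lines of plain Python, run outside GANGA. The helper below is hypothetical (its name and arguments are not part of GANGA or Athena). It assumes the job options files are given as paths relative to the testarea and that the slim area does not yet contain an InstallArea folder.

```python
import os
import shutil


def make_slim_testarea(testarea, slim_area, joboptions):
    """Copy InstallArea and the listed job options files into slim_area.

    joboptions holds paths relative to testarea.
    Returns the total size in bytes of all files below slim_area.
    """
    # copytree creates slim_area/InstallArea (and slim_area itself) and fails
    # if the destination already exists. Symlinks are followed by default,
    # so the copy contains real files rather than links.
    shutil.copytree(os.path.join(testarea, 'InstallArea'),
                    os.path.join(slim_area, 'InstallArea'))
    for rel in joboptions:
        dest = os.path.join(slim_area, rel)
        dest_dir = os.path.dirname(dest)
        if not os.path.isdir(dest_dir):
            os.makedirs(dest_dir)
        shutil.copy2(os.path.join(testarea, rel), dest)
    # Add up the size of everything now in the slim area.
    total = 0
    for root, _dirs, files in os.walk(slim_area):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```

A call could look like make_slim_testarea('/path/to/testarea', '/path/to/gangaarea', ['run/myJobOptions.py']), where all three arguments are placeholders. The returned byte count gives a rough idea of whether the area will fit under the limit, although the tarfile GANGA builds may differ in size.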
More Information about GANGA can be found in the Links 1.-4.
Making your jobs actually work
Most likely when you first try to submit grid jobs you will encounter lots of problems with your job being sent to a site where the dataset you asked for is empty. In general this is related to the fact that the resource broker doesn't really understand the concept of incomplete datasets and handles them badly. On the LCG your best bet is to try and find out where your dataset is available and send the job there yourself. The first port of call here is AMI:
The ATLAS Metadata Interface (AMI):
http://lpsc1168x.in2p3.fr:8080/opencms/opencms/AMI/www/index.html
Using AMI you can search for your dataset. When you've found the one you want, the DQ2 link next to it takes you to the PANDA page where you can browse around and try and figure out what sites actually have your data. Once you've done that you need to find a computing element (CE) which has access to the storage element (SE) which holds your dataset. Some information on this topic is available if you run:
lcg-infosites --vo atlas closeSE
It is, however, a bit of black magic to work out what is actually associated with what.
What's easier than all this is to use NorduGrid instead, which allows data to travel to the node where your job is (to some extent). On NorduGrid the splitter element of the job setup seems to be important, and setting the number of subjobs to the number of files in the dataset seems to produce the best results (fewest failed subjobs).
Links
-- CatrinBernius - 09 May 2007