Distributed Analysis on Panda
This page describes how to submit user analysis jobs from LCG/OSG/NG to the OSG production system (Panda) from UCL.
Detailed information about DA on Panda can be found here:
https://twiki.cern.ch/twiki/bin/view/Atlas/DAonPanda
In order to set up pathena at UCL, follow the instructions below:
First, make sure you have a grid certificate. See Starting on the Grid. You should have usercert.pem and userkey.pem under ~/.globus.
Then set up Athena, because pathena works in the Athena runtime environment.
Then check out PandaTools, which contains pathena. The commands below are for 13.0.X or 12.0.X; instructions for 11.0.X can be found here:
https://twiki.cern.ch/twiki/bin/view/Atlas/DAonPanda
cd /somewhere/workarea # this is your work area, where your analysis code lives, e.g. testarea1206/12.0.6/
export CMTPATH=`pwd`:${CMTPATH}
export PATHENA_GRID_SETUP_SH=/usr/local/glite/etc/profile.d/grid_env.sh
cmt co PhysicsAnalysis/DistributedAnalysis/PandaTools
cd PhysicsAnalysis/DistributedAnalysis/PandaTools/cmt
source setup.sh
make
cd /somewhere/workarea/.../somedirectory # go to the run directory of your analysis code
mkdir run # create a run directory if you don't have one
cd run
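Before submitting anything, it can be worth confirming that the certificate files mentioned in the first step are actually in place. The following is a minimal sketch; the function name check_grid_cert is made up for illustration and is not part of pathena or the grid tools. It only checks that the two files exist, not that the certificate is valid.

```shell
# Hypothetical helper: check that both certificate files exist in the
# given directory (defaults to ~/.globus). Returns non-zero if either
# file is missing and names the missing file on standard error.
check_grid_cert() {
    dir="${1:-$HOME/.globus}"
    status=0
    for f in usercert.pem userkey.pem; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            status=1
        fi
    done
    return $status
}
```

Calling check_grid_cert with no argument checks ~/.globus.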
If you normally run Athena with:
athena jobO_1.py jobO_2.py jobO_3.py
then all you need to submit the same job to Panda is:
pathena jobO_1.py jobO_2.py jobO_3.py [--inDS inputDataset] --outDS outputDataset
where inputDataset is a dataset which contains the input files, and outputDataset is the dataset which will contain the output files. The --inDS option is optional. Details about the available options can be found here:
https://twiki.cern.ch/twiki/bin/view/Atlas/DAonPanda#pathena
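Since --inDS is optional, a small dry-run helper can make the two forms of the command explicit. This is only a sketch: pathena_cmd is a hypothetical name, and the function prints the command line rather than running pathena, so you can inspect it first. It uses only the --inDS and --outDS options described above.

```shell
# Hypothetical helper: print (do not run) a pathena command line.
#   $1   = output dataset
#   $2   = input dataset (pass "" to leave out --inDS)
#   rest = job options files
pathena_cmd() {
    out_ds=$1
    in_ds=$2
    shift 2
    cmd="pathena $*"
    if [ -n "$in_ds" ]; then
        cmd="$cmd --inDS $in_ds"
    fi
    echo "$cmd --outDS $out_ds"
}
```

For example, pathena_cmd outputDataset inputDataset jobO_1.py jobO_2.py prints the full command with both datasets.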
Update your pathena version
If you get an error message: 52 ERROR : could not access DQ2 server
then you have to update your pathena version.
Here are two links for updating dq2 and pathena:
https://twiki.cern.ch/twiki/bin/view/Atlas/UsingDQ2#How_to_migrate_to_DQ2_0_3
https://twiki.cern.ch/twiki/bin/view/Atlas/DAonPanda#How_to_migrate_to_DQ2_0_3
The following describes how to update your existing version of pathena:
(1) First, remove the existing PandaTools:
cd ~/somedirectory/12.0.6/PhysicsAnalysis/DistributedAnalysis/PandaTools/cmt
gmake clean
cd ../../../
rm -r DistributedAnalysis
(2) Now check it out again:
cd ~/somedirectory/12.0.6
cmt co PhysicsAnalysis/DistributedAnalysis/PandaTools
cd PhysicsAnalysis/DistributedAnalysis/PandaTools/cmt
source setup.sh
make
(3) Do a CVS update:
cd ~/somedirectory/12.0.6/PhysicsAnalysis/DistributedAnalysis/PandaTools
cvs update
(4) Check that you have the latest version:
cvs status ChangeLog
You should see Working Revision: 1.7
(5) Last step:
cd cmt
source setup.sh
make
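The revision check in step (4) can be scripted if you do not want to read through the full cvs status output. This is a sketch that assumes the usual cvs status layout, in which the revision is the third whitespace-separated field on the "Working revision:" line; working_revision is a hypothetical helper name.

```shell
# Hypothetical helper: print the working revision found in `cvs status`
# output read from standard input. The match is case-insensitive.
working_revision() {
    awk 'tolower($0) ~ /working revision:/ { print $3; exit }'
}

# Usage, from inside the PandaTools directory:
#   cvs status ChangeLog | working_revision
```

If nothing is printed, the input contained no "Working revision:" line.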
Monitoring
The Panda interface for monitoring your jobs is pretty good. You can visit a web page with a URL like this:
http://gridui02.usatlas.bnl.gov:25880/server/pandamon/query?ui=user&name=adamdavison
Replace adamdavison with your own username, which appears to be the name field of your grid certificate.
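If you check the monitor often, a one-line helper can build the URL for you. This is a sketch: panda_monitor_url is a hypothetical name, the host and query string are copied from the example URL above, and the replacement of spaces with %20 is an assumption about how a name containing spaces should be passed.

```shell
# Hypothetical helper: print the Panda monitoring URL for a user name.
# Spaces in the name are encoded as %20 (an assumption, see above).
panda_monitor_url() {
    name=$(printf '%s' "$1" | sed 's/ /%20/g')
    printf 'http://gridui02.usatlas.bnl.gov:25880/server/pandamon/query?ui=user&name=%s\n' "$name"
}
```

Calling panda_monitor_url adamdavison reproduces the example URL.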
Compiling takes all day
So it's good that you can avoid it.
Once you've got a job running with:
pathena mycooljoboptions.py --inDS dataset --outDS dataset
You can go to the panda monitoring page and find the name of the library dataset produced by the build step.
As long as you're happy with this set of binaries and you only want to change your top level job options for your next job, you can do:
pathena mycoolerjoboptions.py --inDS dataset --outDS dataset --libDS libraryDataset
And go straight to running.
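To avoid retyping the library dataset name for every follow-up job, you could keep it in one place and append it to whatever pathena arguments you use. This is a sketch only: pathena_reuse_cmd is a hypothetical name, and the function prints the command instead of running it.

```shell
# Hypothetical helper: print (do not run) a pathena command that reuses
# an existing build.
#   $1   = library dataset name, as found on the Panda monitoring page
#   rest = the usual pathena arguments, passed through unchanged
pathena_reuse_cmd() {
    lib_ds=$1
    shift
    echo "pathena $* --libDS $lib_ds"
}
```

For example, pathena_reuse_cmd libraryDataset mycoolerjoboptions.py --inDS dataset --outDS dataset prints the command shown above.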
-- AdamD - 23 Jul 2007
-- CatrinBernius - 19 Jun 2007