Difference: AtlasGanga (1 vs. 6)

Revision 6 (2010-04-21) - JamesRobinson


Running a Grid Job using GANGA

Ganga is a frontend tool for job definition and management with access to all grid infrastructure supported by ATLAS. Detailed information about GANGA can be found at http://documentation.hepcg.org/res/ap3/w_301106.pdf. The following steps assume that you have a valid grid certificate.
 

How to set up GANGA

(a) Setup Grid environment

In order to use GANGA you should run the following commands in a clean shell.

Set up the grid environment and create a new proxy

 
  • source /afs/cern.ch/project/gd/LCG-share/current/etc/profile.d/grid_env.sh
  • voms-proxy-init -voms atlas
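As an optional extra check (not part of the original recipe), you can confirm that the proxy was created and see how long it remains valid with:
  • voms-proxy-info --all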
 

(b) Setup Athena

Set up Athena as you would normally (for me this is)
  • source /home/robinson/athena/DiJets/15.6.7/cmthome/setup.sh

(c) Setup GANGA

Now source the GANGA setup

  • source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh

Running GANGA

  Change directory to the run directory of whichever package you are working on and then start ganga with:
  • ganga
 
The ganga command line is a python shell which can be used to submit jobs. Sample job scripts are shown below. These can either be typed line by line into the GANGA shell or saved as a file and executed from the GANGA shell with:
  • execfile('scriptname')
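As a concrete illustration (the filename submit_lcg.py is hypothetical and used only as an example), the LCG script below could be saved to a file and then run in either of two ways:
  • execfile('submit_lcg.py')   (from inside the GANGA shell)
  • ganga submit_lcg.py   (from outside the GANGA shell, as described further down this page)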

Using GANGA with the LCG backend

 
config['LCG']['MatchBeforeSubmit'] = True
j = Job()
j.application = Athena()
j.name = 'JES.ESD.J1'
j.application.option_file = [ '/home/robinson/athena/DiJets/15.6.7/PhysicsAnalysis/DiJets/share/jobOptions.ForwardJES.py' ]
j.application.athena_compile = True
j.application.atlas_release = '15.6.7'
j.application.prepare()
j.inputdata = DQ2Dataset()
j.inputdata.dataset = [ 'mc09_7TeV.105010.J1_pythia_jetjet.recon.ESD.e468_s766_s767_r1206/' ]
j.outputdata = DQ2OutputDataset()
j.outputdata.outputdata = [ 'ForwardJES.root' ]
j.splitter = DQ2JobSplitter()
j.splitter.numsubjobs = 500
j.backend = LCG()
j.backend.requirements.cloud = 'IT'
j.submit()

  The most important options here are
  • j.application.option_file which contains your Athena jobOptions
  • j.outputdata.outputdata which contains the output specified by your Athena jobOptions

Common GANGA Problems

The datasets belonging to the container that you want to run on must all be present on the same cloud (although not necessarily at the same site). You can check where datasets are available by running:

  • dq2-ls -r "datasetname" (outside ganga)
 If the data is not present, you can go here to request replication: http://panda.cern.ch:25980/server/pandamon/query?mode=ddm_req


The Athena version that you request must be present at all sites that your job is sent to. You can check which versions are available at which sites by running:

  • lcg-infosites --vo atlas ce tag
 

OLD BUT MAY STILL BE RELEVANT

Revision 5 (2009-11-13) - JamesRobinson


Running a Grid Job using GANGA

Ganga is a frontend tool for job definition and management with access to all grid infrastructure supported by ATLAS. Detailed information about GANGA can be found at http://documentation.hepcg.org/res/ap3/w_301106.pdf.
  Change directory to the run directory of whichever package you are working on and then start ganga with:
  • ganga

Using GANGA with the LCG backend

  The ganga command line is a python shell which can be used to submit jobs. To execute a job script (which should be in the run directory from which you ran ganga) from inside the ganga shell, simply type:
  • execfile('scriptname')

Alternatively, the job can be submitted from outside the ganga shell by typing:
  • ganga scriptname

For more information about submitting your own jobs see the GANGA tutorial: https://twiki.cern.ch/twiki/bin/view/Atlas/FullGangaAtlasTutorial


Using the Panda backend

The Panda backend has to be used for jobs sent to US sites. It requires a slightly different form of job submission script. A sample job script is shown here:

j = Job()
j.application = Athena()
j.name='PTResolution.LowPT.SmallEta.J5'
j.application.option_file=[ '/home/robinson/athena/15.6.0/PhysicsAnalysis/ForwardJets/run/jobOptions.PTResolution.LowPT.SmallEta.py' ]
j.application.athena_compile = True
j.application.atlas_release='15.6.0'
j.application.prepare()
j.inputdata=DQ2Dataset()
j.inputdata.dataset=[ 'mc08.105014.J5_pythia_jetjet.merge.AOD.e344_s479_s520_r809_r838/' ]
j.outputdata=DQ2OutputDataset()
j.splitter=DQ2JobSplitter()
j.splitter.numsubjobs=500
j.backend=Panda()
j.submit()

Using the NorduGrid backend

If submitting jobs to NorduGrid in the Netherlands there is only one cloud. Therefore, your script needs to be of the following form, changing the backend to 'NG'.

j = Job()
j.application = Athena()
j.name='PTResolution.LowPT.SmallEta.J5'
j.application.option_file=[ '/home/robinson/athena/15.6.0/PhysicsAnalysis/ForwardJets/run/jobOptions.PTResolution.LowPT.SmallEta.py' ]
j.application.athena_compile = True
j.application.atlas_release='15.6.0'
j.application.prepare()
j.inputdata=DQ2Dataset()
j.inputdata.dataset=[ 'mc08.105014.J5_pythia_jetjet.merge.AOD.e344_s479_s520_r809_r838/' ]
j.outputdata=DQ2OutputDataset()
j.outputdata.outputdata=['PTResolution.root']
j.splitter=DQ2JobSplitter()
j.splitter.numsubjobs=500
j.backend=NG()
j.submit()

 

Useful GANGA python shell commands

  • exit GANGA: ctrl-D
  • get online help: help (exit help: ctrl-D)
  • view job output directory of finished jobs that is retrieved back to the job repository: jobs(jobid).peek()
  • view stdout or stderr for debugging failed jobs: jobs(jobid).peek('stdout.gz','emacs')
  • export job configuration to a file: export(jobs[jobid], '~/jobconf.py')
  • force a job into a particular status: jobs(jobid).force_status("failed")

The repository for input/output files for every job is located by default at: $HOME/gangadir/workspace/username/LocalAMGA
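Putting a few of the commands above together, here is a minimal monitoring sketch (an illustration only, assuming it is run inside the GANGA shell after submitting a split job, and that subjobs expose the same status/peek interface as jobs):

myjob = jobs(0)                                         # replace 0 with the id shown by 'jobs'
print myjob.status                                      # status of the master job
failed = [ sj for sj in myjob.subjobs if sj.status == 'failed' ]
print len(failed), 'failed subjobs'                     # how many of the DQ2JobSplitter subjobs failed
for sj in failed:
    sj.peek('stdout.gz','emacs')                        # inspect their stdout for debugging, as in the list above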
 

Common GANGA Problems

The datasets belonging to the container that you want to run on must all be present on the same cloud (although not necessarily at the same site). You can check where datasets are available by running:

  • dq2-ls -r "datasetname" (outside ganga)
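For example, to check the ESD container used in the LCG example near the top of this page (run in a normal shell with the grid environment and proxy set up as described above):

  • dq2-ls -r "mc09_7TeV.105010.J1_pythia_jetjet.recon.ESD.e468_s766_s767_r1206/"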
 
If the data is not present, you can go here to request replication: http://panda.cern.ch:25980/server/pandamon/query?mode=ddm_req
 

The Athena version that you request must be present at all sites that your job is sent to. You can check which versions are available at which sites by running:

  • lcg-infosites --vo atlas ce tag
 
OLD BUT MAY STILL BE RELEVANT

Revision 4 (2009-11-13) - JamesRobinson


Running a Grid Job using GANGA

Ganga is a frontend tool for job definition and management with access to all grid infrastructure supported by ATLAS. Detailed information about GANGA can be found at http://documentation.hepcg.org/res/ap3/w_301106.pdf.
 

How to set up GANGA

 

(a) Setup Grid environment and GANGA

The following two lines set up the Grid interface and GANGA using the newest version available on AFS:
  • source /afs/cern.ch/project/gd/LCG-share/current/etc/profile.d/grid_env.sh
  • source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh

(b) Setup Athena

Set up your Athena environment as usual, for example under 15.6.0:
  • source ~/cmthome/setup.sh -tag=15.6.0
 

(c) Run GANGA

Change directory to the run directory of whichever package you are working on and then start ganga with:
  • ganga

Using GANGA

The ganga command line is a python shell which can be used to submit jobs. A sample job script is shown here:

j = Job()
j.application = Athena()
j.name='PTResolution.LowPT.SmallEta.J5'
j.application.option_file=[ '/home/robinson/athena/15.6.0/PhysicsAnalysis/ForwardJets/run/jobOptions.PTResolution.LowPT.SmallEta.py' ]
j.application.athena_compile = True
j.application.atlas_release='15.6.0'
j.application.prepare()
j.inputdata=DQ2Dataset()
j.inputdata.dataset=[ 'mc08.105014.J5_pythia_jetjet.merge.AOD.e344_s479_s520_r809_r838/' ]
j.outputdata=DQ2OutputDataset()
j.outputdata.outputdata=['PTResolution.root']
j.splitter=DQ2JobSplitter()
j.splitter.numsubjobs=500
j.backend = LCG()
j.backend.requirements.cloud='UK'
j.submit()

The most important options here are

  • j.application.option_file which contains your Athena jobOptions
  • j.outputdata.outputdata which contains the output specified by your Athena jobOptions

To execute this script (which should be in the run directory from which you ran ganga), simply type

  • execfile('scriptname')

Alternatively, the job can be submitted from outside the ganga shell by typing

  • ganga scriptname

For more information about submitting your own jobs see the GANGA tutorial: https://twiki.cern.ch/twiki/bin/view/Atlas/FullGangaAtlasTutorial

Useful GANGA python shell commands

  • exit GANGA: ctrl-D
  • get online help: help (exit help: ctrl-D)
  • view job repository: jobs
  • view subjobs with: jobs(jobid).subjobs
  • to get info about specific jobs: jobs(jobid)
  • to get the job status: jobs(jobid).status
  • remove job: jobs(jobid).remove()
  • view job output directory of finished jobs that is retrieved back to the job repository: jobs(jobid).peek()
  • view stdout or stderr for debugging failed jobs: jobs(jobid).peek('stdout.gz','emacs')
  • export job configuration to a file: export(jobs[jobid], '~/jobconf.py')
  • force a job into a particular status: jobs(jobid).force_status("failed")
The repository for input/output files for every job is located by default at: $HOME/gangadir/workspace/username/LocalAMGA
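These commands can also be combined. For instance, a previously defined job can be cloned and tweaked before resubmission; this is a sketch only (copy() is a standard GANGA Job method, although it is not listed above, and the new name and dataset are placeholders):

j2 = jobs(0).copy()                                   # clone an existing job; replace 0 with a real job id
j2.name = 'PTResolution.LowPT.SmallEta.J6'            # hypothetical new name
j2.inputdata.dataset = [ 'your.other.dataset.name/' ] # placeholder: the container you now want to run over
j2.submit()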

Common GANGA Problems

The datasets belonging to the container that you want to run on must all be present on the same cloud (although not necessarily at the same site). You can check where datasets are available by running:

  • dq2-ls -r "datasetname" (outside ganga)

The Athena version that you request must be present at all sites that your job is sent to. You can check which versions are available at which sites by running:

  • lcg-infosites --vo atlas ce tag

Using the Panda backend

The Panda backend has to be used for jobs sent to US sites. It requires a slightly different form of job submission script. A sample job script is shown here:

j = Job()
j.application = Athena()
j.name='PTResolution.LowPT.SmallEta.J5'
j.application.option_file=[ '/home/robinson/athena/15.6.0/PhysicsAnalysis/ForwardJets/run/jobOptions.PTResolution.LowPT.SmallEta.py' ]
j.application.athena_compile = True
j.application.atlas_release='15.6.0'
j.application.prepare()
j.inputdata=DQ2Dataset()
j.inputdata.dataset=[ 'mc08.105014.J5_pythia_jetjet.merge.AOD.e344_s479_s520_r809_r838/' ]
j.outputdata=DQ2OutputDataset()
j.splitter=DQ2JobSplitter()
j.splitter.numsubjobs=500
j.backend=Panda()
j.submit()

-- JamesRobinson - 13 Nov 2009

OLD BUT MAY STILL BE RELEVANT
 

GANGA on NorduGrid

By default GANGA submits your jobs to the LCG. Since GANGA version 4.3.0, you can also submit your jobs to NorduGrid using the new NG backend. More information on how to change your jobs from LCG to NG can be found here:

https://twiki.cern.ch/twiki/bin/view/Atlas/GangaNGTutorial430

 

Sandbox fun

  • Input Sandbox:
    • GANGA keeps the input sandbox for all jobs in $HOME/gangadir/workspace so there might be quota problems
    • The size is by default 10MB -> Submission fails because "JobSizeException: Job Size exceeds limits."; look at the tarfile in /gangadir/workspace/Local/jobid to see how big the file is
  • Output Sandbox:
    • the output can be found by default in /gangadir/workspace/Local/jobid/output (j.outputdata.local_location='/home/bernius/outputGanga' is not working for me)
    • to specify which files you want to receive: j.outputsandbox=['*.dat','*.txt','*.root'] or j.outputsandbox=['*'] (to receive all)
 

When you submit a job, GANGA will try to tar up your whole testarea to send with the job, which will inevitably be much larger than the 10MB limit for most sites. If it's only a little bit over then you can try and delete some things but a useful strategy is to create a separate testarea just for GANGA. The only things you need to run your job successfully are the job options and your testarea/InstallArea folder so if you just copy those into the fake testarea, your job should still run fine and fit in under the size limit.
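As an illustration of this fake-testarea trick (the paths below are placeholders for your own setup, not part of the original recipe):
  • mkdir /home/username/ganga_testarea
  • cp -r /home/username/athena/15.6.0/InstallArea /home/username/ganga_testarea/
  • cp /home/username/athena/15.6.0/PhysicsAnalysis/ForwardJets/run/jobOptions.*.py /home/username/ganga_testarea/
so that only the jobOptions and the InstallArea get tarred up when the job is submitted.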

Revision 3 (2007-11-26) - MarioCampanelli


Running a Grid Job using GANGA

Ganga is a frontend tool for job definition and management with access to all grid infrastructure supported by ATLAS.

Some GANGA commands and things to know

  • repository for input/output files for every job is located by default at: $HOME/gangadir/workspace/Local
  • view job repository: jobs
  • view subjobs with: subjobs
  • to get info about specific jobs: jobs(jobid)
  • to get the job status: jobs(jobid).status
  • remove job: jobs(jobid).remove()
  • view job output directory of finished jobs that is retrieved back to the job repository: jobs(jobid).peek()
 
  • export job configuration to a file: export(jobs[jobid], '~/jobconf.py')

Sandbox fun

Revision 2 (2007-07-23) - AdamD


Running a Grid Job using GANGA

Ganga is a frontend tool for job definition and management with access to all grid infrastructure supported by ATLAS.
Some information on this topic is available if you run:

  lcg-infosites --vo atlas closeSE
Update: The ganga people now have this wiki page which looks extremely helpful:

https://twiki.cern.ch/twiki/bin/view/Atlas/DAGangaFAQ

Although I haven't had a chance to try it all out properly yet -- AdamD - 23 Jul 2007

  What's easier than all this is to use NorduGrid instead, which allows data to travel to the node where your job is (to some extent). On NorduGrid the splitter element of the job setup seems to be important, and setting the number of subjobs to the number of files in the dataset seems to produce the best results (fewest failed subjobs).

Revision 1 (2007-05-09) - CatrinBernius


Running a Grid Job using GANGA

Ganga is a frontend tool for job definition and management with access to all grid infrastructure supported by ATLAS. Detailed information about GANGA can be found at http://documentation.hepcg.org/res/ap3/w_301106.pdf.

How to set up and use GANGA

(a) Setup Grid environment and GANGA

The following two lines set up the Grid interface and GANGA using the newest version available on AFS:

  • source /afs/cern.ch/project/gd/LCG-share/current/etc/profile.d/grid_env.sh
  • source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh

(b) Setup Athena

Set up your Athena environment as usual, for example under 12.0.6:

  • source ~/cmthome/setup.sh -tag=12.0.6

(c) Run GANGA

Start GANGA from the cmt or run directory of the Athena working area that has been set up before by just typing: ganga

To execute a script to submit a job in GANGA, type in the GANGA command line (not the GUI version): execfile('/home/bernius/testarea/11.0.42/PhysicsAnalysis/AnalysisCommon/ttHHbb/run/mygangajob.py') (This script can be found here: mygangajob.py)

The job can also be submitted by just typing: ganga mygangajob.py

For more information about submitting your own jobs see the GANGA tutorial: https://twiki.cern.ch/twiki/bin/view/Atlas/GangaTutorial427

GANGA on NorduGrid

By default GANGA submits your jobs to the LCG. Since GANGA version 4.3.0, you can also submit your jobs to NorduGrid using the new NG backend. More information on how to change your jobs from LCG to NG can be found here:

https://twiki.cern.ch/twiki/bin/view/Atlas/GangaNGTutorial430

Some GANGA commands and things to know

  • exit GANGA: ctrl-D
  • get online help: help (exit help: ctrl-D)
  • repository for input/output files for every job is located by default at: $HOME/gangadir/workspace/Local
  • view job repository: jobs
  • view subjobs with: subjobs
  • to get info about specific jobs: jobs[jobid]
  • remove job: jobs[jobid].remove()
  • view job output directory of finished jobs that is retrieved back to the job repository: jobs[jobid].peek()
  • export job configuration to a file: export(jobs[jobid], '~/jobconf.py')

Sandbox fun

  • Input Sandbox:
    • GANGA keeps the input sandbox for all jobs in $HOME/gangadir/workspace so there might be quota problems
    • The size is by default 10MB -> Submission fails because "JobSizeException: Job Size exceeds limits."; look at the tarfile in /gangadir/workspace/Local/jobid to see how big the file is
  • Output Sandbox:
    • the output can be found by default in /gangadir/workspace/Local/jobid/output (j.outputdata.local_location='/home/bernius/outputGanga' is not working for me)
    • to specify which files you want to receive: j.outputsandbox=['*.dat','*.txt','*.root'] or j.outputsandbox=['*'] (to receive all)
  • there are more options for the Input and Output Sandboxes, see https://twiki.cern.ch/twiki/bin/view/Atlas/GangaUpdates420

When you submit a job, GANGA will try to tar up your whole testarea to send with the job, which will inevitably be much larger than the 10MB limit for most sites. If it's only a little bit over then you can try and delete some things but a useful strategy is to create a separate testarea just for GANGA. The only things you need to run your job successfully are the job options and your testarea/InstallArea folder so if you just copy those into the fake testarea, your job should still run fine and fit in under the size limit.

More information about GANGA can be found in the Links section below.

Making your jobs actually work

Most likely when you first try to submit grid jobs you will encounter lots of problems with your job being sent to a site where the dataset you asked for is empty. In general this is related to the fact that the resource broker doesn't really understand the concept of incomplete datasets and handles them badly. On the LCG your best bet is to try and find out where your dataset is available and send the job there yourself. The first port of call here is AMI:

The ATLAS Metadata Interface (AMI): http://lpsc1168x.in2p3.fr:8080/opencms/opencms/AMI/www/index.html

Using AMI you can search for your dataset. When you've found the one you want, the DQ2 link next to it takes you to the PANDA page where you can browse around and try and figure out what sites actually have your data. Once you've done that you need to find a computing element (CE) which has access to the storage element (SE) which holds your dataset. Some information on this topic is available if you run:

lcg-infosites --vo atlas closeSE

Although it's a bit of black magic to work out what is actually associated with what.

What's easier than all this is to use NorduGrid instead, which allows data to travel to the node where your job is (to some extent). On NorduGrid the splitter element of the job setup seems to be important, and setting the number of subjobs to the number of files in the dataset seems to produce the best results (fewest failed subjobs).

Links

-- CatrinBernius - 09 May 2007

 