Difference: AtlasGrid (1 vs. 20)

Revision 20 - 2010-04-21 - JamesRobinson

Line: 1 to 1
 
META TOPICPARENT name="HEPGroup.AtlasStuff"

Introduction

Changed:
<
<
Introductions to the GRID can be found in several places (e.g. see link 1 or bibliography 1), depending on how much detail the user wants. Here I give only rough instructions, focusing instead on tips and scripts that help a physicist run his/her jobs and get results fast.

A very good introduction to the GRID is given by Steve Lloyd (see link 2), and another can be found on the Atlas Wiki (see link 3). In terms of "bureaucracy" you will need (a) to get a GRID certificate and (b) to join a Virtual Organisation (VO), which for us is the atlas VO. The certificate will be valid only on the machine that you used to issue it. This is very important: if you need access to web pages (for example the Grid User Support), you must reach them from the machine you got the certificate from. Otherwise, you should log in to that machine and work from there. These steps now take a few days (instead of a few months, as it used to be).

>
>
Introductions to the GRID can be found in several places (e.g. see link 1 or bibliography 1), depending on how much detail the user wants. Here I give only rough instructions, focusing instead on tips and scripts that help a physicist run his/her jobs and get results fast.
 
Changed:
<
<
After that, you will need to get an account on a User Interface and install your certificate. The final aim of doing these steps is to create your proxy certificate, which will allow you to have access to the grid for a desired period of time.
>
>
A very good introduction to the GRID is given by Steve Lloyd (see link 2), and another can be found on the Atlas Wiki (see link 3). In terms of "bureaucracy" you will need (a) to get a GRID certificate and (b) to join a Virtual Organisation (VO), which for us is the atlas VO. The certificate will be valid only on the machine that you used to issue it. This is very important: if you need access to web pages (for example the Grid User Support), you must reach them from the machine you got the certificate from. Otherwise, you should log in to that machine and work from there. These steps now take a few days (instead of a few months, as it used to be).

After that, you will need to get an account on a User Interface and install your certificate. The final aim of doing these steps is to create your proxy certificate, which will allow you to have access to the grid for a desired period of time.

  The guides on how to do the above steps are on our group's web-page (see link 4)
Added:
>
>

Ganga

 
Added:
>
>
Ganga is a command line (or graphical) frontend for submitting and running jobs on the grid. Ganga usage is explained at AtlasGanga. The rest of this page is probably not so relevant any more.

The Grid

  By now, you will be able to use the grid. Each time you log in, you should do:
Line: 24 to 19
  For convenience, you could add this command to your shell startup script (usually .bashrc).
Changed:
<
<
You also have to get a grid proxy, or check whether you already have a valid one and, if so, for how long. To get a proxy, type:
>
>
You also have to get a grid proxy, or check whether you already have a valid one and, if so, for how long. To get a proxy, type:
  grid-proxy-init -valid 24:30
Line: 41 to 35
  The detailed guide/manual for the LCG is given in Link 5.
Changed:
<
<
It is essential to understand that between you and the site where your job will eventually run there is the Resource Broker (RB), which controls and distributes the jobs. For us, the RB is RAL. You submit your jobs there and you retrieve your jobs from there (you will see that when you perform these requests). The concept is that the user has no direct contact with the final site.

To take the first step, run the HelloWorld example described in Steve's notes (link 2). There are three main components of your job: (a) the .jdl file, which tells the RB what your job will need; (b) your .sh file, which is the executable script (the same one that we submit to PBS); and (c) your .py file, which is the normal python jobOptions file that we all run within ATHENA.

>
>
It is essential to understand that between you and the site where your job will eventually run there is the Resource Broker (RB), which controls and distributes the jobs. For us, the RB is RAL. You submit your jobs there and you retrieve your jobs from there (you will see that when you perform these requests). The concept is that the user has no direct contact with the final site.

To take the first step, run the HelloWorld example described in Steve's notes (link 2). There are three main components of your job: (a) the .jdl file, which tells the RB what your job will need; (b) your .sh file, which is the executable script (the same one that we submit to PBS); and (c) your .py file, which is the normal python jobOptions file that we all run within ATHENA.

  One example of these files can be found on the work-book. Another example is given here. The full simulation (GEANT) of (Pythia) generated events is used as a case study.
Line: 55 to 45
  * simulate.jdl: That is an example of a jdl file
Changed:
<
<
The first line declares which file is the executable. The second and third lines define the names of the files that receive the errors and the print-outs. In the fourth line we give the files that we want to be copied to the site. For example, I copy the executable, of course, and the jobOptions file which I will use to run ATHENA. The OutputSandbox variable defines which files I want to get back. One of these is of course the GEANT output (Hits). Finally, the Requirements variable holds all our options: we need to define the VO within which we will run our jobs (here atlas) and the release we want to use (here 11.0.4). The RB will then search all the sites in the atlas VO which have release 11.0.4. Lastly, we require our jobs to run in the long queue by adding other.GlueCEPolicyMaxCPUTime > 120.
>
>
The first line declares which file is the executable. The second and third lines define the names of the files that receive the errors and the print-outs. In the fourth line we give the files that we want to be copied to the site. For example, I copy the executable, of course, and the jobOptions file which I will use to run ATHENA. The OutputSandbox variable defines which files I want to get back. One of these is of course the GEANT output (Hits). Finally, the Requirements variable holds all our options: we need to define the VO within which we will run our jobs (here atlas) and the release we want to use (here 11.0.4). The RB will then search all the sites in the atlas VO which have release 11.0.4. Lastly, we require our jobs to run in the long queue by adding other.GlueCEPolicyMaxCPUTime > 120.
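
As a rough illustration of the structure just described, the following Python sketch assembles a JDL text with the same kinds of fields. The attached simulate.jdl remains the reference. The helper name build_jdl, the file names (simulate.sh, simulate.py, hits.pool.root, stdout.log, stderr.log), the separate VirtualOrganisation attribute and the exact spelling of the release tag are assumptions made for illustration only.

```python
def build_jdl(executable, job_options, outputs, release="11.0.4", min_cpu_minutes=120):
    """Return JDL text for one job (hypothetical helper, not part of any grid tool)."""
    def quoted_list(names):
        # JDL lists are written as {"a", "b", ...}
        return "{" + ", ".join('"%s"' % name for name in names) + "}"

    requirements = " && ".join([
        # Assumed form of the software tag that sites publish for an ATLAS release.
        'Member("VO-atlas-release-%s", '
        'other.GlueHostApplicationSoftwareRunTimeEnvironment)' % release,
        # Only accept queues that allow more CPU time than this (the "long" queue).
        "other.GlueCEPolicyMaxCPUTime > %d" % min_cpu_minutes,
    ])
    lines = [
        'Executable = "%s";' % executable,
        'StdOutput = "stdout.log";',
        'StdError = "stderr.log";',
        # Files shipped to the site together with the job.
        "InputSandbox = %s;" % quoted_list([executable, job_options]),
        # Files returned to the user when the output is retrieved.
        "OutputSandbox = %s;" % quoted_list(["stdout.log", "stderr.log"] + list(outputs)),
        'VirtualOrganisation = "atlas";',
        "Requirements = %s;" % requirements,
    ]
    return "\n".join(lines) + "\n"


print(build_jdl("simulate.sh", "simulate.py", ["hits.pool.root"]))
```

Generating the text from a function like this is mainly useful when many similar jobs are needed; for a single job, editing the jdl file by hand is just as good.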
 
  • Tip: You can check which sites satisfy all of your requirements by typing:
Line: 75 to 59
  * simulate.sh: That is an example of a sh file
Changed:
<
<
This file is too long to explain line by line, but it is very simple to understand, even for someone with only a basic knowledge of bash commands. There are a few important things to mention here:
>
>
This file is too long to explain line by line, but it is very simple to understand, even for someone with only a basic knowledge of bash commands. There are a few important things to mention here:
 
  • Since our generated files are too big to be transferred with the job (GRID allows only up to a certain number of MB to be transferred with the jdl file) we need to copy them
Changed:
<
<
to the site where the job is running, which of course we neither know nor control. Therefore we must first copy them to site(s) from where we can retrieve them. It is essential that we register the file on the grid. Say, for example, that you need to transfer the file /home/storage/fileGenEvents.pool.root to the site se1.pp.rhul.ac.uk. You will have to type:
>
>
to the site where the job is running, which of course we neither know nor control. Therefore we must first copy them to site(s) from where we can retrieve them. It is essential that we register the file on the grid. Say, for example, that you need to transfer the file /home/storage/fileGenEvents.pool.root to the site se1.pp.rhul.ac.uk. You will have to type:
  lcg-cr -d se1.pp.rhul.ac.uk -l lfn:/grid/atlas/fileGenEvents.pool.root --vo atlas file:////home/storage/fileGenEvents.pool.root
Changed:
<
<
The above command will make a copy of that file with the name /grid/atlas/fileGenEvents.pool.root. The file will also get a unique identification code (something like file7764465a-55ca-4396-85e8-655c86d2c1bd) which identifies exactly where it is.
>
>
The above command will make a copy of that file with the name /grid/atlas/fileGenEvents.pool.root. The file will also get a unique identification code (something like file7764465a-55ca-4396-85e8-655c86d2c1bd) which identifies exactly where it is.
  * Caution: All the file names should start with /grid/atlas
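
When many files have to be registered, it can help to build the lcg-cr command programmatically and to check the /grid/atlas prefix before anything is sent. The Python sketch below does only that; the helper names lcg_cr_command and register_file are hypothetical, and the flags are the ones used in the command above (the local path is written with a plain file:// prefix, which is an assumption about what lcg-cr accepts).

```python
import subprocess


def lcg_cr_command(local_path, storage_element, lfn, vo="atlas"):
    """Build the lcg-cr argument list (hypothetical helper)."""
    if not local_path.startswith("/"):
        raise ValueError("local_path must be absolute: %r" % local_path)
    # Enforce the naming rule from the caution above.
    if not lfn.startswith("/grid/%s/" % vo):
        raise ValueError("logical file name must start with /grid/%s/" % vo)
    return ["lcg-cr", "-d", storage_element, "-l", "lfn:" + lfn,
            "--vo", vo, "file://" + local_path]


def register_file(local_path, storage_element, lfn, run=subprocess.check_output):
    """Copy and register one file; returns the raw output of the command."""
    return run(lcg_cr_command(local_path, storage_element, lfn))


# Only build and show the command here; nothing is sent to the grid.
cmd = lcg_cr_command("/home/storage/fileGenEvents.pool.root",
                     "se1.pp.rhul.ac.uk",
                     "/grid/atlas/fileGenEvents.pool.root")
print(" ".join(cmd))
```

Calling register_file instead of lcg_cr_command would actually run lcg-cr, so it needs a User Interface machine with the LCG tools and a valid proxy.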

Changed:
<
<
* Tip: You can check all the available storage elements by typing: lcg-infosites --vo atlas se. The --help option explains how to use this and all the other lcg commands.
>
>
* Tip: You can check all the available storage elements by typing: lcg-infosites --vo atlas se. The --help option explains how to use this and all the other lcg commands.
  You can check the existence of the file by typing:
Line: 101 to 79
  lcg-cp --vo atlas lfn:/grid/atlas/fileGenEvents.pool.root file:///home/storage/copiedFile.pool.root
Changed:
<
<
However, the site which hosts the file you want may not be available. Therefore you should make replicas of that file. A replica means that the file name will be the same, but the file will be hosted in different places. The command lcg-rep does this job.
>
>
However, the site which hosts the file you want may not be available. Therefore you should make replicas of that file. A replica means that the file name will be the same, but the file will be hosted in different places. The command lcg-rep does this job.
 
  • Tip: It is good to have the file in quite a few places so that you can be sure that it will be copied successfully. It is also advisable to copy it to sites outside the UK, since GRID problems are sometimes country-dependent.
Changed:
<
<
This is what we do in the first line of the sh file. We list the sites where we made replicas of our generated samples and then loop over these sites, checking each time whether the copy has been successful, until we get the file.
>
>
This is what we do in the first line of the sh file. We list the sites where we made replicas of our generated samples and then loop over these sites, checking each time whether the copy has been successful, until we get the file.
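
The attached simulate.sh does this in bash. The same try-each-replica logic can be sketched in Python as follows. This is not a transcription of the attached script: the function names, the srm:// addresses and the stand-in copy function are all placeholders invented for the example, and only the lcg-cp call uses the flags shown on this page.

```python
import subprocess


def copy_with_lcg_cp(source, destination, vo="atlas"):
    """Run lcg-cp once; True if it exited with status 0."""
    command = ["lcg-cp", "--vo", vo, source, "file://" + destination]
    try:
        return subprocess.call(command) == 0
    except OSError:
        # lcg-cp is not installed on this machine.
        return False


def fetch_first_available(sources, destination, copy=copy_with_lcg_cp):
    """Try each source in turn and return the first one that copied successfully."""
    for source in sources:
        if copy(source, destination):
            return source
    raise RuntimeError("no replica could be copied to %s" % destination)


# Stand-in for the real copy so the example runs without grid tools:
# pretend that only the second replica is reachable.
def fake_copy(source, destination):
    return "se2" in source


replicas = ["srm://se1.example.org/atlas/fileGenEvents.pool.root",
            "srm://se2.example.org/atlas/fileGenEvents.pool.root"]
print(fetch_first_available(replicas, "/tmp/fileGenEvents.pool.root", copy=fake_copy))
```

Raising an error when every replica fails makes the job stop early instead of starting ATHENA without its input file.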
 
  • The other important line (just for the simulation step) copies the file geomDB_sqlite, which is needed by GEANT, to the local area:
Line: 137 to 108
  edg-job-get-output -i simulate_jobIDfile
Changed:
<
<
The key thing to mention here is the file simulate_jobIDfile, which holds the identification of the submitted job. Unfortunately, the code given to the job is a random one (like akqIkNdtGa4LPNUTUrsWgg). Therefore the book-keeping must be very careful.
>
>
The key thing to mention here is the file simulate_jobIDfile, which holds the identification of the submitted job. Unfortunately, the code given to the job is a random one (like akqIkNdtGa4LPNUTUrsWgg). Therefore the book-keeping must be very careful.
  If you want to submit many jobs, it is wise to first make a template of your jdl, sh and py files. Then you can use the script:
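
The attached script itself is not reproduced in this view. As a rough idea of what such a templating step involves, here is a Python sketch that writes one copy of each template per job number. The placeholder JOBNUMBER, the simulate_N file naming and the helper name create_submit_files are assumptions for illustration and are not taken from the attached script.

```python
import os
import tempfile


def create_submit_files(templates, job_numbers, out_dir, placeholder="JOBNUMBER"):
    """Write one copy of every template per job, replacing the placeholder.

    templates maps a file extension ("jdl", "sh", "py") to the template text.
    Returns the list of files written.
    """
    written = []
    for number in job_numbers:
        for extension, text in sorted(templates.items()):
            path = os.path.join(out_dir, "simulate_%d.%s" % (number, extension))
            with open(path, "w") as handle:
                handle.write(text.replace(placeholder, str(number)))
            written.append(path)
    return written


templates = {
    "jdl": 'Executable = "simulate_JOBNUMBER.sh";\n',
    "sh": "#!/bin/bash\necho running job JOBNUMBER\n",
}
out_dir = tempfile.mkdtemp()
files = create_submit_files(templates, range(1, 4), out_dir)
print(len(files))  # 3 jobs x 2 templates = 6 files
```

Keeping the job number in the file names also helps with the book-keeping mentioned above, because each job ID file can then be matched to its inputs.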

Revision 19 - 2007-05-09 - CatrinBernius

Line: 1 to 1
 
META TOPICPARENT name="HEPGroup.AtlasStuff"

Introduction

Introductions to the GRID can be found in several places (e.g. see link 1 or bibliography 1), depending on how much detail the user wants.
Line: 184 to 184
 

META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="checkStatus.sh" path="checkStatus.sh" size="122" user="StathisStefanidis" version="1.1"
Deleted:
<
<
META FILEATTACHMENT attr="" autoattached="1" comment="Grid setup commands" date="1173955919" name="mygridsetup.sh" path="mygridsetup.sh" size="217" user="Main.CatrinBernius" version=""
 
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="simulate.sh" path="simulate.sh" size="1659" user="StathisStefanidis" version="1.1"
Deleted:
<
<
META FILEATTACHMENT attr="" autoattached="1" comment="Setup commands for GANGA" date="1173955868" name="myGangaSetup.sh" path="myGangaSetup.sh" size="419" user="Main.CatrinBernius" version=""
 
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="createSubmitFiles.sh" path="createSubmitFiles.sh" size="798" user="StathisStefanidis" version="1.1"
Deleted:
<
<
META FILEATTACHMENT attr="" autoattached="1" comment="Athena Setup commands" date="1173955960" name="myAthenaSetup.sh" path="myAthenaSetup.sh" size="339" user="Main.CatrinBernius" version=""
 
META FILEATTACHMENT attr="h" autoattached="1" comment="A n example of a jdl file" date="1152980708" name="simulate.jdl" path="simulate.jdl" size="769" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="retrieveOutput.sh" path="retrieveOutput.sh" size="126" user="StathisStefanidis" version="1.1"
Deleted:
<
<
META FILEATTACHMENT attr="" autoattached="1" comment="GANGA job script example" date="1173956003" name="mygangajob.py.txt" path="mygangajob.py.txt" size="3743" user="Main.CatrinBernius" version=""
 
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="multiSubmitGrid.sh" path="multiSubmitGrid.sh" size="304" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="This is an example of a jobOptions file" date="1152980708" name="simulate.py.txt" path="simulate.py.txt" size="1717" user="StathisStefanidis" version="1.1"

Revision 18 - 2007-05-09 - CatrinBernius

Line: 1 to 1
 
META TOPICPARENT name="HEPGroup.AtlasStuff"

Introduction

Introductions to the GRID can be found in several places (e.g. see link 1 or bibliography 1), depending on how much detail the user wants.
Line: 158 to 158
 A detailed guide for the LCG is given in Link 5.
Deleted:
<
<

Running a Grid Job using GANGA

Brief description of GANGA components

  • Job-registry component allows for storage and recovery of job information, and allows for job objects to be serialized
  • Script-generation component translates a job's work flow into the set of instructions to be executed when the job is running
  • Job-submission component submits work flow script to target (batch system, creates JDL file and translates resource request)
  • File-transfer component handles transfer between sites of input & output files, adds commands to work flow script on submission
  • Job-monitoring component performs queries of job status

Detailed Information about GANGA can be found in http://documentation.hepcg.org/res/ap3/w_301106.pdf.

How to install, setup and use GANGA

(a) Install GANGA

To install GANGA, follow the instructions from https://twiki.cern.ch/twiki/bin/view/Atlas/WorkBookGanga (optionally modifying the GANGA start-up parameters, as described in Section 1.4 of https://twiki.cern.ch/twiki/bin/view/Atlas/GangaTutorial427).

(b) Setup Grid environment and GANGA

(c) Setup Athena

For the ttH analysis: myAthenaSetup.sh

(d) Run GANGA

Start GANGA from the cmt or run directory of the Athena working area that was set up before, by just typing: ganga. To execute a script that submits a job, type on the GANGA command line (not the GUI version): execfile('/home/bernius/testarea/11.0.42/PhysicsAnalysis/AnalysisCommon/ttHHbb/run/mygangajob.py') (this script can be found here: mygangajob.py). The job can also be submitted by just typing: ganga mygangajob.py

Some GANGA commands and things to know

  • exit GANGA: ctrl-D
  • get online help: help
  • exit help: ctrl-D
  • repository for input/output files for every job is located by default at: $HOME/gangadir/workspace/Local
  • view job repository: jobs
  • view subjobs with: subjobs
  • to get info about specific jobs: j (if defined) or jobs[jobid]
  • get jobid number: print j.id
  • summary of a job: j
  • retrieve job objects from a registry: print jobs[jobid].status
  • get output info: j.outputdata
  • kill running or queued job: jobs[jobid].kill()
  • remove job with status new: jobs[jobid].remove()
  • delete job with status failed/completed: del jobs[jobid]
  • view job output directory of finished jobs that is retrieved back to the job repository: jobs[jobid].peek()
  • view stdout log file of finished job: jobs[jobid].peek('stdout', 'cat')
  • export job configuration to a file: export(jobs[jobid], '~/jobconf.py')
  • load job configuration from file: load('~/jobconf.py')
  • status of job: j.status
  • IPython prompt tab expansion (methods with underscores are methods related to the implementation and should not be used directly): j.
  • get list of backends: backends
  • get list of applications: applications
  • Input Sandbox:
    • GANGA keeps the input sandbox for all jobs in $HOME/gangadir/workspace so there might be quota problems
    • The size is by default 10MB -> Submission fails with "JobSizeException: Job Size exceeds limits."; look at the tarfile in /gangadir/workspace/Local/jobid to see how big it is
  • Output Sandbox:
    • the output can be found by default in /gangadir/workspace/Local/jobid/output (j.outputdata.local_location='/home/bernius/outputGanga' is not working for me)
    • to specify which files you want to receive: j.outputsandbox=['*.dat','*.txt','*.root'] or j.outputsandbox=['*'] (to receive all)
  • there are more options for the Input and Output Sandboxes, see https://twiki.cern.ch/twiki/bin/view/Atlas/GangaUpdates420
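
To avoid running into the JobSizeException mentioned in the list above, the size of the files that will go into the input sandbox can be checked before submission. The Python sketch below is independent of GANGA itself: the helper names are hypothetical, the limit is simply the 10MB default quoted above taken as 10 x 1024 x 1024 bytes (an assumption), and the demonstration uses a throwaway directory rather than a real gangadir.

```python
import os
import tempfile

SANDBOX_LIMIT_BYTES = 10 * 1024 * 1024  # the 10MB default quoted above (assumed binary MB)


def sandbox_size(directory):
    """Total size in bytes of all files below directory."""
    total = 0
    for root, _dirs, names in os.walk(directory):
        for name in names:
            total += os.path.getsize(os.path.join(root, name))
    return total


def fits_in_sandbox(directory, limit=SANDBOX_LIMIT_BYTES):
    """True if everything below directory is within the size limit."""
    return sandbox_size(directory) <= limit


# Demonstration on a throwaway directory instead of a real gangadir.
demo = tempfile.mkdtemp()
with open(os.path.join(demo, "input.tar"), "wb") as handle:
    handle.write(b"\0" * 2048)
print(sandbox_size(demo), fits_in_sandbox(demo))
```

The same check could be pointed at the job directory under gangadir/workspace/Local to see which job is using up the quota.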

More information about GANGA can be found in Links 5-8. To search for datasets with the ATLAS Metadata Interface (AMI): http://lpsc1168x.in2p3.fr:8080/opencms/opencms/AMI/www/index.html

 

Links

Line: 237 to 169
 

Deleted:
<
<

 

Bibliography

Revision 17 - 2007-03-15 - CatrinBernius

Line: 1 to 1
 
META TOPICPARENT name="HEPGroup.AtlasStuff"

Introduction

Introductions to the GRID can be found in several places (e.g. see link 1 or bibliography 1), depending on how much detail the user wants.
Line: 216 to 217
 
  • Input Sandbox:
    • GANGA keeps the input sandbox for all jobs in $HOME/gangadir/workspace so there might be quota problems
    • The size is by default 10MB -> Submission fails with "JobSizeException: Job Size exceeds limits."; look at the tarfile in /gangadir/workspace/Local/jobid to see how big it is
Added:
>
>
  • Output Sandbox:
    • the output can be found by default in /gangadir/workspace/Local/jobid/output (j.outputdata.local_location='/home/bernius/outputGanga' is not working for me)
    • to specify which files you want to receive: j.outputsandbox=['*.dat','*.txt','*.root'] or j.outputsandbox=['*'] (to receive all)
  • there are more options for the Input and Output Sandboxes, see https://twiki.cern.ch/twiki/bin/view/Atlas/GangaUpdates420

More information about GANGA can be found in Links 5-8. To search for datasets with the ATLAS Metadata Interface (AMI): http://lpsc1168x.in2p3.fr:8080/opencms/opencms/AMI/www/index.html

 

Links

Line: 231 to 239
 

Added:
>
>

 

Bibliography

  • 1. The GRID: Blueprint for a New Computing Infrastructure

Revision 16 - 2007-03-15 - CatrinBernius

Line: 1 to 1
 
META TOPICPARENT name="HEPGroup.AtlasStuff"

Introduction

Introductions to the GRID can be found in several places (e.g. see link 1 or bibliography 1), depending on how much detail the user wants.
Line: 215 to 215
 
  • get list of applications: applications
  • Input Sandbox: * GANGA keeps the input sandbox for all jobs in $HOME/gangadir/workspace so there might be quota problems
Changed:
<
<
* The size is by default 10MB -> Submit failed because "JobSizeException: Job Size exceeds limits." , look at tarfile in /gangadir/workspace/Local/ how big the file is
>
>
    • The size is by default 10MB -> Submission fails with "JobSizeException: Job Size exceeds limits."; look at the tarfile in /gangadir/workspace/Local/jobid to see how big it is
 

Links

Revision 15 - 2007-03-15 - CatrinBernius

Line: 1 to 1
 
META TOPICPARENT name="HEPGroup.AtlasStuff"

Introduction

Introductions to the GRID can be found in several places (e.g. see link 1 or bibliography 1), depending on how much detail the user wants.
Line: 211 to 211
 
  • load job configuration from file: load('~/jobconf.py')
  • status of job: j.status
  • IPython prompt tab expansion (methods with underscores are methods related to the implementation and should not be used directly): j.
Changed:
<
<
get list of backends: backends
>
>
  • get list of backends: backends
 
  • get list of applications: applications
  • Input Sandbox:
        • GANGA keeps the input sandbox for all jobs in $HOME/gangadir/workspace so there might be quota problems

Revision 14 - 2007-03-15 - CatrinBernius

Line: 1 to 1
 
META TOPICPARENT name="HEPGroup.AtlasStuff"

Introduction

Introductions to the GRID can be found in several places (e.g. see link 1 or bibliography 1), depending on how much detail the user wants.
Line: 187 to 187
 Start GANGA from the cmt or run directory of the Athena working area that was set up before, by just typing: ganga. To execute a script that submits a job, type on the GANGA command line (not the GUI version): execfile('/home/bernius/testarea/11.0.42/PhysicsAnalysis/AnalysisCommon/ttHHbb/run/mygangajob.py')
Changed:
<
<
(This script can be found here: mygangajob.py)
>
>
(This script can be found here: mygangajob.py)
 The job can also be submitted by just typing: ganga mygangajob.py

Some GANGA commands and things to know

Line: 252 to 252
 
META FILEATTACHMENT attr="" autoattached="1" comment="Athena Setup commands" date="1173955960" name="myAthenaSetup.sh" path="myAthenaSetup.sh" size="339" user="Main.CatrinBernius" version=""
META FILEATTACHMENT attr="h" autoattached="1" comment="A n example of a jdl file" date="1152980708" name="simulate.jdl" path="simulate.jdl" size="769" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="retrieveOutput.sh" path="retrieveOutput.sh" size="126" user="StathisStefanidis" version="1.1"
Added:
>
>
META FILEATTACHMENT attr="" autoattached="1" comment="GANGA job script example" date="1173956003" name="mygangajob.py.txt" path="mygangajob.py.txt" size="3743" user="Main.CatrinBernius" version=""
 
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="multiSubmitGrid.sh" path="multiSubmitGrid.sh" size="304" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="This is an example of a jobOptions file" date="1152980708" name="simulate.py.txt" path="simulate.py.txt" size="1717" user="StathisStefanidis" version="1.1"
Deleted:
<
<
META FILEATTACHMENT attachment="mygangajob.py.txt" attr="" comment="GANGA job script example" date="1173956003" name="mygangajob.py.txt" path="mygangajob.py" size="3743" stream="mygangajob.py" user="Main.CatrinBernius" version="0"

Revision 13 - 2007-03-15 - CatrinBernius

Line: 1 to 1
 
META TOPICPARENT name="HEPGroup.AtlasStuff"

Introduction

Introductions to the GRID can be found in several places (e.g. see link 1 or bibliography 1), depending on how much detail the user wants.
Line: 249 to 249
 
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="simulate.sh" path="simulate.sh" size="1659" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="" autoattached="1" comment="Setup commands for GANGA" date="1173955868" name="myGangaSetup.sh" path="myGangaSetup.sh" size="419" user="Main.CatrinBernius" version=""
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="createSubmitFiles.sh" path="createSubmitFiles.sh" size="798" user="StathisStefanidis" version="1.1"
Added:
>
>
META FILEATTACHMENT attr="" autoattached="1" comment="Athena Setup commands" date="1173955960" name="myAthenaSetup.sh" path="myAthenaSetup.sh" size="339" user="Main.CatrinBernius" version=""
 
META FILEATTACHMENT attr="h" autoattached="1" comment="A n example of a jdl file" date="1152980708" name="simulate.jdl" path="simulate.jdl" size="769" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="retrieveOutput.sh" path="retrieveOutput.sh" size="126" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="multiSubmitGrid.sh" path="multiSubmitGrid.sh" size="304" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="This is an example of a jobOptions file" date="1152980708" name="simulate.py.txt" path="simulate.py.txt" size="1717" user="StathisStefanidis" version="1.1"
Changed:
<
<
META FILEATTACHMENT attachment="myAthenaSetup.sh" attr="" comment="Athena Setup commands" date="1173955959" name="myAthenaSetup.sh" path="myAthenaSetup.sh" size="339" stream="myAthenaSetup.sh" user="Main.CatrinBernius" version="0"
>
>
META FILEATTACHMENT attachment="mygangajob.py.txt" attr="" comment="GANGA job script example" date="1173956003" name="mygangajob.py.txt" path="mygangajob.py" size="3743" stream="mygangajob.py" user="Main.CatrinBernius" version="0"

Revision 12 - 2007-03-15 - CatrinBernius

Line: 1 to 1
 
META TOPICPARENT name="HEPGroup.AtlasStuff"

Introduction

Introductions to the GRID can be found in several places (e.g. see link 1 or bibliography 1), depending on how much detail the user wants.
Line: 245 to 245
 

META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="checkStatus.sh" path="checkStatus.sh" size="122" user="StathisStefanidis" version="1.1"
Added:
>
>
META FILEATTACHMENT attr="" autoattached="1" comment="Grid setup commands" date="1173955919" name="mygridsetup.sh" path="mygridsetup.sh" size="217" user="Main.CatrinBernius" version=""
 
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="simulate.sh" path="simulate.sh" size="1659" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="" autoattached="1" comment="Setup commands for GANGA" date="1173955868" name="myGangaSetup.sh" path="myGangaSetup.sh" size="419" user="Main.CatrinBernius" version=""
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="createSubmitFiles.sh" path="createSubmitFiles.sh" size="798" user="StathisStefanidis" version="1.1"
Line: 252 to 253
 
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="retrieveOutput.sh" path="retrieveOutput.sh" size="126" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="multiSubmitGrid.sh" path="multiSubmitGrid.sh" size="304" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="This is an example of a jobOptions file" date="1152980708" name="simulate.py.txt" path="simulate.py.txt" size="1717" user="StathisStefanidis" version="1.1"
Changed:
<
<
META FILEATTACHMENT attachment="mygridsetup.sh" attr="" comment="Grid setup commands" date="1173955919" name="mygridsetup.sh" path="mygridsetup.sh" size="217" stream="mygridsetup.sh" user="Main.CatrinBernius" version="0"
>
>
META FILEATTACHMENT attachment="myAthenaSetup.sh" attr="" comment="Athena Setup commands" date="1173955959" name="myAthenaSetup.sh" path="myAthenaSetup.sh" size="339" stream="myAthenaSetup.sh" user="Main.CatrinBernius" version="0"

Revision 11 - 2007-03-15 - CatrinBernius

Line: 1 to 1
 
META TOPICPARENT name="HEPGroup.AtlasStuff"

Introduction

Introductions to the GRID can be found in several places (e.g. see link 1 or bibliography 1), depending on how much detail the user wants.
Line: 246 to 246
 
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="checkStatus.sh" path="checkStatus.sh" size="122" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="simulate.sh" path="simulate.sh" size="1659" user="StathisStefanidis" version="1.1"
Added:
>
>
META FILEATTACHMENT attr="" autoattached="1" comment="Setup commands for GANGA" date="1173955868" name="myGangaSetup.sh" path="myGangaSetup.sh" size="419" user="Main.CatrinBernius" version=""
 
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="createSubmitFiles.sh" path="createSubmitFiles.sh" size="798" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="A n example of a jdl file" date="1152980708" name="simulate.jdl" path="simulate.jdl" size="769" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="retrieveOutput.sh" path="retrieveOutput.sh" size="126" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="multiSubmitGrid.sh" path="multiSubmitGrid.sh" size="304" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="This is an example of a jobOptions file" date="1152980708" name="simulate.py.txt" path="simulate.py.txt" size="1717" user="StathisStefanidis" version="1.1"
Changed:
<
<
META FILEATTACHMENT attachment="myGangaSetup.sh" attr="" comment="Setup commands for GANGA" date="1173955868" name="myGangaSetup.sh" path="myGangaSetup.sh" size="419" stream="myGangaSetup.sh" user="Main.CatrinBernius" version="0"
>
>
META FILEATTACHMENT attachment="mygridsetup.sh" attr="" comment="Grid setup commands" date="1173955919" name="mygridsetup.sh" path="mygridsetup.sh" size="217" stream="mygridsetup.sh" user="Main.CatrinBernius" version="0"

Revision 10 - 2007-03-15 - CatrinBernius

Line: 1 to 1
 
META TOPICPARENT name="HEPGroup.AtlasStuff"

Introduction

Introductions to the GRID can be found in several places (e.g. see link 1 or bibliography 1), depending on how much detail the user wants.
Line: 251 to 251
 
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="retrieveOutput.sh" path="retrieveOutput.sh" size="126" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="multiSubmitGrid.sh" path="multiSubmitGrid.sh" size="304" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="This is an example of a jobOptions file" date="1152980708" name="simulate.py.txt" path="simulate.py.txt" size="1717" user="StathisStefanidis" version="1.1"
Added:
>
>
META FILEATTACHMENT attachment="myGangaSetup.sh" attr="" comment="Setup commands for GANGA" date="1173955868" name="myGangaSetup.sh" path="myGangaSetup.sh" size="419" stream="myGangaSetup.sh" user="Main.CatrinBernius" version="0"

Revision 9 - 2007-03-15 - CatrinBernius

Line: 1 to 1
 
META TOPICPARENT name="HEPGroup.AtlasStuff"

Introduction

Introductions to the GRID can be found in several places (e.g. see link 1 or bibliography 1), depending on how much detail the user wants.

Revision 8 - 2007-03-15 - CatrinBernius

Line: 1 to 1
 
META TOPICPARENT name="HEPGroup.AtlasStuff"

Introduction

Introductions to the GRID can be found in several places (e.g. see link 1 or bibliography 1), depending on how much detail the user wants.
Line: 160 to 160
 

Running a Grid Job using GANGA

Changed:
<
<

*Brief description of GANGA components

>
>

Brief description of GANGA components

 
  • Job-registry component allows for storage and recovery of job information, and allows for job objects to be serialized
  • Script-generation component translates a job's work flow into the set of instructions to be executed when the job is running
Line: 170 to 170
  Detailed Information about GANGA can be found in http://documentation.hepcg.org/res/ap3/w_301106.pdf.
Changed:
<
<

* How to install, setup and use GANGA

>
>

* How to install, setup and use GANGA*

 
Changed:
<
<

(a) *Install GANGA

>
>

(a) Install GANGA

 To install GANGA, follow the instructions at https://twiki.cern.ch/twiki/bin/view/Atlas/WorkBookGanga. Modifying the GANGA start-up parameters, described in Section 1.4 of https://twiki.cern.ch/twiki/bin/view/Atlas/GangaTutorial427, is optional.
Changed:
<
<

(b) *Setup Grid environment and GANGA

>
>

(b) Setup Grid environment and GANGA

 
Changed:
<
<

(c) *Setup Athena

>
>

(c) Setup Athena

 In the case for ttH analysis: myAthenaSetup.sh
Changed:
<
<

(d) *Run GANGA

>
>

(d) Run GANGA

 Start GANGA from the cmt or run directory of the Athena working area that was set up earlier by typing ganga at the shell prompt. To execute a script that submits a job from within GANGA, type at the GANGA command line (not the GUI version): execfile('/home/bernius/testarea/11.0.42/PhysicsAnalysis/AnalysisCommon/ttHHbb/run/mygangajob.py') (this script can be found here: mygangajob.py). The job can also be submitted by typing: ganga mygangajob.py
Changed:
<
<

* Some GANGA commands and things to know:

>
>

Some GANGA commands and things to know

  • exit GANGA: ctrl-D
  • get online help: help
  • exit help: ctrl-D

Revision 72007-03-15 - CatrinBernius

Line: 1 to 1
 
META TOPICPARENT name="HEPGroup.AtlasStuff"

Introduction

Introductions to GRID can be found in several places (eg. see link 1 or bibliography 1), depending which level the user wants to get into.
Line: 157 to 157
  A detailed guide for the LCG is given in Link 5.
Added:
>
>

Running a Grid Job using GANGA

*Brief description of GANGA components

  • Job-registry component allows for storage and recovery of job information, and allows for job objects to be serialized
  • Script-generation component translates a job's work flow into the set of instructions to be executed when the job is running
  • Job-submission component submits work flow script to target (batch system, creates JDL file and translates resource request)
  • File-transfer component handles transfer between sites of input & output files, adds commands to work flow script on submission
  • Job-monitoring component performs queries of job status

Detailed Information about GANGA can be found in http://documentation.hepcg.org/res/ap3/w_301106.pdf.

* How to install, setup and use GANGA

(a) *Install GANGA

To install GANGA, follow the instructions at https://twiki.cern.ch/twiki/bin/view/Atlas/WorkBookGanga. Modifying the GANGA start-up parameters, described in Section 1.4 of https://twiki.cern.ch/twiki/bin/view/Atlas/GangaTutorial427, is optional.

(b) *Setup Grid environment and GANGA

(c) *Setup Athena

In the case for ttH analysis: myAthenaSetup.sh

(d) *Run GANGA

Start GANGA from the cmt or run directory of the Athena working area that was set up earlier by typing ganga at the shell prompt. To execute a script that submits a job from within GANGA, type at the GANGA command line (not the GUI version): execfile('/home/bernius/testarea/11.0.42/PhysicsAnalysis/AnalysisCommon/ttHHbb/run/mygangajob.py') (this script can be found here: mygangajob.py). The job can also be submitted by typing: ganga mygangajob.py

* Some GANGA commands and things to know:

  • exit GANGA: ctrl-D
  • get online help: help
  • exit help: ctrl-D
  • repository for input/output files for every job is located by default at: $HOME/gangadir/workspace/Local
  • view job repository: jobs
  • view subjobs with: subjobs
  • get info about a specific job: j (if defined) or jobs[jobid]
  • get jobid number: print j.id
  • summary of a job: j
  • retrieve job objects from a registry: print jobs[jobid].status
  • get output info: j.outputdata
  • kill running or queued job: jobs[jobid].kill()
  • remove job with status new: jobs[jobid].remove()
  • delete job with status failed/completed: del jobs[jobid]
  • view job output directory of finished jobs that is retrieved back to the job repository: jobs[jobid].peek()
  • view stdout log file of finished job: jobs[jobid].peek('stdout', 'cat')
  • export job configuration to a file: export(jobs[jobid], '~/jobconf.py')
  • load job configuration from file: load('~/jobconf.py')
  • status of job: j.status
  • IPython prompt tab expansion (methods with underscores are methods related to the implementation and should not be used directly): j.
  • get list of backends: backends
  • get list of applications: applications
  • Input Sandbox:
      • GANGA keeps the input sandbox for all jobs in $HOME/gangadir/workspace, so there might be quota problems
      • The size limit is 10MB by default. If a submission fails with "JobSizeException: Job Size exceeds limits.", check how big the tarfile in /gangadir/workspace/Local/ is
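To find out which files are behind a JobSizeException or a quota problem, the workspace can be scanned for large files. The following is a minimal shell sketch and not part of GANGA: the function name list_big_files is hypothetical, and the 10 MB default only mirrors the limit quoted above.

```shell
# list_big_files DIR [LIMIT_MB]
# Print every regular file under DIR that is larger than LIMIT_MB megabytes.
# LIMIT_MB defaults to 10, the sandbox limit mentioned in the notes above.
list_big_files() {
    dir=$1
    limit_mb=${2:-10}
    # find counts in units of 1024 bytes when the "k" suffix is used
    find "$dir" -type f -size +"$((limit_mb * 1024))"k
}

# Example (assumes the default GANGA workspace location):
#   list_big_files "$HOME/gangadir/workspace"
```

Any file it prints is a candidate for removal from the input sandbox before resubmitting.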

 

Links

Revision 62007-01-31 - LilyAsquith

Line: 1 to 1
 
META TOPICPARENT name="HEPGroup.AtlasStuff"

Introduction

Introductions to GRID can be found in several places (eg. see link 1 or bibliography 1), depending which level the user wants to get into.
Line: 158 to 158
 A detailed guide for the LCG is given in Link 5.

Links

Changed:
<
<
>
>
 

Revision 52007-01-31 - LilyAsquith

Line: 1 to 1
 
META TOPICPARENT name="HEPGroup.AtlasStuff"

Introduction

Introductions to GRID can be found in several places (eg. see link 1 or bibliography 1), depending which level the user wants to get into.
Line: 158 to 158
 A detailed guide for the LCG is given in Link 5.

Links

Added:
>
>
 
Line: 182 to 183
 

Changed:
<
<
META FILEATTACHMENT attr="h" comment="A n example of a jdl file" date="1149697594" name="simulate.jdl" path="simulate.jdl" size="769" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" comment="" date="1149699300" name="simulate.sh" path="simulate.sh" size="1659" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" comment="This is an example of a jobOptions file" date="1149702015" name="simulate.py.txt" path="simulate.py" size="1717" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" comment="" date="1149703592" name="createSubmitFiles.sh" path="createSubmitFiles.sh" size="798" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" comment="" date="1149703804" name="multiSubmitGrid.sh" path="multiSubmitGrid.sh" size="304" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" comment="" date="1149703825" name="checkStatus.sh" path="checkStatus.sh" size="122" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" comment="" date="1149703841" name="retrieveOutput.sh" path="retrieveOutput.sh" size="126" user="StathisStefanidis" version="1.1"
>
>
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="checkStatus.sh" path="checkStatus.sh" size="122" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="simulate.sh" path="simulate.sh" size="1659" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="createSubmitFiles.sh" path="createSubmitFiles.sh" size="798" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="A n example of a jdl file" date="1152980708" name="simulate.jdl" path="simulate.jdl" size="769" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="retrieveOutput.sh" path="retrieveOutput.sh" size="126" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1152980708" name="multiSubmitGrid.sh" path="multiSubmitGrid.sh" size="304" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" autoattached="1" comment="This is an example of a jobOptions file" date="1152980708" name="simulate.py.txt" path="simulate.py.txt" size="1717" user="StathisStefanidis" version="1.1"

Revision 42006-06-09 - StathisStefanidis

Line: 1 to 1
 
META TOPICPARENT name="HEPGroup.AtlasStuff"

Introduction

Introductions to GRID can be found in several places (eg. see link 1 or bibliography 1), depending which level the user wants to get into.
Line: 39 to 39
 

Prepare a Grid Job

Added:
>
>
The detailed guide/manual for the LCG is given in Link 5.
 It is essential to understand that between you and the site where your job will eventually run, there is the Resource Broker (RB), which controls and distributes the jobs. For us, the RB is RAL. For example, you submit your jobs there, and you retrieve your job from there (you will see that when you perform these requests). The concept is that the user doesn't have direct contact with the final site.
Line: 153 to 155
 
Added:
>
>
A detailed guide for the LCG is given in Link 5.
 

Links

Line: 163 to 167
 
Added:
>
>
 

Bibliography

Revision 32006-06-08 - StathisStefanidis

Line: 1 to 1
 
META TOPICPARENT name="HEPGroup.AtlasStuff"

Introduction

Introductions to GRID can be found in several places (eg. see link 1 or bibliography 1), depending which level the user wants to get into.
Line: 103 to 103
 But there is the possibility that the site which hosts the file you want may not be available. Therefore you must make replicas of that file. A replica means that the file name will be same, but it will be hosted in different places. The command lcg-rep does this job.
Changed:
<
<
  • TIP Tip: It is good to have the file at quite a few places so that you make sure that it will be copied successfully. It is also wise to try to copy it to sites
outside the UK, since sometimes the GRID problems are country-dependent.
>
>
  • TIP Tip: It is good to have the file at quite a few places so that you make sure that it will be copied successfully. It is also advised to try to copy it to sites outside the UK, since sometimes the GRID problems are country-dependent.
 

This is what we do in the first line of the sh file. We give the sites that we made replicas of our generated samples and then, by checking each time if the

Revision 22006-06-07 - StathisStefanidis

Line: 1 to 1
 
META TOPICPARENT name="HEPGroup.AtlasStuff"
Added:
>
>

Introduction

 Introductions to GRID can be found in several places (e.g. see link 1 or bibliography 1), depending on the level the user wants to get into. Here, I'll give rough instructions, but I want to focus on tips and give scripts that will let a physicist run his/her jobs and get results fast.

A very good introduction to GRID is given by Steve Lloyd (see link 2) but also at the Atlas Wiki (see link 3).

Changed:
<
<
In terms of "bureaucracy" you will need (a) to get a GRID certificate and (b) to join a Virtual Organisation
>
>
In terms of "bureaucracy" you will need (a) to get a GRID certificate and (b) to join a Virtual Organisation (VO), which -for us- is the atlas VO
 The certificate will be valid only for the machine that you used to issue it. This is very important, since if you need to have access to web-pages (like for example the Grid User Support), you will need to do so using the machine you got the certificate from. Otherwise, you should log in to that machine and from there do whatever you wish...
Line: 28 to 29
  grid-proxy-init -valid 24:30
Changed:
<
<
That will give you a proxy for 1 day (24 hours) and 30 minutes. Caution!! You should make sure that your proxy is valid for the whole period of the run of your jobs. If for example you have the above proxy and you run full simulation of 100 events, then your job will be killed when your proxy expires.
  • Tip: You can use the grid-proxy-(TAB) to get all the commands and --help at the end of the command to see the syntax.
>
>
That will give you a proxy for 1 day (24 hours) and 30 minutes.
 
Changed:
<
<
  • Tip: A 3-day proxy is enough for full simulation of 100 events (multi-particle final state samples).
>
>
  • ALERT! Caution: You should make sure that your proxy is valid for the whole period of the run of your jobs. If for example you have the above proxy and you run full simulation of 100 events, then your job will be killed when your proxy expires.
 
Changed:
<
<
Running a Grid Job
>
>
  • TIP Tip: You can use the grid-proxy-(TAB) to get all the commands and --help at the end of the command to see the syntax.

  • TIP Tip: A 3-day proxy is enough for full simulation of 100 events (multi-particle final state samples).
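Because a job is killed when the proxy expires, it can help to check the remaining lifetime before submitting. The following is a minimal sketch: it assumes grid-proxy-info -timeleft, which reports the remaining lifetime in seconds; the function name proxy_covers_hours and the PROXY_TIMELEFT_CMD override are hypothetical and only there so the check can be tried without a grid installation.

```shell
# Command that prints the remaining proxy lifetime in seconds.
# PROXY_TIMELEFT_CMD is an illustrative override, not a standard variable.
PROXY_TIMELEFT_CMD=${PROXY_TIMELEFT_CMD:-"grid-proxy-info -timeleft"}

# proxy_covers_hours HOURS
# Succeeds only if the current proxy is valid for at least HOURS more hours.
proxy_covers_hours() {
    needed_hours=$1
    left=$($PROXY_TIMELEFT_CMD 2>/dev/null) || left=0
    # Treat empty or non-numeric output (no proxy, expired proxy) as 0 seconds
    case $left in
        ''|*[!0-9]*) left=0 ;;
    esac
    [ "$left" -ge "$((needed_hours * 3600))" ]
}

# Example: request a 3-day proxy only when the current one is too short
#   proxy_covers_hours 72 || grid-proxy-init -valid 72:00
```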

Prepare a Grid Job

  It is essential to understand that between you and the site where your job will eventually run, there is the Resource Broker (RB), which controls and distributes the jobs.
Changed:
<
<
For us, the RB is RAL. For example, you submit your jobs there, and you retrieve your job from there. The concept is that the user doesn't have
>
>
For us, the RB is RAL. For example, you submit your jobs there, and you retrieve your job from there (you will see that when you perform these requests). The concept is that the user doesn't have
 direct contact with the final site.

To make the first step, run the HelloWorld example described in Steve's notes (link 2). There are three main components of your job: (a) the .jdl file, which gives the instructions to the RB about what your jobs will need, (b) your .sh file, which is the executable script (the same that we submit to the PBS) and (c) your .py file, which is the normal python jobOptions file that we all run within ATHENA.

Added:
>
>
One example of these files can be found on the work-book. Another example is given here. The full simulation (GEANT) of (Pythia) generated events is used as a case study.

(a) *simulate.jdl

* simulate.jdl: That is an example of a jdl file

The first line declares which file is the executable. The second and third lines define the names of the files that receive the errors and the print-outs. In the fourth line we give the files that we want to be copied to the site. For example, I copy the executable, of course, and the jobOptions file that I will use to run ATHENA. The OutputSandbox variable defines which files I want to get back. One of these is of course the GEANT output (Hits). Finally, the Requirements variable holds all our options: we need to define within which VO we will run our jobs (here atlas) and which release we want to use (here 11.0.4). So the RB will search all the sites in the atlas VO that have release 11.0.4. Finally, we require our jobs to run in the long queue by adding other.GlueCEPolicyMaxCPUTime > 120.

  • TIP Tip: You can check which sites satisfy all of your requirements by typing:

edg-job-list-match --vo atlas simulate.jdl

The next part of the Requirements line lists the sites that we want to exclude (for instance because we noticed that they're not correctly set up).

  • ALERT! Caution: The requirements line must be continuous, without line breaks.
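The layout described above can be sketched as a short shell snippet that writes a JDL file. This is an illustration and not the attached simulate.jdl: the output file names, the software tag VO-atlas-release-11.0.4, the GlueCEUniqueID exclusion syntax and the host ce.example.org are all assumptions, and JDL_FILE is a hypothetical variable used only to choose where the file is written.

```shell
# Write a minimal JDL file following the line order described above.
# All names and values inside the here-document are illustrative assumptions.
JDL_FILE=${JDL_FILE:-simulate.jdl}

cat > "$JDL_FILE" <<'EOF'
Executable    = "simulate.sh";
StdOutput     = "simulate.out";
StdError      = "simulate.err";
InputSandbox  = {"simulate.sh", "simulate.py"};
OutputSandbox = {"simulate.out", "simulate.err", "simulate.hits.pool.root"};
Requirements  = Member("VO-atlas-release-11.0.4", other.GlueHostApplicationSoftwareRunTimeEnvironment) && other.GlueCEPolicyMaxCPUTime > 120 && other.GlueCEUniqueID != "ce.example.org:2119/jobmanager-lcgpbs-long";
EOF
```

Note that the whole Requirements expression stays on a single line, as the caution above demands; edg-job-list-match --vo atlas can then be run on the file to see which sites match.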

(b) *simulate.sh

* simulate.sh: That is an example of a sh file

This is quite a long file to explain, but it is very simple to understand, even for someone with a basic knowledge of bash commands. There are a few important things to mention here:

  • Since our generated files are too big to be transferred (GRID allows only up to a certain amount of MB to be transferred with the jdl file) we need to copy them
to the site where the job is running, which of course we don't know and don't control. Therefore we must first copy them to site(s) from where we can retrieve them. It is essential that we register the file to the grid. Say for example that you need to transfer the file /home/storage/fileGenEvents.pool.root to the site se1.pp.rhul.ac.uk. You will have to type:

lcg-cr -d se1.pp.rhul.ac.uk -l lfn:/grid/atlas/fileGenEvents.pool.root --vo atlas file:////home/storage/fileGenEvents.pool.root

The above command will make a copy of that file with the name /grid/atlas/fileGenEvents.pool.root. But the file will get a unique Identification Code (something like file7764465a-55ca-4396-85e8-655c86d2c1bd) which identifies where exactly it is.

* ALERT! Caution: All the file names should start with /grid/atlas

* TIP Tip: You can check all the available storage elements by typing: lcg-infosites --vo atlas se. A --help will explain how to use this and all the lcg commands.

You can check the existence of the file by typing:

lcg-lr --vo atlas lfn:/grid/atlas/fileGenEvents.pool.root

and copy it by typing:

lcg-cp --vo atlas lfn:/grid/atlas/fileGenEvents.pool.root file:///home/storage/copiedFile.pool.root

But there is the possibility that the site which hosts the file you want may not be available. Therefore you must make replicas of that file. A replica means that the file name will be the same, but it will be hosted in different places. The command lcg-rep does this job.

  • TIP Tip: It is good to have the file at quite a few places so that you make sure that it will be copied successfully. It is also wise to try to copy it to sites
outside the UK, since sometimes the GRID problems are country-dependent.

This is what we do in the first line of the sh file. We list the sites where we made replicas of our generated samples and then loop over the sites, checking each time whether the copy succeeded, until we get the file.
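The loop over replica sites can be sketched as follows. This is a minimal illustration of the idea and not the contents of the attached simulate.sh: the function name fetch_with_fallback and the COPY_CMD override are hypothetical, and the sources passed in are whatever lcg-cp accepts as a source (for example the replica locations printed by lcg-lr).

```shell
# Copy command used for each attempt. The default is the lcg-cp call quoted
# above; COPY_CMD is an illustrative override (e.g. COPY_CMD=cp for a local test).
COPY_CMD=${COPY_CMD:-"lcg-cp --vo atlas"}

# fetch_with_fallback DEST SOURCE [SOURCE...]
# Try each SOURCE in turn and stop at the first copy that succeeds.
fetch_with_fallback() {
    dest=$1
    shift
    for src in "$@"; do
        if $COPY_CMD "$src" "$dest"; then
            echo "copied from $src"
            return 0
        fi
        echo "copy from $src failed, trying next source" >&2
    done
    # Every source failed
    return 1
}
```

The function returns a non-zero status when no source works, so the job script can stop early instead of running GEANT without its input.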

  • The other line that is important (just for the simulation step) is that the file geomDB_sqlite, which is needed by GEANT, must be copied to the local area:
 
Changed:
<
<
Links
>
>
cp $SITEROOT/atlas/offline/data/geomDB_sqlite $PWD

(c) *simulate.py

* simulate.py: This is an example of a jobOptions file

This is a well-known file. Nothing to stress.

Running a Grid Job

In order to run a grid job you will need to type:

edg-job-submit --vo atlas -o simulate_jobIDfile simulate.jdl

To check the status:

edg-job-status -i simulate_jobIDfile

To retrieve the output:

edg-job-get-output -i simulate_jobIDfile

The key thing to mention here is the file simulate_jobIDfile, which contains the identification of the submitted job. Unfortunately, the code given to the job is a random one (like akqIkNdtGa4LPNUTUrsWgg). Therefore the book-keeping must be very careful.

If you want to submit many jobs, it is wise to first make a template of your jdl, sh and py files. Then you can use the script createSubmitFiles.sh to create as many files as you want.
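The template approach can be sketched like this. It is an illustration and not the attached script: the template names simulate_template.jdl/.sh/.py, the placeholder JOBINDEX and the function name make_job_files are all assumptions.

```shell
# make_job_files N
# Create simulate_<i>.jdl, simulate_<i>.sh and simulate_<i>.py for i = 1..N
# by replacing the placeholder JOBINDEX in each (hypothetical) template file.
make_job_files() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        for ext in jdl sh py; do
            sed "s/JOBINDEX/$i/g" "simulate_template.$ext" > "simulate_$i.$ext"
        done
        i=$((i + 1))
    done
}

# Example: make_job_files 20
```

Using the job index inside the templates (for output file names, random seeds and so on) keeps the jobs from overwriting each other's results.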

You can use the following scripts to submit your jobs, check the status of the submitted jobs and retrieve the completed jobs: multiSubmitGrid.sh, checkStatus.sh and retrieveOutput.sh.
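A minimal sketch of such helper loops is given below. It is not the attached scripts: the function names, the one-ID-file-per-job naming scheme and the GRID_RUN dry-run variable are assumptions; only the edg-job-* commands and their options are taken from the text above.

```shell
# Set GRID_RUN=echo to print the commands instead of running them
# (GRID_RUN is an illustrative variable, not part of the grid tools).
GRID_RUN=${GRID_RUN:-}

# submit_all FILE.jdl [FILE.jdl...]: submit each JDL file, keeping one ID file per job
submit_all() {
    for jdl in "$@"; do
        idfile="${jdl%.jdl}_jobIDfile"
        $GRID_RUN edg-job-submit --vo atlas -o "$idfile" "$jdl"
    done
}

# check_all IDFILE [IDFILE...]: query the status of each submitted job
check_all() {
    for idfile in "$@"; do
        $GRID_RUN edg-job-status -i "$idfile"
    done
}

# retrieve_all IDFILE [IDFILE...]: fetch the output sandbox of each job
retrieve_all() {
    for idfile in "$@"; do
        $GRID_RUN edg-job-get-output -i "$idfile"
    done
}
```

Keeping one ID file per JDL file ties each random job identifier to a readable name, which simplifies the book-keeping.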

Links

 
Line: 56 to 165
 

Changed:
<
<
Bibliography
>
>

Bibliography

 
  • 1. The GRID: Blueprint for a New Computing Infrastructure

-- StathisStefanidis - 07 Jun 2006

Added:
>
>

META FILEATTACHMENT attr="h" comment="A n example of a jdl file" date="1149697594" name="simulate.jdl" path="simulate.jdl" size="769" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" comment="" date="1149699300" name="simulate.sh" path="simulate.sh" size="1659" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" comment="This is an example of a jobOptions file" date="1149702015" name="simulate.py.txt" path="simulate.py" size="1717" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" comment="" date="1149703592" name="createSubmitFiles.sh" path="createSubmitFiles.sh" size="798" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" comment="" date="1149703804" name="multiSubmitGrid.sh" path="multiSubmitGrid.sh" size="304" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" comment="" date="1149703825" name="checkStatus.sh" path="checkStatus.sh" size="122" user="StathisStefanidis" version="1.1"
META FILEATTACHMENT attr="h" comment="" date="1149703841" name="retrieveOutput.sh" path="retrieveOutput.sh" size="126" user="StathisStefanidis" version="1.1"

Revision 12006-06-07 - StathisStefanidis

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="HEPGroup.AtlasStuff"
Introductions to GRID can be found in several places (e.g. see link 1 or bibliography 1), depending on the level the user wants to get into. Here, I'll give rough instructions, but I want to focus on tips and give scripts that will let a physicist run his/her jobs and get results fast.

A very good introduction to GRID is given by Steve Lloyd (see link 2) but also at the Atlas Wiki (see link 3). In terms of "bureaucracy" you will need (a) to get a GRID certificate and (b) to join a Virtual Organisation. The certificate will be valid only for the machine that you used to issue it. This is very important, since if you need to have access to web-pages (like for example the Grid User Support), you will need to do so using the machine you got the certificate from. Otherwise, you should log in to that machine and from there do whatever you wish... These steps now take a few days (instead of a few months, as it used to be).

After that, you will need to get an account on a User Interface and install your certificate. The final aim of doing these steps is to create your proxy certificate, which will allow you to have access to the grid for a desired period of time.

The guides on how to do the above steps are on our group's web-page (see link 4)

By now, you will be able to use the grid. Each time you log in, you should do:

source /usr/local/lcg/etc/profile.d/grid_env.sh

For convenience, you could add this command to your shell start-up script (usually .bashrc).

You also have to get a grid proxy, or check whether you already have a valid one and, if so, for how long it remains valid. To get a proxy type:

grid-proxy-init -valid 24:30

That will give you a proxy for 1 day (24 hours) and 30 minutes. Caution!! You should make sure that your proxy is valid for the whole period of the run of your jobs. If for example you have the above proxy and you run full simulation of 100 events, then your job will be killed when your proxy expires.

  • Tip: You can use the grid-proxy-(TAB) to get all the commands and --help at the end of the command to see the syntax.

  • Tip: A 3-day proxy is enough for full simulation of 100 events (multi-particle final state samples).

Running a Grid Job

It is essential to understand that between you and the site where your job will eventually run, there is the Resource Broker (RB), which controls and distributes the jobs. For us, the RB is RAL. For example, you submit your jobs there, and you retrieve your job from there. The concept is that the user doesn't have direct contact with the final site.

To make the first step, run the HelloWorld example described in Steve's notes (link 2). There are three main components of your job: (a) the .jdl file, which gives the instructions to the RB about what your jobs will need, (b) your .sh file, which is the executable script (the same that we submit to the PBS) and (c) your .py file, which is the normal python jobOptions file that we all run within ATHENA.

Links

Bibliography

  • 1. The GRID: Blueprint for a New Computing Infrastructure

-- StathisStefanidis - 07 Jun 2006

 