Difference: AtlasDQ2 (1 vs. 4)

Revision 4 - 2010-05-06 - JamesRobinson

Line: 1 to 1

META TOPICPARENT	name="HEPGroup.AtlasStuff"

DQ2 Things

dq2 is handy software for copying files (things like AODs, ESDs, EVNTs, TAGs...) off the grid. To set it up, do this:

Line: 6 to 6

source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh
voms-proxy-init -voms atlas

Changed:

<
<

export DQ2_LOCAL_SITE_ID=UKI-LT2-UCL-CENTRAL_LOCALGROUPDISK

>
>

export DQ2_LOCAL_SITE_ID=UKI-LT2-UCL-HEP_LOCALGROUPDISK
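The setup steps above could be wrapped in a small shell function so that a missing setup script is reported instead of failing silently. This is only a sketch: the function name setup_dq2 is made up for illustration, the default site ID is simply the one from this page, and voms-proxy-init is left out because it prompts for a passphrase.

```shell
#!/bin/sh
# Hypothetical helper, not part of the DQ2 client tools.
# Sources the DQ2 client setup script and sets DQ2_LOCAL_SITE_ID if it is unset.
setup_dq2() {
    # Default path is the one quoted on this page; pass another path to override.
    setup_script="${1:-/afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh}"
    if [ ! -r "$setup_script" ]; then
        echo "cannot read setup script: $setup_script" >&2
        return 1
    fi
    . "$setup_script" || return 1
    # Keep an existing value; otherwise fall back to the site ID from this page.
    DQ2_LOCAL_SITE_ID="${DQ2_LOCAL_SITE_ID:-UKI-LT2-UCL-HEP_LOCALGROUPDISK}"
    export DQ2_LOCAL_SITE_ID
}
```

After calling setup_dq2 you would still run voms-proxy-init -voms atlas by hand.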

Then there are two main commands you'll probably feel like using:

Line: 18 to 17

The dq2-ls lists files in a dataset. The dq2-get gets them. The -g tells it to list files in the global catalogue, -r says look remotely, -v says tell me what the hell you're doing.

Changed:

<
<

Dataset names are either official things like mc12.005200.T1_McAtNlo_Jimmy.evgen.EVNT.v12000401 or your own job output like user.adamdavison.005667.ntuples.v6

>
>

Dataset names are either official things like mc12.005200.T1_McAtNlo_Jimmy.evgen.EVNT.v12000401 or your own job output like user.adamdavison.005667.ntuples.v6. You can also use "*" as a wildcard in dataset names.
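One thing to watch with wildcards is that the shell may try to expand the "*" itself before dq2 ever sees it, so the pattern should be quoted. A minimal sketch, assuming dq2-ls takes the pattern as its only argument; the function name list_matching and the DQ2_LS override variable are invented here for illustration:

```shell
#!/bin/sh
# Hypothetical wrapper, not part of dq2.
# DQ2_LS can be overridden (for example with "echo") to preview the argument.
DQ2_LS="${DQ2_LS:-dq2-ls}"

list_matching() {
    pattern="$1"
    if [ -z "$pattern" ]; then
        echo "usage: list_matching '<dataset pattern>'" >&2
        return 1
    fi
    # The double quotes stop the shell from globbing the "*" against local files.
    "$DQ2_LS" "$pattern"
}
```

For example, list_matching 'user.adamdavison.*' passes the pattern through unchanged.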

If you want a certain number of files from the dataset without caring which ones, you can do

dq2-get -n<desired number> <dataset name>
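As a rough sketch, the sampling command above can be wrapped so that the number of files is checked before anything is downloaded. The function name fetch_sample, the default of two files, and the DQ2_GET override variable are assumptions made for this example; only the -n option itself comes from this page.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of dq2.
# DQ2_GET can be overridden (for example with "echo") to preview the command.
DQ2_GET="${DQ2_GET:-dq2-get}"

fetch_sample() {
    dataset="$1"
    count="${2:-2}"   # assumed default: two files for a quick test
    if [ -z "$dataset" ]; then
        echo "usage: fetch_sample <dataset name> [number of files]" >&2
        return 1
    fi
    # Reject anything that is not made up of digits only.
    case "$count" in
        ''|*[!0-9]*)
            echo "number of files must be a whole number" >&2
            return 1
            ;;
    esac
    "$DQ2_GET" -n"$count" "$dataset"
}
```

Running it with DQ2_GET=echo prints the arguments that would be passed, which is a cheap way to check them first.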

You also sometimes want a whole dataset rather than just one or two files for testing, then you just do:

Changed:

<
<

dq2-get -r -v <dataset name>

>
>

dq2-get -r -v <dataset name>

Changed:

<
<

And hope you've got enough free disk space.

-- AdamD - 14 Nov 2007

>
>

And hope you've got enough free disk space!

Combining multiple datasets

Revision 3 - 2009-05-17 - SimonDean

META TOPICPARENT	name="HEPGroup.AtlasStuff"

DQ2 Things

dq2 is handy software for copying files (things like AODs, ESDs, EVNTs, TAGs...) off the grid. To set it up, do this:

Changed:

<
<

source /grid/LCG-share/UI/glite/external/etc/profile.d/grid-env.sh
source /grid/LCG-share/DQ2/endusers/setup.sh.UCL-HEP

>
>

source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh

voms-proxy-init -voms atlas

Added:

>
>

export DQ2_LOCAL_SITE_ID=UKI-LT2-UCL-CENTRAL_LOCALGROUPDISK

Then there are two main commands you'll probably feel like using:

Changed:

<
<

dq2_ls -g <dataset name>
dq2_get -r -v <dataset name> <file in dataset>

>
>

dq2-ls -g <dataset name>
dq2-get -r -v <dataset name> <file in dataset>

Changed:

<
<

The dq2_ls lists files in a dataset. The dq2_get gets them. The -g tells it to list files in the global catalogue, -r says look remotely, -v says tell me what the hell you're doing.

>
>

The dq2-ls lists files in a dataset. The dq2-get gets them. The -g tells it to list files in the global catalogue, -r says look remotely, -v says tell me what the hell you're doing.

Dataset names are either official things like mc12.005200.T1_McAtNlo_Jimmy.evgen.EVNT.v12000401 or your own job output like user.adamdavison.005667.ntuples.v6

You also sometimes want a whole dataset rather than just one or two files for testing, then you just do:

Changed:

<
<

dq2_get -r -v <dataset name>

>
>

dq2-get -r -v <dataset name>

And hope you've got enough free disk space.

dq2-register-location users.jamesmonk.test IN2P3-CC_DATADISK

Changed:

<
<

(you can find out where the original dataset was present by using dq2-list-dataset-replicas
)

>
>

(you can find out where the original dataset was present by using
dq2-list-dataset-replicas
)
Since that's quite a lot to do if you plan on using data from very many datasets (just one FDR run was over 700 files), I have a little script that creates a new combined dataset for you. You basically just create a file called users.$USERNAME.whatever.you.want and list all the datasets you'd like in it. Then run ./user_dataset.sh users.$USERNAME.whatever.you.want and it creates a new dataset with the same name, containing all of the data from the datasets in that file.

Revision 2 - 2008-04-23 - JamesMonk

Line: 1 to 1

META TOPICPARENT	name="HEPGroup.AtlasStuff"

DQ2 Things

dq2 is handy software for copying files (things like AODs, ESDs, EVNTs, TAGs...) off the grid. To set it up, do this:

Line: 29 to 29

And hope you've got enough free disk space.

-- AdamD - 14 Nov 2007 \ No newline at end of file

Added:

>
>

Combining multiple datasets

I found it useful to be able to add all of the files from several datasets to one single large dataset. For example, the FDR data is split so that a single run is composed of several datasets in different locations. In order to run a job on a whole run's worth of data you would either need to set several Ganga jobs going or create your own dataset from the existing files. Since I could not find that documented anywhere else, I document it here:

Get access to some additional DQ2 tools:

source /afs/cern.ch/atlas/offline/external/GRID/ddm/current/dq2.sh

Create your new dataset (replace my grid user name with your own!):

dq2-register-dataset users.jamesmonk.test

You need to know the logical file names (LFNs) and IDs of the files you want to add. You can list them with the command

dq2-list-files fdr08_run1.0003070.MinBias.recon.ESD.o1_r12

and then register some of those files in the dataset you just created (you need the LFN and the ID from the previous command as arguments):

dq2-register-files users.jamesmonk.test fdr08_run1.0003070.MinBias.recon.ESD.o1_r12._lb0007._0001.1 688D2582-09DA-DC11-BEB6-000423D992A8

Finally, you will have to register a location at which those files are actually present:

dq2-register-location users.jamesmonk.test IN2P3-CC_DATADISK

(you can find out where the original dataset was present by using

dq2-list-dataset-replicas

)

Since that's quite a lot to do if you plan on using data from very many datasets (just one FDR run was over 700 files), I have a little script that creates a new combined dataset for you. You basically just create a file called users.$USERNAME.whatever.you.want and list all the datasets you'd like in it. Then run ./user_dataset.sh users.$USERNAME.whatever.you.want and it creates a new dataset with the same name, containing all of the data from the datasets in that file.

Note that one feature of this is that the new dataset probably will not be complete in any single location because the original files were scattered around the grid. This should not matter too much since Ganga's DQ2 job splitter seems now to be splitting the jobs up and sending them to different locations.
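To give a feel for how the file-registration step can be repeated over many files, here is a minimal sketch. It is not the attached user_dataset.sh. It assumes you have already prepared a list with one "lfn id" pair per line from the dq2-list-files output (the exact format of that output is not described on this page, so extracting the pairs is left to you). The function name register_pairs and the RUN variable are invented for this example.

```shell
#!/bin/sh
# Illustrative sketch only; this is NOT the attached user_dataset.sh.
# Assumption: stdin holds one "<lfn> <id>" pair per line, prepared by hand
# from the dq2-list-files output.
# Set RUN=echo to print the commands instead of executing them.
RUN="${RUN:-}"

register_pairs() {
    target="$1"
    if [ -z "$target" ]; then
        echo "usage: register_pairs <target dataset> < pairs.txt" >&2
        return 1
    fi
    while read -r lfn id; do
        # Skip blank lines and comment lines.
        case "$lfn" in
            ''|'#'*) continue ;;
        esac
        if [ -z "$id" ]; then
            echo "skipping '$lfn': no id given" >&2
            continue
        fi
        # </dev/null keeps the command from swallowing the rest of the list.
        $RUN dq2-register-files "$target" "$lfn" "$id" </dev/null
    done
}
```

A dry run such as RUN=echo register_pairs users.jamesmonk.test < pairs.txt (pairs.txt being whatever you named your list) prints the commands for inspection before anything is registered. The dataset still has to be created with dq2-register-dataset first and given a location with dq2-register-location afterwards, as described above.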

-- JamesMonk - 23 Apr 2008

user_dataset.sh: script to create new user dataset from multiple datasets

META FILEATTACHMENT	attachment="user_dataset.sh" attr="" comment="script to create new user dataset from multiple datasets" date="1208968657" name="user_dataset.sh" path="user_dataset.sh" size="1463" stream="user_dataset.sh" user="Main.JamesMonk" version="1"

Revision 1 - 2007-11-14 - AdamD

Line: 1 to 1

Added:

>
>

META TOPICPARENT	name="HEPGroup.AtlasStuff"

DQ2 Things

dq2 is handy software for copying files (things like AODs, ESDs, EVNTs, TAGs...) off the grid. To set it up, do this:

source /grid/LCG-share/UI/glite/external/etc/profile.d/grid-env.sh
source /grid/LCG-share/DQ2/endusers/setup.sh.UCL-HEP
voms-proxy-init -voms atlas

Then there are two main commands you'll probably feel like using:

dq2_ls -g <dataset name>
dq2_get -r -v <dataset name> <file in dataset>

The dq2_ls lists files in a dataset. The dq2_get gets them. The -g tells it to list files in the global catalogue, -r says look remotely, -v says tell me what the hell you're doing.

Dataset names are either official things like mc12.005200.T1_McAtNlo_Jimmy.evgen.EVNT.v12000401 or your own job output like user.adamdavison.005667.ntuples.v6

You also sometimes want a whole dataset rather than just one or two files for testing, then you just do:

dq2_get -r -v <dataset name>

And hope you've got enough free disk space.

-- AdamD - 14 Nov 2007
