UCL E-Science Validation Work

atlas-ucl-offline

Last Update: 23/11/04 (BS)

Key:
    Not started
    In progress
    Awaiting input from some other activity
    Completed and tested

L1 Deliverables (from PMB 2/5/04)

Date Item In Charge Participants Status
10/2004 Requirement capture and preliminary design document.
01/2005 Prototype framework release.
04/2005 Prototype evaluation document. Pending
03/2006 Use of framework for all major packages in the ATLAS release. Pending
03/2007 Use of revised framework mandatory for acceptance into ATLAS software release. Pending


Fine Deliverables

Date Due Item In Charge Participants Status Status Date Comments
26/05/04 Add JiveXML jobs BS JD 17/06/04 -
01/06/04 Ensure all numbered releases from 8.0.0 on are present on prod web page (8.0.0,8.0.1,8.0.2,8.0.3,8.2.0)x(opt,dbg) BS BS 24/05/04 -
02/07/04 Copy option files to results directories when jobs end. PS   02/07/04 Will jobs that do not have jo be OK?
02/07/04 Dump environment variables into the log file. PS   02/07/04  
03/07/04 Provide a link to the RTT Documentation pages from the Results pages PS   03/07/04  
05/07/04 For jobs run under PBS: copy error log to results directory PS   07/07/04  
07/07/04 Write and post an Installation Guide PS   07/07/04  
09/07/04 Check the user guide instructions really work by following them. JD PS 09/07/04  
10/07/04 Add cmt show uses before running a job. Print to own log file. Print the env vars to their own log file. Both log files copied to results + displayed PS   10/07/04  
10/07/04 Added wild carded keep files. PS   10/07/04  
11/07/04 Write a jobs config file to XML converter script. PS   11/07/04  
11/07/04 Find out how to parse a Jobs XML file PS   11/07/04  
12/07/04 Integrate jobs xml file into RTT PS   12/07/04 In RTTprod
23/07/04 Add JobOptions search path PS   12/07/04 In RTTprod
23/07/04 Make JobDescriptor only a base class with class identifier variable. Clean out initialise PS   11/07/04  
23/07/04 Write RuleCheckerJobDescriptor class. PS   31/10/04 Add test on legal starting point in release tree
23/07/04 Write RuleCheckerScriptWriter class. PS   20/07/04  
23/07/04 Think how to handle 'user error' PS   20/07/04 e.g. bad config file; non-existent option files; bad entry point RuleChecker. Provide a selfCheck method for descriptors
27/07/04 Correct the log checker test for RuleChecker PS   31/10/04  
04/06/04 Post-all job action: chaining together of ntuple files PS   31/10/04  
24/05/04 Ensure we can use histograms as reference items ?   31/10/04  
29/06/04 Add Trigger jobs to the RTT EN   31/10/04  
01/07/04 Place a startup script named startRTT in RunTimeTester/src PS   31/10/04  
01/07/04 Rename the top html page output by the RTT to "RTTpage1.html" from BS   31/10/04  
23/07/04 Fix the cmt uses problem: from the run script, need to cd to the Package/cmt file prior to issuing 'cmt show uses' and then cd back to the run directory. PS   31/10/04  
23/07/04 Place a startup script named startRTT in RunTimeTester/src PS   31/10/04  
25/07/04 Add a job to the nightlies that displays the use of histogram superposition. Needed so we can show others, and so we know that this feature is working. PS   31/10/04  
25/07/04 Allow individual jobs to override the refRelease PS   31/10/04  
08/08/04 Add trigger release jobs to RTT test runs. EN   31/10/04  
08/10/04 Provide ways of exercising the RTT without actually running jobs. PS   31/10/04  
15/10/04 Provide a way to update parts of a RTT run - without rerunning all jobs EN, BS, PS   31/10/04  
15/10/04 Provide a single page visualisation of a RTT run (RTTmon) PS   31/10/04  
15/10/04 Implement a way of controlling error messages. Provide the possibility of urgency levels, and of streaming to files and to the screen. PS   31/10/04  
26/10/04 Implement fix to enable RTT to run 8.8.0 and nightlies at UCL. EN   31/10/04  
31/10/04 Installation of numbered releases at UCL and adapting the RTT to run atlas numbered releases on the distribution kits. EN   30/04/05  
18/11/04 Setup a RTT current status website - daily reports of nightly running, numbered releases. BS   18/11/04  
18/11/04 Extract memory leak number into separate clearly labelled keep file for RecExCommon jobs BS   18/11/04  
ASAP RTT to run Valgrind BS   18/11/04  
ASAP Copy new 100 GeV single particle data files from CASTOR to UCL for egamma jobs BS   18/11/04 My account is not currently allowing the copy from CASTOR to local disk at CERN.
31/10/04 Clean up RTTmon exit. PS   18/11/04  
31/10/04 Clean up root macros used for Egamma. PS   18/11/04  
31/10/04 Rewrite Anna's macros 2+3 for egamma (currently we only run 1 of her 3 macros) PS   18/11/04  
ASAP On the results web pages, currently groups that have no post-run checks show 0/tot.jobs rather than n/a. BS   18/11/04  
ASAP Create a keep file for each job which contains output of "ls -altF" on the run directory before and after athena is run. BS   23/11/04  
ASAP Add a summary job group table at the top of page 3 for each job group. BS   23/11/04  
14/12/04 Fix RTT LinuxInteractive running mode EN   16/12/04  
15/12/04 Implement a more informative status report mechanism for egammaWatcher. EN   30/12/04  
10/01/05 Allow running on any branch nightly (previously only N.X.0) BS   18/01/05  
10/01/05 Create a Histogram differ checker, write appropriate ROOT macros BS   18/01/05 Currently only bin-by-bin, but place holder for statistical checks is there.
10/01/05 Shift all Reporter message constructions into functions within the Reporter module BS   18/01/05  
18/01/05 Make the global reporter object global, rather than passing it around inside function calls BS   18/01/05  
10/01/05 Clean up the HTMLWriter module to hide more HTML code behind functions BS   18/01/05  
01/02/05 Remove the reporter keep files list feature (make use of minder keep files list directly) BS      
13/05/05 Modify RTT web results front page to accommodate kit running BS   13/05/05  
13/05/05 Modify RTT web results front page to contain start and complete date and time BS   13/05/05  
13/05/05 State all useful info (platform, build, etc) on all subsequent results page titles BS   13/05/05  
13/05/05 Change coloring scheme on results page 2 so that items colored red when there is a problem BS   13/05/05  
13/05/05 Provide a summary of the tests run at top of page 4 (individual job summary page) BS   13/05/05  
13/05/05 Provide a mechanism to allow information strings to be attached to keep files BS   13/05/05  
13/05/05 Set up a test area to validate latest RTT tags prior to transfer to prod BS   13/05/05  
13/05/05 Write DTDs for the unified configuration and top level RTT XML files PS   13/05/05  
13/05/05 Allow local config files to have an empty job list PS   13/05/05  
13/05/05 Place unified configuration file in RTT/share PS   13/05/05  
13/05/05 Recover failure reports BS   13/05/05  
13/05/05 Investigate (and fix) why page 2 'checks ok' column always says n/a BS   13/05/05  
20/05/05 Add branch info in all DB keys BS   13/05/05  
20/05/05 Sort DB keys, e.g. when using multiple datasets, to remove order dependency BS   13/05/05  
20/05/05 Update RTT web documentation BS   18/10/05  
20/05/05 Investigate why info strings are not always attached to the 'top' job group keepfiles BS   07/07/05  
20/05/05 Better sorting algorithm for RTT front results page BS   17/05/05  
20/05/05 Why was 10.2.0 release added to N.0.X branch? BS   17/05/05  
20/05/05 DTDs for top level job group file and data set catalog BS   18/05/05  
20/05/05 Change all status.txt files from previous runs to include platform and run type info BS   07/07/05  
27/05/05 Investigate why (and stop) a package problem (e.g. invalid config. file) killing whole RTT process BS   07/07/05  
20/05/05 Shift addition of log, elog and JO files to keep file list into 'top' job group BS   13/05/05  
20/05/05 Amend page2GlobalFailure URL to correct one. BS   13/05/05  
27/05/05 Why does TestRun creation failure create a temporary page 2 with global failure message? BS   13/05/05  
13/05/05 Contact InDetRecExample people concerning their incorrect requirements file BS   13/05/05  
13/05/05 Add Steve Dallison new macros to InnerDetStats job BS   07/07/05 This job group is removed. Client now has own package.
20/05/05 Aid Moore group to add their RTT tests to their package BS   07/07/05  
20/05/05 Help LAr people add RTT tests to their package BS   07/07/05  
02/02/05 Investigate platform related features and add functionality to run on slc3 EN   20/03/05  
05/03/05 Add functionality to run the RTT on the atlas software distribution kits EN   29/04/05  
10/04/05 Running Atlfast on DC2 data EN   30/05/05  
06/04/05 Add functionality to run regression checks on installed files. These files would normally reside in the developer's package EN   03/05/05  
06/04/05 Allow user to call some RTT tools through configuration files. Through the same mechanism users can write their own tools which when installed can be run by the RTT EN   03/05/05  
05/05/05 Add MuonDigiExample job into the RTT EN   30/05/05  
11/05/05 Add a step by step example of running the RTT to the web documentation. It should be precise about things like RTT tag and configuration files. This would serve as hands-on exercise for potential users and possible refresher for customers EN   30/06/05  
11/05/05 Produce release notes for each major RTT release. This should correspond to the RTT tag in the corresponding atlas release. The document should describe the main features of the RTT and especially how things have moved on since the last release EN   30/06/05  
11/05/05 Ensure all packages are using the unified style configuration files and clean up the RTT EN   30/05/05  
11/05/05 Solve problems with atlfast configuration in the nightlies EN   20/05/05  
20/05/05 Put in ATN tags into the unified config file for AtlfastAlgs EN   17/06/05  
21/05/05 Make sure KV job runs to completion. Look for ways of minimising the adverse effects of huge logs EN   17/06/05  
20/05/05 Verify MuonDigiExample macros are corrected- Contact Developer EN   17/06/05  
20/05/05 Test RTT head version and report bugs, with aim of having a stable version with new features by 10.06.05 EN   10/06/05  
07/07/05 Append a column on results front page containing nicos date(link); BS   07/07/05  
07/07/05 Ensure that post-run tests (and actions?) are not performed if (Athena) job fails. Test column on web pages should then state "not run". BS   07/07/05  
07/07/05 Failed TestRuns should be reported to web page 2 BS   18/10/05  
07/07/05 Status log ---> XML format BS   07/07/05  
07/07/05 New data tags (, etc...) for the configuration XML file BS   07/07/05  
07/07/05 Each Minder should keep hold of the batch job ID it was attributed so that RTT can delete the job if nec. BS   18/10/05  
07/07/05 Page 2 group status should be cleaned up (started --> completed) if testrun killed (after launch) BS   18/10/05  
07/07/05 All web page content should be stored in XML files, and building the pages driven off of these. BS   07/07/05  
07/07/05 Wrapping the runscript so as to catch any problems. BS   07/07/05  
07/07/05 Separate log file containing info on TestRun start/end times, etc. so as to track any problems more easily. BS   18/10/05  
20/05/05 Extract relevant information from KV log and possibly delete the log because of its size EN   30/06/05  
20/06/05 Add new job group - G4AtlasApps to RTT nightly jobs EN   18/10/05  
20/06/05 Distinguish between actions and tests: actions do not necessarily return a result; tests always return a result. Reporter keeps track of problems occurring when actions/tests are run EN   11/07/05  
28/06/05 Convert the configuration of the running of root macros to standard test format EN   18/10/05  
11/06/05 Allow actions and/or tests to be configured for individual jobs EN      
25/05/05 Follow up the installation of Pyxml at CERN EN   18/10/05  
27/06/05 Document actions and tests EN   18/10/05  
30/06/05 Sample jobs in AtlfastAlgs running DC2 data EN   07/07/05  
18/10/05 Basic run timer factory for allowing different fixed global RTT run times BS   18/10/05  
18/10/05 Obtain RTT official email address (rtt@hep.ucl.ac.uk) BS   18/10/05  
01/11/05 Creation of a lock file in results and work dirs to prevent an RTT run overwriting another run processing the same release. Settable in config. file. BS      
01/11/05 Status "batchTimeOut" required for jobs that use more than the allowed CPU time on a given queue. BS      
01/11/05 Dynamic control of the global RTT run time. Minder objects chip in their "worth" in run time. BS      
01/11/05 New XML tag <email> in package config. file to indicate to whom to report job/test failures. BS   18/10/05  
18/10/05 New XML tag <noNightly> in package config. file to indicate that a given job should not be run for nightlies. BS   18/10/05  
01/11/05 Make a new RTT object that is a bag for run environment info (release, platform, etc.) BS      
18/10/05 'GMT' string on RTTpage1.html BS   18/10/05  
01/12/05 Split RTTpage1.html into the various branches. BS      
01/12/05 Failure Descriptors BS      
01/11/05 Allow in runScript.sh: athena [jobOptions.py] [properties.py] BS      
01/11/05 Allow user-specification of web page keep file sub folders BS      
01/01/06 RunTimer objects should be threaded (inherit from Python Timer object) BS      
01/01/06 Why are there so many repeated warning messages in failureReport.html? BS   22/11/05  
18/10/05 Web based validation service of package configuration XML files BS      
01/12/06 InDetRTTWatcher BS      
01/12/05 Put all remaining Checkers.py classes into RttLibraryTools BS      
18/10/05 Package XML configuration file added by default to web keep files. BS      
01/12/05 Allow specification of non-POOL data via new dataset tag. BS   18/10/05  
18/10/05 Clone Paths objects before constructing Descriptor objects in JobsXMLReader.py BS   18/10/05  
18/10/05 Improve calculation of branch value. BS   18/10/05  
18/10/05 Producing link on RTTpage1.html to RTT status page for daily monitoring plots. BS      
01/12/05 Generate from the RTT, the RTT status web page containing links to monitoring plots. BS      
18/10/05 Add new package web status 'globalTimeOut' and new indiv. job status 'jobTimedOut'. BS   18/10/05  
01/11/05 Ensure that nightly releases can be used as reference releases. BS   23/01/06  
01/11/05 Remove the basePath parameter from the code BS   23/01/06  
01/11/05 Split off numbered releases on RTTpage1.html into their own "branch" BS   23/01/06  
01/11/05 LinuxInteractive should get default setTimer behaviour (no time out). BS   04/11/05  
01/11/05 Create new branch 10.0.X and have N.0.X mean the current 11.0.X nightlies BS   26/10/05  
01/11/05 Remove Minder timers and have a timer after job has left the batch queue. BS   22/11/05  
01/11/05 Have TestRuns look at end of loop for existence of a file to mean shut down (because kill -15 no longer works) BS   04/11/05  
15/11/05 Setting up the RTT nightly running at Lancaster. This will serve as back-up for any disruptions at UCL EN   10/10/05  
15/11/05 Adding functionality to allow the RTT to plot history of specified events (information) EN   10/10/05  
30/11/05 Follow up the RTT gridification issues with Gianfranco EN   30/10/05  
30/11/05 Follow up the RTT kit running machinery in preparation for the running of short jobs needed for kit validation EN   30/10/05  
30/11/05 Think of ways to improve the current handling of chained athena jobs EN   30/10/05  
15/11/05 Run Sample RTT job on the grid EN/PS/GS   21/12/05  
15/12/05 Remove requirement for Athena + groupName in XML package files BS   23/11/05  
15/12/05 Use cmt show macros to speed up UserStuff Retriever PS   20/12/05  
15/12/05 Split up RTT.log into package log files PS   15/01/06  
17/01/06 Improve JiveXML TestConfiguration file PS   23/01/06  
17/01/06 Run 11.2.0 when available BS   23/01/06  
17/01/06 Run 11.2.0 Kit when available BS   23/01/06  
17/01/06 Run 11.0.4 when available BS      
17/01/06 Investigate why failure report for CaloRecEx refers to wrong job BS      
17/01/06 Investigate no DTD access from Lancaster EN      
17/01/06 Run Simulation Release jobs at Lancaster 11.2.0 EN      
17/01/06 Automatic file retrieval from Castor ??      
17/01/06 Athena options from config file ??      
17/01/06 Fix local running (CVSPackage constructor - version) PS      
13/02/06 Currently, res and work dirs for last week's, say, atlrel_2 are deleted on startup of this week's atlrel_2. This means that if the RTT process subsequently dies, the web pages are left in a horrible state. We should be checking first that we can put in place a meaningful page 1 and page2 for this new atlrel_2, and only then deleting the old res and work dirs. BS      
16/02/06 Replace os.system and all Popen commands with shellCommand.py BS      
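
The last item above (routing all shell calls through a single module) could look roughly like the sketch below. This is an illustration only: it is not the contents of shellCommand.py, the function name run_shell_command is hypothetical, and it assumes a modern Python with the standard subprocess module rather than the Python version the RTT ran on at the time.

```python
import subprocess


def run_shell_command(command, timeout=None):
    """Run a command line through the shell.

    Hypothetical helper for illustration; not the actual shellCommand.py API.
    Returns (exit_code, output_lines), with stderr merged into stdout so the
    caller sees one ordered stream.
    """
    completed = subprocess.run(
        command,
        shell=True,                    # command is a single shell string
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,      # merge stderr into stdout
        universal_newlines=True,       # decode output to text
        timeout=timeout,               # None means wait indefinitely
    )
    return completed.returncode, completed.stdout.splitlines()


if __name__ == "__main__":
    code, lines = run_shell_command("echo hello")
    print(code, lines)
```

Having one entry point means exit codes and output are handled the same way everywhere, instead of each os.system or Popen call doing its own thing.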


Unscheduled ToDo

Date By Item
24/05/04 PS Run Atlfast jobs from Pool
24/05/04 PS Use PI work to aid histogram comparisons
17/06/04 BS Think about how to homogenise, within the jobs config file, the different batch machine queue nomenclature.
29/06/04 PS When working on histogram comparisons, consider what the online group is (will be) doing about comparisons.
30/06/04 PS Think how to add a results presenter that shows hists, numbers in a table with the first column being the release identifier. This would mean bringing results from different releases together in one place.
25/07/04 PS Think of ways to display results by check (and not just the AND of all checks, as is currently the case), and information about failures to help debug problems identified by the RTT.
31/10/04 PS Add checker status to RTTmon.
15/12/04 EN Think of ways to make RTT features (functionalities) callable from outside of the RTT.
15/12/04 EN Add information on the versions of the files and external programs the RTT used for a specific RTT run.
06/01/05 BS SimpleFileDiffer, HistogramDiffer and all checks should not loop over all binRef or asciiRefFiles. Users may wish to run over a subset.
04/03/05 EN Investigate why job group status jams on 'started'
04/03/05 BS/EN RC claim success when they are not - fix
04/03/05 EN User test for Atlfast in release
04/03/05 BS Histogram differ in Atlfast (need to restructure AF macros)
04/03/05 BS/EN Break up PostScripts.py into common tools that can be called from config files.
04/03/05 EN When RTT kills jobs ensure all available keep files are copied
04/03/05 EN Check there are no sys.exit() calls that will kill the RTT
04/03/05 BS/EN Simple Log Checker -> file grepper (user supplies file name)
04/03/05 BS/EN SimpleFile Differ tag with optional subtags specifying files to diff. Default (no args) is all keepFiles
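
The two file-differ items in this list (run over a user-chosen subset; default to all keep files when no files are named) could behave roughly as in the sketch below. The function diff_keep_files, its parameters and the status strings are all hypothetical and are not taken from the RTT code; it assumes the new and reference files share a name in two separate directories.

```python
import filecmp
import os


def diff_keep_files(run_dir, ref_dir, keep_files, files_to_diff=None):
    """Compare files in run_dir against same-named files in ref_dir.

    Hypothetical sketch: if files_to_diff is empty or None, every entry of
    keep_files is compared; otherwise only the named subset is compared.
    Returns a dict mapping file name to 'same', 'different' or 'missing'.
    """
    selected = files_to_diff if files_to_diff else keep_files
    results = {}
    for name in selected:
        new_path = os.path.join(run_dir, name)
        ref_path = os.path.join(ref_dir, name)
        if not (os.path.isfile(new_path) and os.path.isfile(ref_path)):
            results[name] = "missing"
        elif filecmp.cmp(new_path, ref_path, shallow=False):
            # shallow=False forces a byte-by-byte content comparison
            results[name] = "same"
        else:
            results[name] = "different"
    return results
```

Reporting a per-file status, rather than a single pass/fail, would also fit the earlier item about displaying results by check.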


Comments

Date By Item
6/5/04 PS xxx

Miscellaneous Design Thoughts