Data Analysis Sandbox


This is a simple example of analysing some (simulated) LHC data using C++. It is based on two samples: ZH production, with Z→μμ and H→bb (the signal), and Zbb production with Z→ee (the background). The aim is to find similarities and differences between the signal and background events, by making histograms of various quantities.

I run this on a linux machine, it should also work on a mac or with windows. To run this, you will need: a C++ compiler, like g++; a text editor (I use emacs); Root. The first two are readily available on linux/mac.

Root & Fastjet

For better or worse, Root is by far the dominant data analysis/histogramming package used in particle physics. It is freely available from CERN, so download the "Pro" version then follow the installation instructions. Online searching will probably turn up the answers to any problems you encounter.

Fastjet is a package to cluster particles into jets. You can install this anywhere on your system, but I have a short script to download, build and install it in the current directory. Execute it:

Again, some additional packages may be required.

Finally, some variables must be set for each new terminal or session: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/PATH/TO/FASTJET
. /PATH/TO/ROOT/bin/
these two lines could be copied into to your ~/.bashrc file, for example. The fastjet path should point to the directory containing, for example ./fastjet-install/lib, otherwise you will see errors like error while loading shared libraries:

The Code

The analysis code consists of analysis.cpp, analysis.h, and a Makefile to compile them. You will also need to download the plot.cpp file, more of which later. Finally, two data files, VH_events.root and Zbb_events.root. Download all 5 files to the same directory, then open the Makefile and check the paths to Then compile & run the code using:

By default, the code runs over VH_events.root, and writes to VH_histograms.root. To run over the other file: ./analysis Zbb_events.root Zbb_histograms.root

The result is a new root file containing histograms. These can be opened and viewed in root:

root VH_histograms.root
Or in root open the gui new TBrowser and click on the plots. To quit root, type: . q

The plot.cpp code will open two root files, overlay a plot and save the result as an eps file. To run this: ./plot

Changing the Code

The main file to edit will be analysis.cpp. This code loops over all the events in the file: for (Long64_t jentry=0; jentry<nentries;jentry++){ Inside this loop, it loops over all the particles in each event: for(int i=0; i<_pdgid.size();i++) { and it looks for stable particles (if( _status[i]==1)) that are muons (if(_pdgid[i]==13)). A full list of particle ID codes is available here.
Further down, the code makes the combined dimuon 4-vector (TLorentzVector sum = v1 + v2;) and fills the histogram with the invariant mass of this new 4-vector.

Here are some good exercises. After making changes to the code, you will need to recompile (make), fixing any compilation errors, before you can run the updated code.


Finally, it is quite often worth fitting a parametric curve to the data. This code is a basic fitting example provided by the root authors. You could try including this into the plot.cpp code and fitting the dimuon mass distribution.