QEana2

Using QE Events To Estimate The Neutrino Flux As A Function Of Energy:

Overview:

The aim of this work is to use our knowledge of the QE and DIS cross sections and our ability to select QE and DIS events with a good efficiency to estimate the neutrino flux as a function of energy. One method to do this is as follows:

The cross section for neutrino-nucleon deep inelastic scattering is fairly well known at high energies (>20GeV). It is also relatively easy to select a clean sample of DIS events in the region ~20GeV as the DIS cross section is dominant at these energies.
The neutrino flux can hence be calculated at this energy by 'dividing out' the DIS cross section from a sample of DIS events with reconstructed neutrino energies at ~20GeV.
The shape of the cross section for quasi-elastic neutrino-nucleon interactions is well known down to energies ~500MeV, and is flat in energy down to ~1GeV, but the normalization of this cross section is not so well known.
If a relatively pure sample of QE events can be selected at neutrino energies of ~20GeV, with a background that we understand and can correct for, then the flux estimate from DIS events can be used to fix the normalization of the QE cross section. It may be that ~10GeV is better for our purposes, as there are fewer DIS events and a larger flux there.
Then with a relatively clean sample of QE events (that ideally have a flat selection efficiency over the region ~1-20GeV), and using the cross section normalization from DIS events, the neutrino flux can be estimated as a function of energy by 'dividing out' the QE cross section in a number of bins in reconstructed neutrino energy.
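The chain of steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the analysis code: the function names, the `exposure` factor (standing in for the number of targets and running time) and the idea of passing the QE cross section as a known shape times an unknown scale are all assumptions made for the example.

```python
def flux_from_events(n_events, cross_section, efficiency=1.0, exposure=1.0):
    """Flux in one energy bin, obtained by 'dividing out' the cross section.

    n_events      -- background-subtracted event count in the bin
    cross_section -- cross section at the bin energy (arbitrary units)
    efficiency    -- selection efficiency in the bin
    exposure      -- hypothetical factor for targets x running time
    """
    denominator = cross_section * efficiency * exposure
    if denominator <= 0:
        raise ValueError("cross section, efficiency and exposure must be positive")
    return n_events / denominator


def qe_normalization(n_qe, qe_shape, flux, efficiency=1.0, exposure=1.0):
    """Scale k such that sigma_QE = k * qe_shape reproduces n_qe for a known flux."""
    denominator = flux * qe_shape * efficiency * exposure
    if denominator <= 0:
        raise ValueError("flux, shape, efficiency and exposure must be positive")
    return n_qe / denominator
```

In use, the DIS sample at high energy gives the flux in that bin, the QE sample in the same bin then fixes the scale `k`, and `flux_from_events` is called again with `k * qe_shape` for each lower-energy QE bin.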

Discriminating variables:

I first wanted to see what sort of a QE sample I could get for neutrino energies up to 20GeV. I have used some MC generated by Mike that has a flat energy spectrum. The following plot shows the number of true CC and true CCQE events in bins of 0.5GeV:

I then searched for some variables with QE discriminating abilities. An obvious one is the reconstructed invariant mass squared and the next plot shows this for 4 different ranges of reconstructed neutrino energy:

(black=QE,blue=RES,red=DIS)

The relatively narrow peak of QE events around the mass of the proton squared allows a lot of background rejection. Other useful background rejecting variables are the numbers of showers and tracks in an event. An event with no showers is most likely QE and the number of tracks is useful as most QE events will have only one track reconstructed.

I then wanted to remove the main track from an event and look at some variables that used the remaining hits. I use the NtpSRTrack object to identify the track hits and remove these from further consideration. Where a hit is shared between the track and a shower I keep the hit but subtract 1 MIP from its PH. I also remove any hits that are further than 2m away in z from the event vertex as protons/pions should not travel further than this in the detectors. In a further effort to remove 'crosstalk-like' hits I also disregard any hits with a PH of less than 1.5pes.
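The hit removal steps can be written down roughly as follows. This is a sketch, not the actual code: the `Hit` class and its fields are hypothetical stand-ins for the ntuple quantities, the PH of 1 MIP in pes (`mip_pe`) has to be supplied by the caller, and I assume here that the 1.5pe cut is applied after the MIP subtraction.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Hit:
    z: float          # hit position along the beam axis, in metres
    ph: float         # pulse height, in pes
    on_track: bool    # hit belongs to the main track
    in_shower: bool   # hit belongs to a shower


def remaining_hits(hits, vertex_z, mip_pe, max_dz=2.0, min_ph=1.5):
    """Return the hits left after removing the main track and noise-like hits."""
    kept = []
    for hit in hits:
        ph = hit.ph
        if hit.on_track:
            if not hit.in_shower:
                continue          # pure track hit: drop it
            ph -= mip_pe          # shared hit: keep it, minus 1 MIP
        if abs(hit.z - vertex_z) > max_dz:
            continue              # too far from the vertex in z
        if ph < min_ph:
            continue              # 'crosstalk-like' low PH hit
        kept.append(replace(hit, ph=ph))
    return kept
```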

I then construct some variables with the remaining hits for each event. The next plot shows the number of >20pe hits remaining. The 'number of high PH hits' variable tries to use the fact that RES and DIS events will have more particles produced at the vertex (pion+proton/pions respectively) than QE (where there should just be the proton). Also protons will tend to leave just a couple of high PH hits whereas pions will range out a bit further into the detector leaving more high PH hits.

(black=QE,blue=RES,red=DIS)

The total PH remaining after the hit removal steps is also useful as a discriminating variable and is plotted next. For QE events, where the majority of the event PH is on the track, this variable is low, and it gets progressively higher for RES and then DIS events.

(black=QE,blue=RES,red=DIS)

If there is NC contamination present in the sample then this will also have low PH remaining after track removal, since in an NC event we only see a small fraction of the initial neutrino energy. The following variable takes care of this by taking the ratio of the PH remaining to the total PH in the event before any hit removal steps. QE events are expected to have low values, as most of the event PH is on the track, while RES, DIS and NC events have higher values.

(black=QE,blue=RES,red=DIS)
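The three remaining-hit variables described above (number of >20pe hits, total remaining PH, and remaining PH as a fraction of the total event PH) can be computed as in the sketch below. The function name and the choice to return 0 for an event with no PH at all are assumptions for the example.

```python
def remaining_hit_variables(remaining_ph, total_event_ph, high_ph_threshold=20.0):
    """Compute the remaining-hit variables for one event.

    remaining_ph   -- PH (in pes) of each hit left after the removal steps
    total_event_ph -- summed PH of the event before any hits were removed
    Returns (number of high PH hits, total remaining PH, remaining fraction).
    """
    n_high = sum(1 for ph in remaining_ph if ph > high_ph_threshold)
    total_remaining = sum(remaining_ph)
    if total_event_ph > 0:
        fraction = total_remaining / total_event_ph
    else:
        fraction = 0.0  # assumed convention for an empty event
    return n_high, total_remaining, fraction
```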

The final discriminating variable that I have considered is obtained by performing a Hough transform over the remaining hits. At first I had hoped to be able to spot the stubby proton track from QE events but this is very hard to do. The next plot shows the size of the peak in Hough space for the 4 different energy ranges for QE,RES and DIS events. This variable is in some sense a measure of the length of a track found (if there is one) which should be smaller for QE events (proton) than for RES and DIS (pions).

(black=QE,blue=RES,red=DIS)
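For illustration, a very simple version of a Hough peak variable is sketched below. It is not the implementation used for the plots: the straight-line (theta, rho) parameterization, the binning, and the use of (z, transverse position) pairs as hit coordinates are all assumptions. Each hit votes for every line that could pass through it, and the peak is the largest number of votes in any one bin, i.e. the largest number of roughly collinear hits.

```python
import math
from collections import Counter


def hough_peak(points, n_theta=90, rho_bin=0.05):
    """Largest number of hits voting for the same straight line.

    points  -- iterable of (z, t) hit coordinates in metres
    n_theta -- number of angle bins covering [0, pi)
    rho_bin -- bin width in the line's distance from the origin, in metres
    """
    votes = Counter()
    for z, t in points:
        for i in range(n_theta):
            theta = math.pi * i / n_theta
            rho = z * math.cos(theta) + t * math.sin(theta)
            votes[(i, round(rho / rho_bin))] += 1
    return max(votes.values()) if votes else 0
```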

QE sample selection using a PID parameter based on a maximum likelihood analysis:

I decided to create a QE PID parameter using a maximum likelihood analysis based on these variables. At first I used all the variable distributions as one dimensional pdfs for the ML analysis but then decided to take a closer look at the correlations between my variables (and then to combine any highly correlated variables into two dimensional pdfs). I have combined the total remaining PH after the hit removal with the number of high PH hits remaining. I plan next to just use some linear combination of these two variables as a 1D pdf.

All variables will scale with energy with some functional form and so I also decided to perform a separate ML based PID analysis in asymmetric bins of reconstructed neutrino energy. A rather big caveat to the following results section is that I have used the same event set to construct the pdfs as went through the analysis and so have introduced some correlations. I did this because at the moment I don't have enough events to fill the pdfs otherwise. As such, the results shown will get worse when done properly. The PID parameter was constructed according to:

PID = -sqrt(-log(P_QE)) + sqrt(-log(P_nonQE))
where:
- P_QE is the compound probability for the event to be QE
- P_nonQE is the compound probability for the event not to be QE
Each compound probability is formed by multiplying the individual probabilities from the pdfs.
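A minimal Python sketch of the PID parameter, following the expression above. The function name and the small `floor` on the compound probabilities (to avoid taking the log of zero when an event falls in an empty pdf bin) are assumptions added for the example; the inputs are the per-variable probabilities read from the QE and non-QE pdfs for one event.

```python
import math


def qe_pid(p_qe_values, p_nonqe_values, floor=1e-12):
    """PID = -sqrt(-log(P_QE)) + sqrt(-log(P_nonQE)) for one event.

    p_qe_values    -- per-variable probabilities from the QE pdfs
    p_nonqe_values -- per-variable probabilities from the non-QE pdfs
    floor          -- assumed lower bound that protects against log(0)
    """
    # Compound probabilities: product of the individual pdf probabilities,
    # clamped to [floor, 1] so that -log(...) is finite and non-negative.
    p_qe = min(max(math.prod(p_qe_values), floor), 1.0)
    p_nonqe = min(max(math.prod(p_nonqe_values), floor), 1.0)
    return -math.sqrt(-math.log(p_qe)) + math.sqrt(-math.log(p_nonqe))
```

With this sign convention QE-like events sit at high PID values: a large probability to be QE makes the first term small, and a small probability not to be QE makes the second term large.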

PID results:

The following plot shows an example of the PID parameter for true QE and non-QE events in the [1.5,2.0) GeV reconstructed neutrino energy range.

(black=QE,red=non-QE)

The following plot shows efficiencies and purities for some different samples. In the top left plot I have tried to flatten the efficiency of QE selection at ~80% to see what purities I can get. The top right and bottom left plots show samples for which I have tuned the PID cuts to give QE samples with purities of 70% and 80% respectively.

(black=efficiency, red=purity)
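For reference, the efficiency and purity for a given PID cut can be computed from MC truth as in the sketch below. The function names are hypothetical, and I assume that events are selected as QE when their PID value is at or above the cut. The second function picks the cut that keeps at least a target fraction of the true QE events, which is one simple way to tune towards a flat efficiency in each energy bin.

```python
import math


def efficiency_and_purity(pid_qe, pid_nonqe, cut):
    """Efficiency and purity of the selection PID >= cut.

    pid_qe    -- PID values of true QE events in one energy bin
    pid_nonqe -- PID values of true non-QE events in the same bin
    """
    selected_qe = sum(1 for value in pid_qe if value >= cut)
    selected_bkg = sum(1 for value in pid_nonqe if value >= cut)
    n_selected = selected_qe + selected_bkg
    efficiency = selected_qe / len(pid_qe) if pid_qe else 0.0
    purity = selected_qe / n_selected if n_selected else 0.0
    return efficiency, purity


def cut_for_efficiency(pid_qe, target):
    """Loosest cut value that keeps at least `target` of the true QE events."""
    if not pid_qe:
        raise ValueError("need at least one true QE event")
    ordered = sorted(pid_qe, reverse=True)
    n_keep = min(len(ordered), max(1, math.ceil(target * len(ordered))))
    return ordered[n_keep - 1]
```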

Methodology from here:

My immediate next tasks are:

to redo the analysis with an event set separate from that used to construct the pdfs. I shall update this page with the results ASAP.
to estimate the backgrounds (RES,DIS,NC) to the QE sample in each bin.
to reweight the events to correspond to standard beams.

The method from here is to construct estimators for the numbers of true QE events in each bin according to:

n_i = ( m_i - b_i ) / e_i
where:
- m_i is the number of measured events in each bin i
- b_i is the estimated number of background events in each bin i
- e_i is the efficiency for selecting QE events in each bin i
or using the correction factors method described by Mike here
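The simple estimator n_i = ( m_i - b_i ) / e_i can be evaluated bin by bin as in the sketch below. The function name is hypothetical and no uncertainties are propagated here; it only covers the first of the two methods mentioned above.

```python
def qe_estimators(measured, background, efficiency):
    """Return n_i = (m_i - b_i) / e_i for each reconstructed energy bin.

    measured   -- m_i, number of selected events in each bin
    background -- b_i, estimated number of background events in each bin
    efficiency -- e_i, QE selection efficiency in each bin
    """
    if not (len(measured) == len(background) == len(efficiency)):
        raise ValueError("all inputs must have one entry per bin")
    estimates = []
    for m, b, e in zip(measured, background, efficiency):
        if e <= 0:
            raise ValueError("efficiency must be positive in every bin")
        estimates.append((m - b) / e)
    return estimates
```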

The high E estimator(s) can then be used, as described in the overview section, to normalize the QE cross section at relatively high energies. The cross section can then be 'divided out' from the remaining estimators to give a measurement of the neutrino flux in each of the bins of reconstructed neutrino energy.