Centre for Doctoral Training in Data Intensive Science

2017 Intake Projects

  • Cheap deep-learning for photometric Supernova classification and beyond

    Student: Tarek Allam

    Supervisory Team: Dr. Jason McEwen (MSSL, CDT Director of Research), Prof. Ofer Lahav (Physics, Director of CDT), Dr. Denise Gorse (Computing)

    The Large Synoptic Survey Telescope (LSST) is predicted to observe millions of supernova (SN) events over its lifetime, orders of magnitude beyond the number discovered by all previous astronomical surveys (of order a thousand SNe have been discovered to date). These SN observations will be used for a variety of cosmological studies, for example to examine the nature of dark energy. However, to be cosmologically useful, it is critical to classify SN observations into the type of event (e.g., runaway thermonuclear fusion, core collapse). Historically, SN cosmology has required costly and time-consuming spectroscopic classification for every SN, which is simply impossible for the millions of events that will be discovered by LSST. Instead, SN cosmology with LSST will require photometric techniques, where classification must be performed using only the flux levels observed in a small number of broad photometric filter bands. We will focus on cheap deep-learning techniques for photometric SN classification and, in particular, address the problem of small, non-representative training sets.
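
    As a toy illustration of the photometric approach (not this project's actual pipeline), the sketch below trains a small neural-network classifier on invented light-curve summary features; the feature choices, class labels and network shape are assumptions made purely for the example.

        # Minimal sketch: classify simulated supernova light curves from photometric
        # summary features (peak flux in three broad bands, rise time, decline rate).
        # All data here are synthetic placeholders; a real analysis would use survey
        # light curves and far richer features or sequence models.
        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import MLPClassifier
        from sklearn.metrics import accuracy_score

        rng = np.random.default_rng(0)
        n = 2000

        # Toy features per event: peak flux in g/r/i bands, rise time, decline rate.
        type_ia = rng.normal([1.0, 1.2, 1.1, 18.0, 0.05], 0.15, size=(n // 2, 5))
        core_collapse = rng.normal([0.8, 0.9, 1.0, 25.0, 0.03], 0.20, size=(n // 2, 5))
        X = np.vstack([type_ia, core_collapse])
        y = np.array([1] * (n // 2) + [0] * (n // 2))  # 1 = Ia, 0 = core collapse

        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
        clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
        clf.fit(X_train, y_train)
        print("toy classification accuracy:", accuracy_score(y_test, clf.predict(X_test)))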

  • Maximising the capabilities of a jet classification algorithm on LHC track and vertex data

    Student: Greg Barbour

    Supervisory Team: Dr. Andreas Korn (Physics)

    The Large Hadron Collider (LHC) collides hydrogen nuclei 40 million times per second at the highest centre-of-mass energy artificially available. These collisions are recorded by large detectors such as ATLAS, with 100 million channels, producing petabytes of data every second. The ATLAS Inner Tracking Detector is by far the largest contributor to this channel count. Reconstructing particle trajectories, and the vertices where they intersect, from the limited hit information of the Inner Detector is one of the greatest computational challenges of the LHC. Jets containing bottom hadrons (b-jets) have been a very important window into unexplored physics. The identification of b-jets is needed to observe the as yet unmeasured largest decay channel of the Higgs boson, into bottom-quark pairs. Heavy new TeV-scale resonances might couple preferentially to third-generation particles such as bottom quarks. Such resonances are of renewed interest because they can act as mediators between dark matter particles and normal matter; restricting the parameter space of the mediators also provides constraints on models of dark matter particles. B-jets are identified through the decay properties of b-hadrons, whose decay chain always involves a weak decay. The resulting long lifetime means b-hadrons fly a distance of a few mm up to a few cm before they decay, leading to displaced secondary vertices. B-tagging uses the properties of reconstructed large-impact-parameter tracks and identified secondary and tertiary vertices to distinguish b-jets from jets originating from lighter quarks. An important criterion for the quality of b-tagging is the misidentification rate for non-b jets: when searching for a tiny signal in a large dataset dominated by background, even a moderate mistag rate can be fatal. Machine learning and multivariate techniques such as neural networks and boosted decision trees are used extensively in the identification of b-jets, and an important aspect of such techniques is the careful preparation and selection of the input variables. Reconstructing the underlying b-hadron decay topology provides an advantage; in the current ATLAS reconstruction this is done via the JetFitter algorithm, which attempts to reconstruct a string of secondary and tertiary vertices along the jet direction. The project focuses on improving b-jet identification, especially under difficult conditions such as large boost, by investigating the known decay topologies and implementing broader options in JetFitter.
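
    As a toy illustration of b-tagging posed as a supervised classification problem (not the ATLAS JetFitter algorithm itself), the sketch below trains a gradient-boosted classifier on invented track and vertex features; the feature names and distributions are placeholders.

        # Minimal sketch of b-tagging as supervised classification: a gradient-boosted
        # classifier separating b-jets from light jets using toy track/vertex features
        # (impact-parameter significance, secondary-vertex displacement and mass).
        # Features and distributions are illustrative only, not the ATLAS inputs.
        import numpy as np
        from sklearn.ensemble import GradientBoostingClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(1)
        n = 5000

        # b-jets: larger impact parameters and displaced, more massive vertices.
        b_jets = np.column_stack([
            rng.normal(3.0, 1.5, n),   # impact-parameter significance
            rng.normal(4.0, 2.0, n),   # vertex displacement [mm]
            rng.normal(2.0, 0.8, n),   # secondary-vertex mass [GeV]
        ])
        light_jets = np.column_stack([
            rng.normal(0.0, 1.0, n),
            rng.normal(0.5, 0.5, n),
            rng.normal(0.8, 0.4, n),
        ])
        X = np.vstack([b_jets, light_jets])
        y = np.array([1] * n + [0] * n)

        bdt = GradientBoostingClassifier(n_estimators=200, max_depth=3)
        print("toy b-tag ROC AUC:", cross_val_score(bdt, X, y, scoring="roc_auc", cv=3).mean())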

  • Firmware development in the ATLAS Hardware Track Trigger for HL-LHC

    Student: Lucas Borgna

    Supervisory Team: Prof. Nikos Konstantinidis (Physics, Director CDT, ATLAS UK-PI, Director HEP Summer School), Dr. David Sankey (Rutherford Appleton Laboratory, STFC)

  • Machine Reading and Milky Way Model in the Gaia era

    Student: Tom Crossland

    Supervisory Team: Dr. Daisuke Kawata (MSSL), Dr. Pontus Stenetorp (Machine Reading Group, Computer Science), Dr. Sebastian Riedel (Machine Reading Group, Computer Science), Dr. Thomas Kitching (MSSL), Dr. Jason McEwen (MSSL)

    We will develop a tool to derive the Milky Way mass distribution with a Hierarchical Bayesian Model from the various kinds of observational constraints automatically extracted, using a novel machine reading tool, from the vast body of previously published literature. We will first train the machine to automatically extract the key observational measurements of the Milky Way, e.g. the scale length of the Galactic disk, from scientific articles. Then, we will feed them into the Milky Way mass model, statistically combining the literature values and their errors using Bayesian statistics, to derive the current best Milky Way model, including the mass distribution of the dark matter and stellar disk as a function of stellar populations. ESA's Gaia mission is going to make a full dataset of the positions and velocities of more than one billion stars publicly available in April 2017. A large number of papers are expected to be published from such big data. The developed tool will be designed to combine many pieces of scattered information, i.e. many individual publications by different groups, into a single, unified, statistically verified Milky Way model. We will compare the derived Milky Way models with the publications before and after the Gaia data, and evaluate the statistical impact of the Gaia mission.
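
    As a toy illustration of the statistical-combination step (a drastic simplification of the full hierarchical Bayesian model), the sketch below combines invented literature measurements of a single Milky Way parameter under Gaussian likelihoods and a flat prior.

        # Minimal sketch of the combination step: given independent literature
        # measurements of one Milky Way parameter (e.g. the disk scale length) with
        # Gaussian errors, the posterior under a flat prior has the familiar
        # inverse-variance-weighted mean and error. Values are invented for
        # illustration; the project's full hierarchical model is far richer.
        import numpy as np

        # (value, 1-sigma error) pairs as they might be extracted from papers, in kpc.
        measurements = np.array([[2.5, 0.3], [3.0, 0.4], [2.7, 0.2], [2.2, 0.5]])
        values, sigmas = measurements[:, 0], measurements[:, 1]

        weights = 1.0 / sigmas**2
        posterior_mean = np.sum(weights * values) / np.sum(weights)
        posterior_sigma = np.sqrt(1.0 / np.sum(weights))
        print(f"combined scale length: {posterior_mean:.2f} +/- {posterior_sigma:.2f} kpc")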

  • Computer vision and machine learning for pattern recognition in LHC data

    Student: Charlie Donaldson

    Supervisory Team: Prof. Nikos Konstantinidis (Physics, Director CDT, ATLAS UK-PI, Director HEP Summer School), Dr. Dmitry Emeliyanov (Rutherford Appleton Laboratory, STFC)

    The track finding algorithms currently adopted in the LHC experiments are based on combinatorial track following. The combinatorial stage combines hits from a subset of sensors into short track segments called seeds. The track-following stage traces each seed through the detector volume and picks up the hits belonging to a seeded track. Because the number of seeds scales non-linearly with the number of hits, CPU demand will keep growing as the LHC continues to increase its beam intensity over the next decades. This motivates investigating novel, non-combinatorial approaches to track finding, which could lead to huge savings in CPU needs over the LHC lifetime. The aim of this project is to explore and compare pattern recognition methods based on integral transformations, in particular the iteratively re-weighted Hough Transform (HT) and the multiscale Radon Transform (RT) with a set of predefined hit patterns. Machine learning (ML) is a crucial component in both approaches. For example, an ML-based classifier can analyse feature vectors extracted from HT images in order to detect hits from tracks of interest. For the RT approach, optimisation of the pattern set is crucial for performance and overall feasibility, especially for hardware-based track finding. Using ML techniques, the pattern set can be learned as an approximation of a sparsely coded set of training reference tracks representing the expected events of interest.
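
    As a toy illustration of the Hough Transform approach (ignoring detector geometry, magnetic-field curvature and the iterative re-weighting), the sketch below builds a simple two-dimensional Hough accumulator and locates the peak produced by a straight toy track among noise hits.

        # Minimal sketch of Hough-transform track finding in two dimensions: each hit
        # (x, y) votes for all (theta, rho) lines passing through it, and peaks in the
        # accumulator correspond to candidate tracks.
        import numpy as np

        def hough_accumulator(hits, n_theta=180, n_rho=200, rho_max=10.0):
            thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
            rho_edges = np.linspace(-rho_max, rho_max, n_rho + 1)
            acc = np.zeros((n_theta, n_rho), dtype=int)
            for x, y in hits:
                rho = x * np.cos(thetas) + y * np.sin(thetas)   # rho for each theta
                idx = np.digitize(rho, rho_edges) - 1
                valid = (idx >= 0) & (idx < n_rho)
                acc[np.arange(n_theta)[valid], idx[valid]] += 1
            return acc, thetas, rho_edges

        # Toy event: one straight track (y = 0.5 x + 1) plus random noise hits.
        rng = np.random.default_rng(2)
        xs = np.linspace(0, 8, 12)
        track = np.column_stack([xs, 0.5 * xs + 1.0])
        noise = rng.uniform(0, 8, size=(30, 2))
        acc, thetas, rho_edges = hough_accumulator(np.vstack([track, noise]))

        i, j = np.unravel_index(np.argmax(acc), acc.shape)
        print(f"peak votes: {acc[i, j]}, theta ~ {np.degrees(thetas[i]):.1f} deg")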

  • A statistical approach to astrochemistry

    Student: Damien De Mijolla

    Supervisory Team: Prof. Serena Viti (Physics)

  • Machine Learning in Direct Dark Matter Experiments

    Student: Omar Jahangir

    Supervisory Team: Dr. Chamkaur Ghag (Physics), Dr. Tim Scanlon (Physics), Dr. Ingo Waldmann (Physics)

    Direct Dark Matter search experiments operate highly sensitive detectors in deep underground laboratories, seeking to detect rare and low-energy scatters from Dark Matter particles in our galaxy. The world-leading LUX experiment, based at the Sanford Underground Research Facility, S. Dakota, operates a xenon target that is the most radio-quiet environment on Earth in the hunt for Weakly Interacting Massive Particles (WIMPs). WIMPs are expected to produce characteristic single-vertex elastic scattering signatures. The rate of candidate events that satisfy this requirement is low, about 2 per day, and is consistent with background expectation. However, there are many WIMP and non-WIMP models of Dark Matter that may produce significantly different signatures. LUX triggers about 1 million times per day to record data from completely uncharted electroweak parameter space, potentially containing new physics and non-standard WIMP or non-WIMP Dark Matter signals. The techniques of Machine Learning and Deep Learning present opportunities to analyse this data efficiently, particularly where faint signals with unknown characteristics may be hidden amongst large backgrounds. Developing these techniques could prove to be the key to discovery in the next-generation leading Dark Matter experiment, LZ, presently under construction and set to begin taking data within the lifetime of this project. LZ will examine the bulk of the favoured theoretical parameter space for Dark Matter, uncovering unknown backgrounds never previously encountered and potential signals. This project will develop the routines to analyse the existing rich LUX data for any galactic signals or new backgrounds, and prepare the framework for a robust and rapid interpretation of the data from LZ, be it the first discovery of WIMPs or any hints of physics Beyond the Standard Model.
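
    As a toy illustration of one way faint, uncharacterised signals might be flagged among a large background (not this project's actual analysis), the sketch below applies an off-the-shelf unsupervised anomaly detector to invented event summaries.

        # Minimal sketch: fit an unsupervised anomaly detector to toy event summaries
        # (stand-ins for quantities such as S1/S2 pulse areas in a xenon TPC) and flag
        # outliers for closer inspection. Feature names and distributions are
        # placeholders only.
        import numpy as np
        from sklearn.ensemble import IsolationForest

        rng = np.random.default_rng(3)
        background = rng.normal(loc=[100.0, 5000.0], scale=[20.0, 800.0], size=(10000, 2))
        oddballs = rng.normal(loc=[300.0, 1000.0], scale=[10.0, 100.0], size=(10, 2))
        events = np.vstack([background, oddballs])  # columns: two toy pulse areas (a.u.)

        detector = IsolationForest(contamination=0.002, random_state=0).fit(events)
        flags = detector.predict(events)            # -1 marks candidate anomalies
        print("flagged events:", np.sum(flags == -1))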

  • Novel applications of machine learning in cosmology and beyond

    Student: Ben Henghes

    Supervisory Team: Prof. Ofer Lahav (Physics, Director of CDT)

  • Proton CT Image Reconstruction with X-Ray CT Priors

    Student: Matthieu Hentz

    Supervisory Team: Dr. Simon Jolly (Physics), Prof. Simon Arridge (Centre for Medical Image Computing), Dr. Jamie McClelland (Centre for Medical Image Computing)

    Proton beam therapy (PBT) offers potential clinical advantages over conventional X-ray radiotherapy for localised cancer due to the interaction characteristics of protons. Prior to treatment, a comprehensive dose delivery plan is formulated with 3D X-ray CT images of the patient. However, these treatment plans are suboptimal due to the uncertainty in converting between the absorption (in Hounsfield Units) measured by an X-ray CT and the Relative Stopping Power of protons, resulting in a 3% error that must be factored into treatment plans. A solution is to not only treat but also image with protons: by selecting an energy high enough that the protons pass through the patient and deposit minimal dose, a proton CT image can be reconstructed by tracking the incoming and outgoing protons and measuring their residual energy. Despite these advantages, the resolution of proton CT images is inherently limited as the exact path of the proton between entry and exit is unknown. This project seeks to improve the resolution of proton CT images by the novel use of a prior X-ray CT image upon which to base the reconstruction. By improving both the quality of the reconstructed image and the reconstruction time, the use of X-ray CT priors for proton imaging would improve the quality of treatment and reduce the time the patient spends in the treatment room for imaging, thereby improving patient throughput. In advanced imaging methods such as proton CT, nonlinearities and ill-posedness necessitate the careful use of prior information, including cross-modality information. Simple methods based on enforcing sparsity of local features are being extended to multi-scale and information-theoretic priors which build on statistical descriptions of big data. Developing reconstruction techniques for such priors will involve adapting methods from machine learning, including non-parametric probability models and deep-learning techniques.
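
    As a toy illustration of prior-regularised reconstruction (a drastic simplification of proton CT), the sketch below recovers an "image" from an under-determined set of noisy linear projections by penalising departures from a prior image, which stands in for the X-ray CT prior.

        # Minimal sketch of prior-regularised image reconstruction: recover x from
        # noisy projections b = A x by minimising
        #     ||A x - b||^2 + lam * ||x - x_prior||^2.
        # The system matrix and images here are tiny random stand-ins, not a real
        # proton CT forward model.
        import numpy as np

        rng = np.random.default_rng(4)
        n_pix, n_meas = 64, 40                       # deliberately under-determined

        x_true = rng.normal(size=n_pix)
        A = rng.normal(size=(n_meas, n_pix))
        b = A @ x_true + 0.05 * rng.normal(size=n_meas)
        x_prior = x_true + 0.2 * rng.normal(size=n_pix)   # imperfect X-ray-CT-like prior

        lam = 1.0
        # Normal equations of the regularised least-squares problem.
        lhs = A.T @ A + lam * np.eye(n_pix)
        rhs = A.T @ b + lam * x_prior
        x_rec = np.linalg.solve(lhs, rhs)

        x_plain = np.linalg.lstsq(A, b, rcond=None)[0]    # minimum-norm, no prior
        print("error without prior term:", np.linalg.norm(x_plain - x_true))
        print("error with prior term:   ", np.linalg.norm(x_rec - x_true))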

  • Harnessing the Potential of Machine Learning to Expand the Discovery Potential of the LHC

    Student: Ava Lee

    Supervisory Team: Dr. Tim Scanlon (Physics, CDT Director of Research), Dr. Gabriel Facini (Physics, ATLAS Exotics Convener), Michael Kagan (SLAC, ATLAS Machine Learning Forum Convener)

    The Large Hadron Collider (LHC) has continued to push its search for new physics to higher mass ranges. However, it has so far failed to find any signs of new physics beyond the Standard Model (SM). This may mean that signs of new physics, assuming it exists in the mass range probed by the LHC, must lie in more extreme regions of phase space or will require significantly more data to be discovered. The LHC has so far delivered only ~1% of the data we expect to collect over the next ~20 years, and there are vast regions of phase space that are not yet accessible, either due to algorithmic constraints or because insufficient data has been collected so far. To ensure the data is fully exploited in the search for new physics, and that all the possible regions of phase space are explored, a paradigm shift must occur in the use of machine learning (ML) at the LHC to ensure that optimal use is made of all available data. This project will use lower-level detector information coupled with cutting-edge ML techniques to boost the performance of the reconstruction algorithms, specifically those which identify b-quarks (b-tagging). As b-tagging is used in the majority of the results produced by ATLAS, improving the b-tagging algorithms will have a significant impact on the vast majority of the physics programmes of the LHC, helping to significantly boost its discovery potential.
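
    As a toy illustration of feeding lower-level inputs to a tagger (not the ATLAS algorithms themselves), the sketch below passes the set of tracks in a jet through a small permutation-invariant network that embeds each track and sums the embeddings; the track features and data are random placeholders.

        # Minimal sketch of a "deep sets" style jet tagger operating on per-track
        # inputs rather than a few engineered variables. The architecture and data
        # are illustrative only.
        import torch
        import torch.nn as nn

        class TrackSetTagger(nn.Module):
            def __init__(self, n_track_feats=3, hidden=32):
                super().__init__()
                self.per_track = nn.Sequential(nn.Linear(n_track_feats, hidden), nn.ReLU())
                self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                          nn.Linear(hidden, 1))

            def forward(self, tracks):            # tracks: (batch, n_tracks, n_feats)
                embedded = self.per_track(tracks) # embed each track independently
                pooled = embedded.sum(dim=1)      # permutation-invariant pooling
                return self.head(pooled).squeeze(-1)

        model = TrackSetTagger()
        jets = torch.randn(8, 12, 3)              # 8 jets, 12 tracks each, 3 features
        labels = torch.randint(0, 2, (8,)).float()
        loss = nn.BCEWithLogitsLoss()(model(jets), labels)
        loss.backward()
        print("toy loss:", float(loss))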

  • Using switches to control clusters and data flows

    Student: James Legg

    Supervisory Team: Dr. Jeremy Yates (Physics, CDT Partners Liaison and Placements Coordinator, DiRAC Industrial Engagement Officer and Deputy Project Director), Dr. David Sankey (Rutherford Appleton Laboratory, STFC)

  • Semantic segmentation for neutrino event reconstruction in NOvA

    Student: Kevin Mulder

    The NOvA experiment studies the changes ("oscillations") of neutrinos as they travel 810km from an accelerator at Fermilab, outside Chicago, to a huge detector in northern Minnesota. By studying these oscillations we hope to learn which of the three neutrinos is the heaviest, and if there is any difference between the oscillations of neutrinos and their antiparticles. Such a difference could provide an explanation for the mystery that the universe is dominated by matter, rather than consisting of equal parts matter and antimatter. In 2018/2019 NOvA will operate a small replica detector that we will expose to beams of particles with well-controlled properties. This project is to apply modern deep-learning techniques to the large collected data sample to develop a new, more-powerful, neutrino interaction classifier, and potentially a new technique for efficiently simulating neutrino interactions.
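
    As a toy illustration of a deep-learning event classifier operating on detector "pixel maps" (not the NOvA network itself), the sketch below defines a tiny convolutional network and runs one training step on random images; the architecture, image size and class count are assumptions for the example.

        # Minimal sketch of a convolutional classifier for 2D hit images, the kind of
        # input used by image-based neutrino interaction classifiers.
        import torch
        import torch.nn as nn

        class TinyEventCNN(nn.Module):
            def __init__(self, n_classes=3):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                )
                self.classifier = nn.Linear(16 * 8 * 8, n_classes)

            def forward(self, x):
                x = self.features(x)
                return self.classifier(x.flatten(start_dim=1))

        # One training step on a random batch of 32x32 single-channel event images.
        model = TinyEventCNN()
        images = torch.randn(4, 1, 32, 32)
        labels = torch.randint(0, 3, (4,))
        loss = nn.CrossEntropyLoss()(model(images), labels)
        loss.backward()
        print("toy loss:", float(loss))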

  • A generative model of cosmic large-scale structure

    Student: Davide Piras

    Supervisory Team: Dr. Benjamin Joachimi (Physics), Prof. John Shawe-Taylor (Computing, Head of Department)

    Current and forthcoming galaxy surveys will provide unprecedentedly tight constraints on physics beyond the cosmological standard model. Just as important as the best-fit parameters obtained from these datasets are the statistical uncertainties associated with them. The measurement errors are usually determined from large suites of mock survey realisations, each built upon computationally expensive cosmological N-body simulations. Tens of thousands of such simulations will be required in the near future, but it is unlikely that sufficient supercomputer time will be available. This is a limiting factor in cosmological analyses of current surveys, and a critical unsolved problem for the next generation of surveys such as Euclid and LSST. In this project we will develop a deep-learning network that effectively compresses the input simulation information and subsequently generates statistically independent mock universes with characteristics identical to the input, thus enabling the calculation of realistic error models with minimal extra computational cost from just a few N-body simulations.
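
    As a toy illustration of the compress-then-generate idea, the sketch below uses a linear stand-in (principal component analysis) for the deep network: it compresses invented "density fields" to a low-dimensional latent space, fits a Gaussian there, and decodes fresh latent samples into new mock fields.

        # Minimal sketch of compress-then-generate with a linear model. The data are
        # random placeholders standing in for N-body simulation outputs.
        import numpy as np
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(5)
        n_sims, n_cells = 200, 32 * 32
        sims = rng.normal(size=(n_sims, n_cells))        # stand-in for simulated fields

        pca = PCA(n_components=16).fit(sims)
        latents = pca.transform(sims)

        # Sample new latent vectors from the empirical Gaussian and decode them.
        mean, cov = latents.mean(axis=0), np.cov(latents, rowvar=False)
        new_latents = rng.multivariate_normal(mean, cov, size=10)
        mock_fields = pca.inverse_transform(new_latents)  # 10 new mock "universes"
        print("generated mocks:", mock_fields.shape)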

  • Novel spherical informatics techniques for studying cosmic evolution

    Student: Patrick Roddy

    Supervisory Team: Dr. Jason McEwen (MSSL, CDT Director of Research), Dr. Thomas Kitching (MSSL)

    A general understanding of the cosmic history and evolution of our Universe has developed recently, yet we remain ignorant of many aspects of the scenario that has emerged. Little is known about the process of inflation in the early Universe, which is thought to have seeded cosmic structure. A complete understanding of dark energy and dark matter, which compose 95% of the energy content of the Universe and dominate its late evolution, also remains elusive. The observational signatures of new physics that would lead to a deeper understanding of cosmic evolution are exceptionally weak and difficult to resolve from observations. Moreover, cosmological observations are typically made on the celestial sphere and so the spherical geometry on which observations are acquired must be carefully taken into account. We will develop efficient, robust and principled informatics techniques designed on spherical manifolds. These techniques will be of considerable theoretical and practical interest in their own right. We will also apply them to analyse cosmological observations to better understand cosmic evolution.
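
    As a toy illustration of working natively on the sphere (here using the HEALPix pixelisation via the healpy package, which is an assumption of this example rather than a statement about the project's tools), the sketch below generates a random full-sky map from an assumed angular power spectrum and recovers that spectrum with a spherical harmonic transform.

        # Minimal sketch of spherical analysis: simulate a Gaussian sky from a toy
        # angular power spectrum and measure the spectrum back from the map.
        # Requires the healpy package; the power spectrum is illustrative only.
        import numpy as np
        import healpy as hp

        nside = 64                                   # HEALPix resolution parameter
        ell = np.arange(3 * nside)
        cl_in = 1.0 / (ell + 10.0) ** 2              # toy angular power spectrum

        sky_map = hp.synfast(cl_in, nside)           # random Gaussian sky realisation
        cl_out = hp.anafast(sky_map)                 # measured power spectrum

        print("input vs recovered C_l at l=10:", cl_in[10], cl_out[10])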

  • Use of imaging techniques to study hadronic jets from the ATLAS experiment at CERN

    Student: Alexander Sophio

    Supervisory Team: Dr. Mario Campanelli (Physics), Dr. Arthur Gretton (Gatsby Computational Neuroscience Unit)

  • Characterising exoplanet atmospheres using deep neural networks

    Student: Kai Hou Yip

    Supervisory Team: Prof. Giovanna Tinetti (Physics, ERC Grant: ExoLights, Co-director BSSL, Principal Investigator of the ESA M4 ARIEL space mission), Dr. Ingo Waldmann (Physics, ERC Grant: ExoAI)

    In the last two and a half decades, we have undergone what is best described as a second Copernican revolution. The discovery of extrasolar planets - i.e. planets orbiting other stars - has fundamentally transformed our understanding of planets, solar systems, their formation histories and our place in the grander scheme of the Milky Way. With the avalanche of recent discoveries (over 3500 confirmed and counting), we have begun to expand comparative planetology from the small-sample statistics of our 8 solar-system planets to a galactic understanding of planetary science. As the field matures from its initial discovery stage, we are facing entirely new challenges of large samples, high-dimensional parameter spaces and big data. In order to uniformly characterise large numbers of exoplanets, we require significantly faster and more accurate classification algorithms than current models provide. In this project, we are developing deep learning and machine learning solutions to help characterise the atmospheres of planets, ranging from our solar-system objects to the most extreme hot Jupiters and lava worlds.
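
    As a toy illustration of atmospheric characterisation posed as a regression problem (not this project's actual retrieval framework), the sketch below maps synthetic transmission spectra from an invented forward model to a single atmospheric parameter with a small neural network.

        # Minimal sketch: regress an atmospheric parameter (a log abundance) from toy
        # transmission spectra. The "forward model" below is invented purely so the
        # example runs end to end.
        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(6)
        n_spectra, n_bins = 3000, 50
        wavelength = np.linspace(0.5, 5.0, n_bins)          # microns

        abundance = rng.uniform(-8.0, -2.0, n_spectra)      # log10 mixing ratio
        # Toy forward model: a wavelength-dependent absorption feature plus noise.
        depth = 0.01 + 0.001 * (abundance[:, None] + 8.0) * np.exp(-(wavelength - 1.4) ** 2)
        spectra = depth + 1e-4 * rng.normal(size=(n_spectra, n_bins))

        X_train, X_test, y_train, y_test = train_test_split(spectra, abundance, random_state=0)
        net = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
        net.fit(X_train, y_train)
        print("toy R^2 on held-out spectra:", net.score(X_test, y_test))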