Summer Research Placements
Overview
The Data Intensive Science (DIS) Centre for Doctoral Training (CDT) in the Department of Physics and Astronomy is offering up to five summer research placements to undergraduates interested in pursuing research projects over the summer. The key aim of the placements is to give motivated undergraduates the opportunity to work on a research project in the area of DIS (typically Astrophysics, High Energy Physics or an industry project). Students will gain experience in data intensive science techniques, apply their skillset to a problem in fundamental physics or industry, work in an international research group/environment, and direct and present their work in such an environment. Examples of the kind of projects our PhD students work on can be found here.
- Applicants should have just finished their third year of undergraduate study in the Department of Physics and Astronomy, or be UCL Natural Sciences undergraduates with Physics as their main stream.
- The main supervisor of the project will be a current academic in the Department of Physics and Astronomy, MSSL or one of our industry partners who has previously supervised a project in the DIS CDT (typically covering industry, High Energy Physics and Astronomy projects).
- Students will be provided with a stipend of £2000 during the studentship and are expected to work on the summer project for a total duration of 6–8 weeks, which can be split over the months July–September.
Details
- Applications should include:
- A 200-word (maximum) personal statement highlighting how you will benefit from, and why you are interested in, the programme. Please also list the names of three research projects you are interested in working on, in ranked order. The personal statement should begin with your name, student number, fee status and your preferred work period (the dates on which you could undertake the project).
- A one-page (single side of A4) CV, highlighting relevant education (including yearly weighted average marks and marks from computational/experimental modules), university/industry projects and computational skills.
- The personal statement and CV should be combined into one PDF document, with the filename for the PDF in the format SURNAME_FIRSTNAME_STUDENTNUMBER.pdf.
- The PDF should then be emailed to dis-cdt-summer-research-applications@live.ucl.ac.uk by the deadline.
- The deadline for applications is 10am on Monday 8th June.
- Applications will be judged on academic record, relevant experience/expertise, demonstrated interest and the benefit to the applicant of taking part in this programme. Priority for two of the posts will be given to candidates from underrepresented groups or those from disadvantaged backgrounds; please add any details you consider relevant to this in your personal statement. If successful, you will be approached by a participating academic in your area of interest to discuss potential projects. The project details and work pattern will be finalised on a project-by-project basis during that conversation.
- As London’s Global University, we know diversity fosters creativity and innovation. We want our community to represent the diversity of the world’s academic talent, from local to global. We are committed to equality of opportunity, to being fair and inclusive, and to being a place where we all belong. We therefore particularly encourage applications from candidates currently underrepresented in UCL’s academic, research and teaching workforce. These include people from Black, Asian and ethnic minority backgrounds; disabled people; LGBTQI+ people; and women.
- Please ensure you carefully read and follow these instructions, as applications which do not adhere to them may not be accepted. If you still have any queries, then please send them to dis-cdt-summer-research-applications@live.ucl.ac.uk.
Projects
- COLLIDER/ATLAS - Use of Convolutional Neural Networks to discriminate the electromagnetic showers of electrons and photons from those of neutral pions in the Electromagnetic Calorimeter of ATLAS
Supervisor: Prof Nikos Konstantinidis
This is a computational/software/simulation project. You will use simulated data from the ATLAS experiment at CERN's Large Hadron Collider and will train/optimize a Convolutional Neural Network (CNN) to discriminate the electromagnetic showers of electrons and photons from those of neutral pions. You will then compare the performance of the CNN to the conventional algorithm currently used in ATLAS.
The aim is to use the new algorithm in the trigger of the ATLAS experiment, for the real-time selection of interesting events. If successful, it will bring broad benefits to the entire physics programme of the ATLAS experiment, and UCL will benefit from increased visibility/influence in the ATLAS experiment. You will benefit by gaining significant experience in Machine Learning and in Data Intensive Science, which are hugely attractive skills to have on one's CV.
The project requires good software skills, particularly in Python programming, and is ideal for 3rd year students from the P&A department or the Natural Sciences programme who have taken PHAS0040 and PHAS0056 (or similar courses in Machine Learning).
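As a minimal illustration of the core operation (a hand-rolled sketch with invented toy data, not the ATLAS software), a convolutional layer slides a small kernel across a grid of calorimeter cell energies, and even a single filter's response begins to separate a narrow single shower from the merged double shower of a neutral-pion decay to two photons:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution of a calorimeter-cell energy grid with a small kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 7x7 "calorimeter images": one narrow energy deposit (photon-like)
# vs two nearby maxima (pi0 decaying to two close-by photons).
photon = np.zeros((7, 7)); photon[3, 3] = 10.0
pi0 = np.zeros((7, 7)); pi0[3, 2] = 5.0; pi0[3, 4] = 5.0

# A peak-sensitive kernel responds more strongly to one sharp maximum
# than to two shallower ones; a CNN learns many such kernels from data.
kernel = np.array([[-1.0, 2.0, -1.0]])
print(np.abs(conv2d(photon, kernel)).max())  # 20.0
print(np.abs(conv2d(pi0, kernel)).max())     # 10.0
```

A trained CNN stacks many learned kernels with non-linearities, but the sliding-window arithmetic above is the same building block.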
- QUANTUM COMPUTING/ATLAS - Quantum machine learning to reconstruct tracks at the LHC
Supervisors: Dr Sarah Malik, Dr Tim Scanlon, Dr Gabriel Facini
The reconstruction of the trajectories of charged particles (tracks) at the Large Hadron Collider (LHC) is one of the most time-consuming and challenging algorithmic tasks undertaken by the ATLAS experiment. With the advent of the High Luminosity LHC, the number of charged particles produced in each collision will increase by an order of magnitude. To allow all these tracks to be reconstructed in a reasonable amount of time/CPU, novel and innovative algorithmic approaches will need to be developed.
The project will employ machine learning techniques such as deep learning and graph neural nets on a quantum computer, in addition to devising a quantum algorithm to efficiently reconstruct trajectories of charged particles from detector hits in the ATLAS detector at the LHC. As well as making an important contribution to this research project, this placement will provide the student with experience of cutting-edge machine learning techniques, track reconstruction, quantum computing, working in an international research team (the studies will be conducted as part of a team consisting of two PhD students, one PostDoc and three academics) and experimental high energy physics.
This project would suit a student proficient in Python, who is comfortable both hacking others' code and developing their own, with some experience of machine learning.
- ASTRO/GAIA - Finding ALL the carbon stars!
Supervisor: Prof Jay Farihi
In June 2022, the Gaia satellite will release DR3, which will include over 200 million low-resolution spectra for all stars in the sky brighter than 17th magnitude. With these unprecedented data, and using machine learning, the project will aim to uncover all carbon stars: an ambitious but achievable goal. Carbon stars are easily identifiable by their molecule-rich spectra, and a few thousand are currently known. The combination of a significant data volume, multiple known examples, and the uniformity of these space-based spectra makes the project well suited to ML techniques. Never before has such a sample been available, and it can be used to address many fundamental questions about the Milky Way, including questions about the origin of the oldest stars (which are often carbon-enriched). The student will work with faculty member Jay Farihi and PhD (CDT) student Nik Walters to identify templates or models to be used as a training set, and the best implementation of ML for the search. If time permits, the ML techniques will be compared with conventional approaches to determine efficiency and accuracy.
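As a hint of how a template can drive the search (a toy sketch with invented flux values, not Gaia data), a candidate spectrum can be scored against a carbon-star template with a simple similarity measure before any ML is brought in:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two flux vectors (1 = identical shape)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy low-resolution "spectra": flux in a few wavelength bins.
# The carbon-star template has deep molecular bands (invented values).
carbon_template = [1.0, 0.3, 1.0, 0.2, 1.0]
ordinary_star   = [1.0, 0.9, 1.0, 0.9, 1.0]

candidate = [1.0, 0.35, 0.95, 0.25, 1.0]
sim_carbon = cosine_similarity(candidate, carbon_template)
sim_plain = cosine_similarity(candidate, ordinary_star)
print(sim_carbon > sim_plain)  # True: the candidate matches the carbon template better
```

An ML classifier generalises this idea, learning which spectral features matter instead of relying on a single fixed distance.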
- ASTRO/EXO-PLANETS+COSMOLOGY - Deep Learning for Exoplanets and Cosmology
Supervisor: Dr Alessio Spurio Mancini
The student will train Deep Learning algorithms to accelerate the forward modelling of key observables for exoplanets and cosmology. Specifically, they will investigate the use of fully-differentiable pipelines for the forward modelling of:
- transits for the detection and characterisation of exoplanets and
- power spectra of key cosmological observables such as the Cosmic Microwave Background and weak gravitational lensing.
The student will build Deep Learning-accelerated pipelines for efficient gradient-based inference of model parameters. These will open up new avenues for the scientific exploitation of data from future experiments (such as Ariel and Euclid). The student’s contribution will build upon existing software infrastructure: PyLightCurve-torch (Morvan et al., 2020), a PyTorch-based package for transit modelling, and CosmoPower (Spurio Mancini et al., 2022), a TensorFlow-based package for cosmological power spectrum modelling. Both codes are Python-based, and a basic knowledge of this programming language is required to undertake this project. The student will work under the supervision of Dr Alessio Spurio Mancini (UCL Space & Climate Physics) and in close collaboration with members of the exoplanets group at UCL Physics & Astronomy and the cosmology group at UCL Space & Climate Physics.
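The appeal of a differentiable forward model can be seen in miniature (a toy box-shaped transit with an invented depth, not PyLightCurve-torch itself): once the gradient of the loss with respect to a model parameter is available, the parameter can be recovered by simple gradient descent:

```python
import numpy as np

def transit_flux(t, depth, t0=0.0, dur=0.1):
    """Toy box-shaped transit: flux drops by `depth` while |t - t0| < dur/2."""
    flux = np.ones_like(t)
    flux[np.abs(t - t0) < dur / 2] -= depth
    return flux

# Simulated light curve with a true depth of 0.02 (noise-free, for clarity).
t = np.linspace(-0.5, 0.5, 1000)
observed = transit_flux(t, depth=0.02)

# Gradient descent on the mean-squared error. Here d(loss)/d(depth) is
# analytic, which is exactly what a differentiable pipeline provides
# automatically for far more realistic models.
depth = 0.0
in_transit = np.abs(t) < 0.05  # same window as the model above
for _ in range(200):
    residual = transit_flux(t, depth) - observed
    grad = np.mean(2 * residual * (-in_transit.astype(float)))
    depth -= 5.0 * grad

print(round(depth, 4))  # recovers the true depth, 0.02
```

In the real project the forward models (transits, power spectra) and their gradients live in PyTorch/TensorFlow, enabling gradient-based samplers over many parameters at once.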
- IMPACT/HEALTH - Identifying Predictors of Myocardial Infarction Complications with Interpretable Machine Learning
Supervisor: Dr Nikos Nikolaou
Myocardial Infarction (MI, commonly known as heart attack) is a serious medical emergency in which the supply of blood to the heart is suddenly blocked, usually by a blood clot. Acute MI is associated with high mortality in the first year after it, and its high incidence (especially in urban populations, due to lifestyle factors) makes it a leading cause of death globally. The course of the disease in patients with MI can vary considerably: MI can occur with or without complications, and such complications may or may not worsen the long-term prognosis. Even experienced specialists cannot always foresee the development of these complications. Predicting the complications of MI is therefore an important task, enabling the necessary preventive measures to be carried out in time.
The student undertaking the project will apply machine learning methods for identifying predictors of complications arising from MI on the UCI Myocardial Infarction Complications Database. The choice of machine learning method(s) applied will be decided by the student in coordination with the supervisor (Dr. Nikolaos Nikolaou, Lecturer in UCL’s Centre for Data Intensive Science & Industry). The student will use at least one method for interpreting the model’s predictions and identifying potential predictors of specific MI complications. Again, this will be decided in coordination with the supervisor. The student will get hands-on experience working with electronic health records data (handling issues such as presence of outliers, feature correlations, missing data etc.). They will get experience training and evaluating machine learning models and applying interpretability methods to understand the features driving their predictions.
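One model-agnostic interpretability method the student might meet is permutation importance: shuffle a single feature column and measure the drop in predictive accuracy. A self-contained sketch with an invented toy model and data (not the UCI dataset):

```python
import random

def permutation_importance(model, X, y, feature, n_repeats=10, seed=0):
    """Mean drop in accuracy when one feature column is shuffled -- a simple,
    model-agnostic way to gauge which features drive the predictions."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(model(r) == t for r, t in zip(rows, y)) / len(y)

    base = accuracy(X)
    drops = []
    for _ in range(n_repeats):
        col = [row[feature] for row in X]
        rng.shuffle(col)
        X_perm = [row[:feature] + [v] + row[feature + 1:]
                  for row, v in zip(X, col)]
        drops.append(base - accuracy(X_perm))
    return sum(drops) / n_repeats

# Toy "model" that only looks at feature 0 (think: a single risk factor).
model = lambda row: int(row[0] > 0.5)
X = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]]
y = [1, 0, 1, 0]
print(permutation_importance(model, X, y, feature=0))  # positive: feature 0 matters
print(permutation_importance(model, X, y, feature=1))  # 0.0: feature 1 is ignored
```

The same recipe applies unchanged to any trained classifier, which is what makes it useful for identifying candidate predictors of MI complications.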
The student should have some exposure to statistical analysis and good programming skills in Python. Previous experience with machine learning or interpretability methods is desired but not necessary.
- INDUSTRY/ASOS.COM - Graph Neural Networks for customer return prediction
Supervisor: Dr. Fabon Dzogang
At ASOS.com we offer a unique digital experience for 20-something fashion lovers; our 26.4 million active customers visited the website and the app 3 billion times in the last year.
We are constantly innovating and improving the experience for our customers by looking at new applications of cutting-edge AI. Helping our millions of customers find the right sizes and fit for their needs is one of our most impactful challenges here at ASOS.com.
The project will explore Graph Representation Learning for embedding product and customer information in a massive graph, relying on the Transformer architecture to infer meaningful representations for the two types of nodes in the context of predicting, at purchase time, the probability that an item will be returned. The first phase of the project will consist of building the graph from product descriptions, sizing, material and fabric details, along with customer information on fit preferences and transactional data on returns (feedback from our customers, return activity, sentiment and ratings).
The goal of the project is to explore recent advances in Deep Learning on graphs, at scale in a commercial environment, to help us solve one of our most impactful challenges. The successful candidate will investigate heterogeneous Graph Neural Networks in the Transformer family, such as Graphormer (microsoft/Graphormer on GitHub), deployed at large scale in a commercial environment using Python. The results will be benchmarked against baseline models such as Node2Vec (PyG/PyTorch implementation).
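Node2Vec-style baselines start from random walks over the graph, which are then fed to a word2vec-style embedder. A minimal sketch of the walk-generation step over an invented toy customer-product graph (real node2vec additionally biases each step with return/in-out parameters p and q):

```python
import random
from collections import defaultdict

# Toy bipartite customer-product graph (invented data, not ASOS's).
edges = [("cust1", "dress_A"), ("cust1", "jeans_B"),
         ("cust2", "dress_A"), ("cust2", "shoes_C"),
         ("cust3", "jeans_B"), ("cust3", "shoes_C")]
graph = defaultdict(list)
for u, v in edges:
    graph[u].append(v)
    graph[v].append(u)

def random_walk(start, length, seed=0):
    """Uniform random walk over the adjacency lists. In a bipartite graph
    the walk alternates customer and product nodes, so co-occurring nodes
    in a walk end up close in the learned embedding space."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

walks = [random_walk(node, 5, seed=i) for i, node in enumerate(graph)]
print(len(walks), len(walks[0]))  # one length-5 walk per node: 6 5
```

A Graphormer-style model replaces the walk/embed pipeline with attention over the graph structure itself, but this baseline is the usual benchmark starting point.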
- INDUSTRY/AUTONOMY - Analysing networks within the climate conversation
Supervisor: Dr. Will Stronge
The project will involve studying networks of influential voices within the UK's political conversation around Net Zero Carbon policies, in collaboration with researchers within Autonomy’s dedicated Data Unit. Drawing on a variety of publicly available data sets, from Hansard to Eventbrite, and from Twitter to news media articles, the team intends to develop data collection tools and applications in order to map public debates, identify perceptions, analyse sentiment and understand public engagement and response to climate issues.
The successful candidate will work on aspects of the project including collecting data, exploring and visualising the multiple network structures that underlie the data, using graph algorithms to map networks of politicians, media voices, industries, institutions and campaigners, and understanding the core features of such networks and how they relate to the evolution of the climate conversation in the UK. The aim of the project is to build a social listening tool that will provide up-to-date information on the state of the climate conversation in the UK, aiding climate movements and researchers in formulating messaging and policy.
The student will be helped every step of the way, will learn new skills and will have the chance to input creatively within the project.
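As a small taste of the graph analysis involved (invented toy data, not the project's datasets), even a simple in-degree count over a mention network begins to surface the most-cited voices:

```python
from collections import Counter

# Toy mention network: (source, target) edges between accounts in the
# climate conversation (invented handles).
mentions = [("mp_a", "paper_x"), ("mp_b", "paper_x"), ("ngo_c", "paper_x"),
            ("mp_a", "ngo_c"), ("mp_b", "ngo_c")]

# In-degree centrality: who is mentioned most often?
in_degree = Counter(target for _, target in mentions)
print(in_degree.most_common(2))  # [('paper_x', 3), ('ngo_c', 2)]
```

Richer graph algorithms (PageRank, community detection) refine the same idea: influence is read off the structure of who links to whom.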
- INDUSTRY/THE LONDON DATA COMPANY - Decoding the meaning/context and sentiment of "slang" or "code" in text conversations
Supervisor: Dr Jasmine Grimsley
When people communicate via text using mobile phones or social media, they often use non-English words, either as slang or as a deliberate code to mask meaning. This project aims to facilitate forensic investigations by identifying a dictionary of commonly used non-words, or words used out of their expected context. The anticipated dictionary will include the predicted sentiment of each word, the users of the term in this context, and a predicted context/meaning of the word given by providing synonyms. This project will utilise Natural Language Processing, cluster analysis, and potentially neural networks. Expertise in Python is needed.
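One plausible first step (a stdlib-only sketch with an invented mini-vocabulary, not the project's actual method) is to flag recurring tokens that fall outside a reference vocabulary as candidate slang:

```python
from collections import Counter

# A tiny reference vocabulary stands in for a real English word list;
# in practice you would load a full dictionary or frequency corpus.
known_words = {"meet", "me", "at", "the", "usual", "spot",
               "bring", "stuff", "tonight"}

def find_candidate_slang(messages, min_count=2):
    """Flag tokens that recur across messages but are absent from the
    reference vocabulary -- candidates for the slang/code dictionary."""
    counts = Counter()
    for msg in messages:
        for token in msg.lower().split():
            if token.isalpha() and token not in known_words:
                counts[token] += 1
    return {word: c for word, c in counts.items() if c >= min_count}

msgs = ["meet me at the usual spot", "bring the snizz tonight",
        "snizz at the spot", "the usual stuff"]
print(find_candidate_slang(msgs))  # {'snizz': 2}
```

From there, clustering the contexts in which each flagged token appears is what lets sentiment and likely synonyms be attached to it.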
- INDUSTRY/UKAEA - Enrichment of the metadata schema/ontology, and assembly of a machine learning time-series database for emulating the MAST-Upgrade tokamak
Supervisor: Nathan Cummings
The MAST-Upgrade is exploring the route to compact fusion power plants, testing reactor technology and addressing physics issues for the international ITER fusion project. It keeps the UK at the forefront of global research into fusion energy. The MAST-Upgrade tokamak is based on the original MAST (Mega Amp Spherical Tokamak) machine, which ran from 2000 to 2013. It has been rebuilt to enable higher performance – longer pulses, increased heating power and a stronger magnetic field – and an innovative new plasma exhaust system.
Arguably, the most important part of the AI/ML workflow is “data wrangling”: preparing cleansed data in a format suitable for efficient Deep Learning. As we set out to exploit the world’s largest supercomputers at the exascale, an important question is how complex a surrogate model (or emulator) can be built around a given physical system. In other words, how much input data and how much output data can be handled, and how complex or non-linear can the system response itself be (whether for experimental data from the real world or simulation data from models that take weeks to run on the world’s largest supercomputers)? This project involves the development of a metadata schema and formatting of the MAST-U tokamak experimental database, together with the development and application of algorithms to tag interesting events (e.g., Magneto-Hydrodynamic (MHD) modes). The question we would like to address at this stage is: “Is it numerically tractable to construct an AI emulator (of adequate predictive efficacy) around the entire MAST database, which contains >30,000 plasma pulses, each made up of many hundreds of different diagnostic measurements including fast CCD movies of the plasma?”
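As a toy illustration of event tagging (an invented signal, far simpler than MHD-mode detection on real diagnostics), deviations from a rolling local mean can flag bursts in a time series:

```python
import math

def tag_events(signal, window=20, threshold=4.0):
    """Flag sample indices where the signal deviates from the local mean of
    the preceding `window` samples by more than `threshold` local standard
    deviations -- a crude stand-in for tagging bursts in a diagnostic trace."""
    events = []
    for i in range(window, len(signal)):
        chunk = signal[i - window:i]
        mean = sum(chunk) / window
        var = sum((x - mean) ** 2 for x in chunk) / window
        std = math.sqrt(var) or 1e-12  # guard against a perfectly flat window
        if abs(signal[i] - mean) > threshold * std:
            events.append(i)
    return events

# A smooth sinusoidal trace with one injected spike at index 150.
trace = [math.sin(0.1 * i) for i in range(300)]
trace[150] += 10.0
print(tag_events(trace))  # only the spike at index 150 is tagged
```

Tagged indices like these become metadata attached to each pulse, which is exactly the kind of enrichment the schema work above needs to accommodate.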