For the last three months, I have been working with researchers from across the physical sciences at the University of Sussex on some software to classify videos of court cases as either deceitful or truthful. Building on work by Yusufu Shehu, we have constructed a neural network (a type of software partly inspired by neurons in the human brain) that can classify these videos with an accuracy of over 80%. Figure 1 shows our network design. For comparison, human ability to spot lying is often below 60%. The software could in theory be trained on other data sets, and we are actively looking for people in commercial areas who might be interested.
One interesting aspect of neural networks is that we don’t tell the network how to work, but rather we train it. This means that the network could be trained on other features. Any combination of video, audio, and text can in principle be classified into any set of categories, provided the data carry the relevant information. Some possible applications in other situations are classifying phone conversations according to emotional tone or the likelihood of a successful sale, classifying music into genres, classifying speakers by gender or age, or identifying specific speakers for security reasons. Figure 2 shows some clips from the court case videos. A court case is a very particular environment, so we are interested in applying the software to other settings.
A paper in 2018 by Krishnamurthy et al. developed a ‘multi-modal neural network’ and trained it on 120 court case videos taken from the Miami University deception detection database (Lloyd et al. 2019). They showed that it was possible to detect deception with an accuracy of 75%, a significant improvement on human performance. We have further developed their model and achieved improved results. In the multi-modal network, video, text transcripts, audio and ‘micro-features’ are treated independently, and then the results are combined to get a final probability. Figure 3 shows how the network is designed.
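The idea of treating each type of input independently and then combining the results can be sketched in a few lines. This is only a toy illustration of that "late fusion" step, not our network or the code of Krishnamurthy et al.: the function names, the per-modality scores and the equal weighting are all assumptions made up for the example.

```python
import math

def sigmoid(x):
    """Map a real-valued score (a logit) to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def fuse_modalities(logits, weights=None):
    """Combine per-modality logits into one probability by a weighted average.

    `logits` maps a modality name to the score produced by that modality's
    own sub-network. Equal weights are assumed unless `weights` is given.
    """
    if weights is None:
        weights = {name: 1.0 for name in logits}
    total_weight = sum(weights[name] for name in logits)
    combined = sum(weights[name] * logits[name] for name in logits) / total_weight
    return sigmoid(combined)

# Hypothetical scores for a single clip (illustrative values, not real results).
example_logits = {"video": 1.2, "audio": 0.4, "text": -0.3, "micro_features": 0.9}
p_deceitful = fuse_modalities(example_logits)
label = "deceitful" if p_deceitful >= 0.5 else "truthful"
print(f"P(deceitful) = {p_deceitful:.2f} -> {label}")
```

Here one modality (the text score) leans towards "truthful" while the other three lean the other way, and the combined probability reflects the balance between them.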
We are interested to have conversations with potential industry partners who might wish to take this forward with us. Please don’t hesitate to get in touch if you think this research could be useful to you. We are interested in applying these networks both to video and also to audio only. We see particular possibility for collaboration with an industrial partner in an area that relies on large volumes of audio data from, for instance, telephone calls.
My work on deblending is concentrating on more expensive and robust methods that could not be applied in the pipeline which must be run essentially every night on incoming data. It was clear that other methods must be developed for each given science case. There will have to be more work on resolved objects for instance. The real challenge is that the objects are not point sources. It is this combination of resolved and confused images that makes deblending such a challenge.
The conference was also a chance to find out more about the project as a whole, including updates on the construction. The telescope is really taking shape in images from the El Peñón peak of Cerro Pachón, and it is extremely exciting to see all the work, by scientists and engineers, going into the project’s success.
“Is it not curious, that so vast a being as the whale should see the world through so small an eye, and hear the thunder through an ear which is smaller than a hare’s? But if his eyes were broad as the lens of Herschel’s great telescope; and his ears capacious as the porches of cathedrals; would that make him any longer of sight, or sharper of hearing? Not at all.- Why then do you try to “enlarge” your mind? Subtilize it.” – Moby Dick.
For the last year I have been working on the Herschel Extragalactic Legacy Project (HELP), an EU-funded project to use far infrared imaging from the Herschel Space Observatory to understand galaxy formation and evolution. We are gearing up for our first data release, DR1, on 1 October, but we are making a lot of the data available now for beta testing.
We are very keen for the astronomical community to start using this huge dataset, comprising 170 million galaxies over 1270 square degrees of extragalactic sky, and indeed to start using and developing the code used to produce it. We have released all the code to perform the reduction on GitHub in the spirit of open science and reproducibility. The data can be accessed as raw data files from the Herschel Database at Marseille (HeDaM) and queried from a dedicated Virtual Observatory server. Although Herschel imaging has been the main focus of the project, we have taken public data from many different instruments spanning all the way from ultraviolet to radio data. Tying together these different data sets is a major challenge and will be required to make the most of the upcoming wide surveys such as those from the Large Synoptic Survey Telescope (optical), the Euclid space telescope (optical) and the Square Kilometre Array (radio).
We are also in the process of setting up mirrors here at Sussex and I plan to blog more about that soon. There is a vast amount of data and we are working on squeezing every last ounce of science out of all the public data from a wide array of different instruments which make up the full multi-wavelength data we have collated.
If you have any questions about how to use this database please leave a comment or email me.
Last week I was in Valencia for a conference on statistical methods in modern cosmology. The week began with a summer school for PhD students and a few postdocs on machine learning, sparsity and Bayesian methods. I was familiar with the Bayesian methods, but sparsity (dealing with data matrices where the majority of elements are zero) was completely new. I am looking forward to implementing some of the machine learning methods, perhaps for the Herschel Extragalactic Legacy Project or for work I am about to do for Public Health England (more about that in a later blog post).
The introductory lecture by Stéphane Mallat (École Normale Supérieure) gave an overview of neural network approaches to scientific problems. One particularly striking example was calculating molecule energies to higher accuracy than Density Functional Theory (DFT) in very short times. My PhD research used DFT heavily and we were always limited by computer resources. The fact that a neural network can learn how to predict ground state energies without including any physics in the model (!) was remarkable to say the least. We are certainly entering a brave new world.
There were, however, some dissenting voices. Neural networks, and machine learning in general, need some work to make results more reliable. Google has started work on TensorFlow Probability, which aims to assign some measure of error to results. These methods also in general require a representative sample. Often we know that our samples are not representative and we aim to model selection biases. I think both of these issues need to be addressed before ‘classical’ methods such as Bayesian inference are consigned to history.
I also presented a poster on ongoing work on deblending. Now that we have a prototype algorithm I need to get on with implementing and testing. It was great to see talks by Peter Melchior (Princeton) and Rachel Mandelbaum (Princeton) which both brought attention to the problem of blending for pretty much all science cases from the Large Synoptic Survey Telescope (LSST) and the space telescope Euclid. Clearly this problem is not going to go away and analysis of galaxy images will be limited by blending issues in the near future.
I would recommend that any PhD students or postdocs attend future summer schools and conferences. It was excellent to see so many researchers from around the world working on problems related to my research. The summer school offered an excellent introduction to modern statistical methods that can be quite simple to implement and may help you with your research.
The Herschel Extragalactic Legacy Project (HELP) is a European research initiative to capitalise on the vast imaging data that was collected by the Herschel space telescope. The figure below shows the 23 fields that comprise HELP overlaid on the Planck map of galactic dust. These are mainly the famous extragalactic fields and come in different sizes and depths.
Last week we had a conference here at Sussex to show the astronomy community the data we are about to release, discuss the methods used to create it and talk about the science results from Herschel and HELP, past, present and future.
I gave a talk on the HELP masterlist, the slides for which are available below.
We have a great deal of work to do to finish running the whole data pipeline for all 23 fields, containing photometry, photometric redshifts, a full analysis of the Herschel fluxes and fitted galaxy spectral energy distributions for all the Herschel objects. It will all be worth it when we start to see the science results come through from this very wide area data release covering around 1300 square degrees.
I spent last week at the Institute of Astronomy in Cambridge discussing how the UK can take advantage of the incredible imaging data that promises to be produced by the Large Synoptic Survey Telescope. The telescope is set to receive first light in 2019 and there is a vast amount of work to do to prepare for the deluge of data that is about to flow out of Chile. One of the challenges is making sure we make best use of UK expertise and work in close collaboration with the majority of LSST scientists in the US.
We were meeting to discuss how best to target UK research to complement work being done elsewhere. There are some definite niches available to us, partly because of access we have to some UK data and partly for the expertise in multiwavelength science that has been built up here.
There were a number of excellent talks about Active Galactic Nuclei (AGN) and galaxy formation based on studies right across the wavelengths (X-rays to radio waves). There were also a number of talks about photometric redshifts, which are of direct relevance to the Herschel Extragalactic Legacy Project (HELP) that we are currently working on in Sussex. Ultimately it seems that building some software within the LSST stack that can handle UK near infrared images may be the best first step in preparing for possible multiwavelength LSST science.
We have around two years to prepare for the first LSST images and it is vital that we work to have software in place ready for it. On a personal note I think developing any code for multiwavelength pixel-based image analysis within the LSST software stack is an opportunity for us early career scientists to build expertise that will make us employable over the lifetime of LSST.
On a completely separate note, being back in Cambridge was a great chance to have a look around the West Cambridge site, which has changed drastically since I was an undergraduate at the Cavendish. I visited the Department of Chemical Engineering and Biotechnology, which was extremely impressive. There has clearly been a massive investment in the various science departments that have been built/extended there. I look forward to seeing how it continues to develop and all the research that will be generated there by what is essentially a load of geeks in a field.
I wasn’t very familiar with the MeerKAT International Giga-Hertz Tiered Extragalactic Exploration (MIGHTEE) survey or even the Karoo Array Telescope (MeerKAT)* which is a precursor to the enormously ambitious Square Kilometre Array (SKA). Gotta Love Physics Acronyms (GLPA). It reminded me what an exciting time to be doing astronomy it is with some huge data sets on the way at unprecedented scales. It was a chance to think about how to tie together the quite disparate data from various wavelength regimes which fed in quite well to the LSST meeting the following week.
A lot of the fields overlap with the LSST deep drilling fields as well as the Herschel extragalactic fields. The four fields are XMM-LSS, COSMOS, ELAIS-S1 and CDFS (names of areas on the sky that have been previously imaged)**. The challenge will be to move beyond the catalogue based cross matching done so far and towards dealing directly with pixel data.
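For anyone unfamiliar with catalogue based cross matching, the basic idea is to pair each object in one catalogue with the nearest object in another, provided the two lie within some angular tolerance. Here is a minimal sketch. The coordinates are made up, the 1 arcsecond matching radius is an arbitrary choice, the function names are my own, and a real pipeline would use spatial indexing rather than this brute-force loop.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

def cross_match(cat_a, cat_b, radius_arcsec=1.0):
    """For each (ra, dec) in cat_a, return the index of the nearest cat_b
    source within radius_arcsec, or None if there is no such source."""
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for ra_a, dec_a in cat_a:
        best_index, best_sep = None, radius_deg
        for j, (ra_b, dec_b) in enumerate(cat_b):
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best_index, best_sep = j, sep
        matches.append(best_index)
    return matches

# Made-up positions in degrees (RA, Dec), purely for illustration.
optical = [(150.1000, 2.2000), (150.2000, 2.3000)]
radio = [(150.2001, 2.3001), (150.5000, 2.9000)]
print(cross_match(optical, radio))
```

The weakness of this approach is exactly the one mentioned above: it only knows about catalogue positions, so it cannot tell when one low resolution radio or far infrared source is really several optical galaxies blended together. That is why working with the pixels directly is the next step.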
I did my masters project on the SKA back in 2006 and it is amazing to see it starting to take shape with actual radio dishes on the ground in South Africa.
Being in Oxford was also a useful opportunity to meet with other members of the Herschel Extragalactic Legacy Project (HELP) to talk about the last stages of the project and how we are going to deliver all the final data. Something we can talk about further at the HELP meeting in Sussex in October.
* I can’t find where the Meer in MeerKAT comes from. I think there are actual meerkat populations near the telescope but this might be a prime example of acronym nesting.
** XMM-LSS: X-ray Multi Mirror telescope Large Scale Structure survey
COSMOS: Cosmological Evolution Survey***
ELAIS-S1: European Large Area ISO Survey, South 1
CDFS: Chandra Deep Field South
A couple of weeks ago I was in Boston for a meeting of the SERVS team and I thought I should get round to blogging about it. The small conference was organised by Anna Sajina at Tufts and was concerned with determining priorities for presenting and analysing data from the Spitzer telescope. I was there because a large part of my work is concerned with building a multiwavelength catalogue for the Herschel Extragalactic Legacy Project (HELP) and we are ingesting a number of Spitzer surveys including SERVS.
SERVS data is a key part of the HELP pipeline because we typically use the Infrared Array Camera (IRAC) fluxes to select objects to define our samples. It was also a chance to hear about all the research being done with these Spitzer fluxes, which cover the mid infrared part of the spectrum.
I have spent the last two days in Hull for the National Astronomy Meeting (NAM). I have seen a number of excellent talks already. In particular, the session on low surface brightness galaxies yesterday was fascinating and had a number of gems in it. David Valls-Gabaud gave a great overview of the field. It was particularly interesting to me because I realised the problem I work on (deblending) will be increasingly important as we move into the era of deeper and deeper surveys. As our telescopes become more sensitive we can observe fainter and fainter objects and the sky becomes more full of things. This means they are more likely to overlap and the problem of determining where light comes from becomes harder and harder. It was good to remind myself of the final aim of my work.
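To put a rough number on that, here is a back-of-the-envelope sketch. It assumes sources are scattered at random on the sky (a Poisson distribution, which ignores galaxy clustering) and counts an object as blended if another object's centre lies within some radius r of it. Under those assumptions, the chance that a given object has at least one neighbour within r is 1 - exp(-π n r²), where n is the number of sources per unit area. The densities and the 2 arcsecond radius below are illustrative values I have picked, not measurements from any survey.

```python
import math

ARCSEC2_PER_ARCMIN2 = 3600.0  # 60 arcsec x 60 arcsec in one square arcminute

def blend_fraction(density_per_arcmin2, radius_arcsec):
    """Probability that a source has at least one neighbour within radius_arcsec,
    assuming sources are Poisson distributed with the given surface density."""
    density_per_arcsec2 = density_per_arcmin2 / ARCSEC2_PER_ARCMIN2
    expected_neighbours = density_per_arcsec2 * math.pi * radius_arcsec ** 2
    return 1.0 - math.exp(-expected_neighbours)

# Deeper surveys detect more sources per unit area (illustrative densities).
for density in (5, 20, 50, 100):
    frac = blend_fraction(density, radius_arcsec=2.0)
    print(f"{density:4d} sources/arcmin^2 -> {100 * frac:5.1f}% have a neighbour within 2 arcsec")
```

Even in this simplified picture the blended fraction climbs steadily with depth, and resolved galaxies (which are much bigger than a point) only make it worse.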
The day finished with a fantastic public talk by Chris Lintott. It was pitched at a perfect level for the public, but I also think a lot of us early career scientists and PhD students found it refreshing. Sometimes, particularly in fields you don’t directly work on, you learn a lot more by starting at the beginning. Perhaps unsurprisingly, some of the questions from 5-10 year olds were very probing and highlighted the vast amount there is still to learn. They have more courage to admit that they don’t know something!