Andrea DeMarco writes about the best decision of his life
In 2008, I ventured into the area of speech technology during the final year of my B.Sc. I.T. (Hons)(Melit.). I did not know what I was getting into. Throughout my four-year course, I had studied Artificial Intelligence. However, I ventured into speech technology (as well as human language technology) because there seemed to be renewed interest in the field, and speech and language technology was relying increasingly on statistical modelling techniques. The particular area I studied was speaker identification, which means identifying a person from a voice sample.
A little over a year later I enrolled in a Master of Science by research in Computer Science at the University of Malta, on the same topic. The reasons were twofold. Firstly, I liked the idea of research but was unsure about the long-term commitment required for a Ph.D. Secondly, my undergraduate research project had spawned many ideas that I had no time to implement, and the M.Sc. gave me time to do so. I was inspired by cognitive scientists' ideas about how humans might process language and phonetics. I then loosely applied these ideas to algorithmic equivalents for speaker identification.
In this project I discovered salient fractions of phrases that are important for algorithms to identify speakers. This cut down the amount of data required for proper identification, making it more efficient. A few months before completing my M.Sc. I started contacting a number of research labs in the UK. By that time I had realised that I loved solving research problems. That was when I felt ready for a longer-term commitment to research, and started a Ph.D. at the University of East Anglia, in the field of native accent and speaker identification.
During my Ph.D. I developed a state-of-the-art classifier for accent identification from speech. Unlike most accent identification systems, the classifier does not require any speech transcription. I collaborated with researchers from the University of Birmingham to adapt baseline speech recognition to work better for regional accents. I am now exploring the combination of accent identification with speaker identification systems.
I am currently in the final year of my Ph.D. studies at the University of East Anglia, and am employed as a senior researcher working on algorithms that can identify emotions. We are developing a mobile app that tracks your mood and keeps a diary of it using your voice. This project is funded by a Technology Strategy Board grant. Taking the leap from artificial intelligence into speech technology might have been the best decision of my life.