Computers are learning our language

Between September 19th and 21st, Lisbon welcomed researchers from all over the world who are pushing the boundaries of how we can use computers to process text and speech.
Keep reading to learn about some of the latest and most exciting applications.

The topics covered at the Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), organised by the Association for Computational Linguistics, included automated translation, the extraction of information from documents and of sentiment from social media content, and the generation of text and speech in natural language.

Among the more cryptic talks about techniques for handling plain-text data, many others pointed to interesting applications of this field of research. For example, language technologies can have an impact on political science, and on politics at large, said Justin Grimmer of Stanford University, one of the invited speakers.

Politicians and their communication offices produce written material at a rate that is impossible for humans to handle. In only two years, the United States Senate issued 65,000 press releases.

Language technologies can turn all that text into quantifiable data, Grimmer said. Analysts can then extract patterns and trends and answer questions about the political landscape across several years.
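To give a rough idea of what turning text into quantifiable data can mean, here is a minimal sketch in Python that counts word frequencies in a small set of documents. It is an illustration only and not the method presented by Grimmer; the sample texts and the function names `tokenize` and `term_counts` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample documents; these are not real Senate press releases.
press_releases = [
    "Senator announces new funding for local roads and bridges.",
    "Senator criticises budget proposal, calls for new funding plan.",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def term_counts(documents):
    """Return one Counter of word frequencies per document."""
    return [Counter(tokenize(doc)) for doc in documents]

counts = term_counts(press_releases)

# Summing the per-document counters gives corpus-wide frequencies.
total = sum(counts, Counter())
print(total.most_common(3))
```

Once every document is reduced to counts like these, standard statistical tools can be used to compare documents, authors or time periods.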

Detecting sentiments in social media and finance

Politics also featured in a few other talks. Nathanael Chambers and his colleagues at the United States Naval Academy wanted to know whether the population of a nation state agrees with its government’s formal relations with other countries.

This is a highly relevant question for international policy makers, and the researchers looked for an answer by analysing over two billion posts, or tweets, from the popular Twitter website. Specifically, they searched for expressions that conveyed opinions of one nation towards another, and they did so for hundreds of country pairs. Their findings can be explored in an interactive map.

The EMNLP 2015 conference venue in Lisbon, Culturgest (main entrance).

According to the authors of this study, an obvious benefit of this automated approach would be the continuous tracking of political sentiment in public opinion.

However, they acknowledge that the study misses what people say on the streets, which may deviate from the patterns found in social media discussions. In addition, the study used only tweets written in English, and future work could address other languages.

Another interesting application of language technologies was tested by Clemens Nopp and Allan Hanbury of the Vienna University of Technology, Austria. These researchers recommend using textual data published by banks as a complementary way to assess financial risks in the European banking system.

In their automated analysis of forward-looking reports from European banks, the authors of the study found that risk sentiments expressed in the domain-specific language of these reports “reflect major economic events within the last decade,” including the financial crisis of 2007-08, even before they became widely noticed.

Nopp and Hanbury used their findings to support the claim that the automated processing of textual data produced by banks can help predict financial developments in the short term.
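As a rough illustration of how sentiment in reports could be tracked over time, the sketch below scores texts against small word lists and averages the scores per year. This is a simple word-list approach written for this article and may differ from the authors' actual method; the word lists, the sample reports and the function names `score` and `yearly_scores` are all hypothetical.

```python
import re
from collections import defaultdict

# Tiny illustrative word lists (assumptions, not the lexicon used in the study).
NEGATIVE = {"loss", "risk", "decline", "crisis"}
UNCERTAIN = {"may", "uncertain", "possible", "volatile"}

# Hypothetical (year, text) pairs standing in for bank reports.
reports = [
    (2006, "Growth was strong and profits increased."),
    (2008, "The crisis may cause a decline; markets remain volatile and uncertain."),
]

def score(text, lexicon):
    """Share of word tokens in the text that appear in the lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in lexicon for token in tokens) / len(tokens)

def yearly_scores(reports):
    """Average (negativity, uncertainty) score per year."""
    by_year = defaultdict(list)
    for year, text in reports:
        by_year[year].append((score(text, NEGATIVE), score(text, UNCERTAIN)))
    return {
        year: tuple(sum(values) / len(values) for values in zip(*scores))
        for year, scores in sorted(by_year.items())
    }

for year, (negativity, uncertainty) in yearly_scores(reports).items():
    print(year, round(negativity, 2), round(uncertainty, 2))
```

Plotting such yearly averages would give curves of the same general kind as those in the figure from the paper, although real analyses rely on much larger lexicons and document collections.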

Evolution of positivity, negativity, and uncertainty in CEO letters over time.
Source: Nopp and Hanbury, 2015, “Detecting Risks in the Banking System by Sentiment Analysis”

Beyond language and into psychology

Some of the talks reported interdisciplinary studies showcasing the use of computers to analyse language in adjacent research fields. In an unusual experiment, nearly three hundred fictional characters from classic and contemporary popular literature had their personality profiled.

This study, conducted by Lucie Flekova of the Darmstadt University of Technology, Germany, and Iryna Gurevych of the German Institute for Educational Research, has interesting applications in literary studies and in commercial recommendation systems matching readers’ preferences to fiction books.

The authors have set up an online tool where anyone can classify their favourite fictional character according to personality traits. Alongside this growing dataset, Flekova and Gurevych analysed the direct speech, actions and attributes of those same characters in the text of the books they are taken from.

Best paper awards from the EMNLP 2015 conference

The researchers then compared the personality labels given by online users with the verbs, adjectives and adverbs used in the original text to refer to and describe each character, as well as with the most frequent words in the character's direct speech.

For now, the authors limited their research to the extraverted-introverted personality attribute. One of their conclusions was that extraverted characters may not always express positive emotions. It will also be interesting to know whether fictional characters are more stereotyped, or less nuanced, than real humans, according to psychological studies of personality.

“See what I mean?”

Pairing language with vision, Andrei Barbu of MIT and his colleagues developed a way to resolve ambiguities in language by means of short video clips representing the possible interpretations of an ambiguous sentence. The authors justify their approach with the fact that children also learn language by inferring the correct meaning from what they can perceive in the visual context.

Using techniques from computer vision and language processing, Barbu and his colleagues developed an algorithm that tracks objects and events in the video scene and maps those to nouns, verbs and predicates in the sentence so that the video tracking system knows what to look for.

A possible application of their work, according to the authors of the study, would be to query image or video databases using natural language instead of keywords. Could we also use it to help robots with vision to understand instructions in natural language rather than code?

Other presentations at the conference addressed topics like domestic abuse expressed through social media, translation systems customised to readers’ preferences, and guidelines for the simplification of texts targeted at children or people with cognitive disabilities.

Name your area of interest and there might be computers doing great stuff with language. We write and talk more and more, and machines are reading us and listening to us, and helping to make sense of all that humming.
