
Scientists prepare future language technology

If we want to communicate through phones and computers in the future, they need to learn to understand what we say. New research aims to teach machines to distinguish between polysemantic words – now also in Danish.

Anders Boas

Published 28 March 2014 - 06:12

Getting other people to understand what you’re saying can sometimes be hard enough. But when you’re communicating through a machine such as a phone or a computer, the risk of misunderstandings increases massively.

This is because our languages are filled with polysemantic words – i.e. words with many different meanings – and although the machines are quickly becoming better at analysing language, they still have a hard time understanding exactly what the words mean.

However, a new Danish research project has now set out to make life easier for these machines.

The researchers will attempt to lay the foundation of the language technology of the future.

There is currently no comparable data set for Danish, which means that the Danish researchers will be starting from scratch. (Photo: Shutterstock)

With this new technology, phones and computers can learn to decipher the exact meaning of what people say.

“A lot of technology works better if it not only has to recognise words, but also has to understand, to some extent, what they mean,” says the head of the 'Semantic processing across domains' project, Professor Bolette Sandford Pedersen, of the Centre for Language Technology at the University of Copenhagen, Denmark.

“Many people are familiar with the Siri program, which allows users to speak to their smartphone and ask it questions. This type of program relies on the ability to understand what the words mean in a specific context and to identify which parts of a sentence do what – this is also known as the semantic function.”

Imagine that your car is damaged and you want to know how you can get it to drive again. If you ask a program like Siri where you can repair the damage, the program would work best if it knows, for instance, that the word ‘damage’ refers to the car and that it is the car, not you, that needs to be repaired.

Language technology can already interpret the meaning of words and sentences to a certain extent, but the existing technologies work primarily in English. One of the objectives of the new project is therefore also to develop a data set with Danish texts.

The hope is that these new data sets will make it possible for Danish to ‘catch up’ with English in this context.

“When you develop these language technologies, you feed a computer with a lot of texts, in which all the words have been manually marked up with information about their individual function and meaning in the text,” says Pedersen’s colleague at the Centre for Language Technology, Associate Professor Anders Søgaard, who is also part of the new project.

“On this basis, it is then possible to infer some models that can automatically analyse new sentences that the machine has not seen before.”
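For readers who want to see the idea Søgaard describes in miniature, the sketch below trains on a handful of hand-labelled sentences and then guesses the meaning of a word in sentences it has not seen. It is only an illustration, not the project's method: the sentences, the sense labels "finance" and "river", and the function names `train` and `predict` are all invented for this example, and the technique used is a simple naive Bayes classifier with add-one smoothing. English is used purely for readability.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (invented for this sketch): each sentence is
# manually labelled with the sense in which the word "bank" is used.
ANNOTATED = [
    ("she deposited money at the bank", "finance"),
    ("the bank approved the loan", "finance"),
    ("he opened an account at the bank", "finance"),
    ("they walked along the river bank", "river"),
    ("the boat drifted toward the bank", "river"),
    ("reeds grow on the bank of the stream", "river"),
]


def train(examples):
    """Count how often each word occurs with each annotated sense."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sentence, sense in examples:
        sense_counts[sense] += 1
        for word in sentence.split():
            word_counts[sense][word] += 1
            vocab.add(word)
    return sense_counts, word_counts, vocab


def predict(model, sentence):
    """Return the most probable sense for a new, unlabelled sentence."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        # Log prior: how common this sense is in the annotated data.
        score = math.log(count / total)
        # Add-one smoothing keeps unseen words from zeroing out a sense.
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in sentence.split():
            score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


model = train(ANNOTATED)
print(predict(model, "the bank gave her a loan"))
print(predict(model, "the boat reached the bank of the river"))
```

With only six training sentences the model is easily fooled, which is why a large, carefully annotated collection of texts matters so much in practice.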

-------------------

Read the Danish version of this article at videnskab.dk