Eurovision Song Contest to be part of Denmark's cultural heritage
Danish researchers are harvesting the Internet for all information about this year's Eurovision Song Contest. The goal is to secure sources for future historical research, but the data may also help the researchers predict the winner several months in advance.
If you want to make a killing at the bookmakers when the Eurovision Song Contest (ESC) goes live, you should probably pay attention to the research currently being undertaken by The Royal Library in Copenhagen and The State and University Library in Aarhus.
Researchers and curators are collecting everything, literally everything, that is uploaded to the Internet about ESC.
That means every message on Twitter, open Facebook accounts and other social media, every YouTube video where fans have uploaded the soundtrack of a song, fan pages, bookmakers' odds, blogs, and the news items that form part of the world's sea of information.
Armenian videos become part of Denmark's cultural heritage
All content is stored on Netarkivet.dk, an online archive of Denmark's digital cultural heritage. And yes, Denmark's cultural heritage also includes content produced elsewhere in the world, for instance when an Armenian makes a mash-up video on the theme of The Vampire Diaries using the soundtrack of Armenia's ESC entry. This year's contest is being held in Denmark, with the final on 10 May 2014.
Such contributions are regarded as part of Denmark's cultural heritage in the current project, says Henrik Smith Sivertsen, research librarian at The Royal Library, who is responsible for the project.
"We've struggled somewhat with defining what qualifies as Danish," he says.
History is never better than the available sources
"ESC is the largest international event taking place in Denmark, and if we don't collect everything we can and leave out parts of the Internet, we will disappoint the researchers who want to analyse the event 50 years from now," says Sivertsen. "So we are including as much as possible."
As a cultural institution, The Royal Library is obliged to ensure that source material is available so that researchers can study what happened at a given point in time, says Sivertsen. History is never better than the available sources, and if events are not properly recorded, only contemporaries can write about what happened, he says. "And we know from history that we don't always recognise what is important while it is happening."
Can we predict the winner long before the final?
Surprisingly, all the data on ESC might also allow the researchers to pick the winner several months before the final, says Sivertsen.
"I'd like to compare the odds with YouTube likes, the discussion of the individual entries on social media, and discussions on the Internet in general, to see whether they predict the winner of the final, as we had a feeling they did ahead of the 2013 final, which Denmark won," says Sivertsen.
This year, he says, all indications were that Aram Sargsyan from Armenia would win. But then he made some unfortunate comments about the Austrian entrant Conchita Wurst's life as a drag queen (he called it 'unnatural'), which led to a storm of criticism aimed at the Armenian entrant.
"Instead of simply cruising to the final, Aram has other things to deal with, so the situation has been muddied," says Sivertsen. "What looked like a self-reinforcing system on the Internet has been more or less smashed. But he's still the favourite and the money is on him, so it'll be interesting to see whether his comments actually have any influence."
Robots and researchers harvest information
Every day, Sivertsen and the Netarkivet.dk staff monitor the harvesting of information from the Internet, which is automated by computer programs. This is a demanding task for several reasons:
- The programs do not recognise everything on the Internet, so independent searches and research are needed to piece everything together. Even the information the programs do harvest is not always usable, so the researchers must continually weed out material.
- Users of social media, especially Twitter, often delete earlier messages. It is therefore important to harvest new messages as soon as they are posted: they may disappear later, and with them an accurate picture of life on the Internet in the run-up to and during ESC. "If we don't harvest something within an hour, we'll miss a large amount of information. This is very exciting, but it's also a giant challenge," says Sivertsen.
- According to Sivertsen, the researchers also face technological challenges. The harvesting programs were written before it became possible to embed YouTube videos, quotations and comments in web pages, which makes it very difficult to archive websites whose content is embedded from many different sources.
"In a way, the project is also a test of what can be harvested by computer programs and how much manual work is needed," says Sivertsen. "For example, we hope that having a researcher continuously select what should be stored makes a difference, but we don't know."
Read the Danish version of this article at Videnskab.dk