April 25, 2014 - 06:55

Randomness is to a computer programmer what a kitchen knife is to a chef and what a screwdriver is to a carpenter: an indispensable tool that is used constantly for all kinds of tasks.

Random functions are required for our Google searches, our smartphones and our computer programs to work – and the more random they are, the better.

”I have found the most random function that is currently realistic to implement,” says Professor Mikkel Thorup, of the Department of Computer Science at the University of Copenhagen.

Professor Rasmus Pagh, who specialises in algorithm research at the IT University in Copenhagen, was not involved in the new discovery, but he thinks Thorup’s function – a so-called hash function – is ground-breaking:

“This is the world’s best function in many cases. Just like there is no screwdriver that fits all screws, we cannot say that this function will be the best in all cases. But it will be the best for a great number of screws,” he says.

“Hash functions are used in just about all software. This is one of the first things that programmers learn when trying to create effective software. These functions have a very wide variety of applications.”

In 2014, Thorup returned to Denmark after many years of research in the US, and the world’s most random function is the first result of his research project, which started last October.

Random hash functions are, for instance, used to find information quickly. Just like we place items in various rooms in our house – books in the living-room bookcase, toothbrushes in the bathroom, etc. – the functions classify data to be stored in the computer’s memory in a way that makes it easy to retrieve.

“There isn’t a lot you can do with a computer without using randomness,” says Thorup.

If all the data was stored in the same place, it would take a long time to retrieve it, just like it would take a while to find the book you borrowed from Aunt Edna if you had stacked all your other belongings on top of it and stuffed it all into the attic. The advantage of using hash functions is that data is distributed evenly across the ‘rooms’ when they are placed randomly.

However, the functions cannot be entirely random, says Thorup, because then it would not be possible to retrieve the data. You cannot just roll a die and then the result is that the sofa ends up in the bathroom. The trick is to let the hash function perform the same calculation when it places items and when it retrieves them. The idea is to let the calculation include an element of randomness that is remembered for later recalculations.

By looking at the result of the calculation, the computer knows exactly which room to check, and since the random function has distributed the items evenly, each room has a manageable amount of items, making it easy to retrieve a particular item.

“Saving data is the one application of hash functions that is easiest to grasp. But what really makes hash functions fun is all the magic you can perform with them,” says Thorup, adding that the functions are also essential for calculating statistics, for encryption to work, for enabling programs to learn from data – also known as machine learning – and for calculations on Big Data.

“I really wish I had a hash function in my head, because the amount of time I spend on searching things is totally unacceptable. The place I have picked for an item is not the result of a mathematical function, so if I do not remember, I cannot just recalculate the place. Instead, I have to guess where I placed it.”

There isn’t a lot you can do with a computer without using randomness.

The challenge, then, is to create a function that can place items in the same place, while at the same time distributing them at random. This is why we are talking about functions that can be more, or less, random. The function that Thorup has created gives the greatest certainty for a random result.

The professor illustrates the problem of less random functions with a reference to volleyball: if you program a computer to divide a group of players into two teams, it could be done using a hash function which places all the players born on an even date on one team, and those born on an odd date on the other.

As long as you feed the function with birth dates, the players will be distributed quite evenly across the two teams – and then you can easily find out which team a particular player is on simply by looking at the birth date.

“The problem arises if, instead of using birth dates, you use personal identification numbers for the function, which distinguishes between odd and even. Then suddenly you have split the teams into boys against girls, and then it’s not so random anymore,” he says. [In Scandinavia, boys have odd personal identification numbers; girls have even numbers.]

--------------------

Country
Denmark