The European: Mr. Wolfram, what do you think of the term big data? Has it become a mere buzzword or does it imply a promise?
Wolfram: Let’s look at the term itself. It implies – rightfully – that there is a lot of data in the world. The question is: Can you go from big data to big knowledge?
The European: Can you?
Wolfram: Big data can be a precondition for it. The challenge is to make it computable. In the history of systematic data, the two biggest creators of data were Babylon and the US government.
The European: Only one of them is still around. What lessons can be learned?
Wolfram: It shows that civilization creates a flow of data: Lots of textual data has been stored and saved for ages. Today the amount of data on the web might be stabilizing. But now there is automatic data collection and lots of sensors that can measure oodles of data.
The European: What does that mean?
Wolfram: There is a lot of knowledge in the world, but it’s definitely finite. The amount of new knowledge being produced is comparatively small. The rate of additional sensor data is pretty high, but once you have a framework for dealing with it, it becomes a homogeneous thing.
The European: Are you excited about this huge amount of data or does it frighten you?
Wolfram: Oh, data is what I like! I have been collecting data about myself for many years. I’m a bit embarrassed to say this, but I have probably collected more data about myself than anybody else. Data is my world. To me it is exciting to see more of it.
The European: What did you find out about yourself?
Wolfram: About a year ago, I sat down and tried to analyze some of the data. I discovered all kinds of terrible things about myself.
The European: For example?
Wolfram: Mainly that I am much more habitual than I could have possibly imagined. But it was fun to see when I first had a particular idea. I found that out by looking through my notes and checking when a specific word first showed up. That is why I scanned all of my paper documents – about a quarter million pages.
The European: Excuse me?
Wolfram: I generate lots of stuff. Luckily, I later switched from paper to electronic documents.
The European: When was this?
The European: Strange coincidence…
Wolfram: I suppose so.
The European: Your search engine Wolfram Alpha runs on big data – could it help us make similar discoveries about ourselves?
Wolfram: It depends. Over the last month we have collected a lot of data via Facebook, and it seemed to confirm many stereotypes: people get more interested in politics and the weather as they get older; video games are mostly interesting to men; that kind of information.
The European: Many companies use such information to predict how a person will act in the future. The American retail chain Target famously predicts pregnancies based on consumer buying behavior – and sometimes knows about them even before the customer does. Has data collection become too excessive?
Wolfram: I think it is inevitable. It’s just going to happen. Let me give you a personal example: you can find tons of data about me, but until recently you could not find the names of my kids. I know enough about data that I decided not to publish their names on the web, so they weren’t out there. But now they have their own Internet presence and can be found quite easily. When it comes to data, it is all about what we are willing to share.
The European: But can we really trust companies with our personal data?
Wolfram: It’s nice if people trust companies that collect data and hope they don’t do something terrible with it. Trust has been good for the Wolfram Alpha project. People do trust us, probably because we are not as commercial as other companies. And we are not going to use the personal data that we have. We have never done so and we do not intend to do so in the future. But maybe that is a bad idea.
The European: Why?
Wolfram: We could do a lot of good things with it. In Wolfram Alpha we could do a really good people search. We have a lot of data about people and could show a very complete picture of them. But it is not clear that this has any positive value. On the other hand, it is clear that it has some negative value. If someone else did it and did it badly, you could argue that we should do it better first.
The European: But?
Wolfram: So far, we haven’t wanted to link up every person. We could link their real estate holdings, their appearances in different places, and much more. It would be a complete report on the person. That might make life easier, but we think it’s not desirable to have it out in the world.
The European: One of the promises of big data is that understanding it would help us make a lot of money. How would you measure the value of data?
Wolfram: I’m not a big expert on the money aspect. Luckily, my company is successful without concentrating on that. I am more interested in intellectual questions about data. We have a lot of data, and one of the things I find most interesting is cross-checking lots of it against each other.
The European: For example?
Wolfram: We are probably the biggest reporter of bugs in data sets. We look at whether the data is plausible or not. Did the wind speed at a measuring station actually go to zero, or did a bird just sit on it? If you cross-check it with other data, you can find out.
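To make the wind-speed example concrete, here is a minimal sketch of such a cross-check, assuming hourly readings keyed by timestamp and a simple rule: a station reporting exactly zero while the median of its neighbours exceeds a threshold is flagged as suspect. The function name, the data layout and the 5 m/s threshold are hypothetical illustrations, not how Wolfram Alpha actually validates data.

```python
from statistics import median

def flag_suspect_zero_readings(station, neighbours, threshold=5.0):
    """Return timestamps where `station` reports zero wind speed while the
    median reading of neighbouring stations exceeds `threshold` (m/s).
    Hypothetical rule for illustration only."""
    suspects = []
    for t, speed in station.items():
        # Collect neighbour readings that exist for the same timestamp.
        others = [n[t] for n in neighbours if t in n]
        if speed == 0 and others and median(others) > threshold:
            suspects.append(t)
    return suspects

# Made-up example readings in m/s.
station = {"12:00": 6.1, "13:00": 0.0, "14:00": 0.0}
north = {"12:00": 5.8, "13:00": 7.2, "14:00": 0.4}
south = {"12:00": 6.5, "13:00": 6.9, "14:00": 0.3}

print(flag_suspect_zero_readings(station, [north, south]))  # ['13:00']
```

The 14:00 zero is not flagged because the neighbours were also nearly calm then; only the 13:00 zero contradicts the surrounding data, which is the "bird on the sensor" case.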
The European: What has been the most surprising observation so far?
Wolfram: That the world is far more computable than we think. There is more data on a lot of things than I’ve ever imagined.
The European: Was there a correlation between data that you would never have expected?
Wolfram: Baby names! There is this huge correlation between baby names and happenings – especially in pop culture. What’s your first name?
The European: Lars.
Wolfram: Wolfram Alpha works better with American names than European ones, but let me check (types in “Lars” at Wolfram Alpha). It turns out that there aren’t many Larses in the US. Most of them were born in the sixties and seventies. It estimates only about 2,000 living Larses in the United States. But what you see here is typical. There must be a reason why that name got so popular in the sixties. For example, there might have been a famous person with that name at the time.
The European: People fall for the supposed fame and glamour that such a name promises.
Wolfram: Exactly. Let’s try out a very outdated name (types in “Adolf”). See: The name was only popular in the 1920s and 1930s. The most common age for Adolfs living in the US is 82 years. That was to be expected.
The European: What else did you discover by looking at data?
Wolfram: A bizarre fact about correlated data is that much of it tends to look absolutely stupid. Most of the time those correlations are probably coincidental – but sometimes there is a great discovery. We don’t know yet how to tell random coincidences apart from the really interesting stuff.
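Why so many correlations look stupid can be illustrated with a small simulation, assuming nothing more than independent random series: if you compare enough pairs, some will correlate strongly purely by chance. The series count, length and 0.8 threshold below are arbitrary choices for the sketch, and Pearson's correlation coefficient is used as a stand-in for whatever measure one might actually apply.

```python
import random
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def chance_correlations(n_series=200, length=10, threshold=0.8, seed=1):
    """Generate independent random series and return the index pairs whose
    absolute correlation exceeds `threshold`. Any hit is a coincidence,
    because no series was built to depend on any other."""
    rng = random.Random(seed)
    series = [[rng.random() for _ in range(length)] for _ in range(n_series)]
    return [
        (i, j)
        for i, j in combinations(range(n_series), 2)
        if abs(pearson(series[i], series[j])) > threshold
    ]

hits = chance_correlations()
print(f"{len(hits)} 'strong' correlations among unrelated series")
```

With 200 short series there are 19,900 pairs to compare, so even a rare chance correlation shows up many times; separating those from genuine relationships is exactly the filtering problem described above.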
The European: Do we sometimes overestimate the power of computing?
Wolfram: One thing being discussed right now is the automated grading of student essays. A crazy idea! If that were implemented, people would soon learn how to write essays that are rather opaque to humans but score really well with computers. It is as mistaken as teaching people how to do calculus by hand. When one of my children showed me what they did in calculus class, I said: “I didn’t think that humans still do that! I haven’t done anything like this in 30 years.”
The European: What’s wrong with it?
Wolfram: It is bizarre: We are programming people to do the work of machines. But it’s supposed to be the other way around. We should program machines to do our work.
The European: What do you think about the idea that data should get an expiration date? Currently, data of people who have died never vanishes from the web.
Wolfram: That question has been overlooked for a long time. I know that some people and companies are finally working on it. For years, I have been wondering whether you could build a “digital time safe” to safely store data that only becomes accessible 20 years later. Can you put information away? In some way, that is the inverse of what you’re talking about.
The European: Interesting…
Wolfram: Yesterday I visited Berlin’s Egyptian Museum. The Egyptians had this whole idea of the afterlife. They put some stuff away for their life after death. While I was standing in this museum I realized that this is it: The museum is their afterlife. They left the stuff and now it is here.
The European: But the Egyptians haven’t come back from the dead.
Wolfram: No. They had a funny conception of space and time. We know a lot about Egyptians and Babylonians because they left us so much stuff to study. There are other civilizations that didn’t choose to leave any data and we do not know anything about them; it’s a shame. Still, we have to accept that some people do not want to leave anything behind when they die.
The European: Franz Kafka wanted his writings destroyed – luckily, his closest friend never fulfilled this wish and published them instead. That is the upside of keeping records. Unfortunately, data isn’t always a treasure trove. Imagine what a totalitarian system like the GDR government would have been able to do with today’s possibilities…
Wolfram: I am spoiled in that I grew up in countries where you can disagree with what the government does without having to fear harsh consequences. Clearly, in the history of Germany, data was a key tool for criminal regimes. But I’m not too sure that more data would have been all that useful for this particular regime or others.
The European: Why is that?
Wolfram: Let me give you an analogy: Many people think that they are perfectly healthy, but they are wrong! There is something wrong with everyone’s health. If you don’t have a lot of data on it, you don’t think about it. But if you measure 200 different values in your blood, it is inevitable that at least five of them will be out of the norm. The same holds true for people in a society: if you watch every last step of everyone, you will find something wrong with each person – even though there is room for interpretation. At some point, more data is not helpful at all.
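The blood-test arithmetic can be checked with a short binomial calculation. This sketch assumes that each test's reference range covers the central 95% of healthy people (so a healthy person falls outside any one range with probability 0.05) and that the 200 results are independent. Real lab values are correlated, so treat this as a rough illustration rather than a medical estimate.

```python
from math import comb

def expected_out_of_range(n=200, p=0.05):
    """Expected number of results outside their reference range."""
    return n * p

def prob_at_least(k, n=200, p=0.05):
    """Binomial probability that at least k of n independent results fall
    outside their reference range when each does so with probability p."""
    return 1 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

print(expected_out_of_range())  # 10.0
print(f"P(at least 5 out of range) = {prob_at_least(5):.3f}")
```

Under these assumptions a perfectly healthy person would average about ten flagged values, and the chance of seeing at least five is roughly 97%, which supports the claim that "something wrong" will almost always turn up.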