What do Wal-Mart, Google and Lady Gaga all have in common? They are all fans of Big Data.
I didn’t know that until today, when Prof. Ping Jiang, who has just joined the department, gave his inaugural lecture. Prof. Jiang was talking about properly large amounts of content. Google processes around 25 petabytes of data every day (a petabyte is 10^15 bytes, a 1 with fifteen zeroes after it). Wal-Mart registers over 1 million customer transactions an hour. And Lady Gaga (or, more probably, her manager Troy Carter) is getting input from 31 million Twitter followers and 51 million Facebook fans when deciding what to do next.
Big Data holds useful nuggets of information and lets you do lots of powerful things. But the problem with big data is that it is, well, er, big. And we are not just talking about sheer size here; we also need to consider the rate at which we are adding to the data, and the speed at which we want to get useful results from the raw numbers.
It seems that the best way to tell that you are dealing with big data is that conventional techniques break down. If it would take your network of servers several hundred years to deliver the result of one query on your data set, then you are dealing with big data. And the only way to really deal with this is to divide and conquer: spread the processing around as much as you can, and do the maximum amount of work you can when the data first comes in.
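To make the divide-and-conquer idea a little more concrete, here is a minimal Python sketch, assuming a toy data set of (store, amount) transaction records. The names `summarise_partition`, `total_by_store` and `RunningTotals` are hypothetical; this is an illustration of the general technique, not how Wal-Mart or Prof. Jiang actually handle their data. Each partition is summarised independently, the small partial results are merged, and a running-total class shows the "do the work when the data arrives" approach.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def summarise_partition(transactions):
    """Map step: reduce one partition of raw (store, amount) records to per-store totals."""
    totals = Counter()
    for store, amount in transactions:
        totals[store] += amount
    return totals


def partition(records, n):
    """Split records into n roughly equal chunks, round-robin."""
    return [records[i::n] for i in range(n)]


def total_by_store(records, workers=4):
    """Divide and conquer: summarise each chunk separately, then merge the partial totals.

    Threads stand in for separate servers here; in CPython they will not speed up
    CPU-bound work because of the GIL, so this only illustrates the structure.
    """
    chunks = partition(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarise_partition, chunks))
    # Reduce step: only the small per-store summaries are combined.
    result = Counter()
    for p in partials:
        result.update(p)  # Counter.update adds values rather than replacing them
    return result


class RunningTotals:
    """Process each record on arrival, so queries never need to rescan the raw data."""

    def __init__(self):
        self.count = 0
        self.totals = Counter()

    def ingest(self, store, amount):
        self.count += 1
        self.totals[store] += amount


if __name__ == "__main__":
    records = [("leeds", 10), ("york", 5), ("leeds", 2.5), ("hull", 7)]
    print(dict(total_by_store(records, workers=2)))

    stream = RunningTotals()
    for store, amount in records:
        stream.ingest(store, amount)
    print(stream.count, dict(stream.totals))
```

Because only the compact per-store totals travel between workers, the merge step stays cheap however many raw records each partition holds.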
Prof. Jiang took as an example the problem of machine vision, in the context of robots that navigate autonomously. This is a complex problem, with huge amounts of data coming in from the robot’s visual sensors alone. A robot working on its own would need to be very intelligent indeed just to find its way from one office to another.
But if you spread the vision sensors around the building and get them to perform all the motion and object tracking, you can reduce the intelligence needed in the robot itself and remove a lot of complexity. Your robot can move much more confidently, as the systems controlling it can “see” much further ahead and react to changes in the environment. You are dealing with the big data coming into your system by processing the raw information as it arrives and converting it into a useful form that can be shared by all the devices navigating in an area.
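As a rough illustration of that idea, here is a minimal Python sketch, assuming the building is modelled as a grid of cells and that each fixed camera has already turned its raw images into labelled object positions. `Detection`, `SensorNode`, `SharedMap` and `path_is_clear` are hypothetical names invented for this example, not part of the system described in the lecture. The point is that the robot only consults a compact shared map rather than processing any raw video itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    object_id: str
    cell: tuple  # (x, y) grid cell in the shared building map (assumed model)


class SharedMap:
    """Compact, shared view of where tracked objects currently are."""

    def __init__(self):
        self.positions = {}  # object_id -> latest known cell

    def update(self, detection):
        self.positions[detection.object_id] = detection.cell

    def occupied(self):
        return set(self.positions.values())


class SensorNode:
    """A hypothetical fixed camera that does its own tracking and publishes only results."""

    def __init__(self, name, shared_map):
        self.name = name
        self.shared_map = shared_map

    def process_frame(self, detections):
        # Raw image processing and object tracking are assumed to have happened
        # on the sensor already; only the compact detections are published.
        for d in detections:
            self.shared_map.update(d)


def path_is_clear(path, shared_map):
    """The robot's only job here: check its planned cells against the shared map."""
    blocked = shared_map.occupied()
    return not any(cell in blocked for cell in path)


if __name__ == "__main__":
    building = SharedMap()
    corridor_cam = SensorNode("corridor", building)
    corridor_cam.process_frame([Detection("person-1", (3, 0)), Detection("trolley-7", (5, 2))])
    print(path_is_clear([(1, 0), (2, 0), (3, 0)], building))

    # The person moves out of the way; the camera's next frame updates the map.
    corridor_cam.process_frame([Detection("person-1", (3, 4))])
    print(path_is_clear([(1, 0), (2, 0), (3, 0)], building))
```

Because every camera writes into the same map, any robot in the area can react to an obstacle that is well beyond the reach of its own sensors.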
It’s early days, but it did look to me as though this holds out the prospect of useful robots actually working alongside us.