Big data seems to be the new buzzword. What is so special about it, and why does it matter?
‘Big data’ refers to datasets so large that they are difficult to process using conventional methods. Big datasets arise partly from ‘big science’, for example astrophysics and genetics. They also arise from measurements of human activity, which have become not only very large but also highly granular (e.g. data on online activity). A third source is sensor networks, which are becoming ubiquitous and can generate huge amounts of data. Examples of these very large datasets include Walmart’s customer transaction databases, estimated to hold the equivalent of about 167 times the information in all the books in the US Library of Congress, and the Large Hadron Collider, whose experiments use around 150 million sensors delivering data 40 million times a second.
Big data are difficult to handle with most relational database management systems and standard statistics packages. Instead they require parallel software running on tens, hundreds, or even thousands of servers, with networks used to harness the combined power of many computers. Conventional approaches to analysis can also be slow, with models taking hours or even days to run, whereas some applications, such as analysing transactions to detect credit card fraud, need results within seconds. A range of new techniques will be needed for storing, processing, sampling and analysing these very large datasets, and for distinguishing signal from noise.
Some of this will be relevant to healthcare, opening up applications that we could not have dreamt of a few years ago. Examples might include:
- tracking epidemics from social network data or from body temperature sensors embedded in smartphones
- routine sequencing of DNA to support developments in stratified medicine
- analysing food, alcohol or tobacco purchasing data
- mining data from hundreds of thousands of electronic medical records
Find out more
- Read about the recent Big Data Week
- Go to a conference in Singapore!
- Read “Critical Questions for Big Data” in Information, Communication & Society