What’s all the fuss about ‘big data’?

'Big data' seems to be the new buzzword. What's so special about it, and why does it matter?

‘Big data’ refers to datasets so large that they are difficult to process using conventional methods. Big datasets arise partly from ‘big science’, for example astrophysics and genetics. They also arise from measurements of human activity, which have become not only very large but also highly granular (e.g. data on online activity). A third source is sensor networks, which are becoming ubiquitous and can generate huge amounts of data. To give a sense of scale, Walmart’s customer transaction dataset contains the equivalent of 160 times all the information in all the books in the world, and the Large Hadron Collider has 150 million sensors delivering data 40 million times a second.

Big data are difficult to handle with most relational database management systems and standard statistics packages. Instead they require parallel software running on tens, hundreds or even thousands of servers, with networks linking many computers so that they can share the work. Conventional approaches to analysis are also slow, with models taking hours or even days to run, whereas some applications, such as analysing transactions to detect credit card fraud, need results within seconds. A range of new techniques will be needed for storing, processing, sampling and analysing these very large datasets, and for distinguishing signal from noise.
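To make the 'sampling' point a little more concrete: one standard way to draw a fair random sample from a dataset too large to hold in memory is reservoir sampling (Algorithm R), which reads each record once and keeps only k of them. The Python sketch below is illustrative only. It is not a method described in this post, and the function name `reservoir_sample` is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k items chosen uniformly at random from a stream of
    unknown length, in a single pass, holding at most k items in memory."""
    rng = random.Random(seed)  # seeded generator so results can be reproduced
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1), replacing a random slot.
            j = rng.randint(0, i)  # inclusive bounds: i + 1 possible values
            if j < k:
                reservoir[j] = item
    return reservoir


# Example: a 5-record sample from a million simulated record IDs.
print(reservoir_sample(range(1_000_000), 5, seed=1))
```

The design choice here is memory: only k records are ever held, so a representative sample can be drawn as data arrive rather than after the whole dataset has been stored. If the stream is shorter than k, every record is returned.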

Some of this will be relevant to healthcare, opening up applications that we could not even have imagined a few years ago. Examples might include:

  • tracking epidemics using social network data or body temperature sensors embedded in smartphones
  • routine sequencing of DNA to support developments in stratified medicine
  • analysing food, alcohol or tobacco purchasing data
  • mining data from hundreds of thousands of electronic medical records


The Cambridge Centre for Health Services Research (CCHSR) is a thriving collaboration between the University of Cambridge and RAND Europe. We aim to inform health policy and practice by conducting research and evaluation studies of the organisation and delivery of healthcare, including safety, effectiveness, efficiency and patient experience.