Big Data Industry: Implication for the Library and Information Sciences
DOI: https://doi.org/10.4314/d2js5559
Keywords: Nil
Abstract
The University of Pittsburgh (2007) defines Big Data as sets of data that are so large and complex that they are difficult to use effectively and efficiently. Chen and Zhang (2014), for their part, define Big Data as a collection of very large data sets of such great diversity of types that they are difficult to process using state-of-the-art processing approaches or traditional data processing platforms. They point out that a data set can be called Big Data if it is formidable to capture, curate, analyse, and visualise using current [or conventional] technologies. The Penn State College of Information Sciences and Technology (n.d.) looks at Big Data differently, as a process concerned with the exploration, development, and application of scalable algorithms, infrastructures, and tools for organising, integrating, retrieving, analysing, and visualising large, complex, and heterogeneous data. The SAS Institute (n.d.) characterises Big Data in five ways, which can be deciphered as four Vs and a C, namely: Volume (organisations collect data from a variety of sources, including business transactions, social media, and sensor or machine-to-machine data); Velocity (data streams in at unprecedented speed and must be dealt with in a timely manner); Variety (Big Data comes in all types of formats: structured numeric data, unstructured text documents, email, video, audio, and more); Variability (Big Data flows can be highly inconsistent); and Complexity (Big Data comes from multiple sources, which makes it difficult to link, match, cleanse, and transform data across systems).