Cloud and Data Science

(Image source: International Journal of Computer Applications (0975 – 8887) Volume 160 – No 9, February 2017)

Data science is a field that covers everything related to data cleansing, preparation, and analysis. It is the umbrella term for the techniques used to extract insights and information from data.

Data science and cloud computing essentially go hand in hand. A data scientist typically analyzes different types of data that are stored in the cloud. As Big Data grows, organizations store ever larger data sets online, which increases the need for data scientists.

  • Cloud computing gives a data scientist access to platforms such as Microsoft Azure (formerly Windows Azure), which offer programming languages, tools, and frameworks, some for free and some for a fee.
  • Data scientists are typically comfortable with the Hadoop ecosystem: Hadoop itself to store data and process it with MapReduce, and tools such as Pig and Hive to query and retrieve it. They also write programs in languages such as Python and Java.
  • Data scientists typically use two types of tools: open-source ones, such as R, Python, Hadoop frameworks, and several scalable machine learning tools, and commercial ones, such as MS SQL, Tableau, Oracle RDB, and BusinessObjects.
  • Given the size of the data sets and the availability of tools and platforms, understanding the cloud is not just pertinent but critical for a data scientist.
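
To make the MapReduce model mentioned in the list above a little more concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample documents are made up for illustration. It only shows the general shape of the idea: map each record to key/value pairs, group the pairs by key, then reduce each group.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle, and reduce in sequence on an in-memory list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

# Hypothetical sample input; on a real cluster each document could live on a different node.
documents = ["big data in the cloud", "data science in the cloud"]
counts = word_count(documents)
print(counts)
```

In a real cluster the map and reduce steps would run in parallel on many machines, which is the part this sketch leaves out.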

What a data scientist is likely to do with data in the cloud:

  • Work with structured, semi-structured, and unstructured data
  • Work with varied data sets, irrespective of size, format, etc.
  • Analyze them to draw insights
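
As a rough illustration of the three kinds of data named in the list above, the following sketch uses only the Python standard library. The sample strings and variable names are invented for this example, and in practice the data would be read from cloud storage rather than defined inline.

```python
import csv
import io
import json
import re

# Structured: fixed columns, as in a relational table or a CSV export.
structured_raw = "user_id,country,purchases\n1,DE,3\n2,US,5\n"
rows = list(csv.DictReader(io.StringIO(structured_raw)))

# Semi-structured: self-describing but nested and irregular, as in JSON.
semi_raw = '{"user_id": 1, "events": [{"type": "click"}, {"type": "view", "page": "/home"}]}'
record = json.loads(semi_raw)

# Unstructured: free text with no schema at all.
unstructured_raw = "Customer wrote: delivery was late, but support was great."
words = re.findall(r"[a-z']+", unstructured_raw.lower())

# Draw one small insight from each kind of data.
total_purchases = sum(int(row["purchases"]) for row in rows)
event_types = [event["type"] for event in record["events"]]
mentions_support = "support" in words

print(total_purchases, event_types, mentions_support)
```

The point is that each kind of data needs a different first step (parsing columns, walking a nested record, or tokenizing text) before any analysis can happen.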

The fact that data scientists and data analysts can rely on data stored in the cloud makes their work much easier. In addition, most cloud providers give data scientists immediate access to preinstalled open-source frameworks. This is convenient and can also save a lot of setup time.