The Rise of ‘Big Data’ in Cloud Computing

Jun 1, 2023

I took a class in Cloud Computing as a first-year grad student and have since been amazed by the massive growth in the scale of data, or big data, generated through cloud computing. The continuous increase in the volume and detail of data captured by organizations, driven by the rise of social media, the Internet of Things (IoT), and multimedia, has produced an overwhelming flow of structured and unstructured data. Data is being created at a record rate; this phenomenon, referred to here as big data, has become a widely recognized trend. Cloud computing is one of the most significant shifts in modern information and communication technology (ICT) and in services for enterprise applications, and it has become a powerful architecture for large-scale, complex computing. Its advantages include virtualized resources, parallel processing, security, and data service integration with scalable data storage. Cloud computing not only lowers the cost and the barriers to automation and computerization for individuals and enterprises, but also reduces infrastructure maintenance costs and provides efficient management and user access. Because of these advantages, many applications have been built on various cloud platforms, which in turn has produced a tremendous increase in the scale of data these applications generate and consume. Some of the first adopters of big data in cloud computing were users who deployed Hadoop clusters in the highly scalable and elastic computing environments offered by vendors such as IBM, Microsoft (Azure), and Amazon (AWS).

Big Data Analytics Cycle for Cloud Computing

In traditional environments, data is first explored, and then a model design and a database structure are created. The cycle starts by gathering data from multiple sources, such as files, systems, sensors, and the Web. This data is then stored in a so-called “landing zone”, a medium capable of handling the volume, variety, and velocity of the data; this is usually a distributed file system. Once stored, the data undergoes a series of transformations that prepare it for efficient, scalable processing. After that, it is integrated into specific analytical tasks, operational reporting, databases, or raw data extracts.
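The cycle described above can be sketched in a few lines of Python. This is a minimal, single-machine sketch under stated assumptions: a temporary local directory stands in for the distributed file system that would normally act as the landing zone, and the source names (`sensors`, `web_logs`), the records, and the cleaning rule are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical raw records from two sources; real pipelines would pull from
# files, systems, sensors and the Web.
SOURCES = {
    "sensors": [{"id": "s1", "temp": "21.5"}, {"id": "s2", "temp": ""}],
    "web_logs": [{"id": "w1", "temp": "19.0"}],
}


def land(sources: dict, landing_zone: Path) -> list[Path]:
    """Store each source's raw records unchanged in the landing zone."""
    paths = []
    for name, records in sources.items():
        path = landing_zone / f"{name}.jsonl"
        with path.open("w") as f:
            for record in records:
                f.write(json.dumps(record) + "\n")
        paths.append(path)
    return paths


def transform(paths: list[Path]) -> list[dict]:
    """Parse the landed files, drop incomplete records and convert types."""
    cleaned = []
    for path in paths:
        with path.open() as f:
            for line in f:
                record = json.loads(line)
                if record["temp"]:  # skip records with a missing reading
                    cleaned.append({"id": record["id"], "temp": float(record["temp"])})
    return cleaned


def integrate(records: list[dict]) -> dict:
    """Feed the cleaned data into a simple analytical task: an average."""
    temps = [r["temp"] for r in records]
    return {"count": len(temps), "mean_temp": sum(temps) / len(temps)}


with tempfile.TemporaryDirectory() as tmp:
    report = integrate(transform(land(SOURCES, Path(tmp))))
    print(report)  # {'count': 2, 'mean_temp': 20.25}
```

The raw data is landed first and only cleaned afterwards, so the landing zone always keeps an untouched copy that later transformations can be re-run against.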

ETL (Extract, Transform, Load) takes data from a data source, applies any required transformations, and then loads it into a data warehouse, where reports and queries are run against it. This approach is characterized by heavy I/O activity, a lot of string processing, variable transformation, and extensive data parsing.
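A minimal ETL sketch, assuming an in-memory SQLite database as a stand-in for the data warehouse and a small hypothetical CSV extract. The transformation step (trimming strings, lower-casing, parsing numbers, rejecting incomplete rows) runs in Python before anything is loaded, which is where the string processing and parsing workload of ETL comes from.

```python
import csv
import io
import sqlite3

# Hypothetical CSV extract from a source system.
RAW_CSV = """order_id,region,amount
1, north ,10.50
2,south,
3,NORTH,4.25
"""


def extract(text: str) -> list[dict]:
    """Extract: read rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform before loading: clean strings, parse numbers, drop bad rows."""
    out = []
    for row in rows:
        if not row["amount"].strip():
            continue  # reject rows with no amount
        out.append((
            int(row["order_id"]),
            row["region"].strip().lower(),
            float(row["amount"]),
        ))
    return out


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load only the already-transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

# Run a report against the warehouse.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('north', 14.75)]
```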

ELT (Extract, Load, Transform) takes the most compute-intensive activity, transformation, and moves it out of an on-premises system that is already under pressure from regular transaction handling and into the cloud. This removes the need for a separate data-staging step; instead, the approach relies on the concept of “data lakes”, which do not require data to be transformed before it is loaded.
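For contrast, here is a minimal ELT sketch on the same hypothetical extract. SQLite again stands in for the cloud target; the raw rows are loaded as plain text with no cleaning at all, and the transformation runs afterwards inside the target as SQL, using the target's compute rather than the source system's. The table names `raw_orders` and `orders_clean` are illustrative.

```python
import csv
import io
import sqlite3

RAW_CSV = """order_id,region,amount
1, north ,10.50
2,south,
3,NORTH,4.25
"""

conn = sqlite3.connect(":memory:")

# Extract + Load: copy the rows into the target as untyped text, untransformed.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, region TEXT, amount TEXT)")
rows = [tuple(r.values()) for r in csv.DictReader(io.StringIO(RAW_CSV))]
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)

# Transform: done later, inside the target, with SQL.
conn.execute(
    """
    CREATE TABLE orders_clean AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           LOWER(TRIM(region))       AS region,
           CAST(amount AS REAL)      AS amount
    FROM raw_orders
    WHERE TRIM(amount) <> ''
    """
)

print(conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone())  # (3,)
print(conn.execute(
    "SELECT region, SUM(amount) FROM orders_clean GROUP BY region"
).fetchall())  # [('north', 14.75)]
```

The raw table keeps all three rows, including the incomplete one, which mirrors how a data lake retains untransformed data so it can be reshaped differently later.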

Cloud Architecture

Private clouds are dedicated to one organization and do not share physical resources. The resources can be operated in-house or by a third party. Typical drivers for private cloud deployments are security requirements and regulations that demand a strict separation of an organization’s data storage and processing from accidental or malicious access through shared resources.

Public clouds share physical resources for data transfers, storage, and processing. However, customers have private virtualized computing environments and isolated storage. Security concerns, which entice a few to adopt private clouds or custom deployments, are irrelevant for the vast majority of customers and projects, because virtualization makes access to other customers’ data very difficult. The real-world problems around public cloud computing are more mundane, such as data lock-in and the fluctuating performance of individual instances.

The hybrid cloud architecture merges private and public cloud deployments. This is often an attempt to combine the security of a private cloud with the elasticity of a public one, or to serve a cheaper base load while keeping burst capabilities. Some organizations experience short periods of extremely high load, for example due to seasonality such as Black Friday in retail, or marketing events such as sponsoring a popular TV event. Such events can have a huge economic impact on organizations if they are handled poorly. A hybrid cloud makes it possible to serve the base load with in-house resources and to rent a multiple of those resources for a short period to handle the extreme demand.
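The burst pattern can be made concrete with a small capacity calculation. This is a hypothetical sketch: the per-server throughput, the in-house server count and the demand figures are illustrative numbers, not measurements, and the policy (serve the base load in-house, rent public-cloud instances only for the excess) is the simple one the paragraph describes.

```python
import math


def public_instances_needed(demand_rps: float, in_house_servers: int,
                            rps_per_server: float) -> int:
    """Number of public-cloud instances to rent so that in-house capacity
    plus rented capacity covers the demand (requests per second).

    Assumes a rented instance has the same throughput as an in-house server.
    """
    in_house_capacity = in_house_servers * rps_per_server
    excess = max(0.0, demand_rps - in_house_capacity)
    return math.ceil(excess / rps_per_server)


# Hypothetical figures: 10 in-house servers handling 500 requests/s each.
for label, demand in [("normal day", 3_000), ("Black Friday peak", 21_000)]:
    print(label, public_instances_needed(demand, in_house_servers=10, rps_per_server=500))
# normal day 0
# Black Friday peak 32
```

On the normal day the base load fits in-house; at the hypothetical peak the calculation asks for 32 rented instances, more than three times the in-house fleet, but only for as long as the peak lasts.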

Relationship between Cloud Computing & Big Data

Cloud computing and big data are closely intertwined. Big data gives users the ability to use commodity computing to process distributed queries across multiple datasets and return result sets in a timely manner. Cloud computing provides the underlying engine, for example through Hadoop, an open-source framework for distributed data processing. Large data sources from the cloud and the Web are stored in distributed, fault-tolerant storage and processed with a programming model for large datasets, such as MapReduce, that runs a parallel, distributed algorithm on a cluster. Data visualization then presents the analytical results as graphs to support decision making.
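The programming model described here can be illustrated with a local imitation of MapReduce's map, shuffle and reduce phases for a word count. This is a single-machine sketch: Python's `multiprocessing.Pool` stands in for the nodes of a cluster, and none of this is Hadoop's actual API.

```python
from collections import defaultdict
from multiprocessing import Pool


def map_phase(line: str) -> list[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(mapped: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(item: tuple[str, list[int]]) -> tuple[str, int]:
    """Reduce: combine all values for one key."""
    key, values = item
    return key, sum(values)


def word_count(lines: list[str], workers: int = 4) -> dict[str, int]:
    with Pool(workers) as pool:
        mapped = pool.map(map_phase, lines)  # map the splits in parallel
        reduced = pool.map(reduce_phase, shuffle(mapped).items())  # reduce keys in parallel
    return dict(reduced)


if __name__ == "__main__":
    splits = ["big data in the cloud", "the cloud scales", "Big data grows"]
    print(sorted(word_count(splits).items()))
    # [('big', 2), ('cloud', 2), ('data', 2), ('grows', 1), ('in', 1), ('scales', 1), ('the', 2)]
```

Because each map call sees only its own split and each reduce call sees only one key, both phases can be spread across as many workers as are available; on a real cluster the shuffle is the step that moves data between machines.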

Relationship between Big Data and Cloud Computing

With the generation of enormous amounts of data, cloud computing plays a significant role in storing and managing that data. The trend is driven not only by the growth of big data but also by the expansion of data analytics platforms such as Hadoop, and it is creating new opportunities in cloud computing. Service providers such as AWS, Google, and Microsoft now offer their own big data systems in a cost-efficient manner that scales for businesses of all sizes. This has led to a new service model known as Analytics as a Service (AaaS), which provides a faster, scalable way to integrate structured, semi-structured, and unstructured data and to analyze, transform, and visualize it in real time. The relationship between big data and cloud computing can also be assessed from the following perspectives:

1 A cloud computing environment usually has several user terminals and service providers. At the collection terminals, users collect data with big data tools. At the service provider end, the data is saved, stored, and processed. In this way, cloud computing provides the infrastructure for big data.

2 Because the cloud environment is scalable, it can provide an adequate data management solution regardless of the volume of data. If necessary, the cloud service provider can also apply security policies tailored to the user’s demands.

3 Identity management and access control are two major security requirements for big data. Cloud computing can meet them through a simple software interface that abstracts away the internal details of the information, while ensuring that only authorized users gain access.

4 The data used in big data processing can be spread across locations around the globe, and maintaining large servers in each of those locations is costly for an organization. Because cloud computing can store and process data on geographically dispersed and virtual servers, it significantly reduces the cost of big data processing.

5 Cloud computing runs high-level software and applications whose performance does not depend on the capabilities of the user’s device but on the network and the strength of the servers. This makes cloud-based big data services beneficial even to users with modest hardware.

Big Data has emerged in the past few years as a new paradigm, providing abundant data and opportunities to improve or enable research and decision-support applications with unprecedented value for digital earth applications, including business, science, and engineering. At the same time, cloud computing provides fundamental support for addressing the challenges of Big Data through shared computing resources, including computing, storage, networking, and analytical software; the application of these resources has fostered impressive Big Data advancements. While Big Data systems handle data storage and processing, the cloud provides a reliable, accessible, and scalable environment in which they can run. Big Data is defined as the quantity of digital data produced by different technology sources, for example sensors, digitizers, scanners, numerical modeling, mobile phones, the Internet, videos, and social networks. Cloud computing and Big Data are complementary: cloud computing provides solutions to Big Data problems, yet each technology is valuable on its own. Many businesses therefore aim to combine the two to reap greater business benefits, with the cloud managing the infrastructure and software while Big Data informs business decisions.

The Role of Big Data in Cloud Computing

The relationship between big data and cloud computing can be categorized along the following dimensions:

Cloud computing is a terrific solution for enterprises that want state-of-the-art technology running their operations on a limited budget. Maintaining a large data center to perform Big Data analytics can quickly drain an IT budget. Today, companies can avoid investing heavily in setting up an IT department and maintaining hardware infrastructure. With cloud computing, that responsibility shifts to the cloud provider, and the company pays only for the resources it consumes, such as storage and computing power. A modern data-management platform brings together master data management and big data analytics capabilities in the cloud so that businesses can build data-driven applications on reliable data with relevant insights. The principal advantage of such a unified cloud platform is faster time-to-value, keeping up with the pace of business.
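The budget argument is ultimately arithmetic: owned servers cost the same whether or not they are busy, while cloud resources are billed by use. The sketch below compares the two under entirely hypothetical prices (a per-server monthly cost for on-premises hardware and a per-instance-hour cloud rate); real pricing varies widely by provider, region and instance type.

```python
# Hypothetical prices, for illustration only.
ON_PREM_COST_PER_SERVER_MONTH = 400.0  # hardware, space, power, upkeep (assumed)
CLOUD_RATE_PER_INSTANCE_HOUR = 1.50    # pay-per-use rate (assumed)


def on_prem_monthly_cost(servers: int) -> float:
    """Fixed cost: owned servers are paid for whether or not they are busy."""
    return servers * ON_PREM_COST_PER_SERVER_MONTH


def cloud_monthly_cost(servers: int, busy_hours: float) -> float:
    """Pay-per-use cost: only the instance-hours actually consumed are billed."""
    return servers * busy_hours * CLOUD_RATE_PER_INSTANCE_HOUR


# A 20-node analytics cluster that is busy 120 hours a month.
print(on_prem_monthly_cost(20))     # 8000.0
print(cloud_monthly_cost(20, 120))  # 3600.0

# Busy hours per server per month at which both options cost the same.
break_even_hours = ON_PREM_COST_PER_SERVER_MONTH / CLOUD_RATE_PER_INSTANCE_HOUR
print(round(break_even_hours, 1))   # 266.7
```

With these assumed prices, a workload busier than roughly 267 hours per server per month (out of about 730) would be cheaper on owned hardware, so the pay-per-use advantage is largest for intermittent analytics workloads.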

Storage: One of the biggest concerns with big data is storage. On-premises physical infrastructure is often not enough to store this huge amount of data properly, and even when capacity is not the issue, scaling physical storage causes problems for users. Cloud computing provides reliable, secure, and scalable storage facilities to store and access big data.

Accessibility: The SaaS, PaaS, and IaaS models delivered by cloud services are all virtual services hosted by third parties. Users can configure and access them from their browsers without installing and running the software locally. This ease of access is coupled with fast data transfer through many channels.

Security: Data security is a big issue in today’s world of information technology. According to Statista, the number of data breaches in the US alone stood at 1,001 in 2020. Because cloud services are open and widely accessible, secure storage is a challenge. To address this, cloud services provide various levels of security based on users’ needs. In this way, cloud computing eases the storage, handling, and accessibility of big data. With the increasing penetration of electronic devices, the need for cloud computing is only going to increase.
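The storage point above can be illustrated with the technique distributed file systems such as Hadoop’s HDFS rely on: split a large file into fixed-size blocks and place replicas of each block on different nodes, so capacity grows by adding nodes and the loss of a single node loses no data. This is a toy sketch under stated assumptions: the tiny block size, the node names and the rotating placement are hypothetical simplifications (HDFS, for instance, defaults to 128 MB blocks and three replicas, and uses rack-aware placement).

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int) -> dict[int, list[str]]:
    """Put each block on `replication` distinct nodes, rotating the start node."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        first = block_id % len(nodes)
        placement[block_id] = [nodes[(first + k) % len(nodes)] for k in range(replication)]
    return placement


def readable_after_failure(placement: dict[int, list[str]], failed: str) -> bool:
    """True if every block still has a replica on a node other than `failed`."""
    return all(any(node != failed for node in replicas) for replicas in placement.values())


blocks = split_into_blocks(b"x" * 10, block_size=4)  # blocks of 4, 4 and 2 bytes
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_replicas(len(blocks), nodes, replication=3)
print(placement)
# {0: ['node-a', 'node-b', 'node-c'], 1: ['node-b', 'node-c', 'node-d'], 2: ['node-c', 'node-d', 'node-a']}
print(readable_after_failure(placement, "node-b"))  # True
```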

Big Data offers abundant data and opportunities to improve or enable research and decision-support applications, but it also presents challenges for digital earth applications in storing, transporting, processing, mining, and serving the data. Cloud computing is a powerful technology for massive-scale, complex computing that eliminates the need to maintain expensive computing hardware, dedicated space, and software. Addressing big data is a challenging, time-consuming task that requires a large computational infrastructure to ensure successful data processing and analysis, and cloud computing gives enterprises a cost-effective and flexible way to access this vast volume of information. Because of Big Data and cloud computing, it is now easier than ever to start an IT company. However, the success of cloud-based big data analytics depends on many factors; an important one is a reliable cloud provider with extensive expertise that offers highly robust services.
