Big Data in Big Companies

Srasthy Chaudhary
6 min read · Sep 17, 2020


Introduction

Big data burst upon the scene in the first decade of the 21st century, and the first organizations to embrace it were online and startup firms. Arguably, firms like Google, eBay, LinkedIn, and Facebook were built around big data from the beginning. They didn’t have to reconcile or integrate big data with more traditional sources of data and the analytics performed upon them, because they didn’t have those traditional forms. They didn’t have to merge big data technologies with their traditional IT infrastructures because those infrastructures didn’t exist. Big data could stand alone, big data analytics could be the only focus of analytics, and big data technology architectures could be the only architecture.

Consider, however, the position of large, well-established businesses. Big data in those environments shouldn’t be separate but must be integrated with everything else that’s going on in the company. Analytics on big data have to coexist with analytics on other types of data. Hadoop clusters have to do their work alongside IBM mainframes. Data scientists must somehow get along and work jointly with mere quantitative analysts.

How New Is Big Data?

Big data may be new for startups and for online firms, but many large firms view it as something they have been wrestling with for a while. Some managers appreciate the innovative nature of big data, but more find it “business as usual” or part of a continuing evolution toward more data. They have been adding new forms of data to their systems and models for many years, and don’t see anything revolutionary about big data. Put another way, many were pursuing big data before big data was big. When managers in large firms are impressed by big data, it’s not the “bigness” that impresses them. Instead, it’s one of three other aspects: the lack of structure, the opportunities presented, and the low cost of the technologies involved.

“It’s About Variety, not Volume: Big companies are focused on the variety of data, not its volume, both today and in three years. The most important goal and potential reward of Big Data initiatives is the ability to analyze diverse data sources and new data types, not managing very large data sets.”

Firms that have long handled massive volumes of data are beginning to enthuse about the ability to handle new types of data: voice, text, log files, images, and video. By combining unstructured and structured data, companies can build a much more complete picture of their customers and operations.

Objectives for Big Data

Big data can bring about dramatic cost reductions, substantial improvements in the time required to perform a computing task, or new product and service offerings. Like traditional analytics, it can also support internal business decisions. The technologies and concepts behind big data allow organizations to achieve a variety of objectives, but most organizations focus on just one or two. The chosen objectives have implications not only for the outcome and financial benefits of big data, but also for the process: who leads the initiative, where it fits within the organization, and how the project is managed.

Big Data’s moving parts

No single business trend in the last decade has had as much potential impact on incumbent IT investments as big data. Indeed, big data promises (or threatens, depending on how you view it) to upend legacy technologies at many big companies.

Companies are not only replacing legacy technologies with open source solutions like Apache Hadoop; they are also replacing proprietary hardware with commodity hardware, custom-written applications with packaged solutions, and decades-old business intelligence tools with data visualization. This new combination of big data platforms, projects, and tools is driving new business innovations, from faster product time-to-market, to an authoritative (finally!) single view of the customer, to custom-packaged product bundles and beyond.

The Big Data Stack

As with all strategic technology trends, big data introduces highly specialized features that set it apart from legacy systems.

Each component of the stack is optimized for the large, unstructured, and semi-structured nature of big data. Working together, these moving parts make up a holistic solution that’s fine-tuned for specialized, high-performance processing and storage.

Hadoop

Hadoop is an important part of the NoSQL movement, and the name usually refers to a pair of open source products: the Hadoop Distributed File System (HDFS), a derivative of the Google File System, and MapReduce, though the Hadoop family extends into a product set that keeps growing. HDFS and MapReduce were co-designed, developed, and deployed to work together.
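
To make the HDFS-and-MapReduce pairing concrete, here is a minimal sketch of the classic word-count job in Java (an illustration, not an example from the original article): the mapper emits a (word, 1) pair for every token in text stored on HDFS, and the reducer sums the counts for each word.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in an input line stored on HDFS.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts for each word across all mappers.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

A job like this is typically packaged as a JAR and launched with `hadoop jar WordCount.jar WordCount <input> <output>`, where both paths live in HDFS.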

Hadoop adoption, though a bit of a hurdle to clear, is worth it when the unstructured data to be managed reaches dozens of terabytes. Hadoop scales very well, and relatively cheaply, so you do not have to predict the data size accurately at the outset. Summaries of the analytics are likely to be valuable to the data warehouse, so interaction between the two environments will occur.

The user consumption profile is not the familiar one of a high volume of user queries issued through a modern business intelligence tool, and the ideal resting state of the data is not a dimensional model. These are data-intensive workloads, and the schemas are more of an afterthought: fields can vary from record to record, and two records need not share even one common field. That said, Hadoop works best with a small number of large files that have some repeatability from record to record.

Record sets that have at least a few similar fields tend to be called “semi-structured,” as opposed to unstructured; web logs are a good example. Either way, Hadoop is the store for these “nonstructured” sets of big data.
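
As a hedged illustration of schema-on-read over such semi-structured records, the Java sketch below parses web log lines in the Apache Common Log Format (the class name and sample line are hypothetical): structure is imposed only when a line is read, and lines that don’t match the pattern are kept as raw records rather than rejected, since fields can vary from record to record.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {

  // Apache Common Log Format: host ident authuser [timestamp] "request" status bytes
  private static final Pattern CLF = Pattern.compile(
      "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)");

  // Schema-on-read: structure is imposed at parse time, not at load time.
  public static Map<String, String> parse(String line) {
    Map<String, String> record = new HashMap<>();
    Matcher m = CLF.matcher(line);
    if (m.find()) {
      record.put("host", m.group(1));
      record.put("timestamp", m.group(4));
      record.put("request", m.group(5));
      record.put("status", m.group(6));
      record.put("bytes", m.group(7));
    } else {
      record.put("raw", line); // unmatched record: keep it, fields vary
    }
    return record;
  }

  public static void main(String[] args) {
    // Hypothetical sample line for illustration.
    String line = "203.0.113.7 - - [17/Sep/2020:10:15:32 +0000] "
        + "\"GET /index.html HTTP/1.1\" 200 5120";
    System.out.println(parse(line)); // e.g. {host=203.0.113.7, status=200, ...}
  }
}
```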

Big Data and Data Warehouse Coexistence

Big data environments and the data warehouse can coexist, with data flowing in both directions. This coexistence minimizes disruption to existing analytics functions while accelerating new or strategic business processes that benefit from increased speed. The data warehouse can serve as a data source for the big data environment; likewise, Hadoop can consolidate key data output to populate the data warehouse for subsequent analytics.
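
The second flow, Hadoop output feeding the warehouse, can be sketched as follows (a hedged illustration: the connection string, table name, and file name are hypothetical, and in practice a bulk tool such as Apache Sqoop or a vendor loader would usually handle the transfer). It reads a tab-separated MapReduce output file and batch-inserts the summarized rows into a warehouse table over JDBC.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class SummaryLoader {
  public static void main(String[] args) throws Exception {
    // Hypothetical warehouse connection, table, and file names.
    try (Connection conn = DriverManager.getConnection(
             "jdbc:postgresql://warehouse.example.com/dw", "etl_user", "secret");
         PreparedStatement insert = conn.prepareStatement(
             "INSERT INTO daily_word_counts (word, cnt) VALUES (?, ?)");
         BufferedReader in = new BufferedReader(
             new FileReader("part-r-00000"))) { // a MapReduce output file

      String line;
      while ((line = in.readLine()) != null) {
        // MapReduce text output is tab-separated: key \t value
        String[] parts = line.split("\t");
        insert.setString(1, parts[0]);
        insert.setLong(2, Long.parseLong(parts[1]));
        insert.addBatch();
      }
      insert.executeBatch(); // push the Hadoop summaries into the warehouse
    }
  }
}
```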

Big Data at Bank of America

Given Bank of America’s size in assets (over $2.2 trillion in 2012) and customer base (50 million consumers and small businesses), it has arguably been in the big data business for many years. Today the bank is focusing on big data, but with an emphasis on an integrated approach to customers and an integrated organizational structure. It thinks of big data in three “buckets”: big transactional data, data about customers, and unstructured data. The primary emphasis is on the first two categories. With a very large amount of customer data across multiple channels and relationships, the bank historically was unable to analyze all of its customers at once and relied on systematic samples. With big data technology, it can increasingly process and analyze data from its full customer set. Aside from some experiments with analysis of unstructured data, the primary focus of the bank’s big data efforts is on understanding the customer across all channels and interactions, and on presenting consistent, appealing offers to well-defined customer segments.

For example, the bank uses transaction and propensity models to determine which of its primary relationship customers may have a credit card or a mortgage loan with a competitor that could benefit from refinancing. When the customer comes online, calls a call centre, or visits a branch, that information is available to the online app or the sales associate, who can present the offer. The various sales channels can also communicate with each other, so a customer who starts an application online but doesn’t complete it could get a follow-up offer in the mail or an email to set up an appointment at a physical branch location. A new program, “BankAmeriDeals,” provides cash-back offers to holders of the bank’s credit and debit cards based on analyses of where they have made payments in the past. There is also an effort to understand the nature of, and satisfaction with, customer journeys across a variety of distribution channels, including online, call centre, and retail branch interactions. The bank has historically employed a number of quantitative analysts; for the big data era, they have been consolidated and restructured, with matrixed reporting lines to both the central analytics group and to business functions and units.
