The Conexis Blog


The explosion of data has created the next big opportunity: leveraging existing enterprise big data with new technology to achieve business objectives
There are amazing new opportunities across nearly every industry vertical in the explosion of big data. Consider that a company like J.P. Morgan Chase has over 30,000 databases and 15,000 applications spread over 7 business units, and you begin to get the picture. J.P. Morgan has been seizing this big data opportunity: it has adopted Hadoop as the basis for its Common Data Platform and is looking to establish a 360-degree view of the customer to identify upselling and cross-selling opportunities in a big way.

Hadoop has become a household name in the Big Data sector these days. The Big Data world is complex and has many components, but at the core of it all is Apache Hadoop’s revolutionary compute and storage architecture for data. We think this might be one of the most significant trends in data architecture in a decade.

To understand Apache Hadoop’s impact on compute and storage architectures, it’s important to understand where it came from and why it was needed in the first place. While applications have always produced a lot of data, we only ever stored the “valuable” data. This data was labeled and tagged and ultimately stored in neatly architected databases. Meanwhile, we simply ignored data like application logs and website logs, which offered the potential for in-depth user insights but were too voluminous and lacked the structure necessary for data systems at the time. It’s also important to realize that this unstructured data is increasing at exponential rates. As IT infrastructures across the economy get instrumented with Internet-scale solutions, the amount of data produced increases exponentially – which further exacerbates the problem and expands the opportunity.

Google was one of the first companies to tackle the challenge of parsing through vast amounts of data in an attempt to make sense of it, all in a cost-efficient way. Not only did they have to contend with massive volumes of data, they also had to deal with large amounts of unstructured data, as websites didn’t have nice labels or tags. They came up with a concept called MapReduce, which solved both the scale problem and the cost problem for crawling and indexing unstructured data across the web.

MapReduce is the modern version of the “grid computing” we talked about for years, but with a twist. The old way of parsing lots of data was to have a large (expensive) computer that fetched (expensively stored) unprocessed data, processed it, and then returned it. MapReduce splits both the compute load and the stored data into moderately sized chunks and distributes them across a cluster of cheap, commodity computers with cheap disks to process. The processed data is then reassembled into intelligible “knowledge” that helps users run their business. This is a big deal because it completely changes the cost and scale equation for processing large amounts of data.
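To make the split-compute-reassemble idea concrete, here is a minimal sketch of the map/reduce pattern in Python – a toy single-process word count, not Hadoop itself, with the “cluster” simulated as a plain list of text chunks:

```python
from collections import defaultdict
from itertools import chain

# Map phase: each "worker" turns its chunk of raw text into (key, value) pairs.
def map_chunk(chunk):
    return [(word.lower(), 1) for word in chunk.split()]

# Shuffle phase: group all pairs by key, as the framework does between phases.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: collapse each key's list of values into a single result.
def reduce_counts(groups):
    return {key: sum(values) for key, values in groups.items()}

# The "cluster": chunks that would live on separate commodity machines.
chunks = ["big data big insight", "data beats opinion", "big big data"]
mapped = chain.from_iterable(map_chunk(c) for c in chunks)
counts = reduce_counts(shuffle(mapped))
print(counts["big"])  # → 4: each chunk was counted independently, then merged
```

The key property is that `map_chunk` only ever sees its own chunk, so in a real cluster each machine can run it locally against the data it already stores – that is what flips the cost equation.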

Google published a white paper on MapReduce, and a few smart engineers in the open source community cooked up Hadoop. Since its original creation, the little elephant has moved in as the heart and soul of many Internet companies’ computing architectures. Yahoo was the biggest and most active user, but Facebook, Twitter, Amazon, Hulu, and many others have joined the herd. The uses are varied and unique, but all of them center on using Hadoop to get insight from data that was previously discarded.

Hadoop has clearly become one of the most vibrant sets of projects in the Apache open source universe. Open source projects thrive on the contributions made by the community, and in our experience with MySQL and others, the greater the contributions an organization makes to a project, the more it can shape the project’s destiny.

Open source is in our blood at Conexis and our sister companies. While Apache Hadoop has been something of a Silicon Valley phenomenon, the trends that enterprises face everywhere are undeniable – all enterprises face a tsunami of unstructured data coming their way. Big data is a challenge and an opportunity in virtually every sector of the economy: healthcare (drug discovery, patient care), public sector (taxation, demography), retail (demand signals, supplier management), financial services (sales and trading, analytics), and so forth. Corporations in most of these sectors are struggling to find a scalable and cost-effective solution, not just to “deal with” the data, but also to transform it into a competitive advantage.

Across all of these industry verticals, big data and its ecosystem will undoubtedly be part of the solution. While most enterprises are still in the early stages of adopting Hadoop and other big data platforms, it’s clear to us that if we fast-forward a few years, there will be winners and losers among enterprise architectures, and Big Data will be at the center of it all.