Fortune Cookies, Data Temperature and Scoville Scales
My son absolutely loves going out for Chinese food. He always looks forward to the times we go and knows exactly what he wants when he gets there. His favorite part, which is no surprise for a 7-year-old, is the fortune cookies. We all take turns reading our fortunes to each other, and they’re almost always uneventful. The last time we went, however, things were different. My fortune cookie was particularly intriguing. It read:
“The farther backward you can look, the farther forward you are likely to see”
Being a Solution Engineer, this got me thinking about the technology that I work with on a daily basis. In particular, it reminded me of the clear need our customers have to retain as much meaningful information as possible.
The best illustration of this came during a presentation from one of our partners on Predictive Analytics, specifically as it relates to Big Data. The presenter described how he had worked very closely with Fannie Mae following the subprime mortgage crisis in 2008. He detailed how they took the agency’s data, worked closely with its analytics experts to understand their predictive models and summarized the results in a post-mortem. The findings were sobering. What they found was that the agency could have predicted the mortgage crisis. Given the data and models that were being used at the time, the agency knew that something was going to happen but couldn’t identify an event with any precision. The “eureka moment” came when they brought data once considered to be “cold” back online. The same algorithms were applied against an additional 10 years of data, and the conclusion was that the agency could have foretold the exact year and quarter the crisis was going to hit had it not aged that data out of its systems.
From that day forward I never looked at the topic of data temperature the same way. The concept of “cold data” left my vernacular for good. As far as I’m concerned, the coolest data can get is “warm”. To that end I’ve begun referring to data as white hot (in-memory, frequently accessed), hot (integrated near-line storage, often accessed) and warm (Hadoop/other RDBMS, potentially accessed). As part of my new charter to change the way we look at data temperature, I have created my own Scoville scale for the different temperatures of data. The Scoville scale is apropos since, by definition, there is no such thing as a “cold” pepper on the scale. Everything has some degree of heat; it’s just a matter of how much.
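For readers who think in code, here is a minimal sketch of the scale as a lookup. The three tier names and descriptions come straight from the paragraph above; the day thresholds and the `classify` function are purely hypothetical, chosen for illustration, and any real cut-offs would depend on the customer and the workload.

```python
from enum import Enum


class DataTemperature(Enum):
    """The three tiers of the data Scoville scale. There is deliberately no COLD."""
    WHITE_HOT = "in-memory, frequently accessed"
    HOT = "integrated near-line storage, often accessed"
    WARM = "Hadoop/other RDBMS, potentially accessed"


# Hypothetical cut-offs, not taken from any product or from the article.
WHITE_HOT_MAX_DAYS = 90
HOT_MAX_DAYS = 3 * 365


def classify(days_since_last_access: int) -> DataTemperature:
    """Map how recently data was touched to a tier on the scale."""
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be non-negative")
    if days_since_last_access <= WHITE_HOT_MAX_DAYS:
        return DataTemperature.WHITE_HOT
    if days_since_last_access <= HOT_MAX_DAYS:
        return DataTemperature.HOT
    # However old the data is, it never drops below warm.
    return DataTemperature.WARM
```

Calling `classify(5000)` on data untouched for well over a decade still returns `DataTemperature.WARM`, which is the whole point: old data gets cheaper storage, not deletion.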
From a technical perspective, there are a number of solutions to support the Scoville scale of data temperature. For example, a modern business platform like SAP HANA combines in-memory technology that delivers interactions in real time, integrated near-line solutions that allow the storage of vast amounts of data, and the connectors necessary to access very large volumes of data stored elsewhere. Let’s take a simple architectural view of what’s described above with SAP HANA:
SAP HANA + SAP IQ
SAP HANA is a platform for modern business applications. The platform boasts many features to support the most demanding use cases. Its capabilities include a powerful in-memory database, native data processing engines (calculation, predictive, unstructured text, geospatial, etc.), connectivity to Hadoop and to other relational databases, and integration with the statistical computing language R.
What makes this platform special is its tight integration with SAP IQ, a columnar database technology obtained in the Sybase acquisition, via dynamic tiering using extended tables. In effect, HANA manages these tables as if they were local to the platform when in reality the physical data is stored in IQ on commodity hardware. This allows petabytes of data to be stored in a cost-effective manner while delivering the business-class performance users expect.
Taking this back to data temperature for a moment, this combination satisfies the top two temperatures on my Scoville scale: white hot (HANA) and hot (IQ).
Hadoop + Smart Data Access
Lastly, let’s round out the scale with our coolest (warmest?) data tier of the three: warm data. The big player here is Hadoop. Hadoop provides distributed processing of large amounts of data. Like IQ, it also uses commodity hardware. Think of this as the catchall for potentially used data that can be called upon at a moment’s notice from HANA, through Smart Data Access, to provide more history on a report or a deeper pool of reference data for an algorithm.
Business technology is the most interesting and exciting it has ever been. Organizations are using technology to transform not only their businesses but entire industries. The combination of Big Data, Predictive Analytics, Data Science and the ever-decreasing cost of storage media means that we need to revisit our preconceived notions regarding data temperature. It is clear that the technology is available to ensure that data is never regarded as “cold” again. I hope everyone considers the Scoville scale to help change the way our customers look at their data. Who knows? You just might change their fortune!