Every company wants to produce valuable insights from its data, but not all companies are ready or able. Too often, they believe the marketing pitch that insights are just a few clicks away: no code, just connectors. Set it and forget it, and all the hard work is done, right? The hard truth is that there are a number of critical steps all data must go through before teams get what they actually want: valuable insights.
Centralizing data is, in many respects, the easy part. There are more options than ever for quickly and easily building pipelines that ingest data from hundreds of sources and load it into a data warehouse or data lake. However, one important topic gets overlooked in today's tool marketing: making the transition from centralized data to valuable insight. That is the hard work of data science. Valuable insights are not free, and companies only earn them by putting their data through its paces.
The journey from "we have data" to "we have value" is complex and can be grouped into four key steps:
- Extract and Load: centralize data from multiple sources into a data warehouse
- Transform: make sense of the data by transforming it: cleaning, combining, and aggregating data from different sources
- Learn: build machine learning models now that the data is understood; technically, "learning" is just a more advanced form of transformation
- Serve: send insights and predictions to applications and other systems to realize the value of the data
These steps represent a progression of data-specific work, not a collection of processes or tools. Let's talk about what it takes to move from "Extract and Load" to "Serve," and what it means to "put data through its paces."
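Before diving in, a minimal mental model can help. The Python sketch below uses purely illustrative function names and in-memory stand-ins, not any real tool's API, to show how the four steps chain together, from raw source records to an insight pushed into an application.

```python
# A minimal end-to-end skeleton of the four steps, using in-memory stand-ins
# so the shape of the progression is clear. Every name here is illustrative.

def extract_and_load(sources):
    """Extract and Load: centralize raw records from each source."""
    return {name: list(rows) for name, rows in sources.items()}

def transform(raw):
    """Transform: clean and combine raw records into one usable table."""
    return [row for rows in raw.values() for row in rows if row.get("user_id")]

def learn(metrics):
    """Learn: a stand-in for model training on the transformed data."""
    return {"avg_events": sum(r["events"] for r in metrics) / max(len(metrics), 1)}

def serve(model, app_callback):
    """Serve: push the model's output into a downstream application."""
    app_callback(model)

sources = {"postgres": [{"user_id": 1, "events": 42}, {"user_id": None, "events": 3}]}
serve(learn(transform(extract_and_load(sources))), print)
```

The examples later in this post sketch what more realistic versions of each step can look like.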
The task of a data team is to build a data stack that supports business needs; we frame this as the "data-driven company's hierarchy of needs." The hierarchy looks like this:
Business-critical data lives in dozens, if not hundreds, of different sources. In some cases, providers such as Datacoral and Fivetran offer connectors to move that data into centralized warehouses and lakes. In others, engineering teams build these pipelines with custom code. But even once the data has landed, it takes expertise to figure out what it actually means. Companies can't just point at the data and start serving insights that predict and improve business outcomes. Foundational work must be done first.
That foundational work in data science includes experimentation, hypothesis testing, normalization, cleaning, transformation, visualization, analysis, and more. Insights cannot be served without a deep understanding of the data. Let's look at how this typically plays out.
Imagine a small startup that has just signed its first few customers. It has product usage data stored in a PostgreSQL database, and it has started thinking about analytics and getting value from that data. Right now, the startup sits at the bottom of the data hierarchy. The first step up is to extract data from its database and other sources and load it into a data warehouse such as Snowflake.
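As a rough illustration of that "Extract and Load" step, here is a minimal Python sketch that pulls usage events out of PostgreSQL and loads them into Snowflake. The connection details, table names, and column names are assumptions for illustration; a managed connector or a custom pipeline would do the same job in practice.

```python
# A minimal extract-and-load sketch: read a table from PostgreSQL and load it
# into a Snowflake landing table. All identifiers below are placeholders.
import pandas as pd
import psycopg2
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Extract: read raw product-usage events from the operational database.
with psycopg2.connect("host=localhost dbname=app user=readonly") as pg_conn:
    events = pd.read_sql(
        "SELECT user_id, event_name, occurred_at FROM usage_events", pg_conn
    )

# Load: copy the rows into the warehouse.
sf_conn = snowflake.connector.connect(
    account="my_account", user="loader", password="...",
    warehouse="LOAD_WH", database="RAW", schema="PRODUCT",
)
try:
    write_pandas(sf_conn, events, table_name="USAGE_EVENTS", auto_create_table=True)
finally:
    sf_conn.close()
```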
Once the data is in the warehouse, the team can start exploring and refining it with simple transformations. Because the team is small, this work is likely done as a side project by one of the software engineers (or a highly technical product manager), resulting in a small dashboard that shows how users are actually using the startup's product. At this point, the startup has not only extracted and loaded data from its sources but also transformed it to produce product-specific KPIs and metrics. That alone creates significant value. As the team grows in size and sophistication, it can begin building machine learning models and serving data into applications.
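A simple "Transform" step at this stage might look something like the sketch below: a hypothetical weekly-active-users metric computed from the raw events table. The column names are assumptions, and in practice this logic often lives as SQL inside the warehouse rather than in Python.

```python
# A minimal "Transform" sketch: turn raw usage events into a simple KPI
# (weekly active users). Table and column names are illustrative assumptions.
import pandas as pd

def weekly_active_users(events: pd.DataFrame) -> pd.DataFrame:
    """Count distinct users per week from a cleaned events table."""
    events = events.dropna(subset=["user_id", "occurred_at"]).copy()
    events["occurred_at"] = pd.to_datetime(events["occurred_at"])
    events["week"] = events["occurred_at"].dt.to_period("W").dt.start_time
    return (
        events.groupby("week")["user_id"]
        .nunique()
        .rename("weekly_active_users")
        .reset_index()
    )
```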
Now imagine a larger, more mature company. This company already has connectors collecting data from various sources, and that data is already flowing into the data warehouse. The team has spent months exploring and refining its data, and it has simple dashboards that show key metrics. The team trusts the quality of the data enough to rely on those metrics for internal reporting. It now identifies use cases for predictive analytics and decides to make its first data science hire. The new data scientist can rely on clean, well-understood data in the warehouse and help the team climb the data hierarchy by training its first ML model.
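To make that "Learn" step concrete, here is a hedged sketch of what a first model might look like: a customer lifetime value regression trained with scikit-learn on features already prepared in the warehouse. The feature names, the exported file, and the model choice are illustrative assumptions, not a recommendation.

```python
# A minimal "Learn" sketch: train a first customer lifetime value (LTV) model
# on features exported from the warehouse. All names are placeholders.
import joblib
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

FEATURES = ["tenure_days", "seats", "monthly_events", "support_tickets"]
customers = pd.read_csv("customer_features.csv")  # assumed warehouse export

X_train, X_test, y_train, y_test = train_test_split(
    customers[FEATURES], customers["ltv_usd"], test_size=0.2, random_state=42
)

model = GradientBoostingRegressor(random_state=42)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))

joblib.dump(model, "ltv_model.joblib")  # hand the artifact to the serving layer
```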
Once the ML model is in production, the team can serve both internal and external use cases: customer lifetime value (LTV) estimates for the sales and marketing teams, or time-saving features for customers. At this point, the team has successfully navigated the data hierarchy; it has gone all the way from raw data locked in source systems to serving insights inside its applications. There is always more work to do to scale to more data sources, new use cases, and larger data volumes, but this is a successful data journey that very few companies manage to complete.
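For the "Serve" step, here is a minimal sketch of exposing that LTV model to other systems, assuming a small Flask endpoint and the model artifact from the previous example. The field names and framework choice are assumptions; in practice this could just as easily be a batch job writing scores back into a CRM or a feature embedded in the product.

```python
# A minimal "Serve" sketch: expose the trained LTV model behind an HTTP
# endpoint so other systems can request predictions. Names are illustrative.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("ltv_model.joblib")  # artifact from the "Learn" step
FEATURES = ["tenure_days", "seats", "monthly_events", "support_tickets"]

@app.route("/ltv", methods=["POST"])
def predict_ltv():
    payload = request.get_json()
    row = [[payload[name] for name in FEATURES]]
    return jsonify({"ltv_usd": float(model.predict(row)[0])})

if __name__ == "__main__":
    app.run(port=8080)
```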
The idea of building out a data stack can be daunting. Using the data hierarchy of needs as a general design framework can help put the team's mind at ease. It is tool-agnostic and open-ended, and it keeps the focus on high-level design and the implementation process.
The first layer of the framework ("Extract and Load") covers centralizing data into the warehouse or lake. The second layer ("Transform") covers combining data from multiple sources and finding meaningful insights. The third layer ("Learn") builds on those insights with machine learning models and artificial intelligence. With the fourth layer ("Serve"), the output of that AI/ML work delivers value to internal and external services, customers, dashboards, and more.
By identifying which parts of the data stack map to each layer of the framework, a data team can take a principled approach to generating value. This is not a one-to-one mapping, either; some data tools can serve multiple layers of the framework. For example, Fivetran can work in the "Extract and Load" and "Transform" layers, Datacoral can work across all four layers, and custom code can serve the "Extract and Load" and "Learn" layers. The main takeaway is to use the framework to simplify the design, rather than starting from a pile of tools and trying to work out where the value is.
Moving from "Extract and Load" to "Serve" is what we call "putting data through its paces."
We often see confusion arise from a tool-driven mindset instead of a data-hierarchy mindset. The number of different components and interconnections can seem astronomical. (See data quality for more on this.) Finding the right combination for specific needs is overwhelming for small data teams just starting out, and there is a lot at stake: the web of tools they choose must survive and support the company's growth over the next two or three years, and the capabilities of those tools must scale with that growth.
Companies that want to generate valuable insights need to put their data through its paces. The data hierarchy of needs provides the "what," and, as we examined in previous posts on simplifying the modern data stack and on using metadata capabilities to future-proof that stack, the metadata-first, three-layer framework provides the "how."
The data flow layer of that framework represents the transition from "data" to "value." But that is just the beginning. Together, these ideas provide a way to think objectively and abstractly about your data.
The hierarchy makes it clear that there are many dependencies along the path from "having data" to gaining valuable insights from that data. This is how data-driven companies operate; it is no accident. They put their data through its paces, and that takes expertise and experimentation.
Examining the data stack through the hierarchy of needs allows every data team to develop a meaningful understanding of where it sits in the hierarchy, to assess the strengths and weaknesses of its current position, and to plan the move to the next step.
If you're looking for examples of putting data through its paces in practice, we have two stories to share. Enterprise talent acquisition platform Greenhouse climbed the data hierarchy to serve ML models that improved its interview scheduling experience and solved its "pointillist spaghetti data problem." And gig-economy startup Jyve used its insights to guide the "Jyvers" on its platform and expand into new markets. Both of these use cases show how a little data can go a long way!