The ETL Process vs The ELT Process
Is the ETL process (Extract, Transform, Load) still a vital and relevant approach? A holdover from an era before Big Data, automation, machine learning, employee-focused service management, the Internet of Things, and AI, ETL focuses on retaining only the data deemed important, relying on business logic, decisions, and aggregation to make that call before anything is stored.
Or should it naturally give way to the ELT process (Extract, Load, Transform)? Would retaining data in its natural state, preserved in all its imperfect glory for potential use someday, be a better contemporary approach?
In many ways, shifting from ETL to ELT seems a logical and simple step in the evolution of data retention and management. However, what may seem on the surface like an insignificant change in the order of phases can have tremendous effects on data infrastructure, tooling, and other necessary resources, as well as on data management processes. In this post, we’ll take a close look at both approaches and examine their key differences.
The ETL process
To really understand the ETL process in its true context, you either need a long history in IT and a good memory, or you’ll have to build yourself a time machine and travel back to the 1970s. After all, ETL emerged in an era when data was primarily generated as a result of transactions.
As a result, the data captured by the process was largely what could be called records: records of the actions of people, the places where they did things, and what they did in those places. This data had little intrinsic value in its raw form; it needed to be analyzed and contextualized to extract its value before being stored in a data warehouse for later use.
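To make that ordering concrete, here is a minimal, hypothetical sketch of an ETL pipeline in Python. The file name, table name, and column names (transactions.csv, revenue_by_store, store_id, amount) are illustrative assumptions rather than references to any particular tool; the point is simply that the business logic runs before anything lands in the warehouse, and only the transformed summary is kept.

```python
import csv
import sqlite3

def extract(path):
    """Extract raw transaction records from a CSV export (hypothetical source)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Apply business logic up front: aggregate revenue per store."""
    totals = {}
    for row in rows:
        store = row["store_id"]
        totals[store] = totals.get(store, 0.0) + float(row["amount"])
    return totals

def load(totals, db_path="warehouse.db"):
    """Load only the transformed summary into the warehouse table."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS revenue_by_store (store_id TEXT PRIMARY KEY, total REAL)"
    )
    con.executemany(
        "INSERT OR REPLACE INTO revenue_by_store VALUES (?, ?)",
        totals.items(),
    )
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract("transactions.csv")))
```

Notice that the raw records never reach the warehouse: whatever the transform step discards is gone for good, which is exactly the trade-off ETL makes.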
The ELT process
By contrast, the ELT process captures data and preserves it in its raw form, context and all. This data is then stored in either a data warehouse or (preferably) a data lake, where it can be accessed and analyzed along with all of its context, whether the data is unstructured, semi-structured, or structured. How the data is accessed can take many forms, depending on the need behind that access.
The ELT process allows the data to be accessed later by analytics developers and data scientists approaching it from a variety of perspectives and with a variety of tools. Simply put, it preserves data in a rawer (and inherently bulkier) state and keeps everything, with the understanding that our ability to interpret that data, and the interpretations we draw from it, will naturally evolve over time.
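For comparison, here is an equally hypothetical sketch of the same records handled in ELT fashion: the records are landed untouched, and any aggregation happens later, at read time. Again, the file, table, and column names are assumptions made for illustration only.

```python
import csv
import json
import sqlite3

def extract(path):
    """Extract the same raw transaction records from the hypothetical source."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def load_raw(rows, db_path="datalake.db"):
    """Land each record as-is; nothing is filtered or aggregated yet."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS raw_transactions (record TEXT)")
    con.executemany(
        "INSERT INTO raw_transactions VALUES (?)",
        [(json.dumps(row),) for row in rows],
    )
    con.commit()
    con.close()

def transform_later(db_path="datalake.db"):
    """Transform at query time: different teams can derive different views."""
    con = sqlite3.connect(db_path)
    totals = {}
    for (record,) in con.execute("SELECT record FROM raw_transactions"):
        row = json.loads(record)
        totals[row["store_id"]] = totals.get(row["store_id"], 0.0) + float(row["amount"])
    con.close()
    return totals

if __name__ == "__main__":
    load_raw(extract("transactions.csv"))
    print(transform_later())
```

In a real deployment, the “T” step would typically be expressed as SQL or a view inside the warehouse or lake engine itself rather than in application code, but the shape is the same: load first, decide what the data means later.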
Key Differences Between ETL Process and ELT Process
The key difference between the two processes, if it isn’t already obvious, comes down to where value is assigned. ETL dictates the value of the data up front, whereas ELT preserves all of the data’s potential value. As to which process is best to employ, that largely depends on what the data are, who will need to access them, and why and how they will be accessed at a later date.
Are you getting the data insights you need across your tech stack? You can learn more about how the observability solutions built on the SolarWinds Platform are designed to unlock the full potential of your data.