
It's unfortunate that "ETL" stuck in mindshare, as afaik almost all use cases are better with "ELT"

I.e. first preserve your raw upstream via a 1:1 copy, then transform/materialize as makes sense for you, before consuming

Which makes sense, as ELT models are essentially agile for data... (solution for not knowing what we don't yet know)



I think ETL is right from the perspective where E refers to “from the source of data” and L refers to “to the ultimate store of data”.

But the ETL functionality should itself live in a (sub)system that has its own logical datastore (which may or may not be physically separate from the destination store), and things should be ELT where the L is with respect to that store. So, it's E(LTE)L, in a sense.


For those confused as to whether ETL or ELT is ultimately more appropriate for you… almost everyone is really just doing ETLTLTLT or ELTLTLTL anyways. The distinction is really moot.


Maybe my understanding is incorrect, but to expand on the distinction:

Assumptions -- We're talking about two separate systems (source and destination) with non-negligible transfer time (although perhaps "quick")

ETL -- Performing the transform before/during the load, such that fields in the destination are not guaranteed to have existed in the source (i.e. 2 db model)

ELT -- Performing a 1:1 copy of source into an intermediary table/db (albeit perhaps with filtering), then performing a transform on the intermediary table/db to generate the destination table/db (either realized or materialized at query time), with the intermediary table/database history retained (i.e. 3 table/db model)

In short, the distinction: if regenerating or altering the destination is required, ETL relies on history being available in the upstream source.

ELT pulls control of that to the destination-owner, as they're retaining the raw data on their side.
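
To make the 3 table/db model concrete, here's a rough sketch of ELT using an in-memory SQLite database. Everything named here (SOURCE_ROWS, raw_orders, customer_summary, the two helper functions) is made up for illustration; a real pipeline would pull from an actual upstream system and would likely append to the raw table with load timestamps.

```python
import sqlite3

# Hypothetical upstream rows; in practice these come from the source system.
SOURCE_ROWS = [(1, "alice", "12.50"), (2, "bob", "7.25"), (3, "alice", "3.00")]


def extract_load(conn, rows):
    """E + L: copy the source 1:1 into a raw intermediary table, untransformed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders (id INTEGER, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform(conn, select_sql):
    """T: (re)build the destination table from the retained raw copy."""
    conn.execute("DROP TABLE IF EXISTS customer_summary")
    conn.execute("CREATE TABLE customer_summary AS " + select_sql)
    return conn.execute(
        "SELECT customer, value FROM customer_summary ORDER BY customer"
    ).fetchall()


conn = sqlite3.connect(":memory:")
extract_load(conn, SOURCE_ROWS)

# First version of the destination: total spend per customer.
totals_v1 = transform(
    conn,
    "SELECT customer, SUM(CAST(amount AS REAL)) AS value "
    "FROM raw_orders GROUP BY customer",
)

# Requirements change later: regenerate from raw_orders, without going
# back to the upstream source at all.
totals_v2 = transform(
    conn,
    "SELECT customer, COUNT(*) AS value FROM raw_orders GROUP BY customer",
)

print(totals_v1)
print(totals_v2)
```

In the ETL version of this, the SUM/CAST would happen before the insert and raw_orders would never exist on the destination side, so the second transform would need a fresh pull from upstream.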



