Why Databricks calls CDC 'continuous data corruption' - and what it built instead
Shanku Niyogi, Vice President of Product Management at Databricks, has a new name for an old acronym. CDC, the streaming pipeline technique that has shuttled operational data into the analytics warehouse for years, is – in his words – "continuous data corruption."
The acronym actually stands for change data capture, and most data engineers will recognize where Niyogi is going. CDC pipes a copy of every change in a transactional database – the live system running orders, payments and stock – over to the analytics warehouse. That way, analysts can query yesterday's data without slowing down today's customers. It is a workaround for a 40-year-old split that exists because the two kinds of database were built for incompatible jobs. The workaround, by Niyogi's account in an interview during the week of Databricks' Data and AI Summit in San Francisco, has not aged well.
Niyogi says:
CDC was slow, and it was...
Copyright of this story solely belongs to diginomica.com.