DataBus (data synchronization component)

Databus is a low-latency, reliable change data capture system that supports transactions and preserves consistency. LinkedIn open sourced it in 2013. By mining database logs, Databus captures changes from the database reliably and in real time. Applications receive these changes in real time through a customized client and apply their own business logic to them.

Databus has the following characteristics:

Isolation between data sources and consumers.
Data delivery is guaranteed to be in order and at least once, with high availability.
Consumption can start from any point in the change stream, including a full bootstrap of the entire data set.
Consumption can be partitioned while preserving the consistency of the source; if consumption fails, it is retried until it succeeds.
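
The ordering and at-least-once guarantees above can be illustrated with a minimal sketch. It is not Databus code: `ChangeEvent`, `deliver`, the `seq` field, and the `max_attempts` bound are all hypothetical names chosen for the illustration. The idea shown is that the checkpoint only advances after the consumer callback succeeds, so a failed event is retried and may be seen more than once, while later events are never delivered ahead of it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ChangeEvent:
    seq: int       # hypothetical, monotonically increasing sequence number
    payload: str


def deliver(events: List[ChangeEvent],
            callback: Callable[[ChangeEvent], bool],
            checkpoint: int = 0,
            max_attempts: int = 5) -> int:
    """Deliver events with seq > checkpoint in order; return the new checkpoint.

    The checkpoint advances only after the callback reports success, so a
    failed event is retried and may be observed more than once.
    """
    for event in sorted(events, key=lambda e: e.seq):
        if event.seq <= checkpoint:
            continue  # consumed before the last checkpoint
        for _ in range(max_attempts):
            if callback(event):
                checkpoint = event.seq
                break
        else:
            # Stop here so that later events are not delivered out of order.
            return checkpoint
    return checkpoint
```

Because an event can be delivered more than once, a consumer callback in such a design would normally be written to be idempotent.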

2. Functions & Features

Source independence: Databus supports change capture from multiple data sources, including Oracle and MySQL.
Scalable and highly available: Databus can scale to thousands of consumers and transactional data sources while remaining highly available.
In-order transactional delivery: Databus preserves the transactional integrity of the source database and delivers change events grouped by transaction, in the source's commit order.
Low latency and support for multiple subscription mechanisms: After the data source changes, Databus can deliver the change to consumers within milliseconds. Consumers can also use the server-side filtering function in Databus to receive only the specific data they need.
Unlimited lookback: Consumers can look back arbitrarily far in the change stream. For example, when a consumer needs to produce a complete copy of the data, it can do so without placing any additional load on the database. This capability also helps when a consumer's data has fallen significantly behind the source database.
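
As a rough illustration of in-order transactional delivery combined with server-side filtering, the following sketch groups row changes by transaction and emits the groups in commit order, keeping only the rows a subscriber asked for. The names `RowChange`, `commit_seq`, and `transaction_windows` are hypothetical and are not part of the Databus API.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class RowChange:
    commit_seq: int   # hypothetical commit sequence number of the transaction
    table: str
    key: int


def transaction_windows(changes: Iterable[RowChange],
                        keep: Callable[[RowChange], bool]
                        ) -> List[Tuple[int, List[RowChange]]]:
    """Group changes by transaction, in commit order, after filtering."""
    # sorted() is stable, so rows keep their original order within a transaction.
    ordered = sorted(changes, key=lambda c: c.commit_seq)
    windows = []
    for seq, group in groupby(ordered, key=lambda c: c.commit_seq):
        wanted = [c for c in group if keep(c)]
        if wanted:  # drop transactions with nothing the subscriber wants
            windows.append((seq, wanted))
    return windows
```

A subscriber interested only in a hypothetical `orders` table would pass `lambda c: c.table == "orders"` as the filter and receive one group per transaction that touched that table.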

3. The overall system architecture and main components

3.1 Overall system architecture

[Figure: overall Databus system architecture]

The figure above shows the composition of the Databus system: Relays, the Bootstrap Service, and the Client Lib. The Bootstrap Service consists of the Bootstrap Producer and the Bootstrap Server. Consumers that keep up with the change stream take events directly from the Relay. If a consumer lags so far behind that the data it needs is no longer in the Relay's log, it sends a request to the Bootstrap Service instead, which returns a snapshot of all data changes since the last change the consumer processed.
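
The choice between the Relay and the Bootstrap Service can be sketched as follows. This is a simplified model rather than Databus code: `Relay`, `choose_source`, and the `(sequence number, payload)` event shape are hypothetical, and the sketch assumes sequence numbers start at 1 and are contiguous. The Relay keeps only its most recent events in a bounded in-memory buffer, so a consumer whose checkpoint is older than the oldest buffered event has to go to the bootstrap path.

```python
from collections import deque
from typing import Deque, List, Tuple

Event = Tuple[int, str]  # (sequence number, payload); hypothetical shape


class Relay:
    """Keeps only the most recent `capacity` events in memory."""

    def __init__(self, capacity: int) -> None:
        self._buffer: Deque[Event] = deque(maxlen=capacity)

    def append(self, seq: int, payload: str) -> None:
        self._buffer.append((seq, payload))  # oldest event is evicted when full

    def oldest_seq(self) -> int:
        return self._buffer[0][0] if self._buffer else 0

    def events_after(self, checkpoint: int) -> List[Event]:
        return [e for e in self._buffer if e[0] > checkpoint]


def choose_source(relay: Relay, checkpoint: int) -> str:
    """Return "relay" if the next event the consumer needs is still buffered."""
    oldest = relay.oldest_seq()
    if oldest == 0 or checkpoint + 1 >= oldest:
        return "relay"
    return "bootstrap"
```

With a capacity of 3 and events 1 to 5 appended, the buffer holds events 3 to 5, so a consumer at checkpoint 2 can still read from the Relay while a consumer at checkpoint 1 has to bootstrap.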

Source Databases: MySQL and Oracle data sources
Relays: Capture and store database changes. Storage is fully in memory and can also be configured to use memory-mapped (mmap) files.
Schema Registry: A mapping table from database data types to Databus data types.
Bootstrap Service: A special client whose function is similar to that of the Relays. It stores database changes, mainly on disk.
Application: The logic that consumes database changes; it pulls changes from the Relay and consumes them.
Client Lib: Provides APIs that let the consumption logic select and subscribe to the changes it cares about.
Consumer Code: The change-consumption logic, which can either consume the changes itself or forward them to downstream services.

3.2 Main components and functions

The overall architecture shown in the figure above is relatively simple. After downloading the source code and looking at the project structure, it is easy to see that Databus consists mainly of the following four components:

Databus Relay:
Reads changed rows from the Databus sources in the source database, serializes them as Databus change events, and stores them in an in-memory buffer.
Listens for requests from Databus clients (including the Bootstrap Producer) and sends them Databus data change events.
Databus Client:
Checks the Relay for new data change events and invokes the business-logic callbacks that handle them.
If it falls too far behind the Relay, runs a lookback query against the bootstrap service.
A single client can process the entire Databus stream, or clients can form a cluster in which each client processes a portion of the stream.
Databus Bootstrap Producer:
Is simply a special client.
Checks the Relay for new data change events.
Stores the data change events in a MySQL database, which is used for bootstrapping clients and for lookback.
Databus Bootstrap Server:
Listens for requests from Databus clients and returns long-lookback data change events for bootstrapping and catch-up.
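
The division of labor between the Bootstrap Producer and the Bootstrap Server can be sketched with a small in-memory stand-in for the MySQL store. Everything here is an assumption made for illustration: `BootstrapStore`, `on_event`, `changes_since`, and the `(sequence number, row key, new value)` event shape are hypothetical, and returning only the latest value per row is one plausible reading of "a snapshot of all data changes", not a statement of how Databus implements it.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, str, str]  # (sequence number, row key, new value); hypothetical


class BootstrapStore:
    """In-memory stand-in for the store filled by the Bootstrap Producer."""

    def __init__(self) -> None:
        self._log: List[Event] = []

    def on_event(self, event: Event) -> None:
        # Producer side: called for each event read from the Relay,
        # assumed to arrive in sequence-number order.
        self._log.append(event)

    def changes_since(self, checkpoint: int) -> Dict[str, Tuple[int, str]]:
        # Server side: latest value per row among events newer than checkpoint.
        latest: Dict[str, Tuple[int, str]] = {}
        for seq, key, value in self._log:
            if seq > checkpoint:
                latest[key] = (seq, value)  # later events overwrite earlier ones
        return latest
```

A client that was last at checkpoint 0 would call `changes_since(0)` to obtain the current state of every changed row, then resume reading newer events from the Relay.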