Introduction
Reverse ETL is the concept of writing data back into business applications, using the data warehouse as a source. This approach allows you to use curated, clean data from the data warehouse to keep the data in your business applications in sync. For example, you could sync customers and invoices from your ERP to your accounting software, using the data warehouse as the in-between.
Data flow overview
The entire data flow consists of 2 phases:
- ELT pipeline: Data is synced from the source into the Peliqan data warehouse
- Reverse ETL: Data from the data warehouse is written to a business application
Phase 1 is done by adding a connection in Peliqan to the source system. This will automatically set up an ELT pipeline and sync the data into the data warehouse.
Phase 2 is done by adding a connection in Peliqan to the target system and using the Peliqan Reverse ETL app. Install the app in your Peliqan account, configure the Reverse ETL flow and set a schedule on the app for ongoing syncs. All the steps of phase 2 are described below.
You can also implement Custom Data Syncs (custom Reverse ETL flows) in Peliqan using low-code Python scripts, based on one of the provided templates. For more information, see Custom data syncs.
Prerequisites
Before using the Peliqan Reverse ETL app, make sure you have a good understanding of the following basic concepts in the Peliqan data platform:
- ELT pipelines: sync data into the data warehouse
- Data explorer: explore source data
- Data apps: low-code Python scripts in Peliqan
Before installing the Reverse ETL app, complete the following steps:
- Add a connection to the source: this will sync data into the data warehouse, which will be the source data for the Reverse ETL app.
- Add a connection to the target: this connection will be mainly used for making API calls to the target (inserts & updates) as well as optionally doing QA checks on the target data.
Install the Reverse ETL app
In Peliqan, go to the Build section (rocket icon in left).
Find the Reverse ETL app and click on Install Template.
Bookmark the app to your Peliqan homepage.
Open the app on your Peliqan homepage.
Add syncs
In the app, in the left pane, add a sync for each object type, for example a sync for “Companies” and a sync for “Contacts”.
You can change the order of the syncs under “Manage sync”. Make sure to sync parent records before child records. For example, Contacts are linked to a parent Company, so Companies need to sync first.
General tab
Sync mode
For each sync, choose the sync mode, e.g. “Insert & Update records”.
Incremental sync
Enable incremental sync, if the source data has a “timestamp last update” field. Incremental sync means that on each scheduled run, only new & updated records from the source are processed, as opposed to processing all records from the source on each run.
Peliqan pipelines always add a timestamp column _sdc_batched_at, so this is the default timestamp source column to use.
If you use a source query (scope query), make sure to include this column in your query so that it is available for the Reverse ETL app to read.
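As an illustration only (not the app's actual implementation), incremental processing comes down to filtering source rows on a stored bookmark and advancing that bookmark after each run. The function and variable names below are hypothetical:

```python
from datetime import datetime

def incremental_batch(rows, bookmark):
    """Return the rows newer than the bookmark, plus the advanced bookmark.

    rows: list of dicts that include the _sdc_batched_at timestamp
    bookmark: the highest _sdc_batched_at processed in the previous run
    """
    fresh = [r for r in rows if r["_sdc_batched_at"] > bookmark]
    # Advance the bookmark to the newest processed timestamp (or keep it)
    new_bookmark = max((r["_sdc_batched_at"] for r in fresh), default=bookmark)
    return fresh, new_bookmark

rows = [
    {"id": 1, "_sdc_batched_at": datetime(2025, 1, 1)},
    {"id": 2, "_sdc_batched_at": datetime(2025, 1, 3)},
]
# Only rows batched after the bookmark are processed on this run
fresh, bookmark = incremental_batch(rows, datetime(2025, 1, 2))
```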
Run pipeline first
Use this to make sure the source data (data warehouse tables) is up to date before running the Reverse ETL logic.
Write back of id
Use this feature to write the new id (after insert in the target) back into a field in the source. Usually a custom field is used for this. Example: sync Contacts from Hubspot to Odoo. After insert of a new Contact in Odoo, the Odoo id is written to a custom field odoo_id in Hubspot.
Note that this is optional; the ids of source and target are also stored in link tables, see below.
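The flow can be sketched as follows. This is a minimal illustration with stub API clients, not Peliqan's actual connector interface; the `insert`/`update` methods and the `odoo_id` field are assumptions taken from the example above:

```python
class StubApi:
    """Minimal stand-in for a connector client (illustration only)."""
    def __init__(self):
        self.calls = []
        self._next_id = 100
    def insert(self, obj, record):
        self._next_id += 1
        self.calls.append(("insert", obj, record))
        return self._next_id  # id assigned by the target system
    def update(self, obj, record_id, fields):
        self.calls.append(("update", obj, record_id, fields))

def sync_with_writeback(source_record, target_api, source_api):
    # 1. Insert the record in the target and capture the new target id
    new_id = target_api.insert("contacts", source_record)
    # 2. Write the target id back into a custom field in the source
    source_api.update("contacts", source_record["id"], {"odoo_id": new_id})
    return new_id

target, source = StubApi(), StubApi()
new_id = sync_with_writeback({"id": 1, "name": "John"}, target, source)
```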
Error handling
Choose whether or not the Reverse ETL run should stop if an error is encountered while writing to the target.
Enable “Alert emails” on the Reverse ETL script (under Build) to receive emails when an error occurs in a scheduled run. Note that this option is only available once a schedule is active.
2-way sync
Use this when syncing in 2 directions. In this case, one sync is configured for each direction, and each sync is linked to its “equivalent” sync in the other direction. This avoids unnecessary updates and infinite loops. Note that the “hash” feature (see below) also prevents infinite loops.
Example: syncing Companies and Contacts between Hubspot and Odoo in both directions. Syncs to configure:
- Companies - Hubspot to Odoo
- Contacts - Hubspot to Odoo
- Companies - Odoo to Hubspot
- Contacts - Odoo to Hubspot
In the above example, syncs 1 and 3 are linked under the 2-way sync configuration, as are syncs 2 and 4.
Source tab
Select a source table. See below for how to write a “source query” to transform data and map it to the target model. When you have created a source query, make sure to select it as the source table.
Target tab
Select a target. You have to configure a Target API Object, this will be used to perform writeback to the target (using Peliqan’s writeback functions in the selected Connector).
Generic Target Object
If you cannot find the Target Object that you need, you can select “Object”, if it is available in the target connector that you selected. Example:
This is a generic writeback that can be used for multiple object types. When doing so, you have to set the object name in the Field mapping tab: click the pencil icon on the left and enter the name of the object manually, for example “organizations” in the screenshot below (to insert/update organizations in the target Pipedrive):
Field mapping tab
Map fields from the source to the target.
Do not use “id” fields in the target (column on the right). Those fields are often FK relations in the target, use the “Relations (FK)” tab instead.
Static source values
Click on the edit icon (pencil) in the left column to enter a static value.
For example, enter a static value 555 that needs to be used for all records and which is mapped to a target field division_id (assuming all synced records in the target need to have division_id=555).
Custom fields
Click on the edit icon (pencil) in the right column to enter a target column name. Use this e.g. to write into a custom field in the target.
Relations tab (FKs, foreign keys, parent relations)
If you have an FK (foreign key) in the source, which is a relation to a parent object, you probably need to map that to another FK field in the target. You cannot map FK fields directly because the parent object will have another id in the source and in the target.
Here is an example where we first sync Companies and then sync Contacts. Each Contact (child) is linked to a Company (parent):
In the above example, you can see that:
- Source Company id=100 is synced to target Partner id=789.
- Source Contact id=500 is synced to target Contact id=321.
- You cannot map source company_id directly to target partner_id.
Instead, the source FK value (100) must be translated to the target FK value (789) when the Contact is synced. This is depicted as “Step 3” in the drawing above, in green.
The Reverse ETL app will perform this type of lookup when you configure the FK’s for source and target under “Relations”:
The lookup to find the FK for the target is done in the link table of the parent object (green line in the above drawing). See below for more info on link tables.
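Conceptually, the lookup works like the sketch below (an illustration, not the app's internal code), using the ids from the example above; the link table is represented here as a simple source-id-to-target-id mapping:

```python
def translate_fk(source_fk, parent_link_table):
    """Translate a source FK to the target FK via the parent's link table."""
    if source_fk not in parent_link_table:
        # The parent was not synced yet, which is why parents must sync first
        raise KeyError(f"parent {source_fk} not found in link table")
    return parent_link_table[source_fk]

# Link table of the parent sync: source Company 100 -> target Partner 789
companies_link = {100: 789}

source_contact = {"id": 500, "company_id": 100}
# The Contact's company_id (100) becomes partner_id 789 in the target payload
target_payload = {"partner_id": translate_fk(source_contact["company_id"], companies_link)}
```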
Data tab
Test the Reverse ETL sync
You can test the sync under “Data” by selecting one source record and performing a manual sync. If there are errors, adjust the field mapping and test again.
View the status of a sync
View the status of each sync under “Status”. This shows the link table, which contains all source ids that have been synced, the timestamps, and errors if any.
Custom code tab
Examples to use custom code:
- Write nested JSON to the target API
- Conditional updates
- Etc.
Write nested JSON to target API
The Reverse ETL app performs API calls to the target for inserts and updates. The JSON payload is a flat object with keys and values, as defined in the field mapping. Example insert:
POST https://mycrm.com/api/contacts

```json
{
  "name": "John Doe",
  "email": "john@doe.com",
  "phone": "+1 555 6666"
}
```

Custom Python code can be used to write a nested JSON structure. Example custom code:
```python
def transform(source_record, target_record, pq, target_id=None):
    # Add nested data
    target_record["address"] = {
        "street": source_record["address_street"],
        "city": source_record["address_city"],
    }
    # For updates
    if target_id:
        target_record["id"] = target_id
    return target_record
```

Example of the final insert:
POST https://mycrm.com/api/contacts

```json
{
  "name": "John Doe",
  "email": "john@doe.com",
  "phone": "+1 555 6666",
  "address": {
    "street": "1 Somestreet",
    "city": "NY"
  }
}
```

Note: sending nested JSON can also be done with a JSON column in a source query (without using custom code).
Conditional updates
Example custom code for conditional updates:
```python
def transform(source_record, target_record, pq, target_id=None):
    final_payload = target_record
    if target_id:  # meaning an update (not an insert)
        if source_record["status"] == "deleted":
            return None  # returning None makes sure the record is skipped
        else:
            final_payload["id"] = target_id
    return final_payload
```

Link tables
The Reverse ETL app creates a link table under schema link_tables for each sync. The link table keeps track of source ids and target ids. Example of a link table:
| source_id | target_id | status  | First sync | Last sync |
|-----------|-----------|---------|------------|-----------|
| 100       | 567       | Success | …          | …         |
| 200       | 890       | Success | …          | …         |
Possible values for “status”:
- Success
- Error
- Skipped - “Same hash”: a hash is calculated before every update in the target, based on the input fields. If the hash is identical to the previous update, the update is skipped. This is to avoid unnecessary updates, especially when the source timestamp “last update” is increased but not because of fields that are in scope for the sync.
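The hash check described above can be illustrated as follows. This is a sketch, not the app's actual implementation; the hash algorithm (SHA-256 over the JSON-serialized payload) is an assumption:

```python
import hashlib
import json

def payload_hash(payload):
    """Stable hash of the mapped input fields (key order does not matter)."""
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def should_update(payload, previous_hash):
    # Skip the update when nothing in scope changed since the previous run,
    # even if the source "timestamp last update" moved for other reasons.
    return payload_hash(payload) != previous_hash

payload = {"name": "John Doe", "email": "john@doe.com"}
previous_hash = payload_hash(payload)
```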
Seeding the link tables
The Reverse ETL app performs an insert in the target for each source record, and subsequently performs updates for these source ids. However, the Reverse ETL app does not “match” source records with pre-existing target records. For example, if you sync companies from a source to a target, the Reverse ETL app will ignore Companies that are already present in the target (not inserted by the Reverse ETL app). To take these into account, you can add those target records to the link table first. You could e.g. match source & target companies on VAT number and insert the matching rows into the link table.
Example query:
```sql
WITH matching_companies AS (
  SELECT
    source.id AS source_id,
    target.id AS target_id
  FROM hubspot.companies AS source
  INNER JOIN odoo.partners AS target
    ON source.vat = target.vat
)
INSERT INTO link_tables.companies_1234 (source_id, target_id, status)
SELECT source_id, target_id, 'matched' AS status
FROM matching_companies;
```

Settings
Sync order
Make sure to put parent object types above their children, so they sync first. For example sync Customers first, then Invoices that are linked to Customers.
Schedule
Once the syncs are working fine, you can add a daily schedule to the Reverse ETL app.
The Reverse ETL app will run in the background and process source records for each sync. If the sync is configured as incremental, the source data will be processed incrementally, based on the timestamp last update that is configured.
Set a custom schedule
You can also set a custom schedule (e.g. every 6 hours). Go to the Build section in Peliqan (rocket icon in the left pane), open the app and set a schedule. Make sure to use incremental syncs when you configure a daily or hourly schedule.
Import/export
Use this to backup your entire Reverse ETL configuration as JSON and to move configurations between Peliqan accounts.
Writing a source query (source view)
You can write a source query (view) to prepare data for mapping to the target. Write the query first, and next use it as the source table in the Reverse ETL app.
Examples of transformations that are typically handled in a source query:
- Transform date formats from the source format to the target format.
- Casting of data types from the source to the target data type
- Combining data from multiple source tables into a single source query
- Mapping of values from source to target
- Adding a column with a static value for fields that need to have a static value in the target
Below is an example where data from a source table contacts is mapped to a target table people.
Example source table contacts:

| id | firstname | lastname | country_code | _sdc_batched_at     |
|----|-----------|----------|--------------|---------------------|
| 1  | Bob       | Davis    | FR           | 2025-01-01 00:00:00 |
| 2  | Anne      | Crome    | UK           | 2025-01-01 00:00:00 |
| 3  | ...       |          |              |                     |

Example target table people (id not shown, will be set automatically on insert):

| fullname   | country        | category |
|------------|----------------|----------|
| Bob Davis  | France         | VIP      |
| Anne Crome | United Kingdom | VIP      |

Example query sync_contacts_source to use as the source table in the Reverse ETL app:
Nested data in a source query
Nested data can be created in a source query and mapped to a JSON field in the target. An example is invoice lines for invoices.
Notes: You can often also sync child objects (e.g. invoice lines) as a separate Object Type. Sync the parents first (e.g. invoices), next sync the children (e.g. invoice lines) with an FK relation (e.g. invoice_id). You can also handle nested JSON with custom code.
Below is an example source query for sales invoices from Odoo, which creates a JSON array with invoice line objects in one column. This column can be mapped to a field “invoice lines” in the target (e.g. Exact Online). Make sure to use the correct keys in the invoice line objects, as needed by the target.
Example result (some columns omitted):
| id | date       | invoice_lines_json                                                         |
|----|------------|----------------------------------------------------------------------------|
| 1  | 2025-01-01 | [ { "Quantity": 4, "Name": "Shoes" }, { "Quantity": 1, "Name": "Socks" } ] |
| 2  | 2025-01-02 | [ { "Quantity": 3, "Name": "Shoes" } ]                                     |
Make sure the column is of type “json” (jsonb) in the source query and not a text field (varchar). Example where a string representation of JSON is cast to a json column:

```sql
SELECT '{"name": "An", "id": 12}'::jsonb FROM my_table
```

Mapping “enum” / select fields
Enum fields (or “Single select fields”) are fields that have a value from a preselected list. For example a field “size” with possible values (options) “Small”, “Medium”, “Large”.
The source options often have to be mapped to different options in the target. Example:
| Source option | Target option |
|---------------|---------------|
| Extra small   | S             |
| Small         | S             |
| Medium        | M             |
| Large         | L             |
This can be accomplished with a scope query (used as the source).
Static mapping
New options must be added manually.
Example scope query:
```sql
SELECT
  c.id,
  c.name,
  c.email,
  c.company,
  CASE
    WHEN c.customfield = 'Option A internal name'
      THEN 'Option A value'
    WHEN c.customfield = 'Option B internal name'
      THEN 'Option B value'
    WHEN c.customfield = 'Option C internal name'
      THEN 'Option C value'
    ELSE NULL
  END AS target_option
FROM hubspot.contacts AS c
```

Dynamic mapping
New options are automatically mapped.
The example below assumes that the options are stored in a separate table, that each option has both a “label” and a “value”, and that the labels in source & target match.
Source table field_options:

| Label     | Value |
|-----------|-------|
| Section A | 100   |
| Section B | 200   |

Target table field_options:

| Label     | Value |
|-----------|-------|
| Section A | 333   |
| Section B | 444   |
Example scope query:
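The scope query would join the two field_options tables on label. As a sketch of the equivalent lookup logic (using the example values above; the actual scope query would express the same joins in SQL):

```python
source_options = {"Section A": 100, "Section B": 200}  # label -> value in the source
target_options = {"Section A": 333, "Section B": 444}  # label -> value in the target

# Invert the source mapping so a source value resolves to its label,
# then look the label up in the target options. New options map automatically
# as soon as they appear in both tables with a matching label.
source_value_to_label = {value: label for label, value in source_options.items()}

def map_option(source_value):
    label = source_value_to_label.get(source_value)
    return target_options.get(label)

mapped = map_option(100)  # source value 100 ("Section A") -> target value 333
```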
QA checks
You can compare source and target data to check if all data was synced as expected. Make sure to run the pipeline of the target connection before performing a QA compare.
Write a JOIN query that compares source & target data, using the link table as in-between:
source table —> link table —> target table
Example: show source records with a missing target record:

```sql
SELECT
  source_table.id
FROM hubspot.contacts AS source_table
LEFT JOIN link_tables.contacts_1234 AS link_table
  ON source_table.id = link_table.source_id
LEFT JOIN odoo.partners AS target_table
  ON link_table.target_id = target_table.id
WHERE
  target_table.id IS NULL
```

Project scoping
Make sure to plan your Reverse ETL app projects properly. Example scoping:
| Task | Description | Number of hours |
|------|-------------|-----------------|
| Project management | Planning, documentation, status calls etc. | 5 - 20 hours |
| Cleaning of source data | Make sure required fields are always filled in etc. | Depends on data quality |
| Write requirements | Objects to include, fields to map per object, required business logic etc. | 5 - 20 hours |
| Write scope queries | Define input data per sync | 5 - 40 hours |
| Configure Reverse ETL app | Field mapping etc. | 5 - 20 hours |
| Testing | Run tests in Reverse ETL app for individual records | 5 - 20 hours |
| Monitoring | Check for errors in scheduled runs | Ongoing |
| Bug fixing | Fix edge cases etc. | 5 - 40 hours |
| TOTAL | | 5 - 20 days |
FAQ & Troubleshooting
My record was not synced. Why?
- Is it included in your source table?
- If you use a source query, is the record included in this query?
- Is it skipped because of incremental sync (e.g. the bookmark in Reverse ETL is higher than _sdc_batched_at in the source table)?
- Is the record in the link table (source_id)? If yes, is it in error status?
My record was synced once, but not updated in the target. Why?
- Did you change the scope of the source query? Is the record still included in the scope of the source query?
- Is it skipped because of incremental sync (e.g. the bookmark in Reverse ETL is higher than _sdc_batched_at in the source table)?