Reverse ETL

Introduction

Reverse ETL is the concept of writing data back into business applications, using the data warehouse as a source. This approach lets you use curated, clean data from the data warehouse to keep data in business applications in sync. For example, you could sync customers and invoices from your ERP to your accounting software, using the data warehouse as the intermediary.

Data flow overview

The entire data flow consists of 2 phases:

  1. ELT pipeline: Data is synced from the source into the Peliqan data warehouse
  2. Reverse ETL: Data from the data warehouse is written to a business application
image

Phase 1 is done by adding a connection in Peliqan to the source system. This will automatically set up an ELT pipeline and sync the data into the data warehouse.

Phase 2 is done by adding a connection in Peliqan to the target system and using the Peliqan Reverse ETL app. Install the app in your Peliqan account, configure the Reverse ETL flow and set a schedule on the app for ongoing syncs. All the steps of phase 2 are described below.

image

You can also implement Custom Data Syncs (Custom Reverse ETL flows) in Peliqan using low-code Python scripts, based on one of the provided templates. For more information visit:

Custom data syncs

Prerequisites

Before using the Peliqan Reverse ETL app, make sure you have a good understanding of the following basic concepts in the Peliqan data platform:

  • ELT pipelines: sync data into the data warehouse
  • Data explorer: explore source data
  • Data apps: low-code Python scripts in Peliqan

Before installing the Reverse ETL app, complete the following steps:

  1. Add a connection to the source: this will sync data into the data warehouse, which will be the source data for the Reverse ETL app.
  2. Add a connection to the target: this connection will be mainly used for making API calls to the target (inserts & updates) as well as optionally doing QA checks on the target data.

Install the Reverse ETL app

In Peliqan, go to the Build section (rocket icon in the left pane).

Find the Reverse ETL app and click on Install Template.

image

Bookmark the app to your Peliqan homepage.

image

Open the app on your Peliqan homepage.

Add syncs

In the app, in the left pane, add a sync for each object type, for example a sync for “Companies” and a sync for “Contacts”.

image

You can change the ordering of syncs under “Manage sync”. Make sure to sync parent records first, before syncing child records. For example, Contacts are linked to a parent Company, so Companies need to sync first.

General tab

Sync mode

For each sync, choose the sync mode, e.g. “Insert & Update records”.

Incremental sync

Enable incremental sync if the source data has a “timestamp last update” field. Incremental sync means that on each scheduled run, only new and updated records from the source are processed, as opposed to processing all records from the source on each run.

Peliqan pipelines always add a timestamp column _sdc_batched_at, so this is the default timestamp source column to use.

If you use a source query (scope query), make sure to include this column in your query so that it is available for the Reverse ETL app to read.
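
Conceptually, an incremental run only picks up rows whose timestamp is newer than the bookmark saved after the previous run. A minimal sketch of that filter (the table name and bookmark value are just examples):

SELECT *
FROM hubspot.contacts
WHERE _sdc_batched_at > '2025-01-01 10:00:00'   -- bookmark stored after the previous run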

image

Run pipeline first

Use this to make sure the source data (data warehouse tables) is up to date before running the Reverse ETL logic.

Write back of id

Use this feature to write the new id (after insert in the target) back into a field in the source. Usually a custom field is used for this. Example: sync Contacts from Hubspot to Odoo. After insert of a new Contact in Odoo, the Odoo id is written to a custom field odoo_id in Hubspot.

Note that this is optional: the ids of source and target are also stored in link tables (see below).

image

Error handling

Choose whether or not the Reverse ETL run should stop if an error is encountered while writing to the target.

Enable “Alert emails” on the Reverse ETL script (under Build) to receive emails when an error occurs in a scheduled run. Note that this option is only available once a schedule is active.

2-way sync

Use this when syncing in two directions. In this case, one sync is configured for each direction and each sync is linked to its “equivalent” sync in the other direction. This avoids unnecessary updates and infinite loops. Note that the “hash” feature (see below) also avoids infinite loops.

Example: syncing Companies and Contacts between Hubspot and Odoo in both directions. Syncs to configure:

  1. Companies - Hubspot to Odoo
  2. Contacts - Hubspot to Odoo
  3. Companies - Odoo to Hubspot
  4. Contacts - Odoo to Hubspot

In the above example, syncs 1 and 3 are linked under the 2-way sync configuration, as are syncs 2 and 4.

Source tab

Select a source table. See below for how to write a “source query” in order to transform data and map it to the target model. When you have created a source query, make sure to select it as the source table.

image

Target tab

Select a target. You have to configure a Target API Object, which will be used to perform writebacks to the target (using Peliqan’s writeback functions in the selected connector).

image

Generic Target Object

If you cannot find the Target Object that you need, you can select “Object”, if it is available in the selected target connector. Example:

image

This is a generic writeback that can be used for multiple object types. When using it, you have to set the object name in the Field Mapping tab. To do so, click on the pencil icon on the left and enter the name of the object manually, for example “organizations” in the screenshot below (in order to insert/update organizations in the target Pipedrive):

image

Field mapping tab

Map fields from the source to the target.

Do not use “id” fields in the target (column on the right). Those fields are often FK relations in the target; use the “Relations (FK)” tab instead.

image

Static source values

Click on the edit icon (pencil) in the left column to enter a static value.

For example, enter a static value 555 that needs to be used for all records, and map it to a target field division_id (assuming all synced records in the target need to have division_id=555).

Custom fields

Click on the edit icon (pencil) in the right column to enter a target column name. Use this e.g. to write into a custom field in the target.

Relations tab (FKs, foreign keys, parent relations)

If you have an FK (foreign key) in the source, which is a relation to a parent object, you probably need to map it to another FK field in the target. You cannot map FK fields directly, because the parent object has a different id in the source and in the target.

Here is an example where we first sync Companies and then sync Contacts. Each Contact (child) is linked to a Company (parent):

image

In the above example, you can see that:

  • Source Company id=100 is synced to target Partner id=789.
  • Source Contact id=500 is synced to target Contact id=321.
  • You cannot map source company_id directly to target partner_id.

Instead, the source FK (company_id = 100) must be translated to the target FK (partner_id = 789) when the Contact is synced. This is depicted as “Step 3” in the drawing above, in green.

The Reverse ETL app will perform this type of lookup when you configure the FKs for source and target under “Relations”:

image

The lookup to find the FK for the target is done in the link table of the parent object (green line in the above drawing). See below for more info on link tables.
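
Conceptually, the app resolves the target FK by looking up the parent's source id in the parent's link table. A minimal sketch of that lookup, reusing the hypothetical link table name companies_1234:

SELECT target_id                    -- returns 789: the Partner id in the target
FROM link_tables.companies_1234     -- link table of the parent object (Companies)
WHERE source_id = 100;              -- the Company id referenced by the source Contact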

Data tab

Test the Reverse ETL sync

You can test the sync under “Data” by selecting one source record and performing a manual sync. If there are errors, adjust the field mapping and test again.

image

View the status of a sync

View the status of each sync under “Status”. This shows the link table, which contains all source ids that have been synced, the timestamps and any errors.

image

Custom code tab

Examples to use custom code:

  • Write nested JSON to the target API
  • Conditional updates
  • Etc.

Write nested JSON to target API

The Reverse ETL app performs API calls to the target for inserts and updates. The JSON payload is a flat object with keys and values, as defined in the field mapping. Example insert:

POST https://mycrm.com/api/contacts
{
  "name": "John Doe",
  "email": "john@doe.com",
  "phone": "+1 555 6666"
}

Custom Python code can be used to write a nested JSON structure. Example custom code:

def transform(source_record, target_record, pq, target_id=None):

    # Add nested data
    target_record["address"] = {
        "street": source_record["address_street"],
        "city":   source_record["address_city"]
    }

    # For updates, add the target id to the payload
    if target_id:
        target_record["id"] = target_id

    return target_record

Example of final insert:

POST https://mycrm.com/api/contacts
{
  "name": "John Doe",
  "email": "john@doe.com",
  "phone": "+1 555 6666",
  "address": {
     "street": "1 Somestreet",
     "city": "NY"
  }
}

Note: sending nested JSON can also be done with a JSON column in a source query (without using custom code).

Conditional updates

Example custom code for conditional updates:

def transform(source_record, target_record, pq, target_id=None):
    final_payload = target_record

    if target_id: # meaning an update (not an insert)
        if source_record["status"] == "deleted":
            return None # returning None makes sure the record is skipped
        else:
            final_payload["id"] = target_id
    return final_payload

Link tables

The Reverse ETL app creates a link table under schema link_tables for each sync. The link table keeps track of source ids and target ids. Example of a link table:

source_id   target_id   status    First sync   Last sync
100         567         Success
200         890         Success

Possible values for “status”:

  • Success
  • Error
  • Skipped - “Same hash”: a hash is calculated before every update in the target, based on the input fields. If the hash is identical to the one from the previous update, the update is skipped. This avoids unnecessary updates, especially when the source “timestamp last update” increases for reasons unrelated to the fields that are in scope for the sync.

Seeding the link tables

The Reverse ETL app performs an insert in the target for each source record, and subsequently it performs updates for these source ids. However, the Reverse ETL app does not “match” source records with pre-existing target records. For example, if you sync companies from a source to a target, the Reverse ETL app will ignore companies that are already present in the target (not inserted by the Reverse ETL app). In order to take these into account, you can add those target records to the link table first. You could, for example, match source and target companies on VAT number and insert matching rows into the link table.

Example query:

WITH matching_companies AS (
  SELECT 
     source.id AS source_id,
     target.id AS target_id
  FROM hubspot.companies AS source
  INNER JOIN odoo.partners AS target
     ON source.vat = target.vat
)
INSERT INTO link_tables.companies_1234 (source_id, target_id, status)
SELECT source_id, target_id, 'matched' AS status
FROM matching_companies;

Settings

Sync order

Make sure to put parent object types above their children, so they sync first. For example sync Customers first, then Invoices that are linked to Customers.

Schedule

Once the syncs are working fine, you can add a daily schedule to the Reverse ETL app.

The Reverse ETL app will run in the background and process source records for each sync. If a sync is configured as incremental, the source data will be processed incrementally, based on the configured “timestamp last update” column.

Set a custom schedule

You can also set a custom schedule (e.g. every 6 hours). Go to the Build section in Peliqan (rocket icon in left pane), open the app and set a schedule. Make sure to use incremental syncs and configure a daily or hourly sync.

image

Import/export

Use this to back up your entire Reverse ETL configuration as JSON and to move configurations between Peliqan accounts.

Writing a source query (source view)

You can write a source query (view) to prepare data for mapping to the target. Write the query first, and next use it as the source table in the Reverse ETL app.

Examples of transformations that are typically handled in a source query:

  • Transforming date formats from the source format to the target format (see the sketch after this list)
  • Casting data types from the source to the target data type
  • Combining data from multiple source tables into a single source query
  • Mapping values from source to target
  • Adding a column with a static value for fields that need to have a static value in the target
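
For instance, the first two transformations could look like this in a source query (the table and column names are illustrative):

SELECT
  id,
  to_char(created_at, 'DD/MM/YYYY') AS created_date,   -- transform a date to the target's format
  amount::text AS amount_text                          -- cast a numeric column to the target's type
FROM hubspot.deals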

Below is an example where data from a source table contacts is mapped to a target table people.

Example source table contacts:

id    firstname       lastname      country_code     _sdc_batched_at
1     Bob             Davis         FR               2025-01-01 00:00:00
2     Anne            Crome         UK               2025-01-01 00:00:00
3     ...

Example target table people (id not shown, will be set automatically on insert):

fullname        country          category
Bob Davis       France           VIP
Anne Crome      United Kingdom   VIP

Example query sync_contacts_source to use as source table in the Reverse ETL app:
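
A minimal sketch of what this query could look like, based on the example tables above (the hubspot schema name is an assumption):

SELECT
  id,
  firstname || ' ' || lastname AS fullname,   -- combine first and last name into fullname
  CASE country_code
    WHEN 'FR' THEN 'France'
    WHEN 'UK' THEN 'United Kingdom'
    ELSE NULL
  END AS country,                             -- map country codes to full country names
  'VIP' AS category,                          -- static value for all records
  _sdc_batched_at                             -- keep the timestamp for incremental sync
FROM hubspot.contacts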

Nested data in a source query

Nested data can be created in a source query and mapped to a JSON field in the target. An example is invoice lines for invoices.

Note: you can often also sync child objects (e.g. invoice lines) as a separate object type. Sync the parents first (e.g. invoices), then sync the children (e.g. invoice lines) with an FK relation (e.g. invoice_id). You can also handle nested JSON with custom code.

Below is an example source query for sales invoices from Odoo, which creates a JSON array with invoice line objects in one column. This column can be mapped to a field “invoice lines” in the target (e.g. Exact Online). Make sure to use the correct keys in the invoice line objects, as needed by the target.
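
A sketch of what this query could look like, assuming tables odoo.invoices and odoo.invoice_lines (the actual table and column names depend on your Odoo connection):

SELECT
  i.id,
  i.date,
  jsonb_agg(
    jsonb_build_object(
      'Quantity', l.quantity,   -- keys must match what the target API expects
      'Name',     l.name
    )
  ) AS invoice_lines_json
FROM odoo.invoices AS i
JOIN odoo.invoice_lines AS l
  ON l.invoice_id = i.id
GROUP BY i.id, i.date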

Example result (some columns omitted):

id   date         invoice_lines_json
1    2025-01-01   [ { "Quantity": 4, "Name": "Shoes" }, { "Quantity": 1, "Name": "Socks" } ]
2    2025-01-02   [ { "Quantity": 3, "Name": "Shoes" } ]

Make sure the column is of type “json” (jsonb) in the source query and not a text field (varchar). Example where a string representation of JSON is cast to a json column:

SELECT '{"name": "An", "id": 12}'::jsonb FROM my_table

Mapping “enum” / select fields

Enum fields (or “Single select fields”) are fields that have a value from a preselected list. For example a field “size” with possible values (options) “Small”, “Medium”, “Large”.

The source options often have to be mapped to different options in the target. Example:

Source option   Target option
Extra small     S
Small           S
Medium          M
Large           L

This can be accomplished with a scope query (used as the source).

Static mapping

New options must be added manually.

Example scope query:

SELECT
  c.id,
  c.name,
  c.email,
  c.company,
  CASE
    WHEN c.customfield = 'Option A internal name'
    THEN 'Option A value'
    WHEN c.customfield = 'Option B internal name'
    THEN 'Option B value'
    WHEN c.customfield = 'Option C internal name'
    THEN 'Option C value'
    ELSE NULL
  END AS target_option
FROM hubspot.contacts AS c

Dynamic mapping

New options are automatically mapped.

The example below assumes that the options are stored in a separate table, that each option has both a “label” and a “value”, and that the labels on source and target match.

Source table field_options:

Label       Value
Section A   100
Section B   200

Target table field_options:

Label       Value
Section A   333
Section B   444

Example scope query:
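
A sketch of such a query, assuming the source records store the selected option's value in a column custom_field and that the field_options tables of both applications are synced into the data warehouse (all table and column names are illustrative):

SELECT
  c.id,
  c.name,
  t_opt.value AS target_option       -- the target's value for the matching label
FROM hubspot.contacts AS c
JOIN hubspot.field_options AS s_opt
  ON c.custom_field = s_opt.value    -- resolve the source value to its label
JOIN odoo.field_options AS t_opt
  ON s_opt.label = t_opt.label       -- find the same label on the target side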

QA checks

You can compare source and target data to check if all data was synced as expected. Make sure to run the pipeline of the target connection before performing a QA compare.

Write a JOIN query that compares source & target data, using the link table as in-between:

source table —> link table —> target table

Example: show source records with a missing target record:

SELECT
   source_table.id 
FROM hubspot.contacts AS source_table
LEFT JOIN link_tables.contacts_1234 AS link_table ON
    source_table.id = link_table.source_id
LEFT JOIN odoo.partners AS target_table ON
    link_table.target_id = target_table.id
WHERE 
    target_table.id IS NULL

Project scoping

Make sure to plan your Reverse ETL app projects properly. Example scoping:

  • Project management: planning, documentation, status calls etc. (5 - 20 hours)
  • Cleaning of source data: make sure required fields are always filled in etc. (depends on data quality)
  • Write requirements: objects to include, fields to map per object, required business logic etc. (5 - 20 hours)
  • Write scope queries: define input data per sync (5 - 40 hours)
  • Configure Reverse ETL app: field mapping etc. (5 - 20 hours)
  • Testing: run tests in Reverse ETL app for individual records (5 - 20 hours)
  • Monitoring: check for errors in scheduled runs (ongoing)
  • Bug fixing: fix edge cases etc. (5 - 40 hours)
  • TOTAL: 5 - 20 days

FAQ & Troubleshooting

My record was not synced. Why?

  • Is it included in your source table?
  • If you use a source query, is the record included in this query?
  • Is it skipped because of incremental sync (e.g. the bookmark in Reverse ETL is higher than _sdc_batched_at in the source table)?
  • Is the record in the link table (source_id)? If yes, is it in an error status? See the query sketch below.
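
A quick way to check the last point is to query the link table directly; a sketch, reusing the hypothetical link table name contacts_1234 from the examples above:

SELECT *
FROM link_tables.contacts_1234
WHERE source_id = 500;   -- replace 500 with the id of the record you are investigating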

My record was synced once, but not updated in the target. Why?

  • Did you change the scope of the source query? Is the record still included in the scope of the source query?
  • Is it skipped because of incremental sync (e.g. the bookmark in Reverse ETL is higher than _sdc_batched_at in the source table)?