Handling deleted rows

A regular ETL pipeline, which syncs data incrementally, cannot detect whether records were deleted in the source. As a result, a record that is deleted in the source will remain in the data warehouse (DWH). Depending on how the source's API works, there are various ways to handle rows that were deleted in the source.
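To illustrate why, here is a minimal sketch of an incremental sync. It assumes a hypothetical `customers` table, an `updated_at` cursor, and SQLite as a stand-in for the DWH. Each run only picks up records changed since the last run, so a deleted record is never returned again and its old copy stays in the DWH.

```python
import sqlite3

# Hypothetical source snapshots: id -> (name, updated_at).
# Record 2 is deleted in the source between the first and second sync run.
SOURCE_RUN_1 = {1: ("Alice", 10), 2: ("Bob", 11), 3: ("Carol", 12)}
SOURCE_RUN_2 = {1: ("Alice", 10), 3: ("Carol B.", 20)}


def incremental_sync(dwh, source, cursor):
    """Upsert only records with updated_at > cursor and return the new cursor."""
    changed = {rid: rec for rid, rec in source.items() if rec[1] > cursor}
    with dwh:
        for rid, (name, updated_at) in changed.items():
            dwh.execute(
                "INSERT OR REPLACE INTO customers (id, name, updated_at) VALUES (?, ?, ?)",
                (rid, name, updated_at),
            )
    # Advance the cursor to the newest change seen (or keep it if nothing changed).
    return max([cursor] + [updated_at for _, updated_at in changed.values()])


dwh = sqlite3.connect(":memory:")
dwh.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)")

cursor = incremental_sync(dwh, SOURCE_RUN_1, cursor=0)
cursor = incremental_sync(dwh, SOURCE_RUN_2, cursor)

ids_in_dwh = [row[0] for row in dwh.execute("SELECT id FROM customers ORDER BY id")]
print(ids_in_dwh)  # [1, 2, 3]: record 2 was deleted in the source but is still in the DWH
```

The second run correctly updates record 3, but nothing in the changed records tells the pipeline that record 2 is gone.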

Soft deletes

Some APIs have a field such as “is deleted”, “is archived” or similar, which is set to True when a record is deleted. In that case, you will find a column with the same name in the DWH, and filtering on this column lets you exclude deleted rows.
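A minimal sketch of such a filter, using SQLite as a stand-in for the DWH and a hypothetical `deals` table whose `is_deleted` flag is stored as 0/1:

```python
import sqlite3

dwh = sqlite3.connect(":memory:")
# "is_deleted" is a hypothetical column name; use the flag your source provides.
dwh.execute("CREATE TABLE deals (id INTEGER PRIMARY KEY, title TEXT, is_deleted INTEGER)")
with dwh:
    dwh.executemany(
        "INSERT INTO deals (id, title, is_deleted) VALUES (?, ?, ?)",
        [(1, "Deal A", 0), (2, "Deal B", 1), (3, "Deal C", None)],
    )

# COALESCE treats a missing flag (NULL) as "not deleted", so those rows are kept.
active_deals = dwh.execute(
    "SELECT id, title FROM deals WHERE COALESCE(is_deleted, 0) = 0 ORDER BY id"
).fetchall()
print(active_deals)  # [(1, 'Deal A'), (3, 'Deal C')]
```

In a PostgreSQL DWH with a boolean column, `WHERE is_deleted IS NOT TRUE` has the same effect and also keeps rows where the flag is NULL. Putting the filter in a view saves repeating it in every downstream query.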

API provides a list of deleted records

On rare occasions, the API has an endpoint to retrieve a list of deleted records. In that case, records in the DWH can be deleted based on the list from the API.

The DWH will have a table of deleted row IDs. Use a scheduled Python script to delete rows based on this table.

For example, the Exact Online connector uses this method.
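A minimal sketch of such a scheduled script, with SQLite standing in for the DWH. It assumes two hypothetical tables that share an `id` column: `invoices` (the synced data) and `invoices_deleted` (the list of deleted IDs). The actual table and column names depend on the connector.

```python
import sqlite3


def delete_rows_listed_as_deleted(conn, target_table, deleted_table, id_column="id"):
    """Delete rows from target_table whose id appears in deleted_table.

    Table and column names are interpolated into the SQL, so they must come
    from trusted configuration, never from user input.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            f"DELETE FROM {target_table} "
            f"WHERE {id_column} IN (SELECT {id_column} FROM {deleted_table})"
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE invoices_deleted (id INTEGER PRIMARY KEY)")
with conn:
    conn.executemany("INSERT INTO invoices VALUES (?, ?)", [(1, 100.0), (2, 250.0), (3, 75.0)])
    conn.execute("INSERT INTO invoices_deleted VALUES (2)")

removed = delete_rows_listed_as_deleted(conn, "invoices", "invoices_deleted")
remaining = [row[0] for row in conn.execute("SELECT id FROM invoices ORDER BY id")]
print(removed, remaining)  # 1 [1, 3]
```

Because the delete is a single set-based statement, running the script again on the next schedule simply deletes nothing for IDs that are already gone.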

Webhooks

Some platforms can send webhooks when specific events occur, such as the deletion of a record. In that case, you can configure these webhooks to be sent to Peliqan, which receives them and stores them in a table in the DWH. A Python script can then delete records in the DWH based on the incoming webhooks. More info:

Incoming webhooks
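A minimal sketch of such a script, with SQLite standing in for the DWH. The `webhooks` table, its `payload` column and the payload shape `{"event": "contact.deleted", "object_id": 42}` are assumptions for illustration only; the real structure depends on how the webhooks are stored and on what the source platform sends.

```python
import json
import sqlite3


def apply_delete_webhooks(conn, target_table, event_name):
    """Delete rows in target_table for every stored webhook of type event_name.

    Assumes a hypothetical `webhooks` table whose `payload` column holds JSON
    such as {"event": "contact.deleted", "object_id": 42}; adapt the field names
    to the payload your platform actually sends.
    """
    ids = set()
    for (payload,) in conn.execute("SELECT payload FROM webhooks"):
        data = json.loads(payload)
        if data.get("event") == event_name and "object_id" in data:
            ids.add(data["object_id"])
    # Deleting an id that is already gone is a no-op, so re-processing old
    # webhooks on the next scheduled run is harmless.
    with conn:
        conn.executemany(
            f"DELETE FROM {target_table} WHERE id = ?", [(i,) for i in sorted(ids)]
        )
    return ids


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE TABLE webhooks (payload TEXT)")
with conn:
    conn.executemany(
        "INSERT INTO contacts VALUES (?, ?)",
        [(1, "a@example.com"), (2, "b@example.com"), (3, "c@example.com")],
    )
    conn.executemany(
        "INSERT INTO webhooks VALUES (?)",
        [
            (json.dumps({"event": "contact.deleted", "object_id": 2}),),
            (json.dumps({"event": "contact.updated", "object_id": 3}),),
        ],
    )

deleted = apply_delete_webhooks(conn, "contacts", "contact.deleted")
remaining = [row[0] for row in conn.execute("SELECT id FROM contacts ORDER BY id")]
print(deleted, remaining)  # {2} [1, 3]
```

As the webhook table grows, you may want to process only webhooks received since the previous run instead of re-reading all of them.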

Script to delete rows

A low-code Python script can delete rows from a table in the DWH when their id no longer exists in the source SaaS application.
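The following is a minimal sketch of the same idea, not Peliqan's own script. It assumes that the full list of IDs currently in the source has already been fetched (for example by paging through the SaaS API, not shown). It compares those IDs with the IDs in the DWH table and deletes the rows that no longer exist in the source. SQLite stands in for the DWH, and the `tickets` table and the `max_fraction` safety threshold are hypothetical.

```python
import sqlite3
from typing import Iterable, Set


def delete_rows_missing_in_source(
    conn: sqlite3.Connection,
    table: str,
    source_ids: Iterable[int],
    id_column: str = "id",
    max_fraction: float = 0.5,
) -> Set[int]:
    """Delete rows whose id is no longer present in the source.

    `table` and `id_column` are interpolated into SQL, so they must come from
    trusted configuration. `max_fraction` is a safety limit: if more than that
    share of the rows would be deleted (for example because the API returned an
    empty or truncated list), nothing is deleted and an error is raised.
    """
    source = set(source_ids)
    dwh_ids = {row[0] for row in conn.execute(f"SELECT {id_column} FROM {table}")}
    to_delete = dwh_ids - source
    if dwh_ids and len(to_delete) > max_fraction * len(dwh_ids):
        raise RuntimeError(
            f"Refusing to delete {len(to_delete)} of {len(dwh_ids)} rows from {table}"
        )
    with conn:
        conn.executemany(
            f"DELETE FROM {table} WHERE {id_column} = ?",
            [(i,) for i in sorted(to_delete)],
        )
    return to_delete


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, subject TEXT)")
with conn:
    conn.executemany(
        "INSERT INTO tickets VALUES (?, ?)",
        [(1, "Login issue"), (2, "Refund"), (3, "Bug report"), (4, "Question")],
    )

# IDs currently returned by the source API (hypothetical): ticket 2 was deleted.
deleted = delete_rows_missing_in_source(conn, "tickets", [1, 3, 4])
remaining = [row[0] for row in conn.execute("SELECT id FROM tickets ORDER BY id")]
print(deleted, remaining)  # {2} [1, 3, 4]
```

The safety check is a design choice of this sketch: comparing full ID lists is only correct if the source list is complete, so a failed or partial API call should not be allowed to empty the table.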

Full Resync

When you perform a Full Resync on a connection in Peliqan, its pipelines are reset: the connection's tables in the DWH are emptied and all data is synced again. A Full Resync is therefore one way to make sure that deleted rows are no longer in the DWH.
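The effect of a full sync on deleted rows can be shown with a minimal sketch. SQLite stands in for the DWH and the `customers` table is hypothetical; this illustrates the idea and is not how Peliqan implements a Full Resync. Because the table is rebuilt from the current state of the source, rows deleted in the source simply do not come back.

```python
import sqlite3


def full_sync(conn, table, source_rows):
    """Replace all rows in `table` with a fresh copy of the source.

    The delete and the reload run in one transaction, so a failed reload
    rolls back and leaves the old data in place.
    """
    with conn:
        conn.execute(f"DELETE FROM {table}")
        conn.executemany(f"INSERT INTO {table} (id, name) VALUES (?, ?)", source_rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
with conn:
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob"), (3, "Carol")]
    )

# Current contents of the source (hypothetical): record 2 has been deleted.
full_sync(conn, "customers", [(1, "Alice"), (3, "Carol")])
ids = [row[0] for row in conn.execute("SELECT id FROM customers ORDER BY id")]
print(ids)  # [1, 3]
```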

It’s also possible to schedule a Full Resync, for example weekly, using a Python script.
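A minimal sketch of the scheduling logic, assuming the script runs more often than the desired interval (for example daily) and keeps the time of the last Full Resync between runs. `trigger_full_resync` and the connection name `crm` are hypothetical placeholders. How a Full Resync is actually started from a Python script is not shown here and depends on your Peliqan setup.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RESYNC_INTERVAL = timedelta(days=7)


def trigger_full_resync(connection_id: str) -> None:
    """Hypothetical placeholder: replace with the way your setup starts a Full Resync."""
    print(f"Full Resync requested for connection {connection_id}")


def run_scheduled_check(
    connection_id: str, last_resync: Optional[datetime], now: datetime
) -> datetime:
    """Trigger a Full Resync if none ran during the last RESYNC_INTERVAL.

    Returns the time of the most recent Full Resync; the caller must store it
    (for example in a small state table) and pass it in on the next run.
    """
    if last_resync is None or now - last_resync >= RESYNC_INTERVAL:
        trigger_full_resync(connection_id)
        return now
    return last_resync


# Simulated runs for a hypothetical connection called "crm".
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
first = run_scheduled_check("crm", None, t0)                        # triggers
second = run_scheduled_check("crm", first, t0 + timedelta(days=3))  # too soon, skipped
third = run_scheduled_check("crm", second, t0 + timedelta(days=7))  # triggers again
```

If the script itself can be scheduled to run weekly, the interval check is unnecessary and the script only needs to trigger the resync.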

Finally, you can contact Peliqan Support to set up a custom connector that automatically performs a full sync of one or more specific tables on each run, instead of a regular incremental sync.

Note: a full sync takes much longer to run, so the sync schedule will have to be less frequent.