How to manually checkpoint a delta table using PySpark?

I have a Delta table, and am trying to append data to it and then checkpoint the table. By default it checkpoints every 10 commits, but I would like to override this behaviour and checkpoint manually.

Currently my code looks like this:

df = get_some_source_data()
df.write.format("delta").mode("append").saveAsTable(f"{db_name}.{table_name}")

I would like to add a line, either as part of the write operation or after it, that creates a new Delta table checkpoint in the _delta_log.



Solution 1:[1]

Change the delta.checkpointInterval table property to 1 before the saveAsTable, and set it back to its default afterwards.
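
A minimal sketch of that approach, reusing the db_name and table_name variables from the question and assuming the table already exists (delta.checkpointInterval defaults to 10):

# Checkpoint after every commit while we load data.
spark.sql(f"ALTER TABLE {db_name}.{table_name} "
          "SET TBLPROPERTIES ('delta.checkpointInterval' = '1')")

df = get_some_source_data()
df.write.format("delta").mode("append").saveAsTable(f"{db_name}.{table_name}")

# Restore the default of one checkpoint every 10 commits.
spark.sql(f"ALTER TABLE {db_name}.{table_name} "
          "SET TBLPROPERTIES ('delta.checkpointInterval' = '10')")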

Otherwise, you'd have to write some code that uses Delta Lake's internal API to trigger a checkpoint of the table. I have never done it before, though, so I have no idea how viable it is (if at all).
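
For completeness, here is a hedged sketch of that internal-API route from PySpark, reaching into the JVM through the Py4J gateway. DeltaLog is not a public API: the forTable overloads and the no-argument checkpoint() method used below exist in some Delta Lake releases but not others, so treat every call here as an assumption to verify against your version's source.

# Internal, unsupported API -- names and signatures vary across Delta releases.
# Resolve the table's storage location first.
table_path = spark.sql(f"DESCRIBE DETAIL {db_name}.{table_name}").first()["location"]

# Ask the Scala DeltaLog for this table to write a checkpoint now.
delta_log = spark._jvm.org.apache.spark.sql.delta.DeltaLog.forTable(
    spark._jsparkSession, table_path
)
delta_log.checkpoint()  # writes a new checkpoint file into _delta_log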

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Jacek Laskowski