This beginner’s guide to checkpoints in Apache Flink® explains how to use Flink’s checkpointing mechanism in distributed stream processing applications.
Any stream processing application, whether a streaming data pipeline or a streaming SQL application, can be stateful, meaning that it keeps information across events, such as running counts or windowed aggregates.
To persist state in an easy-to-manage way and recover from failures, Apache Flink implements checkpointing: it periodically stores a snapshot of the application’s state, so that after a failure the application can restore the latest snapshot and reprocess only the events that came after it, instead of replaying its entire history.
Readers of this guide will learn:
– Why checkpoints are necessary for event streaming applications
– How checkpointing in Apache Flink® works
– How to configure checkpoints in Apache Flink® by choosing an application’s state backend and checkpoint storage
– What the differences are between the available state backend options in Apache Flink®
– How to set up checkpoint intervals in Flink
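Before going into the details, the recovery idea described above (resume from previously stored state rather than replay everything) can be sketched in plain Java. This is a toy illustration and not Flink's API: the `CheckpointSketch` class, the `Checkpoint` type, and the `process` and `recover` methods are hypothetical names invented for this example, and the "operator" is just a running sum over a list of integers.

```java
import java.util.List;

// Toy illustration only: this is NOT Flink's API. It mimics a stateful
// operator (a running sum) whose state is snapshotted together with the
// input position, so recovery can resume from the snapshot.
public class CheckpointSketch {

    // A checkpoint pairs the operator state with the next input offset to read.
    public static final class Checkpoint {
        public final long sum;
        public final int nextOffset;

        public Checkpoint(long sum, int nextOffset) {
            this.sum = sum;
            this.nextOffset = nextOffset;
        }
    }

    // Processes events in [from, to), starting from the given state,
    // and returns the updated state.
    public static long process(List<Integer> events, int from, int to, long state) {
        for (int i = from; i < to; i++) {
            state += events.get(i);
        }
        return state;
    }

    // Recovery: restore the stored state and reprocess only the events
    // at or after the checkpointed offset.
    public static long recover(List<Integer> events, Checkpoint cp) {
        return process(events, cp.nextOffset, events.size(), cp.sum);
    }

    public static void main(String[] args) {
        List<Integer> events = List.of(5, 3, 8, 2, 7, 1);

        // Snapshot taken after the first four events have been processed.
        Checkpoint cp = new Checkpoint(process(events, 0, 4, 0L), 4);

        // Simulated failure: recover from the checkpoint (2 events reprocessed)
        // versus replaying the full history (6 events reprocessed).
        long recovered = recover(events, cp);
        long replayed = process(events, 0, events.size(), 0L);

        System.out.println(recovered + " " + replayed); // prints "26 26"
    }
}
```

Both paths reach the same result, but recovery from the checkpoint touches only the events after the snapshot. In a real Flink job, the snapshotting and restoring are handled by the framework rather than written by hand like this.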