We can use checkpointing when writing with write streams. On restart, Spark reads the offset stored in the checkpoint and resumes the stream from where it left off.
For example:
query = streaming_df.writeStream.format("parquet") \
    .option("checkpointLocation", checkpoint_location) \
    .foreachBatch(writeData) \
    .trigger(once=True) \
    .start()
query.awaitTermination()
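The `writeData` function passed to `foreachBatch` is not shown above. A minimal sketch of what such a handler could look like (the function body and output path are assumptions, not part of the original example):

```python
# Hypothetical foreachBatch handler matching the writeData reference above.
# Spark invokes it once per micro-batch with the batch DataFrame and batch id.
def writeData(batch_df, batch_id):
    # Append this micro-batch as Parquet; the output path is an example.
    batch_df.write.mode("append").parquet("/tmp/stream_output")
```

Because `foreachBatch` takes over the sink logic, the handler decides how each micro-batch is persisted, while the checkpoint location still tracks which offsets have been processed.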
"sources" : [ {
  "description" : "com.mongodb.spark.sql.connector.read.MongoMicroBatchStream@6793d7d6",
  "startOffset" : {
    "version" : 1,
    "offset" : {
      "$timestamp" : {
        "t" : 1713179033,
        "i" : 0
      }
    }
  },
  "endOffset" : {
    "version" : 1,
    "offset" : {
      "$timestamp" : {
        "t" : 1713179298,
        "i" : 0
      }
    }
  },
  "latestOffset" : {
    "version" : 1,
    "offset" : {
      "$timestamp" : {
        "t" : 1713179298,
        "i" : 0
      }
    }
  }
} ]
Refer to startOffset, which is the offset from which the stream starts when it resumes.
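In the offset above, "t" is seconds since the Unix epoch and "i" is an ordinal for operations within the same second (the BSON timestamp layout). A small sketch of turning the recorded startOffset into a human-readable resume point (the dict literal here just restates the values from the snippet above):

```python
from datetime import datetime, timezone

# startOffset fragment copied from the checkpoint's offset file above.
start_offset = {
    "version": 1,
    "offset": {"$timestamp": {"t": 1713179033, "i": 0}},
}

# "t" is seconds since the Unix epoch; convert it to a UTC datetime to see
# where the stream will resume.
t = start_offset["offset"]["$timestamp"]["t"]
resume_point = datetime.fromtimestamp(t, tz=timezone.utc)
print(f"Stream resumes from operations at or after {resume_point}")
```

This makes it easy to sanity-check that the checkpointed offset matches the point where the previous run stopped.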