Usually there are S3 access errors in the logs, which means the checkpoint snapshot is corrupted. The root cause has been fixed, but a “factory reset” is still required.
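To confirm the diagnosis first, it can help to check the recent JobManager logs for S3/checkpoint errors. A minimal sketch, assuming the cluster runs in the default namespace used by the commands below:

kubectl logs deployment/flink-jobmanager -n default --tail=200 | grep -iE "s3|checkpoint"

If the errors are there, apply the reset: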

  1. Delete the ConfigMap gv-flink-cluster-config-map

  2. It is a good idea to delete the other gv-flink-*-config-map ConfigMaps as well (see the sketch after these steps)

  3. Delete the flink-taskmanager pod; it will be recreated automatically

  4. Wait 15-20 seconds

  5. Delete the flink-jobmanager pod; it will now start successfully
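
Step 2 can be scripted, since kubectl does not expand the gv-flink-*-config-map pattern by itself. A sketch, assuming the default namespace and GNU xargs:

kubectl get configmap -n default -o name | grep '^configmap/gv-flink-.*-config-map$' | xargs -r kubectl delete -n default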

The same reset from the terminal:

# Stop the JobManager and TaskManager first
kubectl scale --replicas=0 deployment/flink-jobmanager
kubectl scale --replicas=0 deployment/flink-taskmanager

# List the ConfigMaps to confirm the name, then delete the corrupted one
kubectl get configmap -n default
kubectl delete configmap gv-flink-cluster-config-map -n default

# Bring Flink back up; the JobManager should now start successfully
kubectl scale --replicas=1 deployment/flink-jobmanager
kubectl scale --replicas=1 deployment/flink-taskmanager
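
Afterwards it is worth verifying that the JobManager came back cleanly; a quick check, assuming the same default namespace:

kubectl get pods -n default | grep flink
kubectl logs deployment/flink-jobmanager -n default --tail=50

The S3 access errors should no longer appear in the log output.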