When we delete a pod from Kubernetes, what does the cluster do to prevent outside traffic from reaching the dying pod? How does the pod itself learn that it is about to be removed so it can shut down gracefully? And what is the sequence and relationship between these actions and the lifecycle hooks?
In this post, I will first introduce the whole pod deletion process, and then get my hands dirty verifying three scenarios:
- The timing of postStart and preStop execution in the main container of a pod.
- How terminationGracePeriodSeconds affects preStop and the graceful shutdown.
- Whether the API Server can be requested during a Pod's graceful shutdown.
Pod Deletion Process
As shown above, when you type kubectl delete pod, the API Server updates the pod record in ETCD, for example by adding deletionTimestamp and deletionGracePeriodSeconds. Based on the updated ETCD record, the pod is displayed in the Terminating status. Next, the pod carries out two processes in parallel; a minimal client-go sketch of how this Terminating state appears on the pod object follows the list below.
- First, the endpoint controller watches the pod being marked as Terminating and removes the pod's endpoint from the associated Service, so that external traffic can no longer enter the pod through the Service. After that, the endpoint starts getting removed from kube-proxy, iptables, Ingress, CoreDNS and everything else that holds the endpoint information.
- In the meanwhile, the kubelet is notified of the pod being updated (Terminating). If a preStop hook exists, it is executed; if not, the kubelet immediately sends a SIGTERM signal to the main container. Then, after waiting for the graceful shutdown period, which is determined by terminationGracePeriodSeconds (30 seconds by default), the container is forcibly stopped with SIGKILL. Finally, the API Server removes the pod from ETCD completely.
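Both of these flows are triggered by the same change on the pod object: the API Server has set deletionTimestamp and deletionGracePeriodSeconds. The sketch below is illustrative only (it is not part of the demo code; the kubeconfig path, namespace and pod name are assumptions), but it shows how a client-go consumer could detect a Terminating pod.

package main

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Build a clientset from the local kubeconfig (illustrative only).
    config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        panic(err)
    }

    // "lifecycle-demo" is the pod used later in this post.
    pod, err := clientset.CoreV1().Pods("default").Get(context.TODO(), "lifecycle-demo", metav1.GetOptions{})
    if err != nil {
        panic(err)
    }

    // A non-nil deletionTimestamp is what makes kubectl show the pod as Terminating.
    if pod.DeletionTimestamp != nil {
        fmt.Printf("pod is Terminating since %s\n", pod.DeletionTimestamp)
        if pod.DeletionGracePeriodSeconds != nil {
            fmt.Printf("grace period: %ds\n", *pod.DeletionGracePeriodSeconds)
        }
    }
}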
Since the endpoint controller flow and the pod shutdown flow happen independently, the pod IP may still be in use before it has been removed from kube-proxy, iptables and the other consumers. If the main container has already received the SIGKILL and stopped by then, it will not be able to fulfill those ongoing requests. The solution is to extend the graceful shutdown period, for example kubectl delete pod name --grace-period=100. It adds a bit more of a gap between the endpoint being removed from all consumers and the Pod being deleted.
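For the extra grace period to help, the application itself should keep serving in-flight requests after SIGTERM instead of exiting immediately. Below is a minimal sketch of that pattern in Go; it is not taken from the demo, and the port, the 5-second delay and the 20-second drain timeout are arbitrary values for illustration.

package main

import (
    "context"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        time.Sleep(2 * time.Second) // simulate a slow request
        w.Write([]byte("ok"))
    })
    srv := &http.Server{Addr: ":8080"}

    go func() {
        if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            log.Fatal(err)
        }
    }()

    // Wait for the SIGTERM sent by kubelet (after preStop, if any).
    sigCh := make(chan os.Signal, 1)
    signal.Notify(sigCh, syscall.SIGTERM)
    <-sigCh

    // Keep accepting traffic briefly: the endpoint may still be listed in
    // kube-proxy/iptables/CoreDNS for a short while after SIGTERM.
    time.Sleep(5 * time.Second)

    // Drain in-flight requests; this must finish before terminationGracePeriodSeconds expires.
    ctx, cancel := context.WithTimeout(context.Background(), 20*time.Second)
    defer cancel()
    if err := srv.Shutdown(ctx); err != nil {
        log.Printf("shutdown: %v", err)
    }
}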
Scenario 1: The Timing of postStart and preStop Execution
Step 1. Create the main container with files main.go, postStart.sh and preStop.sh.
postStart=5s
preStop=5s
terminationGracePeriodSeconds=10s
gracefulShutdown=2s
package main

import (
    "fmt"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    signalChan := make(chan os.Signal, 1)
    signal.Notify(signalChan, syscall.SIGHUP, syscall.SIGQUIT, syscall.SIGTERM, syscall.SIGINT, syscall.SIGSEGV)

    fmt.Printf("# [ %s ] main container start running \n", time.Now().Format("2006-01-02 15:04:05"))

    ticker := time.NewTicker(time.Second)
    count := 0
loop:
    for {
        select {
        case sig := <-signalChan:
            fmt.Printf("# [ %s ] receive signal: %s => %d \n", time.Now().Format("2006-01-02 15:04:05"), sig.String(), sig)
            count = 0
            break loop
        case <-ticker.C:
            count++
            fmt.Printf(" #(%d)", count)
        }
    }

    // Simulate the application's own graceful shutdown work (2 seconds).
    fmt.Printf("# [ %s ] graceful shutdown \n", time.Now().Format("2006-01-02 15:04:05"))
    time.Sleep(2 * time.Second)
    fmt.Printf("# [ %s ] main container finished \n", time.Now().Format("2006-01-02 15:04:05"))
}
main.go
#!/bin/bash
set -eo pipefail

echo ""
time=$(date "+%Y-%m-%d %H:%M:%S")
echo "========================== prestop start: [ $time ] =========================="
echo "start time: $time" > /usr/share/prestop

second=5
while [[ $second -ne 0 ]]; do
  sleep 1
  time=$(date "+%Y-%m-%d %H:%M:%S")
  echo " [ $time ] prestop is processing..."
  ((second--))
done

time=$(date "+%Y-%m-%d %H:%M:%S")
echo "========================== prestop end: [ $time ] =========================="
echo "end time: $time" >> /usr/share/prestop
preStop.sh
#!/bin/bash
set -eo pipefail

echo ""
time=$(date "+%Y-%m-%d %H:%M:%S")
echo "========================== poststart start: [ $time ] =========================="
echo "start time: $time" > /usr/share/poststart

second=5
while [[ $second -ne 0 ]]; do
  sleep 1
  time=$(date "+%Y-%m-%d %H:%M:%S")
  echo " [ $time ] poststart is processing..."
  ((second--))
done

time=$(date "+%Y-%m-%d %H:%M:%S")
echo "========================== poststart end: [ $time ] =========================="
echo "end time: $time" >> /usr/share/poststart
postStart.sh
Step 2. Wrap all the files with a Dockerfile.
# Stage 1: build the target binaries
FROM golang:1.18 AS builder
WORKDIR /workspace
COPY go.mod go.sum ./
COPY ./lifecycle/ ./lifecycle/
RUN go build -o bin/lifecycle ./lifecycle/main.go

# Stage 2: copy the binaries from the builder image to the base image
FROM redhat/ubi8-minimal:latest
COPY --from=builder /workspace/bin/lifecycle /bin/lifecycle
COPY ./lifecycle/hooks/poststart.sh /bin/poststart.sh
COPY ./lifecycle/hooks/prestop.sh /bin/prestop.sh
ENTRYPOINT ["/bin/lifecycle"]
Dockerfile
Step 3. Deploy the Pod with the main container and hooks.
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  terminationGracePeriodSeconds: 10
  containers:
  - name: lifecycle-demo-container
    image: quay.io/nyan/lifecycle:latest
    lifecycle:
      postStart:
        exec:
          command: ["sh", "-c", "/bin/poststart.sh > /proc/1/fd/1"]
      preStop:
        exec:
          command: ["sh", "-c", "/bin/prestop.sh > /proc/1/fd/1"]
    volumeMounts:
    - name: hooks
      mountPath: /usr/share/
  volumes:
  - name: hooks
    hostPath:
      path: /usr/hooks/
sample.yaml
As shown above, we deploy the Pod and delete it after a few seconds. I captured the logs and plotted the logging flow, where # indicates that the main container is running.
# [ 2022-08-31 01:28:03 ] main container start running
===================== poststart start: [ 2022-08-31 01:28:03 ] =====================
#(1)  [ 2022-08-31 01:28:04 ] poststart is processing...
#(2)  [ 2022-08-31 01:28:05 ] poststart is processing...
#(3)  [ 2022-08-31 01:28:06 ] poststart is processing...
#(4)  [ 2022-08-31 01:28:07 ] poststart is processing...
#(5)  [ 2022-08-31 01:28:08 ] poststart is processing...
===================== poststart end: [ 2022-08-31 01:28:08 ] =====================
#(6) #(7) #(8) #(9) #(10) #(11) #(12) #(13) #(14) #(15) #(16) #(17) #(18) #(19) #(20) #(21)
===================== prestop start: [ 2022-08-31 01:28:24 ] =====================
#(22) [ 2022-08-31 01:28:25 ] prestop is processing...
#(23) [ 2022-08-31 01:28:26 ] prestop is processing...
#(24) [ 2022-08-31 01:28:27 ] prestop is processing...
#(25) [ 2022-08-31 01:28:28 ] prestop is processing...
#(26) [ 2022-08-31 01:28:29 ] prestop is processing...
===================== prestop end: [ 2022-08-31 01:28:29 ] =====================
# [ 2022-08-31 01:28:29 ] receive signal: terminated => 15
# [ 2022-08-31 01:28:29 ] graceful shutdown
# [ 2022-08-31 01:28:31 ] main container finished
pod.log
Scenario 1 flow chart
We can see that postStart and the main container run at the same time. After the preStop hook comes to an end, the Pod receives a SIGTERM signal; the graceful shutdown then starts, and when it is done, the container process ends as expected.
It is worth noting that the main container's gracefulShutdown (2s) is shorter than terminationGracePeriodSeconds (10s), so the main container is shut down gracefully.
Scenario 2: terminationGracePeriodSeconds & preStop & graceful shutdown
Step 1: Extend the preStop to 15 seconds, then the parameters are as follows.
postStart=5s
preStop=15s
terminationGracePeriodSeconds=10s
gracefulShutdown=2s
#!/bin/bash
set -eo pipefail

echo ""
time=$(date "+%Y-%m-%d %H:%M:%S")
echo "========================== prestop start: [ $time ] =========================="
echo "start time: $time" > /usr/share/prestop

second=15
while [[ $second -ne 0 ]]; do
  sleep 1
  time=$(date "+%Y-%m-%d %H:%M:%S")
  echo " [ $time ] prestop is processing..."
  ((second--))
done

time=$(date "+%Y-%m-%d %H:%M:%S")
echo "========================== prestop end: [ $time ] =========================="
echo "end time: $time" >> /usr/share/prestop
preStop.sh
# [ 2022-08-31 03:28:26 ] main container start running
========================== poststart start: [ 2022-08-31 03:28:27 ] ==========================
#(1)  [ 2022-08-31 03:28:28 ] poststart is processing...
#(2)  [ 2022-08-31 03:28:29 ] poststart is processing...
#(3)  [ 2022-08-31 03:28:30 ] poststart is processing...
#(4)  [ 2022-08-31 03:28:31 ] poststart is processing...
#(5)  [ 2022-08-31 03:28:32 ] poststart is processing...
========================== poststart end: [ 2022-08-31 03:28:32 ] ==========================
#(6) #(7) #(8) #(9) #(10) #(11) #(12) #(13) #(14) #(15) #(16) #(17) #(18) #(19) #(20) #(21) #(22)
========================== prestop start: [ 2022-08-31 03:28:49 ] ==========================
#(23) [ 2022-08-31 03:28:50 ] prestop is processing...
#(24) [ 2022-08-31 03:28:51 ] prestop is processing...
#(25) [ 2022-08-31 03:28:52 ] prestop is processing...
#(26) [ 2022-08-31 03:28:53 ] prestop is processing...
#(27) [ 2022-08-31 03:28:54 ] prestop is processing...
#(28) [ 2022-08-31 03:28:55 ] prestop is processing...
#(29) [ 2022-08-31 03:28:56 ] prestop is processing...
#(30) [ 2022-08-31 03:28:57 ] prestop is processing...
#(31) [ 2022-08-31 03:28:58 ] prestop is processing...
#(32)
# [ 2022-08-31 03:28:59 ] receive signal: terminated => 15
# [ 2022-08-31 03:28:59 ] graceful shutdown
  [ 2022-08-31 03:28:59 ] prestop is processing...
  [ 2022-08-31 03:29:00 ] prestop is processing...
# [ 2022-08-31 03:29:01 ] main container finished
pod.log
We can see that terminationGracePeriodSeconds caps the duration between the start of preStop and the receipt of SIGTERM: since preStop (15s) is longer than the grace period (10s), SIGTERM arrives 10 seconds after preStop starts. After that, the preStop hook continued to run until the main container shut down.
Step 2: Set the gracefulShutdown to be longer than terminationGracePeriodSeconds.
# [ 2022-08-31 10:40:37 ] main container start running
========================== poststart start: [ 2022-08-31 10:40:37 ] ==========================
#(1)  [ 2022-08-31 10:40:38 ] poststart is processing...
#(2)  [ 2022-08-31 10:40:39 ] poststart is processing...
#(3)  [ 2022-08-31 10:40:40 ] poststart is processing...
#(4)  [ 2022-08-31 10:40:41 ] poststart is processing...
#(5)  [ 2022-08-31 10:40:42 ] poststart is processing...
========================== poststart end: [ 2022-08-31 10:40:42 ] ==========================
#(6) #(7) #(8) #(9) #(10) #(11) #(12)
========================== prestop start: [ 2022-08-31 10:40:49 ] ==========================
#(13) [ 2022-08-31 10:40:50 ] prestop is processing...
#(14) [ 2022-08-31 10:40:51 ] prestop is processing...
#(15) [ 2022-08-31 10:40:52 ] prestop is processing...
#(16) [ 2022-08-31 10:40:53 ] prestop is processing...
#(17)
# [ 2022-08-31 10:40:54 ] receive signal: terminated => 15
# [ 2022-08-31 10:40:54 ] graceful shutdown running
  [ 2022-08-31 10:40:54 ] prestop is processing...
# [ 2022-08-31 10:40:55 ] graceful shutdown running
  [ 2022-08-31 10:40:55 ] prestop is processing...
# [ 2022-08-31 10:40:56 ] graceful shutdown running
  [ 2022-08-31 10:40:56 ] prestop is processing...
# [ 2022-08-31 10:40:57 ] graceful shutdown running
  [ 2022-08-31 10:40:57 ] prestop is processing...
========================== prestop end: [ 2022-08-31 10:40:57 ] ==========================
# [ 2022-08-31 10:40:58 ] graceful shutdown running
# [ 2022-08-31 10:40:59 ] graceful shutdown running
rpc error: code = NotFound desc = an error occurred when try to find container "229ecf2fe42975a00a49e066249ef9ead70a6618c3b625c86eb5bd4a5bf5a717": not found
pod.log
The following conclusions can be drawn from the above logs.
- The timing of receiving SIGTERM depends on preStop and terminationGracePeriodSeconds. In a nutshell, the time until SIGTERM = min(preStop, terminationGracePeriodSeconds). To be specific, if preStop < terminationGracePeriodSeconds, SIGTERM arrives after preStop has finished running; if preStop >= terminationGracePeriodSeconds, SIGTERM arrives once terminationGracePeriodSeconds has elapsed (see the small worked example after this list).
- After receiving SIGTERM, the pod begins to shut down; if it is still running when terminationGracePeriodSeconds expires, SIGKILL stops it abruptly.
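As a worked example of the rule above, here is a small illustrative model in Go. It only encodes the behaviour observed in the logs (it is not Kubernetes code), and the function and variable names are made up for this post.

package main

import "fmt"

// expectedSignals models the observed behaviour: SIGTERM arrives at
// min(preStop, terminationGracePeriodSeconds), and SIGKILL is sent, at the
// latest, when terminationGracePeriodSeconds expires.
func expectedSignals(preStopSeconds, graceSeconds int) (sigtermAfter, sigkillNoLaterThan int) {
    sigtermAfter = preStopSeconds
    if graceSeconds < preStopSeconds {
        sigtermAfter = graceSeconds
    }
    return sigtermAfter, graceSeconds
}

func main() {
    // Scenario 1: preStop=5s,  grace=10s -> SIGTERM after ~5s.
    // Scenario 2: preStop=15s, grace=10s -> SIGTERM after ~10s.
    for _, c := range [][2]int{{5, 10}, {15, 10}} {
        term, kill := expectedSignals(c[0], c[1])
        fmt.Printf("preStop=%ds grace=%ds => SIGTERM after ~%ds, SIGKILL no later than ~%ds\n",
            c[0], c[1], term, kill)
    }
}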
Scenario 3: Request the API Server during the Pod graceful shutdown
postStart=5
preStop=5
terminationGracePeriodSeconds=8
gracefulShutdown=10
loop:
    for {
        select {
        case sig := <-signalChan:
            fmt.Printf("# [ %s ] receive signal: %s => %d \n", time.Now().Format("2006-01-02 15:04:05"), sig.String(), sig)
            count = 0
            break loop
        case <-ticker.C:
            count++
            fmt.Printf(" #(%d)", count)
        }
    }

    // During the graceful shutdown, query the API Server once per second.
    fmt.Printf("# [ %s ] graceful shutdown >>>>>>>>>>>>>>> \n", time.Now().Format("2006-01-02 15:04:05"))
    for i := 0; i < 10; i++ {
        time.Sleep(1 * time.Second)
        count++
        sa, _ := clientset.CoreV1().ServiceAccounts("default").Get(context.TODO(), "default", metav1.GetOptions{})
        fmt.Printf(" #(%d) => service account: %s \n", count, sa.Name)
    }
    fmt.Printf("# [ %s ] main container finished \n", time.Now().Format("2006-01-02 15:04:05"))
}
main.go
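The excerpt above uses a clientset without showing how it is created. Inside the pod, it would typically be built from client-go's in-cluster config, which reads the ServiceAccount credentials mounted into the container; the following is a minimal sketch (the demo repository may wire this up slightly differently, and the helper name is made up here).

import (
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

// newInClusterClientset builds a clientset from the pod's own ServiceAccount
// credentials, which is why the RBAC binding below is needed.
func newInClusterClientset() (*kubernetes.Clientset, error) {
    config, err := rest.InClusterConfig()
    if err != nil {
        return nil, err
    }
    return kubernetes.NewForConfig(config)
}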
Since the pod needs to connect to the API Server, we need to grant some privileges to the Pod's ServiceAccount, the one named default in the default namespace.
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: default-user-clusteradmin
  namespace: default
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
# [ 2022-08-31 13:35:12 ] main container start running
========================== poststart start: [ 2022-08-31 13:35:12 ] ==========================
#(1)  [ 2022-08-31 13:35:13 ] poststart is processing...
#(2)  [ 2022-08-31 13:35:14 ] poststart is processing...
#(3)  [ 2022-08-31 13:35:15 ] poststart is processing...
#(4)  [ 2022-08-31 13:35:16 ] poststart is processing...
#(5)  [ 2022-08-31 13:35:17 ] poststart is processing...
========================== poststart end: [ 2022-08-31 13:35:17 ] ==========================
#(6) #(7) #(8) #(9) #(10) #(11) #(12)
========================== prestop start: [ 2022-08-31 13:35:25 ] ==========================
#(13) [ 2022-08-31 13:35:26 ] prestop is processing...
#(14) [ 2022-08-31 13:35:27 ] prestop is processing...
#(15) [ 2022-08-31 13:35:28 ] prestop is processing...
#(16) [ 2022-08-31 13:35:29 ] prestop is processing...
#(17) [ 2022-08-31 13:35:30 ] prestop is processing...
========================== prestop end: [ 2022-08-31 13:35:30 ] ==========================
# [ 2022-08-31 13:35:30 ] receive signal: terminated => 15
# [ 2022-08-31 13:35:30 ] graceful shutdown >>>>>>>>>
#(1) => service account: default
#(2) => service account: default
#(3) => service account: default
#(4) => service account: default
#(5) => service account: default
#(6) => service account: default
#(7) => service account: default
rpc error: code = NotFound desc = an error occurred when try to find container "1ba21311a518cf9a3b00803d47e78fc29b63dca09e5f17bf705cf41183b741b6": not found
pod.log
At last, we verified that the Pod is still able to request the API Server during the graceful shutdown, right up until the container is forcibly stopped.
Demo Source Code