その2が2022年の夏なのでもう3年近く放置していたRaspberryPiでk8s、なんとなくそろそろ手を付けるかと思ったので再開することにした。
なんで放置したかというと、kubeadm init
は正常に終了するもののapiserverが無限再起動編に突入して正常に稼働しなくて心が折れたというのがある。
一生kubeadm initが成功しない、いや成功するんだけどその後apiserverやらschedulerやらが勝手に再起動しまくってて使い物にならない、なんだこれ……
— 秋弦めい☂ (@maytheplic) July 4, 2022
結局原因を追及できなくてお手上げになったわけだが、それからまぁ3年も経てば何かしら変化があるだろうと思って再挑戦してみる。
なお、システム構成はほぼ変わらないが、Raspberry Pi OSはベースになるDebianのバージョンが上がってBullseyeからBookwormになっている。
ネットワーク構成
我が家のネットワークは192.168.39.0/24
が基本になっている(39
は乱数で振っただけ)。デフォルトゲートウェイおよびDNSサーバーは192.168.39.1
である。
各ノードはthistle-[0-2]
というホスト名が振られており、thistle-[0-2].intra.aquarite.info
で名前解決ができる。なお、これは宅内のDNSサーバーに設定を仕込んであるだけで、当然外部か intra.aquarite.info
の名前解決はできない。また、thistle-[0-2]
はIPアドレスを固定していて、thistle-0
から順に192.168.39.[20-22]
となっている。
Kubernetesのためには10.0.0.0/8
を割り当てる。
システムの準備
デフォルトではスワップが有効だが、kubeletはスワップが有効なままでは起動できない。スワップがあっても起動するように設定するか、スワップを無効にすればよい。今回はスワップファイルを削除し、無効にしておく。
まずは状態を確認する。free
コマンドを実行すると、Swapのtotalが511Mi
と表示されていることから、スワップが有効であることがわかる。
$ free -h
total used free shared buff/cache available
Mem: 7.6Gi 178Mi 7.3Gi 1.1Mi 249Mi 7.5Gi
Swap: 511Mi 0B 511Mi
スワップ無効化と、スワップファイルを管理する dphys-swapfile
の停止を行う。
$ sudo swapoff --all
$ sudo systemctl stop dphys-swapfile.service
$ sudo systemctl disable dphys-swapfile.service
IPv4のIPフォワードを有効にする。
$ sudo sysctl -w net.ipv4.ip_forward=1
次にmemory cgroupを有効にする。/boot/firmware/cmdline.txt
を開き、行末に cgroup_enable=memory cgroup_memory=1
を追記する。
$ cat /boot/firmware/cmdline.txt
console=serial0,115200 console=tty1 root=PARTUUID=99a9f0e9-02 rootfstype=ext4 fsck.repair=yes rootwait cgroup_enable=memory cgroup_memory=1
最後にbr_netfilter
を有効にする。起動時に読み込まれるように/etc/modules-load.d/
以下にファイルを作る。
$ echo br_netfilter | sudo tee /etc/modules-load.d/br_netfilter.conf
ここで一回再起動する。
なお、cgroupsの状態は /proc/cgroups
で確認できる。無効の状態だと、次のように memory
の行の enabled
が 0
になっている。
$ cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpuset 0 38 1
cpu 0 38 1
cpuacct 0 38 1
blkio 0 38 1
memory 0 38 0
devices 0 38 1
freezer 0 38 1
net_cls 0 38 1
perf_event 0 38 1
net_prio 0 38 1
pids 0 38 1
上記の通り編集して再起動すると、1
に変わっている。
$ cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpuset 0 69 1
cpu 0 69 1
cpuacct 0 69 1
blkio 0 69 1
memory 0 69 1
devices 0 69 1
freezer 0 69 1
net_cls 0 69 1
perf_event 0 69 1
net_prio 0 69 1
pids 0 69 1
コンテナランタイムの準備
今回はCRI-Oを採用する。バージョンはv1.32系を採用することとし、次の通り環境変数を設定しておく。
KUBERNETES_VERSION=v1.32
CRIO_VERSION=v1.32
https://github.com/cri-o/packaging/blob/main/README.md#distributions-using-deb-packages に従ってリポジトリの追加とパッケージのインストールを行う。
$ curl -fsSL https://pkgs.k8s.io/core:/stable:/$KUBERNETES_VERSION/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
$ echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/$KUBERNETES_VERSION/deb/ /" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
$ curl -fsSL https://download.opensuse.org/repositories/isv:/cri-o:/stable:/$CRIO_VERSION/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/cri-o-apt-keyring.gpg
$ echo "deb [signed-by=/etc/apt/keyrings/cri-o-apt-keyring.gpg] https://download.opensuse.org/repositories/isv:/cri-o:/stable:/$CRIO_VERSION/deb/ /" | sudo tee /etc/apt/sources.list.d/cri-o.list
$ sudo apt update
$ sudo apt install cri-o kubelet kubeadm kubectl
$ sudo systemctl enable --now crio.service
$ sudo systemctl enable --now kubelet
kubeadm init
でクラスタを初期化する。
$ sudo kubeadm init --control-plane-endpoint=thistle-0.intra.aquarite.info --pod-network-cidr=10.0.0.0/8
[init] Using Kubernetes version: v1.32.3
[preflight] Running pre-flight checks
[WARNING SystemVerification]: missing optional cgroups: hugetlb
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local thistle-0 thistle-0.intra.aquarite.info] and IPs [10.96.0.1 192.168.39.20]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost thistle-0] and IPs [192.168.39.20 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost thistle-0] and IPs [192.168.39.20 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 2.001340198s
[api-check] Waiting for a healthy API server. This can take up to 4m0s
[api-check] The API server is healthy after 9.003387504s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node thistle-0 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node thistle-0 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: ***
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
kubeadm join thistle-0.intra.aquarite.info:6443 --token *** \
--discovery-token-ca-cert-hash sha256:*** \
--control-plane
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join thistle-0.intra.aquarite.info:6443 --token *** \
--discovery-token-ca-cert-hash sha256:***
※ 一応トークンとかハッシュのところは潰している。
残りの2ノードでメッセージの最後に表示されたコマンドを実行する。
$ sudo kubeadm join thistle-0.intra.aquarite.info:6443 --token *** \
--discovery-token-ca-cert-hash sha256:***
[preflight] Running pre-flight checks
[WARNING SystemVerification]: missing optional cgroups: hugetlb
[preflight] Reading configuration from the "kubeadm-config" ConfigMap in namespace "kube-system"...
[preflight] Use 'kubeadm init phase upload-config --config your-config.yaml' to re-upload it.
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 1.503925623s
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
コントロールノードでノードの情報を取得する。CNIが入っていないので全てのノードがNotReadyになっている。
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
thistle-0 NotReady control-plane 4m21s v1.32.3
thistle-1 NotReady <none> 2m30s v1.32.3
thistle-2 NotReady <none> 2m21s v1.32.3
Ciliumのインストール(失敗)
まずは公式のスニペットをそのまま実行する。
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
Cilium CLIを確認する。
$ which cilium
/usr/local/bin/cilium
インストールする。この時点での最新版は1.17.2。
$ cilium install --version 1.17.2
ℹ️ Using Cilium version 1.17.2
🔮 Auto-detected cluster name: kubernetes
🔮 Auto-detected kube-proxy has been installed
ステータスを確認する。--wait
を与えると正常な状態になるまでループする。
$ cilium status --wait
/¯¯\
/¯¯\__/¯¯\ Cilium: 1 errors, 2 warnings
\__/¯¯\__/ Operator: OK
/¯¯\__/¯¯\ Envoy DaemonSet: 1 errors, 1 warnings
\__/¯¯\__/ Hubble Relay: disabled
\__/ ClusterMesh: disabled
DaemonSet cilium Desired: 3, Ready: 1/3, Available: 1/3, Unavailable: 2/3
DaemonSet cilium-envoy Desired: 3, Unavailable: 3/3
Deployment cilium-operator Desired: 1, Ready: 1/1, Available: 1/1
Containers: cilium Pending: 2, Running: 1
cilium-envoy Pending: 1, Running: 2
cilium-operator Running: 1
...
しばらく待てば正常終了するはずが、Envoy DaemonSetでエラーが出て一向に終了しない。一旦Ctrl-Cで止めて、ログを確認してみる。まずはpodの一覧を見る。
$ kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
cilium-4kw74 1/1 Running 0 3m17s
cilium-envoy-jp7xv 0/1 CrashLoopBackOff 4 (20s ago) 3m17s
cilium-envoy-v6rtr 0/1 CrashLoopBackOff 4 (18s ago) 3m17s
cilium-envoy-w2fzn 0/1 CrashLoopBackOff 5 (10s ago) 3m17s
cilium-nw99x 1/1 Running 0 3m17s
cilium-operator-59944f4b8f-t77mw 1/1 Running 0 3m17s
cilium-pwcfx 1/1 Running 0 3m17s
coredns-86c5fccbb-2css6 1/1 Running 0 2m17s
coredns-86c5fccbb-ztp94 1/1 Running 0 2m17s
etcd-thistle-0 1/1 Running 1 9m24s
kube-apiserver-thistle-0 1/1 Running 1 9m24s
kube-controller-manager-thistle-0 1/1 Running 1 9m24s
kube-proxy-ctmwg 1/1 Running 0 7m29s
kube-proxy-jzcqj 1/1 Running 0 7m38s
kube-proxy-rc25q 1/1 Running 0 9m21s
kube-scheduler-thistle-0 1/1 Running 1 9m26s
cilium-envoy-*
がCrashLoopBackOffになっている。jp7xv
のpodのログを見る。
$ kubectl -n kube-system logs cilium-envoy-jp7xv
external/com_github_google_tcmalloc/tcmalloc/system-alloc.cc:625] MmapAligned() failed - unable to allocate with tag (hint, size, alignment) - is something limiting address placement? 0x161600000000 1073741824 1073741824 @ 0x557b0bccc4 0x557b0b90e0 0x557b0b89a0 0x557b0981d0 0x557b0b6694 0x557b0b6468 0x557b08d988 0x557afa3c84 0x557afa09a0 0x7f94868614
external/com_github_google_tcmalloc/tcmalloc/arena.cc:58] FATAL ERROR: Out of memory trying to allocate internal tcmalloc data (bytes, object-size); is something preventing mmap from succeeding (sandbox, VSS limitations)? 131072 632 @ 0x557b0bd034 0x557b098260 0x557b0b6694 0x557b0b6468 0x557b08d988 0x557afa3c84 0x557afa09a0 0x7f94868614
どうもメモリの確保に失敗しているようだ。2行目ではOut of memoryと言っているので、念のためメモリの状態を確認してみるが、メモリには余裕があるので別の原因であろうと思われる。
$ free -h
total used free shared buff/cache available
Mem: 7.6Gi 878Mi 4.3Gi 3.4Mi 2.6Gi 6.8Gi
Swap: 0B 0B 0B
調べてみると、どうやらEnvoy、というかtcmallocの問題で、Raspberry Pi (ARM64)で動作しないようだ。かなり前のissueではあるが、コメント自体は2024年までついていて状況が改善されてなさそうな雰囲気がある。実際こうして問題を踏んでいるし……。
- Envoy 1.17-1.23 unable to start on Raspberry · Issue #23339 · envoyproxy/envoy
- Cilium fails to launch on ARM64: Binary cilium-envoy cannot be executed · Issue #17467 · cilium/cilium
問題を回避するにはカーネルオプションを変えてビルドし直すかtcmallocをビルドし直す必要があるということで、さすがにヘビー過ぎるというか、今後のメンテナンスが怠すぎるので大人しくFlannelを使うことにした。というわけでCiliumをアンインストール。
$ cilium uninstall
Flannelのインストール
Podネットワークの範囲を10.0.0.0/8
にしているので少々修正が必要である。まずはYAMLファイルをダウンロードしてくる。
$ curl -LO 'https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml'
このファイルを開いてCIDRの指定を編集する。
...
net-conf.json: |
{
"Network": "10.0.0.0/8",
"EnableNFTables": false,
"Backend": {
"Type": "vxlan"
}
}
...
それからkubectl apply
を行う。
$ kubectl apply -f kube-flannel.yml
namespace/kube-flannel created
serviceaccount/flannel created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
ノードの状態を確認すると、全ノードがReadyになっている。
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
thistle-0 Ready control-plane 25m v1.32.3
thistle-1 Ready <none> 23m v1.32.3
thistle-2 Ready <none> 23m v1.32.3
3年前あんだけ詰まったのは一体なんだったんだろう。
CoreDNSが上がってこない
Podの状態を確認するとCoreDNSが起動していない。
$ kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
coredns-86c5fccbb-2css6 0/1 ContainerCreating 1 6h13m
coredns-86c5fccbb-ztp94 0/1 ContainerCreating 1 6h13m
etcd-thistle-0 1/1 Running 3 6h20m
kube-apiserver-thistle-0 1/1 Running 3 6h20m
kube-controller-manager-thistle-0 1/1 Running 3 6h20m
kube-proxy-ctmwg 1/1 Running 2 6h19m
kube-proxy-jzcqj 1/1 Running 2 6h19m
kube-proxy-rc25q 1/1 Running 2 6h20m
kube-scheduler-thistle-0 1/1 Running 3 6h21m
describeでpodの状態を見てみる。
$ kubectl -n kube-system describe pods coredns-86c5fccbb-2css6
Name: coredns-86c5fccbb-2css6
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Service Account: coredns
Node: thistle-0/192.168.39.20
Start Time: Tue, 25 Mar 2025 04:58:34 +0900
Labels: k8s-app=kube-dns
pod-template-hash=86c5fccbb
Annotations: kubectl.kubernetes.io/restartedAt: 2025-03-25T04:58:33+09:00
Status: Running
IP:
IPs: <none>
Controlled By: ReplicaSet/coredns-86c5fccbb
Containers:
coredns:
Container ID:
Image: registry.k8s.io/coredns/coredns:v1.11.3
Image ID:
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Waiting
Reason: ContainerCreating
Last State: Terminated
Reason: ContainerStatusUnknown
Message: The container could not be located when the pod was deleted. The container used to be Running
Exit Code: 137
Started: Mon, 01 Jan 0001 00:00:00 +0000
Finished: Mon, 01 Jan 0001 00:00:00 +0000
Ready: False
Restart Count: 1
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6whk9 (ro)
Conditions:
Type Status
PodReadyToStartContainers False
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
kube-api-access-6whk9:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 9m54s (x281 over 5h5m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_coredns-86c5fccbb-2css6_kube-system_632c1b51-bec4-4935-a93f-d1191f76e6d6_0(935b3ad214394f5d3730b1c1a4f7cbe261dd29a0e5cce8ebcc3eaf945ec7eba8): error adding pod kube-system_coredns-86c5fccbb-2css6 to CNI network "cilium": plugin type="cilium-cni" failed (add): unable to connect to Cilium agent: failed to create cilium agent client after 30.000000 seconds timeout: Get "http://localhost/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
Warning FailedCreatePodSandBox 3m8s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_coredns-86c5fccbb-2css6_kube-system_632c1b51-bec4-4935-a93f-d1191f76e6d6_0(888b9d00c92b667b0e0b6d865ba9b96ff0e1afcee33acc2818e00d5f6ee5d64f): error adding pod kube-system_coredns-86c5fccbb-2css6 to CNI network "cilium": plugin type="cilium-cni" failed (add): unable to connect to Cilium agent: failed to create cilium agent client after 30.000000 seconds timeout: Get "http://localhost/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
Warning FailedCreatePodSandBox 2m4s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_coredns-86c5fccbb-2css6_kube-system_632c1b51-bec4-4935-a93f-d1191f76e6d6_0(345a1307bba81a94435d8aff9a3d79c67fca3683ad66284d475805c371a4b38c): error adding pod kube-system_coredns-86c5fccbb-2css6 to CNI network "cilium": plugin type="cilium-cni" failed (add): unable to connect to Cilium agent: failed to create cilium agent client after 30.000000 seconds timeout: Get "http://localhost/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
Warning FailedCreatePodSandBox 63s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_coredns-86c5fccbb-2css6_kube-system_632c1b51-bec4-4935-a93f-d1191f76e6d6_0(2e436ec645ce0fac36945db53eaaee27b1d89680484749ad6e2d72c9371f77c1): error adding pod kube-system_coredns-86c5fccbb-2css6 to CNI network "cilium": plugin type="cilium-cni" failed (add): unable to connect to Cilium agent: failed to create cilium agent client after 30.000000 seconds timeout: Get "http://localhost/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
Warning FailedCreatePodSandBox 2s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_coredns-86c5fccbb-2css6_kube-system_632c1b51-bec4-4935-a93f-d1191f76e6d6_0(4184bee28309b45aa260c25d2a92f68d6dbac4399ae028b465212cc4a25219ae): error adding pod kube-system_coredns-86c5fccbb-2css6 to CNI network "cilium": plugin type="cilium-cni" failed (add): unable to connect to Cilium agent: failed to create cilium agent client after 30.000000 seconds timeout: Get "http://localhost/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
Eventsを見ると、どうやらCiliumを使おうとして失敗しているように見える。Ciliumは既にアンインストールしたので当然使えないが、なぜそもそも使おうとしてしまうのか?
というわけで調べてみると同じ事象の記事があった: kubernetesでcorednsが起動しない時に確認すること #Calico - Qiita
/etc/cni/net.d
以下にファイルがあるそうなので、探してみる。
$ cat /etc/cni/net.d/
05-cilium.conflist 10-flannel.conflist
10-crio-bridge.conflist.disabled .kubernetes-cni-keep
$
05-cilium.conflist
がある。これを削除してしばらくすると、Runningになった。
$ sudo rm /etc/cni/net.d/05-cilium.conflist
$ kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
coredns-86c5fccbb-2css6 1/1 Running 2 6h26m
coredns-86c5fccbb-q226s 0/1 ContainerCreating 0 10m
etcd-thistle-0 1/1 Running 4 6h33m
kube-apiserver-thistle-0 1/1 Running 4 6h33m
kube-controller-manager-thistle-0 1/1 Running 4 6h33m
kube-proxy-ctmwg 1/1 Running 2 6h31m
kube-proxy-jzcqj 1/1 Running 2 6h31m
kube-proxy-rc25q 1/1 Running 3 6h33m
kube-scheduler-thistle-0 1/1 Running 4 6h33m
各ノードに05-cilium.conflist
があったため、他のノードでも同様に削除すると、残っているContainerCreatingのものも正常化した。
というわけでk8sクラスタの構築が完了した。本当に3年前のapiserver無限再起動は何だったんだろうね。