Installing and Deploying a K8S Cluster: Pitfalls from a v1.26.3 Deployment

K8S Cluster Installation and Deployment

Environment and Architecture

k8s v1.26.3
Single master, 2 cores, 4 GB RAM, Ubuntu 22.04

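Disable swap

Check the swap configuration on this machine (by default the kubelet will not start while swap is enabled); if no swap is configured, nothing needs to be disabled

cat /proc/swaps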

If swap is configured, comment out the swap entries in /etc/fstab

vim /etc/fstab

Turn off any swap that is currently active

swapoff -a

Check memory allocation to confirm that swap now shows 0

free -h
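
The manual /etc/fstab edit above can also be scripted. The following is a minimal sketch, assuming an fstab-style file in which the third field is the filesystem type; the helper name comment_out_swap and the scratch path in the comment are hypothetical and not part of the original steps.

```shell
#!/usr/bin/env bash
# comment_out_swap FILE: print FILE with every active swap entry commented out.
# An "active swap entry" here is a non-comment line whose third field is "swap".
comment_out_swap() {
  awk '!/^[ \t]*#/ && $3 == "swap" { print "#" $0; next } { print }' "$1"
}

# Try it on a scratch copy first (hypothetical path), not on the live file:
#   cp /etc/fstab /tmp/fstab.copy
#   comment_out_swap /tmp/fstab.copy
```

The function only prints the result, so the output can be reviewed before it replaces the real /etc/fstab.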

Open the K8S communication ports
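
The original notes give no commands for this step. As a rough sketch only, assuming ufw is the firewall in use and the default control-plane ports listed on the Kubernetes "Ports and Protocols" page, the rules could be generated as below. The function names are hypothetical, and the script prints the commands for review rather than running them.

```shell
#!/usr/bin/env bash
# Default control-plane ports (assumed from the Kubernetes "Ports and Protocols" page):
# 6443 API server, 2379-2380 etcd, 10250 kubelet,
# 10257 kube-controller-manager, 10259 kube-scheduler.
# Ranges use ufw's first:last syntax.
control_plane_ports() {
  printf '%s\n' 6443 2379:2380 10250 10257 10259
}

# Print (do not run) one "ufw allow" command per port so they can be reviewed.
print_ufw_rules() {
  control_plane_ports | while read -r port; do
    echo "sudo ufw allow ${port}/tcp"
  done
}

print_ufw_rules
```

On a cloud host the same ports usually also have to be allowed in the provider's security group, which this sketch does not cover.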

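Install kubeadm, kubelet, and kubectl

kubeadm bootstraps the cluster: it initializes the control plane on the master machine and joins worker machines to the cluster the master created
kubelet starts pods and containers and runs on every host
kubectl provides the command-line interface to the cluster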

When the external network is reachable

Update the apt package index

sudo apt-get update

Install HTTPS transport support for apt, CA certificates, and curl

sudo apt-get install -y apt-transport-https ca-certificates curl

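Fetch the public signing key; if the download fails from this machine, fetch it elsewhere and upload it to the /etc/apt/keyrings/ directory afterwards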

sudo curl -fsSLo /etc/apt/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg

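Set up the repository. Note that the Kubernetes project has since frozen the legacy apt.kubernetes.io repository in favor of pkgs.k8s.io, so the URL below reflects the time of writing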

echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list

Update the package index, then install

sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
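
The install command above takes whatever version the repository currently offers, which may not be the 1.26.3 this walkthrough targets. Below is a minimal sketch of pinning the version. The helper name pinned_packages is hypothetical, and the "-00" revision suffix is an assumption about how the repository names its packages; check the exact version string with apt-cache madison kubeadm before using it.

```shell
#!/usr/bin/env bash
# pinned_packages VERSION: print kubelet/kubeadm/kubectl with an explicit
# package version. The "-00" suffix is an assumption; confirm the real
# string with: apt-cache madison kubeadm
pinned_packages() {
  local version="$1"
  echo "kubelet=${version}-00 kubeadm=${version}-00 kubectl=${version}-00"
}

# Print the install command for review instead of running it.
echo "sudo apt-get install -y $(pinned_packages 1.26.3)"
```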

When the external network is not reachable

Use the Huawei Cloud mirror
Back up the /etc/apt/sources.list.d/kubernetes.list file:

cp /etc/apt/sources.list.d/kubernetes.list /etc/apt/sources.list.d/kubernetes.list.bak

Rewrite /etc/apt/sources.list.d/kubernetes.list to point at the mirror (the file is root-owned, so write it through sudo tee):

cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://repo.huaweicloud.com/kubernetes/apt/ kubernetes-xenial main
EOF

Add the Kubernetes signing key

curl -s https://repo.huaweicloud.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -

Update the package index and install Kubernetes

sudo apt-get update

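If the update fails with a NO_PUBKEY error, obtain the gpg file over a proxy that can reach the external network, upload it to the /etc/apt/keyrings directory, and add it as a trusted key

sudo apt-key add /etc/apt/keyrings/kubernetes-archive-keyring.gpg

Update the package index again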

sudo apt-get update

Install the kubeadm, kubelet, and kubectl tools

sudo apt install -y kubeadm kubelet kubectl
sudo apt-mark hold kubelet kubeadm kubectl

Check the k8s version

kubectl version --output=yaml

Enable containerd as the CRI runtime

Correct steps

Generate the containerd configuration file

containerd config default > /etc/containerd/config.toml

Following the CRI section of the official documentation, change the settings below

[plugins."io.containerd.grpc.v1.cri"]
   # sandbox_image = "registry.k8s.io/pause:3.6"
   sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"
 …
 [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true
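
The two edits above can also be applied with sed instead of by hand. This is a sketch under the assumption that the file was produced by containerd config default, so it contains one sandbox_image line and a SystemdCgroup = false line; the function name patch_containerd_config is hypothetical.

```shell
#!/usr/bin/env bash
# patch_containerd_config FILE: set the sandbox image and enable the systemd
# cgroup driver in a containerd config.toml, keeping a FILE.bak backup.
patch_containerd_config() {
  local file="$1"
  local image="registry.aliyuncs.com/google_containers/pause:3.9"
  sed -i.bak -E \
    -e "s#^([[:space:]]*)sandbox_image = .*#\\1sandbox_image = \"${image}\"#" \
    -e 's#^([[:space:]]*)SystemdCgroup = false#\1SystemdCgroup = true#' \
    "$file"
}

# Run as root against the real file, for example:
#   patch_containerd_config /etc/containerd/config.toml
```

Both substitutions preserve the original indentation, and the .bak copy makes it easy to diff the result before restarting containerd.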

Restart containerd so the configuration takes effect

systemctl restart containerd

Point crictl at the containerd socket in crictl.yaml

cat <<EOF | sudo tee /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
EOF

Locating errors and how to resolve them

If the correct steps above are skipped, kubeadm init fails with the following error

Unfortunately, an error has occurred:
    timed out waiting for the condition

This error is likely caused by:
    - The kubelet is not running
    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
    - 'systemctl status kubelet'
    - 'journalctl -xeu kubelet'
…
To see the stack trace of this error execute with --v=5 or higher

Following the hint, check the error logs with systemctl status kubelet and journalctl -xeu kubelet

failed to pull and unpack image \"registry.k8s.io/pause:3.6\"
[ERROR CRI]: container runtime is not running,
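
To see at a glance which images the runtime failed to pull, the log can be filtered. The following is a small sketch, assuming the message format shown above; the function name failed_pull_images is hypothetical.

```shell
#!/usr/bin/env bash
# failed_pull_images: read log text on stdin and print each distinct image
# named in a "failed to pull and unpack image" message. The quote around the
# image name may or may not be backslash-escaped in the log.
failed_pull_images() {
  grep -oE 'failed to pull and unpack image \\?"[^"\\]+' \
    | sed 's/.*"//' \
    | sort -u
}

# Example:
#   journalctl -u kubelet --no-pager | failed_pull_images
```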

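In other words, although the Aliyun mirror was specified, the pause image is still being pulled from the external registry. The --image-repository flag of kubeadm does not change containerd's own sandbox_image setting, which is why config.toml has to be edited as well. Next, try pulling the image manually with crictl, starting by listing the images already present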

crictl images

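The following error appears. Together with the earlier ERROR CRI message, it confirms that the problem is in the CRI layer. According to the documentation, recent k8s releases (v1.24 and later) no longer include dockershim, so the CRI runtime here must be containerd rather than Docker, and crictl must be pointed at the containerd socket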

/var/run/dockershim.sock: connect: no such file or directory"

Following the hint, rerun kubeadm init --kubernetes-version 1.26.3 --image-repository=registry.aliyuncs.com/google_containers --v=5 to get the detailed log, which shows the following problems

[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR Port-10250]: Port 10250 is in use

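These are probably leftovers from the previous incomplete install. Delete the files that already exist, find and kill the process occupying port 10250, reset the configuration, and then rerun the init from the correct steps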

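# find the process listening on port 10250
netstat -apl | grep 10250
# replace PID with the process ID found above
kill -9 PID
# undo what the failed kubeadm init left behind, including the manifests
kubeadm reset
rm -fr $HOME/.kube/config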
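
Picking the PID out of the listing by eye can be replaced with a small filter. This sketch assumes ss from iproute2 is available and that its -H -tlnp output puts the local address in the fourth column and the owning process in a users:(("name",pid=N,fd=M)) field; the function name pids_on_port is hypothetical.

```shell
#!/usr/bin/env bash
# pids_on_port PORT: read "ss -H -tlnp" output on stdin and print the PID of
# every process listening on PORT.
pids_on_port() {
  awk -v port="$1" '
    $4 ~ (":" port "$") {
      line = $0
      # A socket can be shared by several processes, so collect every pid=N.
      while (match(line, /pid=[0-9]+/)) {
        print substr(line, RSTART + 4, RLENGTH - 4)
        line = substr(line, RSTART + RLENGTH)
      }
    }'
}

# Example (root is needed to see processes owned by other users):
#   sudo ss -H -tlnp | pids_on_port 10250
```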

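Start the cluster

Correct steps

Run kubeadm init on the master machine to bring up the control-plane node
Pass the k8s version, use the Aliyun image mirror, let the apiserver advertise this machine's IP, and specify the CIDR subnets for services and for pods (10.244.0.0/16 matches the default network in flannel's manifest)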

kubeadm init \
--kubernetes-version 1.26.3 \
--apiserver-advertise-address=0.0.0.0 \
--image-repository=registry.aliyuncs.com/google_containers \
--service-cidr=10.96.0.0/16 \
--pod-network-cidr=10.244.0.0/16
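
The service and pod subnets passed to kubeadm init should not overlap each other. As an illustrative sketch (the function names are hypothetical, and this is plain IPv4 arithmetic, not something kubeadm provides), two CIDR blocks can be compared before running init:

```shell
#!/usr/bin/env bash
# ip_to_int A.B.C.D: print the IPv4 address as an integer.
ip_to_int() {
  local IFS=.
  set -- $1          # split the address on dots into $1..$4
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# cidr_range NET/BITS: print the first and last address of the block as integers.
cidr_range() {
  local net="${1%/*}" bits="${1#*/}" base size
  base=$(ip_to_int "$net")
  size=$(( 1 << (32 - bits) ))
  base=$(( base / size * size ))   # align to the start of the block
  echo "$base $(( base + size - 1 ))"
}

# cidrs_overlap CIDR1 CIDR2: succeed if the two blocks share any address.
cidrs_overlap() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

if cidrs_overlap 10.96.0.0/16 10.244.0.0/16; then
  echo "service and pod CIDRs overlap"
else
  echo "service and pod CIDRs are disjoint"
fi
```

The same check can be pointed at the host's own network range, since a pod subnet that collides with the node network is another common source of routing trouble.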

Run the following as root so that kubectl commands work

export KUBECONFIG=/etc/kubernetes/admin.conf
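
The export above only lasts for the current root shell. For a regular user, kubectl reads $HOME/.kube/config by default when KUBECONFIG is unset, so the admin kubeconfig can be copied there instead. A minimal sketch follows; the helper name install_kubeconfig is hypothetical, and it reads the file from stdin so that only the read of admin.conf needs sudo.

```shell
#!/usr/bin/env bash
# install_kubeconfig HOME_DIR: write the kubeconfig read from stdin to
# HOME_DIR/.kube/config, readable only by its owner.
install_kubeconfig() {
  local dest="${1}/.kube/config"
  mkdir -p "${1}/.kube"
  ( umask 077; cat > "$dest" )
  chmod 600 "$dest"   # also tighten permissions if the file already existed
}

# Example, run as the regular user:
#   sudo cat /etc/kubernetes/admin.conf | install_kubeconfig "$HOME"
```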

Install the flannel network plugin

kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

Allow pods to be scheduled on the master by removing its control-plane taint

kubectl taint nodes --all node-role.kubernetes.io/control-plane-

List the nodes

kubectl get nodes
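
A node typically shows NotReady until the network plugin is running, so it can help to filter the listing for nodes that are not yet Ready. This is a small sketch, assuming the default column layout of kubectl get nodes (NAME, STATUS, ROLES, AGE, VERSION); the function name not_ready_nodes is hypothetical.

```shell
#!/usr/bin/env bash
# not_ready_nodes: read "kubectl get nodes --no-headers" output on stdin and
# print the name of every node whose STATUS does not start with "Ready".
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Example:
#   kubectl get nodes --no-headers | not_ready_nodes
```

An empty result means every listed node reports Ready.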

Show details for the master node (replace masterID with the node name from the listing)

kubectl describe node masterID

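Resolving errors

If the subnets are not specified at init, pods later fail to start with the error below. Patching flannel's configuration by hand (for example, recreating /run/flannel/subnet.env) only lasts until the next reboot, since /run is cleared at boot, so resetting the cluster and rerunning init is recommended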

Warning FailedCreatePodSandBox 4m42s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8aad32481c39b67180336ba317289c1584e6d87a72a37488309cc84b4fe79313" network for pod "test-k8s-68bb74d654-6h6xv": networkPlugin cni failed to set up pod "test-k8s-68bb74d654-6h6xv_default" network: loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory

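If kubectl commands fail with the error below, kubectl has no kubeconfig and is falling back to localhost:8080; run the export KUBECONFIG step from the correct steps above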

E0329 11:29:27.558021  358193 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused

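To be continued

Joining worker nodes to the cluster

Building a highly available K8S cluster

References

Kubernetes Documentation / Installing kubeadm
Creating a cluster with kubeadm fails with "Unable to register node with API server"
Guide to building a highly available Kubernetes cluster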

©2018-2025 Howell. All rights reserved. ICP filing no. 冀ICP备19000576号