K8S Cluster Installation and Deployment
Environment and architecture
k8s v1.26.3
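Single master, 2 cores, 4 GB RAM, Ubuntu 22.04
Disable swap
Check the host's swap configuration; if no swap is configured, there is nothing to disable
cat /proc/swaps
If swap is configured, comment out the swap entries in /etc/fstab
sudo vim /etc/fstab
Turn off any swap that is currently active
sudo swapoff -a
Check memory usage to confirm that swap is no longer in use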
free -h
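The manual edit of /etc/fstab can also be scripted. The following is a minimal sketch, assuming a sed that supports -E and -i with a backup suffix; comment_out_swap is a hypothetical helper name, and the demo runs against a throwaway file with made-up entries rather than the real /etc/fstab.

```shell
#!/usr/bin/env bash
# Sketch: comment out swap entries in an fstab-style file.
# comment_out_swap is a hypothetical helper, not part of any tool.
comment_out_swap() {
    local fstab="$1"
    # Prefix '#' to every line that is not already a comment and
    # has "swap" as a whitespace-delimited field. A .bak copy is kept.
    sed -E -i.bak '/^[[:space:]]*#/!{/[[:space:]]swap[[:space:]]/s/^/#/;}' "$fstab"
}

# Demonstrate on a temporary file with illustrative entries.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
UUID=1111-2222 / ext4 defaults 0 1
/swap.img none swap sw 0 0
EOF

comment_out_swap "$demo"
cat "$demo"
```

On a real host the same function would be called as root on /etc/fstab, followed by swapoff -a.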
Open the K8S communication ports
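The notes do not say which ports or which firewall are involved. As a hedged sketch, assuming ufw is the firewall in use (an assumption; Ubuntu ships it but it may be inactive), the control-plane ports listed in the upstream "Ports and Protocols" reference could be opened as follows. print_ufw_rules is a hypothetical helper that only prints the commands so they can be reviewed before running.

```shell
#!/usr/bin/env bash
# Control-plane ports from the upstream "Ports and Protocols" reference:
# 6443 API server, 2379-2380 etcd, 10250 kubelet API,
# 10257 kube-controller-manager, 10259 kube-scheduler.
CONTROL_PLANE_PORTS=("6443/tcp" "2379:2380/tcp" "10250/tcp" "10257/tcp" "10259/tcp")

# Hypothetical helper: dry run, prints one ufw command per port spec.
print_ufw_rules() {
    local port
    for port in "$@"; do
        echo "sudo ufw allow ${port}"
    done
}

rules="$(print_ufw_rules "${CONTROL_PLANE_PORTS[@]}")"
echo "$rules"
```

Worker nodes would instead need 10250/tcp and the NodePort range 30000:32767/tcp; the network plugin may need additional ports of its own.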
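Install kubeadm, kubelet, and kubectl
kubeadm bootstraps the cluster: it initializes the control plane on the master machine and joins the other machines to the cluster that the master created
kubelet runs on every host and starts pods and containers
kubectl provides the command-line interface
With external network access
Update the apt package index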
sudo apt-get update
Install the apt HTTPS transport, CA certificates, and curl
sudo apt-get install -y apt-transport-https ca-certificates curl
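Download the public signing key; if the host cannot reach this URL, you may need to download the key elsewhere first and then upload it to the /etc/apt/keyrings/ directory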
sudo curl -fsSLo /etc/apt/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
Set up the repository
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
Refresh the package index, then install the packages and hold them at the installed version
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
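The install command above takes whatever version the repository currently offers, while these notes target v1.26.3. A minimal sketch of pinning the version explicitly follows. The "-00" revision suffix is an assumption about how this repository names its packages and should be confirmed with apt-cache madison kubeadm; build_install_cmd is a hypothetical helper that only prints the command.

```shell
#!/usr/bin/env bash
K8S_VERSION="1.26.3"
PKG_SUFFIX="-00"   # assumption: package revision suffix, verify with apt-cache madison

# Hypothetical helper: build an apt-get command that pins all three tools
# to the same version.
build_install_cmd() {
    local version="$1" suffix="$2" pkg args=""
    for pkg in kubelet kubeadm kubectl; do
        args="${args} ${pkg}=${version}${suffix}"
    done
    echo "sudo apt-get install -y${args}"
}

install_cmd="$(build_install_cmd "$K8S_VERSION" "$PKG_SUFFIX")"
echo "$install_cmd"
```

Keeping the three packages on one version avoids skew between kubeadm and the kubelet it configures.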
Without external network access
Use the Huawei Cloud mirror
Back up the /etc/apt/sources.list.d/kubernetes.list file:
sudo cp /etc/apt/sources.list.d/kubernetes.list /etc/apt/sources.list.d/kubernetes.list.bak
Rewrite the /etc/apt/sources.list.d/kubernetes.list file so that it points at the mirror:
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://repo.huaweicloud.com/kubernetes/apt/ kubernetes-xenial main
EOF
Add the Kubernetes signing key
curl -s https://repo.huaweicloud.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
Update the package index and install Kubernetes
sudo apt-get update
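If the update fails with a NO_PUBKEY error, obtain the gpg file through a proxy, upload it to the /etc/apt/keyrings directory, and add it as a trusted key
sudo apt-key add /etc/apt/keyrings/kubernetes-archive-keyring.gpg
Update the package index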
sudo apt-get update
Install the kubeadm, kubelet, and kubectl tools
sudo apt install -y kubeadm kubelet kubectl
sudo apt-mark hold kubelet kubeadm kubectl
Check the k8s version
kubectl version --output=yaml
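Before moving on, it can help to confirm that all three tools ended up on the PATH. This is a small sketch; tool_status is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a command is available on the PATH.
tool_status() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: installed"
    else
        echo "$1: missing"
    fi
}

for tool in kubeadm kubelet kubectl; do
    tool_status "$tool"
done
```

Any line reporting "missing" means the corresponding apt install step did not complete.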
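Enable containerd as the CRI runtime
Correct steps
Generate the containerd configuration file
containerd config default | sudo tee /etc/containerd/config.toml > /dev/null
Following the container runtime section of the official documentation, change the following settings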
[plugins."io.containerd.grpc.v1.cri"]
# sandbox_image = "registry.k8s.io/pause:3.6"
sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"
…
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
Restart containerd so the configuration takes effect
sudo systemctl restart containerd
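The two config.toml changes above (the sandbox image and SystemdCgroup) can also be applied with sed instead of an editor. The following is a minimal sketch, assuming the generated default file contains "SystemdCgroup = false" and a single sandbox_image line; patch_containerd_config is a hypothetical helper, and the demo patches a temporary excerpt rather than the real /etc/containerd/config.toml.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply both edits in place, keeping a .bak copy
# and preserving the original indentation of each line.
patch_containerd_config() {
    local cfg="$1"
    sed -i.bak \
        -e 's|^\([[:space:]]*\)sandbox_image = .*|\1sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"|' \
        -e 's|^\([[:space:]]*\)SystemdCgroup = false|\1SystemdCgroup = true|' \
        "$cfg"
}

# Demonstrate on an illustrative excerpt of a default config.
demo_cfg="$(mktemp)"
cat > "$demo_cfg" <<'EOF'
[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "registry.k8s.io/pause:3.6"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = false
EOF

patch_containerd_config "$demo_cfg"
cat "$demo_cfg"
```

On a real host the function would be run as root against /etc/containerd/config.toml, followed by a containerd restart.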
Set the CRI endpoint in crictl.yaml
cat <<EOF | sudo tee /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
EOF
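Locating errors and working out a fix
If the deployment did not follow the correct steps, initialization fails with the following error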
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
…
To see the stack trace of this error execute with --v=5 or higher
Following the hint, we use systemctl status kubelet and journalctl -xeu kubelet to inspect the error logs
failed to pull and unpack image \"registry.k8s.io/pause:3.6\"
[ERROR CRI]: container runtime is not running,
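In other words, even though the Aliyun mirror was specified, the image is still being pulled from the external network. We try to pull the image manually with crictl, first listing the images that have already been pulled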
crictl images
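This fails with the error below. Together with the earlier ERROR CRI, it confirms that the problem lies in the CRI setup. The documentation shows that newer k8s versions (dockershim was removed in v1.24) expect containerd rather than docker as the CRI runtime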
/var/run/dockershim.sock: connect: no such file or directory"
Following the hint, we run kubeadm init --kubernetes-version 1.26.3 --image-repository=registry.aliyuncs.com/google_containers --v=5 to view the error log, which reveals the following problems
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR Port-10250]: Port 10250 is in use
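These are probably leftovers from an earlier incomplete installation. Delete the files reported as already existing, find and kill the process occupying port 10250, reset the configuration, and then rerun the init command from the correct steps
sudo netstat -apl | grep 10250
sudo kill -9 PID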
kubeadm reset
rm -fr $HOME/.kube/config
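The PID for the kill step is read from the last column of the netstat output, which has the form PID/program. A small sketch of extracting it with awk follows; extract_pid is a hypothetical helper and the sample line is illustrative, not captured from a real host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read netstat -p style lines on stdin and print
# the PID taken from the final "PID/program" column.
extract_pid() {
    awk '{ split($NF, parts, "/"); print parts[1] }'
}

# Illustrative line in the shape netstat -apl prints for a listening socket.
sample='tcp6       0      0 [::]:10250      [::]:*      LISTEN      1234/kubelet'
pid="$(echo "$sample" | extract_pid)"
echo "PID on port 10250: $pid"
```

On a real host the input would come from sudo netstat -apl | grep 10250, since the PID column is only populated when netstat runs with sufficient privileges.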
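Start the cluster
Correct steps
Run kubeadm init on the master machine to start the master node
Specify the k8s version, use the Aliyun image repository, have the apiserver advertise the local host's IP, and set the service CIDR and the pod CIDR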
kubeadm init \
--kubernetes-version 1.26.3 \
--apiserver-advertise-address=0.0.0.0 \
--image-repository=registry.aliyuncs.com/google_containers \
--service-cidr=10.96.0.0/16 \
--pod-network-cidr=10.244.0.0/16
Run the following as root so that kubectl commands work
export KUBECONFIG=/etc/kubernetes/admin.conf
Install the flannel network plugin
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
Allow pods to be scheduled on the master
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
List the nodes
kubectl get nodes
View detailed information about the master node (replace masterID with the node name)
kubectl describe node masterID
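The kubeadm init flags used above can also be kept in a configuration file and passed with --config, which makes a later reset and re-init repeatable. This is a sketch under assumptions: the file path is arbitrary, the kubeadm.k8s.io/v1beta3 API version is the one that matches the v1.26 line, and the advertise address is left out so that kubeadm picks its default.

```shell
#!/usr/bin/env bash
# Write a ClusterConfiguration mirroring the flags from the init command.
cfg="${TMPDIR:-/tmp}/kubeadm-config.yaml"   # arbitrary location for this sketch
cat > "$cfg" <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.26.3
imageRepository: registry.aliyuncs.com/google_containers
networking:
  serviceSubnet: 10.96.0.0/16
  podSubnet: 10.244.0.0/16
EOF

# Print the command instead of running it; init must be run as root
# on the master machine.
echo "kubeadm init --config $cfg"
```

The podSubnet value matches the 10.244.0.0/16 network that the stock kube-flannel.yml expects.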
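Troubleshooting
If the pod subnet is not specified at init time, pods later fail to start with the error below. Fixing the flannel configuration by hand only lasts until the next restart, so resetting the cluster and running init again is recommended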
Warning FailedCreatePodSandBox 4m42s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8aad32481c39b67180336ba317289c1584e6d87a72a37488309cc84b4fe79313" network for pod "test-k8s-68bb74d654-6h6xv": networkPlugin cni failed to set up pod "test-k8s-68bb74d654-6h6xv_default" network: loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory
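The error above names /run/flannel/subnet.env, so a quick check is whether that file exists on the node and which network it records. A minimal sketch follows; flannel_network is a hypothetical helper, and the sample file content is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the FLANNEL_NETWORK value from a subnet.env
# file, or "missing" when the file does not exist.
flannel_network() {
    local env_file="${1:-/run/flannel/subnet.env}"
    if [ ! -f "$env_file" ]; then
        echo "missing"
        return 0
    fi
    sed -n 's/^FLANNEL_NETWORK=//p' "$env_file"
}

# Illustrative file in the key=value shape flannel writes.
sample_env="$(mktemp)"
cat > "$sample_env" <<'EOF'
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
EOF

echo "sample file: $(flannel_network "$sample_env")"
echo "this host:   $(flannel_network)"
```

A result of "missing" on the node is consistent with the error above; a network other than the pod CIDR passed to kubeadm init points at a mismatch between the two.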
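If kubectl commands fail with the error below, apply the kubectl setup from the correct steps (export KUBECONFIG=/etc/kubernetes/admin.conf)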
E0329 11:29:27.558021 358193 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
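The localhost:8080 address in this error is what kubectl falls back to when it finds no kubeconfig. The sketch below mirrors the usual lookup order (the KUBECONFIG variable, then $HOME/.kube/config) to show which file, if any, would be picked up; pick_kubeconfig is a hypothetical helper and ignores the --kubeconfig flag and multi-file KUBECONFIG lists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which kubeconfig kubectl would likely use.
pick_kubeconfig() {
    if [ -n "${KUBECONFIG:-}" ]; then
        echo "$KUBECONFIG"
    elif [ -f "$HOME/.kube/config" ]; then
        echo "$HOME/.kube/config"
    else
        echo "none"
    fi
}

echo "kubectl would use: $(pick_kubeconfig)"
```

A result of "none" matches the connection-refused error above; exporting KUBECONFIG as in the correct steps resolves it for the current shell only, so it has to be repeated or added to the shell profile.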
Planned improvements
Joining worker nodes to the cluster
Building a highly available K8S cluster
References
Kubernetes Documentation / Installing kubeadm
Creating a cluster with kubeadm fails with "Unable to register node with API server"
Kubernetes high-availability cluster setup guide