I'm using NFS as storage. Both the PVC and the PV are in the Bound state, and I've already tested that pods can write files to NFS, but deploying MySQL fails with:
```
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  25m                  default-scheduler  0/4 nodes are available: 4 pod has unbound immediate PersistentVolumeClaims. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
  Normal   Scheduled         25m                  default-scheduler  Successfully assigned lzipant/mysql-0 to arch124
  Normal   Pulling           25m                  kubelet            Pulling image "mysql:5.7"
  Normal   Pulled            25m                  kubelet            Successfully pulled image "mysql:5.7" in 3.099960834s
  Normal   Created           24m (x5 over 25m)    kubelet            Created container init-mysql
  Normal   Pulled            24m (x4 over 25m)    kubelet            Container image "mysql:5.7" already present on machine
  Normal   Started           24m (x5 over 25m)    kubelet            Started container init-mysql
  Warning  BackOff           43s (x117 over 25m)  kubelet            Back-off restarting failed container
```
Here are the YAML files:
configMap.yaml:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql
  namespace: lzipant
  labels:
    app: mysql
data:
  master.cnf: |
    [mysqld]
    log-bin
  slave.cnf: |
    [mysqld]
    super-read-only
```
service.yaml:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql
  namespace: lzipant
  labels:
    app: mysql
data:
  master.cnf: |
    [mysqld]
    log-bin
  slave.cnf: |
    [mysqld]
    super-read-only
```
statefulSet.yaml:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: lzipant
spec:
  selector:
    matchLabels:
      # Applies to all pods whose labels include app=mysql
      app: mysql
  serviceName: mysql
  replicas: 3
  # Pod template
  template:
    metadata:
      labels:
        app: mysql
    spec:
      # The init containers prepare the mysql container in the pod
      initContainers:
      # init-mysql decides whether the pod is the master or a slave, then generates the config files
      - name: init-mysql
        image: mysql:5.7
        command:
        - bash
        - "-c"
        - |
          set -ex
          # Generate server-id
          [[ `hostname` =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          echo [mysqld] > /mnt/conf.d/server-id.cnf
          # Write server-id
          echo server-id=$((100 + $ordinal)) >> /mnt/conf.d/server-id.cnf
          # server-id ending in 0 (ordinal 0) acts as master, otherwise as slave.
          # The .cnf copied here is included by mysql.cnf together with server-id.cnf.
          # So the pod with ordinal 0 is the master serving writes; the other pods are slaves serving reads.
          if [[ $ordinal -eq 0 ]]; then
            cp /mnt/config-map/master.cnf /mnt/conf.d/
          else
            cp /mnt/config-map/slave.cnf /mnt/conf.d/
          fi
        volumeMounts:
        # Mount the conf emptyDir volume at /mnt/conf.d in the pod
        - name: conf
          mountPath: /mnt/conf.d
        # Mount the ConfigMap contents at /mnt/config-map in the pod
        - name: config-map
          mountPath: /mnt/config-map
      # On pod startup this init container assumes data already exists from before and copies it over,
      # so that the new pod has data to serve
      - name: clone-mysql
        # xtrabackup is an open-source tool, used here to clone MySQL data
        image: ist0ne/xtrabackup:latest
        command:
        - bash
        - "-c"
        - |
          set -ex
          # Skip the clone if data already exists.
          [[ -d /var/lib/mysql/mysql ]] && exit 0
          # Skip the clone on master (ordinal index 0).
          [[ `hostname` =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          [[ $ordinal -eq 0 ]] && exit 0
          # Clone data from previous peer.
          ncat --recv-only mysql-$(($ordinal-1)).mysql 3307 | xbstream -x -C /var/lib/mysql
          # Prepare the backup.
          xtrabackup --prepare --target-dir=/var/lib/mysql
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
      containers:
      # The mysql container that actually runs mysqld
      - name: mysql
        image: mysql:5.7
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "abcdef"
        ports:
        - name: mysql
          containerPort: 3306
        volumeMounts:
        # Mount the mysql subdirectory of the data volume at /var/lib/mysql in the container
        - name: mysql-data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
        # Liveness probe: the container is restarted if it fails
        livenessProbe:
          exec:
            command: ["mysqladmin", "ping"]
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
        # Readiness probe checks the container is working; on failure the pod is removed from the Service's endpoints
        readinessProbe:
          exec:
            # Check we can execute queries over TCP (skip-networking is off).
            command: ["mysql", "-h", "127.0.0.1", "-e", "SELECT 1"]
          initialDelaySeconds: 5
          periodSeconds: 2
          timeoutSeconds: 1
      # After the init containers finish, an xtrabackup container also runs as a sidecar to the mysqld container
      - name: xtrabackup
        image: ist0ne/xtrabackup:latest
        ports:
        - name: xtrabackup
          containerPort: 3307
        command:
        - bash
        - "-c"
        - |
          set -ex
          cd /var/lib/mysql
          # On startup it checks whether files from an earlier data clone exist:
          # if so, it copies data from another slave; if not, from the master
          # Determine binlog position of cloned data, if any.
          if [[ -f xtrabackup_slave_info && "x$(<xtrabackup_slave_info)" != "x" ]]; then
            # XtraBackup already generated a partial "CHANGE MASTER TO" query
            # because we're cloning from an existing slave. (Need to remove the trailing semicolon!)
            cat xtrabackup_slave_info | sed -E 's/;$//g' > change_master_to.sql.in
            # Ignore xtrabackup_binlog_info in this case (it's useless).
            rm -f xtrabackup_slave_info xtrabackup_binlog_info
          elif [[ -f xtrabackup_binlog_info ]]; then
            # We're cloning directly from master. Parse binlog position.
            [[ `cat xtrabackup_binlog_info` =~ ^(.*?)[[:space:]]+(.*?)$ ]] || exit 1
            rm -f xtrabackup_binlog_info xtrabackup_slave_info
            echo "CHANGE MASTER TO MASTER_LOG_FILE='${BASH_REMATCH[1]}',\
                  MASTER_LOG_POS=${BASH_REMATCH[2]}" > change_master_to.sql.in
          fi

          # Check if we need to complete a clone by starting replication.
          if [[ -f change_master_to.sql.in ]]; then
            echo "Waiting for mysqld to be ready (accepting connections)"
            until mysql -h 127.0.0.1 -e "SELECT 1"; do sleep 1; done

            echo "Initializing replication from clone position"
            mysql -h 127.0.0.1 \
                  -e "$(<change_master_to.sql.in), \
                          MASTER_HOST='mysql-0.mysql', \
                          MASTER_USER='root', \
                          MASTER_PASSWORD='', \
                          MASTER_CONNECT_RETRY=10; \
                        START SLAVE;" || exit 1
            # In case of container restart, attempt this at-most-once.
            mv change_master_to.sql.in change_master_to.sql.orig
          fi

          # Start a server to send backups when requested by peers.
          exec ncat --listen --keep-open --send-only --max-conns=1 3307 -c \
            "xtrabackup --backup --slave-info --stream=xbstream --host=127.0.0.1 --user=root"
        volumeMounts:
        # Mount the mysql subdirectory of the data volume at /var/lib/mysql in the container
        - name: mysql-data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
      volumes:
      - name: conf
        # The emptyDir is deleted when the pod is removed from the node.
        # emptyDir is usually used as a cache directory; here it holds config files
        emptyDir: {}
      - name: config-map
        # Data in a ConfigMap can be referenced by a configMap volume and then used by the pod's containers.
        # This references the ConfigMap named mysql defined earlier
        configMap:
          name: mysql
  volumeClaimTemplates:
  # This is a PVC template: no PVC is created separately for mysql; PVCs are created dynamically
  - metadata:
      name: mysql-data
      namespace: lzipant
    spec:
      accessModes: ["ReadWriteOnce"]
      # storageClassName must be specified if no default StorageClass is configured
      storageClassName: managed-nfs-storage
      resources:
        requests:
          storage: 5Gi
```
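The sidecar branch that parses `xtrabackup_binlog_info` can be tried in isolation. Below is a minimal sketch, assuming the file holds a binlog file name and a numeric position separated by whitespace (any extra column, such as a GTID set, is ignored). The function name `binlog_info_to_change_master` is hypothetical. It matches `[^[:space:]]+` instead of the original `(.*?)`, because bash's `=~` uses POSIX extended regexes, which do not guarantee lazy `*?` matching.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the contents of an xtrabackup_binlog_info file
# into the partial "CHANGE MASTER TO" statement that the sidecar writes to
# change_master_to.sql.in. Assumed format: "<binlog-file> <position> [extra...]".
binlog_info_to_change_master() {
  local info_file="$1"
  local line=""
  # Read the first line; accept a last line without a trailing newline.
  IFS= read -r line < "$info_file" || [[ -n "$line" ]] || return 1
  # Group 1: binlog file name, group 2: numeric position.
  if [[ $line =~ ^([^[:space:]]+)[[:space:]]+([0-9]+) ]]; then
    printf "CHANGE MASTER TO MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s\n" \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

# Usage with made-up sample values in a throwaway file.
tmp=$(mktemp)
printf 'mysql-0-bin.000003\t154\n' > "$tmp"
binlog_info_to_change_master "$tmp"
rm -f "$tmp"
```

The function only builds the statement; as in the original script, `MASTER_HOST`, the credentials, and `START SLAVE` would still be appended before the result is sent to `mysql`.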
storageClass.yaml:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
  namespace: lzipant
provisioner: fuseim.pri/ifs # must match deployment env PROVISIONER_NAME
reclaimPolicy: Retain
```
1 lingly02 2022-08-01 16:49:19 +08:00
This usually means the PV wasn't provisioned automatically. Check with `kubectl describe pvc` and `kubectl get pv`.
2 HarrisonLee OP
@lingly02 It should have been created successfully, shouldn't it?
```shell
[root@arch121 mysql]# kubectl get pvc -n lzipant
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS          AGE
mysql-data-mysql-0   Bound    pvc-ba6d4db6-4de3-4801-b700-f033c25c89af   5Gi        RWX            managed-nfs-storage   95s
[root@arch121 mysql]# kubectl get pv -n lzipant
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                        STORAGECLASS          REASON   AGE
pvc-ba6d4db6-4de3-4801-b700-f033c25c89af   5Gi        RWX            Delete           Bound    lzipant/mysql-data-mysql-0   managed-nfs-storage            100s
[root@arch121 mysql]# kubectl describe pvc mysql-data-mysql-0 -n lzipant
Name:          mysql-data-mysql-0
Namespace:     lzipant
StorageClass:  managed-nfs-storage
Status:        Bound
Volume:        pvc-ba6d4db6-4de3-4801-b700-f033c25c89af
Labels:        app=mysql
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-class: managed-nfs-storage
               volume.beta.kubernetes.io/storage-provisioner: fuseim.pri/ifs
               volume.kubernetes.io/storage-provisioner: fuseim.pri/ifs
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      5Gi
Access Modes:  RWX
VolumeMode:    Filesystem
Used By:       mysql-0
Events:
  Type    Reason                 Age   From                                                                                      Message
  ----    ------                 ----  ----                                                                                      -------
  Normal  ExternalProvisioning   2m2s  persistentvolume-controller                                                               waiting for a volume to be created, either by external provisioner "fuseim.pri/ifs" or manually created by system administrator
  Normal  Provisioning           2m    fuseim.pri/ifs_nfs-client-provisioner-5868c55665-mbmdq_f2a0aea4-8bfc-4b4a-8eac-78ca5cfcf134  External provisioner is provisioning volume for claim "lzipant/mysql-data-mysql-0"
  Normal  ProvisioningSucceeded  2m    fuseim.pri/ifs_nfs-client-provisioner-5868c55665-mbmdq_f2a0aea4-8bfc-4b4a-8eac-78ca5cfcf134  Successfully provisioned volume pvc-ba6d4db6-4de3-4801-b700-f033c25c89af
[root@arch121 mysql]# kubectl describe pv pvc-ba6d4db6-4de3-4801-b700-f033c25c89af -n lzipant
Name:            pvc-ba6d4db6-4de3-4801-b700-f033c25c89af
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: fuseim.pri/ifs
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    managed-nfs-storage
Status:          Bound
Claim:           lzipant/mysql-data-mysql-0
Reclaim Policy:  Delete
Access Modes:    RWX
VolumeMode:      Filesystem
Capacity:        5Gi
Node Affinity:   <none>
Message:
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    arch121.com
    Path:      /mnt/nfs/lzipant-mysql-data-mysql-0-pvc-ba6d4db6-4de3-4801-b700-f033c25c89af
    ReadOnly:  false
Events:          <none>
```
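3 novolunt 2022-08-01 17:30:57 +08:00
As everyone knows, running production MySQL directly on Kubernetes isn't recommended, which is why the Vitess project came about: https://vitess.io/ I'd still say VMs are the safer choice.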
4 novolunt 2022-08-01 17:33:05 +08:00
Databases (MySQL/Mongo) have a particular requirement: contiguous block reads and writes. So on k8s PVs, or on encrypted disks, they often fail to start properly.
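5 gengchun 2022-08-01 17:37:56 +08:00
Do you still think NFS is the problem? In that case you should post MySQL's startup error. Putting MySQL on an NFS-mounted disk is already odd.
Where does your init bash script come from? Even a real production setup doesn't need to initialize the master and slave databases like this.
The default entrypoint script of the mariadb image has its share of problems too, but this script misses even more of what starting MySQL requires; it only handles master/slave. And what is /mnt/conf.d? Does this init script actually start cleanly? Are you sure the init script isn't the problem?
Also, why add a readinessProbe to MySQL? If you want automatic master/slave failover, it should be controlled by middleware such as MaxScale.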
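On the `/mnt/conf.d` question: in the StatefulSet it is the mount point of the `conf` emptyDir volume inside `init-mysql`, and the same volume is mounted at `/etc/mysql/conf.d` in the `mysql` container, which per the YAML comments is where MySQL's main config picks up the generated files. Below is a minimal sketch of what the init step writes there, run against temporary directories instead of the real mounts. The function name `render_mysql_conf` and its parameters are hypothetical; it reads bash's own `$HOSTNAME` variable by default instead of calling the external `hostname` command, so it does not depend on that binary being present in the image.

```shell
#!/usr/bin/env bash
# Hypothetical re-creation of the init-mysql step for local experimentation.
#   $1: conf dir        (stands in for /mnt/conf.d, the "conf" emptyDir)
#   $2: config-map dir  (stands in for /mnt/config-map, the ConfigMap volume)
#   $3: pod hostname    (defaults to bash's built-in $HOSTNAME, e.g. "mysql-0")
render_mysql_conf() {
  local conf_dir="$1" cm_dir="$2" host="${3:-$HOSTNAME}"
  # StatefulSet pods are named <statefulset-name>-<ordinal>.
  [[ $host =~ -([0-9]+)$ ]] || return 1
  local ordinal=${BASH_REMATCH[1]}
  # server-id = 100 + ordinal, as in the original script.
  printf '[mysqld]\nserver-id=%d\n' $((100 + ordinal)) > "$conf_dir/server-id.cnf"
  # Ordinal 0 gets master.cnf; every other ordinal gets slave.cnf.
  if (( ordinal == 0 )); then
    cp "$cm_dir/master.cnf" "$conf_dir/"
  else
    cp "$cm_dir/slave.cnf" "$conf_dir/"
  fi
}

# Usage: fake the ConfigMap contents from configMap.yaml, then render for mysql-1.
work=$(mktemp -d)
mkdir -p "$work/config-map" "$work/conf.d"
printf '[mysqld]\nlog-bin\n' > "$work/config-map/master.cnf"
printf '[mysqld]\nsuper-read-only\n' > "$work/config-map/slave.cnf"
render_mysql_conf "$work/conf.d" "$work/config-map" mysql-1
ls "$work/conf.d"   # should list server-id.cnf and slave.cnf
cat "$work/conf.d/server-id.cnf"
rm -rf "$work"
```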
6 HarrisonLee OP
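7 gengchun 2022-08-01 17:47:54 +08:00
@novolunt PV has nothing to do with contiguous reads and writes. A PV/PVC is just a declaration; there's no such requirement.
K8s isn't a good fit here only because doing this deliberately gives up k8s's strengths: compared with a VM setup it's more complicated and adds operational complexity without any clear benefit, so it feels superfluous. That doesn't mean k8s can't do it.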
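8 gengchun 2022-08-01 18:00:41 +08:00
@HarrisonLee I don't think this script can bring things up cleanly from scratch. Look at how the init script in the official mariadb image does it. This has nothing to do with k8s; it's about how the Docker image is built.
There's no need to set up master/slave for local development or testing either. And a team-shared staging environment doesn't need this kind of master/slave setup; configuring replication by hand once after startup is enough.
If you're just learning, drop the master/slave part and deploy the official mariadb image following the simple deployment on the Kubernetes website. You don't even need NFS.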
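9 novolunt 2022-08-01 18:15:50 +08:00
@gengchun Right, it's not the PV itself. What I meant is that the CSI driver in use is opaque: you can't see its underlying hardware or technical details, so it's not suitable for storing database data.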
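10 anubu 2022-08-01 18:29:22 +08:00
Check the container logs; most likely it's a filesystem permission problem. Databases mounted on NFS or SMB easily run into filesystem issues. If you really must do this, back the PV with block storage, such as iSCSI.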
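Since the events show `init-mysql` itself crash-looping, its logs are the ones to read, not the main `mysql` container's. Below is a minimal sketch using standard `kubectl` flags; the function name `show_init_failure` is hypothetical, and its pod, container, and namespace defaults are taken from the events in the question. Because the init script runs with `set -ex`, the previous run's log should show the last traced command before it exited.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show why an init container keeps failing.
# Defaults come from the thread: pod mysql-0, container init-mysql, namespace lzipant.
show_init_failure() {
  local pod="${1:-mysql-0}" container="${2:-init-mysql}" ns="${3:-lzipant}"
  # Output of the last terminated run (the one behind the BackOff event).
  kubectl logs "$pod" -c "$container" -n "$ns" --previous
  # Exit code and reason of that run, taken from the pod status.
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath="{.status.initContainerStatuses[?(@.name=='$container')].lastState.terminated}"
  echo
}
```

Usage: `show_init_failure`, or `show_init_failure mysql-1` for another replica.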
11 Pythondr 2022-08-02 10:38:26 +08:00
Using NFS as a database's storage backend leaves the database with basically zero availability. Read/write performance is just too poor.