Docker: cluster and container orchestration with Swarm mode

One. Understanding Docker Swarm

  • The Swarm orchestration tool is embedded in Docker itself; it is a simple orchestration tool developed by Docker, Inc.
  • Nodes in a swarm cluster have two roles, manager and worker; there are generally more than two managers to ensure high availability
  • Multiple managers keep their state in sync through Raft-based distributed storage
  • Workers are the nodes that mainly run the containers
  • Worker nodes synchronize information with one another over a Gossip network

1. Create a three-node swarm cluster

# View the help information
[root@manager voting]# docker swarm init --help
Usage:  docker swarm init [OPTIONS]
Initialize a swarm
Options:
      --advertise-addr string                  Advertised address (format: <ip|interface>[:port])
      --autolock                               Enable manager autolocking (requiring an unlock key to start a
                                               stopped manager)
      --availability string                    Availability of the node ("active"|"pause"|"drain") (default
                                               "active")
      --cert-expiry duration                   Validity period for node certificates (ns|us|ms|s|m|h) (default
                                               2160h0m0s)
      --data-path-addr string                  Address or interface to use for data path traffic (format:
                                               <ip|interface>)
      --data-path-port uint32                  Port number to use for data path traffic (1024 - 49151). If no
                                               value is set or is set to 0, the default port (4789) is used.
      --default-addr-pool ipNetSlice           default address pool in CIDR format (default [])
      --default-addr-pool-mask-length uint32   default address pool subnet mask length (default 24)
      --dispatcher-heartbeat duration          Dispatcher heartbeat period (ns|us|ms|s|m|h) (default 5s)
      --external-ca external-ca                Specifications of one or more certificate signing endpoints
      --force-new-cluster                      Force create a new cluster from current state
      --listen-addr node-addr                  Listen address (format: <ip|interface>[:port]) (default 0.0.0.0:2377)
      --max-snapshots uint                     Number of additional Raft snapshots to retain
      --snapshot-interval uint                 Number of log entries between Raft snapshots (default 10000)
      --task-history-limit int                 Task history retention limit (default 5)

Initialize the swarm (note: the node where the init is run becomes the first manager)

# Running the command prints the hints below, similar to kubeadm
[root@manager ~]# docker swarm init --advertise-addr 192.168.10.70
Swarm initialized: current node (i88ecofl4tnoo3ndp7ob8m2bx) is now a manager.

To add a worker to this swarm, run the following command:
    # Run the command below on the other nodes to join them to the swarm as workers
    docker swarm join --token SWMTKN-1-2nlhgt9g6b0hnk5hhq6ofxteh1ktu24gu4ha7v03ok66i6z9jd-9o603r396hw3yzome71b0ydgy 192.168.10.70:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

Run the above command on the worker nodes

[root@swarm-work1 ~]# docker swarm join --token SWMTKN-1-2nlhgt9g6b0hnk5hhq6ofxteh1ktu24gu4ha7v03ok66i6z9jd-9o603r396hw3yzome71b0ydgy 192.168.10.70:2377
# The node reports that it has joined the swarm as a worker
This node joined a swarm as a worker.
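If the join command is lost, it can be printed again at any time on a manager. A quick sketch:

# Print the worker join command again (run on a manager)
[root@manager ~]# docker swarm join-token worker

# Print only the token, or rotate it if it may have leaked
[root@manager ~]# docker swarm join-token -q worker
[root@manager ~]# docker swarm join-token --rotate worker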

The status of the cluster nodes can be viewed on the manager node

# There are now three nodes in the cluster
[root@manager ~]# docker node ls
ID                            HOSTNAME                STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
i88ecofl4tnoo3ndp7ob8m2bx *   localhost               Ready     Active         Leader           20.10.6
iu98ufck2e8ftogiffy5f91um     localhost.localdomain   Ready     Active                          20.10.6
jang0tvfjrl5c11u54y63o6tc     localhost.localdomain   Ready     Active                          20.10.6
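Since a single manager is a single point of failure, production clusters usually promote additional managers to keep a Raft quorum; a sketch, using a node ID from the listing above:

# Promote a worker to manager (run on a manager); demote reverses it
[root@manager ~]# docker node promote iu98ufck2e8ftogiffy5f91um
Node iu98ufck2e8ftogiffy5f91um promoted to a manager in the swarm.
[root@manager ~]# docker node demote iu98ufck2e8ftogiffy5f91um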

2. Create and horizontally scale a service

  • A service is the swarm-cluster counterpart of a container
  • docker service create is the equivalent of docker run, except that docker run creates a container locally, whereas docker service create creates it somewhere in the swarm cluster
  • A service can be scaled horizontally

Related demo:

# Create a service
[root@manager ~]# docker service create --name demo -d busybox /bin/sh -c "while true;do sleep 4000;done"
lfk2kmdnod84j9i1xlr7g9qjp

# List the services in the cluster
[root@manager ~]# docker service ls
ID             NAME      MODE         REPLICAS   IMAGE            PORTS
lfk2kmdnod84   demo      replicated   1/1        busybox:latest   

# Check the service's running state (note: unlike docker ps, docker service ps must be given the service name)
[root@manager ~]# docker service ps demo
ID             NAME      IMAGE            NODE                    DESIRED STATE   CURRENT STATE                ERROR     PORTS
l4qkr3bpk551   demo.1    busybox:latest   localhost.localdomain   Running         Running about a minute ago 

# Scale the service horizontally, e.g. scale the demo service to 3 replicas
[root@manager ~]# docker service scale demo=3
 demo scaled to 3
overall progress: 3 out of 3 tasks 
1/3: running   [==================================================>] 
2/3: running   [==================================================>] 
3/3: running   [==================================================>] 
verify: Service converged 
  
# Check again: the replica count has become 3
[root@manager ~]# docker service ls
ID             NAME      MODE         REPLICAS   IMAGE            PORTS
lfk2kmdnod84   demo      replicated   3/3        busybox:latest   
 
# Check the detailed state: the three containers are scheduled across different nodes
[root@manager ~]# docker service ps demo --filter "desired-state=running"
ID             NAME      IMAGE            NODE          DESIRED STATE   CURRENT STATE            ERROR     PORTS
l4qkr3bpk551   demo.1    busybox:latest   swarm-work1   Running         Running 20 minutes ago             
ojzd9e5fuuzl   demo.2    busybox:latest   localhost     Running         Running 15 minutes ago             
kvnl9oa1rn3n   demo.3    busybox:latest   localhost     Running         Running 13 minutes ago   
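Scaling can also be expressed as a declarative update; docker service scale demo=3 is shorthand for:

# Equivalent to "docker service scale demo=3"
[root@manager ~]# docker service update --replicas 3 demo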

The swarm cluster has a health-check mechanism: when the number of running replicas of a service deviates from the desired count, the missing tasks are automatically recreated.

# Check the replica count
[root@manager ~]# docker service ls
ID             NAME      MODE         REPLICAS   IMAGE            PORTS
lfk2kmdnod84   demo      replicated   3/3        busybox:latest   

# Delete a container on swarm-work1 to test whether it recovers automatically
[root@swarm-work1 ~]# docker rm -f 63fcfafa62b9

# Immediately after the deletion on swarm-work1, check on the manager: one replica is missing
[root@manager ~]# docker service ls
ID             NAME      MODE         REPLICAS   IMAGE            PORTS
lfk2kmdnod84   demo      replicated   2/3        busybox:latest   

# A few seconds later, check again: the deleted container has been recreated
[root@manager ~]# docker service ls
ID             NAME      MODE         REPLICAS   IMAGE            PORTS
lfk2kmdnod84   demo      replicated   3/3        busybox:latest   
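The replacement also shows up in the task history, which keeps the failed task; roughly (IDs and timings are illustrative):

[root@manager ~]# docker service ps demo
ID             NAME         IMAGE            NODE          DESIRED STATE   CURRENT STATE            ERROR                         PORTS
…              demo.1       busybox:latest   swarm-work1   Running         Running 10 seconds ago
…               \_ demo.1   busybox:latest   swarm-work1   Shutdown        Failed 15 seconds ago    "task: non-zero exit (137)"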

Delete a service with docker service rm <service name>

# Delete the demo service
[root@manager ~]# docker service rm demo

# Check again: the service is gone
[root@manager ~]# docker service ps demo
no such service: demo


3. Deploy WordPress in the swarm cluster

  • Because containers on multiple machines need to communicate with each other, an overlay network must be created first
# Create an overlay network
[root@manager ~]# docker network create -d overlay demo
p9kbg79gpwkvp1un79eogvo0y

# Check the manager node's networks: the demo overlay network has been created
[root@manager ~]# docker network ls
NETWORK ID     NAME              DRIVER    SCOPE
451c3b1f0dce   bridge            bridge    local
p9kbg79gpwkv   demo              overlay   swarm
5adcb662ede3   docker_gwbridge   bridge    local
ef2dd6544dee   host              host      local
5m0vr63uorwl   ingress           overlay   swarm
0051745c4534   none              null      local

# Check the other two worker nodes: the demo network has not been synced to them yet
[root@manager ~]# sshpass -f .passwd.txt ssh 192.168.10.20 "docker network ls"
NETWORK ID     NAME              DRIVER    SCOPE
58a3d0b1fb40   bridge            bridge    local
ba35cba6afaf   docker_gwbridge   bridge    local
450b1d5510f0   host              host      local
5m0vr63uorwl   ingress           overlay   swarm
5d7c09a3f93b   none              null      local
[root@manager ~]# sshpass -f .passwd.txt ssh 192.168.10.30 "docker network ls"
NETWORK ID     NAME              DRIVER    SCOPE
ce9f019a4a3a   bridge            bridge    local
b977f4a38389   docker_gwbridge   bridge    local
71e5f6103085   host              host      local
5m0vr63uorwl   ingress           overlay   swarm
7dc75ea6de05   none              null      local
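The overlay's address range can already be inspected on the manager; the subnet below is illustrative, but it is the pool the later 10.0.1.x task addresses come from:

[root@manager ~]# docker network inspect demo --format '{{json .IPAM.Config}}'
[{"Subnet":"10.0.1.0/24","Gateway":"10.0.1.1"}]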

Create the two services; first the mysql service

  • --mount type=volume,source=mysql-data,destination=/var/lib/mysql
  • --mount: the equivalent of -v
  • type=volume: the mount type is a volume
  • source=mysql-data: the name of the data volume
  • destination=/var/lib/mysql: the directory inside the container
[root@manager ~]# docker service create --name mysql -d --network demo -e MYSQL_ROOT_PASSWORD=123456 -e MYSQL_DATABASE=wordpress --mount type=volume,source=mysql-data,destination=/var/lib/mysql mysql

# Check the mysql service: it is running on the manager
[root@manager ~]# docker service ps mysql
ID             NAME      IMAGE          NODE        DESIRED STATE   CURRENT STATE            ERROR     PORTS
66auhvk03hl4   mysql.1   mysql:latest   localhost   Running         Running 58 seconds ago             

Create wordpress

# Create the wordpress service (note: the wordpress image reads its database settings from the WORDPRESS_DB_* variables)
[root@manager ~]# docker service create --name wordpress -d --network demo -p 80:80 --env WORDPRESS_DB_HOST=mysql:3306 --env WORDPRESS_DB_USER=root --env WORDPRESS_DB_PASSWORD=123456 wordpress

# Check which node wordpress runs on: it is on swarm-work1
[root@manager ~]# docker service ps wordpress
ID             NAME          IMAGE              NODE          DESIRED STATE   CURRENT STATE              ERROR     PORTS
oxcqgg0h8851   wordpress.1   wordpress:latest   swarm-work1   Running         Preparing 17 seconds ago        

# Check on the swarm-work1 node: the wordpress container is running there
[root@swarm-work1 ~]# docker ps -a
CONTAINER ID   IMAGE              COMMAND                  CREATED          STATUS              PORTS                                       NAMES
2e55903404da   wordpress:latest   "docker-entrypoint.s…"   28 seconds ago   Up 27 seconds       80/tcp                                      wordpress.1.rzywcpd3li0hhhv2p9jdzp34i

Visiting swarm-work1's IP, the WordPress site loads successfully.

Testing with the manager's and swarm-work2's IPs works as well: the service is reachable through every node in the swarm.

Now check the network information of swarm-work1 and swarm-work2: the demo network has appeared on swarm-work1, the node the service was scheduled to, while swarm-work2, where nothing was scheduled, still does not have it.

[root@manager ~]# sshpass -f .passwd.txt ssh 192.168.10.20 "docker network ls"
NETWORK ID     NAME              DRIVER    SCOPE
ca3a3bd1b7c8   bridge            bridge    local
ba35cba6afaf   docker_gwbridge   bridge    local
450b1d5510f0   host              host      local
5m0vr63uorwl   ingress           overlay   swarm
5d7c09a3f93b   none              null      local

[root@manager ~]# sshpass -f .passwd.txt ssh 192.168.10.30 "docker network ls"
NETWORK ID     NAME              DRIVER    SCOPE
a9d9db94cade   bridge            bridge    local
p9kbg79gpwkv   demo              overlay   swarm
b977f4a38389   docker_gwbridge   bridge    local
71e5f6103085   host              host      local
5m0vr63uorwl   ingress           overlay   swarm
7dc75ea6de05   none              null      local

Two. Routing Mesh for cluster service communication

  • Swarm has its own service-discovery mechanism: when a service is created on an overlay network, swarm automatically assigns it a virtual IP (VIP) and binds it to the service, so communication keeps working no matter how the IPs of the service's containers change. The VIP is mapped to the service's container IPs using LVS technology.

# Create a whoami service; the container answers web requests with its hostname
[root@manager ~]# docker service create --name whoami -p 8000:8000 --network demo -d jwilder/whoami

[root@manager ~]# docker service ps whoami
ID             NAME       IMAGE                   NODE          DESIRED STATE   CURRENT STATE            ERROR     PORTS
nam97zd7o28h   whoami.1   jwilder/whoami:latest   swarm-work2   Running         Running 39 seconds ago    

# Access the whoami service
[root@manager ~]# curl 192.168.10.20:8000
I'm c7d008059b94
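To confirm what swarm actually published for the service, the endpoint can be inspected on the manager (the output shape is illustrative):

[root@manager ~]# docker service inspect --format '{{json .Endpoint.Ports}}' whoami
[{"Protocol":"tcp","TargetPort":8000,"PublishedPort":8000,"PublishMode":"ingress"}]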

----------------------------------------------------------
# Run a busybox container as a client
[root@manager ~]# docker service create --name client -d --network demo busybox sh -c "while true;do sleep 3000;done"

# Check the service's container: it is on the swarm-work2 node
[root@manager ~]# docker service ps client --filter "desired-state=ready"
ID             NAME       IMAGE            NODE          DESIRED STATE   CURRENT STATE                  ERROR     PORTS
2tpreifgg2gp   client.1   busybox:latest   swarm-work2   Ready           Ready less than a second ago    

# Enter the client container and ping whoami: it succeeds, and the resolved address is 10.0.1.14
[root@swarm-work2 ~]# docker exec -it 5f sh
/ # ping whoami
PING whoami (10.0.1.14): 56 data bytes
64 bytes from 10.0.1.14: seq=0 ttl=64 time=0.147 ms
64 bytes from 10.0.1.14: seq=1 ttl=64 time=0.074 ms
64 bytes from 10.0.1.14: seq=2 ttl=64 time=0.126 ms
 
# Check the whoami container's address: 10.0.1.14 is not the whoami container's address at all; it is actually the virtual IP that swarm bound to the service
[root@swarm-work2 ~]# docker ps -a
CONTAINER ID   IMAGE                   COMMAND       CREATED          STATUS          PORTS      NAMES
c7d008059b94   jwilder/whoami:latest   "/app/http"   14 minutes ago   Up 14 minutes   8000/tcp   whoami.1.nam97zd7o28huqpn2rjnqll22

[root@swarm-work2 ~]# docker exec -it c7 ip a    # (output omitted; the container's own IP is not 10.0.1.14)
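The VIP can also be read directly from the service spec on the manager: one VIP per attached network, including ingress (the network IDs are abbreviated and the ingress address is illustrative):

[root@manager ~]# docker service inspect --format '{{json .Endpoint.VirtualIPs}}' whoami
[{"NetworkID":"5m0vr63uorwl…","Addr":"10.0.0.7/24"},{"NetworkID":"p9kbg79gpwkv…","Addr":"10.0.1.14/24"}]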


# Extend the experiment above: scale whoami out and see whether the pinged address changes
[root@manager ~]# docker service scale whoami=3

# Enter the client container and ping again: the target address has not changed
[root@swarm-work2 ~]# docker exec -it 5f sh
/ # ping whoami
PING whoami (10.0.1.14): 56 data bytes
64 bytes from 10.0.1.14: seq=0 ttl=64 time=0.217 ms
64 bytes from 10.0.1.14: seq=1 ttl=64 time=0.071 ms
64 bytes from 10.0.1.14: seq=2 ttl=64 time=0.076 ms

# nslookup tasks.whoami resolves the real addresses behind whoami
/ # nslookup tasks.whoami
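The difference between the two lookups is roughly the following (addresses are illustrative, except the VIP seen above):

/ # nslookup whoami          # the bare service name resolves to the single VIP, 10.0.1.14
/ # nslookup tasks.whoami    # tasks.<service> resolves to each task's real IP, e.g. 10.0.1.15, 10.0.1.20, 10.0.1.21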

  • Internal: container-to-container access goes over the overlay network (through the VIP, the virtual IP; a dnsrr alternative is sketched after this list)
  • Ingress: if the service publishes a port, the service can be reached through that port on any node of the swarm
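The VIP behaviour is the default (--endpoint-mode vip). A service can instead use DNS round-robin, where lookups return the task IPs directly; a sketch (note that dnsrr cannot be combined with ingress-published ports, so no -p here):

# DNS lookups of "whoami-dnsrr" will return the task IPs instead of a VIP
[root@manager ~]# docker service create --name whoami-dnsrr --network demo --endpoint-mode dnsrr -d jwilder/whoami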

1. Internal Load Balancing

Internal load balancing is the VIP mechanism demonstrated above: inside the overlay network, DNS resolves a service name to its VIP, and IPVS spreads the connections across the service's real task IPs.

2. Ingress Network

  • What the ingress network does: for example, a web-service container runs only on node1, but with the ingress network we can also reach the service by visiting any other node, such as node2; ingress does the IP + port forwarding
  • Ingress is implemented with IPVS; the forwarding rules can be seen in the nat table of iptables
# For example, in the iptables nat table we can see that any traffic to port 8000 is DNAT'ed to 172.30.0.2:8000
[root@manager ~]# iptables -t nat -L -v
…… (output truncated)
Chain DOCKER-INGRESS (2 references)
 pkts bytes target     prot opt in     out     source               destination         
    2   104 DNAT       tcp  --  any    any     anywhere             anywhere             tcp dpt:irdmi to:172.30.0.2:8000
    6   312 DNAT       tcp  --  any    any     anywhere             anywhere             tcp dpt:http to:172.30.0.2:80
  655 40130 RETURN     all  --  any    any     anywhere             anywhere            

# Now find out what the address 172.30.0.2 actually is
# 1. brctl show: four veth interfaces are attached to the docker_gwbridge bridge
[root@manager ~]# brctl show
bridge name	bridge id		STP enabled	interfaces
docker0		8000.02426f275910	no		
docker_gwbridge		8000.0242c9d88f33	no		veth4a9c734
							veth4af5146
							vethbb93b94
							vethff15d0c
virbr0		8000.525400893c89	yes		virbr0-nic

# 2. Inspect docker_gwbridge in detail: the ingress-sbox endpoint is exactly 172.30.0.2, so we can conclude that ingress forwards the traffic onto the docker_gwbridge network
[root@manager ~]# docker network inspect docker_gwbridge
…… (output truncated)
        "Containers": {
            "18711b958fff1b824b6278c80cbb0f7bc30c131420c99e46d894f5e9cf4427f0": {
                "Name": "gateway_9faea5e8a179",
                "EndpointID": "4e04a4c2e83d3caefe6095735ed8bee58484be2bbf3398ae7cdd421aea4b3cbd",
                "MacAddress": "02:42:ac:1e:00:05",
                "IPv4Address": "172.30.0.5/16",
                "IPv6Address": ""
            },
            "5fce69392cbd8247e593e028582fc753287ffa734fd54af314e775fb6abf1516": {
                "Name": "gateway_1fa8b3ba7c37",
                "EndpointID": "b11f0f4979488be6aa1eeff2e9b5b4eca2f55a8bc0cca0cda060a1023d90db82",
                "MacAddress": "02:42:ac:1e:00:04",
                "IPv4Address": "172.30.0.4/16",
                "IPv6Address": ""
            },
            "9b6454d949244e9ce19304bbf6d40f86e6f5cdc58ddbc2133a32bd3098a6f5a5": {
                "Name": "gateway_77bf3eda0db8",
                "EndpointID": "4949a7aaeb13e8aaade37461290b448e4557e0c57a0490db0616177b650a41d7",
                "MacAddress": "02:42:ac:1e:00:03",
                "IPv4Address": "172.30.0.3/16",
                "IPv6Address": ""
            },
            "ingress-sbox": {
                "Name": "gateway_ingress-sbox",
                "EndpointID": "b7eb71949d02f8251f6ca06b15a33646b310dc47228c3d7c1cfcd4e3859b3e70",
                "MacAddress": "02:42:ac:1e:00:02",
                "IPv4Address": "172.30.0.2/16",
                "IPv6Address": ""
            }
……

The overall network path is therefore: published traffic arrives at any node, is DNAT'ed into the ingress namespace on docker_gwbridge, and IPVS forwards it to a service task. The namespaces involved can be inspected directly:
# Docker's network namespaces, including the ingress ones, can be found under /var/run/docker/netns/
[root@manager ~]# ls /var/run/docker/netns/
1-5m0vr63uor  1fa8b3ba7c37  1-p9kbg79gpw  7745f2003e1d  77bf3eda0db8  9faea5e8a179  ingress_sbox  lb_p9kbg79gp

# Enter the ingress_sbox network namespace
[root@manager ~]# nsenter --net=/var/run/docker/netns/ingress_sbox
# Check the addresses inside ingress_sbox
[root@manager ~]# ip a
…… (output truncated)
119: eth1@if…: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:ac:1e:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 172.30.0.2/16 brd 172.30.255.255 scope global eth1
       valid_lft forever preferred_lft forever

# In the ingress_sbox network namespace, check the iptables mangle table: packets to the published ports are marked for the IPVS load-balancing rules
[root@manager ~]# iptables -nL -v -t mangle
Chain PREROUTING (policy ACCEPT 33 packets, 2534 bytes)
 pkts bytes target     prot opt in     out     source               destination         
   54  7780 MARK       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:80 MARK set 0x103
   19  1598 MARK       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8000 MARK set 0x104

Chain INPUT (policy ACCEPT 19 packets, 1598 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 MARK       all  --  *      *       0.0.0.0/0            10.0.0.5             MARK set 0x103
    0     0 MARK       all  --  *      *       0.0.0.0/0            10.0.0.7             MARK set 0x104
…… (output truncated)

# The ipvsadm -l command also shows the IPVS load-balancing rules; rr means round-robin scheduling
[root@manager ~]# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
FWM  259 rr
  -> 10.0.0.6:0                   Masq    1      0          0         
FWM  260 rr
  -> 10.0.0.8:0                   Masq    1      0          0         
  -> 10.0.0.9:0                   Masq    1      0          0         
  -> 10.0.0.10:0                  Masq    1      0          0
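With whoami scaled to three replicas, the round-robin rule is easy to observe from outside: repeated requests to the published port of any node rotate through the replicas. A quick sketch (the second and third container IDs are illustrative):

[root@manager ~]# for i in 1 2 3; do curl 192.168.10.20:8000; done
I'm c7d008059b94
I'm 02a8f99e1b3c
I'm 7f0e54c2d911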