1. Architecture

● A Nacos cluster requires three or more nodes;
● The Nacos Nginx Proxy forwards client requests to the cluster nodes.


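2. Preparation

Compared with 1.x, Nacos 2.0 adds gRPC communication and therefore needs two additional ports. The new ports are derived automatically by applying fixed offsets to the configured main port (server.port).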

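Port | Offset from main port | Description
8848 | 0     | Main Nacos port (server.port)
9848 | +1000 | Client gRPC port: clients connect to the server and send requests here
9849 | +1001 | Server gRPC port: used for server-to-server requests such as data synchronization
7848 | -1000 | Cluster communication port: used for leader election and health detection between Nacos nodes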
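
The offset rule in the table above can be expressed as a few lines of Python. This is only an illustrative sketch: the names PORT_OFFSETS and derive_ports are made up for this example and are not part of Nacos.

```python
# Offsets relative to server.port, copied from the table above.
# PORT_OFFSETS and derive_ports are hypothetical names used for illustration.
PORT_OFFSETS = {
    "main": 0,             # HTTP API and web console
    "client_grpc": 1000,   # client -> server gRPC
    "server_grpc": 1001,   # server -> server gRPC
    "cluster": -1000,      # cluster election and detection
}


def derive_ports(main_port):
    """Return every port a Nacos 2.x node uses for the given main port."""
    return {name: main_port + offset for name, offset in PORT_OFFSETS.items()}


if __name__ == "__main__":
    print(derive_ports(8848))  # default main port
    print(derive_ports(8858))  # main port used in this deployment
```

With the main port 8858 used in this article, the derived ports are 9858, 9859 and 7858, which matches the port allocation used for the three nodes.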

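When requests go through a VIP or nginx, the gRPC ports must be forwarded as plain TCP. Do not configure HTTP/2 forwarding, otherwise nginx will drop the connections.

Following the official port allocation rules above, the Nacos cluster deployed here on three servers uses the following ports: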

Node | IP | Exposed ports | Remarks | Version | Deployment path (current offline environment)
Nacos_1 | 192.168.18.73 | host: 8858, 9858, 9859, 7858; container: 8858, 9858, 9859, 7858 | Nacos node 1 | nacos/nacos-server:2.0.2 | /root/nacos-deploy/
Nacos_2 | 192.168.18.74 | host: 8858, 9858, 9859, 7858; container: 8858, 9858, 9859, 7858 | Nacos node 2 | nacos/nacos-server:2.0.2 | /root/nacos-deploy/
Nacos_3 | 192.168.18.75 | host: 8858, 9858, 9859, 7858; container: 8858, 9858, 9859, 7858 | Nacos node 3 | nacos/nacos-server:2.0.2 | /root/nacos-deploy/
Nacos DB | 192.168.18.75 | host: 3306; container: 3306 | Database required by Nacos | mysql:5.7.34 | /root/nacos-mysql-deploy
Nacos Nginx Proxy | 192.168.18.75 | host: 80, 8848, 9848, 9849; container: 8848, 9848, 9849 | Nacos proxy | nginx:stable-alpine | /root/nacos-proxy-deploy

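2.1 Create the Nacos database

Nacos runs in containers here, so a database has to be created separately and the base schema imported from the SQL file shipped with the matching official release.

2.1.1 Get the base schema from the Nacos 2.0.2 package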

# wget https://github.com/alibaba/nacos/releases/download/2.0.2/nacos-server-2.0.2.tar.gz

# tar -xf nacos-server-2.0.2.tar.gz
# cd nacos
# tree -L 2
.
├── bin
│   ├── shutdown.cmd
│   ├── shutdown.sh
│   ├── startup.cmd
│   └── startup.sh
├── conf
│   ├── 1.4.0-ipv6_support-update.sql
│   ├── application.properties
│   ├── application.properties.example
│   ├── cluster.conf.example
│   ├── nacos-logback.xml
│   ├── nacos-mysql.sql
│   └── schema.sql
├── LICENSE
├── NOTICE
└── target
    └── nacos-server.jar
The listing above is the Nacos binary package. Only the nacos-mysql.sql file is needed here.

2.1.2 Create the MySQL service

# 192.168.18.75
sudo -i
# Create the data directory and the deployment directory used below
mkdir -p /data/mysql /root/nacos-mysql-deploy
cat << EOF > /root/nacos-mysql-deploy/mysql.yaml
version: "3"
services:
  mysql-db:
    container_name: nacos-mysql
    image: mysql:5.7.34
    ports:
      - "3306:3306"
    environment:
      MYSQL_ROOT_PASSWORD: xxxxxxxxxxxx
    volumes:
      - "/data/mysql:/var/lib/mysql"
    restart: always
EOF

docker-compose -f /root/nacos-mysql-deploy/mysql.yaml up -d

# Import the official schema
docker cp /root/nacos/conf/nacos-mysql.sql nacos-mysql:/tmp

docker exec -it nacos-mysql sh

# Inside the container: open a MySQL shell, then run the SQL statements below
mysql -uroot -pxxxxxxxxxxxx

create database nacos;

use nacos;

source /tmp/nacos-mysql.sql;

2.2 Nacos service

2.2.1 Create and write the compose YAML files

# 192.168.18.73
sudo -i
mkdir -p /data/nacos2.0.2_1/logs /root/nacos-deploy
# Create the file first so that Docker bind-mounts a file instead of creating a directory
touch /data/nacos2.0.2_1/custom.properties
cat << EOF > /root/nacos-deploy/nacos1.yaml
version: "3"
services:
  nacos1:
    hostname: nacos2.0.2_1
    container_name: nacos2.0.2_1
    image: nacos/nacos-server:2.0.2
    volumes:
      - /data/nacos2.0.2_1/logs:/home/nacos/logs
      - /data/nacos2.0.2_1/custom.properties:/home/nacos/init.d/custom.properties
    ports:
      - "8858:8858"
      - "9858:9858"
      - "9859:9859"
      - "7858:7858"
    environment:
      - MODE=cluster
      - PREFER_HOST_MODE=hostname
      - NACOS_SERVER_IP=192.168.18.73
      - NACOS_APPLICATION_PORT=8858
      - NACOS_SERVERS=192.168.18.73:8858 192.168.18.74:8858 192.168.18.75:8858
      - SPRING_DATASOURCE_PLATFORM=mysql
      - MYSQL_SERVICE_HOST=192.168.18.75
      - MYSQL_SERVICE_DB_NAME=nacos
      - MYSQL_SERVICE_PORT=3306
      - MYSQL_SERVICE_USER=root
      - MYSQL_SERVICE_PASSWORD=xxxxxxxxxxx
      - NACOS_AUTH_ENABLE=true
      - MYSQL_DATABASE_NUM=1
    restart: always
EOF
######################
# 192.168.18.74
sudo -i
mkdir -p /data/nacos2.0.2_2/logs /root/nacos-deploy
# Create the file first so that Docker bind-mounts a file instead of creating a directory
touch /data/nacos2.0.2_2/custom.properties
cat << EOF > /root/nacos-deploy/nacos2.yaml
version: "3"
services:
  nacos1:
    hostname: nacos2.0.2_2
    container_name: nacos2.0.2_2
    image: nacos/nacos-server:2.0.2
    volumes:
      - /data/nacos2.0.2_2/logs:/home/nacos/logs
      - /data/nacos2.0.2_2/custom.properties:/home/nacos/init.d/custom.properties
    ports:
      - "8858:8858"
      - "9858:9858"
      - "9859:9859"
      - "7858:7858"
    environment:
      - MODE=cluster
      - PREFER_HOST_MODE=hostname
      - NACOS_SERVER_IP=192.168.18.74
      - NACOS_APPLICATION_PORT=8858
      - NACOS_SERVERS=192.168.18.73:8858 192.168.18.74:8858 192.168.18.75:8858
      - SPRING_DATASOURCE_PLATFORM=mysql
      - MYSQL_SERVICE_HOST=192.168.18.75
      - MYSQL_SERVICE_DB_NAME=nacos
      - MYSQL_SERVICE_PORT=3306
      - MYSQL_SERVICE_USER=root
      - MYSQL_SERVICE_PASSWORD=xxxxxxxxxxx
      - NACOS_AUTH_ENABLE=true
      - MYSQL_DATABASE_NUM=1
    restart: always
EOF
######################
# 192.168.18.75
sudo -i
mkdir -p /data/nacos2.0.2_3/logs /root/nacos-deploy
# Create the file first so that Docker bind-mounts a file instead of creating a directory
touch /data/nacos2.0.2_3/custom.properties
cat << EOF > /root/nacos-deploy/nacos3.yaml
version: "3"
services:
  nacos1:
    hostname: nacos2.0.2_3
    container_name: nacos2.0.2_3
    image: nacos/nacos-server:2.0.2
    volumes:
      - /data/nacos2.0.2_3/logs:/home/nacos/logs
      - /data/nacos2.0.2_3/custom.properties:/home/nacos/init.d/custom.properties
    ports:
      - "8858:8858"
      - "9858:9858"
      - "9859:9859"
      - "7858:7858"
    environment:
      - MODE=cluster
      - PREFER_HOST_MODE=hostname
      - NACOS_SERVER_IP=192.168.18.75
      - NACOS_APPLICATION_PORT=8858
      - NACOS_SERVERS=192.168.18.73:8858 192.168.18.74:8858 192.168.18.75:8858
      - SPRING_DATASOURCE_PLATFORM=mysql
      - MYSQL_SERVICE_HOST=192.168.18.75
      - MYSQL_SERVICE_DB_NAME=nacos
      - MYSQL_SERVICE_PORT=3306
      - MYSQL_SERVICE_USER=root
      - MYSQL_SERVICE_PASSWORD=xxxxxxxxxxx
      - NACOS_AUTH_ENABLE=true
      - MYSQL_DATABASE_NUM=1
    restart: always
EOF

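The ports in the configurations above are non-standard. Note that if the environment variable NACOS_APPLICATION_PORT is not set explicitly, it defaults to 8848, and running three nodes on the same machine would then cause port conflicts.

2.2.2 Run Nacos

Start the container on each node: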
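
The three compose files differ only in the node IP (plus the numeric suffix in the names and paths), so the per-node settings can be generated instead of copied by hand. The following is a small sketch under that assumption. NODES, PORT and node_environment are hypothetical names for this example; only the environment variable names come from the files above.

```python
# Hypothetical helper: builds the node-specific environment entries
# that differ between the three compose files shown above.
NODES = ["192.168.18.73", "192.168.18.74", "192.168.18.75"]
PORT = 8858  # non-standard main port used in this deployment


def node_environment(node_ip, nodes=NODES, port=PORT):
    """Return the node-specific environment entries for one Nacos node."""
    servers = " ".join("{0}:{1}".format(ip, port) for ip in nodes)
    return [
        "NACOS_SERVER_IP={0}".format(node_ip),
        "NACOS_APPLICATION_PORT={0}".format(port),
        "NACOS_SERVERS={0}".format(servers),
    ]


if __name__ == "__main__":
    for index, ip in enumerate(NODES, start=1):
        print("# nacos{0}.yaml".format(index))
        for entry in node_environment(ip):
            print("      - " + entry)
```

Generating the entries this way keeps NACOS_APPLICATION_PORT and the ports listed in NACOS_SERVERS in sync, which is the point of the caveat above.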

# 192.168.18.73
docker-compose -f /root/nacos-deploy/nacos1.yaml up -d

# 192.168.18.74
docker-compose -f /root/nacos-deploy/nacos2.yaml up -d

# 192.168.18.75
docker-compose -f /root/nacos-deploy/nacos3.yaml up -d

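As the screenshot above shows, the web console is reachable on each of the three nodes individually, and access through the proxy works as well.

Note: the web console itself does not use the client or server gRPC ports, but client programs do connect to them, so these ports must be exposed. Because non-standard ports are used here, nginx simply maps them to the standard ports when proxying, so that clients see the standard ports.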
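
Before testing with a real client, it can help to confirm that the required ports accept TCP connections at all. The sketch below uses only the Python standard library. The function names port_is_open and check_ports are made up for this example, and a successful TCP connection only shows that the port is reachable, not that gRPC itself works.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports, timeout=2.0):
    """Map each port to whether it accepts TCP connections on the host."""
    return {port: port_is_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Node addresses and ports as used in this article.
    for node in ("192.168.18.73", "192.168.18.74", "192.168.18.75"):
        print(node, check_ports(node, [8858, 9858, 9859, 7858]))
```

The same check_ports call can be pointed at the proxy address with ports 8848, 9848 and 9849 once the proxy from the next section is running.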

2.3 Nginx proxy configuration

The proxy also runs as a container.
Configure the nginx proxy:

mkdir -p /root/nacos-proxy-deploy
cd /root/nacos-proxy-deploy
# The heredoc delimiter is quoted so the shell does not expand nginx variables such as $remote_addr
cat <<'EOF' > nginx.conf
user nginx;
worker_processes auto;

error_log /var/log/nginx/error.log notice;
pid /var/run/nginx.pid;


events {
    worker_connections 65535;
}

# gRPC ports are forwarded as plain TCP
stream {
    upstream nacos-server-grpc9848 {
        server 192.168.18.73:9858 max_fails=1 fail_timeout=30s;
        server 192.168.18.74:9858 max_fails=1 fail_timeout=30s;
        server 192.168.18.75:9858 max_fails=1 fail_timeout=30s;
    }

    server {
        listen 9848;
        proxy_pass nacos-server-grpc9848;
    }

    upstream nacos-server-grpc9849 {
        server 192.168.18.73:9859 max_fails=1 fail_timeout=30s;
        server 192.168.18.74:9859 max_fails=1 fail_timeout=30s;
        server 192.168.18.75:9859 max_fails=1 fail_timeout=30s;
    }

    server {
        listen 9849;
        proxy_pass nacos-server-grpc9849;
    }
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log /var/log/nginx/access.log main;

    sendfile on;
    #tcp_nopush on;

    keepalive_timeout 65;

    #gzip on;

    upstream NACOS {
        server 192.168.18.73:8858 max_fails=1 fail_timeout=30s;
        server 192.168.18.74:8858 max_fails=1 fail_timeout=30s;
        server 192.168.18.75:8858 max_fails=1 fail_timeout=30s;
    }

    server {
        listen 8848 default_server;
        server_name _;

        location / {
            proxy_pass http://NACOS;
        }

    }

    include /etc/nginx/conf.d/*.conf;
}
EOF

Write the compose YAML file for the proxy container:

cat > nacos-nginx-proxy.yaml << EOF
version: "3"
services:
  nacos_proxy:
    container_name: nacos_nginx_proxy
    image: "nginx:stable-alpine"
    ports:
      # Port 80 is published so that browsers and clients can use just an IP or domain name, without a port
      - "80:8848"
      - "8848:8848"
      - "9848:9848"
      - "9849:9849"
    volumes:
      - /root/nacos-proxy-deploy/nginx.conf:/etc/nginx/nginx.conf:ro
    restart: always
EOF

# Start the container
docker-compose -f nacos-nginx-proxy.yaml up -d

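The deployment is now complete. However, running containers do not guarantee a working service, so the next step is to test whether service instances can actually be registered.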

cat > nacos_check_status.py << 'EOF'
# author: maoqiu.guo
# desc: Nacos service registration test and status monitoring
# date: 2022-06-11

import requests
import json
import time
import logging
from logging.handlers import RotatingFileHandler
import sys


logging.basicConfig(level=logging.DEBUG)
# Create a rotating log handler: log file path, maximum size per file, number of backup files to keep
log_handle = RotatingFileHandler('./log.txt', maxBytes=1024*1024, backupCount=5)
# Define the log record format
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s-%(funcName)s')
# Apply the format to the handler
log_handle.setFormatter(formatter)
# Attach the handler to the root logger
logging.getLogger().addHandler(log_handle)

# Progress bar (also serves as a delay between steps)
def progress_bar(timmer):
    for i in range(1, 101):
        print("\r", end="")
        print("Waiting: {}%: ".format(i), "▋" * (i // 2), end="")
        sys.stdout.flush()
        time.sleep(timmer)

class Nacos():
    def __init__(self):
        self.webhookurl = "https://oapi.dingtalk.com/robot/send?access_token=1968dd11647dfd686c4e1107cf1ad3d0d21c3813a824ee632728327722d9fa82"
        self.login_url = "/nacos/v1/auth/users/login"
        self.accessToken = None
        self.service_url = "/nacos/v1/ns/catalog/services"
        self.instance_url = "/nacos/v1/ns/instance"

    @staticmethod
    def dingding(self, content):
        # Declared static, so callers pass the instance explicitly: self.dingding(self, content=msg)
        data = {
            "msgtype": "text",
            "text": {
                "content": "Nacos-" + content,
            },
        }
        headers = {'Content-Type': 'application/json;charset=utf-8'}
        response = requests.post(self.webhookurl, data=json.dumps(data), headers=headers)
        print(response.content)

    def get_access_token(self, nacos_server):
        "Log in and obtain a Nacos accessToken"
        logging.info("\n\n************** NacosServer: {0} **************".format(nacos_server))
        params = "username=nacos&password=nacos"
        try:
            r = json.loads(requests.post(params=params, url="http://" + nacos_server + self.login_url, timeout=5).text)
            accessToken = str(r['accessToken'])
            return {"accessToken": accessToken, "result": True, "server": nacos_server, "msg": "Login Nacos Success...[{0}]".format(nacos_server)}
        except Exception as e:
            return {"accessToken": None, "result": False, "server": nacos_server, "msg": "Login Nacos Failed...[{0}] Error: {1}".format(nacos_server, str(e))}

    def register_test(self, accessToken, nacos_server):
        '''
        1. Create an instance
        2. Register the service
        3. Delete the instance
        4. Delete the service
        '''
        data = {
            "serviceName": "test_service_instance",
            "namespaceId": "public",
            "accessToken": accessToken,
            "ip": nacos_server.split(":")[0],
            "port": nacos_server.split(":")[1],
            "ephemeral": False
        }

        # Create the service by registering an instance
        try:
            r = requests.post(params=data, url="http://" + nacos_server + self.instance_url, timeout=5).text
            if r == "ok":
                msg = "[Instance registered & service created] Service: {0} NameSpace: {1} {2} InstanceIP: {3}".format(data['serviceName'], data['namespaceId'], nacos_server, data['ip'])
                logging.info(msg)
            else:
                msg = "[Instance registration & service creation failed] Service: {0} NameSpace: {1} {2} {3}".format(data['serviceName'], data['namespaceId'], nacos_server, r)
                logging.error(msg)
                self.dingding(self, content=msg)
        except Exception as e:
            # Report the exception itself: the response may not exist if the request failed
            msg = "[Instance registration & service creation failed] Service: {0} NameSpace: {1} {2} {3}".format(data['serviceName'], data['namespaceId'], nacos_server, e)
            logging.error(msg)
            self.dingding(self, content=msg)

    # Deregister the instance, then delete the service
    def delete_instance_service(self, accessToken, nacos_server):
        data = {
            "serviceName": "test_service_instance",
            "namespaceId": "public",
            "accessToken": accessToken,
            "ip": nacos_server.split(":")[0],
            "port": nacos_server.split(":")[1],
            "ephemeral": False
        }
        # Deregister the instance
        try:
            r = requests.delete(params=data, url="http://" + nacos_server + self.instance_url, timeout=5).text
            if r == "ok":
                msg = "[Instance deregistered] Service: {0} NameSpace: {1} {2} ".format(data['serviceName'], data['namespaceId'], nacos_server)
                logging.info(msg)
            else:
                msg = "[Instance deregistration failed] Service: {0} NameSpace: {1} {2} {3}".format(data['serviceName'], data['namespaceId'], nacos_server, r)
                logging.error(msg)
                self.dingding(self, content=msg)
        except Exception as e:
            # Report the exception itself: the response may not exist if the request failed
            msg = "[Instance deregistration failed] Service: {0} NameSpace: {1} {2} {3}".format(data['serviceName'], data['namespaceId'], nacos_server, e)
            logging.error(msg)
            self.dingding(self, content=msg)

        progress_bar(timmer=0.08)
        # Delete the service
        try:
            r = requests.delete(params=data, url="http://" + nacos_server + "/nacos/v1/ns/service", timeout=5).text
            if r == "ok":
                msg = "[Service deleted] Service: {0} NameSpace: {1} {2} ".format(data['serviceName'], data['namespaceId'], nacos_server)
                logging.info(msg)
            else:
                msg = "[Service deletion failed] Service: {0} NameSpace: {1} {2} {3}".format(data['serviceName'], data['namespaceId'], nacos_server, r)
                logging.error(msg)
                self.dingding(self, content=msg)
        except Exception as e:
            msg = "[Service deletion failed] Service: {0} NameSpace: {1} {2} {3}".format(data['serviceName'], data['namespaceId'], nacos_server, e)
            logging.error(msg)
            self.dingding(self, content=msg)
        progress_bar(timmer=0.1)

    def get_service_list(self, accessToken, nacos_server):
        "Request the service list of one namespace and check whether the result is empty; an empty list is treated as abnormal"
        payload = {
            "accessToken": accessToken,
            "pageNo": 1,
            "pageSize": 10,
            "namespaceId": "stage"
        }
        try:
            r = json.loads(requests.get(params=payload, url="http://" + nacos_server + self.service_url).text)
            if r['count'] == 0:
                msg = "[Failed to get registered services] Nacos Server {0}: the stage service list is empty!".format(nacos_server)
                logging.info(msg)
                self.dingding(self, content=msg)
            else:
                logging.info("[Got registered services] ServiceTotal {1} on {0} - Namespace: stage ".format(nacos_server, r['count']))

        except Exception as e:
            # Log the failure instead of silently ignoring it
            logging.error("[Failed to get registered services] Nacos Server {0} Error: {1}".format(nacos_server, e))


    def letsgo(self, nacos_server):
        # Log in and obtain a token
        login_nacos_res = self.get_access_token(nacos_server)

        if login_nacos_res["result"]:
            logging.info("[Login succeeded, accessToken obtained] Nacos Server {0} ".format(login_nacos_res['server']))

            self.get_service_list(login_nacos_res['accessToken'], login_nacos_res['server'])

            # Create the instance and service
            self.register_test(login_nacos_res['accessToken'], login_nacos_res['server'])

            #time.sleep(60 * 2)
            progress_bar(timmer=0.1)
            self.delete_instance_service(login_nacos_res['accessToken'], login_nacos_res['server'])

        else:
            msg = "[Login failed, no accessToken] {0} ".format(login_nacos_res['msg'])
            logging.info(msg)
            self.dingding(self, content=msg)

if __name__ == "__main__":
    while True:
        nacos_server_list = [
            #"nacos.axiba.com",
            "192.168.18.75:80",
            "192.168.18.75:8848",
            "192.168.18.73:8858",
            "192.168.18.74:8858",
            "192.168.18.75:8858",
        ]
        for i in nacos_server_list:
            Nacos().letsgo(i)
EOF

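After running it:

The script above loops over every node directly, and also goes through the Nginx proxy, using the Nacos Open API to query, register, and delete service instances.

  1. Log in to obtain an accessToken (every subsequent request must carry it).
  2. Fetch the service list of the stage namespace to check that data is returned correctly. Because the cluster has only just been set up, an empty or abnormal result triggers the message shown above and a notification to DingTalk.

The screenshot above shows a request through the proxy that found 0 services in stage and then sent a notification. This suggests that a node in the cluster may have failed or may hold abnormal data ("may", because behind the load balancer it is unknown which Nacos node served the request). Since the script also registers against each node directly, the faulty node will show up in a notification of its own.

  3. In the public namespace, create and register an instance of the service test_service_instance through the Open API, then deregister the instance and delete the service, to verify that the cluster works correctly.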

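3. Notes

  1. Although the documentation states that a service can be deleted only when it has 0 instances, in practice the instance count is not strictly checked. Nacos creates services in several ways: explicitly, when an instance is registered, or when an instance sends a heartbeat. A service that was deleted manually may therefore be recreated automatically.
  2. All registered services are lost when a container restarts. As long as at least one node in the cluster stays healthy, the restarted node synchronizes service data from the healthy nodes; otherwise clients need to register again.
  3. Normally, a service registered on one node is synchronized to the other nodes.
  4. If the nodes cannot elect a leader, check the log: /data/nacos2.0.2_1/logs/alipay-jraft.log
  5. Log for instances and services: /data/nacos2.0.2_1/naming-raft.log
  6. The logs show that port 8848 is exposed so that Nacos clients can log in and obtain a token, which all subsequent operations carry. Port 9848 is exposed for client gRPC requests, so if a forwarding proxy is used, this port must be exposed as well, otherwise connections fail.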

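4. Rant

To push the commercial edition, the Nacos maintainers seem to leave the community edition riddled with problems and unattended, and the documentation is a mess as well. I have no energy left to complain.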