
1.4.0 Version Server List 503 #4573

Closed
Oodachi opened this issue Dec 25, 2020 · 17 comments
Assignees
Labels: follow up (this problem requires continuous follow-up), kind/research, status/need feedback

Comments


Oodachi commented Dec 25, 2020

  • OS: CentOS
  • Version: nacos-server 1.4.0

I deployed a cluster with Docker. After deployment I manually deleted ~/data/protocol and restarted. Service registration and discovery now work fine in my project, but when I try to observe the currently registered service instances through the web UI, I have to click the query button three times before the information appears, and refresh the detail page three times before it shows.

Looking at nacos.log on each instance, I found that the failing requests always trigger this exception: java.lang.IllegalStateException: old raft protocol already stop, while the successful requests do not.
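The exception above can be probed directly: later in this thread, `/nacos/v1/ns/raft/state` is shown returning the same message over curl. A minimal diagnostic sketch (not part of Nacos; the node addresses would be placeholders) that polls each node and classifies the response:

```python
# Hypothetical diagnostic helper: classify what a node's
# /nacos/v1/ns/raft/state endpoint returns, so the nodes still stuck on
# the stopped old raft protocol can be spotted from a script.
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

# Marker string taken verbatim from the exception in nacos.log.
STOP_MARKER = "old raft protocol already stop"

def classify_raft_state(body: str) -> str:
    """Classify the response body of /nacos/v1/ns/raft/state."""
    if STOP_MARKER in body:
        return "old-raft-stopped"
    return "ok"

def probe(node: str) -> str:
    """Fetch and classify one node's raft state (node is host:port)."""
    url = f"http://{node}/nacos/v1/ns/raft/state"
    try:
        with urlopen(url, timeout=3) as resp:
            return classify_raft_state(resp.read().decode("utf-8", "replace"))
    except HTTPError as e:          # server answered with an error status
        return f"http-{e.code}"
    except URLError:                # connection refused, DNS failure, etc.
        return "unreachable"
```

Running `probe` against each cluster member would show which nodes answer normally and which still report the stopped old raft protocol.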


@KomachiSion added the "follow up" and "kind/research" labels on Dec 25, 2020

Oodachi commented Dec 25, 2020

I tried accessing the nodes directly via IP:PORT and found that on two of the nodes the service-list request always returns 503.
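A check like this can be scripted against the documented service-list open API (`/nacos/v1/ns/service/list`). A small sketch, with placeholder node addresses, keeping the status bookkeeping separate from the network call so it can be exercised offline:

```python
# Hypothetical probe: ask every node for the service list and record the
# HTTP status each one returns, then report the nodes that did not
# answer 200 (e.g. the two nodes returning 503 in this report).
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

# Documented Nacos naming open API for listing services.
LIST_PATH = "/nacos/v1/ns/service/list?pageNo=1&pageSize=10"

def http_status(node: str) -> int:
    """Return the HTTP status of the service-list call, or -1 if unreachable."""
    try:
        with urlopen(f"http://{node}{LIST_PATH}", timeout=3) as resp:
            return resp.status
    except HTTPError as e:
        return e.code
    except URLError:
        return -1

def failing_nodes(statuses: dict) -> list:
    """Given {node: status}, return the nodes that did not answer 200."""
    return sorted(n for n, s in statuses.items() if s != 200)
```

With the three cluster members as input, `failing_nodes({n: http_status(n) for n in nodes})` would name exactly the members that 503.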

@KomachiSion (Collaborator)

I tried this and the old raft stop error does appear, but it does not return a 503.

Could you show the error message returned when you get the 503?

@KomachiSion (Collaborator)

Also, please expand all of the node metadata in your second screenshot and post it (a screenshot or copy-paste is fine).

@KomachiSion (Collaborator)

@dodolook8716
After cleaning my environment I rebuilt a 1.4.0 cluster in Docker; the old raft stop error did not keep appearing, and there was no 503 error.

I then set up a three-node 1.4.0 cluster on physical machines; again, neither the old raft stop error nor the 503 error appeared.

Could you show the error message returned when you get the 503?
Also, please expand all of the node metadata in your second screenshot and post it (a screenshot or copy-paste is fine).

Could you please provide these two pieces of information? Otherwise we cannot reproduce the problem and will have to close the issue.


Oodachi commented Dec 31, 2020

> @dodolook8716
> After cleaning my environment I rebuilt a 1.4.0 cluster in Docker; the old raft stop error did not keep appearing, and there was no 503 error.
>
> I then set up a three-node 1.4.0 cluster on physical machines; again, neither the old raft stop error nor the 503 error appeared.
>
> Could you show the error message returned when you get the 503?
> Also, please expand all of the node metadata in your second screenshot and post it (a screenshot or copy-paste is fine).
>
> Could you please provide these two pieces of information? Otherwise we cannot reproduce the problem and will have to close the issue.

Thanks for your attention.

Screenshot of the 503 error message: [screenshot]

I accessed each of the three nodes via ip:port and used devtools to copy out the JSON responses from /v1/core/cluster/nodes:

node1.json.txt
node2.json.txt
node3.json.txt
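Assuming `/v1/core/cluster/nodes` returns the usual RestResult wrapper (a `data` list of members, each with an `address` field — this response shape is an assumption, not confirmed in the thread), the three attached views could be compared with a sketch like:

```python
# Hypothetical comparison helper: every node's /v1/core/cluster/nodes
# response should describe the same member list. A node that disagrees
# with the others (or cannot answer at all) is a candidate for the 503s.

def member_addresses(nodes_response: dict) -> set:
    """Extract the member address set from one /v1/core/cluster/nodes body."""
    return {m["address"] for m in nodes_response.get("data", [])}

def views_agree(responses: list) -> bool:
    """True if every node reports the identical member set."""
    views = [member_addresses(r) for r in responses]
    return all(v == views[0] for v in views) if views else True
```

Loading node1.json.txt, node2.json.txt, and node3.json.txt and passing the parsed dicts to `views_agree` would show whether the three members even agree on cluster membership.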


KomachiSion commented Jan 4, 2021

@dodolook8716 Could you package and upload the complete server logs from all three nodes?

The server returning "server is Down now" suggests that raft leader election did not succeed, so the node stays in a not-started state.
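That diagnosis can be checked mechanically from the node metadata posted in this thread: a node whose extended info lacks an elected leader for the `naming_persistent_service` raft group never finished election. A sketch, assuming the metadata shape shown in the JSON later in the thread:

```python
# Hypothetical check derived from the /v1/core/cluster/nodes extendInfo:
# election succeeded for a node only if its raftMetaData names a leader
# for naming_persistent_service with a non-negative term.

def election_succeeded(extend_info: dict) -> bool:
    """True if the node reports an elected leader for the naming raft group."""
    meta = extend_info.get("raftMetaData", {}).get("metaDataMap", {})
    group = meta.get("naming_persistent_service")
    if not group:
        # Abnormal nodes in this thread have no raftMetaData at all.
        return False
    return bool(group.get("leader")) and group.get("term", -1) >= 0
```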

@lx308033262

I set this cluster up on k8s, and the raft election still seems to have problems; I'm not sure whether it is related to k8s networking.
Some nodes are abnormal (I don't know how to attach images); they show the following:
{
  "adWeight": "0",
  "raftPort": "7848",
  "site": "unknow",
  "weight": "1"
}

Other nodes are normal:
{
  "adWeight": "0",
  "lastRefreshTime": 1610612210988,
  "naming": {
    "ip": "nacos-3.nacos-headless.kifs.svc.cluster.local:8848",
    "heartbeatDueMs": 2958,
    "term": -1,
    "leaderDueMs": 990,
    "state": "FOLLOWER"
  },
  "raftMetaData": {
    "metaDataMap": {
      "naming_persistent_service": {
        "leader": "nacos-3.nacos-headless.kifs.svc.cluster.local:7848",
        "raftGroupMember": [
          "nacos-1.nacos-headless.kifs.svc.cluster.local:7848",
          "nacos-0.nacos-headless.kifs.svc.cluster.local:7848",
          "nacos-3.nacos-headless.kifs.svc.cluster.local:7848",
          "nacos-4.nacos-headless.kifs.svc.cluster.local:7848"
        ],
        "term": 8
      }
    }
  },
  "raftPort": "7848",
  "site": "unknow",
  "version": "1.4.0",
  "weight": "1"
}

@lx308033262

[root@nacos-4 nacos]# curl http://nacos-2.nacos-headless.kifs.svc.cluster.local:8848/nacos/v1/ns/raft/state
caused: old raft protocol already stop;[root@nacos-4 nacos]#

@KomachiSion (Collaborator)

The key raft logs are usually in alipay-jraft.log, protocol-raft.log, naming-raft.log, and nacos.log.

Please upload those files, from every node.
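A sketch for triaging those files once collected (the marker strings are the ones quoted in this thread; the file names come from the comment above, and the directory path is a placeholder):

```python
# Hypothetical log triage: scan the raft-related log files for the
# error markers seen in this issue and report where each one occurs.
from pathlib import Path

MARKERS = ("old raft protocol already stop", "server is Down now")
LOG_FILES = ("alipay-jraft.log", "protocol-raft.log", "naming-raft.log", "nacos.log")

def scan_text(name: str, text: str) -> list:
    """Return (file, line_no, marker) hits for one log file's contents."""
    hits = []
    for no, line in enumerate(text.splitlines(), 1):
        for marker in MARKERS:
            if marker in line:
                hits.append((name, no, marker))
    return hits

def scan_dir(log_dir: str) -> list:
    """Scan the known raft log files under one node's log directory."""
    hits = []
    for name in LOG_FILES:
        path = Path(log_dir) / name
        if path.exists():
            hits.extend(scan_text(name, path.read_text(errors="replace")))
    return hits
```

Running `scan_dir` over each node's log directory would show which node first started reporting the stopped protocol.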


stalary commented Jan 20, 2021

I hit this problem too after upgrading from 1.3.1 to 1.4.1.


admintertar commented Feb 28, 2021

I ran into this problem as well; freshly generated logs attached:

nacos-node-0.tar.gz

nacos-node-1.tar.gz

nacos-node-2.tar.gz

@admintertar

The k8s nodes can all reach each other. [screenshot]

@admintertar

[screenshots]

@admintertar

Log of a service that fails to register: [screenshot]

@a386572631

Hi, has this problem been resolved? I'm on k8s 1.17 with nacos 1.4.1 and hit the same problem.

@liurenbao

I ran into this problem as well; it has caused multiple production incidents. After a pod is updated the cluster becomes abnormal, and nacos has to be redeployed to recover.


wuwu955 commented Apr 1, 2021

+1

9 participants