tikv always has one that can't start. #8691

Closed
894383755 opened this issue Dec 14, 2018 · 12 comments
Labels
type/question The issue belongs to a question.

Comments

@894383755

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    If possible, provide a recipe for reproducing the error.
    I used ansible to start a TiDB cluster (3 TiKV, 3 PD, 1 TiDB).

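  2. What did you expect to see?
    The TiDB cluster starts normally.

  3. What did you see instead?
    When running ansible-playbook start.yml, only a TiDB error was reported, but I found that the TiKV process was missing on one machine.
    I restarted again (this time with 2 TiKV nodes configured). Still only the TiDB error was reported, and the TiKV process was missing on one machine.
    I restarted again (this time with only 1 TiKV node configured). Still only the TiDB error was reported, and the TiKV process was missing on one machine.

  4. What version of TiDB are you using (tidb-server -V or run select tidb_version(); on TiDB)?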

@siddontang
Member

Maybe you have hit the same problem as in #8683 (comment).

Can you check that first? @894383755

@shenli shenli added the type/question The issue belongs to a question. label Dec 14, 2018
@894383755
Author

894383755 commented Dec 14, 2018

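@shenli It seems to be different. My TiKV log shows no errors, and nothing was upgraded; everything was freshly installed.
The abnormal node:
(screenshot)
The normal node:
(screenshot)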
12月 14 22:54:21 CT7-85 systemd[1]: tikv-20160.service holdoff time over, scheduling restart.
12月 14 22:54:21 CT7-85 systemd[1]: Started tikv-20160 service.
12月 14 22:54:21 CT7-85 systemd[1]: Starting tikv-20160 service...
12月 14 22:54:21 CT7-85 run_tikv.sh[28564]: sync ...
12月 14 22:54:21 CT7-85 run_tikv.sh[28564]: real 0m0.095s
12月 14 22:54:21 CT7-85 systemd[1]: tikv-20160.service: main process exited, code=exited, status=1/FAILURE
12月 14 22:54:21 CT7-85 systemd[1]: Unit tikv-20160.service entered failed state.
12月 14 22:54:21 CT7-85 systemd[1]: tikv-20160.service failed.

@siddontang
Member

Are there any messages in the TiKV error log?

@894383755
Author

tikv.log:
2018/12/17 16:51:51.202 INFO util.rs:406: connecting to PD endpoint: "192.100.1.79:2379"
2018/12/17 16:51:51.207 INFO util.rs:406: connecting to PD endpoint: "192.100.1.80:2379"
2018/12/17 16:51:51.209 INFO util.rs:406: connecting to PD endpoint: "192.100.1.89:2379"
2018/12/17 16:51:51.211 INFO util.rs:406: connecting to PD endpoint: "http://192.100.1.80:2379"
2018/12/17 16:51:51.213 INFO util.rs:406: connecting to PD endpoint: "http://192.100.1.79:2379"
2018/12/17 16:51:51.215 INFO util.rs:465: connected to PD leader "http://192.100.1.79:2379"
2018/12/17 16:51:51.215 INFO util.rs:394: All PD endpoints are consistent: ["192.100.1.79:2379", "192.100.1.80:2379", "192.100.1.89:2379"]
2018/12/17 16:51:51.217 INFO tikv-server.rs:421: connect to PD cluster 6634897751991172956
2018/12/17 16:51:51.227 INFO mod.rs:343: starting working thread: addr-resolver
2018/12/17 16:51:51.443 INFO mod.rs:343: starting working thread: storage-scheduler
2018/12/17 16:51:51.443 INFO mod.rs:343: starting working thread: gc-worker
2018/12/17 16:51:51.443 INFO mod.rs:597: Storage (RaftKv engine) started.
2018/12/17 16:52:06.640 INFO mod.rs:27: Welcome to TiKV.
Release Version: 3.0.0-alpha
Git Commit Hash: e772a8cba1cbeeba1f6a001c4cf68724e3a2392e
Git Commit Branch: master
UTC Build Time: 2018-11-28 02:04:34
Rust Version: rustc 1.29.0-nightly (4f3c7a472 2018-07-17)
2018/12/17 16:52:06.641 INFO tikv-server.rs:398: using config:
tikv_stderr.log:
image

@siddontang
Member

PTAL @LinuxGit

It seems that TiKV starts successfully but ansible reports a failure, which is strange.

@LinuxGit

$ cd /home/tidb/deploy/scripts
$ bash -x run_tikv.sh

Then check whether there are any errors.

@894383755
Author

error tikv:
[tidb@ct7-85-tikv scripts]$ bash -x run_tikv.sh
+ set -e
+ ulimit -n 1000000
+ cd /home/tidb/deploy
+ export RUST_BACKTRACE=1
+ RUST_BACKTRACE=1
+ export TZ=/etc/localtime
+ TZ=/etc/localtime
+ echo -n 'sync ... '
sync ... ++ sync

real 0m0.002s
user 0m0.002s
sys 0m0.000s
+ stat=
+ echo ok
ok
+ echo

+ echo 2204
+ exec bin/tikv-server --addr 0.0.0.0:20160 --advertise-addr 192.100.1.85:20160 --pd 192.100.1.89:2379,192.100.1.90:2379,192.100.1.91:2379 --data-dir /home/tidb/deploy/data --config conf/tikv.toml --log-file /home/tidb/deploy/log/tikv.log
非法指令 (Illegal instruction)

When I run:
[root@ct7-85-tikv deploy]# ./bin/tikv-server --addr 0.0.0.0:20160 --advertise-addr 192.100.1.85:20160 --pd 192.100.1.89:2379,192.100.1.90:2379,192.100.1.91:2379 --data-dir /home/tidb/deploy/data --config conf/tikv.toml --log-file /home/tidb/deploy/log/tikv.log
I see:
非法指令 (Illegal instruction)

When I run:
./bin/tikv-server --addr 0.0.0.0:20160 --advertise-addr 192.100.1.85:20160 --pd 192.100.1.89:2379,192.100.1.90:2379,192.100.1.91:2379 --data-dir /home/tidb/deploy/data
I see:
2018/12/29 01:57:59.377 INFO mod.rs:343: starting working thread: storage-scheduler
2018/12/29 01:57:59.377 INFO mod.rs:343: starting working thread: gc-worker
2018/12/29 01:57:59.377 INFO mod.rs:597: Storage (RaftKv engine) started.
非法指令 (Illegal instruction)
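
The message above is what a shell prints when a child process is killed by SIGILL. A minimal sketch for confirming this from the exit status, assuming a bash-like shell in which a status above 128 means "128 + signal number"; the helper name describe_status is hypothetical and not part of tidb-ansible:

```shell
# describe_status STATUS
# Hypothetical helper: turn an exit status into a readable description.
# Assumes the bash/dash convention that status > 128 means "killed by
# signal (status - 128)".
describe_status() {
  status=$1
  if [ "$status" -gt 128 ]; then
    # kill -l with an exit status prints the signal name without "SIG".
    echo "killed by SIG$(kill -l "$status")"
  else
    echo "exited with status $status"
  fi
}
```

Running the tikv-server command by hand and then calling `describe_status $?` would report SIGILL if the process really died from an illegal instruction.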

@LinuxGit

What's your Linux distro and version? Could you post the result of the following command?

$ uname -a

It seems that the binaries could not run on your system.
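
One possible way to test this hypothesis is to compare CPU feature flags between a working node and the failing one, since an "Illegal instruction" crash can mean the binary uses instructions the host CPU does not support. A rough sketch, assuming a Linux host with /proc/cpuinfo; the helper name has_cpu_flag is hypothetical, and sse4_2 is only an example flag, because this thread does not state which instruction sets the build requires:

```shell
# has_cpu_flag FLAG [CPUINFO_FILE]
# Hypothetical helper: succeeds if FLAG appears in the first "flags" line
# of the cpuinfo file (defaults to /proc/cpuinfo on Linux).
has_cpu_flag() {
  flag=$1
  file=${2:-/proc/cpuinfo}
  # Split the flags line into one token per line, then match the flag exactly.
  grep -m1 '^flags' "$file" | tr ' \t' '\n\n' | grep -qx "$flag"
}
```

For example, `has_cpu_flag sse4_2 && echo supported || echo missing` could be run on each TiKV host and the results compared.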

@894383755
Author

[root@ct7-85-tikv ~]# uname -a
Linux ct7-85-tikv 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

It is always this one TiKV node that has the problem; all the other TiKV nodes work.

@LinuxGit

Please check if the tikv-server binary on the node is the same as others.

$ md5sum tikv-server
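
To compare several nodes in one pass, the md5sum output can be collected and checked together. A minimal sketch, assuming md5sum-style lines (hash first) on standard input; the helper name all_same_checksum is hypothetical:

```shell
# all_same_checksum
# Hypothetical helper: reads md5sum output lines on stdin and succeeds only
# if at least one line was read and every line carries the same hash.
all_same_checksum() {
  awk '{ seen[$1] = 1 }
       END { n = 0; for (k in seen) n++; exit !(n == 1) }'
}
```

The input could come from running md5sum on each host (for example over ssh) and piping the combined output into `all_same_checksum && echo identical`.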

@894383755
Author

Same as the others:
44f6c8ce5fc5b553e0a45cde7358e2f0 bin/tikv-server
44f6c8ce5fc5b553e0a45cde7358e2f0 bin/tikv-server
44f6c8ce5fc5b553e0a45cde7358e2f0 bin/tikv-server

I also tried copying the binary over to replace it, but it still does not start properly.

I found that I could install tikv-server another way (using tidb-latest-linux-amd64.tar.gz) and start it successfully,
but then the other tikv-server instances show a problem:
2018/12/29 19:16:18.412 INFO util.rs:465: connected to PD leader "http://192.100.1.89:2379"
2018/12/29 19:16:18.412 INFO util.rs:394: All PD endpoints are consistent: ["192.100.1.89:2379", "192.100.1.90:2379", "192.100.1.91:2379"]
2018/12/29 19:16:18.414 INFO tikv-server.rs:421: connect to PD cluster 6640352509811249458
2018/12/29 19:16:18.426 INFO mod.rs:343: starting working thread: addr-resolver
2018/12/29 19:16:33.648 INFO mod.rs:27: Welcome to TiKV.

@LinuxGit

What's your tidb-version in inventory.ini?
It seems that your tikv binary has a compilation problem. Could you download the latest version of tidb-ansible and run the following command to download the binaries for the specified version?

ansible-playbook local_prepare.yml

https://github.com/pingcap/docs-cn/blob/master/op-guide/ansible-deployment.md#在中控机器上下载-tidb-ansible

If you want to upgrade, you could refer to https://github.com/pingcap/docs-cn/blob/master/op-guide/tidb-v2.1-upgrade-guide.md.
