
1105 runtime error: invalid memory address or nil pointer dereference #19041

Closed
cberescu opened this issue Aug 6, 2020 · 21 comments
Assignees
Labels
component/executor priority/P1 The issue has P1 priority. severity/critical type/bug The issue is confirmed as a bug.

Comments

@cberescu

cberescu commented Aug 6, 2020

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

I cannot reliably reproduce it; it does not happen all the time.

2. What did you expect to see? (Required)

Not to receive an error.

3. What did you see instead? (Required)

General error: 1105 runtime error: invalid memory address or nil pointer dereference

[2020/08/06 11:11:05.744 +02:00] [ERROR] [misc.go:88] ["panic in the recoverable goroutine"] [r="invalid memory address or nil pointer dereference"] ["stack trace"=
github.com/pingcap/tidb/util.WithRecovery.func1
	/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/util/misc.go:90
runtime.gopanic
	/usr/local/go/src/runtime/panic.go:679
runtime.panicmem
	/usr/local/go/src/runtime/panic.go:199
runtime.sigpanic
	/usr/local/go/src/runtime/signal_unix.go:394
github.com/pingcap/tidb/session.(*TxnState).GetSnapshot
	<autogenerated>:1
github.com/pingcap/tidb/executor.(*BatchPointGetExec).initialize
	/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/executor/batch_point_get.go:128
github.com/pingcap/tidb/executor.(*BatchPointGetExec).Next
	/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/executor/batch_point_get.go:89
github.com/pingcap/tidb/executor.Next
	/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/executor/executor.go:248
github.com/pingcap/tidb/executor.(*HashJoinExec).fetchBuildSideRows
	/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/executor/join.go:257
github.com/pingcap/tidb/executor.(*HashJoinExec).fetchAndBuildHashTable.func1
	/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/executor/join.go:675
github.com/pingcap/tidb/util.WithRecovery
	/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/util/misc.go:93]

4. Affected version (Required)

v4.0.0

5. Root Cause Analysis

@cberescu cberescu added the type/bug The issue is confirmed as a bug. label Aug 6, 2020
@lance6716
Contributor

/label component/executor

@lance6716
Contributor

github.com/pingcap/tidb/util.WithRecovery.func1
	/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/util/misc.go:90
runtime.gopanic
	/usr/local/go/src/runtime/panic.go:679
runtime.panicmem
	/usr/local/go/src/runtime/panic.go:199
runtime.sigpanic
	/usr/local/go/src/runtime/signal_unix.go:394
github.com/pingcap/tidb/session.(*TxnState).GetSnapshot
	<autogenerated>:1
github.com/pingcap/tidb/executor.(*BatchPointGetExec).initialize
	/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/executor/batch_point_get.go:128
github.com/pingcap/tidb/executor.(*BatchPointGetExec).Next
	/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/executor/batch_point_get.go:89
github.com/pingcap/tidb/executor.Next
	/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/executor/executor.go:248
github.com/pingcap/tidb/executor.(*HashJoinExec).fetchBuildSideRows
	/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/executor/join.go:257
github.com/pingcap/tidb/executor.(*HashJoinExec).fetchAndBuildHashTable.func1
	/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/executor/join.go:675
github.com/pingcap/tidb/util.WithRecovery
	/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/util/misc.go:93

@ichn-hu
Contributor

ichn-hu commented Aug 25, 2020

@cberescu could you please give more information on this issue? For example, how did you find it, and under what conditions is it likely to occur?

@cberescu
Author

It is hard to reproduce; it appears in a complex query. I will attach the query here in case it helps:

[2020/08/06 11:11:05.756 +02:00] [ERROR] [conn.go:728] ["command dispatched failed"] [conn=8355] [connInfo="id:8355, addr:78.46.249.6:60308 status:10, collation:utf8_general_ci, user:alex"] [command=Query] [status="inTxn:0, autocom
mit:1"] [sql="SELECT ROUND(AVG(users_ratings.value),1) as avgScore, COUNT(users_ratings.value) as totalRatings,verification_requests.last_update as verifiedLastUpdate, profiles.id_user as id,\nprofiles.id as idp,profiles.age,profiles.cou
ntry,profiles.name, profiles.show_price,profiles.show_skype,profiles.score,profiles.min_sum, images.id as idImage,images.ext as imgExt,\nvideos.video_link as videoLink,videos.video_id as videoId,users.username,users.verified,profiles.ins
ert_date as register_date,users.paypermin,\nIF((devices.status=\"online\") AND ((devices.services_status=\"online\") OR (devices.services_status=\"\") OR (devices.version=\"1.4.3/3.0.7\"))   ,1,0) as dstatus,\n\nIFNULL((SELECT username F
ROM users AS studios WHERE studios.id=users.id_parent AND users.id IN (174569,1154157,1151609,1150540,1145234,1145005,1142218,136995,1134450,1134163,1125097,1121530,1117395,1109043,1108471,1107323,1102165,1099851,98445,1097807,1097713,10
93495,1090391,1088241,1084185,1078454,73863,1071451,1055248,52403,1049316,1038740,36081,34029,987778,984532,980564,974397,968443,965822,961565,959942,957710,957146,956951,951304,929905,924820,921862,912553)),\"n/a\") AS studioId,\n\nIFNU
LL((SELECT profiles_types_options.value\n    FROM profiles_types_links\n    INNER JOIN profiles_types_options ON profiles_types_options.id=profiles_types_links.id_option\n        and profiles_types_options.id_type=(SELECT id FROM profile
s_types where name=\"category\" AND id_domain='1')\n    WHERE id_profile=profiles.id AND id_profile IN (28652,166896,166599,167024,165855,166516,165796,29006,166410,164512,163461,163189,162756,161902,162108,161680,161332,160929,14619,161
143,160916,160520,160404,159921,159531,159038,10489,158344,156726,7232,156058,155014,4796,4295,152494,152542,151465,151097,149353,149116,148193,147908,156388,147701,147458,146566,143536,142666,142378,161146)\n        ),\"n/a\") AS catego
ry,\n\n\nIFNULL((SELECT SUBSTRING_INDEX(GROUP_CONCAT(profiles_tags.name),\",\",3)\n    FROM profiles_tags_links\n    INNER JOIN profiles_tags ON profiles_tags.id=profiles_tags_links.id_tag\n    WHERE profiles_tags_links.id_profile=profil
es.id AND profiles_tags_links.id_profile IN (28652,166896,166599,167024,165855,166516,165796,29006,166410,164512,163461,163189,162756,161902,162108,161680,161332,160929,14619,161143,160916,160520,160404,159921,159531,159038,10489,158344,
156726,7232,156058,155014,4796,4295,152494,152542,151465,151097,149353,149116,148193,147908,156388,147701,147458,146566,143536,142666,142378,161146) ),\n        \"n/a\") AS top3tags,\n\nIFNULL((SELECT SUBSTRING_INDEX(GROUP_CONCAT(CONCAT(
profile_images.id,\".\",profile_images.ext)),\",\",3)\n    FROM images as profile_images\n    WHERE profile_images.id_user=profiles.id_user AND profile_images.type=\"gallery\" AND profile_images.id_user IN (174569,1154157,1151609,1150540
,1145234,1145005,1142218,136995,1134450,1134163,1125097,1121530,1117395,1109043,1108471,1107323,1102165,1099851,98445,1097807,1097713,1093495,1090391,1088241,1084185,1078454,73863,1071451,1055248,52403,1049316,1038740,36081,34029,987778,
984532,980564,974397,968443,965822,961565,959942,957710,957146,956951,951304,929905,924820,921862,912553)\n    ),\"\") AS 3images\n\nFROM profiles\nLEFT JOIN users_ratings ON users_ratings.id_user_rated=profiles.id_user AND users_ratings
.value > 0 AND users_ratings.id_user_rated IN (174569,1154157,1151609,1150540,1145234,1145005,1142218,136995,1134450,1134163,1125097,1121530,1117395,1109043,1108471,1107323,1102165,1099851,98445,1097807,1097713,1093495,1090391,1088241,10
84185,1078454,73863,1071451,1055248,52403,1049316,1038740,36081,34029,987778,984532,980564,974397,968443,965822,961565,959942,957710,957146,956951,951304,929905,924820,921862,912553)\nLEFT JOIN verification_requests ON verification_reque
sts.id_user=profiles.id_user AND verification_requests.id_user IN (174569,1154157,1151609,1150540,1145234,1145005,1142218,136995,1134450,1134163,1125097,1121530,1117395,1109043,1108471,1107323,1102165,1099851,98445,1097807,1097713,109349
5,1090391,1088241,1084185,1078454,73863,1071451,1055248,52403,1049316,1038740,36081,34029,987778,984532,980564,974397,968443,965822,961(len: 6837)"] [txn_mode=PESSIMISTIC] [err="runtime error: invalid memory address or nil pointer derefe
rence\ngithub.com/pingcap/tidb/executor.(*HashJoinExec).fetchAndBuildHashTable.func2\n\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/executor/join.go:678\ngithub.com/pingcap/tidb/util.WithRecovery.func1\n\t/ho
me/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/util/misc.go:85\nruntime.gopanic\n\t/usr/local/go/src/runtime/panic.go:679\nruntime.panicmem\n\t/usr/local/go/src/runtime/panic.go:199\nruntime.sigpanic\n\t/usr/local/
go/src/runtime/signal_unix.go:394\ngithub.com/pingcap/tidb/session.(*TxnState).GetSnapshot\n\t<autogenerated>:1\ngithub.com/pingcap/tidb/executor.(*BatchPointGetExec).initialize\n\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.
com/pingcap/tidb/executor/batch_point_get.go:128\ngithub.com/pingcap/tidb/executor.(*BatchPointGetExec).Next\n\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/executor/batch_point_get.go:89\ngithub.com/pingcap/t
idb/executor.Next\n\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/executor/executor.go:248\ngithub.com/pingcap/tidb/executor.(*HashJoinExec).fetchBuildSideRows\n\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/s
rc/github.com/pingcap/tidb/executor/join.go:257\ngithub.com/pingcap/tidb/executor.(*HashJoinExec).fetchAndBuildHashTable.func1\n\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/executor/join.go:675\ngithub.com/p
ingcap/tidb/util.WithRecovery\n\t/home/jenkins/agent/workspace/tidb_v4.0.0/go/src/github.com/pingcap/tidb/util/misc.go:93\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"]

If there is anything else I can provide, please let me know; I am available to help in any way I can.

@ichn-hu
Contributor

ichn-hu commented Aug 25, 2020

@cberescu thank you for being so responsive. I'll be looking into this issue. Moreover, could you please give a more specific version number, say, the output of `select tidb_version()`?

@cberescu
Author

Hi, the version was 4.0.0; I have now updated to 4.0.3. I can give it a try and see if this happens with this version as well.

@ichn-hu
Contributor

ichn-hu commented Aug 25, 2020

Hi, the version was 4.0.0; I have now updated to 4.0.3. I can give it a try and see if this happens with this version as well.

Cool, thanks a lot. By the way, how frequently does this error pop up?

@cberescu
Author

I cannot tell you exactly how often it pops up. It happened twice, pretty close to each other, within an hour; after that I moved the query back to MySQL. At this moment it has been running for about 3 hours and nothing has happened. I will let it run for the next 24 or 48 hours, and if nothing happens, I think we can conclude the issue was fixed in version 4.0.3.

I will keep you informed either way. Thank you.

@ichn-hu
Contributor

ichn-hu commented Aug 26, 2020

Cool, thank you so much for your kindness; let's work it out together!

Looking forward to your update!

@fzhedu fzhedu added the priority/P1 The issue has P1 priority. label Aug 27, 2020
@eurekaka
Contributor

It looks like a data race when BatchPointGet is executed in a multi-threaded context. You can enable the -race flag when building tidb-server and then try the reproduction again.

@cberescu
Author

Hey guys,

I have let TiDB process the query where I found the issue for the last 48 hours, and I did not see the error appear again. So it seems there may have been a fix between versions 4.0.0 and 4.0.3.

I will keep it running like this for the time being anyway.

@ichn-hu
Contributor

ichn-hu commented Aug 28, 2020

Hey guys,

I have let TiDB process the query where I found the issue for the last 48 hours, and I did not see the error appear again. So it seems there may have been a fix between versions 4.0.0 and 4.0.3.

I will keep it running like this for the time being anyway.

Could you try @eurekaka 's suggestion, by rebuilding the tidb-server with -race flag and then continue the reproduction?

@cberescu
Author

Hey @ichn-hu, unfortunately I do not know how to do that; I used tiup to deploy the cluster. Is this something I can do with it? If you have documentation on how to do it, I can deploy an instance with the flag set and send all traffic to it.

@ichn-hu
Contributor

ichn-hu commented Aug 28, 2020

It looks like a data race when BatchPointGet is executed in a multi-threaded context. You can enable the -race flag when building tidb-server and then try the reproduction again.

Hi @eurekaka , would you please elaborate for @cberescu ?

@ichn-hu
Contributor

ichn-hu commented Sep 2, 2020

Hey @ichn-hu, unfortunately I do not know how to do that; I used tiup to deploy the cluster. Is this something I can do with it? If you have documentation on how to do it, I can deploy an instance with the flag set and send all traffic to it.

Hi @cberescu, sorry for the late reply. Basically, you need to compile and build the TiDB server yourself with Go's race detector turned on.

To compile tidb-server, you need a Go environment; then clone the code repository and run

WITH_RACE=1 make server

to compile the tidb server with the race detector.

You can then test against the compiled binary, which will detect race conditions on the fly.

Let me know if you have any more questions.
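For readers unfamiliar with Go's race detector, here is a minimal toy program (deliberately racy, and unrelated to TiDB's actual code) showing the kind of bug `-race` catches; run it with `go run -race main.go` to see the detector's report:

```go
package main

import (
	"fmt"
	"sync"
)

// raceOnce has two goroutines write the same variable without any
// synchronization. The program runs "fine" under normal builds, but
// builds with -race report the unsynchronized writes as a data race --
// exactly the class of bug suspected in this issue.
func raceOnce() int {
	shared := 0
	var wg sync.WaitGroup
	for i := 1; i <= 2; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			shared = v // unsynchronized write: flagged by the race detector
		}(i)
	}
	wg.Wait()
	return shared
}

func main() {
	fmt.Println(raceOnce()) // prints 1 or 2, depending on scheduling
}
```

Note that the race detector only reports races that actually occur during execution, so reproducing the problematic workload against the instrumented binary matters.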

@ichn-hu
Contributor

ichn-hu commented Sep 2, 2020

One more thing: you might not need tiup to test the binary in a cluster. You can just run ./bin/tidb-server to start a single TiDB server in standalone mode (you don't even need TiKV) if your test case is not that large.

If you do need tiup: with tiup playground, you can specify a binary file path (see tiup playground --help); with tiup cluster, you may need to do some research on how to specify a compiled binary to deploy.

@cberescu
Author

cberescu commented Sep 2, 2020

For now I do not seem to receive the error anymore. Could it have been fixed in one of the releases since 4.0.0? As I said, the error occurred on version 4.0.0.

@ichn-hu
Contributor

ichn-hu commented Sep 2, 2020

For now I do not seem to receive the error anymore. Could it have been fixed in one of the releases since 4.0.0? As I said, the error occurred on version 4.0.0.

Probably. Perhaps we could consider closing this issue? @cberescu, feel free to reopen it if the problem occurs again. Thanks for your patience and help!

@ichn-hu
Contributor

ichn-hu commented Sep 2, 2020

@SunRunAway would you please help close this issue?

@SunRunAway SunRunAway reopened this Sep 2, 2020
@ichn-hu
Contributor

ichn-hu commented Sep 2, 2020

After investigation, this issue is confirmed to be resolved by later PRs, namely #18145, and the cause is identified here: #16817 (comment). In short, BatchPointGetExec tries to get a snapshot from the txnState which, after the validity check, might be changed concurrently before snapshot = txn.GetSnapshot(), resulting in a nil pointer dereference.
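To make the failure mode concrete, here is a simplified, hypothetical Go sketch of the check-then-use race described above, together with a lock-guarded variant that avoids it. The type and method names only loosely echo the stack trace; this is not TiDB's actual code or fix, just an illustration of the pattern:

```go
package main

import (
	"fmt"
	"sync"
)

// snapshot and txnState are hypothetical stand-ins for the snapshot and
// TxnState types named in the stack trace.
type snapshot struct{ startTS uint64 }

type txnState struct {
	mu   sync.Mutex
	snap *snapshot // may be replaced or nilled out by another goroutine
}

// getSnapshotRacy mirrors the buggy pattern: validity is checked first and
// the pointer is dereferenced later, leaving a window in which a concurrent
// writer can set snap to nil and cause a nil pointer dereference.
func (t *txnState) getSnapshotRacy() uint64 {
	if t.snap == nil { // check ...
		return 0
	}
	// <-- another goroutine may run `t.snap = nil` right here
	return t.snap.startTS // ... then use
}

// getSnapshotSafe performs the check and the read atomically under a lock,
// so the snapshot observed by the check is the one that gets used.
func (t *txnState) getSnapshotSafe() uint64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.snap == nil {
		return 0
	}
	return t.snap.startTS
}

func main() {
	t := &txnState{snap: &snapshot{startTS: 42}}
	fmt.Println(t.getSnapshotSafe()) // 42
}
```

Any equivalent scheme that makes the check and the dereference a single atomic step (a lock, an atomic pointer load into a local variable, or a channel handoff) closes the window.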

@cberescu as long as you are using a later release version, such as 4.0.5, you are free from this panic.

@cberescu
Author

cberescu commented Sep 2, 2020

That is great news. Thanks for your explanation and patience with me.
