Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Controller crash on nil pointer deference during - now at different location #1460

Open
arunsworld opened this issue Oct 24, 2023 · 7 comments

Comments

@arunsworld
Copy link

Stack trace:

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1cc4311]

goroutine 13310556 [running]:
github.com/openziti/channel/v2.(*channelImpl).rxer.func1()
	/go/pkg/mod/github.com/openziti/channel/v2@v2.0.99/impl.go:308 +0x3f
panic({0x2113340?, 0x4196650?})
	/usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/openziti/ziti/controller/sync_strats.(*InstantStrategy).ReceiveClientHello(0xc00204ab00, 0xc002e390e0, 0xc002514000)
	/go/src/github.com/openziti/ziti/controller/sync_strats/sync_instant.go:500 +0x1051
github.com/openziti/ziti/controller/handler_edge_ctrl.(*helloHandler).HandleReceive(0xc001d6c3a0, 0xc000ac6330, {0x2bc82e0, 0xc00273f500})
	/go/src/github.com/openziti/ziti/controller/handler_edge_ctrl/hello.go:58 +0x4a8
github.com/openziti/channel/v2.(*channelImpl).rxer(0xc0003e3680?)
	/go/pkg/mod/github.com/openziti/channel/v2@v2.0.99/impl.go:357 +0x571
created by github.com/openziti/channel/v2.(*channelImpl).startMultiplex in goroutine 56
	/go/pkg/mod/github.com/openziti/channel/v2@v2.0.99/impl.go:288 +0xc7

Branch: fix-router-connect-panic
Part of the longevity test to check if issue #1423 was fixed.

I see that there is null checking above for logging but in line 500 it's unchecked.

If I may also suggest, a lot of the code I've looked at is very "Java". Setters and Getters and self. if it were more idiomatic Go, it would be easier on Go devs and leverage the simplicity and semantic principles foundational in the language.

@plorenz
Copy link
Member

plorenz commented Oct 24, 2023

Hi @arunsworld ,
Would you be willing to run a build from a debug branch, so we can get some more information? I've spent a few hours digging through the code and I can't see a way for the VersionInfo to be nil there.

The VersionInfo gets set in accept.go. If the VersionInfo isn't present or can't be parsed, we now close the connection instead of continuing processing. The Router with the VersionInfo set is then placed in the connected routers map. The only time something is placed in that map is from accept. We never clear VersionInfo. Part of the fix was to always grab the router from the connected routers map.

Anyway, I can add some debug logging which may help us get to the bottom of this. Let me know if that's something you want to pursue. I'm surprised we're not seeing this as we're hosting lots of ziti networks and I get notified whenever there's a panic.

Regarding code style: I agree about the setters/getters and I think newer code doesn't suffer from those issues. We try to fix those things as we come across them. We've gotten much more comfortable in Go since the project has started and I think the code is much cleaner and more idiomatic these days. In general I personally try to only add setters/getters when it's needed for an interface.

Regarding self, we're going to have to disagree :) The Go recommendations for style have a few things I don't like.

One is single letter variable names. I think there's a happy medium between a, b, c and superLongOverlyDescriptiveVariableName.

The other is the advice to not use names like self. I consider using self a big time saver. I've never gotten confused about what it means in context. I no longer need to spend time thinking about how to abbreviate type names. As the quote says "There are only two hard things in Computer Science: cache invalidation and naming things."

@plorenz
Copy link
Member

plorenz commented Oct 24, 2023

Also, I realized I forgot to mention - I know I could just wrap that line in a nil check, but since that should never be nil, I want to track down the actual root cause, rather than bandaid it. If we can't figure out, I can always do the bandaid later.

@arunsworld
Copy link
Author

Sure; happy to build from a debug branch and running that and hopefully reproduce the problem. Please let me know once there is a ready branch. Thanks.

@plorenz
Copy link
Member

plorenz commented Oct 30, 2023

I'll put together a branch today and post it here

@plorenz
Copy link
Member

plorenz commented Oct 30, 2023

I've pushed some changes to the router-hello-panic-debug branch. I also made some tweaks which may possible fix the panic, though I'm not holding out hope. The main debugging I added was logging the router and channel pointer values so that we compare them in the accept and the hello handling code and check the ordering and the values.

Let me know how it looks. Appreciate you helping track this down.

@arunsworld
Copy link
Author

Thanks. I've deployed from this branch all controllers and routers. Hopefully will be able to reproduce the panic soon; and I'll have the logs here presently.

@plorenz
Copy link
Member

plorenz commented Nov 1, 2023

FYI: The potential fix included in the debug branch was released in 0.31.0. Even if it doesn't fix the bug it made the code a bit cleaner, so I went ahead and pushed it through.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants