Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix recursion in cache processing #2259

Merged
merged 1 commit into from
Aug 26, 2020
Merged

Conversation

cevian
Copy link
Contributor

@cevian cevian commented Aug 25, 2020

Previously, cache invalidation could cause recursion in cache processing.
PR #1493 fixed this for the

  • cache_invalidate_callback() -> ts_extension_invalidate()

call path. But the call path

  • cache_invalidate_callback() -> ts_extension_is_loaded()

could still go into recursion.

So, this PR moves the recursion-prevention logic into
extension_update_state(), which is common to both call paths.

Fixes #2200.

@cevian cevian requested a review from a team as a code owner August 25, 2020 16:10
@codecov
Copy link

codecov bot commented Aug 25, 2020

Codecov Report

Merging #2259 into master will increase coverage by 0.12%.
The diff coverage is 90.90%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2259      +/-   ##
==========================================
+ Coverage   90.10%   90.22%   +0.12%     
==========================================
  Files         211      212       +1     
  Lines       34290    34297       +7     
==========================================
+ Hits        30897    30946      +49     
+ Misses       3393     3351      -42     
Impacted Files Coverage Δ
src/compat.h 100.00% <ø> (ø)
src/hypertable.c 88.61% <0.00%> (-0.10%) ⬇️
src/dimension_slice.c 95.25% <50.00%> (+0.81%) ⬆️
src/debug_wait.c 88.88% <88.88%> (ø)
src/chunk.c 95.38% <100.00%> (+0.06%) ⬆️
src/extension.c 88.29% <100.00%> (ø)
src/hypercube.c 92.85% <100.00%> (+0.14%) ⬆️
src/scanner.c 96.26% <100.00%> (+0.02%) ⬆️
src/loader/bgw_message_queue.c 84.51% <0.00%> (-2.59%) ⬇️
src/bgw/scheduler.c 82.78% <0.00%> (-1.79%) ⬇️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5300b68...7b65ce7. Read the comment docs.

@erimatnor erimatnor added the bug label Aug 25, 2020
@erimatnor erimatnor modified the milestones: 1.7.3, 2.0.0 Aug 25, 2020
@erimatnor
Copy link
Contributor

@cevian would be good to have a test case that reproduces the bug and shows that this is issue is fixed.

@cevian
Copy link
Contributor Author

cevian commented Aug 25, 2020

@erimatnor In theory I agree, but I haven't been able to come up with a reproducible case. This works most of the time as-is. It's some internal Postgres state that causes this behavior to be triggered.

@cevian
Copy link
Contributor Author

cevian commented Aug 25, 2020

@erimatnor I spent a lot more time trying to reproduce and could not. It seems like the loop is within ReceiveSharedInvalidMessages being execute at sinval.c:90 and sinval:120 (the latter every 32 = MAXINVALMSGS calls).

I think this only happens when there are a lot of msgs or something similar. I put my attempts at testing this in the fix_recursion_tests branch in my fork in case people want to get back to it at some point, but it feels like there is a benefit to merging this in the meantime.

Copy link
Contributor

@pmwkaa pmwkaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but agree with @erimatnor that it would be great to have a test case for it

Copy link
Contributor

@mkindahl mkindahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something is wrong with the cache code if we have to rely on these hacks to solve issues and I wonder how many more issues we have with the cache code. However, this solves the current problem and I do not see that this approach should cause any issues, so approving.

This seems to be a rare occurrence, so not sure how we can test this.

@cevian
Copy link
Contributor Author

cevian commented Aug 26, 2020

Leaving some more notes: I suspect this happens in the UNKNOWN state and in background workers. The backend causing the invalidation needs to commit or abort to cause invalidations to be sent to other backends, including bg workers. The bg workers has to also already be in a transaction when the messages are received, otherwise it will just happily stay in the unknown state.

Another note: scanning for the namespace in the heap instead of through cache will not help. The backtrace shows that the recursion happens through LockRelationOid, which would be called for a pure-heap scan too. Notice that the backtrace is for a cache miss.

Previously, cache invalidation could cause recursion in cache processing.
PR timescale#1493 fixed this for the
   cache_invalidate_callback() -> ts_extension_invalidate()
call path. But the call path
   cache_invalidate_callback() -> ts_extension_is_loaded()
could still go into recursion.

So, this PR moves the recursion-prevention logic into
extension_update_state(), which is common to both call paths.

Fixes timescale#2200.
@cevian cevian merged commit e33fc8c into timescale:master Aug 26, 2020
svenklemm added a commit to svenklemm/timescaledb that referenced this pull request Aug 26, 2020
This maintenance release contains bugfixes since the 1.7.2 release. We deem it high
priority for upgrading.

In particular the fixes contained in this maintenance release address bugs in compression,
drop_chunks and the background worker scheduler.

**Bugfixes**
* timescale#2059 Improve infering start and stop arguments from gapfill query
* timescale#2067 Support moving compressed chunks
* timescale#2068 Apply SET TABLESPACE for compressed chunks
* timescale#2090 Fix index creation with IF NOT EXISTS for existing indexes
* timescale#2092 Fix delete on tables involving hypertables with compression
* timescale#2164 Fix telemetry installed_time format
* timescale#2184 Fix background worker scheduler memory consumption
* timescale#2222 Fix `negative bitmapset member not allowed` in decompression
* timescale#2255 Propagate privileges from hypertables to chunks
* timescale#2256 Fix segfault in chunk_append with space partitioning
* timescale#2259 Fix recursion in cache processing
* timescale#2261 Lock dimension slice tuple when scanning

**Thanks**
* @akamensky for reporting an issue with drop_chunks and ChunkAppend with space partitioning
* @dewetburger430 for reporting an issue with setting tablespace for compressed chunks
* @fvannee for reporting an issue with cache invalidation
* @nexces for reporting an issue with ChunkAppend on space-partitioned hypertables
* @PichetGoulu for reporting an issue with index creation and IF NOT EXISTS
* @sezaru for reporting an issue with background worker scheduler memory consumption
svenklemm added a commit to svenklemm/timescaledb that referenced this pull request Aug 27, 2020
This maintenance release contains bugfixes since the 1.7.2 release. We deem it high
priority for upgrading.

In particular the fixes contained in this maintenance release address bugs in compression,
drop_chunks and the background worker scheduler.

**Bugfixes**
* timescale#2059 Improve infering start and stop arguments from gapfill query
* timescale#2067 Support moving compressed chunks
* timescale#2068 Apply SET TABLESPACE for compressed chunks
* timescale#2090 Fix index creation with IF NOT EXISTS for existing indexes
* timescale#2092 Fix delete on tables involving hypertables with compression
* timescale#2164 Fix telemetry installed_time format
* timescale#2184 Fix background worker scheduler memory consumption
* timescale#2222 Fix `negative bitmapset member not allowed` in decompression
* timescale#2255 Propagate privileges from hypertables to chunks
* timescale#2256 Fix segfault in chunk_append with space partitioning
* timescale#2259 Fix recursion in cache processing
* timescale#2261 Lock dimension slice tuple when scanning

**Thanks**
* @akamensky for reporting an issue with drop_chunks and ChunkAppend with space partitioning
* @dewetburger430 for reporting an issue with setting tablespace for compressed chunks
* @fvannee for reporting an issue with cache invalidation
* @nexces for reporting an issue with ChunkAppend on space-partitioned hypertables
* @PichetGoulu for reporting an issue with index creation and IF NOT EXISTS
* @sezaru for reporting an issue with background worker scheduler memory consumption
svenklemm added a commit to svenklemm/timescaledb that referenced this pull request Aug 27, 2020
This maintenance release contains bugfixes since the 1.7.2 release. We deem it high
priority for upgrading.

In particular the fixes contained in this maintenance release address issues in compression,
drop_chunks and the background worker scheduler.

**Bugfixes**
* timescale#2059 Improve infering start and stop arguments from gapfill query
* timescale#2067 Support moving compressed chunks
* timescale#2068 Apply SET TABLESPACE for compressed chunks
* timescale#2090 Fix index creation with IF NOT EXISTS for existing indexes
* timescale#2092 Fix delete on tables involving hypertables with compression
* timescale#2164 Fix telemetry installed_time format
* timescale#2184 Fix background worker scheduler memory consumption
* timescale#2222 Fix `negative bitmapset member not allowed` in decompression
* timescale#2255 Propagate privileges from hypertables to chunks
* timescale#2256 Fix segfault in chunk_append with space partitioning
* timescale#2259 Fix recursion in cache processing
* timescale#2261 Lock dimension slice tuple when scanning

**Thanks**
* @akamensky for reporting an issue with drop_chunks and ChunkAppend with space partitioning
* @dewetburger430 for reporting an issue with setting tablespace for compressed chunks
* @fvannee for reporting an issue with cache invalidation
* @nexces for reporting an issue with ChunkAppend on space-partitioned hypertables
* @PichetGoulu for reporting an issue with index creation and IF NOT EXISTS
* @prathamesh-sonpatki for contributing a typo fix
* @sezaru for reporting an issue with background worker scheduler memory consumption
@svenklemm svenklemm mentioned this pull request Aug 27, 2020
svenklemm added a commit that referenced this pull request Aug 27, 2020
This maintenance release contains bugfixes since the 1.7.2 release. We deem it high
priority for upgrading.

In particular the fixes contained in this maintenance release address issues in compression,
drop_chunks and the background worker scheduler.

**Bugfixes**
* #2059 Improve infering start and stop arguments from gapfill query
* #2067 Support moving compressed chunks
* #2068 Apply SET TABLESPACE for compressed chunks
* #2090 Fix index creation with IF NOT EXISTS for existing indexes
* #2092 Fix delete on tables involving hypertables with compression
* #2164 Fix telemetry installed_time format
* #2184 Fix background worker scheduler memory consumption
* #2222 Fix `negative bitmapset member not allowed` in decompression
* #2255 Propagate privileges from hypertables to chunks
* #2256 Fix segfault in chunk_append with space partitioning
* #2259 Fix recursion in cache processing
* #2261 Lock dimension slice tuple when scanning

**Thanks**
* @akamensky for reporting an issue with drop_chunks and ChunkAppend with space partitioning
* @dewetburger430 for reporting an issue with setting tablespace for compressed chunks
* @fvannee for reporting an issue with cache invalidation
* @nexces for reporting an issue with ChunkAppend on space-partitioned hypertables
* @PichetGoulu for reporting an issue with index creation and IF NOT EXISTS
* @prathamesh-sonpatki for contributing a typo fix
* @sezaru for reporting an issue with background worker scheduler memory consumption
svenklemm added a commit that referenced this pull request Aug 27, 2020
This maintenance release contains bugfixes since the 1.7.2 release. We deem it high
priority for upgrading.

In particular the fixes contained in this maintenance release address issues in compression,
drop_chunks and the background worker scheduler.

**Bugfixes**
* #2059 Improve infering start and stop arguments from gapfill query
* #2067 Support moving compressed chunks
* #2068 Apply SET TABLESPACE for compressed chunks
* #2090 Fix index creation with IF NOT EXISTS for existing indexes
* #2092 Fix delete on tables involving hypertables with compression
* #2164 Fix telemetry installed_time format
* #2184 Fix background worker scheduler memory consumption
* #2222 Fix `negative bitmapset member not allowed` in decompression
* #2255 Propagate privileges from hypertables to chunks
* #2256 Fix segfault in chunk_append with space partitioning
* #2259 Fix recursion in cache processing
* #2261 Lock dimension slice tuple when scanning

**Thanks**
* @akamensky for reporting an issue with drop_chunks and ChunkAppend with space partitioning
* @dewetburger430 for reporting an issue with setting tablespace for compressed chunks
* @fvannee for reporting an issue with cache invalidation
* @nexces for reporting an issue with ChunkAppend on space-partitioned hypertables
* @PichetGoulu for reporting an issue with index creation and IF NOT EXISTS
* @prathamesh-sonpatki for contributing a typo fix
* @sezaru for reporting an issue with background worker scheduler memory consumption
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segfault in cache code
5 participants