Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document that RedisConnection is not Thread-safe #2653

Closed
zhaozhiguang opened this issue Jul 27, 2023 · 9 comments
Closed

Document that RedisConnection is not Thread-safe #2653

zhaozhiguang opened this issue Jul 27, 2023 · 9 comments
Assignees
Labels
in: lettuce Lettuce driver type: documentation A documentation update

Comments

@zhaozhiguang
Copy link

Current Behavior

Use parallelStream for concurrent operations. The amount of data is about 300k. After execution, all results will be added to the pipe. However, this pipe is of the ArrayList type when initializing, so some results are not added to the pipe. Then, NullPointerException will be thrown when traversing the pipe when closePipeline()

Stack trace
java.lang.NullPointerException:null
 at org.springframework.data.redis.connection.lettuce.LettuceConnection.closePipeline(LettuceConnection.java:515)
...

Input Code

Input Code
redisTemplate.executePipeLined(connection -> {
	objectives.parallelStream().forEach(obj -> {
		connection.setCommands().sAdd(rawKey,rawValue);
	});
})

Environment

  • Spring-Data-Redis version(s): [e.g. 2.5.6]
  • Redis version: [e.g. 6.2.5]
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Jul 27, 2023
@jxblum
Copy link
Contributor

jxblum commented Jul 28, 2023

@zhaozhiguang -

Perhaps you can explain what objectives is from your "Input Code":

redisTemplate.executePipelined(connection -> {
	objectives.parallelStream().forEach(obj -> {
		connection.setCommands().sAdd(rawKey, rawValue);
	});
})

For my explanation, I assume (and I am reasonably certain) that:

  1. objectives is some type of Collection
  2. And, you are calling parallelStream() on this Collection (Javadoc)
  3. And, the Collection implementation supports parallel processing of the Stream returned by the Collection that is also the source for the elements contained in the Stream.

First of all, the Spring Data Redis RedisConnection interface is not documented as being Thread-safe, not in reference documentation and not in Javadoc.

WARNING: Even a LettuceConnection is not necessarily Thread-safe despite the native Lettuce connection being generally Thread-safe.

Thread-safety is certainly not the case for Jedis. Under-the-hood, a Spring Data Redis JedisConnection (Javadoc; source) uses a Jedis driver client, Jedis instance, which is not Thread-safe (see here).

For Lettuce, while the underlying, native connection is Thread-safe, as documented in Spring Data Redis's reference documentation as well as the Lettuce driver docs (see RedisClient and StatefulConnection, such as StatefulRedisConnection), it is not necessarily true that a Spring Data Redis LettuceConnection itself is Thread-safe.

NOTE: It should be noted that the general Lettuce StatefulConnection interface is not documented as Thread-safe.

NOTE: The List (implementation) declared for pipelining (here, and specifically constructed, here, when opening the pipeline) is not the only non-Thread-safe data structure used by LettuceConnection. LettuceConnection also uses a non-Thread-safe Queue (backed by a LinkedList implementation) for collecting TX results (see here).

When a Java stream is parallelized, such as by calling parallelStream(), the stream is partitioned into smaller sub-streams that are then processed in parallel (concurrently) using multiple Threads created and managed by the ForkJoin framework (using a Thread pool).

Therefore, a LettuceConnection is being accessed from multiple Threads based on how you structured your code.

TIP: It is important to understand the implications of "parallel" streams (and parallel processing in general). You can find documentation on this matter from Oracle, as well as in this nice blog post from Baeldung.

As a temporary workaround, you could have simply, instead, sequentially processed the elements (objectives) in the Collection by doing:

redisTemplate.executePipelined(connection -> {
	objectives.forEach(obj -> {
		connection.setCommands().sAdd(rawKey, rawValue);
	});
})

You are already benefiting from using "Redis pipelining", as documented, anyway.

Alternatively, although I am not entirely certain in this case, you may be able to restructure your code by doing:

objectives.parallelStream().forEach(obj -> {
        redisTemplate.executePipelined(operations -> {
		operations.opsForSet().add(rawKey, rawValue);
	});
})

In this case, you are calling RedisTemplate.executePipelined(:SessionCallback<?>) (Javadoc).

Also, see the Javadoc for SessionCallback.

Of course, SessionCallback seems to be intended for use in transactions. Still, like transactions, pipelining works against a dedicated connection to send a batch of Redis commands in a single request (without the atomic guarantee in TX).

NOTE: RedisTemplate is Thread-safe (see Javadoc, and ref doc).

In either arrangement, I would strongly encourage the use of pooled connections.

Let me know what you think.

In the meantime, I will speak to the team about our Thread-safety guarantees with regard to Spring Data Redis connections (i.e. instances of RedisConnection).

@jxblum
Copy link
Contributor

jxblum commented Jul 28, 2023

@zhaozhiguang -

Also be aware that Spring Data Redis 2.5.x (last version was 2.5.12) is no longer in OSS support (see here). Even commercial support ends in a couple weeks, on 2023-August-14.

You should minimally upgrade to Spring Data Redis 2.7.x (latest version is 2.7.14). Although, be aware that SD Redis 2.7.x OSS support ends in November, 2023, after which, your team will need to purchase a commercial support from VMware by purchasing a VMware Spring Runtime (VSR; see here) license.

I would encourage you to upgrade to Spring Data Redis 3.1.x as your earliest convenience. Spring Data Redis 3.x does require a Java 17 baseline, uses Spring Framework 6, and is pulled in by Spring Boot 3.

@jxblum jxblum added for: team-attention An issue we need to discuss as a team to make progress in: lettuce Lettuce driver labels Jul 28, 2023
@jxblum jxblum changed the title pipe mode has thread-safety bug pipeline mode has thread-safety bug Jul 28, 2023
@jxblum
Copy link
Contributor

jxblum commented Aug 1, 2023

NOTE: I edited my original, first comment above.

@jxblum
Copy link
Contributor

jxblum commented Aug 1, 2023

@zhaozhiguang - Following up to my previous comments, I have additional feedback for you.

First, I wrote integration tests to reproduce and analyze the problem. A few things should stand out here:

  1. The storeInRedisUsingParallelStreamInPipeline() test case is precisely your scenario.

  2. I could only reproduce the NPE after using a data set size of 500,000 elements, running Oracle JDK 17.0.7 in macOS Ventura 13.5 on Intel.

  3. The test case both passed and failed (throwing the NPE, as documented) across several, different integration test runs. It does not consistently fail or pass, though it passed more often than it failed.

  4. The failure was not unexpected (nor was a "pass" outcome) given the nature of "race conditions" when using non-Thread-safe components, such as Redis connections, and in particular, the Spring Data Redis LettuceConnection class, in a concurrent, multi-Threaded context.

  5. I compared your arrangement to using a sequential stream for processing (storing) the elements in a Redis Set, in this test case.

  6. The ordering between parallel and sequential test cases in my integration test class was deliberate. I did NOT want the "sequential" test case to run with a warm (JIT optimized) JVM as compared to (possibly) the "parallel" test case.

  7. Next, I added time measurements (and timing for sequential execution) to compare sequential and parallel stream use inside Redis Pipelined connections.

  8. Finally, I logged the Thread names in use when testing parallel stream execution inside Redis Pipelined connections to ensure the Collection supported parallel streams in the first place, and that different Threads were used to process the Stream returned by the Collection.

The output should look similar to (conveniently formatted for this comment):

2023-08-01T14:42:20.145-07:00 DEBUG 57021 --- [           main] .s.d.r.c.t.AbstractRedisIntegrationTests : 
PIPELINE-STREAM THREADS [[
  ForkJoinPool.commonPool-worker-1, 
  ForkJoinPool.commonPool-worker-10, 
  ForkJoinPool.commonPool-worker-11, 
  ForkJoinPool.commonPool-worker-2, 
  ForkJoinPool.commonPool-worker-3,
  ForkJoinPool.commonPool-worker-4, 
  ForkJoinPool.commonPool-worker-5, 
  ForkJoinPool.commonPool-worker-6, 
  ForkJoinPool.commonPool-worker-7, 
  ForkJoinPool.commonPool-worker-8, 
  ForkJoinPool.commonPool-worker-9, 
  main
]]

You may be surprised to learn that the "sequential" execution outperformed the "parallel" execution in my testing. Of course, this is not unexpected. Parallelization (or premature optimization) does not always lead to performance improvements. It depends on the environment and measurements are always required to optimize correctly, if at all.

I'd say, in your case, with only a 300K element data set size, it won't make much of a difference to optimize through the use parallelization. It will also most definitely vary by environment. And, most of the time is likely to be eaten up by network activity, anyway, which can vary from moment to moment depending on the amount of network traffic.

In my test run, although local, the difference between sequential and parallel execution on 500,000 elements was not significant, with sequential execution running slightly faster even though it was run first (possibly void of JIT compilation (optimizations in general)):

Using Parallel Stream: 10,918 ms
Using Sequential Stream: 10,269 ms

Of course, (JVM) optimizations can vary between run invocations as well, thereby affecting measurements. But, on average, there was not a significant, noticeable advantage from 1 approach to the other.

NOTE: To get more accurate and reliable measurements, the use JMH is recommended over arbitrary time measurements using System.currentTimeMillis().

@jxblum
Copy link
Contributor

jxblum commented Aug 1, 2023

Finally, I'd also add that you should NOT use the alternative approach I suggested, which I also captured in a test case.

This performed very poorly, and when you trace down (into Spring Data Redis), you realize that you are not really using Redis Pipelined connections effectively since the SessionCallback is NOT a true "session" in the sense of coupling a connection to a Thread with ThreadLocal in the same way transaction connections are managed.

@zhaozhiguang
Copy link
Author

Yes, I have also tested both concurrent and serial modes. Serial performance will be better, because I read Lettuce's official website that Thread safety is, of course, Spring Data Redis is also Thread safety

@jxblum
Copy link
Contributor

jxblum commented Aug 2, 2023

Not all classes and components in Spring Data Redis are Thread-safe, as I have discussed above.

If Spring Data Redis classes are not explicitly documented as Thread-safe, then it is not safe to assume they are, even if certain underlying classes (e.g. Lettuce driver/library classes) used by Spring Data Redis classes are Thread-safe. If a class or component is not explicitly documented as Thread-safe, then you must assume it is NOT Thread-safe, in fact.

For instance, clearly the RedisConnection classes are NOT Thread-safe. JedisConnection is definitely not Thread-safe because Jedis (single instance used by JedisConnection) itself is not Thread-safe (from Jedis doc).

While RedisTemplate is documented (ref doc, Javadoc) to be Thread-safe, it depends on how it is used.

So long as the RedisTemplate is acquired from the Spring container (i.e. injected into your application components and beans managed by the Spring container), and NOT re-configured, it will remain Thread-safe. It can be safely shared across multiple Threads, even though RedisTemplate uses connections, because it acquires, uses and then releases connections per operation, thereby maintaining Thread-safety (1 reason why connection pooling helps here). If it kept a reference to, and simply reused a single connection, then RedisTemplate would no longer be Thread-safe.

If the template configuration is changed after it is acquired from (or injected by) the Spring container, then it is no longer Thread-safe since it carries non-volatile references to other objects, like RedisSerializers used to de/serialize keys/values, the ClassLoader used on deserialization, and so on (all this; mileage on operation interface delegates used by RedisTemplate varies).

NOTE: Non-volatile and non-atomic references to mutable state is not Thread-safe, and even if the references were volatile/atomic, the objects to which they refer much also be Thread-safe (or effectively immutable) for the template to be Thread-safe.

In summary, the "Thread-safety" guarantee of RedisTemplate hinges on "initialization-safety" and nothing changing after initialization.

The same rules apply for other classes and components inside the Spring Data Redis framework.

@jxblum
Copy link
Contributor

jxblum commented Aug 2, 2023

Two things. In general...

  1. Not every class that consists entirely of or uses Thread-safe classes is itself Thread-safe. For example, think "compound actions" on multiple, independent, Thread-safe objects.

  2. And, not every class consisting of or using non-Thread-safe classes is itself, not Thread-safe. For example, think of classes composed of immutable, or effectively immutable classes or components that are safely initialized.

There are many factors for determining the Thread-safety guarantees and what constitutes a "Thread-safe" class.

Having said this, I really believe this issue boils down to a documentation change for Spring Data Redis, more than anything else.

@jxblum
Copy link
Contributor

jxblum commented Aug 2, 2023

As of right now, we are currently undecided whether changing the List type (e.g. ArrayList -> CopyOnWriteArrayList or a ConcurrentSkipListSet) in LettuceConnection is appropriate. In general, we don't want users to automatically assume SD Redis connections (or other classes) are necessarily Thread-safe, even if they use Thread-safe classes (or libraries, like Lettuce) under-the-hood.

CC'ing/Pinging @christophstrobl & @mp911de for further comments.

@jxblum jxblum changed the title pipeline mode has thread-safety bug Pipeline mode has Thread-safety bug Aug 8, 2023
@jxblum jxblum added status: declined A suggestion or change that we don't feel we should currently apply and removed status: waiting-for-triage An issue we've not yet triaged for: team-attention An issue we need to discuss as a team to make progress labels Aug 8, 2023
jxblum added a commit to jxblum/spring-data-redis that referenced this issue Aug 8, 2023
mp911de pushed a commit that referenced this issue Aug 10, 2023
… Thread-safety guarantees.

Closes #2653
Original pull request: #2667
mp911de added a commit that referenced this issue Aug 10, 2023
Tweak wording around thread-safety. Include notices in connection factories.

Reformat asciidoc.

See #2653
Original pull request: #2667
mp911de added a commit that referenced this issue Aug 10, 2023
Tweak wording around thread-safety. Include notices in connection factories.

Reformat asciidoc.

See #2653
Original pull request: #2667
mp911de pushed a commit that referenced this issue Aug 10, 2023
… Thread-safety guarantees.

Closes #2653
Original pull request: #2667
mp911de added a commit that referenced this issue Aug 10, 2023
Tweak wording around thread-safety. Include notices in connection factories.

Reformat asciidoc.

See #2653
Original pull request: #2667
@mp911de mp911de added this to the 3.0.9 (2022.0.9) milestone Aug 10, 2023
@mp911de mp911de added type: documentation A documentation update and removed status: declined A suggestion or change that we don't feel we should currently apply labels Aug 10, 2023
@mp911de mp911de changed the title Pipeline mode has Thread-safety bug Document that RedisConnection is not Thread-safe Aug 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: lettuce Lettuce driver type: documentation A documentation update
Projects
None yet
4 participants