Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Out_channel STM tests #431

Merged
merged 53 commits into from
Jan 23, 2024
Merged

Add Out_channel STM tests #431

merged 53 commits into from
Jan 23, 2024

Conversation

jmid
Copy link
Collaborator

@jmid jmid commented Jan 6, 2024

The Lin Out_channel test is a sore point of the test suite:

Out_channels buffer internally, meaning length's result can vary.
This is a bad fit for Lin's sequential consistency test, because we may end up finding counterexamples of different buffering (between a parallel and any interleaved sequential run) - or just spend a long time searching for one.
As such, a model-based STM test is a better fit - and this PR offers one.

The new model-based test

  • tests sequential and parallel usage
  • runs across all CI platforms as a positive test
  • covers as much of the Out_channel module as the existing Lin test
  • runs much more predictably
  • enables us to test Out_channel on macOS again (without any issues)

Many things thus speak for switching to it.

On the other hand, despite its downsides the old Lin test has also managed to stress the runtime into triggering defects #412

Technically, there are a few nuggets in the test code:

  • we model both closed and open Out_channels with suitable preconditions. As such the test can generate open-close-open-close cmd sequences.
  • the uncertainty of buffering affecting length's output is handled by comparing less-or-equal-to the model's length
  • binary-mode (only available on Windows) is modeled in the state
    • when enabled on MinGW/Cygwin it means length increases by 2 for every '\n' output (since '\n' maps to "\r\n")
    • position does not reflect this translation though
    • the tests finally found a counterexample illustrating how buffering may interfere with set_binary_mode: it may not be the mode enabled at output-call-time that takes effect (example below). This is solved by always calling flush before set_binary_mode on MinGW/Cygwin where this matters:
    Open_text : Ok (())
    Set_binary_mode true : ()
    Output_char '\n' : Ok (())
    Set_binary_mode false : ()
    Flush : ()
    Length : 2
    

Closes #378 and #401

@jmid jmid linked an issue Jan 6, 2024 that may be closed by this pull request
@jmid
Copy link
Collaborator Author

jmid commented Jan 9, 2024

CI summary for 2150652:

Out of 59 workflows 7 failed with 1 genuine failure and 6 ci setup issues

@jmid
Copy link
Collaborator Author

jmid commented Jan 9, 2024

I've rebased the PR on main after the merge of #429

@jmid
Copy link
Collaborator Author

jmid commented Jan 9, 2024

CI summary for 024dd85

Out of 44 workflows 5 failed with all 5 being genuine failures.

Copy link
Collaborator

@shym shym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A real test for Out_channels, where we explicitly take care of the buffering, this is very nice! Congratulations!

src/io/stm_tests.ml Outdated Show resolved Hide resolved
src/io/stm_tests.ml Outdated Show resolved Hide resolved
src/io/stm_tests.ml Outdated Show resolved Hide resolved
src/io/stm_tests.ml Show resolved Hide resolved
src/io/stm_tests.ml Outdated Show resolved Hide resolved
@shym
Copy link
Collaborator

shym commented Jan 11, 2024

On the other hand, despite its downsides the old Lin test has also managed to stress the runtime into triggering defects #412

Do you think the STM tests are less likely to trigger such defects?

@jmid
Copy link
Collaborator Author

jmid commented Jan 11, 2024

Thanks for the review!

Discussing this PR with you triggered me to look up previous counterexamples reported by the Lin Out_channel test.
Indeed, the last merge to main a571972 triggered, e.g., the following for the Linux 5.2 debug workflow:

Messages for test Lin Out_channel test with Domain:

  Results incompatible with sequential execution

                                           |                  
                         Out_channel.output_byte t 2 : Ok (())
                                           |                  
                       .--------------------------------------.
                       |                                      |                  
         Out_channel.length t : Ok (1)         Out_channel.close_noerr t : ()   

This should have a possible sequentialization (below) which we seem to miss:

Out_channel.output_byte t 2;;
Out_channel.length t;;
Out_channel.close_noerr t;;

Looking at previous counterexamples also made me realize that

  • some of them are triggered by using close/close_noerr in parallel. However we won't generate such inputs in this STM test version ATM unfortunately, because they don't have a well-balanced open-close serialization. I want to experiment with lifting this limitation.
  • other Lin Out_channel counterexamples are triggered by weird output unlinearizable behaviours arising from running other cmds on already closed channels. This is a gray-zone specification-wise, which does not offer an immediate benefit AFAICS. As such, I'm more reluctant to invest in lifting it.
                                                                                                                     |                                                       
                                                                                                       Out_channel.flush t : Ok (())                                         
                                                                   Out_channel.output_bytes t "7)\016)\170\245A{(2\255\000mr\185\243\206\2475-\192i\135\2216}w" : Ok (())    
                                                                                               Out_channel.set_binary_mode t false : Ok (())                                 
                                                                                                                     |                                                       
                                                             .---------------------------------------------------------------------------------------------------------------.
                                                             |                                                                                                               |                                                       
                                               Out_channel.length t : Ok (27)                                                                                  Out_channel.close_noerr t : ()                                        
                                           Out_channel.is_buffered t : Ok (true)                                                                               Out_channel.close t : Ok (())                                         
                                           Out_channel.output_byte t 6 : Ok (())                                                                               Out_channel.pos t : Ok (65536)                                        
                                                                                                                                                           Out_channel.is_buffered t : Ok (true)                                     
                                                                                                                                       Out_channel.output_bytes t "K\226" : Error (Sys_error("Bad file descriptor"))                 
                                                                                                                                                           Out_channel.is_buffered t : Ok (true)                                     
                                                                                                                       Out_channel.output_substring t "\197\180\2484g\180\230\193" 4 7 : Error (Invalid_argument("output_substring"))
                                                                                                                                                               Out_channel.flush t : Ok (())                                         
    

Our discussion also made me check my recollection regarding locking:
https://github.com/ocaml/ocaml/blob/cedff5854ac91dccf5847b527192223ef506b1e2/runtime/io.c#L60

All operations on channels first take the channel lock.

As such, they Out_channel operations should act atomically when used in parallel and be sequential consistent.

On the other hand, despite its downsides the old Lin test has also managed to stress the runtime into triggering defects #412

Do you think the STM tests are less likely to trigger such defects?

Possibly, but I'm unsure.

  • A single Lin input will call the Out_channel interface (up to) 50 times * 1 parallel run * #interleavings times.
  • In comparison a single STM input will call the interface (up to) 25 times * 1 parallel run.

Since #interleavings can be quite high, a corner case bug is more likely to trigger more often when it is run more.
On the other hand, when considering this behaviour across a count of 1000, the negative Lin test will finish sooner (and report a "counterexample") whereas the current STM test runs to completion (1000 inputs * 25 repetitions).

@jmid
Copy link
Collaborator Author

jmid commented Jan 11, 2024

In 8236876 I extend the STM tests to be able to generate close and close_noexn in state Closed too.

@shym
Copy link
Collaborator

shym commented Jan 12, 2024

This should have a possible sequentialization (below) which we seem to miss:

In the sequentialized version, the length will most probably be 0, as the buffer is not flushed; the output we see is due to the close_noerr being half-run, having flushed but not closed yet (as those two steps are explicitly not run atomically), isn’t it?

This is a gray-zone specification-wise, which does not offer an immediate benefit AFAICS.

To make sure that I understand correctly what this is doing, because I think I missed it in my first review: due to the postcond function, we’ll never test close or close_noerr in one of the parallel branches except when the other branch is only doing open and close calls, or am I mistaken in how postcond is used in all_interleavings_ok? This would mean those tests would never run into issues such as ocaml/ocaml#11878 I suppose?

@jmid
Copy link
Collaborator Author

jmid commented Jan 12, 2024

This should have a possible sequentialization (below) which we seem to miss:

In the sequentialized version, the length will most probably be 0, as the buffer is not flushed; the output we see is due to the close_noerr being half-run, having flushed but not closed yet (as those two steps are explicitly not run atomically), isn’t it?

This is a probable explanation indeed! 👍
However it seems a bit unsatisfying that this stops testing and is reported as a counterexample of parallelism-safety.

This is a gray-zone specification-wise, which does not offer an immediate benefit AFAICS.

To make sure that I understand correctly what this is doing, because I think I missed it in my first review: due to the postcond function, we’ll never test close or close_noerr in one of the parallel branches except when the other branch is only doing open and close calls, or am I mistaken in how postcond is used in all_interleavings_ok? This would mean those tests would never run into issues such as ocaml/ocaml#11878 I suppose?

Point well taken - and yes, you are right 👍
Following the Erlang version described in the ICFP'09 paper, STM will generate candidate "Y"-triples and only keep those which satisfy all preconditions in all interleavings (which is what all_interleavings_ok should express). This indeed leaves out two parallel close* cmds.

I've therefore taken a pass over the STM test and extended the cmds so that all of them can also occur in state Closed. The only generator and precond limitation now is that Open_text is not allowed in state Open (this was also a limitation of the Lin test).

This revealed a divergence from the spec as I shared offline:

val close : t -> unit
(** Close the given channel, flushing all buffered write operations.  Output
    functions raise a [Sys_error] exception when they are applied to a closed
    output channel, except {!close} and {!flush}, which do nothing when applied
    to an already closed channel.  Note that {!close} may raise [Sys_error] if
    the operating system signals an error when flushing or closing. *)

Yet output_string, output_bytes, output, and output_substring of length 0 on a Closed channel does not fail, e.g.:

--- Failure --------------------------------------------------------------------

Test STM Out_channel test sequential failed (22 shrink steps):

   Close
   Output_string ""


+++ Messages ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Messages for test STM Out_channel test sequential:

  Results incompatible with model

   Close : Ok (())
   Output_string "" : Ok (())

For now, I've adjust postcond to accept these cornercases.
We should consider submitting a PR to adjust the specification though...

@jmid
Copy link
Collaborator Author

jmid commented Jan 15, 2024

This would mean those tests would never run into issues such as ocaml/ocaml#11878 I suppose?

We can now experimentally confirm that the latest changes can indeed find issues like that.
All 5.2 and 5.3/trunk workflows show a regression on outputting on a closed Out_channel, where only the first cmd now triggers an exception: 😮

+++ Messages ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Messages for test STM Out_channel test sequential:

  Results incompatible with model

   Close_noerr : Ok (())
   Output_string "}\169)gN\255\017" : Error (Sys_error("Bad file descriptor"))
   Output_byte 48 : Ok (())

@jmid
Copy link
Collaborator Author

jmid commented Jan 16, 2024

Some overdue CI summaries...

For 9c63209:

Out of 44 workflows 5 failed and 1 was cancelled, all 6 due to genuine issues

For 8236876:

Out of 44 workflows 5 failed with all 5 being genuine issues

For 0c8fdcb:

Out of 44 workflows 24 failed with a total of 28 failures (4 workflows failed two tests) with all of them being genuine

@jmid
Copy link
Collaborator Author

jmid commented Jan 16, 2024

The PR's latest test revision is so effective at triggering the #432 regression that we should consider temporarily relaxing the test to accept the current behaviour, to avoid having to check ~20 failing workflows on each CI run... 😬

Copy link
Collaborator

@shym shym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good, thank you for the updates!

@jmid
Copy link
Collaborator Author

jmid commented Jan 23, 2024

I rebased on main to avoid more false alarms and added 86f2ab2 to temporarily disable the #432 regression reports.

@jmid
Copy link
Collaborator Author

jmid commented Jan 23, 2024

CI summary:

Out of 44 workflows 2 failed, with both being genuine issues

@jmid jmid merged commit b53236d into main Jan 23, 2024
32 of 34 checks passed
@jmid jmid deleted the io-stm-tests branch January 23, 2024 22:17
@jmid
Copy link
Collaborator Author

jmid commented Jan 24, 2024

CI summary for merge to main:

Out of 45 workflows 1 failed with a genuine issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lin Out_channel test fails to trigger on FreeBSD Long Out_channel shrinking runs cause timeout
2 participants