-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test failures in OpenBSD #483
Comments
|
|
I tried |
Has this started failing after we switched to fork()/exec() ? If you were able to build from |
I failed to build 2016-01-10-beta branch. I get errors because I built commit 6815d6c (the parent of 566ec82, "Use fork()/exec() instead of posix_spawn() by default"), and the first run of
|
FWIW, @kernigh, even though I don't use OpenBSD on a regular basis I have been building and running the ksh unit tests on a OpenBSD 6.2 VM. As well as macOS and FreeBSD. I'm seeing the same problems you've reported in this issue. Thank you very much for your detailed and concise problem reports which help ensure people are aware of these problems and we don't lose track of them. Obviously the AT&T team was either not aware of, or wasn't particularly concerned about, the problems with the code on modern BSD based systems. |
Also, the locale specific tests that currently fail because a locale is not supported need a change similar to those I've made recently to the src/cmd/ksh93/tests/wchar.sh unit test to determine whether the system supports a given locale. This supports the usual variants and maximizes test coverage regardless of the system:
|
The problem is that OpenBSD supports a locale named "x", but |
That's not really the correct characterization of the situation. Every system I'm familiar with supports a locale named "x". In as much as they all treat it as equivalent to the "C" (or "POSIX" if you prefer) locale assuming you haven't actually created a locale named "x". The difference is that only on OpenBSD does I use I hate to do it but we should probably just special-case OpenBSD for this test and bypass the test with a warning. |
Also, @kernigh, I see that my second most previous comment completely missed the point of the issue with respect to setting |
I have no objection to platform specific tests. Rather than disabling or removing them globally we should simply ensure they only run on platforms where they make sense. Having said that given the nature of ksh it does seem odd to test for behaviors that are not universally valid. |
Some of the /dev/fd/* and /proc paths become valid (in OpenBSD) if I use the builtin mkdir:
We don't build the builtins of
I don't see the TODO in testlog.txt. The |
I'm afraid to look at the code to see what it's doing. Having a command as a builtin for speed or predictability of its flags and behavior is fine. Having a builtin that pretends to support a feature the OS doesn't support is questionable at best. |
Partial fix for att#483
The value of ${.sh.sig.code} is SI_QUEUE if the system supports sigqueue(), or SI_USER otherwise. Partial fix for att#483
OpenBSD accepts "x" as an alias of "C", and I can't use `locale -a` to detect this. If $(uname) is OpenBSD, then use "x.x" as the unknown locale. Most other systems agree that "x" is unknown. If `locale -a` shows that someone added the "x" locale, then skip the test. Partial fix for att#483
The value of ${.sh.sig.code} is SI_QUEUE if the system supports sigqueue(), or SI_USER otherwise. Partial fix for #483
OpenBSD accepts "x" as an alias of "C", and I can't use `locale -a` to detect this. If $(uname) is OpenBSD, then use "x.x" as the unknown locale. Most other systems agree that "x" is unknown. If `locale -a` shows that someone added the "x" locale, then skip the test. Partial fix for #483
Regarding the
Note that function |
@siteshwar wrote:
I not only vote that we disable those tests I also vote we revert the code in src/lib/libast/path/pathcanon.c that supports the idiom on systems that don't natively support it. This is another example of trying to impose AST behavior on the world by expecting everyone to use the libast code. That isn't going to happen. Lots of other shells, compliant with POSIX 1003 (e.g., bash and zsh) and not compliant (e.g., fish and elvish), manage to achieve wide acceptance without resorting to these tricks. I can appreciate that the AST team was trying to innovate. But this libast feature is an example of innovation that is not central to ksh, will probably never be widely adopted, and should be dropped in order to focus on keeping ksh relevant. |
I tested d952fdf (May 7, 2018) in OpenBSD/amd64 6.3 with my patch to fix strtold() in OpenBSD libc. The only failures were in |
After the addition of the API/* tests, a test times out and Meson gets stuck:
Not a bug in ksh. My report to Meson is "PermissionError when a test times out in OpenBSD", mesonbuild/meson#3569 |
@kernigh I'm seeing the same thing. So there is obviously a bug in Meson. However, I can't help but think this behavior is a clue to something that is wrong with how ksh behaves on OpenBSD. |
I cloned Meson's Git, made a symlink from
Now Meson kills tests that time out. These tests aren't OK:
|
Here's the problem with API/tlock.c in OpenBSD:
The test sleeps too much. The spinlock uses compare-and-swap and To measure the sleep time, I wrote this test program: #include <sys/time.h>
#include <stdio.h>
#include <unistd.h>
int main(int argc, char **argv) {
struct timeval t1, t2, ti;
int i;
char c;
if (argc != 2 || sscanf(argv[1], "%d %c", &i, &c) != 1) {
fprintf(stderr, "usage: measure integer\n");
return 1;
}
gettimeofday(&t1, NULL);
usleep(i);
gettimeofday(&t2, NULL);
timersub(&t2, &t1, &ti);
printf("usleep(%d) took %lld s %lld us\n", i, (long long)ti.tv_sec, (long long)ti.tv_usec);
return 0;
}
The sleep is always between 10000 and 20000 microseconds.
I tried changing the timecounter with commands as root like The kernel's clock runs at 100 hz, so each tick has 10000 microseconds. In libc, usleep() calls nanosleep(). In the kernel, nanosleep() converts the sleep time to ticks. So |
Good debugging, @kernigh. I'm actually surprised the test presumes it can sleep for ~100us. Because on older operating systems minimum sleep durations were typically on the order of 1ms, 10ms, or even 100ms. So I would expect this test to fail on a lot of operating systems unless run with a ridiculously long timeout. I strongly suspect that historically the AST team ran these tests with no (or a huge) timeout. If you run the tests with a large multiplier (e.g., |
@krader1961 just taught me that Running
This run has 7 failures:
|
Thanks for that work, @kernigh. I'd really like to see us reduce the run time of those tests to something more reasonable given what they do. But as a stopgap measure I'd be okay with changing the default timeout for the API tests to five minutes. That is, adding a |
Several unit tests utilize very short duration sleeps (e.g., 100 usec) while waiting for something to happen. The problem with this is that on some platforms the minimum sleep interval might be 1, 10, even 100 msec. Which means the test can take one or two orders of magnitude longer to run on those platforms compared to a platform where sleeps are more granular. So instead of sleeping simply yield execution to another runable thread. This reduces the unit test run time for me from 49 minutes to 20 minutes on OpenBSD in a virtual machine. And from 35 minutes to 16 minutes on Cygwin. Partial fix for #483 Partial fix for #546
I'm still using a patch for OpenBSD libc to fix the arith, comvario, treemove failures. The patch went into OpenBSD-current, and should be in the next release (OpenBSD 6.4), but I am still running OpenBSD 6.3 on amd64. The patch only affects alpha, amd64, arm, i386. Thanks to recent changes by @krader1961, API tests no longer take an annoyingly long time to run. I saw no API test failures today. I got 7 failures: pty failed as usual, sigchld randomly passed or failed as usual, but arrays is a new failure:
This is the new failure in arrays. I don't know the cause of failure, but I might want to look at vmalloc removal (#396) first.
The failure in sigchld has a new message:
This is the backtrace from
|
Thanks, @kernigh, for your excellent, well written, observations. I'm working on a change to reintroduce the AST Vmalloc |
I now confirm that commit 489c672 (Replace calls to AST Vmalloc with system malloc) causes the arrays test to fail. |
The
|
/facepalm The failure is in this block of code:
Specifically, the This failure can occur on any distro and theoretically even with AST Vmalloc rather than the system malloc. Whether or not the SIGSEGV occurs is simply a consequence of whether or not the second of two var names passed to this function happens to end on a page boundary with no adjacent mapped page. Something that is unlikely but not impossible. |
Double /facepalm The logic appears to make an incorrect assumption. It's trying to determine if two consecutive var names have a shared compound var name prefix. But unless there are non-obvious constraints on the caller of For those interested in more details the problem is seen here:
The problem occurs because the var name pointed to by |
The
It should be copying |
LOL and crying at the same time. The code block in my previous comment is broken in a more serious way. The A rough guess is 90% of the explicit casts are not needed. Or wouldn't be needed if |
The
This is from the final test which spawns 512 Too, that test is pointless in addition to being broken by virtue of relying on an incorrect assumption about how PIDs are assigned. It would be better if it tracked the PID of each sleep put in the background then verified we processed a SIGCHLD for each one. And at the end verified there were no PIDs for which we did not run the SIGCHLD trap. |
Closing since I've merged all the changes needed to have testing on OpenBSD have failure rates comparable to other platforms. That is, there are still a few bugs such as #576 which manifest occasionally on OpenBSD but they affect all platforms in a random fashion. So I'm closing this since all the systemic failures have been addressed. |
These are the other test failures in OpenBSD, after the recent fixes of #429 (zzz problem) and #443 (vfork problem). I attach the test log from commit f5b9c64 in OpenBSD 6.3 on amd64.
Attachment: testlog.txt
arith, comvario, treemove: bug in OpenBSD
Not a bug in ksh. This is a bug in OpenBSD/amd64, where libc functions like strtod(3) flip the sign so "nan" becomes -nan and "-nan" becomes nan. I hacked my libc to fix strtod(), but I didn't fix strtold(), so I still see failures. I sent a bug report to OpenBSD: https://marc.info/?l=openbsd-bugs&m=152391654313171&w=2
directoryfd: no /proc, no /dev/fd/11/*
OpenBSD has no /proc directory. It does have /dev/fd/11, an fd(4) device node. To open /dev/fd/11 is to duplicate file descriptor 11. Paths like /dev/fd/11/test1 don't work because they don't open /dev/fd/11, so /dev/fd/11 is still a device node, not a directory.
OpenBSD doesn't have these paths, but the test expects them to work. Does ksh emulate these paths? Perhaps mkdir must be a builtin and not /bin/mkdir?
pty: problem with pty or vi mode
This uses pty to test vi mode in ksh,
set -o vi
. I tried entering vi mode. Commands like/
,?
,n
,N
seem to work as well in ksh as in other shells. I don't know if another command in vi mode is broken, or if pty is broken.sh_match: no ulimit -M or -V
The "as:" error comes from ulimit -M, and "vmem:" from ulimit -v, in the line:
OpenBSD doesn't support ulimit -M and -v. Now ksh knows this, so
ulimit -a
prints "not supported" for both -M and -v. The test doesn't know to ignore the errors.In OpenBSD, ulimit -d controls memory: getrlimit(2) says of RLIMIT_DATA,
In some other systems, RLIMIT_DATA only controls sbrk(2), and different limits control other allocators like mmap(2).
sigchld: unknown problem
In this run, sigchld/shcomp failed but sigchld succeeded. Some failures that don't happen in every test run; this seems to be one of them.
signal: no sigqueue(2)
OpenBSD doesn't have sigqueue(2), so ksh doesn't use it. The test tries to pass 4 and 5 through sigqueue(2); the values get lost. One might modify the test to accept systems where both values are 0.
variables: locale "x" is valid
All these tests expect that "x" isn't a valid locale name. OpenBSD takes "x" as a valid alias of "C", because setlocale(3) says,
So "x.x" is a bad locale name in OpenBSD. One might assign
bad=x.x
and testLC_ALL=$bad
. If different systems needed different bad names, then one might check$(uname)
and assignbad=x
,bad=x.x
, or another value.The text was updated successfully, but these errors were encountered: