Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/dist: "dist test" overhead increased by more than x10 from 1.9 to 1.10 on plan9/arm #24300

Closed
millerresearch opened this issue Mar 7, 2018 · 9 comments
Labels
FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done. OS-Plan9
Milestone

Comments

@millerresearch
Copy link
Contributor

The new style plan9/arm buildlet (running on a Raspberry Pi 3) has become impossibly slow in go1.10 because of increased overhead in the "dist test" which is run for each test shard.

Previously:

cpu% go version
go version go1.9.4 plan9/arm
cpu% time go tool dist test go_test:bufio

##### Testing packages.
ok  	bufio	0.634s

ALL TESTS PASSED (some were excluded)
0.05u 0.32s 14.84r 	 go tool dist test go_test:bufio ...

But now:

cpu% go version
go version go1.10 plan9/arm
cpu% time go tool dist test go_test:bufio

##### Testing packages.
ok  	bufio	0.677s

ALL TESTS PASSED (some were excluded)
0.49u 1.80s 194.42r 	 go tool dist test go_test:bufio ...

An extra 180s for each shard, times 50+ shards, adds 2.5 hours to the buildlet run time.

The relevant change appears to be this new code in src/cmd/dist/test.go:

// Complete rebuild bootstrap, even with -no-rebuild.
// If everything is up-to-date, this is a no-op.
....
if !t.listMode {
    goInstall("go", append([]string{"-i"}, toolchain...)...)
    goInstall("go", append([]string{"-i"}, toolchain...)...)
    goInstall("go", "std", "cmd")
    checkNotStale("go", "std", "cmd")
}

Note the comment; this must be a candidate for the most expensive no-op ever.
In the buildlet context, where we have just built everything with make.rc, could we possibly skip this staleness check?
Alternatively, in the context of a single buildlet with no helpers, could we skip the sharding and run all the tests in a single invocation of "dist test" to avoid multiplying the overhead?

@andybons andybons added OS-Plan9 help wanted NeedsFix The path to resolution is known, but the work has not been done. labels Mar 7, 2018
@andybons andybons added this to the Go1.11 milestone Mar 7, 2018
@andybons
Copy link
Member

andybons commented Mar 7, 2018

/cc @bradfitz for a decision on skipping the staleness check

@bradfitz
Copy link
Contributor

bradfitz commented Mar 7, 2018

Ouch. Investigating.

@rsc added that in 8f70e1f.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/99296 mentions this issue: cmd/dist: skip rebuild before running tests when on the build systems

@bradfitz
Copy link
Contributor

bradfitz commented Mar 7, 2018

Well, I sent https://go-review.googlesource.com/#/c/go/+/99296 which helps when running under the builders, since the builders will always run make.whatever first.

But I'm not sure whether that helps you in your local development case. I'm not sure why it's taking an extra 180s. Is it just that slow, even in the no-op case? Or is it incorrectly doing work?

gopherbot pushed a commit that referenced this issue Mar 7, 2018
Updates #24300

Change-Id: I7752dab67e15a6dfe5fffe5b5ccbf3373bbc2c13
Reviewed-on: https://go-review.googlesource.com/99296
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@millerresearch
Copy link
Contributor Author

I'm not sure why it's taking an extra 180s. Is it just that slow, even in the no-op case?

The staleness check (which is also what a no-op "go install" amounts to) is walking the source tree and seems to be opening each .go file two or three times. On linux this is mitigated by aggressive cacheing of directories and inodes, but the plan9/arm builder is running off a networked server where walk / open / stat all bypass the local cache and require an rpc to the server.

cpu% iostats go list '-f={{if .Stale}} STALE {{.ImportPath}}: {{.StaleReason}}{{end}}' std cmd

read      53299911 bytes, 2686.844 Kb/sec
write     450 bytes, 344.9408 Kb/sec
protocol  58920745 bytes, 17974.73 Kb/sec
rpc       109301 count

Message    Count   Low  High  Time Averg          T       R
version        1     1     1     1     1 ms       19       19 bytes
attach         1     0     0     0     0 ms       25       20 bytes
walk       33742     0    46  9492     0 ms  1391690  1755374 bytes
open       23103     0   977 27805     1 ms   277236   554472 bytes
create         2     4     4     8     4 ms       70       48 bytes
read       16536     0    48 19372     1 ms   380328 53481807 bytes
write         50     0     0     1     0 ms     1600      550 bytes
clunk      30331     0    26 11231     0 ms   333641   212317 bytes
remove         2     1     1     2     1 ms       22       14 bytes
stat        5239     0    22  7671     1 ms    57629   453429 bytes
wstat        294     1    18   629     2 ms    18228     2058 bytes

Opens    Reads  (bytes)   Writes  (bytes) File
   11       22     5324        0        0 /dev/sysstat
   11       22     1562        0        0 /dev/swap
    1        2     4866        0        0 /usr/miller/go/src
    2        4      280        0        0 /usr/miller/go/src/archive
    3        6     3309        0        0 /usr/miller/go/src/archive/tar
    3        6    32382        0        0 /usr/miller/go/src/archive/tar/common.go
    2        2     2952        0        0 /usr/miller/go/src/archive/tar/example_test.go
    3        5    19554        0        0 /usr/miller/go/src/archive/tar/format.go
    3        7    34559        0        0 /usr/miller/go/src/archive/tar/reader.go
    2        2     8192        0        0 /usr/miller/go/src/archive/tar/reader_test.go
    2        2      828        0        0 /usr/miller/go/src/archive/tar/stat_actime1.go
    2        2      828        0        0 /usr/miller/go/src/archive/tar/stat_actime2.go
    2        2     6154        0        0 /usr/miller/go/src/archive/tar/stat_unix.go
    3        5    17365        0        0 /usr/miller/go/src/archive/tar/strconv.go
    2        2     8192        0        0 /usr/miller/go/src/archive/tar/strconv_test.go
    2        2     8192        0        0 /usr/miller/go/src/archive/tar/tar_test.go
    3        6    26797        0        0 /usr/miller/go/src/archive/tar/writer.go
    2        2     8192        0        0 /usr/miller/go/src/archive/tar/writer_test.go
    1        2     3545        0        0 /usr/miller/go/src/archive/tar/testdata
    3        6     2109        0        0 /usr/miller/go/src/archive/zip
    2        2     4064        0        0 /usr/miller/go/src/archive/zip/example_test.go
    3        6    26111        0        0 /usr/miller/go/src/archive/zip/reader.go
    2        2     8192        0        0 /usr/miller/go/src/archive/zip/reader_test.go
    3        4    11262        0        0 /usr/miller/go/src/archive/zip/register.go
    3        5    19259        0        0 /usr/miller/go/src/archive/zip/struct.go
    3        5    23489        0        0 /usr/miller/go/src/archive/zip/writer.go
    2        2     8192        0        0 /usr/miller/go/src/archive/zip/writer_test.go
    2        2     8192        0        0 /usr/miller/go/src/archive/zip/zip_test.go
    1        2     2184        0        0 /usr/miller/go/src/archive/zip/testdata
    3        6     1413        0        0 /usr/miller/go/src/bufio
    3        6    26233        0        0 /usr/miller/go/src/bufio/bufio.go
    2        2     8192        0        0 /usr/miller/go/src/bufio/bufio_test.go
    2        2     6692        0        0 /usr/miller/go/src/bufio/example_test.go
    2        2     1194        0        0 /usr/miller/go/src/bufio/export_test.go
    3        5    21689        0        0 /usr/miller/go/src/bufio/scan.go
    2        2     8192        0        0 /usr/miller/go/src/bufio/scan_test.go
    ... and so on

@millerresearch
Copy link
Contributor Author

Well, I sent https://go-review.googlesource.com/#/c/go/+/99296 which helps when running under the builders, since the builders will always run make.whatever first.

Yes, that's an improvement. Buildlet for plan9/arm now completes in just under 3 hours. Still considerably slower than the legacy all.rc which was taking less than 2 hours.

@millerresearch
Copy link
Contributor Author

There are still two calls to checkNotStale in the runPending function in cmd/dist/test.go, each taking about 30 seconds on the plan9/arm builder.

  1. If checkNotStale is called before and after each set of tests, aren't half the calls redundant? (How can anything become stale between finishing one set of tests and starting the next?)
  2. In the builder context, where make.rc has been done before we start, what is the purpose of the checkNotStale calls? Is it to guard against anyone writing a test which inadvertently changes something?

Maybe @rsc can advise?

@millerresearch
Copy link
Contributor Author

I'm content for this issue to be closed. For plan9_arm buildlets with an external file server, we've added a more aggressive cache (this one), which brings the time down to just over an hour.

@ALTree
Copy link
Member

ALTree commented Apr 26, 2018

Closing as @millerresearch asked.

@ALTree ALTree closed this as completed Apr 26, 2018
@golang golang locked and limited conversation to collaborators Apr 26, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done. OS-Plan9
Projects
None yet
Development

No branches or pull requests

5 participants