fix: daemon should recognize previously created repos. #212

Merged
merged 11 commits into from
Mar 16, 2018

Conversation

JonKrone
Contributor

@JonKrone JonKrone commented Feb 28, 2018

While working on some go/js interop tests I noticed that a daemon spawned on an already-initialized repo doesn't report itself as initialized.

It turned out to be just a typo so I fixed that and added a test.

Update

This PR has four fixes related to proc node startup:

  • proc nodes error when reporting their version: here
  • proc nodes spawn before their IPFS instance has finished booting: here
  • proc nodes do not report themselves as initialized: here
  • daemons spawned remotely do not report themselves as initialized. here, here, here, here

@JonKrone JonKrone self-assigned this Feb 28, 2018
@ghost ghost added the status/in-progress In progress label Feb 28, 2018
@@ -141,6 +141,11 @@ describe('Spawn options', () => {
expect(_ipfsd).to.exist()
expect(_ipfsd.api).to.not.exist()

// proc nodes do not reuse initialized repos
Member

I think there is no reason why we can't do this in 'proc' nodes as well.

Contributor Author

@JonKrone JonKrone Feb 28, 2018

My thought was that proc nodes always set their initialized to false (here) so having this check would avoid a failure once an issue with proc nodes (line 122 above) is resolved.

@JonKrone
Contributor Author

@dryajov Travis is failing on browser tests for my changes. There may be an obvious answer for this, but: how are browser tests being run? When I run test:browser, no tests run and the output is just an error (below).

0 passing (1m)

Error: Some tests are failing

As I understand, aegir looks for a test/browser.js file, though this project does not have one.
There are clearly browser tests being run on CI, so I'm a little confused. I've checked the Travis config, but it references nothing other than npm run test.

@JonKrone
Contributor Author

JonKrone commented Feb 28, 2018

Ah! Figured it out. aegir falls back to all test files ending in .spec.js. It turns out my problem was not having Chrome installed! I had a development version of Chrome, but it wasn't recognized.

@JonKrone
Contributor Author

JonKrone commented Feb 28, 2018

@dryajov I've looked into why the browser tests are failing and think it's a bug.

The daemon that the browser's DaemonClient is communicating with is initialized but the browser's DaemonClient doesn't recognize that. If you try to call init on the client's daemon, it errors with ipfs configuration file already exists

The flow is, I think:

  1. browser creates factory, remote: true so use FactoryClient
  2. browser spawns daemon from factory with a repoPath that is already initialized
    • browser sends request to /spawn endpoint
    • ipfsd-ctl spawns a daemon from FactoryDaemon
      • This spawned daemon has initialized === true
    • ipfsd-ctl responds with the {id, api} of the spawned daemon
    • browser creates a DaemonClient with {id, api}
      • This DaemonClient has initialized === false
  3. DaemonClient is the ipfsd instance with an api we can send commands through.
    • calling ipfsd.init fails

Should we instead carry the initialized through the /spawn api and feed it to the DaemonClient?
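
A rough sketch of what that proposal could look like, not the actual ipfsd-ctl code: the server includes `initialized` in the `/spawn` response and the client-side constructor accepts it. `FakeDaemon`, `spawnResponse`, `FakeDaemonClient`, and the placeholder address are hypothetical names made up for illustration.

```javascript
'use strict'

// Hypothetical stand-in for the daemon spawned on the server side.
class FakeDaemon {
  constructor (repoExists) {
    this.initialized = repoExists
    this.apiAddr = '/ip4/127.0.0.1/tcp/5001' // placeholder address
  }
}

// Hypothetical shape of the /spawn response body: besides id and api,
// it also carries the daemon's initialized flag.
function spawnResponse (id, daemon) {
  return {
    id: id,
    api: { apiAddr: daemon.apiAddr },
    initialized: daemon.initialized
  }
}

// Hypothetical client-side counterpart that trusts the server's flag
// instead of always starting out as "not initialized".
class FakeDaemonClient {
  constructor (id, api, initialized) {
    this.id = id
    this.api = api
    this.initialized = Boolean(initialized)
  }

  init () {
    if (this.initialized) {
      throw new Error('repo already initialized')
    }
    this.initialized = true
  }
}

const res = spawnResponse('abc', new FakeDaemon(true))
const client = new FakeDaemonClient(res.id, res.api, res.initialized)
console.log(client.initialized) // true
```

With the flag carried through, the client can refuse to call init locally rather than discovering the existing repo through a server-side error.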

@JonKrone
Contributor Author

I've just pushed changes for the above problem. Happy to change or revert it if you want me to.

There is a race condition I was running into during browser tests for ipfsd.start on an initialized repo with a go daemon. I am working on a Windows machine and it may be something that is resolved with the /shutdown work. The daemon failed to start on top of a repo because the repo was still locked.

@JonKrone JonKrone requested a review from dryajov March 1, 2018 16:29
@JonKrone
Contributor Author

JonKrone commented Mar 1, 2018

Current status: CI inconsistent but overall Jenkins and Travis show that things are okay.

Jenkins passed Windows 9.2 but failed 8.9 with a daemon.stop race condition (shutdown related).

Travis passed the Node 6 and Node 8 tests, but coverage failed. A similar thing happened with the previous CI run: Travis failed during coverage (but on different tests than this time).

Ready for review

This was referenced Mar 6, 2018
@dryajov
Member

dryajov commented Mar 6, 2018

Pity that we don't get CircleCI builds - I'm guessing it's because this is a PR from a fork. The existing CI issues indeed look "normal" - Windows builds fail because of the lack of #205 (as you mention).

Member

@dryajov dryajov left a comment

LGTM besides lint.

@@ -137,11 +137,16 @@ describe('Spawn options', () => {
}

f.spawn(options, (err, _ipfsd) => {
ipfsd = _ipfsd
Member

I believe this will fail lint - can you run npm run lint on the repo and make sure everything passes please?

Contributor Author

I have and do not see any failures. Were you expecting it to fail because it's assigning a global? ipfsd is defined above, on line 124 so here we are just setting it.

I moved this assignment up from below because, if any of the expect calls below failed, ipfsd would never be set, which would inadvertently cause the following it blocks to fail.

Member

Usually eslint will complain if the first line in the callback is not handling the error, but perhaps it's OK since the check happens right after ipfsd = _ipfsd. I would still move the assignment below the expects, since if there was an error _ipfsd would be null/empty/invalid.

expect(err).to.not.exist()
expect(_ipfsd).to.exist()
expect(_ipfsd.api).to.not.exist()

ipfsd = _ipfsd
// proc nodes don't reuse initialized repos
if (fOpts.type !== 'proc') {
Member

We should add this functionality to proc nodes - there is no reason they can't do that. But I'm fine if it's in another PR. Could you file an issue to add this please?

Contributor Author

Ahh cool. I really haven't even looked at in-proc nodes but this is an opportunity. I've filed an issue and am happy to work on it later. #214

@@ -52,7 +52,7 @@ class Daemon {
this.disposable = this.opts.disposable
this.exec = this.opts.exec || process.env.IPFS_EXEC || findIpfsExecutable(this.opts.type, rootPath)
this.subprocess = null
- this.initialized = fs.existsSync(path)
+ this.initialized = fs.existsSync(this.path)
Member

👍

@codecov

codecov bot commented Mar 7, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@1f3f172).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master     #212   +/-   ##
=========================================
  Coverage          ?   87.03%           
=========================================
  Files             ?       17           
  Lines             ?      671           
  Branches          ?        0           
=========================================
  Hits              ?      584           
  Misses            ?       87           
  Partials          ?        0
Impacted Files            Coverage Δ
src/ipfsd-client.js       97.26% <100%> (ø)
src/endpoint/routes.js    81.11% <100%> (ø)
src/factory-client.js     60.52% <100%> (ø)
src/ipfsd-daemon.js       91.83% <100%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1f3f172...7756924.

@JonKrone
Contributor Author

JonKrone commented Mar 7, 2018

@dryajov

I have added the fix for in-proc nodes to this PR: #214. It ended up being a two-liner:

I think that they are in line with the original intent of this PR. If we'd prefer to separate them, I am happy to revert.

Edit: Whoops, didn't run browser tests. Looking into it.

@@ -131,6 +131,7 @@ class FactoryInProc {
const node = new Node(options)

series([
(cb) => node.exec.on('ready', cb),
Member

please use once - this event might be triggered several times.
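
To illustrate the difference the comment above points at, here is a minimal sketch using Node's built-in EventEmitter rather than a real IPFS instance. `countCalls` is a hypothetical helper; the point is only that a listener added with `on` runs on every emit, while one added with `once` is removed after its first run, so a `series` callback wired to it cannot fire twice.

```javascript
'use strict'

const EventEmitter = require('events')

// Count how often a 'ready' listener runs when the event fires twice.
// `method` is either 'on' or 'once'.
function countCalls (method) {
  const emitter = new EventEmitter()
  let calls = 0
  emitter[method]('ready', () => { calls++ })
  emitter.emit('ready')
  emitter.emit('ready')
  return calls
}

const onCalls = countCalls('on')
const onceCalls = countCalls('once')
console.log(onCalls, onceCalls) // 2 1
```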

@@ -29,11 +30,11 @@ class Node {
this.path = this.opts.repoPath
this.repo = createRepo(this.path)
this.disposable = this.opts.disposable
this.initialized = fs.existsSync(this.path)
Member

@dryajov dryajov Mar 7, 2018

This will fail in the browser, since it wouldn't be a path... Maybe we can use the ipfs-repo module here to verify it's a repo - though I've little experience with it.

Contributor Author

Whew! Took me a minute to realize we use package.json to define aliases for some files on browsers. That threw me for a loop.

There's a good possibility from ipfs-repo: _isInitialized(callback). I'm about to push something, just running all tests. Lemme know what you think.


node.once('ready', () => {
node.version((err, _version) => {
if (err) { callback(err) }
Member

needs return.

Contributor Author

@JonKrone JonKrone Mar 12, 2018

Ha! Thanks, you really have an eye for these. Perhaps we could write a lint warning for this?

Contributor Author

@JonKrone JonKrone Mar 13, 2018

@diasdavid Looks like there already is an ESLint rule for this: https://eslint.org/docs/rules/callback-return

Adding a lint rule to aegir would be a big change across our projects but it might be worth it for this one as it's not about style but functionality and could catch bugs. Have you guys looked into it before?

It might be okay to set it with error severity but we would want to pre-run some repos to check. We could alternatively phase it in as a warning and do a fix chore of key repos during that period, bumping it to an error afterwards.

Contributor

I say enable the lint rule in aegir as an error and see what happens.
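
For reference, a small sketch of the bug class discussed in this thread. `failingVersion` is a hypothetical stand-in for `node.version` that always errors, and the other function names are made up for illustration. Without `return`, execution falls through and the callback is invoked a second time.

```javascript
'use strict'

// Hypothetical stand-in for node.version that always fails.
function failingVersion (cb) {
  cb(new Error('version unavailable'))
}

// Missing return: after reporting the error, execution falls through
// and the callback is invoked again.
function withoutReturn (callback) {
  failingVersion((err, version) => {
    if (err) { callback(err) }
    callback(null, version)
  })
}

// With return: the callback is invoked exactly once.
function withReturn (callback) {
  failingVersion((err, version) => {
    if (err) { return callback(err) }
    callback(null, version)
  })
}

// Count how many times `fn` invokes the callback it is given.
function countInvocations (fn) {
  let calls = 0
  fn(() => { calls++ })
  return calls
}

const buggyCalls = countInvocations(withoutReturn)
const fixedCalls = countInvocations(withReturn)
console.log(buggyCalls, fixedCalls) // 2 1
```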

node.version((err, _version) => {
if (err) { callback(err) }
callback(null, _version)
})
Member

Same comment as #205 (comment)

@@ -131,6 +135,19 @@ class FactoryInProc {
const node = new Node(options)

series([
(cb) => node.exec.once('ready', cb),
Member

same comment as #205 (comment)

Contributor Author

Something isn't quite right here. the node is created but then what we listen on the ready event is exec??

@diasdavid Can you rephrase? node.exec is the IPFS instance being run by the daemon node. What's going on is that new Node() instantiates the daemon and returns immediately with a still-booting IPFS instance under exec. We could ask consumers to wait for their in-proc daemon's IPFS to be ready but that is inconsistent with the go and js factories, which deliver ready instances.

try {
node.repo._isInitialized(() => {
node.initialized = true
cb()
Member

Do not call callbacks within try blocks. Any error thrown later on (in any part of the code reached from the callback) will be caught by this try block.

Contributor Author

Great! That's good to know
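
A minimal sketch of why calling a callback inside a try block is risky, using made-up names throughout (`readFlag`, `insideTry`, `outsideTry`, `run`). It assumes a synchronous operation that may throw. In the first variant an exception thrown by the consumer's own callback is swallowed by the catch and the callback is invoked a second time. In the second variant the exception propagates to the caller as expected.

```javascript
'use strict'

// Hypothetical synchronous operation that could throw.
function readFlag () {
  return true
}

// Anti-pattern: the callback runs inside the try block, so an error thrown
// by the callback itself is caught here and the callback runs again.
function insideTry (callback) {
  try {
    callback(null, readFlag())
  } catch (err) {
    callback(err)
  }
}

// Safer: only the risky operation is guarded; the callback runs outside.
function outsideTry (callback) {
  let flag
  try {
    flag = readFlag()
  } catch (err) {
    return callback(err)
  }
  callback(null, flag)
}

// Drive `fn` with a consumer callback that throws on its first invocation
// and record what happens.
function run (fn) {
  const seen = []
  try {
    fn((err) => {
      seen.push(err ? 'error' : 'ok')
      if (seen.length === 1) { throw new Error('bug in consumer code') }
    })
  } catch (e) {
    seen.push('thrown to caller')
  }
  return seen
}

const insideResult = run(insideTry)
const outsideResult = run(outsideTry)
console.log(insideResult) // [ 'ok', 'error' ]
console.log(outsideResult) // [ 'ok', 'thrown to caller' ]
```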

@JonKrone
Contributor Author

@dryajov looking at the js daemon, if you pass a custom IPFS exec: js-factory.spawn({ exec: new IPFS({}) }), the daemon returned by factory.spawn would still be booting. Do we expect users to know about this and to pass a 'ready' IPFS or wait to be ready after spawning? Could be, just curious.

https://github.com/ipfs/js-ipfsd-ctl/blob/master/src/ipfsd-daemon.js#L53
and
https://github.com/ipfs/js-ipfsd-ctl/blob/master/src/factory-daemon.js#L132

@dryajov
Member

dryajov commented Mar 12, 2018

new IPFS() is not a valid parameter for exec - it expects a class/function declaration, not an instance. It's also not a valid parameter for go or js types. So the behavior is undefined in those cases.

@dryajov
Member

dryajov commented Mar 14, 2018

@JonKrone can we get master rebased here. It should resolve the concerns that @diasdavid raised.

…ing.\nUse ipfs-repo._isInitialized to determine whether they are or not. I'm curious it throws instead of returning false.
…. It doesn't need a try/catch at all, I mistook the behavior of _isInitialized.
@JonKrone
Contributor Author

PR is up to date. Jenkins failed due to a macos EADDRINUSE error (has an issue: ipfs-inactive/jenkins#97) but everything else is ✅

I've since re-run CI and hit a couple of different issues:

  1. macos 9.2 failed to download go-ipfs_v0.4.13_darwin-amd64.tar.gz. I saw this issue once before, many weeks ago, and I'm not sure if it is a rare intermittent issue or related to something else: https://ci.ipfs.team/blue/organizations/jenkins/IPFS%2Fjs-ipfsd-ctl/detail/PR-212/16/pipeline/14

  2. windows 9.2 seemed to hang on [5/5] Building fresh packages... of the yarn --mutex network command for about 30 minutes. I stopped it, concerned about wasting resources. I've since learned that it may have been waiting for a build elsewhere on the network to complete before running its final step but I'm not sure. Does someone know if it was hanging or just paused? https://ci.ipfs.team/blue/organizations/jenkins/IPFS%2Fjs-ipfsd-ctl/detail/PR-212/14/pipeline

@victorb
Member

victorb commented Mar 15, 2018

PR is up to date. Jenkins failed due to a macos EADDRINUSE error (has an issue: ipfs-inactive/jenkins#97) but everything else is ✅

Yeah, we have something weird with aegir itself or the tests it runs: resources aren't properly stopped when a run is done and some processes end up still running, so the next run on the same project + worker leads to collisions...

macos 9.2 failed to download go-ipfs_v0.4.13_darwin-amd64.tar.gz. I saw this issue once before, many weeks ago, and I'm not sure if it is a rare intermittent issue or related to something else: https://ci.ipfs.team/blue/organizations/jenkins/IPFS%2Fjs-ipfsd-ctl/detail/PR-212/16/pipeline/14

There seems to be some DNS flipping happening with dist.ipfs.io currently, which is why this error is happening, so seems intermittent.

windows 9.2 seemed to hang on [5/5] Building fresh packages... of the yarn --mutex network command for about 30 minutes.

I think Windows is just very slow at installing. The --mutex network flag does indeed make sure only one instance of yarn runs at a time, but currently the lock is local to the worker. It is there to prevent concurrent installs (neither yarn nor npm handles those very well).

30 minutes sounds like an awful lot though, I'll look into it.

@daviddias
Member

@JonKrone please start opening PRs from branches in the repo itself to make it simpler to collaborate on a WIP PR.

@daviddias
Member

Reading the above, LGTM. Great work @JonKrone :)

5 participants