prepare a version of PloverDB for "turnkey" deployment by NCATS #1460
Adding some notes after today's discussion: Plover's current Dockerfile is here, and how-to steps for building an image and running it are here. (I just moved the running of that one script I mentioned into the Dockerfile.) So I think the main piece missing is perhaps automating the download of the KG file (and specifying which Biolink model version to use)? Anything else? Also suggested by Eric on today's call: tweak Plover's test suite a bit so that NCATS could easily run the tests against whatever endpoint they put their Plover at (so they can verify it is indeed working).
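The build/run/verify flow described above might look roughly like this. This is a sketch only: the image name, port, and the way the test suite accepts a target endpoint are assumptions, not Plover's actual interface.

```shell
# Hypothetical sketch of the turnkey flow (names/port/flags are assumptions):
git clone https://github.com/RTXteam/PloverDB.git
cd PloverDB
docker build -t ploverdb .
docker run -d --name plover -p 9990:80 ploverdb

# Then run the test suite against whatever endpoint the container exposes,
# assuming the tests are tweaked to take a configurable endpoint:
PLOVER_ENDPOINT=http://localhost:9990 pytest test/
```

The point of the test-suite tweak is just that the endpoint is a parameter rather than hard-coded, so NCATS can point the tests at their own deployment.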
This would likely be welcomed by NCATS ITRB. What are your thoughts about feasibility, @amykglen?
See also the NCATS ITRB Standard CI CD Policy document.
There was some grumbling about SQLite databases at today's deployment call. It seemed that ITRB was trying to steer people away from SQLite toward an external enterprise DB. They didn't close the door on it, but they seemed unhappy about it (this was in the context of MolePro's deployment, not ours). We could include our databases in the docker container, but that would produce very large containers. Alternatively, we could download the databases at launch time, but that incurs a substantial startup delay and high network-bandwidth requirements for each launch.
Thank you @edeutsch for bringing this to the team's attention. I feel like a pragmatic approach is to specify in the Dockerfile that we need a host volume (to hold our sqlite files). We don't have to care what the absolute path to the volume is in the host OS; that's transparent to the container. We just specify a within-container path for the volume, and it appears to us like a directory where we can read/write data without bloating the container size. https://docs.docker.com/storage/volumes/ As for the concept of a centrally managed database: having to update schemas and database contents in an NCATS IT-managed RDBMS would really slow down updates and feature enhancements to ARAX. And I think it is not necessary if we use the volume approach and initialize the sqlite files into the volume after the container starts, rather than as part of the docker image. This should avoid bloating either the image or the container.
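The volume approach sketched above could look something like this. The volume name, mount path, image name, and init script are all hypothetical, chosen just to illustrate the pattern:

```shell
# The Dockerfile would declare a mount point inside the container, e.g.:
#   VOLUME /data/sqlite
# (the host-side location is chosen at run time, not baked into the image)

docker volume create arax-sqlite                     # Docker-managed host storage
docker run -d --name arax -v arax-sqlite:/data/sqlite arax:latest

# After the container is up, initialize the sqlite files into the volume
# instead of shipping them inside the image (init script name is assumed):
docker exec arax python scripts/init_databases.py /data/sqlite
```

Because the volume persists independently of the container, the sqlite files survive container restarts and redeployments without being part of the image itself.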
Nice - it was pretty trivial to add; I just did so and added some 'how to test' steps to the README here.
So for building a Plover docker image, @saramsey and I were figuring that it'd make the most sense for the KG2c JSON file to be … Because the Plover Dockerfile only needs scp access when building an image (and not when running a container), I see a few options for how to make this work: …
One downside of options 1 and 2 is that the RSA key pair persists in the image, though I suppose this is what the ARAX Dockerfile will have to do anyway(?). Anyway, I've figured out how to make any of these options work technically (with the help of Steve's key-install Dockerfile code :)), but I'm wondering if anyone has input about which they think NCATS would like best...
I don't understand the details of docker container building and deployment well enough to make any informed input here. My only concern that might be worth considering: suppose the keys were compromised somehow; under this scheme the perp would potentially have access to lots more on arax.ncats.io (any file with world read?) unless we are super careful about locking things down. Might it make more sense to put the data in an S3 bucket and give NCATS read-only keys to the S3 bucket to be used for Docker image building? Should the keys be compromised somehow, there would only be access to the contents of the bucket.
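The read-only grant described above could be scoped with an IAM policy along these lines. The bucket name is hypothetical; the point is that the credentials can only list and fetch objects from that one bucket, nothing else:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::plover-build-artifacts",
        "arn:aws:s3:::plover-build-artifacts/*"
      ]
    }
  ]
}
```

Attaching this policy to a dedicated IAM user (whose keys are the ones handed to NCATS) limits the blast radius of a key compromise to the bucket contents, as suggested.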
Fair point - I think I'm actually favoring the S3 option at this point... I realized that it shouldn't be too bad to configure.
Ok, I got things working (locally) so that … where XXXX and YYYY are the keys we give them that allow them to download from a particular S3 bucket (we'd probably want to create a new S3 bucket for this?). I think I favor this option over the scp option for a few reasons: …
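A minimal sketch of what the build invocation might look like under this scheme. The build-arg names, bucket, and file path are assumptions, not the actual Plover configuration:

```shell
# Hypothetical build invocation; XXXX/YYYY stand in for the read-only S3 keys.
docker build \
  --build-arg AWS_ACCESS_KEY_ID=XXXX \
  --build-arg AWS_SECRET_ACCESS_KEY=YYYY \
  -t ploverdb .

# Inside the Dockerfile, the corresponding fetch step might be:
#   ARG AWS_ACCESS_KEY_ID
#   ARG AWS_SECRET_ACCESS_KEY
#   RUN aws s3 cp s3://<bucket>/<kg-file>.json /app/
```

One caveat: plain build args are recorded in the image history, so this carries the same key-persistence concern raised earlier for the scp options; Docker's BuildKit secret mounts (`docker build --secret ...`) are one way to keep the keys out of the final image.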
@saramsey - does this S3 method seem ok to you?
Posting for the record: Steve and I discussed this a bit and agreed to start with the S3 strategy, perhaps with some minor tweaks. But I also reached out to Amit to see if NCATS has any preferences here, since it's difficult to know what would work best for them and their automation plans.
So I haven't heard anything from NCATS on preferences about the above, but I pushed code that uses the S3 method, with a minor tweak so that the AWS keypair is copied into the image (they just have to put a copy of their …). I updated the README (here) and tested to verify that everything works as expected (the KG file is successfully downloaded from S3 and everything builds fine). I think the preferred plan is that NCATS would give us a keypair that we can then grant read-only access to the S3 bucket. So I think we can call this done for now?
I'll go ahead and close this issue - we can reopen it or create a new issue if NCATS has any tweaks.
NCATS wants to be able to deploy the KG2 KP via Docker. I think it makes sense to re-interpret this as "PloverDB", since there is a natural separation (in code, hosted instance, etc.) between PloverDB and ARAX. So, we will need to write a Dockerfile for it.