Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/azure container #75

Closed
wants to merge 36 commits into from
Closed

Feature/azure container #75

wants to merge 36 commits into from

Conversation

AlexAxthelm
Copy link
Collaborator

Contains infrastrucure to Build a docker image suitable to run on Azure Container Instances, as well as a deploy template.

Coincidentally:
Closes #3
Closes #17

config.yml Show resolved Hide resolved
@jdhoffa
Copy link
Member

jdhoffa commented Jan 19, 2024

I think I'm just out of sync from being away, or maybe you and CJ already discussed this in person, but could you ELI5 this in the PR description?

I'm having a hard time following what this does.

That said, will try to test it in my machine later today, I wanted to try to get data prep running anyway to make sure I can still do it 😀

@cjyetman
Copy link
Member

I feel like completely swapping out one dependency for another ({rlog} to {logger}) should be in a separate PR, not one titled "Feature/azure container".

@AlexAxthelm
Copy link
Collaborator Author

@jdhoffa still a fair bit of testing to do on my side, but while it's cooking, I'll be writing docs for it.

@jdhoffa
Copy link
Member

jdhoffa commented Jan 19, 2024

Sweet sounds good. Let me know when it's ready for a test run!

Also agree with CJ. Very happy to swap rlog with logger, but prob deserves it's own PR.

@AlexAxthelm
Copy link
Collaborator Author

AlexAxthelm commented Jan 19, 2024

Switching to logger available as #76. Basically the same changes as have been implemented here, but isolated.

@AlexAxthelm
Copy link
Collaborator Author

@cjyetman @jdhoffa Following the discussion in #73 I'm thinking that changing the strategy for this PR makes sense.

I'd been not trying to change the logic in run_data_preparation.R (just add logging), or touch config.yml too much, which is why there's two scripts run in this PR: copy_raw_data.R (in the ACI directory), and run_data_preparation.R. Where the first one copies the raw data files from Azure files to what is the "inputs" directory for run_data_preparation.R.

I'm now thinking that the way to address the "how do we track the source files" question might best be addressed by including the paths to the raw data files in the config (see example below), and including a copy_inputs option in the config, so that if we want to isolate what were the inputs for this particular set of outputs, we can.

Sketch:

Here's a (simplified) sketch of what I'm thinking:

#config.yaml
default:
  copy_inputs: false
  # these two replace the data_prep_inputs_path:
  factset_data_path: /inputs
  asset_impact_data_path: /inputs

2022Q4_CICD
  copy_inputs: true
  factset_data_path: "/mnt/factset-extracted/factset-pacta_timestamp-20221231T000000Z_pulled-20231221T195325Z"
  asset_impact_data_path: "/mnt/rawdata/AssetImpact"
# run_data_preparation.R
# Loading config...
masterdata_ownership_path <- file.path(asset_impact_data_path, masterdata_ownership_filename) #and friends
factset_financial_data_path <- file.path(factset_data_path, "factset_financial_data.rds") # and friends

# Do all the normal data_prep stuff

if (copy_inputs) {
  inputs_copy_path <- file.path(data_prep_outputs_path, "data_prep_inputs")
  dir.create(inputs_copy_path)
  file.copy(input_files, inputs_copy_path)
}

log_info("Done!)

@jdhoffa
Copy link
Member

jdhoffa commented Jan 23, 2024

Sounds reasonable to me?

@cjyetman
Copy link
Member

cjyetman commented Mar 1, 2024

@AlexAxthelm is this superseded by one or more of the other recent PRs? can/should we close it?

@cjyetman
Copy link
Member

cjyetman commented Mar 7, 2024

@AlexAxthelm can this be closed now?

@AlexAxthelm
Copy link
Collaborator Author

Closing this, but not deleting the branch yet, since there's some elements from that that I'd like to bring into main, I just need to get some time to do that.

@AlexAxthelm AlexAxthelm closed this Mar 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: write STDOUT log to file determine a better way to share secrets with the container
3 participants