Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/md5 usedforsecurity #17866

Merged

Conversation

jlloyd-widen
Copy link
Contributor

Summary & Motivation

For FIPS enabled systems the MD5 function is disabled in openssl. Since Dagster is using hashlib.md5 in various locations (dagster, dagster-dbt, and dagster-k8s), on a FIPS enabled environment the UI will deliver the following error when trying to load the code location:

ValueError: [digital envelope routines: EVP_DigestInit_ex] disabled for FIPS

  File "/opt/dagster/app/venv/lib/python3.11/site-packages/dagster/_grpc/server.py", line 609, in _get_serialized_external_repository_data
    external_repository_data_from_def(
  File "/opt/dagster/app/venv/lib/python3.11/site-packages/dagster/_core/host_representation/external_data.py", line 1341, in external_repository_data_from_def
    asset_graph = external_asset_graph_from_defs(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/dagster/app/venv/lib/python3.11/site-packages/dagster/_core/host_representation/external_data.py", line 1531, in external_asset_graph_from_defs
    atomic_execution_unit_id = assets_def.unique_id
                               ^^^^^^^^^^^^^^^^^^^^
  File "/opt/dagster/app/venv/lib/python3.11/site-packages/dagster/_core/definitions/assets.py", line 1254, in unique_id
    return hashlib.md5((json.dumps(sorted(self.keys))).encode("utf-8")).hexdigest()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A web search indicates that flagging such hashlib.md5 uses with the usedforsecurity=False parameter will resolve this error. As far as I can ascertain, each of the modified usages are indeed NOT used for the security of the md5 algorithm but instead used to determine the uniqueness of the item(s) being hashed. If this is not the case, my PR will need to be corrected.

How I Tested These Changes

I have deployed these changes on my own companies FIPS-enabled, k8s-based systems and seen the error resolved.

@jamiedemaria
Copy link
Contributor

maybe @alangenfeld or @smackesey or @gibsondan might be good reviewers for this?

@alangenfeld
Copy link
Member

correct these are not security based hashes so i think this PR is fine once the inlines are addressed

jamiedemaria
jamiedemaria previously approved these changes Nov 9, 2023
Copy link
Contributor

@jamiedemaria jamiedemaria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me. It looks like buildkite is failing on ruff you'll need to run make ruff and push up the changes

@alangenfeld
Copy link
Member

oof, looks like the usedforsecurity argument is new in 3.9 and we support 3.8 so we may need a more clever solution to handle the back compat

@jlloyd-widen
Copy link
Contributor Author

oof, looks like the usedforsecurity argument is new in 3.9 and we support 3.8 so we may need a more clever solution to handle the back compat

@alangenfeld Attempting to try/catch it

@alangenfeld alangenfeld dismissed jamiedemaria’s stale review November 27, 2023 15:47

dismissing due to substantive changes

Copy link
Member

@alangenfeld alangenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be preferable to capture this in a shared utility function ie non_secure_md5 and use that at each callsite.

The implementation oft this function could instead check the python version to decide whether to use the usedforsecurity kwarg instead of resorting to try catch as well.

@jlloyd-widen
Copy link
Contributor Author

I think it would be preferable to capture this in a shared utility function ie non_secure_md5 and use that at each callsite.

@alangenfeld , perhaps I'm unaware of how you implement utility functions but 3 of the instances of the usage of these md5 calls are in complete separate packages (i.e., dagster core, dagster-dbt, and dagster-k8s). Is there a way to share a function among them all? Or do I need to add a new function, as you described, to each package?

@alangenfeld
Copy link
Member

dagter-dbt and dagster-k8s depend on dagster core, so you can throw something in core maybe in dagster._utils and use it in all three. We currently enforce that all dagster packages are on the same version.

@alangenfeld
Copy link
Member

alangenfeld commented Dec 1, 2023

nice, i think this is close. Ran the test suite https://buildkite.com/dagster/unit-tests/builds/245

  • looks like we are calling encode twice a few places. Can drop it from callsites
  • adding some type hints likely useful
  • make ruff to fix formatting

@jlloyd-widen
Copy link
Contributor Author

@alangenfeld Thanks for the feedback. I think I've incorporated most of that feedback. Happy to take further feedback.

@alangenfeld
Copy link
Member

unblocked builds again - this should be good to go once the build is green

@jlloyd-widen
Copy link
Contributor Author

Looks like pyright doesn't appreciate our method. Not sure how to address that ...

@alangenfeld
Copy link
Member

No parameter named "usedforsecurity" (reportGeneralTypeIssues)

will just have to use # type: ignore on this since the type checker cant tell you are py version gating

For the other callsite, I think you could either

  • # type: ignore it
  • maybe change the function to non_secure_md5_hash_str and move the encode and hashdigest calls in to the utility fn and type it as str -> str

@jlloyd-widen
Copy link
Contributor Author

Alright, made those changes. Should be good to go 🤞

@jlloyd-widen
Copy link
Contributor Author

Pyright again doesn't like something, but this time it doesn't appear to be related to the things I have changed, except that it's in the same file as one of my changes ...

@alangenfeld
Copy link
Member

Hmm thats frustrating. Why don't you rebase/merge to resolve the conflict and if the weird error is still present i will just merge it and deal with it.

@jlloyd-widen
Copy link
Contributor Author

Hmm thats frustrating. Why don't you rebase/merge to resolve the conflict and if the weird error is still present i will just merge it and deal with it.

done.

@alangenfeld alangenfeld merged commit 55f7522 into dagster-io:master Dec 5, 2023
1 check passed
@alangenfeld
Copy link
Member

thanks for the PR!

zyd14 pushed a commit to zyd14/dagster that referenced this pull request Jan 20, 2024
## Summary & Motivation
For FIPS enabled systems the MD5 function is disabled in `openssl`.
Since Dagster is using `hashlib.md5` in various locations (`dagster`,
`dagster-dbt`, and `dagster-k8s`), on a FIPS enabled environment the UI
will deliver the following error when trying to load the code location:

```
ValueError: [digital envelope routines: EVP_DigestInit_ex] disabled for FIPS

  File "/opt/dagster/app/venv/lib/python3.11/site-packages/dagster/_grpc/server.py", line 609, in _get_serialized_external_repository_data
    external_repository_data_from_def(
  File "/opt/dagster/app/venv/lib/python3.11/site-packages/dagster/_core/host_representation/external_data.py", line 1341, in external_repository_data_from_def
    asset_graph = external_asset_graph_from_defs(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/dagster/app/venv/lib/python3.11/site-packages/dagster/_core/host_representation/external_data.py", line 1531, in external_asset_graph_from_defs
    atomic_execution_unit_id = assets_def.unique_id
                               ^^^^^^^^^^^^^^^^^^^^
  File "/opt/dagster/app/venv/lib/python3.11/site-packages/dagster/_core/definitions/assets.py", line 1254, in unique_id
    return hashlib.md5((json.dumps(sorted(self.keys))).encode("utf-8")).hexdigest()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```

A web search [indicates](s3tools/s3cmd#1005)
that flagging such `hashlib.md5` uses with the `usedforsecurity=False`
parameter will resolve this error. As far as I can ascertain, each of
the modified usages are indeed NOT used for the security of the md5
algorithm but instead used to determine the uniqueness of the item(s)
being hashed. If this is not the case, my PR will need to be corrected.

## How I Tested These Changes
I have deployed these changes on my own companies FIPS-enabled,
k8s-based systems and seen the error resolved.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants