Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MAINTENANCE] Refactor usage_stats_opt_out method in DataContext #5339

Conversation

Shinnnyshinshin
Copy link
Contributor

@Shinnnyshinshin Shinnnyshinshin commented Jun 17, 2022

Changes proposed in this pull request:

  • Follow up to [MAINTENANCE] Move _apply_global_config_overrides() to AbstractDataContext #5285

  • Small refactor to _check_global_usage_statistics_opt_out() so that we dont have to "flip" the boolean in our minds. Before if opt_out = True, then usage_statistics_enabled = False.

  • Now we use _get_global_usage_statistics_override() which returns True if usage_statistics_enabled = True, and False if usage_statistics_enabled = False.

  • GREAT-953

Update 2022-08-26

anonymous_usage_stats_enabled can be set in the following 3 places
- great_expectations.yml (YML)
- GE_USAGE_STATS environment variable (env)
- conf file (conf)

If it is set as False in any of the 3 places, the global value is set to False

  • This is a change from previous behavior which set the following precedence: environment variable > home folder > /etc > yml
  • This PR adds a 3 tests that check if setting YML, env, or conf to False will set the global usage_statistics flag to False.
  • It also removes tests that used to check the test percedence.

Definition of Done

Please delete options that are not relevant.

  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have run any local integration tests and made sure that nothing is broken.

Shinnnyshinshin added 30 commits June 9, 2022 16:46
…ure/GREAT-953/migration-part1-move-to-abc

* bugfix/GREAT-953/tests-deleting-env-variable:
  pushing fix for test
…ure/GREAT-953/migration-part1-move-to-abc

* bugfix/GREAT-953/tests-deleting-env-variable:
  removed extra fixture
* develop:
  [MAINTENANCE] Suppress persisting of temporary ExpectationSuite configurations in Rule-Based Profiler computations (#5305)
  [DOCS] Update standard arguments doc for Expectations to not reference datasets. (#5052)
  [MAINTENANCE] Handle compare Expectation in presence of high precision floating point numbers and NaN values (#5298)
  [MAINTENANCE] `DataAssistantResult` cleanup and extensibility enhancements (#5259)
  chore: check for mypy existence (#5291)
…ure/GREAT-953/migration-part1-move-to-abc

* bugfix/GREAT-953/tests-deleting-env-variable:
  Update migration_guide_postgresql_v3_api.py
  [MAINTENANCE] Suppress persisting of temporary ExpectationSuite configurations in Rule-Based Profiler computations (#5305)
  [DOCS] Update standard arguments doc for Expectations to not reference datasets. (#5052)
  [MAINTENANCE] Handle compare Expectation in presence of high precision floating point numbers and NaN values (#5298)
  [MAINTENANCE] `DataAssistantResult` cleanup and extensibility enhancements (#5259)
  chore: check for mypy existence (#5291)
Will Shin added 4 commits November 10, 2022 16:19
…r-usage-stats-opt

* develop:
  [MAINTENANCE] In ExecutionEngine: Make variable names and usage more descriptive of their purpose. (#6342)
  [MAINTENANCE] Refactor variable aggregation/substitution logic into `ConfigurationProvider` hierarchy (#6321)
  [MAINTENANCE] Add missing one-line docstrings and try to make the others consistent (#6340)
  [BUGFIX] Fix issue with misaligned indentation in docs snippets (#6339)
@@ -2437,6 +2440,21 @@ def _get_metric_configuration_tuples(

return metric_configurations_list

@classmethod
def validate_config(
cls, project_config: Union[DataContextConfig, Mapping]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate_config() 's behavior has been extended to always return a DataContextConfig or error out. This helps with mypy errors, which were complaining that parameters were being Union[DataContextConfig, Mapping] in one place and DataContextConfig in another.

It has also been moved up to AbstractDataContext because BaseDataContext and CloudDataContext both use it.

This is the previous implementation

    @classmethod
    def validate_config(cls, project_config: Union[DataContextConfig, Mapping]) -> bool:
        if isinstance(project_config, DataContextConfig):
            return True
        try:
            dataContextConfigSchema.load(project_config)
        except ValidationError:
            raise
        return True

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this name still appropriate? Is there perhaps a better name to signal what it is doing now that it isn't returning a bool?

Copy link
Contributor

@alexsherstinsky alexsherstinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Kilo59 Kilo59 self-requested a review November 14, 2022 14:05
)

project_data_context_config: DataContextConfig = (
BaseDataContext.validate_config(project_config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For static/class methods called in instances, do we want to use the class name or self? Good either way but let's be consistent. I think I'm the one who added this class call but I might be the outlier.

Copy link
Contributor Author

@Shinnnyshinshin Shinnnyshinshin Nov 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great question @cdkini. I had a look, and I think it's a mix of class name and self 😅 My opinion is that we should use class name to distinguish between these and instance methods

@@ -88,8 +88,13 @@ def __init__(
project_config = self.retrieve_data_context_config_from_ge_cloud(
ge_cloud_config=self._ge_cloud_config,
)

project_data_context_config: DataContextConfig = (
CloudDataContext.validate_config(project_config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same note

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replied above

@@ -1668,7 +1669,7 @@ def convert_ndarray_decimal_to_float_dtype(data: np.ndarray) -> np.ndarray:


def get_context(
project_config: Optional[Union["DataContextConfig", dict]] = None,
project_config: Optional[Union["DataContextConfig", Mapping]] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we fix up this DataContextConfig type annotation by using from __future__ import annotations and if TYPE_CHECKING?

Copy link
Contributor Author

@Shinnnyshinshin Shinnnyshinshin Nov 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call :) doing now, and for a few more parameters below.

@@ -94,7 +94,7 @@ def test_common_usage_stats_are_sent_no_mocking(
context._usage_statistics_handler._close_worker()

# Make sure usage stats are enabled
assert not context._check_global_usage_statistics_opt_out()
assert context._get_global_usage_statistics_override()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we had this issue before but I don't love directly accessing a private method here - is it possible to get around this at all? Do we need this assertion?

Feels like we might be testing an implementation detail

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can understand this concern. I think we can get around this by checking the boolean flag on anonymous_usage_statistics property, which we do directly below. If that is the case then we can just remove this line

Copy link
Member

@cdkini cdkini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some misc comments but mainly LGTM!

Copy link
Member

@anthonyburdi anthonyburdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a non blocking rename suggestion.

@@ -2530,21 +2545,24 @@ def _load_config_variables(self) -> Dict:
return config_var_provider.get_values()
return {}

def _check_global_usage_statistics_opt_out(self) -> bool:
@staticmethod
def _get_global_usage_statistics_override() -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something like _is_usage_stats_enabled() -> bool: since this method returns a boolean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this suggestion :) changing now

@@ -83,7 +83,7 @@ def test_consistent_name_anonymization(


@pytest.mark.base_data_context
def test_opt_out_environment_variable_base_data_context(
def test_global_override_environment_variable_base_data_context(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

- great_expectations.yml (YML)
- GE_USAGE_STATS environment variable (env)
- conf file (conf)
If it is set as `False` in *any* of the 3 places, the global value is set to `False`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@Shinnnyshinshin Shinnnyshinshin enabled auto-merge (squash) November 14, 2022 21:01
@Shinnnyshinshin Shinnnyshinshin merged commit 93ee0e9 into develop Nov 14, 2022
@Shinnnyshinshin Shinnnyshinshin deleted the feature/GREAT-953/migration-part2-refactor-usage-stats-opt branch November 14, 2022 21:49
Shinnnyshinshin pushed a commit that referenced this pull request Nov 15, 2022
* develop:
  [MAINTENANCE] Update column_reflection_fallback to also use schema name for Trino (#6350)
  [MAINTENANCE] Misc cleanup of GX Cloud helpers (#6352)
  [MAINTENANCE] Pin `mypy` to `0.990` (#6361)
  Subject: Support to include ID/PK in validation result for each row t… (#5876)
  [MAINTENANCE] Fix computed metrics type hint in ExecutionEngine.resolve_metrics() method (#6347)
  [MAINTENANCE] Refactor `usage_stats_opt_out` method in DataContext (#5339)
  Zep PostgresDatasource returns a list of batches. (#6341)
  [DOCS] Add `yarn snippet-check` command (#6351)
  [MAINTENANCE] Refactor out `termcolor` dependency (#6348)
  [MAINTENANCE] Move Cloud-specific enums to `cloud_constants.py` (#6349)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants