Drop haul-transect/transect-region requirements #274

brandynlucca · 2024-09-14T00:28:04Z

This PR fixes a bug in the NASC ingestion procedure by dropping the creation of the external haul-transect and transect-region key files. This removes the need to generate these files as an intermediate step, since the haul-transect map is not required until the biological data are ingested. In theory, there may still be a need to read in the biodata_gear.xlsx file to preserve precise transect information for the biological data (i.e., for bootstrapping/randomization). However, this is not necessary for the currently available workflow, and dropping it may avoid future issues with datasets where the biodata_gear.xlsx file(s) are missing transect_num entirely.
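To illustrate the idea of deriving the haul-transect map only once biological records are available, here is a minimal sketch. It is not the code in this PR: the function name `build_haul_transect_map`, the `haul_num` column name, and the sample rows are all assumptions made for illustration; only `transect_num` is taken from the description above.

```python
from typing import Dict, List, Optional


def build_haul_transect_map(rows: List[dict]) -> Dict[int, Optional[int]]:
    """Map each haul number to a transect number, or None when it is not recorded.

    Hypothetical helper: `haul_num` is an assumed column name.
    """
    haul_to_transect: Dict[int, Optional[int]] = {}
    for row in rows:
        haul = row["haul_num"]
        # `transect_num` may be absent from the row or explicitly empty
        transect = row.get("transect_num")
        # Keep the first non-missing transect seen for a given haul
        if haul_to_transect.get(haul) is None:
            haul_to_transect[haul] = transect
    return haul_to_transect


# Made-up rows standing in for gear/biological records
gear_rows = [
    {"haul_num": 1, "transect_num": 10},
    {"haul_num": 2},                        # transect_num missing entirely
    {"haul_num": 3, "transect_num": None},  # transect_num present but empty
]
haul_map = build_haul_transect_map(gear_rows)
```

Hauls without a recorded transect stay in the map with `None`, so downstream steps can decide how to treat them rather than failing at ingestion.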


leewujung

Hey @brandynlucca: thanks for the PR, and sorry for the delay in reviewing it. I put all my comments below instead of inline, since they touch on broader context than the changes you made. Some are cosmetic (like reorganization), and we can discuss whether they are necessary. I think the docs one would be good to have for v0.4.1, though.

`read_validated_data`

This section below is identical in both the if and else conditions under load_dataset - can they be merged (moved outside of the if-else)?

# Validate datatypes within dataset and make appropriate changes to dtypes
# ---- This first enforces the correct dtype for each imported column
# ---- This then assigns the imported data to the correct class attribute
read_validated_data(
    input_dict,
    configuration_dict,
    file_name,
    sheet_name,
    config_map,
    validation_settings,
)
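A minimal sketch of what merging could look like, assuming the two branches differ only in how the arguments are chosen. The function `load_dataset_sketch`, the `"region"` key, and the `reader` callable are hypothetical stand-ins, not the actual `load_dataset` internals:

```python
from typing import Any, Callable, Dict, List, Tuple


def load_dataset_sketch(
    settings: Dict[str, Any], reader: Callable[[str, str], Any]
) -> Any:
    """Keep only branch-specific work inside the if/else; call the reader once."""
    # Branch-specific: decide where the file and sheet names come from
    if "region" in settings:  # hypothetical condition
        source = settings["region"]
    else:
        source = settings
    file_name = source["filename"]
    sheet_name = source["sheetname"]
    # Shared: the validation/read call appears once, after the branches
    return reader(file_name, sheet_name)


# Stand-in for read_validated_data that just records its arguments
calls: List[Tuple[str, str]] = []


def fake_reader(file_name: str, sheet_name: str) -> int:
    calls.append((file_name, sheet_name))
    return len(calls)


load_dataset_sketch({"filename": "a.xlsx", "sheetname": "Sheet1"}, fake_reader)
load_dataset_sketch({"region": {"filename": "b.xlsx", "sheetname": "US"}}, fake_reader)
```

If the real branches also differ in what happens after the call, only the common prefix can be hoisted this way.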

`map_imported_datasets`

map_imported_datasets is used first under load_data and then under prepare_input_data, but prepare_input_data is called in load_data. There seems to be some redundancy here?

`prepare_input_data`

For prepare_input_data I have a few questions and thoughts:

1. Are the if conditions designed for the cases when only a subset of these exist? What are those conditions -- I was under the impression that all of these need to exist for the workflow to execute.
2. Related to the above -- what happens when none of the if conditions is true?
3. In prepare_input_data there are clear sections handling different data types. The function is so long that I wonder if it makes sense to factor some of them out as standalone functions.
4. Related to the above, since prepare_input_data is only used once under load_data, depending on what the answers to 1-2 are, it might make sense to make individual functions for each data type and just call all of them in load_data?
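The refactoring suggested in these points could be sketched roughly as follows. Everything here is an assumption for illustration: the dataset keys (`"biology"`, `"nasc"`, `"stratification"`), the per-type function names, and the choice to raise when no dataset is present are not taken from the actual code.

```python
from typing import Any, Callable, Dict


# Hypothetical per-data-type preparers; the bodies are placeholders
def prepare_biology(data: Any) -> str:
    return f"biology:{len(data)}"


def prepare_nasc(data: Any) -> str:
    return f"nasc:{len(data)}"


def prepare_stratification(data: Any) -> str:
    return f"stratification:{len(data)}"


PREPARERS: Dict[str, Callable[[Any], str]] = {
    "biology": prepare_biology,
    "nasc": prepare_nasc,
    "stratification": prepare_stratification,
}


def prepare_input_data_sketch(input_dict: Dict[str, Any]) -> Dict[str, str]:
    """Run one small function per dataset that is present."""
    present = [key for key in PREPARERS if key in input_dict]
    # Make the "none of the if conditions is true" case explicit
    if not present:
        raise ValueError("no recognized datasets found in input_dict")
    return {key: PREPARERS[key](input_dict[key]) for key in present}


result = prepare_input_data_sketch({"nasc": [1, 2, 3], "biology": [4]})
```

If all datasets turn out to be required, the `present` filter could be replaced with a check that raises on any missing key.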

docs

While reviewing the code, I was trying to understand what is in the config table and found terms such as "superlayer" and "name" pretty confusing. I couldn't find a documentation page that explains what's what in the config file -- is there one? If not, I think we should create such a page with a block diagram showing the flow of operations in load_data. There are multiple mapping and cleaning/subselection steps in the code, and this will help users/readers understand what is done and why, and potentially help with future refactoring/simplification.

about loading NASC

I think this is the first time I skimmed through the NASC loading functions, so 2 questions:

1. Are there currently tests for the result of load_nasc to compare values from those loaded into Matlab? If not, can that be done? Not sure if there is any bug in the Matlab code to prevent that.
2. What is the difference between read_echoview_export and batch_read_echoview_export?
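For the first question, a comparison test might look like the sketch below. It uses only the standard library; the helper `find_mismatches`, the column names, and the numeric values are placeholders for illustration, not real load_nasc or Matlab output.

```python
import math
from typing import Dict, List


def find_mismatches(
    python_rows: List[Dict[str, float]],
    matlab_rows: List[Dict[str, float]],
    columns: List[str],
    rel_tol: float = 1e-9,
) -> List[str]:
    """Describe every value that differs beyond the tolerance."""
    if len(python_rows) != len(matlab_rows):
        return [f"row count differs: {len(python_rows)} vs {len(matlab_rows)}"]
    mismatches = []
    for i, (py_row, ml_row) in enumerate(zip(python_rows, matlab_rows)):
        for col in columns:
            # abs_tol guards comparisons where both values are near zero
            if not math.isclose(py_row[col], ml_row[col], rel_tol=rel_tol, abs_tol=1e-12):
                mismatches.append(
                    f"row {i}, column {col!r}: {py_row[col]} != {ml_row[col]}"
                )
    return mismatches


# Placeholder values only; real reference values would be exported from Matlab
python_rows = [{"transect_num": 1, "nasc": 12.5}, {"transect_num": 2, "nasc": 0.0}]
matlab_rows = [{"transect_num": 1, "nasc": 12.5}, {"transect_num": 2, "nasc": 0.25}]

mismatches = find_mismatches(python_rows, matlab_rows, ["transect_num", "nasc"])
```

In a test suite, the final check would be `assert not mismatches`, with the list printed on failure so the differing rows are easy to locate.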

Drop haul-transect/transect-region requirements

bcc5ff9

brandynlucca self-assigned this Sep 14, 2024

brandynlucca and others added 2 commits September 13, 2024 17:29

Small pre-commit fix

b52a7c4

[pre-commit.ci] auto fixes from pre-commit.com hooks

a11a7fb

for more information, see https://pre-commit.ci

brandynlucca requested a review from leewujung September 14, 2024 00:36

brandynlucca added the bug_in_python (Something in the Python implementation does not match what's in Matlab), design, and refactor labels Sep 14, 2024

brandynlucca added this to the v0.4.1 (docs, LS fitting) milestone Sep 14, 2024

This was linked to issues Sep 14, 2024

Deprecate haul_to_transect_mapping_YYYY_COUNTRY.xlsx file requirement #265

Open

Transect number in gear data #266

Open

Removed commented out sections of code

9b01efc

leewujung reviewed Oct 9, 2024
