Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misleading Error: Invalid Member Definition #436

Closed
Ananya2003Gupta opened this issue Jun 29, 2023 · 10 comments · Fixed by #437
Closed

Misleading Error: Invalid Member Definition #436

Ananya2003Gupta opened this issue Jun 29, 2023 · 10 comments · Fixed by #437

Comments

@Ananya2003Gupta
Copy link
Contributor

  • OS version: Ubuntu 22.04.2

  • Reproduced by:
    python ../python/podio_class_generator.py ../tests/datalayout.yaml ../Tmp data ROOT

  • Input:
    datalayout.yaml

options:
  getSyntax: False
  exposePODMembers: True

components:
  # Component representing position
  PositionComponent:
    Members:
      - float x
      - float y
      - float z

datatypes:
  # Data type for calorimeter hits
  CalorimeterHit:
    Description: "Calorimeter Hits"
    Author: "Me"
    Members:
     - float energy
     - PositionComponent position
    OneToOneRelations:
     - Particle particle // One-to-one relation with Particle

  # Data type for particles
  Particle:
    Description: "Particle"
    Author: "Me"
    Members:
     - int pdgId
     - float energy
     - PositionComponent momentum
    OneToOneRelations:
     - CalorimeterHit calorimeterHit // One-to-one relation with CalorimeterHit
     
  • Error Message:
    Error while generating the datamodel: 'float energy' is not a valid member definition

  • Issue:
    The error message is misleading and does not accurately describe the root cause of the problem.

  • Analysis:
    After thorough debugging and documentation review, it was discovered that the error was due to the absence of comments after each member definition.

  • Solution:
    Adding comments to the member definitions resolved the error.

Correction:

datatypes:
  # Data type for calorimeter hits
  CalorimeterHit:
    Description: "Calorimeter Hits"
    Author: "Me"
    Members:
     - float energy //energy
     - PositionComponent position //position
    OneToOneRelations:
     - Particle particle // One-to-one relation with Particle

  # Data type for particles
  Particle:
    Description: "Particle"
    Author: "Me"
    Members:
     - int pdgId //pdgId
     - float energy //energy
     - PositionComponent momentum //momentum
    OneToOneRelations:
     - CalorimeterHit calorimeterHit // One-to-one relation with CalorimeterHit
     
  • Expected Behavior:
    The error message should accurately reflect the issue, indicating that comments are required for each member definition.

  • Actual Behavior:
    The error message is misleading and does not provide clear guidance on the root cause, leading to confusion during debugging.

  • Possible Solution:
    Updating the error message to include information about the requirement of comments in each member definition to prevent confusion and provide more accurate troubleshooting guidance.

@tmadlener
Copy link
Collaborator

Hi @Ananya2003Gupta, thanks for the detailed report. I agree that this can be a bit confusing, and it is also not explicitly stated in our documentation. Having had a brief look at the code that parses the YAML files I think it should be possible to fix this, but I am not yet entirely sure when I can get to that. In case you are in any way interested in having a go at this yourself, let me know, and I can put together some information about what I think would need to be done.

@Ananya2003Gupta
Copy link
Contributor Author

Yes, I would love to work on this issue.

@tmadlener
Copy link
Collaborator

tmadlener commented Jun 29, 2023

Very nice :)

Then let me try to explain the current situation as well as how I would solve this a bit. The exception you see in this case is raised here:

@staticmethod
def _parse_with_regexps(string, regexps_callbacks):
"""Parse the string using the passed regexps and corresponding callbacks that
take the match and return a MemberVariable from it"""
for rgx, callback in regexps_callbacks:
result = rgx.match(string)
if result:
return callback(result)
raise DefinitionError(f"'{string}' is not a valid member definition")

As you can see, this currently only checks whether any of the regexes in the regex_callbacks that are passed in matches the string and if so, it calls the corresponding callback on the match result. The string in this case is always one of the member definitions in the YAML file. If none of the passed regexes match, it simply falls through to raise the error that you have seen in the end.

The list of (regex, callback) pairs is assembled in the parse method:

def parse(self, string, require_description=True):
"""Parse the passed string"""
matchers_cbs = [
(self.full_array_re, self._full_array_conv),
(self.member_re, self._full_member_conv)
]
if not require_description:
matchers_cbs.extend((
(self.bare_array_re, self._bare_array_conv),
(self.bare_member_re, self._bare_member_conv)
))
return self._parse_with_regexps(string, matchers_cbs)

This just puts together a list of possible matches that will be tried in order, until one of them matches or the error is raised.

I am not sure it is possible to solve this issue in a really general way, but this specific issue could be solved in the following way:

  • If the require_description argument to parse is True
  • Try to parse with the default matchers_cbs and return if that is successful
  • If that fails, try again with the matchers that do not require a description and check if that succeeds
  • If it succeeds, raise a DefinitionError with a more descriptive error, otherwise re-raise the original DefinitionError

In principle you could then also have a look at the unittests for the MemberParser, specifically at test_parse_invalid to see if it is easily possible to also check the message of the DefinitionError, because currently we simply check that a DefinitionError is raised.

Let me know if you need more information or clarifications.

@Ananya2003Gupta
Copy link
Contributor Author

I had a doubt. Is it really necessary to have comments for each member definition? Can't we set the require_description 's default value to False ?

@tmadlener
Copy link
Collaborator

Is it really necessary to have comments for each member definition?

Yes, that was a conscious choice to mandate to have a description for members of datatypes, since we generate docstrings from them.

@Ananya2003Gupta
Copy link
Contributor Author

Ananya2003Gupta commented Jun 29, 2023

@staticmethod
def _parse_with_regexps(string, regexps_callbacks):
        """Parse the string using the passed regexps and corresponding callbacks that
        take the match and return a MemberVariable from it"""
        for rgx, callback in regexps_callbacks:
            result = rgx.match(string)
            if result:
                return callback(result)

        raise DefinitionError(f"'{string}' is not a valid member definition.")

def parse(self, string, require_description=True):
        """Parse the passed string"""
        default_matchers_cbs = [
            (self.full_array_re, self._full_array_conv),
            (self.member_re, self._full_member_conv)
        ]

        try:
            return self._parse_with_regexps(string, default_matchers_cbs)
        except DefinitionError:
            if not require_description:
                matchers_cbs = [
                    (self.bare_array_re, self._bare_array_conv),
                    (self.bare_member_re, self._bare_member_conv)
                ]

                try:
                    return self._parse_with_regexps(string, matchers_cbs)
                except DefinitionError:
                    raise DefinitionError("Check the syntax of the member definition.")

        raise DefinitionError("Description (Comment after member definition) is missing.")

I have given it an attempt.
Is the approach right?
Open to suggestions on more descriptive error messages as well.

@tmadlener
Copy link
Collaborator

Can you open a pull request with these changes in place? It is easier to make suggestions and comments there (see our contribution guidelines for a brief description of how to do that in case you do not already know).

You should be able to do ctest -R pyunittest after you have built podio. You can also open a pull request and wait for the the results of the tests that are run there.

In the code you posted, the DefinitionErrors you raise no longer have the information which member variable caused the problem. You would have to use the string in the messages for that. I still have to check whether the logic makes sense in this case.

@Ananya2003Gupta
Copy link
Contributor Author

Ananya2003Gupta commented Jun 29, 2023

I was testing the code myself and I did encounter one logical error.
Since, the default value is set to true, so even if the error is raised because of lack of comment or due to syntax issue, it is still going to raise the last error
raise DefinitionError("Description (Comment after member definition) is missing.")
try block under the if not require_description is not going to work : (

@tmadlener
Copy link
Collaborator

I think the logic should look something like this:

def parse(self, string, require_description=True):
  default_matchers_cbs = [
            (self.full_array_re, self._full_array_conv),
            (self.member_re, self._full_member_conv)
        ]

  no_desc_matchers_cbs = [
                    (self.bare_array_re, self._bare_array_conv),
                    (self.bare_member_re, self._bare_member_conv)
      ]

  if require_description:
    try:
      return self._parse_with_regexps(string, default_matchers_cbs)
    except DefinitionError:
      self._parse_with_regexps(string, no_desc_matchers_cbs)
      raise DefinitionError("...")  # useful error message

  return self._parse_with_regexps(string, default_matchers_cbs + no_desc_matchers_cbs)
  • If the parsing requiring a description raise an error, we try again without the without the descriptions
  • If they succeed, we know that the problem was the missing description and we can raise an error with an appropriate message
  • If they fail we will simply use the default error from there
  • The path where we don't require a description remains unchanged.

@Ananya2003Gupta
Copy link
Contributor Author

Ananya2003Gupta commented Jun 29, 2023

Okay.
I also tried one more approach, but I think it has complicated the code.
So we can go with this one. As, it has the essence of original code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants