
How to change the matching criterion of Capture constraints (global conditions) from exact string matching to any subexpression matching. #494

Closed
JiyuanAn opened this issue Feb 6, 2024 · 5 comments


@JiyuanAn

JiyuanAn commented Feb 6, 2024

Hi!
I would like to know how to change the matching criterion of capture constraints (global conditions) from an exact string match to a match on any subexpression.
For example, take the BCQL constraint A.phr_b = B.phr_e, where the value of A.phr_b is "QP_4_5|NP_4_6" and the value of B.phr_e is "NP_4_6|VP_2_6". The current BlackLab does not consider these equal. Instead, I would like the two values to be treated as equal if the "|"-separated parts of the two values have at least one element in common (here "NP_4_6").
I looked at the BlackLab code that parses BCQL (e.g. SpanQueryConstrained.java), executes the Lucene query (e.g. Search.java), and post-processes the results (e.g. QueryTool.java), but I haven't found where the matching of global conditions is implemented.
So I would like to ask which code files and locations I need to modify to implement this.
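A minimal sketch of the intended comparison rule, written in Java since BlackLab is a Java project. The class and method names (`PhrMatch`, `phrMatch`) are hypothetical and not part of BlackLab; the sketch only shows the rule itself: split each value on "|" and report a match if the two sets of parts share at least one element.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not part of BlackLab): treats each annotation value as a
// "|"-separated list of phrase labels and reports whether the two lists overlap.
class PhrMatch {

    // Split a value like "QP_4_5|NP_4_6" into its parts.
    static Set<String> parts(String value) {
        Set<String> result = new HashSet<>();
        for (String part : value.split("\\|")) {
            if (!part.isEmpty()) {
                result.add(part); // ignore empty segments, e.g. from "a||b" or a leading "|"
            }
        }
        return result;
    }

    // True if the two values have at least one part in common.
    static boolean phrMatch(String a, String b) {
        Set<String> common = parts(a);
        common.retainAll(parts(b)); // keep only labels present in both values
        return !common.isEmpty();
    }
}
```

With the values from the example, `phrMatch("QP_4_5|NP_4_6", "NP_4_6|VP_2_6")` is true because both contain "NP_4_6".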
Thank you for your time and assistance.

@jan-niestadt
Member

jan-niestadt commented Feb 6, 2024

You probably want to have a look at the MatchFilter classes. A constraint like :: A.phr_b = B.phr_e would be resolved by the MatchFilterEquals class, and you're correct that this only handles exactly matching terms at the moment.

CWB (where the query language originates from) does handle regex constraints like :: A.phr_b = ".*\|NP_4_6" I think; we'd eventually like to add that to BlackLab as well. But that's not what you want here of course; you want to compare two annotation values using some custom logic.

I think adding a function may be the best approach. MatchFilterFunctionCall already handles functions start(A) and end(B) to get the boundaries of a capture, but it's limited to functions that take a single argument (name of the capture). You probably want to add a function like :: phr_match(A, "phr_b", B, "phr_e").

What would be needed:

  1. Generalize MatchFilterFunctionCall so it can take multiple parameters of type capture or string (and possibly other types in the future) and update the parser accordingly.
  2. Add your phr_match (or a better name :) function in MatchFilterFunctionCall.evaluate
  3. Eventually, we probably want to add a simple way for users to "plug in" these types of custom functions. If you want to try, see ExtensionFunctionClass (used by e.g. XFDebug), which can be used to add custom functions in the regular query part (left of the ::). A similar mechanism could be used for custom constraint functions.

(1 and 2 would suffice for your use case I think; if we also have 3, that would be something we could merge into BlackLab, so extension functions can be kept in a separate .jar and activated as needed by users of BlackLab)
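Steps 1 and 2 might look roughly like the following standalone sketch. Everything in it is hypothetical: `ConstraintFunctionSketch`, `ArgType`, `Arg`, and the nested map standing in for the capture/annotation lookup are for illustration only and do not reflect the actual signatures of MatchFilterFunctionCall, its `evaluate` method, or the BCQL parser. It only shows a function call that accepts several arguments, each either a capture name or a string literal, and dispatches on the function name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not BlackLab code: a constraint function call whose
// arguments may each be a capture name or a string literal.
class ConstraintFunctionSketch {

    enum ArgType { CAPTURE, STRING }

    static final class Arg {
        final ArgType type;
        final String value;

        Arg(ArgType type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    // 'captures' stands in for the real match info: capture name -> (annotation name -> value).
    static boolean evaluate(String name, List<Arg> args, Map<String, Map<String, String>> captures) {
        switch (name) {
        case "phr_match": {
            // phr_match(A, "phr_b", B, "phr_e")
            expect(args, ArgType.CAPTURE, ArgType.STRING, ArgType.CAPTURE, ArgType.STRING);
            String left = captures.get(args.get(0).value).get(args.get(1).value);
            String right = captures.get(args.get(2).value).get(args.get(3).value);
            List<String> leftParts = Arrays.asList(left.split("\\|"));
            List<String> rightParts = Arrays.asList(right.split("\\|"));
            // Match if the two "|"-separated lists share at least one element.
            return !Collections.disjoint(leftParts, rightParts);
        }
        default:
            throw new IllegalArgumentException("Unknown constraint function: " + name);
        }
    }

    // Check the number and types of the arguments against the function's signature.
    static void expect(List<Arg> args, ArgType... types) {
        if (args.size() != types.length) {
            throw new IllegalArgumentException("Expected " + types.length + " arguments, got " + args.size());
        }
        for (int i = 0; i < types.length; i++) {
            if (args.get(i).type != types[i]) {
                throw new IllegalArgumentException("Argument " + (i + 1) + " should be a " + types[i]);
            }
        }
    }
}
```

In this sketch the argument types are checked on every call; in a real implementation that check would probably be better done once, when the constraint is parsed.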

I hope this helps you get started. Don't hesitate to ask follow-up questions, and please create a pull request when you get this working!

@JiyuanAn
Author

JiyuanAn commented Feb 6, 2024

Thank you very much for your immediate reply! Your answer helped me a lot.
Unfortunately, I couldn't find the MatchFilterFunction class you mentioned; I only found the MatchFilterFunctionCall class. I set breakpoints in both MatchFilterFunctionCall.equals and MatchFilterFunctionCall.evaluate and ran the CQL query A:[] B:[] :: A.phr_b=B.phr_e, but neither breakpoint was hit. So I'm confused: I can't find the place you say I need to modify, and the engine\target\classes\nl\inl\blacklab\search\matchfilter\MatchFilterFunctionCall.evaluate function doesn't seem to be used during execution.
Looking forward to your answer, thank you again for your time and patience.

@jan-niestadt
Member

jan-niestadt commented Feb 6, 2024

Sorry, I did mean MatchFilterFunctionCall. Now fixed in my reply above.

Your breakpoint in MatchFilterFunctionCall.evaluate should trigger if you try this query: A:[] B:[] :: start(A) = end(B).

You can also set one in MatchFilterEquals.evaluate, which should trigger on your query, A:[] B:[] :: A.phr_b=B.phr_e.

But like I said, I think you instead want a query like A:[] B:[] :: phr_match(A, "phr_b", B, "phr_e") and get that to work like I explained above. Good luck!

@JiyuanAn JiyuanAn closed this as completed Feb 6, 2024
@JiyuanAn JiyuanAn reopened this Feb 6, 2024
@JiyuanAn
Author

JiyuanAn commented Feb 6, 2024

Thanks again for your help and immediate reply!

As you suggested, I set a breakpoint in MatchFilterEquals.evaluate (engine\target\classes\nl\inl\blacklab\search\matchfilter\MatchFilterEquals.class) and ran the query A:[] B:[] :: A.phr_b=B.phr_e, but the breakpoint is never hit.


From what I can see, this query uses the MatchFilterTokenAnnotationEqualsString class instead. I don't understand why, so I'm sorry to trouble you again.

Looking forward to your answer, thank you again for your time and patience.

@jan-niestadt
Member

That's very strange! I tried it just now on the dev branch with the query A:[] B:[] :: A.word=B.lemma (which is essentially the same as your query, just with different annotation names), and it does hit MatchFilterEquals.evaluate for me.

Could you also place a breakpoint in MatchFilterEquals.rewrite to see if the object gets rewritten there? Normally speaking if both operands are annotations, as they are with your query, it should use MatchFilterEquals. Only for a query like A:[] B:[] :: A.word='test' should it use MatchFilterTokenAnnotationEqualsString.

Other interesting breakpoints would be SpanQueryConstrained.rewrite (specifically on the line with constraint.rewrite()) and SpansConstrained.accept (specifically the line with constraint.evaluate(currentFiDoc, matchInfo)). You should be able to step into these function calls and track what is happening (first the rewrite, which determines which classes will ultimately be used to evaluate the constraint, and then later evaluate for the actual evaluation).
