How to index CoNLL-U sub-features? #515
You should be able to configure this yourself in your own version of
@jan-niestadt Could you please show me an example?
If your features column contains values like
- name: feats
  displayName: Features
  valuePath: 6
  multipleValues: true
- name: number
  valuePath: 6
  process:
    - action: replace
      find: "^.*Number=([^\\|]+).*$"
      replace: "$1"
- name: person
  valuePath: 6
  process:
    - action: replace
      find: "^.*Person=([^\\|]+).*$"
      replace: "$1"
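To see what a single replace step like the ones above does to a FEATS value, here is a minimal Python sketch. It assumes the `replace` action behaves like a plain regex substitution over the whole column value; the function name `extract_number` and the sample values are made up for illustration, and Python's `\1` stands in for the `$1` used in the config.

```python
import re

# Same pattern as the "find" value in the config. The YAML string "\\|" is
# just a literal "|" inside the character class.
NUMBER = re.compile(r"^.*Number=([^|]+).*$")

def extract_number(feats: str) -> str:
    """Emulate one replace step: swap the whole match for capture group 1.

    If the pattern does not match, the input comes back unchanged.
    """
    return NUMBER.sub(r"\1", feats)

print(extract_number("Case=Nom|Number=Sing|Person=3"))  # Sing
print(extract_number("_"))                              # _ (no match, unchanged)
```

Note the second case: a value that does not contain `Number=` passes through unmodified rather than becoming empty.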
@jan-niestadt Thank you very much! If some of my corpus CoNLL-U files have no FEATS values (because the language doesn't support this output), will that be ignored automatically, so that the other corpus files that do have FEATS keep working?
If column 6 is empty in those files, the regex won't match, so nothing will be replaced and the original empty value will be indexed for Person. That shouldn't be a problem. What could be a problem is when FEATS sometimes contains Person and sometimes contains only other features (but not Person). For a value without Person, the regex won't match either, so the whole unmodified string would be indexed as the Person value. Maybe you could solve this with an extra replace step:
- name: person
  valuePath: 6
  process:
    # If string doesn't contain "Person=", make it empty
    - action: replace
      find: "^(?!.*Person=).*$" # matches any string that doesn't contain "Person="
      replace: ""
    # Remove everything except the value of the Person FEAT (or leave the string unmodified if regex doesn't match)
    - action: replace
      find: "^.*Person=([^\\|]+).*$" # match the value for Person in group 1
      replace: "$1"
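The two-step process above can be checked outside the indexer with a small Python sketch. It assumes the steps run in order and that each behaves like a regex substitution; `extract_person` is a hypothetical helper name, and `\1` is Python's spelling of `$1`.

```python
import re

# Step 1: matches any string that does NOT contain "Person=" (negative lookahead).
NO_PERSON = re.compile(r"^(?!.*Person=).*$")
# Step 2: captures the Person value in group 1.
PERSON = re.compile(r"^.*Person=([^|]+).*$")

def extract_person(feats: str) -> str:
    """Apply both replace steps in the order given in the config."""
    feats = NO_PERSON.sub("", feats)   # blank out values without Person
    return PERSON.sub(r"\1", feats)    # keep only the Person value

print(extract_person("Number=Sing|Person=3"))  # 3
print(repr(extract_person("Case=Nom|Number=Sing")))  # ''
print(repr(extract_person("_")))  # ''
```

With the first step in place, values that lack `Person=` end up as an empty string instead of the full feature string.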
I notice that here they are commented out. What's your roadmap, or are there any workarounds? Many thanks!