Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Text fields should not expose internal field types #63446

Closed
jtibshirani opened this issue Oct 7, 2020 · 3 comments · Fixed by #64597
Closed

Text fields should not expose internal field types #63446

jtibshirani opened this issue Oct 7, 2020 · 3 comments · Fixed by #64597
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team

Comments

@jtibshirani
Copy link
Contributor

When text fields has index_prefixes or index_phrases enabled, they create subfields of the form field._index_prefixes. These fields are only meant for internal optimizations and shouldn't be searched on directly.

However the text mapper exposes these hidden fields as if they were configured as multi-fields. This means that field caps includes the hidden fields, as when requesting all fields through fields=*. The search 'fields' option will also match against them, which can add duplication to the output.

Perhaps we should stop exposing these fields, keeping them an implementation detail. Note that the same consideration applies to search_as_you_type.

@jtibshirani jtibshirani added >enhancement :Search/Search Search-related issues that do not fall into other categories labels Oct 7, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label Oct 7, 2020
@romseygeek
Copy link
Contributor

The index-time analyzer for a MapperService needs to know about all index-time fields, and it currently learns about them via the iterator() method on Mapper, which is the same way that the MappingLookup learns about field types. I've wanted to change this for a while - if we can instead register analyzers and field types separately, then we don't need to have full-blown Mapper implementations for these subfields and can instead handle everything in the parent parseCreateField.

search_as_you_type is slightly different because it works by letting users search directly on the subfields. It would be nice to hide this implementation detail behind a specialised query instead.

@romseygeek
Copy link
Contributor

Thinking about this more, separating the registration of Mappers and MappedFieldTypes wouldn't actually help here, because we still need the MappedFieldType to be set up so that it can be searched. Perhaps a better move would be to make MappedFieldType responsible for building IndexFieldCapabilities, so that an implementation can choose to not make itself visible to field caps by returning null?

jimczi added a commit to jimczi/elasticsearch that referenced this issue Oct 27, 2020
The new fields option allows to fetch the value of all fields in the mapping.
However, internal fields that are used by some field mappers are also shown when
concrete fields retrieved through a pattern (`*` or `foo*`).
We have a [long term plan](elastic#63446) to hide these fields in field_caps and from pattern resolution
so this change is just a hot fix to ensure that they don't break the retrieval in the meantime.
The `flattened._keyed field will show up as an empty field when using a pattern that match the
flattened field.

Relates elastic#63446
jimczi added a commit that referenced this issue Oct 28, 2020
The new fields option allows to fetch the value of all fields in the mapping.
However, internal fields that are used by some field mappers are also shown when
concrete fields retrieved through a pattern (`*` or `foo*`).
We have a [long term plan](#63446) to hide these fields in field_caps and from pattern resolution
so this change is just a hot fix to ensure that they don't break the retrieval in the meantime.
The `flattened._keyed field will show up as an empty field when using a pattern that match the
flattened field.

Relates #63446
jimczi added a commit that referenced this issue Oct 28, 2020
The new fields option allows to fetch the value of all fields in the mapping.
However, internal fields that are used by some field mappers are also shown when
concrete fields retrieved through a pattern (`*` or `foo*`).
We have a [long term plan](#63446) to hide these fields in field_caps and from pattern resolution
so this change is just a hot fix to ensure that they don't break the retrieval in the meantime.
The `flattened._keyed field will show up as an empty field when using a pattern that match the
flattened field.

Relates #63446
jimczi added a commit that referenced this issue Oct 28, 2020
The new fields option allows to fetch the value of all fields in the mapping.
However, internal fields that are used by some field mappers are also shown when
concrete fields retrieved through a pattern (`*` or `foo*`).
We have a [long term plan](#63446) to hide these fields in field_caps and from pattern resolution
so this change is just a hot fix to ensure that they don't break the retrieval in the meantime.
The `flattened._keyed field will show up as an empty field when using a pattern that match the
flattened field.

Relates #63446
romseygeek added a commit that referenced this issue Nov 5, 2020
TextFieldMapper can optionally index data into subfields for accelerated
prefix and phrase queries. Currently, these subfields are implemented
as FieldMappers in their own right, made available via TextFieldMapper's
iterator() method and with their own standalone MappedFieldType objects.

This has the disadvantage that these subfields are directly available for
searching, and appear in APIs such as field caps. In addition, because
exists queries are not implemented on them, an exists query against an
object which contains a text field with one of the subfields enabled can
throw an error (see #63585).

This commit reworks the subfields so that they are no longer implemented
as FieldMappers, and are no longer exposed to classes outside
TextFieldMapper either as MappedFieldTypes or as FieldMappers. The
parent TextFieldMapper handles indexing and analyzer registration,
PhraseFieldType is removed entirely, and PrefixFieldType is retained as
a private implementation for fast prefix queries but is unavailable for
querying directly.

Fixes #63585
Closes #63446
romseygeek added a commit that referenced this issue Nov 5, 2020
TextFieldMapper can optionally index data into subfields for accelerated
prefix and phrase queries. Currently, these subfields are implemented
as FieldMappers in their own right, made available via TextFieldMapper's
iterator() method and with their own standalone MappedFieldType objects.

This has the disadvantage that these subfields are directly available for
searching, and appear in APIs such as field caps. In addition, because
exists queries are not implemented on them, an exists query against an
object which contains a text field with one of the subfields enabled can
throw an error (see #63585).

This commit reworks the subfields so that they are no longer implemented
as FieldMappers, and are no longer exposed to classes outside
TextFieldMapper either as MappedFieldTypes or as FieldMappers. The
parent TextFieldMapper handles indexing and analyzer registration,
PhraseFieldType is removed entirely, and PrefixFieldType is retained as
a private implementation for fast prefix queries but is unavailable for
querying directly.

Fixes #63585
Closes #63446
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team
Projects
None yet
3 participants