Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add declarative parameters to FieldMappers #58663

Merged
merged 17 commits into from
Jul 9, 2020

Conversation

romseygeek
Copy link
Contributor

The FieldMapper infrastructure currently has a bunch of shared parameters, many of which
are only applicable to a subset of the 41 mapper implementations we ship with. Merging,
parsing and serialization of these parameters are spread around the class hierarchy, with
much repetitive boilerplate code required. It would be much easier to reason about these
things if we could declare the parameter set of each FieldMapper directly in the implementing
class, and share the parsing, merging and serialization logic instead.

This commit is a first effort at introducing a declarative parameter style. It adds a new FieldMapper
subclass, ParametrizedFieldMapper, and refactors two mappers, Boolean and Binary, to use it.
Parameters are declared on Builder classes, with the declaration including the parameter name,
whether or not it is updateable, a default value, how to parse it from mappings, and how to
extract it from another mapper at merge time. Builders have a getParameters method, which
returns a list of the declared parameters; this is then used for parsing, merging and serialization.
Merging is achieved by constructing a new Builder from the existing Mapper, and merging in
values from the merging Mapper; conflicts are all caught at this point, and if none exist then a new,
merged, Mapper can be built from the Builder. This allows all values on the Mapper to be final.

Other mappers can be gradually migrated to this new style, and once they have all been refactored
we can merge ParametrizedFieldMapper and FieldMapper entirely.

@romseygeek romseygeek added :Search Foundations/Mapping Index mappings, including merging and defining field types >refactoring v8.0.0 v7.9.0 labels Jun 29, 2020
@romseygeek romseygeek self-assigned this Jun 29, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Mapping)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jun 29, 2020
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this! I left a few questions.

@@ -157,8 +157,6 @@ static KeywordFieldMapper createExtractQueryFieldBuilder(String name, BuilderCon
static BinaryFieldMapper createQueryBuilderFieldBuilder(BuilderContext context) {
BinaryFieldMapper.Builder builder = new BinaryFieldMapper.Builder(QUERY_BUILDER_FIELD_NAME);
builder.docValues(true);
builder.indexOptions(IndexOptions.NONE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self - these were the defaults so they are safe to drop.


public static class Builder extends ParametrizedFieldMapper.Builder {

final ParametrizedFieldMapper.Parameter<Boolean> fixed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd import the inner class to make these a little shorter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know some folks hate importing inner classes but I don't really have hard and fast rules and think in this case it'd be easier to read.

return builder;
}

public static Parameter<Boolean> boolParam(String name, boolean updateable,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These smell a little like our settings which have grown a bit difficult to read over the years. There, I think a builder pattern would have helped deal with how many optional things there are. But here I dunno.

I think merger might need to take the current value so it can make a call on more complex stuff, right?

If a value isn't updateable do you need to define a merger at all?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, looking above you use acceptsNull in a very bulder like way, which I'm cool with.

I think it'd be good to make the "for declaration" method public with javadoc. And the "for use by ParameterizedFieldMapper" methods private or at least package private. That way it is more obvious what we can call when building the mappers.

I'm sort of sad that this is bit of a parallel world to the xcontent parsers that we already have, but I know how we got here. And it is where we are. So, yeah, I'm +1 on building it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ to making the 'used internally only' methods private. I've renamed 'merger' to 'initializer' as that's what it does, takes an initial value from an existing mapper.


public final void merge(FieldMapper in, Conflicts conflicts) {
for (Parameter<?> param : getParameters()) {
if (param.updateable == false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it'd be clearer if merge checked for updateable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely, good spot. Begone, freeze


public Parameter<T> init(FieldMapper toInit) {
assert frozen == false;
this.value = merger.apply(toInit);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me think merger isn't really the right name for this, actually - it is more like read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ I've used initializer

super(name);
}

// For testing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you make them package private that does a pretty good job implying they are only exposed for testing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used outside the package, annoyingly. I've changed it so that if you need to set it programmatically you can use a constructor parameter.

@romseygeek
Copy link
Contributor Author

For some reason, the ML mappings (and only the ML mappings) are now being serialized differently, even though this PR makes no changes to ObjectMapper serialization. 🤷

@nik9000
Copy link
Member

nik9000 commented Jun 30, 2020

For some reason, the ML mappings (and only the ML mappings) are now being serialized differently, even though this PR makes no changes to ObjectMapper serialization

Neat! Always good to find a new surprise.


import static org.hamcrest.Matchers.containsString;
import static org.hamcrest.Matchers.equalTo;
import static org.hamcrest.Matchers.instanceOf;

public class BinaryFieldMapperTests extends FieldMapperTestCase<BinaryFieldMapper.Builder> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this we don't test store and stuff like that, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! So you have the tests in ParameterizedFieldTest for things like this. Do you think it is important to also test that the field merges here too? I guess not, but I'm worried about dropping something accidentally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea would be that we don't have 'common' fields any more - all mappers declare all their parameters outright (I am sharing the 'meta' parameter at the moment, but I think I'm going to stop doing that). So the tests for each field mapper should cover all the parameters that they declare, but general things like 'this parameter can't be updated so it should throw an error on merge' are tested in ParametrizedFieldTest.


import static org.hamcrest.Matchers.containsString;

public class BooleanFieldMapperTests extends FieldMapperTestCase<BooleanFieldMapper.Builder> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, right?

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it too, I left a couple of questions

context.doc().add(new Field(fieldType().name(), value ? "T" : "F", Defaults.FIELD_TYPE));
}
if (stored) {
context.doc().add(new StoredField(fieldType().name(), value ? "T" : "F"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like if the field is searchable and stored too, we end up adding two fields, while before we would only add one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a field with a FieldType that has stored set to true and adding a separate StoredField amount to the same thing. This way, we don't need to change the field type at all, and it's clearer what's happening at the point where we're adding documents.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was odd that we were using Field to get stored fields but doc values. Field just does a bunch of stuff.

= new Parameter<>("null_value", false, null, (n, o) -> XContentMapValues.nodeBooleanValue(o), m -> toType(m).nullValue);
private final Parameter<Boolean> stored = Parameter.boolParam("store", false, m -> toType(m).stored, false);
private final Parameter<Map<String, String>> meta
= new Parameter<>("meta", true, Collections.emptyMap(), TypeParsers::parseMeta, m -> m.fieldType().meta());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get confused here that stored seems to be treated differently from e.g. doc_values. Why is that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convenience, mainly - doc_values and index can be derived from the field type values, but stored can't be. I can change it so that doc_values and index are fields directly on the mapper as well if you think that's clearer? Certainly for things like searchable docvalues fields then we'll need to distinguish between whether or not something is indexed and whether it's searchable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I am trying to see what the difference is between stored and indexed/doc_values. I would expect them to be treated in the same way/same place. Why can't stored be derived from the field type values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it stands, MappedFieldType doesn't have anything that tells you if the field is stored or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You made a boolean hasDocValues on BinaryFieldMapper for this. Is it worth making one here too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add indexed and hasDocValues directly to the mappers as it sounds as though that will be less confusing.

Map.Entry<String, Object> entry = iterator.next();
final String propName = entry.getKey();
final Object propNode = entry.getValue();
if (Objects.equals("fields", propName)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this means that every field supports multi fields and copy_to, I think. But I think that runtime fields won't?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add disabling feature flags later if we need to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every field does support multi fields and copy_to right now. I'm happy to juggle things for runtime fields in a follow up.

@romseygeek
Copy link
Contributor Author

@elasticmachine update branch

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though I think @javanna might want another look too.

@@ -332,6 +332,13 @@ public static String nodeStringValue(Object node, String defaultValue) {
return node.toString();
}

public static String nodeStringValue(Object node) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its probably worth adding javadoc about how this is different from Objects.toString.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

private final Parameter<Boolean> stored = Parameter.boolParam("store", false, m -> toType(m).stored, false);
private final Parameter<Boolean> hasDocValues = Parameter.boolParam("doc_values", false, m -> toType(m).hasDocValues, false);
private final Parameter<Map<String, String>> meta
= new Parameter<>("meta", true, Collections.emptyMap(), TypeParsers::parseMeta, m -> m.fieldType().meta());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we plan to eventually move meta over to the FieldMapper?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meta is used in field caps, which currently are based off MappedFieldType so I think it will probably stay there?

= new Parameter<>("null_value", false, null, (n, o) -> XContentMapValues.nodeBooleanValue(o), m -> toType(m).nullValue);
private final Parameter<Boolean> stored = Parameter.boolParam("store", false, m -> toType(m).stored, false);
private final Parameter<Map<String, String>> meta
= new Parameter<>("meta", true, Collections.emptyMap(), TypeParsers::parseMeta, m -> m.fieldType().meta());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You made a boolean hasDocValues on BinaryFieldMapper for this. Is it worth making one here too?

context.doc().add(new Field(fieldType().name(), value ? "T" : "F", Defaults.FIELD_TYPE));
}
if (stored) {
context.doc().add(new StoredField(fieldType().name(), value ? "T" : "F"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was odd that we were using Field to get stored fields but doc values. Field just does a bunch of stuff.

Map.Entry<String, Object> entry = iterator.next();
final String propName = entry.getKey();
final Object propNode = entry.getValue();
if (Objects.equals("fields", propName)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every field does support multi fields and copy_to right now. I'm happy to juggle things for runtime fields in a follow up.

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@romseygeek romseygeek merged commit 219b7db into elastic:master Jul 9, 2020
@romseygeek romseygeek deleted the mapper/mergebuilders branch July 9, 2020 09:56
romseygeek added a commit that referenced this pull request Jul 9, 2020
The FieldMapper infrastructure currently has a bunch of shared parameters, many of which
are only applicable to a subset of the 41 mapper implementations we ship with. Merging,
parsing and serialization of these parameters are spread around the class hierarchy, with
much repetitive boilerplate code required. It would be much easier to reason about these
things if we could declare the parameter set of each FieldMapper directly in the implementing
class, and share the parsing, merging and serialization logic instead.

This commit is a first effort at introducing a declarative parameter style. It adds a new FieldMapper
subclass, ParametrizedFieldMapper, and refactors two mappers, Boolean and Binary, to use it.
Parameters are declared on Builder classes, with the declaration including the parameter name,
whether or not it is updateable, a default value, how to parse it from mappings, and how to
extract it from another mapper at merge time. Builders have a getParameters method, which
returns a list of the declared parameters; this is then used for parsing, merging and serialization.
Merging is achieved by constructing a new Builder from the existing Mapper, and merging in
values from the merging Mapper; conflicts are all caught at this point, and if none exist then a new,
merged, Mapper can be built from the Builder. This allows all values on the Mapper to be final.

Other mappers can be gradually migrated to this new style, and once they have all been refactored
we can merge ParametrizedFieldMapper and FieldMapper entirely.
@cjcenizal
Copy link
Contributor

cjcenizal commented Jul 13, 2020

Maybe my comment from #59381 is best addressed here.

@jtibshirani
Copy link
Contributor

@romseygeek this is an exciting change! While updating the fields-retrieval branch, I noticed that this makes it harder to unit test field mappers. For example, before I was able to construct a field mapper like this:

BooleanFieldMapper mapper = new BooleanFieldMapper.Builder("field")
    .nullValue(true)
    .build(context);

Now I think we can't configure the parameters directly, they can only be set by parsing a mapping definition. Do you have any ideas around making field mapper construction easier?

@romseygeek
Copy link
Contributor Author

@jtibshirani I've tried out a couple of ideas here, but nothing is very satisfying - in particular, it's hard to put anything on the base ParametrizedFieldMapper.Builder object because of generic type erasure. When I was first implementing this (and making other mapper changes, eg #56747) I found that very few unit tests actually build mappers in this way, which was why I was happy to remove the builder methods. But maybe I should add them back? Or we could add back only specific setters that are needed by tests - this isn't really a public API, so I think it's fine to have a mix of setters on there.

@jtibshirani
Copy link
Contributor

When I was first implementing this (and making other mapper changes, eg #56747) I found that very few unit tests actually build mappers in this way, which was why I was happy to remove the builder methods.

I also noticed that most unit tests don't create builders directly. However I think this just points the fact that we rely too much on integration testing (or 'pseudo unit tests') to test mapper functionality.

But maybe I should add them back? Or we could add back only specific setters that are needed by tests - this isn't really a public API, so I think it's fine to have a mix of setters on there.

This could work, I'll experiment with this as I keep the branch up to date.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>refactoring :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v7.9.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants