
Changed how checklist balance is used in action prediction #1204

Merged · pdasigi merged 2 commits into allenai:master from better_balance_projection on May 13, 2018

Conversation

@pdasigi (Member) commented on May 12, 2018:

Summary of changes:

  1. Made the output projection layer the same in both the mml and erm parsers, for both nlvr and wikitables.
  2. The checklist balance in nlvr (and the unlinked checklist balance in wikitables) is now used to select the appropriate action embeddings, the sum of which is added to the predicted action embedding (see the sketch after this list).
  3. There are now learned scalar parameters that regularize the addition to the predicted embedding (in nlvr and wikitables) and to the action logits (in wikitables).
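
A rough sketch of the mechanism in points 2 and 3, with hypothetical shapes and names (the parser's actual selection logic may differ, e.g. in how the balance weights the selected embeddings):

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the real model's dimensions differ.
num_terminals = 20
action_embedding_dim = 50

# Learned scalar that regularizes the addition (point 3).
checklist_multiplier = nn.Parameter(torch.tensor(1.0))

# (num_terminals, action_embedding_dim): embeddings of terminal actions.
terminal_embeddings = torch.randn(num_terminals, action_embedding_dim)

# (num_terminals,): a positive balance means the terminal is wanted by the
# agenda but has not been produced yet.
checklist_balance = torch.randn(num_terminals)

# Select the embeddings of terminals with positive balance and sum them,
# weighting each by how much balance remains.
positive_balance = torch.clamp(checklist_balance, min=0.0)
embedding_addition = positive_balance @ terminal_embeddings  # (action_embedding_dim,)

# The scaled sum is added to the predicted action embedding (point 2).
predicted_action_embedding = torch.randn(action_embedding_dim)
predicted_action_embedding = (predicted_action_embedding
                              + checklist_multiplier * embedding_addition)
```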

@matt-gardner (Contributor) left a comment:

I like the simplification to the code; how does this actually affect performance?

the current hidden state, attended encoder input and the current checklist balance into the
action space. The size of the checklist balance vector is the same as the number of
terminals. This is not needed if we are training the parser using target action sequences.
use_coverage : ``bool``
@matt-gardner (Contributor) commented:

It'd be nice to mention this is optional here, and give the default value.

```diff
@@ -132,10 +127,12 @@ def take_step(self,  # type: ignore
         # action_mask: (group_size, num_embedded_actions)
         action_embeddings, embedded_action_mask = self._get_action_embeddings(state,
                                                                               global_actions_to_embed)
-        action_query = self._get_action_query(state, hidden_state, attended_sentence)
+        action_query = torch.cat([hidden_state, attended_sentence], dim=-1)
         # (group_size, action_embedding_dim)
         predicted_action_embedding = self._output_projection_layer(action_query)
```
@matt-gardner (Contributor) commented:

Not related to this PR, but you probably want a non-linearity here. See #1150, where I added this to the wikitables model. There's another spot where you probably don't have one but probably want one, too.

@matt-gardner (Contributor) commented:

And, now that I think about it, because this is getting a dot product with action embeddings, I wonder if relu isn't the best non-linearity. We're basically saying that any embedding dimension with a negative value is entirely ignored, and thus artificially constraining the space that's available to the dot product... Maybe tanh would be better?

@pdasigi (Member Author) commented on May 13, 2018:

Did you find that adding relus worked better? In my other PR I added dropout and a relu here to match the wikitables parser, and that seemed to give slightly better results.

@pdasigi (Member Author) commented:

Added a non-linearity at the decoder input as well, and made both of them tanh.

@matt-gardner (Contributor) commented:

I ran it with tanh instead of relu last night, and it didn't really change anything. It looked like it was learning faster - epoch 1 performance was a bit higher - but in the end, final performance was within the normal variance that I've seen across runs. So, it probably doesn't matter.
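
For reference, a minimal sketch of the projection-plus-tanh arrangement discussed in this thread; the layer names and dimensions here are assumptions, not the model's actual code:

```python
import torch
import torch.nn as nn

group_size, encoder_output_dim, action_embedding_dim, num_actions = 4, 100, 50, 10

# Hypothetical projection over the concatenated [hidden_state; attended_sentence].
output_projection_layer = nn.Linear(2 * encoder_output_dim, action_embedding_dim)

hidden_state = torch.randn(group_size, encoder_output_dim)
attended_sentence = torch.randn(group_size, encoder_output_dim)

action_query = torch.cat([hidden_state, attended_sentence], dim=-1)
# tanh keeps negative embedding dimensions usable in the dot product below;
# relu would zero them out and artificially constrain the space.
predicted_action_embedding = torch.tanh(output_projection_layer(action_query))

# Action logits come from a dot product with the candidate action embeddings.
action_embeddings = torch.randn(group_size, num_actions, action_embedding_dim)
action_logits = action_embeddings.bmm(
        predicted_action_embedding.unsqueeze(-1)).squeeze(-1)  # (group_size, num_actions)
```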


```python
if state.checklist_state[0] is not None:
    embedding_addition = self._get_predicted_embedding_addition(state)
    predicted_action_embedding += self._checklist_embedding_multiplier * embedding_addition
```
@matt-gardner (Contributor) commented:

Don't ever use `+=` on a tensor. It doesn't do what you expect. Use `x = x + y` instead.
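
To illustrate the failure mode with a generic PyTorch sketch (not code from this PR): autograd saves some tensors for the backward pass, and an in-place `+=` invalidates them.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = torch.exp(x)   # autograd saves exp's output to compute its gradient

y += 1             # in-place update bumps y's version counter
try:
    y.sum().backward()
except RuntimeError as err:
    # "one of the variables needed for gradient computation has been
    # modified by an inplace operation"
    print(err)

# The out-of-place form creates a new tensor and leaves the saved one intact.
x = torch.ones(3, requires_grad=True)
y = torch.exp(x)
y = y + 1
y.sum().backward()  # works
```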

@pdasigi (Member Author) commented:

Yes, I figured out that `+=` causes in-place updates, messing up the computation graph, while `x = x + y` does not. I changed the other places where I did in-place updates, but I guess I missed these two. Thanks, fixed them.

```python
embedding_addition = self._get_predicted_embedding_addition(state,
                                                            self._unlinked_terminal_indices,
                                                            unlinked_balance)
predicted_action_embedding += self._unlinked_checklist_multiplier * embedding_addition
```
@matt-gardner (Contributor) commented:

Don't use `+=`.

@pdasigi (Member Author) commented:

Done.

@pdasigi force-pushed the better_balance_projection branch from 22fd0f2 to 7ef6b24 on May 13, 2018 at 15:39
@pdasigi (Member Author) commented on May 13, 2018:

@matt-gardner On NLVR, the variant with this change is at least slightly better (about 0.5 pp) than the run without this change. The experiment is still running though.

@pdasigi force-pushed the better_balance_projection branch from 7ef6b24 to ada48af on May 13, 2018 at 18:03
@pdasigi merged commit 2dda95b into allenai:master on May 13, 2018
@pdasigi deleted the better_balance_projection branch on May 13, 2018 at 18:14
gabrielStanovsky pushed a commit to gabrielStanovsky/allennlp that referenced this pull request on Sep 7, 2018:

* changed action projection in nlvr
* removed +=