
Initializing AttentionPool weights with 2 * Identity matrix. #21

Closed
dohlee opened this issue Feb 11, 2023 · 1 comment
dohlee commented Feb 11, 2023

Hi, thanks for this great implementation. I'm learning a lot from it :)

I noticed that commit 0dc46e4 initializes the internal AttentionPool weight with randomly sampled values rather than with 2 * Identity, as specified in the Enformer paper. (If I'm missing something, please let me know!)

This won't affect the performance of Enformer loaded with pretrained parameters, but I think it may lead to slightly worse performance (according to the paper) when training from scratch.

Perhaps some simple manual weight initialization like

self.to_attn_logits = nn.Conv2d(dim, dim, 1, bias = False)
self.to_attn_logits.weight.data.zero_()
self.to_attn_logits.weight.data.squeeze().fill_diagonal_(2.)

will do.
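As a sanity check, here is a minimal self-contained sketch of that initialization (the `dim = 4` size and the input shape are illustrative assumptions, not values from the Enformer code). A 1x1 conv whose weight matrix is 2 * Identity should act as pointwise multiplication by 2:

```python
import torch
import torch.nn as nn

dim = 4  # illustrative channel dimension

# same construction as proposed above
to_attn_logits = nn.Conv2d(dim, dim, 1, bias = False)

# zero the (dim, dim, 1, 1) weight, then set the (dim, dim) diagonal to 2,
# giving a 2 * Identity kernel as described in the Enformer paper
to_attn_logits.weight.data.zero_()
to_attn_logits.weight.data.squeeze().fill_diagonal_(2.)

# with this weight, the conv should simply double its input
x = torch.randn(1, dim, 8, 1)
out = to_attn_logits(x)
assert torch.allclose(out, 2 * x, atol = 1e-6)
```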

If you think this is okay, please let me know and I'll open a PR right away.

Thanks again for this great repo!

Best,
Dohoon

lucidrains added a commit that referenced this issue Feb 11, 2023
lucidrains (Owner) commented:

@dohlee Hi Dohoon! Thank you for catching this and glad to see another researcher applying attention to genomics! I've made the change here
