
a mistake in jupyter #1

Open
CalebDu opened this issue Sep 30, 2022 · 14 comments
CalebDu commented Sep 30, 2022

[screenshot of the initialization code]

The bias should be initialized to zero.

CalebDu commented Oct 2, 2022

Because the HW2 forum is not open to online students, I am presenting here on GitHub some things in the HW2 code that I think are wrong or ambiguous.

  1. DataLoader class in Question 4: the Jupyter notebook says it "Combines a dataset and a sampler", but there is no sampler to implement in HW2, so I just handled this part with naive indexing. I am also not sure how to implement the shuffled DataLoader; I do it with np.random.shuffle at the start of every epoch. However, there is no exact assertion for the shuffled DataLoader in the test code, so I don't know whether my implementation is right until the later MLP training test, where a different shuffle order may cause a different final loss in the "mlp train" test.

  2. epoch function in Question 5: the return value includes acc and loss, but the notebook gives no detail about how to compute the acc metric in train mode. The model's parameters are updated on every batch, so the first batch may have much poorer accuracy (about 0.1) than the last batch (about 0.6). The notebook says "Returns the average accuracy (as a float) and the average loss over all samples (as a float)". I computed acc in two ways: one sums the correct predictions of each batch during training and divides by the number of samples; the other iterates over the entire train DataLoader again after training an epoch, sums the correct predictions, and divides by the number of samples. Both differ substantially from the answer in the test code, while my loss value matches the test code answer.

  3. epoch function in Question 5: the following test code shows that a model without any training has 0.9164 acc in eval mode, which is very confusing. Maybe the expected acc is wrong?

[screenshot of the test code]
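Point 1 above could be sketched roughly like this, with plain Python/NumPy standing in for needle; the class name and signature are illustrative, not the course's actual API:

```python
import numpy as np

class DataLoader:
    """Minimal index-based loader. A sketch only: the name and
    signature are illustrative, not the course's actual API."""
    def __init__(self, dataset, batch_size=1, shuffle=False):
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        order = np.arange(len(self.dataset))
        if self.shuffle:
            # Re-draw the index order at the start of every epoch.
            order = np.random.permutation(len(self.dataset))
        for start in range(0, len(order), self.batch_size):
            idx = order[start:start + self.batch_size]
            yield [self.dataset[int(i)] for i in idx]
```

With shuffle=True the epoch still visits every sample exactly once, just in a different order each time, which is why a fixed expected loss in the test code is hard to match.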
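For point 2, the first way of computing accuracy (accumulating correct predictions batch by batch during training) might look like this NumPy sketch; the function name and the (batch, classes) logits shape are my assumptions:

```python
import numpy as np

def epoch_accuracy(logits_batches, label_batches):
    """Average accuracy over all samples, accumulated per batch.
    logits_batches: iterable of (batch, classes) arrays.
    label_batches: iterable of (batch,) integer label arrays."""
    correct, total = 0, 0
    for logits, labels in zip(logits_batches, label_batches):
        preds = np.argmax(logits, axis=1)        # predicted class per sample
        correct += int((preds == labels).sum())  # running count of hits
        total += labels.shape[0]
    return correct / total
```

Note that because the parameters change after every batch, this running average is systematically lower than re-evaluating the whole set after the epoch, which may explain part of the discrepancy described above.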

I passed almost all the tests except the epoch and mlp train tests.
In conclusion: I know there will be many problems and room for improvement in the first release of the dlsys course. I suggest that every task in the Jupyter notebook contain more elaborate detail or hints. Because the answers in the test code are fixed and some tasks have ambiguous descriptions, I spent a lot of time going over the test code again and again to figure out the true intention of these tasks. With more key hints and more detail about the main procedure, students could save a lot of time and avoid misunderstandings.

weixliu commented Oct 7, 2022

I encountered a precision issue while finishing BatchNorm1d in Q2 and SGD in Q3.
For BatchNorm1d in Q2, test_nn_batchnorm_backward_1 passed but test_nn_batchnorm_backward_affine_1 failed, and it's hard to find the root cause.

tests/test_nn_and_optim.py::test_nn_batchnorm_check_model_eval_switches_training_flag_1 PASSED [ 12%]
tests/test_nn_and_optim.py::test_nn_batchnorm_forward_1 PASSED           [ 25%]
tests/test_nn_and_optim.py::test_nn_batchnorm_forward_affine_1 PASSED    [ 37%]
tests/test_nn_and_optim.py::test_nn_batchnorm_backward_1 PASSED          [ 50%]
tests/test_nn_and_optim.py::test_nn_batchnorm_backward_affine_1 FAILED   [ 62%]
tests/test_nn_and_optim.py::test_nn_batchnorm_running_mean_1 PASSED      [ 75%]
tests/test_nn_and_optim.py::test_nn_batchnorm_running_var_1 PASSED       [ 87%]
tests/test_nn_and_optim.py::test_nn_batchnorm_running_grad_1 PASSED      [100%]

=================================== FAILURES ===================================
_____________________ test_nn_batchnorm_backward_affine_1 ______________________

    def test_nn_batchnorm_backward_affine_1():
        np.testing.assert_allclose(batchnorm_backward(5, 4, affine=True),
            np.array([[ 3.8604736e-03, 4.2676926e-05, -1.4114380e-04, -3.2424927e-05],
             [-6.9427490e-03, -3.3140182e-05, 9.1552734e-05, -8.5830688e-05],
             [ 4.6386719e-03, -8.9883804e-05, -4.5776367e-05, 4.3869019e-05],
             [-7.7133179e-03, 2.7418137e-05, 6.6757202e-05, 7.4386597e-05],
             [ 6.1874390e-03, 5.2213669e-05, 2.8610229e-05, -1.9073486e-06]],
>            dtype=np.float32), rtol=1e-5, atol=1e-5)
E       AssertionError: 
E       Not equal to tolerance rtol=1e-05, atol=1e-05
E       
E       Mismatched elements: 2 / 20 (10%)
E       Max absolute difference: 2.18214902e-05
E       Max relative difference: 0.90173392
E        x: array([[ 3.861948e-03,  4.296782e-05, -1.406976e-04, -3.312205e-05],
E              [-6.964571e-03, -3.295710e-05,  9.240071e-05, -8.447380e-05],
E              [ 4.646928e-03, -8.917992e-05, -4.493684e-05,  4.450350e-05],...
E        y: array([[ 3.860474e-03,  4.267693e-05, -1.411438e-04, -3.242493e-05],
E              [-6.942749e-03, -3.314018e-05,  9.155273e-05, -8.583069e-05],
E              [ 4.638672e-03, -8.988380e-05, -4.577637e-05,  4.386902e-05],...

tests/test_nn_and_optim.py:774: AssertionError
============== 1 failed, 7 passed, 82 deselected in 0.95 seconds ===============

For SGD in Q3, test_optim_sgd_weight_decay_1 and test_optim_sgd_momentum_1 passed, so I don't think there is a logic error in the implementation. For the FAILED test case, it's hard to debug the root cause; I guessed it might be some precision issue.

tests/test_nn_and_optim.py::test_optim_sgd_vanilla_1 PASSED
tests/test_nn_and_optim.py::test_optim_sgd_momentum_1 PASSED
tests/test_nn_and_optim.py::test_optim_sgd_weight_decay_1 PASSED
tests/test_nn_and_optim.py::test_optim_sgd_momentum_weight_decay_1 FAILED
tests/test_nn_and_optim.py::test_optim_sgd_layernorm_residual_1 PASSED
tests/test_nn_and_optim.py::test_optim_sgd_z_memory_check_1 PASSED

=================================== FAILURES ===================================
____________________ test_optim_sgd_momentum_weight_decay_1 ____________________

    def test_optim_sgd_momentum_weight_decay_1():
        np.testing.assert_allclose(learn_model_1d(64, 16, lambda z: nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 16)), ndl.optim.SGD, lr=0.01, momentum=0.9, weight_decay=0.01),
>           np.array(3.306993), rtol=1e-5, atol=1e-5)
E       AssertionError: 
E       Not equal to tolerance rtol=1e-05, atol=1e-05
E       
E       Mismatched elements: 1 / 1 (100%)
E       Max absolute difference: 0.0010395
E       Max relative difference: 0.00031433
E        x: array(3.305954, dtype=float32)
E        y: array(3.306993)

tests/test_nn_and_optim.py:975: AssertionError
============== 1 failed, 5 passed, 84 deselected in 39.17 seconds ==============

CalebDu commented Oct 7, 2022

I encountered the same problem in BatchNorm1d, because I wrongly coded w * (x - mean) / ((var + self.eps)**0.5) + b as (x - mean) / ((var + self.eps)**0.5) * w + b. Maybe you made the same mistake? lol, I'm not sure.

For the SGD problem, I also hit the same issue and solved it by reading the PyTorch source code; it cost me a lot of time to find the error. Regarding weight decay: when you compute the momentum term u_i, you should take the gradient of the L2 regularization into account, i.e., add weight_decay * theta to theta.grad, and then use the new gradient to compute u_i. The same goes for Adam.

[screenshot of the PyTorch SGD pseudocode]

hope this helps you.

weixliu commented Oct 8, 2022

Thank you. For BatchNorm1d, I did not make the same mistake. But I just found that the test precision was adjusted 4 days ago to

dtype=np.float32), rtol=1e-5, atol=1e-4)

and the test passes with the adjusted precision. So I'm not sure why you could pass the test case with the original precision.

Let me try your resolution of the SGD. I used the equations below to implement sgd_momentum_weight_decay_1:

grad = param.grad.detach().cached_data
u_(i+1) = momentum * u_i + (1 - momentum) * grad
param = (1 - lr * weight_decay) * param - lr * u_(i+1)

It is a very good method to refer to the source code of PyTorch. I also took a long time to debug the homework; I think it's because the HW is becoming more and more complex.

CalebDu commented Oct 8, 2022


I passed all the SGD tests with the following equations. These equations differ from their counterparts in the slides, so I think the notebook needs to provide more hints.

grad = param.grad.detach().cached_data + weight_decay * param.detach()
u_(i+1) = momentum * u_i + (1 - momentum) * grad
param -= lr * u_(i+1)
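A runnable NumPy sketch of the equations above, with plain arrays standing in for needle tensors (the function name is illustrative):

```python
import numpy as np

def sgd_step(param, grad, u, lr=0.01, momentum=0.9, weight_decay=0.01):
    """One SGD step with the weight-decay (L2) term folded into the
    gradient *before* the momentum average, as in the equations above."""
    grad = grad + weight_decay * param          # add L2 gradient first
    u = momentum * u + (1.0 - momentum) * grad  # momentum running average
    param = param - lr * u
    return param, u
```

Note the (1 - momentum) factor on the gradient: this is an exponential moving average, unlike PyTorch's default u = momentum * u + grad, which is one reason results computed from PyTorch's formulas won't match the test values exactly.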

weixliu commented Oct 9, 2022


Yes, your equations pass all the SGD test cases.
What equation do you use to compute BatchNorm1d? I just found that equation (9) is different from the one I found in PyTorch; there is no m in there.

[screenshot of equation (9)]

Mmmofan commented Oct 13, 2022

I used the equation x = (x - mean) / ((var + eps)**0.5) * weight + bias in training mode and x = (x - running_mean) / ((running_var + eps)**0.5) in eval mode, which passed all the BatchNorm tests.
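The train/eval formulas above could be sketched like this in NumPy (the function name and the momentum convention are my assumptions; for the m question earlier: PyTorch stores the unbiased variance, i.e. the batch variance scaled by m/(m-1), in running_var, which this sketch does not do):

```python
import numpy as np

def batchnorm1d(x, weight, bias, running_mean, running_var,
                eps=1e-5, momentum=0.1, training=True):
    """Sketch of the forward pass discussed above. Training mode
    normalizes with batch statistics and updates the running stats;
    eval mode normalizes with the running stats."""
    if training:
        mean = x.mean(axis=0)
        var = x.var(axis=0)  # biased batch variance (no m/(m-1) factor)
        running_mean = (1 - momentum) * running_mean + momentum * mean
        running_var = (1 - momentum) * running_var + momentum * var
        x_hat = (x - mean) / np.sqrt(var + eps)
    else:
        x_hat = (x - running_mean) / np.sqrt(running_var + eps)
    return x_hat * weight + bias, running_mean, running_var
```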

But I met some problems with the Adam optimizer: I passed all SGD tests but cannot pass the adam_batchnorm_eval_mode test for Adam. Did you make it?

weixliu commented Oct 14, 2022


Yes, I'm struggling with the Adam optimizer test cases. I passed all BatchNorm1D and LayerNorm1D test cases and all SGD test cases, but I still fail test_optim_adam_batchnorm_eval_mode_1 and test_optim_adam_layernorm_1. The test results are close but do not meet the 1e-5 precision, so currently I have no idea about these failures.
I suspect there might be some numerical issue in my implementation.
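For reference, a NumPy sketch of one Adam step with weight decay folded into the gradient, following the same convention as the SGD fix discussed earlier in this thread (the function name and defaults are illustrative, not necessarily what the tests expect):

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0):
    """One Adam step; t is the 1-based step count for bias correction."""
    grad = grad + weight_decay * param          # L2 term added first
    m = beta1 * m + (1 - beta1) * grad          # first-moment average
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment average
    m_hat = m / (1 - beta1 ** t)                # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

Small choices here (whether eps goes inside or outside the square root, whether bias correction is applied, float32 vs float64 intermediates) each shift the result by roughly the 1e-5 margin being discussed, so they are worth checking one by one.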

Mmmofan commented Oct 14, 2022

Haha, I don't think it's caused by numerical precision. I've met problems that looked like numerical precision issues, but they were all caused by coding errors.

Well, I failed test_optim_adam_batchnorm_eval_mode_1 and test_optim_adam_z_memory_check_1 (I passed SGD though), but passed all the submission tests.

BTW, I passed all DropOut tests but failed the submission...

weixliu commented Oct 14, 2022


test_optim_adam_z_memory_check_1 can probably be ignored; it fails because you create fewer tensors than expected. You can pass it by putting grad into the computation graph, e.g., by using grad instead of grad.data.

For DropOut: in evaluation mode you should not drop out the input.
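That eval-mode behavior, together with the usual inverted scaling at train time, can be sketched as follows (illustrative names, not the course's actual API; an rng argument is added here only for determinism):

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: scale at train time so eval is the identity."""
    if not training or p == 0.0:
        return x  # eval mode: pass the input through unchanged
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) >= p    # keep each unit with prob. 1 - p
    return x * mask / (1.0 - p)        # rescale the surviving units
```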
I am also struggling with the failing test_mlp_resnet_forward_2. Since test_mlp_resnet_forward_1 passes, I suspect there might be some issue with my BatchNorm1D implementation.

Can you share your BatchNorm1D and LayerNorm1D implementations? I will share my implementation below and delete it later.

Mmmofan commented Oct 14, 2022

I passed DropOut now. Here's my implementation of BN and LN; I'll delete them 2 hours later:

delete

weixliu commented Oct 14, 2022

You can delete them. I think our implementations are the same. Let me check my other code, -_-.
It's hard to debug; do you have any good ideas for debugging?

Mmmofan commented Oct 14, 2022


Just clone the repo locally, write another script, copy the test function code, set a breakpoint in your IDE, and you can debug it.

For example, create a debug.py script and copy the function learn_model_1d into it (along with the other utility functions it uses):

# debug.py

def learn_model_1d(feature_size, nclasses, _model, optimizer, epochs=1, **kwargs):
    ...

out = learn_model_1d(64, 16, lambda z: nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 16)), ndl.optim.Adam, lr=0.001)

lordidiot commented Jan 6, 2024

Edit: I've figured out my problem, sorry for the ping!

For DropOut, if it is an evaluation, you should not drop out the input. I am also struggling failed test case with test_mlp_resnet_forward_2. Because test_mlp_resnet_forward_1 has passed. So I suspect there might be some issue with my BatchNorm1D implementation.

@weixliu sorry to ping you on an old issue. I am also facing this problem now; do you remember what the root cause of the problem was?
