RuntimeError: Error(s) in loading state_dict for KAN #271

chrico-bu-uab · 2024-06-17T04:40:49Z

Error message:
RuntimeError: Error(s) in loading state_dict for KAN:
Missing key(s) in state_dict: "biases.3.weight", "act_fun.3.grid", "act_fun.3.coef", "act_fun.3.scale_base", "act_fun.3.scale_sp", "act_fun.3.mask", "symbolic_fun.3.mask", "symbolic_fun.3.affine".
size mismatch for biases.0.weight: copying a param with shape torch.Size([1, 1]) from checkpoint, the shape in current model is torch.Size([1, 2]).
size mismatch for biases.1.weight: copying a param with shape torch.Size([1, 4]) from checkpoint, the shape in current model is torch.Size([1, 10]).
size mismatch for biases.2.weight: copying a param with shape torch.Size([1, 5]) from checkpoint, the shape in current model is torch.Size([1, 1]).
size mismatch for act_fun.0.grid: copying a param with shape torch.Size([10, 5]) from checkpoint, the shape in current model is torch.Size([20, 7]).
size mismatch for act_fun.0.coef: copying a param with shape torch.Size([10, 6]) from checkpoint, the shape in current model is torch.Size([20, 7]).
size mismatch for act_fun.0.scale_base: copying a param with shape torch.Size([10]) from checkpoint, the shape in current model is torch.Size([20]).
size mismatch for act_fun.0.scale_sp: copying a param with shape torch.Size([10]) from checkpoint, the shape in current model is torch.Size([20]).
size mismatch for act_fun.0.mask: copying a param with shape torch.Size([10]) from checkpoint, the shape in current model is torch.Size([20]).
size mismatch for act_fun.1.grid: copying a param with shape torch.Size([4, 5]) from checkpoint, the shape in current model is torch.Size([20, 7]).
size mismatch for act_fun.1.coef: copying a param with shape torch.Size([4, 6]) from checkpoint, the shape in current model is torch.Size([20, 7]).
size mismatch for act_fun.1.scale_base: copying a param with shape torch.Size([4]) from checkpoint, the shape in current model is torch.Size([20]).
size mismatch for act_fun.1.scale_sp: copying a param with shape torch.Size([4]) from checkpoint, the shape in current model is torch.Size([20]).
size mismatch for act_fun.1.mask: copying a param with shape torch.Size([4]) from checkpoint, the shape in current model is torch.Size([20]).
size mismatch for act_fun.2.grid: copying a param with shape torch.Size([20, 5]) from checkpoint, the shape in current model is torch.Size([10, 7]).
size mismatch for act_fun.2.coef: copying a param with shape torch.Size([20, 6]) from checkpoint, the shape in current model is torch.Size([10, 7]).
size mismatch for act_fun.2.scale_base: copying a param with shape torch.Size([20]) from checkpoint, the shape in current model is torch.Size([10]).
size mismatch for act_fun.2.scale_sp: copying a param with shape torch.Size([20]) from checkpoint, the shape in current model is torch.Size([10]).
size mismatch for act_fun.2.mask: copying a param with shape torch.Size([20]) from checkpoint, the shape in current model is torch.Size([10]).
size mismatch for symbolic_fun.0.mask: copying a param with shape torch.Size([1, 10]) from checkpoint, the shape in current model is torch.Size([2, 10]).
size mismatch for symbolic_fun.0.affine: copying a param with shape torch.Size([1, 10, 4]) from checkpoint, the shape in current model is torch.Size([2, 10, 4]).
size mismatch for symbolic_fun.1.mask: copying a param with shape torch.Size([4, 1]) from checkpoint, the shape in current model is torch.Size([10, 2]).
size mismatch for symbolic_fun.1.affine: copying a param with shape torch.Size([4, 1, 4]) from checkpoint, the shape in current model is torch.Size([10, 2, 4]).
size mismatch for symbolic_fun.2.mask: copying a param with shape torch.Size([5, 4]) from checkpoint, the shape in current model is torch.Size([1, 10]).
size mismatch for symbolic_fun.2.affine: copying a param with shape torch.Size([5, 4, 4]) from checkpoint, the shape in current model is torch.Size([1, 10, 4]).
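
The shapes in the traceback are enough to recover both architectures. The sketch below assumes (this is not confirmed anywhere in the issue) that each `symbolic_fun.N.mask` is stored as `[out_dim, in_dim]`; that reading is at least consistent with the `biases.N.weight` and `act_fun.N.mask` sizes reported above. The helper name `infer_widths` is hypothetical and not part of the KAN library.

```python
def infer_widths(mask_shapes):
    """Recover layer widths from per-layer mask shapes given as (out_dim, in_dim).

    Assumption: the mask of layer N has shape (out_dim, in_dim).
    """
    widths = []
    for layer, (out_dim, in_dim) in enumerate(mask_shapes):
        if layer == 0:
            # The first layer's input size is the network's input width.
            widths.append(in_dim)
        elif widths[-1] != in_dim:
            raise ValueError(
                f"layer {layer} expects {in_dim} inputs, "
                f"but the previous layer emits {widths[-1]}"
            )
        widths.append(out_dim)
    return widths


# Shapes copied from the error message (symbolic_fun.N.mask, layers 0 to 2).
checkpoint_masks = [(1, 10), (4, 1), (5, 4)]
current_masks = [(2, 10), (10, 2), (1, 10)]

checkpoint_widths = infer_widths(checkpoint_masks)
current_widths = infer_widths(current_masks)
print("checkpoint:", checkpoint_widths)
print("current model (first three layers):", current_widths)
```

Under that assumption the checkpoint describes widths [10, 1, 4, 5], while the model being loaded starts with [10, 2, 10, 1] and has a further layer (index 3) whose shape the traceback does not report. The two state dicts therefore appear to come from differently shaped networks.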

Code:

import os
import warnings

import numpy as np


def train_model(model, params, prune_threshold, r2_threshold):
    # Remove any checkpoint left over from a previous run.
    if os.path.exists("ckpt.pth"):
        os.remove("ckpt.pth")

    X_train = params["dataset"]["train_input"]
    training = {"train_loss": [], "test_loss": [], "reg": []}

    warnings.filterwarnings("ignore")

    try:
        # train model
        for key, value in model.train(**params).items():
            training[key].extend(x.item() for x in value)
        if np.isnan(training["train_loss"]).any():
            print("NAN detected after initial training, exiting.")
        else:
            model.save_ckpt("ckpt.pth")

            # prune
            model.prune(prune_threshold)
            if np.isnan(model.forward(X_train).detach().numpy()).any():
                print("Pruning failed. Loading previous model.")
                model.load_ckpt("ckpt.pth")
            else:
                print("Pruning succeeded.")
                model.save_ckpt("ckpt.pth")

            # auto symbolic
            autosym_model(model, r2_threshold)
            if np.isnan(model.forward(X_train).detach().numpy()).any():
                print("Auto Symbolic failed. Loading previous model.")
                model.load_ckpt("ckpt.pth")
            else:
                print("Auto Symbolic succeeded.")
                model.save_ckpt("ckpt.pth")

            # try training again
            for key, value in model.train(**params).items():
                training[key].extend(x.item() for x in value)
            if np.isnan(training["train_loss"]).any():
                print("NAN detected after symbolic training, exiting.")
                model.load_ckpt("ckpt.pth")
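            else:
                print("Symbolic training succeeded.")
    except RuntimeError as err:
        # Report the failure (for example a state_dict shape mismatch
        # raised by load_ckpt), then re-raise so the caller still sees it.
        print(f"Training pipeline failed: {err}")
        raise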
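
One way to fail fast before calling `load_ckpt` is to compare parameter names and shapes first. This is only a sketch: `shape_diff` is a hypothetical helper, not part of the KAN library, and the sample shapes are a small excerpt from the error message above. The commented lines show how the two mappings might be built with PyTorch, assuming `ckpt.pth` holds a plain state dict.

```python
def shape_diff(ckpt_shapes, model_shapes):
    """Compare two name -> shape mappings.

    Returns (missing, unexpected, mismatched):
      missing    : names the model has but the checkpoint lacks
      unexpected : names the checkpoint has but the model lacks
      mismatched : name -> (checkpoint shape, model shape) where they differ
    """
    missing = sorted(k for k in model_shapes if k not in ckpt_shapes)
    unexpected = sorted(k for k in ckpt_shapes if k not in model_shapes)
    mismatched = {
        k: (tuple(ckpt_shapes[k]), tuple(model_shapes[k]))
        for k in ckpt_shapes
        if k in model_shapes and tuple(ckpt_shapes[k]) != tuple(model_shapes[k])
    }
    return missing, unexpected, mismatched


# With PyTorch the mappings could be built roughly like this
# (assumption: "ckpt.pth" contains a plain state_dict):
#   ckpt_shapes = {k: tuple(v.shape) for k, v in torch.load("ckpt.pth").items()}
#   model_shapes = {k: tuple(v.shape) for k, v in model.state_dict().items()}

# Excerpt of the shapes reported in the error message.
ckpt_shapes = {"biases.0.weight": (1, 1), "biases.1.weight": (1, 4)}
model_shapes = {
    "biases.0.weight": (1, 2),
    "biases.1.weight": (1, 10),
    "biases.3.weight": None,  # shape not reported in the traceback
}

missing, unexpected, mismatched = shape_diff(ckpt_shapes, model_shapes)
print("missing:", missing)
print("unexpected:", unexpected)
print("mismatched:", mismatched)
```

If any of the three results is non-empty, the checkpoint was written by a network with a different structure, and loading it into the current model raises the error shown in this issue.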

chrico-bu-uab closed this as completed Jun 17, 2024
