Explanation for scene boundary prediction #5

BrunoSader opened this issue Mar 23, 2022 · 7 comments
@BrunoSader

Hello,
I read that you might be working on a demo of how to predict on a single video.
I was able to create my own dataloader and call trainer.predict(), but the output is not binary (boundary or not boundary).
Does this model support scene boundary prediction (if so, could you detail the steps? I just need to understand how I can make it work), or is it only a shot encoding model?

Thank you very much

@JonghwanMun
Contributor

Simply apply softmax to the logits to get probabilities, then threshold the value of the second dimension (the boundary class) at 0.5 to obtain the binary prediction.

For example,

import torch.nn.functional as F

logits = model(input)              # raw scores, shape [batch, 2]
probs = F.softmax(logits, dim=-1)  # convert logits to probabilities
preds = probs[:, 1] > 0.5          # boundary if P(boundary) > 0.5

@BrunoSader
Author

Thank you.
I have some more questions, if you don't mind.
The model I am using is the trained BaSSL 40-epoch model; here is how I load it.

    cfg = init_hydra_config(mode="extract_shot")
    apply_random_seed(cfg)
    cfg = load_pretrained_config(cfg)

    # init model
    cfg, model = init_model(cfg)

    # init trainer
    cfg, trainer = init_trainer(cfg)

Is this right?
Also, I don't understand what I am supposed to give it as input.
Do I just create a dataloader of tensors for each image in my movie?
Thank you very much for your help 😄

@JonghwanMun
Contributor

To load a BaSSL 40-epoch scene segmentation model for inference, you need to replace load_pretrained_config with a load_finetuned_config function, for example:

import os
import json
import easydict

def load_finetuned_config(cfg):
    ckpt_root = cfg.CKPT_PATH
    load_from = cfg.LOAD_FROM

    with open(os.path.join(ckpt_root, load_from, "config.json"), "r") as fopen:
        finetuned_cfg = json.load(fopen)
        finetuned_cfg = easydict.EasyDict(finetuned_cfg)

    # override configuration of the pre-trained model
    cfg.MODEL = finetuned_cfg.MODEL
    cfg.PRETRAINED_LOAD_FROM = finetuned_cfg.PRETRAINED_LOAD_FROM

    cfg.TRAIN.USE_SINGLE_KEYFRAME = False
    cfg.MODEL.contextual_relation_network.params.trn.pooling_method = "center"

    # override neighbor size of an input sequence of shots
    sampling = finetuned_cfg.LOSS.sampling_method.name
    nsize = finetuned_cfg.LOSS.sampling_method.params[sampling]["neighbor_size"]
    cfg.LOSS.sampling_method.params["sbd"]["neighbor_size"] = nsize

    return cfg

Then, you also need to set the LOAD_FROM option to point to the finetuned model.
It should be the same as the EXPR_NAME used during the finetuning stage.
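
As a rough sketch only (the mode string and the LOAD_FROM value below are placeholder assumptions, not definitive), the pieces might fit together like this:

# rough sketch; mode and LOAD_FROM are placeholders
cfg = init_hydra_config(mode="finetune")   # assumption: the finetuning/SBD config, not "extract_shot"
apply_random_seed(cfg)
cfg.LOAD_FROM = "my_finetuned_expr_name"   # hypothetical name; use the EXPR_NAME of your finetuned run
cfg = load_finetuned_config(cfg)

# init model and trainer as in the earlier snippet
cfg, model = init_model(cfg)
cfg, trainer = init_trainer(cfg)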

For the input, our algorithm works on top of shots.
You first need to divide a movie into a series of shots and extract three key-frames for each shot (refer to http://docs.movienet.site/movie-toolbox/tools/shot_detector).
Then, you feed the three key-frames of each shot as input to the network.
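
As a rough, hedged sketch of that preprocessing step (the resolution, normalization, and final tensor layout below are assumptions for illustration; the repository's SBD dataset code is the authoritative reference for the exact format the network expects):

import torch
from PIL import Image
from torchvision import transforms

# assumed preprocessing: 224x224 resize + ImageNet normalization
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def load_shot(keyframe_paths):
    # stack the three key-frames of one shot -> [3, 3, 224, 224]
    frames = [preprocess(Image.open(p).convert("RGB")) for p in keyframe_paths]
    return torch.stack(frames)

def build_batch(shots):
    # shots: a list of three key-frame paths per shot, as produced by the shot detector
    # returns an assumed layout of [num_shots, 3 key-frames, 3, 224, 224]
    return torch.stack([load_shot(kf_paths) for kf_paths in shots])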

@LFavano

LFavano commented May 24, 2022

Hello, I would also be interested in more details on how to run the code for inference starting from a fine-tuned model. I tried following @JonghwanMun's suggestion but couldn't come up with working code.

Is it correct to init the cfg this way, and would "finetune" be the correct mode here?

cfg = init_hydra_config(mode="finetune")
apply_random_seed(cfg)
cfg = load_finetuned_config(cfg)

About the data, I have two questions:

  • Can the init_data_loader util function be used for the input that trainer.predict() expects to receive?
  • If using the data loader is not possible, can I ask what code would generate the right input for the network? I have extracted the key frames for each shot, but from running model(data) it seems that the expected shape is [64, 3, 7, 7]; is this the right behavior?

Thank you

@barry2025

Hello, I see FinetuningWrapper.load_from_checkpoint in main_utils.py, but I cannot find the implementation of load_from_checkpoint in finetune_wrapper.py. I wonder how it works. Thanks!

@JonghwanMun
Contributor

@barry2025
load_from_checkpoint() is a method inherited from PyTorch Lightning's LightningModule;
it initializes the parameters from the checkpoint given by checkpoint_path when constructing the FinetuningWrapper instance.
Please refer to the PyTorch Lightning documentation for more details.
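
For illustration only, here is the general Lightning pattern with a toy module (not BaSSL's actual FinetuningWrapper); any LightningModule subclass inherits load_from_checkpoint as a classmethod that rebuilds the module and restores its weights from a .ckpt file:

import torch
import pytorch_lightning as pl

class ToyWrapper(pl.LightningModule):
    # toy stand-in for FinetuningWrapper; the real class lives in finetune_wrapper.py
    def __init__(self, hidden_dim: int = 16):
        super().__init__()
        self.save_hyperparameters()  # lets load_from_checkpoint restore the __init__ arguments
        self.layer = torch.nn.Linear(hidden_dim, 2)

    def forward(self, x):
        return self.layer(x)

# inherited classmethod: rebuilds the module and loads its weights from the checkpoint file
# model = ToyWrapper.load_from_checkpoint("path/to/checkpoint.ckpt")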

@barry2025
Copy link

Thanks! I never used pytorch lightning before, I'll try.
