Kedro simple project run error: ModuleNotFoundError followed by ValueError #3198

Closed
braKhAbid opened this issue Oct 19, 2023 · 5 comments

@braKhAbid

braKhAbid commented Oct 19, 2023

The full code is available at: https://github.com/braKhAbid/kedro_mnist

I am trying to set up a simple Python ML project in Kedro. It is very simple: one pipeline of three nodes (a data-loading node, a model-training node, and a model-evaluation node that just prints the accuracy). This is just to learn Kedro for an upcoming project.

Despite the simplicity, I am struggling to make this work. Basically, running kedro run outputs
ModuleNotFoundError: No module named 'kedro_mnist.pipelines.'
followed by
KeyError: 'pipeline'
The above exception was the direct cause of the following exception:
and finally
ValueError: Failed to find the pipeline named 'pipeline'. It needs to be generated and returned by the 'register_pipelines' function.
I tried debugging with Google and ChatGPT. ChatGPT had me create an additional hooks.py file where I register my pipeline, but it didn't solve the problem.

I also posted this on Stack Overflow here. One suggestion from a Kedro developer was to remove the .ipynb_checkpoints folder, which didn't solve the problem; neither did creating a new project without a notebook. The same error occurs with the spaceflights starter from Kedro.

Any help would be greatly appreciated. I am using Kedro 0.18.13.

I expect a float value (the accuracy) to be printed to the console.

The full error traceback is:

PS D:\Files\IA divers\kedro_mnist\kedro-mnist> kedro run --pipeline pipeline
[10/18/23 11:49:05] INFO     Kedro project kedro-mnist                                                   session.py:364
                    WARNING  c:\users\rashid\anaconda3\lib\site-packages\kedro\framework\project\__init warnings.py:109
                             __.py:359: UserWarning: An error occurred while importing the
                             'kedro_mnist.pipelines..ipynb_checkpoints' module. Nothing defined therein
                             will be returned by 'find_pipelines'.

                             Traceback (most recent call last):
                               File
                             "c:\users\rashid\anaconda3\lib\site-packages\kedro\framework\project\__ini
                             t__.py", line 357, in find_pipelines
                                 pipeline_module = importlib.import_module(pipeline_module_name)
                               File "c:\users\rashid\anaconda3\lib\importlib\__init__.py", line 127, in
                             import_module
                                 return _bootstrap._gcd_import(name[level:], package, level)
                               File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
                               File "<frozen importlib._bootstrap>", line 991, in _find_and_load
                               File "<frozen importlib._bootstrap>", line 961, in
                             _find_and_load_unlocked
                               File "<frozen importlib._bootstrap>", line 219, in
                             _call_with_frames_removed
                               File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
                               File "<frozen importlib._bootstrap>", line 991, in _find_and_load
                               File "<frozen importlib._bootstrap>", line 973, in
                             _find_and_load_unlocked
                             ModuleNotFoundError: No module named 'kedro_mnist.pipelines.'

                               warnings.warn(

┌─────────────────────────────── Traceback (most recent call last) ────────────────────────────────┐
│ c:\users\rashid\anaconda3\lib\site-packages\kedro\framework\session\session.py:381 in run        │
│                                                                                                  │
│ c:\users\rashid\anaconda3\lib\site-packages\kedro\framework\project\__init__.py:137 in inner     │
└──────────────────────────────────────────────────────────────────────────────────────────────────┘
KeyError: 'pipeline'

The above exception was the direct cause of the following exception:

┌─────────────────────────────── Traceback (most recent call last) ────────────────────────────────┐
│ c:\users\rashid\anaconda3\lib\runpy.py:194 in _run_module_as_main                                │
│                                                                                                  │
│ c:\users\rashid\anaconda3\lib\runpy.py:87 in _run_code                                           │
│                                                                                                  │
│ in <module>:7                                                                                    │
│                                                                                                  │
│   4 from kedro.framework.cli import main                                                         │
│   5 if __name__ == '__main__':                                                                   │
│   6 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])                         │
│ > 7 │   sys.exit(main())                                                                         │
│   8                                                                                              │
│                                                                                                  │
│ c:\users\rashid\anaconda3\lib\site-packages\kedro\framework\cli\cli.py:211 in main               │
│                                                                                                  │
│ c:\users\rashid\anaconda3\lib\site-packages\click\core.py:1157 in __call__                       │
│                                                                                                  │
│ c:\users\rashid\anaconda3\lib\site-packages\kedro\framework\cli\cli.py:139 in main               │
│                                                                                                  │
│ c:\users\rashid\anaconda3\lib\site-packages\click\core.py:1078 in main                           │
│                                                                                                  │
│ c:\users\rashid\anaconda3\lib\site-packages\click\core.py:1688 in invoke                         │
│                                                                                                  │
│ c:\users\rashid\anaconda3\lib\site-packages\click\core.py:1434 in invoke                         │
│                                                                                                  │
│ c:\users\rashid\anaconda3\lib\site-packages\click\core.py:783 in invoke                          │
│                                                                                                  │
│ c:\users\rashid\anaconda3\lib\site-packages\kedro\framework\cli\project.py:453 in run            │
│                                                                                                  │
│ c:\users\rashid\anaconda3\lib\site-packages\kedro\framework\session\session.py:383 in run        │
└──────────────────────────────────────────────────────────────────────────────────────────────────┘
ValueError: Failed to find the pipeline named 'pipeline'. It needs to be generated and returned by the
'register_pipelines' function.
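The last two tracebacks are chained: the KeyError raised while looking up the requested name is re-raised as a ValueError. A minimal stdlib sketch of that pattern follows; it assumes the registered pipelines behave like a plain dict keyed by pipeline name, and `lookup_pipeline` is a hypothetical helper, not Kedro's code. It shows that `--pipeline pipeline` can only succeed if something registers a pipeline under the exact key `pipeline`:

```python
def lookup_pipeline(registry: dict, name: str):
    """Hypothetical simplified lookup mirroring the KeyError -> ValueError chain above."""
    try:
        return registry[name]
    except KeyError as exc:
        # `raise ... from exc` is what prints
        # "The above exception was the direct cause of the following exception"
        raise ValueError(
            f"Failed to find the pipeline named '{name}'. "
            "It needs to be generated and returned by the 'register_pipelines' function."
        ) from exc


if __name__ == "__main__":
    # Hypothetical registry in which only "__default__" exists
    registry = {"__default__": "<default pipeline>"}
    try:
        lookup_pipeline(registry, "pipeline")
    except ValueError as err:
        print(type(err.__cause__).__name__, "->", err)
```

In this model, the fix is either to register a pipeline under the name you pass to `--pipeline`, or to omit the flag so the default entry is used.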

@astrojuanlu
Member

Hi @braKhAbid! Can you try removing all .ipynb_checkpoints directories and running Kedro again? If it fails, it's a new bug; if it works, we will close this as a duplicate of #3194.
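Jupyter leaves .ipynb_checkpoints directories behind, and the warning in the traceback shows one inside src/kedro_mnist/pipelines/ being imported as if it were a pipeline. A small stdlib sketch for removing all of them under a project root follows; `remove_ipynb_checkpoints` is a hypothetical helper, not part of Kedro, and it deletes directories, so point it at the project root deliberately:

```python
import shutil
from pathlib import Path
from typing import List


def remove_ipynb_checkpoints(root: Path) -> List[Path]:
    """Delete every .ipynb_checkpoints directory under `root`; return the removed paths."""
    # Collect matches first so nothing is deleted while rglob is still walking the tree
    targets = [p for p in root.rglob(".ipynb_checkpoints") if p.is_dir()]
    removed = []
    for path in targets:
        # A nested match may already be gone if its parent was removed first
        if path.exists():
            shutil.rmtree(path)
            removed.append(path)
    return removed


if __name__ == "__main__":
    for path in remove_ipynb_checkpoints(Path.cwd()):
        print(f"removed {path}")
```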

@ankatiyar
Contributor

Hi @braKhAbid, I took a look at your project. The problem is that the nodes.py and pipeline.py files are inside the pipelines folder. If you look at the pandas-iris starter, these files should be at kedro_mnist/src/kedro_mnist/nodes.py and kedro_mnist/src/kedro_mnist/pipeline.py, not inside the pipelines folder. Alternatively, you could use the kedro pipeline create <pipeline_name> command, which creates a folder kedro_mnist/src/kedro_mnist/pipelines/<pipeline_name> containing nodes.py and pipeline.py.
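To see why the placement matters, here is a simplified stdlib model of how we understand `find_pipelines()` to discover pipelines in Kedro 0.18.x: a top-level `<package>/pipeline.py` (the simplified starter layout) becomes the default pipeline, and each *subdirectory* of `<package>/pipelines/` becomes a pipeline named after that directory. Under this model, loose nodes.py/pipeline.py files directly inside pipelines/ are never picked up, and a .ipynb_checkpoints directory produces the invalid module path seen in the warning. Both functions are illustrations, not Kedro's implementation:

```python
from pathlib import Path
from typing import Dict


def candidate_pipeline_modules(package_name: str, package_dir: Path) -> Dict[str, str]:
    """Simplified model of pipeline discovery: map pipeline names to module paths."""
    candidates: Dict[str, str] = {}
    # Simplified starter layout (e.g. pandas-iris): src/<package>/pipeline.py
    if (package_dir / "pipeline.py").is_file():
        candidates["__default__"] = f"{package_name}.pipeline"
    pipelines_dir = package_dir / "pipelines"
    if pipelines_dir.is_dir():
        for entry in sorted(pipelines_dir.iterdir()):
            # Only subdirectories count; loose .py files in pipelines/ are skipped
            if not entry.is_dir() or entry.name == "__pycache__":
                continue
            candidates[entry.name] = f"{package_name}.pipelines.{entry.name}"
    return candidates


def is_valid_module_path(module_name: str) -> bool:
    """True if every dotted segment is a Python identifier (".ipynb_checkpoints" is not)."""
    return all(part.isidentifier() for part in module_name.split("."))
```

With the original layout this model finds only `.ipynb_checkpoints` and nothing named `pipeline`. After the files are moved up one level, it finds a single `__default__` entry, so (assuming the project keeps the default `register_pipelines()` template, which sums all discovered pipelines into `__default__`) plain `kedro run` without `--pipeline pipeline` is the command that would run it.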

Also, all nodes require inputs and outputs, so the load_data() function still needs a dummy input. Join the Kedro Slack, where you can ask us more questions if you need help. I'm going to close this issue since it's not a bug or a feature request.
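Nodes are wired together by dataset names, and Kedro exposes each entry of parameters.yml to nodes under a `params:` prefix (for example `params:dummy`). The toy runner below is a hypothetical, stdlib-only illustration of that name-based wiring, not Kedro's runner; the three node functions are placeholders standing in for the project's real `load_data`, `create_and_train_model`, and `evaluate_model`:

```python
from typing import Any, Callable, Dict, List, Tuple, Union

Names = Union[str, List[str]]
ToyNode = Tuple[Callable[..., Any], Names, Names]


def run_toy_pipeline(nodes: List[ToyNode], parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Run nodes in order, resolving each input by name from a shared store."""
    # Parameters become available as "params:<key>", mirroring Kedro's naming
    store: Dict[str, Any] = {f"params:{key}": value for key, value in parameters.items()}
    for func, inputs, outputs in nodes:
        input_names = [inputs] if isinstance(inputs, str) else inputs
        result = func(*(store[name] for name in input_names))
        if isinstance(outputs, str):
            store[outputs] = result
        else:
            # A list of output names means the function returns a matching tuple
            store.update(zip(outputs, result))
    return store


# Placeholder node functions (hypothetical, not the project's real code)
def load_data(dummy):
    return [1, 2, 3], [4, 5], [0, 1, 0], [1, 0]


def create_and_train_model(X_train, y_train):
    # Toy "model": always predicts the most frequent training label
    return max(set(y_train), key=y_train.count)


def evaluate_model(model, X_test, y_test):
    accuracy = sum(model == y for y in y_test) / len(y_test)
    print(accuracy)
    return accuracy


TOY_NODES: List[ToyNode] = [
    (load_data, "params:dummy", ["X_train", "X_test", "y_train", "y_test"]),
    (create_and_train_model, ["X_train", "y_train"], "model"),
    (evaluate_model, ["model", "X_test", "y_test"], "value"),
]

if __name__ == "__main__":
    run_toy_pipeline(TOY_NODES, {"dummy": "dummy"})
```

In this model, leaving `dummy` out of the parameters makes the first node fail with a missing `params:dummy` input, which is why the parameter has to be defined as well as referenced.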

@braKhAbid
Author

> Hi @braKhAbid ! Can you try removing all .ipynb_checkpoints directories and run Kedro again? If it fails it's a new bug, if it works we will close this as a duplicate of #3194

Hi @astrojuanlu! Actually, yes, I tried that. I even tried creating a new project without any notebook, and it still fails (I replied to you on the Stack Overflow post).

@astrojuanlu
Member

> Hi @astrojuanlu ! actually yes I tried that, I even tried creating a new project without any notebook created and it still fails (I answered your post on the stackoverflow post)

Thanks for checking! Could you share the complete traceback after deleting the .ipynb_checkpoints directories? Maybe it's what @ankatiyar mentioned. Without the full traceback, we cannot help.

@ankatiyar
Contributor

FWIW, I was able to get the project running after a bit of hacking around. Things I changed:

  • Add a conf/local folder (Kedro currently requires a local folder in conf/).
  • Move nodes.py & pipeline.py from src/<project_name>/pipelines to src/<project_name>.
  • Delete the src/kedro_mnist/pipelines folder.
  • To conf/base/parameters.yml, add a dummy: "dummy" parameter.
  • In nodes.py, modify the load_data() function definition to accept a dummy input.
  • In pipeline.py, add an input to load_data:
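```python
from kedro.pipeline import Pipeline, node

# nodes.py now lives next to pipeline.py in src/kedro_mnist/
from .nodes import create_and_train_model, evaluate_model, load_data


def create_pipeline(**kwargs) -> Pipeline:
    return Pipeline(
        [
            # "params:dummy" feeds the `dummy` entry of conf/base/parameters.yml into load_data
            node(load_data, inputs="params:dummy", outputs=["X_train", "X_test", "y_train", "y_test"]),
            node(create_and_train_model, inputs=["X_train", "y_train"], outputs="model"),
            node(evaluate_model, inputs=["model", "X_test", "y_test"], outputs="value"),
        ]
    )
```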
  • Add scikit-learn to src/requirements.txt
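The file-system part of these steps can be scripted. Below is a rough stdlib sketch; it assumes it is run from the project root (the directory containing conf/ and src/) with the package name `kedro_mnist`, and `apply_layout_fix` is a hypothetical helper. It deletes src/<package>/pipelines, so commit or back up first. The code edits to nodes.py and pipeline.py and the scikit-learn requirement still have to be done by hand:

```python
import shutil
from pathlib import Path


def apply_layout_fix(project_root: Path, package_name: str) -> None:
    """Hypothetical helper automating the folder and parameter changes listed above."""
    conf = project_root / "conf"
    package_dir = project_root / "src" / package_name
    pipelines_dir = package_dir / "pipelines"

    # 1. Create conf/local, which Kedro currently requires
    (conf / "local").mkdir(parents=True, exist_ok=True)

    # 2. Move nodes.py and pipeline.py up from pipelines/ into the package itself
    for filename in ("nodes.py", "pipeline.py"):
        source = pipelines_dir / filename
        if source.is_file():
            shutil.move(str(source), str(package_dir / filename))

    # 3. Delete the now-unneeded pipelines/ folder (including any .ipynb_checkpoints)
    if pipelines_dir.is_dir():
        shutil.rmtree(pipelines_dir)

    # 4. Add the dummy parameter that load_data receives via "params:dummy"
    params_file = conf / "base" / "parameters.yml"
    params_file.parent.mkdir(parents=True, exist_ok=True)
    existing = params_file.read_text() if params_file.exists() else ""
    if not any(line.strip().startswith("dummy:") for line in existing.splitlines()):
        # Make sure the new key starts on its own line
        prefix = "" if existing == "" or existing.endswith("\n") else "\n"
        params_file.write_text(existing + prefix + 'dummy: "dummy"\n')


if __name__ == "__main__":
    apply_layout_fix(Path.cwd(), "kedro_mnist")
```

The function is written to be safe to re-run: a second call finds nothing left to move or delete and does not duplicate the parameter.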
