Verify new pyspark add-ons flow supports Alloy projects #3220
Comments
I tested this using https://github.com/McK-Internal/InsureX-insurex/blob/a98305850c94996eed91ab715de7ab98b987908e/src/apps/home_fraud.yml and the instructions at https://brix.quantumblack.com/products/alloy/docs/03_tutorial/02_running_app/#building-an-app. I updated home_fraud as follows:
metadata:
  name: home_fraud
  description: Fraud detection for household claims.
  version:
    attr: src.home_fraud.__version__
  readme:
    file:
      - README.md
template:
  directory: /Users/merel_theisen/Projects/kedro-starters/spaceflights-pyspark
  root: "{{ cookiecutter.repo_name }}"
  requirements: "{{ cookiecutter.repo_name }}/requirements.txt"
  cookiecutter_json:
    _copy_without_render:
      - src/insurex/hooks/studio_hooks.py
    project_name: home_fraud
    repo_name: home_fraud
    python_package: home_fraud
    kedro_version: 0.18.15
    add_ons: "['Testing', 'Custom Logging', 'Data Structure', 'Pyspark']"
docs:
  type: sphinx
build:
  app_class: kedro
  packages:
    - insurex
    - insurex-datasets
    - customerone-lib-ab-testing
    - feature-helpers
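In the config above, the `add_ons` value is a single quoted string that looks like a Python list literal, not a YAML list. A minimal sketch, assuming a consumer needs the selection as a real list (for example, to check whether `Pyspark` was chosen), is to parse it with `ast.literal_eval`. The helper `parse_add_ons` is hypothetical and is not how Kedro or Alloy necessarily read this field.

```python
import ast


def parse_add_ons(raw: str) -> list[str]:
    """Parse an add_ons string such as "['Testing', 'Pyspark']" into a list.

    Hypothetical helper for illustration; not Kedro's or Alloy's own parser.
    """
    value = ast.literal_eval(raw)  # evaluates Python literals only, never code
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError(f"add_ons must be a list of strings, got {raw!r}")
    return value


raw_add_ons = "['Testing', 'Custom Logging', 'Data Structure', 'Pyspark']"
selected = parse_add_ons(raw_add_ons)
print(selected)
print("Pyspark" in selected)
```

Using `ast.literal_eval` rather than `eval` keeps the parsing safe, since it rejects anything that is not a plain literal.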
Differences found in old and new flow:
Notes for alloy team/verticals:
Thanks for testing this, Merel! 🙏 QQ:
does
Yes, but with the deliberate changes noted in my comment:
I also noticed that running
Hi Merel! 👋 Thank you for the detailed description of the new changes and steps. I've tested the new starter on our default demo project, which previously used kedro-starters/pyspark. Using the following template:
and with kedro@develop installed locally, the demo project builds successfully. I'm having some small issues with After we render the template, I observed that kedro
and all the data is removed from data/01_raw. https://github.com/kedro-org/kedro/blob/develop/kedro/templates/project/hooks/utils.py#L116 Is this expected? How can we overcome this on the Alloy side and disable this overwrite? The data is present in the template and only disappears when we render the template using cookiecutter. Another small issue is that for the template, we already generate # via spaceflightstools
boltons
# via spaceflights
When the template is rendered, sort_requirements(requirements_file_path) is executed from
Is it possible to disable sorting from the Alloy side? Thank you very much!
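Why sorting can clash with an annotated requirements file: the sketch below assumes that `sort_requirements` sorts lines independently of each other (we have not checked Kedro's actual implementation). Under that assumption, pip-compile style `# via ...` comment lines get pulled away from the package they annotate. Both functions are hypothetical, and the input is a made-up example in the style quoted above; `annotation_preserving_sort` shows one possible alternative that keeps each comment attached to the requirement above it.

```python
def naive_sort(text: str) -> str:
    """Sort every non-blank line alphabetically, ignoring indentation.

    Assumption: mimics a simple line-based sort; not Kedro's actual code.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return "\n".join(sorted(lines))


def annotation_preserving_sort(text: str) -> str:
    """Sort requirements, keeping '# via' comments under their package.

    Hypothetical alternative: comments before the first requirement (e.g. a
    pip-compile header) stay at the top; later comments follow the
    requirement line directly above them.
    """
    header: list[str] = []
    blocks: list[list[str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # drop blank lines
        if line.lstrip().startswith("#"):
            if blocks:
                blocks[-1].append(line)  # attach to the previous requirement
            else:
                header.append(line)
        else:
            blocks.append([line])  # start a new requirement block
    blocks.sort(key=lambda block: block[0].strip().lower())
    return "\n".join(header + [line for block in blocks for line in block])


# Made-up, illustrative input in pip-compile style.
requirements = """\
boltons
    # via spaceflights
attrs
    # via spaceflights
"""

print(naive_sort(requirements))
print(annotation_preserving_sort(requirements))
```

With the naive sort, both `# via spaceflights` lines end up at the top of the file, detached from `attrs` and `boltons`; the block-based sort reorders packages while keeping each annotation in place.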
Thanks for testing @nmorcotiloqb! To answer your questions:
We're currently implementing the last step of the new project creation flow which allows you to include an example. With that enabled you will get the spaceflights data as well as the example pipelines. When @datajoely reached out he said you were using the
At the moment we don't have any settings to enable/disable the sorting of requirements. We could just not sort them if this causes big problems for the verticals. cc: @astrojuanlu @marc-solomon
Thanks for the detailed answer @merelcht !
Yes, that's true: we are using the pyspark starter, which doesn't come with any data by default. At build time, the package can have some data which It would be possible to have an additional add-on, similar to What do you think?
I think that would be a too specific
Closing this specific issue; we will move forward by scheduling a meeting to align product visions.
Description
Projects created with Alloy rely on the pyspark starter. This starter is being archived in favour of the new add-ons flow. When #3073 is completed, we need to double-check that projects using Alloy can still be created as expected without the pyspark starter.