update samples from Release-8 as a part of 1.4.0 SDK stable release

fomensah · Apr 27, 2020 · fd2b09e
1 parent 7970209
commit fd2b09e

Showing 30 changed files with 774 additions and 72 deletions.
diff --git a/configuration.ipynb b/configuration.ipynb
@@ -103,7 +103,7 @@
       "source": [
         "import azureml.core\n",
         "\n",
-        "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n",
+        "print(\"This notebook was created using version 1.4.0 of the Azure ML SDK\")\n",
         "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
       ]
     },
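
The version banner above only prints two strings side by side. A minimal sketch of turning that into an actual comparison follows; `parse_version` and `check_sdk_version` are hypothetical helpers, not part of the Azure ML SDK, and the installed version is passed in as a plain string (for example the value of `azureml.core.VERSION`).

```python
import warnings


def parse_version(version: str) -> tuple:
    """Turn a dotted version such as '1.4.0' into a tuple of ints, (1, 4, 0).

    Non-numeric parts (e.g. 'post1') are ignored in this simplified sketch.
    """
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def check_sdk_version(installed: str, authored: str = "1.4.0") -> bool:
    """Return True when `installed` is at least the version the notebook was created with."""
    ok = parse_version(installed) >= parse_version(authored)
    if not ok:
        warnings.warn(
            f"Notebook was created with SDK {authored}, but {installed} is installed."
        )
    return ok


# Hypothetical usage inside a notebook cell:
#   import azureml.core
#   check_sdk_version(azureml.core.VERSION)
```

Comparing integer tuples rather than raw strings avoids ordering "1.10.0" before "1.4.0".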

diff --git a/how-to-use-azureml/automated-machine-learning/automl_env.yml b/how-to-use-azureml/automated-machine-learning/automl_env.yml
@@ -28,7 +28,6 @@ dependencies:
   - azureml-pipeline
   - pytorch-transformers==1.0.0
   - spacy==2.1.8
-  - onnxruntime==1.0.0
   - https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz
 
 channels:

diff --git a/how-to-use-azureml/automated-machine-learning/automl_env_mac.yml b/how-to-use-azureml/automated-machine-learning/automl_env_mac.yml
@@ -29,7 +29,6 @@ dependencies:
   - azureml-pipeline
   - pytorch-transformers==1.0.0
   - spacy==2.1.8
-  - onnxruntime==1.0.0
   - https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz  
 
 channels:

diff --git a/...tion-bank-marketing-all-features/auto-ml-classification-bank-marketing-all-features.ipynb b/...tion-bank-marketing-all-features/auto-ml-classification-bank-marketing-all-features.ipynb
@@ -105,7 +105,7 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n",
+        "print(\"This notebook was created using version 1.4.0 of the Azure ML SDK\")\n",
         "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
       ]
     },

diff --git a/...-learning/classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.ipynb b/...-learning/classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.ipynb
@@ -93,7 +93,7 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n",
+        "print(\"This notebook was created using version 1.4.0 of the Azure ML SDK\")\n",
         "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
       ]
     },

diff --git a/.../automated-machine-learning/classification-text-dnn/auto-ml-classification-text-dnn.ipynb b/.../automated-machine-learning/classification-text-dnn/auto-ml-classification-text-dnn.ipynb
@@ -97,7 +97,7 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n",
+        "print(\"This notebook was created using version 1.4.0 of the Azure ML SDK\")\n",
         "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
       ]
     },
@@ -194,8 +194,8 @@
         "    '''\n",
         "    remove = ('headers', 'footers', 'quotes')\n",
         "    categories = [\n",
-        "        'alt.atheism',\n",
-        "        'talk.religion.misc',\n",
+        "        'rec.sport.baseball',\n",
+        "        'rec.sport.hockey',\n",
         "        'comp.graphics',\n",
         "        'sci.space',\n",
         "        ]\n",
@@ -345,7 +345,8 @@
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-        "You can test the model locally to get a feel of the input/output. This step may require additional package installations such as pytorch."
+        "You can test the model locally to get a feel for the input/output. When the model contains BERT, this step requires pytorch and pytorch-transformers to be installed in your local environment. The exact versions of these packages are listed in the **automl_env.yml** file in your local copy of the MachineLearningNotebooks folder:\n",
+        "MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/automl_env.yml"
       ]
     },
     {
@@ -481,7 +482,7 @@
       "source": [
         "script_folder = os.path.join(os.getcwd(), 'inference')\n",
         "os.makedirs(script_folder, exist_ok=True)\n",
-        "shutil.copy2('infer.py', script_folder)"
+        "shutil.copy('infer.py', script_folder)"
       ]
     },
     {
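
The commit does not say why `shutil.copy2` was replaced with `shutil.copy`. One observable difference between the two is that `copy2` also attempts to preserve file metadata such as the modification time, while `copy` copies only the contents and permission bits. The sketch below demonstrates this on a throwaway file; the file names and the backdated timestamp are illustrative only.

```python
import os
import shutil
import tempfile


def read_text(path):
    """Read a whole text file and close it again."""
    with open(path) as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "infer.py")
    with open(src, "w") as handle:
        handle.write("# placeholder scoring script\n")

    # Backdate the source file so a preserved timestamp is easy to spot.
    old_time = 1_000_000_000  # September 2001, as seconds since the epoch
    os.utime(src, (old_time, old_time))

    # Both functions return the path of the newly created file.
    plain = shutil.copy(src, os.path.join(tmp, "copy.py"))
    with_meta = shutil.copy2(src, os.path.join(tmp, "copy2.py"))

    plain_mtime = os.path.getmtime(plain)     # time of the copy itself
    meta_mtime = os.path.getmtime(with_meta)  # carried over from the source
    same_content = read_text(plain) == read_text(with_meta)

print("copy  mtime:", plain_mtime)
print("copy2 mtime:", meta_mtime)
print("identical contents:", same_content)
```

The file contents are identical either way, so the scripts copied into `script_folder` behave the same after this change.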

diff --git a/...ml/automated-machine-learning/classification-text-dnn/auto-ml-classification-text-dnn.yml b/...ml/automated-machine-learning/classification-text-dnn/auto-ml-classification-text-dnn.yml
@@ -5,7 +5,6 @@ dependencies:
   - azureml-train-automl
   - azureml-widgets
   - matplotlib
-  - azurmel-train
   - https://download.pytorch.org/whl/cpu/torch-1.1.0-cp35-cp35m-win_amd64.whl
   - sentencepiece==0.1.82
   - pytorch-transformers==1.0

diff --git a/...reml/automated-machine-learning/continuous-retraining/auto-ml-continuous-retraining.ipynb b/...reml/automated-machine-learning/continuous-retraining/auto-ml-continuous-retraining.ipynb
@@ -88,7 +88,7 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n",
+        "print(\"This notebook was created using version 1.4.0 of the Azure ML SDK\")\n",
         "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
       ]
     },

diff --git a/.../automated-machine-learning/forecasting-beer-remote/auto-ml-forecasting-beer-remote.ipynb b/.../automated-machine-learning/forecasting-beer-remote/auto-ml-forecasting-beer-remote.ipynb
@@ -114,7 +114,7 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n",
+        "print(\"This notebook was created using version 1.4.0 of the Azure ML SDK\")\n",
         "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
       ]
     },
@@ -572,7 +572,7 @@
         "\n",
         "script_folder = os.path.join(os.getcwd(), 'inference')\n",
         "os.makedirs(script_folder, exist_ok=True)\n",
-        "shutil.copy2('infer.py', script_folder)"
+        "shutil.copy('infer.py', script_folder)"
       ]
     },
     {

diff --git a/...ml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb b/...ml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb
@@ -87,7 +87,7 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n",
+        "print(\"This notebook was created using version 1.4.0 of the Azure ML SDK\")\n",
         "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
       ]
     },
@@ -453,8 +453,8 @@
         "\n",
         "script_folder = os.path.join(os.getcwd(), 'forecast')\n",
         "os.makedirs(script_folder, exist_ok=True)\n",
-        "shutil.copy2('forecasting_script.py', script_folder)\n",
-        "shutil.copy2('forecasting_helper.py', script_folder)"
+        "shutil.copy('forecasting_script.py', script_folder)\n",
+        "shutil.copy('forecasting_helper.py', script_folder)"
       ]
     },
     {

diff --git a/...omated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb b/...omated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb
@@ -97,7 +97,7 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n",
+        "print(\"This notebook was created using version 1.4.0 of the Azure ML SDK\")\n",
         "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
       ]
     },

diff --git a/.../automated-machine-learning/forecasting-high-frequency/auto-ml-forecasting-function.ipynb b/.../automated-machine-learning/forecasting-high-frequency/auto-ml-forecasting-function.ipynb
@@ -95,7 +95,7 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n",
+        "print(\"This notebook was created using version 1.4.0 of the Azure ML SDK\")\n",
         "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
       ]
     },
@@ -355,9 +355,24 @@
         "                             label_column_name=target_label,\n",
         "                             **time_series_settings)\n",
         "\n",
-        "remote_run = experiment.submit(automl_config, show_output=False)\n",
-        "remote_run.wait_for_completion()\n",
-        "\n",
+        "remote_run = experiment.submit(automl_config, show_output=False)"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "remote_run.wait_for_completion()"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
         "# Retrieve the best model to use it further.\n",
         "_, fitted_model = remote_run.get_output()"
       ]

diff --git a/...hine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb b/...hine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb
@@ -65,7 +65,8 @@
         "\n",
         "from azureml.core.workspace import Workspace\n",
         "from azureml.core.experiment import Experiment\n",
-        "from azureml.train.automl import AutoMLConfig"
+        "from azureml.train.automl import AutoMLConfig\n",
+        "from azureml.automl.core.featurization import FeaturizationConfig"
       ]
     },
     {
@@ -81,7 +82,7 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n",
+        "print(\"This notebook was created using version 1.4.0 of the Azure ML SDK\")\n",
         "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
       ]
     },
@@ -318,6 +319,36 @@
         "target_column_name = 'Quantity'"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Customization\n",
+        "\n",
+        "Featurization customization in forecasting is an advanced AutoML feature that lets customers change the default forecasting featurization behavior and column types through `FeaturizationConfig`. The supported scenarios are:\n",
+        "1. Column purpose update: Override the feature type of a specified column. DateTime, Categorical and Numeric are currently supported. Use this when a column's type does not reflect its purpose. For instance, some numeric columns should be treated as categorical, while others hold epoch timestamps that need to be converted to datetime. To make the SDK preprocess these columns correctly, add a configuration entry with each column and its desired type.\n",
+        "2. Transformer parameter update: Only the Imputer's parameters can currently be changed. The supported imputation methods are constant for target data, and mean, median, most frequent and constant for training data. Use this when you know which imputation method best fits the input data. For instance, some datasets use NaN to represent 0, so every missing value should be imputed with 0. To achieve this, configure those columns for constant imputation with `fill_value` 0.\n",
+        "3. Drop columns: Columns to exclude from featurization. These are usually columns that are leaky or contain no useful data.\n",
+        "\n",
+        "This step requires an Enterprise workspace to gain access to this feature. To learn more about creating an Enterprise workspace or upgrading to an Enterprise workspace from the Azure portal, please visit our [Workspace page.](https://docs.microsoft.com/azure/machine-learning/service/concept-workspace#upgrade)"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "featurization_config = FeaturizationConfig()\n",
+        "featurization_config.drop_columns = ['logQuantity']  # 'logQuantity' is a leaky feature, so we remove it.\n",
+        "# Force the CPWVOL5 feature to be numeric type.\n",
+        "featurization_config.add_column_purpose('CPWVOL5', 'Numeric')\n",
+        "# Fill missing values in the target column, Quantity, with zeros.\n",
+        "featurization_config.add_transformer_params('Imputer', ['Quantity'], {\"strategy\": \"constant\", \"fill_value\": 0})\n",
+        "# Fill missing values in the INCOME column with median value.\n",
+        "featurization_config.add_transformer_params('Imputer', ['INCOME'], {\"strategy\": \"median\"})"
+      ]
+    },
     {
       "cell_type": "markdown",
       "metadata": {},
@@ -349,8 +380,8 @@
         "|**debug_log**|Log file path for writing debugging information|\n",
         "|**time_column_name**|Name of the datetime column in the input data|\n",
         "|**grain_column_names**|Name(s) of the columns defining individual series in the input data|\n",
-        "|**drop_column_names**|Name(s) of columns to drop prior to modeling|\n",
-        "|**max_horizon**|Maximum desired forecast horizon in units of time-series frequency|"
+        "|**max_horizon**|Maximum desired forecast horizon in units of time-series frequency|\n",
+        "|**featurization**|'auto' / 'off' / FeaturizationConfig. Indicates whether the featurization step should be done automatically, skipped, or customized. When enabled, AutoML featurizes the input to handle *missing data* and to perform some common *feature extraction*.|"
       ]
     },
     {
@@ -362,7 +393,6 @@
         "time_series_settings = {\n",
         "    'time_column_name': time_column_name,\n",
         "    'grain_column_names': grain_column_names,\n",
-        "    'drop_column_names': ['logQuantity'],  # 'logQuantity' is a leaky feature, so we remove it.\n",
         "    'max_horizon': n_test_periods\n",
         "}\n",
         "\n",
@@ -374,6 +404,7 @@
         "                             label_column_name=target_column_name,\n",
         "                             compute_target=compute_target,\n",
         "                             enable_early_stopping=True,\n",
+        "                             featurization=featurization_config,\n",
         "                             n_cross_validations=3,\n",
         "                             verbosity=logging.INFO,\n",
         "                             **time_series_settings)"
@@ -425,6 +456,33 @@
         "model_name = best_run.properties['model_name']"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Transparency\n",
+        "\n",
+        "View updated featurization summary"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "custom_featurizer = fitted_model.named_steps['timeseriestransformer']"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "custom_featurizer.get_featurization_summary()"
+      ]
+    },
     {
       "cell_type": "markdown",
       "metadata": {},
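
To make the three customizations in the orange juice notebook concrete, here is a plain-Python illustration of what dropping a column and applying constant or median imputation mean for the data. This is not AutoML's implementation: `drop_columns` and `impute` are hypothetical helpers, and the sample rows are invented for the example. Only the column names and the chosen strategies are taken from the diff above.

```python
from statistics import median

# Invented sample rows; None marks a missing value.
rows = [
    {"Quantity": 10.0, "INCOME": 50.0, "logQuantity": 2.30},
    {"Quantity": None, "INCOME": None, "logQuantity": None},
    {"Quantity": 4.0, "INCOME": 30.0, "logQuantity": 1.39},
    {"Quantity": 6.0, "INCOME": 70.0, "logQuantity": 1.79},
]


def drop_columns(rows, names):
    """Return copies of the rows without the named columns."""
    return [{k: v for k, v in row.items() if k not in names} for row in rows]


def impute(rows, column, strategy, fill_value=None):
    """Fill missing values in one column using a 'constant' or 'median' strategy."""
    if strategy == "constant":
        replacement = fill_value
    elif strategy == "median":
        replacement = median(r[column] for r in rows if r[column] is not None)
    else:
        raise ValueError(f"Unsupported strategy in this sketch: {strategy}")
    return [
        {**row, column: replacement if row[column] is None else row[column]}
        for row in rows
    ]


cleaned = drop_columns(rows, {"logQuantity"})                    # leaky feature removed
cleaned = impute(cleaned, "Quantity", "constant", fill_value=0)  # missing target becomes 0
cleaned = impute(cleaned, "INCOME", "median")                    # missing INCOME becomes the median

for row in cleaned:
    print(row)
```

In the notebook itself these choices are declared on `featurization_config` and applied by AutoML during training rather than computed by hand.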

diff --git a/...run-classification-credit-card-fraud/auto-ml-classification-credit-card-fraud-local.ipynb b/...run-classification-credit-card-fraud/auto-ml-classification-credit-card-fraud-local.ipynb
@@ -95,7 +95,7 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n",
+        "print(\"This notebook was created using version 1.4.0 of the Azure ML SDK\")\n",
         "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
       ]
     },
@@ -370,7 +370,7 @@
       "metadata": {},
       "source": [
         "#### Initialize the Mimic Explainer for feature importance\n",
-        "For explaining the AutoML models, use the MimicWrapper from azureml.explain.model package. The MimicWrapper can be initialized with fields in automl_explainer_setup_obj, your workspace and a LightGBM model which acts as a surrogate model to explain the AutoML model (fitted_model here). The MimicWrapper also takes the automl_run object where engineered explanations will be uploaded."
+        "For explaining the AutoML models, use the MimicWrapper from azureml.explain.model package. The MimicWrapper can be initialized with fields in automl_explainer_setup_obj, your workspace and a surrogate model to explain the AutoML model (fitted_model here). The MimicWrapper also takes the automl_run object where engineered explanations will be uploaded."
       ]
     },
     {
@@ -379,13 +379,14 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel\n",
         "from azureml.explain.model.mimic_wrapper import MimicWrapper\n",
-        "explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator, LGBMExplainableModel, \n",
+        "explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator,\n",
+        "                         explainable_model=automl_explainer_setup_obj.surrogate_model, \n",
         "                         init_dataset=automl_explainer_setup_obj.X_transform, run=automl_run,\n",
         "                         features=automl_explainer_setup_obj.engineered_feature_names, \n",
         "                         feature_maps=[automl_explainer_setup_obj.feature_map],\n",
-        "                         classes=automl_explainer_setup_obj.classes)"
+        "                         classes=automl_explainer_setup_obj.classes,\n",
+        "                         explainer_kwargs=automl_explainer_setup_obj.surrogate_model_params)"
       ]
     },
     {

diff --git a/...g/regression-explanation-featurization/auto-ml-regression-explanation-featurization.ipynb b/...g/regression-explanation-featurization/auto-ml-regression-explanation-featurization.ipynb
@@ -98,7 +98,7 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n",
+        "print(\"This notebook was created using version 1.4.0 of the Azure ML SDK\")\n",
         "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
       ]
     },

diff --git a/...zureml/automated-machine-learning/regression-explanation-featurization/train_explainer.py b/...zureml/automated-machine-learning/regression-explanation-featurization/train_explainer.py
@@ -11,7 +11,6 @@
 from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel
 from azureml.explain.model.mimic_wrapper import MimicWrapper
 from automl.client.core.common.constants import MODEL_PATH
-from azureml.automl.core.shared.constants import MODEL_EXPLANATION_TAG
 from azureml.explain.model.scoring.scoring_explainer import TreeScoringExplainer, save
 
 
@@ -69,9 +68,6 @@
                                      raw_feature_names=automl_explainer_setup_obj.raw_feature_names,
                                      eval_dataset=automl_explainer_setup_obj.X_test_transform)
 
-# Set tag that explanations completed
-automl_run.tag(MODEL_EXPLANATION_TAG, 'True')
-
 print("Engineered and raw explanations computed successfully")
 
 # Initialize the ScoringExplainer

diff --git a/how-to-use-azureml/automated-machine-learning/regression/auto-ml-regression.ipynb b/how-to-use-azureml/automated-machine-learning/regression/auto-ml-regression.ipynb
@@ -92,7 +92,7 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n",
+        "print(\"This notebook was created using version 1.4.0 of the Azure ML SDK\")\n",
         "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
       ]
     },

diff --git a/...reml/explain-model/azure-integration/remote-explanation/explain-model-on-amlcompute.ipynb b/...reml/explain-model/azure-integration/remote-explanation/explain-model-on-amlcompute.ipynb
@@ -696,6 +696,7 @@
         "1. [Save model explanations via Azure Machine Learning Run History](../run-history/save-retrieve-explanations-run-history.ipynb)\n",
         "1. Inferencing time: deploy a classification model and explainer:\n",
         "    1. [Deploy a locally-trained model and explainer](../scoring-time/train-explain-model-locally-and-deploy.ipynb)\n",
+        "    1. [Deploy a locally-trained keras model and explainer](../scoring-time/train-explain-model-keras-locally-and-deploy.ipynb)\n",
         "    1. [Deploy a remotely-trained model and explainer](../scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb)"
       ]
     },

diff --git a/.../explain-model/azure-integration/run-history/save-retrieve-explanations-run-history.ipynb b/.../explain-model/azure-integration/run-history/save-retrieve-explanations-run-history.ipynb
@@ -591,6 +591,7 @@
         "1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../remote-explanation/explain-model-on-amlcompute.ipynb)\n",
         "1. Inferencing time: deploy a classification model and explainer:\n",
         "    1. [Deploy a locally-trained model and explainer](../scoring-time/train-explain-model-locally-and-deploy.ipynb)\n",
+        "    1. [Deploy a locally-trained keras model and explainer](../scoring-time/train-explain-model-keras-locally-and-deploy.ipynb)\n",
         "    1. [Deploy a remotely-trained model and explainer](../scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb)"
       ]
     },