diff --git a/docs/demo/GCP/Mortgage-ETL-GPU.ipynb b/docs/demo/GCP/Mortgage-ETL-GPU.ipynb index 574eb18ed56..059a38082b9 100644 --- a/docs/demo/GCP/Mortgage-ETL-GPU.ipynb +++ b/docs/demo/GCP/Mortgage-ETL-GPU.ipynb @@ -16,8 +16,69 @@ "\n", "### Prerequisite\n", "\n", - "This notebook runs in a Dataproc cluster with GPU nodes, with [Spark RAPIDS](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/rapids) set up.\n", + "This notebook runs in a Dataproc cluster with GPU nodes, with [Spark RAPIDS](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/rapids) set up." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define Data Input/Output location\n", + "\n", + "You first need to configure a bucket containing the Fannie Mae dataset mentioned above. Here are some commands you can use once you have the tgz file (for example, the full mortgage dataset is `mortgage_2000-2016.tgz`, which is about 23.3 GB).\n", + "\n", + "Replace `TARGET_BUCKET` with the bucket name you'd like to use." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Use pigz (parallel gzip) to decompress the file; this generates mortgage_2000-2016.tar (about 195 GB)\n", + "!pigz -d mortgage_2000-2016.tgz\n", "\n", + "# untar the file\n", + "!tar xvf mortgage_2000-2016.tar -C mortgage_full/\n", + "\n", + "# upload it to the desired bucket\n", + "!gsutil -m cp -r mortgage_full/* gs://TARGET_BUCKET/mortgage_full/ &\n", + " \n", + "# verify the upload\n", + "!gsutil ls gs://TARGET_BUCKET/mortgage_full\n", + "!gsutil du -hs gs://TARGET_BUCKET/mortgage_full" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "More information on pigz is available [here](https://github.com/madler/pigz).\n", + "\n", + "Now let's configure the data input/output locations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "bucket = \"TARGET_BUCKET\"\n", + "\n", + "orig_perf_path = 'gs://'+bucket+'/mortgage_full/perf/*'\n", + "orig_acq_path = 'gs://'+bucket+'/mortgage_full/acq/*'\n", + "train_path = 'gs://'+bucket+'/mortgage_full/train/'\n", + "test_path = 'gs://'+bucket+'/mortgage_full/test/'\n", + "tmp_perf_path = 'gs://'+bucket+'/mortgage_parquet_gpu/perf/'\n", + "tmp_acq_path = 'gs://'+bucket+'/mortgage_parquet_gpu/acq/'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "### Define ETL Process\n", "\n", "Define data schema and steps to do the ETL process:" @@ -25,12 +86,12 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import time\n", - "from pyspark import broadcast\n", + "from pyspark import broadcast, SparkConf\n", "from pyspark.sql import SparkSession\n", "from pyspark.sql.functions import *\n", "from pyspark.sql.types import *\n", @@ -373,11 +434,12 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 4, "metadata": {}, "outputs": [], "source": [ - "sc.stop()\n", + "if \"sc\" in globals():\n", + " sc.stop()\n", "\n", "conf = SparkConf().setAppName(\"MortgageETL\")\n", "conf.set('spark.rapids.sql.explain', 'ALL')\n", @@ -406,28 +468,6 @@ "sc = spark.sparkContext" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define Data Input/Output location" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "orig_perf_path = 'gs://dataproc-nv-demo/mortgage_full/perf/*'\n", - "orig_acq_path = 'gs://dataproc-nv-demo/mortgage_full/acq/*'\n", - "\n", - "train_path = 'gs://dataproc-nv-demo/mortgage_full/train/'\n", - "test_path = 'gs://dataproc-nv-demo/mortgage_full/test/'\n", - "tmp_perf_path = 'gs://dataproc-nv-demo/mortgage_parquet_gpu/perf/'\n", - "tmp_acq_path = 
'gs://dataproc-nv-demo/mortgage_parquet_gpu/acq/'" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -437,14 +477,14 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "108.28529238700867\n" + "126.0085141658783\n" ] } ], @@ -470,15 +510,15 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "137.99262690544128\n", - "171.97584056854248\n" + "443.94579005241394\n", + "603.1326625347137\n" ] } ], @@ -515,39 +555,40 @@ "output_type": "stream", "text": [ "== Physical Plan ==\n", - "*(2) GpuColumnarToRow false\n", - "+- GpuProject [gpucoalesce(orig_channel#1922, 0) AS orig_channel#3686, gpucoalesce(first_home_buyer#2124, 0) AS first_home_buyer#3687, gpucoalesce(loan_purpose#2326, 0) AS loan_purpose#3688, gpucoalesce(property_type#2528, 0) AS property_type#3689, gpucoalesce(occupancy_status#2730, 0) AS occupancy_status#3690, gpucoalesce(property_state#2932, 0) AS property_state#3691, gpucoalesce(relocation_mortgage_indicator#3134, 0) AS relocation_mortgage_indicator#3692, gpucoalesce(seller_name#3336, 0) AS seller_name#3693, gpucoalesce(id#1728, 0) AS mod_flag#3694, gpucoalesce(gpunanvl(orig_interest_rate#297, null), 0.0) AS orig_interest_rate#3695, gpucoalesce(orig_upb#298, 0) AS orig_upb#3696, gpucoalesce(orig_loan_term#299, 0) AS orig_loan_term#3697, gpucoalesce(gpunanvl(orig_ltv#302, null), 0.0) AS orig_ltv#3698, gpucoalesce(gpunanvl(orig_cltv#303, null), 0.0) AS orig_cltv#3699, gpucoalesce(gpunanvl(num_borrowers#304, null), 0.0) AS num_borrowers#3700, gpucoalesce(gpunanvl(dti#305, null), 0.0) AS dti#3701, gpucoalesce(gpunanvl(borrower_credit_score#306, null), 0.0) AS borrower_credit_score#3702, gpucoalesce(num_units#310, 0) AS num_units#3703, gpucoalesce(zip#313, 0) AS zip#3704, gpucoalesce(gpunanvl(mortgage_insurance_percent#314, null), 
0.0) AS mortgage_insurance_percent#3705, gpucoalesce(current_loan_delinquency_status#240, 0) AS current_loan_delinquency_status#3706, gpucoalesce(gpunanvl(current_actual_upb#234, null), 0.0) AS current_actual_upb#3707, gpucoalesce(gpunanvl(interest_rate#233, null), 0.0) AS interest_rate#3708, gpucoalesce(gpunanvl(loan_age#235, null), 0.0) AS loan_age#3709, ... 3 more fields]\n", - " +- GpuBroadcastHashJoin [mod_flag#241], [mod_flag#3404], LeftOuter, BuildRight\n", - " :- GpuProject [interest_rate#233, current_actual_upb#234, loan_age#235, msa#239, current_loan_delinquency_status#240, mod_flag#241, non_interest_bearing_upb#256, delinquency_12#1042, orig_interest_rate#297, orig_upb#298, orig_loan_term#299, orig_ltv#302, orig_cltv#303, num_borrowers#304, dti#305, borrower_credit_score#306, num_units#310, zip#313, mortgage_insurance_percent#314, orig_channel#1922, first_home_buyer#2124, loan_purpose#2326, property_type#2528, occupancy_status#2730, ... 3 more fields]\n", - " : +- GpuBroadcastHashJoin [seller_name#1402], [seller_name#3202], LeftOuter, BuildRight\n", - " : :- GpuProject [interest_rate#233, current_actual_upb#234, loan_age#235, msa#239, current_loan_delinquency_status#240, mod_flag#241, non_interest_bearing_upb#256, delinquency_12#1042, seller_name#1402, orig_interest_rate#297, orig_upb#298, orig_loan_term#299, orig_ltv#302, orig_cltv#303, num_borrowers#304, dti#305, borrower_credit_score#306, num_units#310, zip#313, mortgage_insurance_percent#314, orig_channel#1922, first_home_buyer#2124, loan_purpose#2326, property_type#2528, ... 
3 more fields]\n", - " : : +- GpuBroadcastHashJoin [relocation_mortgage_indicator#318], [relocation_mortgage_indicator#3000], LeftOuter, BuildRight\n", - " : : :- GpuProject [interest_rate#233, current_actual_upb#234, loan_age#235, msa#239, current_loan_delinquency_status#240, mod_flag#241, non_interest_bearing_upb#256, delinquency_12#1042, seller_name#1402, orig_interest_rate#297, orig_upb#298, orig_loan_term#299, orig_ltv#302, orig_cltv#303, num_borrowers#304, dti#305, borrower_credit_score#306, num_units#310, zip#313, mortgage_insurance_percent#314, relocation_mortgage_indicator#318, orig_channel#1922, first_home_buyer#2124, loan_purpose#2326, ... 3 more fields]\n", - " : : : +- GpuBroadcastHashJoin [property_state#312], [property_state#2798], LeftOuter, BuildRight\n", - " : : : :- GpuProject [interest_rate#233, current_actual_upb#234, loan_age#235, msa#239, current_loan_delinquency_status#240, mod_flag#241, non_interest_bearing_upb#256, delinquency_12#1042, seller_name#1402, orig_interest_rate#297, orig_upb#298, orig_loan_term#299, orig_ltv#302, orig_cltv#303, num_borrowers#304, dti#305, borrower_credit_score#306, num_units#310, property_state#312, zip#313, mortgage_insurance_percent#314, relocation_mortgage_indicator#318, orig_channel#1922, first_home_buyer#2124, ... 3 more fields]\n", - " : : : : +- GpuBroadcastHashJoin [occupancy_status#311], [occupancy_status#2596], LeftOuter, BuildRight\n", - " : : : : :- GpuProject [interest_rate#233, current_actual_upb#234, loan_age#235, msa#239, current_loan_delinquency_status#240, mod_flag#241, non_interest_bearing_upb#256, delinquency_12#1042, seller_name#1402, orig_interest_rate#297, orig_upb#298, orig_loan_term#299, orig_ltv#302, orig_cltv#303, num_borrowers#304, dti#305, borrower_credit_score#306, num_units#310, occupancy_status#311, property_state#312, zip#313, mortgage_insurance_percent#314, relocation_mortgage_indicator#318, orig_channel#1922, ... 
3 more fields]\n", - " : : : : : +- GpuBroadcastHashJoin [property_type#309], [property_type#2394], LeftOuter, BuildRight\n", + "*(5) GpuColumnarToRow false\n", + "+- GpuProject [gpucoalesce(orig_channel#1926, 0) AS orig_channel#3690, gpucoalesce(first_home_buyer#2128, 0) AS first_home_buyer#3691, gpucoalesce(loan_purpose#2330, 0) AS loan_purpose#3692, gpucoalesce(property_type#2532, 0) AS property_type#3693, gpucoalesce(occupancy_status#2734, 0) AS occupancy_status#3694, gpucoalesce(property_state#2936, 0) AS property_state#3695, gpucoalesce(relocation_mortgage_indicator#3138, 0) AS relocation_mortgage_indicator#3696, gpucoalesce(seller_name#3340, 0) AS seller_name#3697, gpucoalesce(id#1728, 0) AS mod_flag#3698, gpucoalesce(gpunanvl(orig_interest_rate#297, null), 0.0) AS orig_interest_rate#3699, gpucoalesce(orig_upb#298, 0) AS orig_upb#3700, gpucoalesce(orig_loan_term#299, 0) AS orig_loan_term#3701, gpucoalesce(gpunanvl(orig_ltv#302, null), 0.0) AS orig_ltv#3702, gpucoalesce(gpunanvl(orig_cltv#303, null), 0.0) AS orig_cltv#3703, gpucoalesce(gpunanvl(num_borrowers#304, null), 0.0) AS num_borrowers#3704, gpucoalesce(gpunanvl(dti#305, null), 0.0) AS dti#3705, gpucoalesce(gpunanvl(borrower_credit_score#306, null), 0.0) AS borrower_credit_score#3706, gpucoalesce(num_units#310, 0) AS num_units#3707, gpucoalesce(zip#313, 0) AS zip#3708, gpucoalesce(gpunanvl(mortgage_insurance_percent#314, null), 0.0) AS mortgage_insurance_percent#3709, gpucoalesce(current_loan_delinquency_status#240, 0) AS current_loan_delinquency_status#3710, gpucoalesce(gpunanvl(current_actual_upb#234, null), 0.0) AS current_actual_upb#3711, gpucoalesce(gpunanvl(interest_rate#233, null), 0.0) AS interest_rate#3712, gpucoalesce(gpunanvl(loan_age#235, null), 0.0) AS loan_age#3713, ... 
3 more fields]\n", + " +- GpuBroadcastHashJoin [mod_flag#241], [mod_flag#3408], LeftOuter, BuildRight\n", + " :- GpuProject [interest_rate#233, current_actual_upb#234, loan_age#235, msa#239, current_loan_delinquency_status#240, mod_flag#241, non_interest_bearing_upb#256, delinquency_12#1042, orig_interest_rate#297, orig_upb#298, orig_loan_term#299, orig_ltv#302, orig_cltv#303, num_borrowers#304, dti#305, borrower_credit_score#306, num_units#310, zip#313, mortgage_insurance_percent#314, orig_channel#1926, first_home_buyer#2128, loan_purpose#2330, property_type#2532, occupancy_status#2734, ... 3 more fields]\n", + " : +- GpuBroadcastHashJoin [seller_name#1402], [seller_name#3206], LeftOuter, BuildRight\n", + " : :- GpuProject [interest_rate#233, current_actual_upb#234, loan_age#235, msa#239, current_loan_delinquency_status#240, mod_flag#241, non_interest_bearing_upb#256, delinquency_12#1042, seller_name#1402, orig_interest_rate#297, orig_upb#298, orig_loan_term#299, orig_ltv#302, orig_cltv#303, num_borrowers#304, dti#305, borrower_credit_score#306, num_units#310, zip#313, mortgage_insurance_percent#314, orig_channel#1926, first_home_buyer#2128, loan_purpose#2330, property_type#2532, ... 3 more fields]\n", + " : : +- GpuBroadcastHashJoin [relocation_mortgage_indicator#318], [relocation_mortgage_indicator#3004], LeftOuter, BuildRight\n", + " : : :- GpuProject [interest_rate#233, current_actual_upb#234, loan_age#235, msa#239, current_loan_delinquency_status#240, mod_flag#241, non_interest_bearing_upb#256, delinquency_12#1042, seller_name#1402, orig_interest_rate#297, orig_upb#298, orig_loan_term#299, orig_ltv#302, orig_cltv#303, num_borrowers#304, dti#305, borrower_credit_score#306, num_units#310, zip#313, mortgage_insurance_percent#314, relocation_mortgage_indicator#318, orig_channel#1926, first_home_buyer#2128, loan_purpose#2330, ... 
3 more fields]\n", + " : : : +- GpuBroadcastHashJoin [property_state#312], [property_state#2802], LeftOuter, BuildRight\n", + " : : : :- GpuProject [interest_rate#233, current_actual_upb#234, loan_age#235, msa#239, current_loan_delinquency_status#240, mod_flag#241, non_interest_bearing_upb#256, delinquency_12#1042, seller_name#1402, orig_interest_rate#297, orig_upb#298, orig_loan_term#299, orig_ltv#302, orig_cltv#303, num_borrowers#304, dti#305, borrower_credit_score#306, num_units#310, property_state#312, zip#313, mortgage_insurance_percent#314, relocation_mortgage_indicator#318, orig_channel#1926, first_home_buyer#2128, ... 3 more fields]\n", + " : : : : +- GpuBroadcastHashJoin [occupancy_status#311], [occupancy_status#2600], LeftOuter, BuildRight\n", + " : : : : :- GpuProject [interest_rate#233, current_actual_upb#234, loan_age#235, msa#239, current_loan_delinquency_status#240, mod_flag#241, non_interest_bearing_upb#256, delinquency_12#1042, seller_name#1402, orig_interest_rate#297, orig_upb#298, orig_loan_term#299, orig_ltv#302, orig_cltv#303, num_borrowers#304, dti#305, borrower_credit_score#306, num_units#310, occupancy_status#311, property_state#312, zip#313, mortgage_insurance_percent#314, relocation_mortgage_indicator#318, orig_channel#1926, ... 3 more fields]\n", + " : : : : : +- GpuBroadcastHashJoin [property_type#309], [property_type#2398], LeftOuter, BuildRight\n", " : : : : : :- GpuProject [interest_rate#233, current_actual_upb#234, loan_age#235, msa#239, current_loan_delinquency_status#240, mod_flag#241, non_interest_bearing_upb#256, delinquency_12#1042, seller_name#1402, orig_interest_rate#297, orig_upb#298, orig_loan_term#299, orig_ltv#302, orig_cltv#303, num_borrowers#304, dti#305, borrower_credit_score#306, property_type#309, num_units#310, occupancy_status#311, property_state#312, zip#313, mortgage_insurance_percent#314, relocation_mortgage_indicator#318, ... 
3 more fields]\n", - " : : : : : : +- GpuBroadcastHashJoin [loan_purpose#308], [loan_purpose#2192], LeftOuter, BuildRight\n", + " : : : : : : +- GpuBroadcastHashJoin [loan_purpose#308], [loan_purpose#2196], LeftOuter, BuildRight\n", " : : : : : : :- GpuProject [interest_rate#233, current_actual_upb#234, loan_age#235, msa#239, current_loan_delinquency_status#240, mod_flag#241, non_interest_bearing_upb#256, delinquency_12#1042, seller_name#1402, orig_interest_rate#297, orig_upb#298, orig_loan_term#299, orig_ltv#302, orig_cltv#303, num_borrowers#304, dti#305, borrower_credit_score#306, loan_purpose#308, property_type#309, num_units#310, occupancy_status#311, property_state#312, zip#313, mortgage_insurance_percent#314, ... 3 more fields]\n", - " : : : : : : : +- GpuBroadcastHashJoin [first_home_buyer#307], [first_home_buyer#1990], LeftOuter, BuildRight\n", + " : : : : : : : +- GpuBroadcastHashJoin [first_home_buyer#307], [first_home_buyer#1994], LeftOuter, BuildRight\n", " : : : : : : : :- GpuProject [interest_rate#233, current_actual_upb#234, loan_age#235, msa#239, current_loan_delinquency_status#240, mod_flag#241, non_interest_bearing_upb#256, delinquency_12#1042, seller_name#1402, orig_interest_rate#297, orig_upb#298, orig_loan_term#299, orig_ltv#302, orig_cltv#303, num_borrowers#304, dti#305, borrower_credit_score#306, first_home_buyer#307, loan_purpose#308, property_type#309, num_units#310, occupancy_status#311, property_state#312, zip#313, ... 
3 more fields]\n", - " : : : : : : : : +- GpuBroadcastHashJoin [orig_channel#295], [orig_channel#1788], LeftOuter, BuildRight\n", + " : : : : : : : : +- GpuBroadcastHashJoin [orig_channel#295], [orig_channel#1792], LeftOuter, BuildRight\n", " : : : : : : : : :- GpuProject [interest_rate#233, current_actual_upb#234, loan_age#235, msa#239, current_loan_delinquency_status#240, mod_flag#241, non_interest_bearing_upb#256, delinquency_12#1042, orig_channel#295, seller_name#1402, orig_interest_rate#297, orig_upb#298, orig_loan_term#299, orig_ltv#302, orig_cltv#303, num_borrowers#304, dti#305, borrower_credit_score#306, first_home_buyer#307, loan_purpose#308, property_type#309, num_units#310, occupancy_status#311, property_state#312, ... 3 more fields]\n", - " : : : : : : : : : +- GpuShuffledHashJoin [loan_id#230L, quarter#261], [loan_id#294L, quarter#319], Inner, BuildRight\n", + " : : : : : : : : : +- GpuShuffledHashJoin [loan_id#230L, quarter#261], [loan_id#294L, quarter#319], Inner, BuildRight, false\n", " : : : : : : : : : :- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(loan_id#230L, quarter#261, 160), true, [id=#3294]\n", + " : : : : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(loan_id#230L, quarter#261, 160), true, [id=#4208]\n", " : : : : : : : : : : +- GpuProject [quarter#261, loan_id#230L, interest_rate#233, current_actual_upb#234, loan_age#235, msa#239, current_loan_delinquency_status#240, mod_flag#241, non_interest_bearing_upb#256, delinquency_12#1042]\n", - " : : : : : : : : : : +- GpuShuffledHashJoin [quarter#261, loan_id#230L, cast(timestamp_year#1106 as bigint), cast(timestamp_month#1070 as bigint)], [quarter#1173, loan_id#1142L, timestamp_year#996L, timestamp_month#1025L], LeftOuter, BuildRight\n", + " : : : : : : : : : : +- GpuShuffledHashJoin [quarter#261, loan_id#230L, cast(timestamp_year#1106 as bigint), cast(timestamp_month#1070 as bigint)], [quarter#1173, loan_id#1142L, 
timestamp_year#996L, timestamp_month#1025L], LeftOuter, BuildRight, false\n", " : : : : : : : : : : :- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#261, loan_id#230L, cast(timestamp_year#1106 as bigint), cast(timestamp_month#1070 as bigint), 160), true, [id=#3124]\n", - " : : : : : : : : : : : +- GpuProject [loan_id#230L, interest_rate#233, current_actual_upb#234, loan_age#235, msa#239, current_loan_delinquency_status#240, mod_flag#241, non_interest_bearing_upb#256, quarter#261, gpumonth(cast(cast(gpuunixtimestamp(monthly_reporting_period#231, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date)) AS timestamp_month#1070, gpuyear(cast(cast(gpuunixtimestamp(monthly_reporting_period#231, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date)) AS timestamp_year#1106]\n", - " : : : : : : : : : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : : : : : : +- GpuFilter ((NOT quarter#261 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#230L)) AND gpuisnotnull(quarter#261))\n", - " : : : : : : : : : : : +- GpuFileGpuScan parquet [loan_id#230L,monthly_reporting_period#231,interest_rate#233,current_actual_upb#234,loan_age#235,msa#239,current_loan_delinquency_status#240,mod_flag#241,non_interest_bearing_upb#256,quarter#261] Batched: true, DataFilters: [NOT quarter#261 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#230L), isnotnull(quarter#261)], Format: Parquet, Location: InMemoryFileIndex[gs://dataproc-nv-demo/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct= 1) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_30#664, CASE WHEN (current_loan_delinquency_status#901 >= 3) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as 
timestamp) as date) END AS delinquency_90#665, CASE WHEN (current_loan_delinquency_status#901 >= 6) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_180#666]\n", - " : : : : : : : : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : : : : : +- GpuFilter (gpuisnotnull(loan_id#891L) AND gpuisnotnull(quarter#922))\n", - " : : : : : : : : : : +- GpuFileGpuScan parquet [loan_id#891L,monthly_reporting_period#892,current_loan_delinquency_status#901,quarter#922] Batched: true, DataFilters: [isnotnull(loan_id#891L), isnotnull(quarter#922)], Format: Parquet, Location: InMemoryFileIndex[gs://dataproc-nv-demo/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct= 1) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_30#664, CASE WHEN (current_loan_delinquency_status#901 >= 3) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_90#665, CASE WHEN (current_loan_delinquency_status#901 >= 6) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_180#666]\n", + " : : : : : : : : : : +- *(3) GpuColumnarToRow false\n", + " : : : : : : : : : : +- GpuFilter (gpuisnotnull(loan_id#891L) AND gpuisnotnull(quarter#922))\n", + " : : : : : : : : : : +- GpuFileGpuScan parquet [loan_id#891L,monthly_reporting_period#892,current_loan_delinquency_status#901,quarter#922] Batched: true, DataFilters: [isnotnull(loan_id#891L), isnotnull(quarter#922)], Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", + " : : : : : : : : : : +- GpuColumnarExchange 
gpuhashpartitioning(quarter#261, loan_id#230L, cast(timestamp_year#1106 as bigint), cast(timestamp_month#1070 as bigint), 160), true, [id=#558]\n", + " : : : : : : : : : : +- GpuRowToColumnar TargetSize(536870912)\n", + " : : : : : : : : : : +- *(1) Project [loan_id#230L, mod_flag#241, quarter#261, month(cast(cast(unix_timestamp(monthly_reporting_period#231, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_month#1070, year(cast(cast(unix_timestamp(monthly_reporting_period#231, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_year#1106]\n", + " : : : : : : : : : : +- *(1) GpuColumnarToRow false\n", + " : : : : : : : : : : +- GpuFilter ((NOT quarter#261 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#230L)) AND gpuisnotnull(quarter#261))\n", + " : : : : : : : : : : +- GpuFileGpuScan parquet [loan_id#230L,monthly_reporting_period#231,mod_flag#241,quarter#261] Batched: true, DataFilters: [NOT quarter#261 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#230L), isnotnull(quarter#261)], Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", " : : : : : : : : : +- GpuCoalesceBatches RequireSingleBatch\n", - " : : : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, timestamp_year#996L, timestamp_month#1025L, 160), true, [id=#339]\n", + " : : : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, timestamp_year#996L, timestamp_month#1025L, 160), true, [id=#585]\n", " : : : : : : : : : +- GpuHashAggregate(keys=[quarter#1173, loan_id#1142L, josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, month_y#937], functions=[]), filters=ArrayBuffer())\n", " : : : : : : : : : +- GpuHashAggregate(keys=[quarter#1173, 
loan_id#1142L, josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, month_y#937], functions=[]), filters=ArrayBuffer())\n", " : : : : : : : : : +- GpuProject [quarter#1173, FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) AS josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, loan_id#1142L, month_y#937]\n", @@ -629,73 +673,76 @@ " : : : : : : : : : +- GpuFilter (gpuisnotnull(CASE WHEN ((((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast(month_y#937 as bigint)) pmod 12) = 0) THEN 12 ELSE (((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast(month_y#937 as bigint)) pmod 12) END) AND gpuisnotnull(FLOOR((cast(((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast((month_y#937 - 1) as bigint)) as double) / 12.0))))\n", " : : : : : : : : : +- GpuGenerate false, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [loan_id#1142L, quarter#1173, timestamp_month#740, timestamp_year#776, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686], [month_y#937]\n", " : : : : : : : : : +- GpuProject [loan_id#1142L, quarter#1173, timestamp_month#740, timestamp_year#776, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686]\n", - " : : : : : : : : : +- GpuShuffledHashJoin [quarter#1173, loan_id#1142L], [quarter#922, loan_id#891L], LeftOuter, BuildRight\n", + " : : : : : : : : : +- GpuShuffledHashJoin [quarter#1173, loan_id#1142L], [quarter#922, loan_id#891L], LeftOuter, BuildRight, false\n", " : : : : : : : : : :- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : : : : : 
+- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, 160), true, [id=#322]\n", - " : : : : : : : : : : +- GpuProject [quarter#1173, loan_id#1142L, gpumonth(cast(cast(gpuunixtimestamp(monthly_reporting_period#1143, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date)) AS timestamp_month#740, gpuyear(cast(cast(gpuunixtimestamp(monthly_reporting_period#1143, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date)) AS timestamp_year#776]\n", - " : : : : : : : : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : : : : : +- GpuFilter ((NOT quarter#1173 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#1142L)) AND gpuisnotnull(quarter#1173))\n", - " : : : : : : : : : : +- GpuFileGpuScan parquet [loan_id#1142L,monthly_reporting_period#1143,quarter#1173] Batched: true, DataFilters: [NOT quarter#1173 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#1142L), isnotnull(quarter#1..., Format: Parquet, Location: InMemoryFileIndex[gs://dataproc-nv-demo/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", + " : : : : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, 160), true, [id=#565]\n", + " : : : : : : : : : : +- GpuRowToColumnar TargetSize(536870912)\n", + " : : : : : : : : : : +- *(2) Project [quarter#1173, loan_id#1142L, month(cast(cast(unix_timestamp(monthly_reporting_period#1143, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_month#740, year(cast(cast(unix_timestamp(monthly_reporting_period#1143, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_year#776]\n", + " : : : : : : : : : : +- *(2) GpuColumnarToRow false\n", + " : : : : : : : : : : +- GpuFilter ((NOT quarter#1173 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#1142L)) AND gpuisnotnull(quarter#1173))\n", + " : : : : : : : : : : +- GpuFileGpuScan parquet 
[loan_id#1142L,monthly_reporting_period#1143,quarter#1173] Batched: true, DataFilters: [NOT quarter#1173 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#1142L), isnotnull(quarter#1..., Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", " : : : : : : : : : +- GpuCoalesceBatches RequireSingleBatch\n", " : : : : : : : : : +- GpuHashAggregate(keys=[quarter#922, loan_id#891L], functions=[gpumax(current_loan_delinquency_status#901), gpumin(delinquency_30#664), gpumin(delinquency_90#665), gpumin(delinquency_180#666)]), filters=ArrayBuffer(None, None, None, None))\n", " : : : : : : : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#922, loan_id#891L, 160), true, [id=#327]\n", + " : : : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#922, loan_id#891L, 160), true, [id=#573]\n", " : : : : : : : : : +- GpuHashAggregate(keys=[quarter#922, loan_id#891L], functions=[partial_gpumax(current_loan_delinquency_status#901), partial_gpumin(delinquency_30#664), partial_gpumin(delinquency_90#665), partial_gpumin(delinquency_180#666)]), filters=ArrayBuffer(None, None, None, None))\n", - " : : : : : : : : : +- GpuProject [quarter#922, loan_id#891L, current_loan_delinquency_status#901, CASE WHEN (current_loan_delinquency_status#901 >= 1) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_30#664, CASE WHEN (current_loan_delinquency_status#901 >= 3) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_90#665, CASE WHEN (current_loan_delinquency_status#901 >= 6) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as 
timestamp) as date) END AS delinquency_180#666]\n", - " : : : : : : : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : : : : +- GpuFilter (gpuisnotnull(loan_id#891L) AND gpuisnotnull(quarter#922))\n", - " : : : : : : : : : +- GpuFileGpuScan parquet [loan_id#891L,monthly_reporting_period#892,current_loan_delinquency_status#901,quarter#922] Batched: true, DataFilters: [isnotnull(loan_id#891L), isnotnull(quarter#922)], Format: Parquet, Location: InMemoryFileIndex[gs://dataproc-nv-demo/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct= 1) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_30#664, CASE WHEN (current_loan_delinquency_status#901 >= 3) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_90#665, CASE WHEN (current_loan_delinquency_status#901 >= 6) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_180#666]\n", + " : : : : : : : : : +- *(3) GpuColumnarToRow false\n", + " : : : : : : : : : +- GpuFilter (gpuisnotnull(loan_id#891L) AND gpuisnotnull(quarter#922))\n", + " : : : : : : : : : +- GpuFileGpuScan parquet [loan_id#891L,monthly_reporting_period#892,current_loan_delinquency_status#901,quarter#922] Batched: true, DataFilters: [isnotnull(loan_id#891L), isnotnull(quarter#922)], Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", + " : : : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#261, loan_id#230L, cast(timestamp_year#1106 as bigint), cast(timestamp_month#1070 as bigint), 160), true, [id=#558]\n", + " : : : : : : : : : +- GpuRowToColumnar TargetSize(536870912)\n", + " : : : : : 
: : : : +- *(1) Project [loan_id#230L, mod_flag#241, quarter#261, month(cast(cast(unix_timestamp(monthly_reporting_period#231, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_month#1070, year(cast(cast(unix_timestamp(monthly_reporting_period#231, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_year#1106]\n", + " : : : : : : : : : +- *(1) GpuColumnarToRow false\n", + " : : : : : : : : : +- GpuFilter ((NOT quarter#261 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#230L)) AND gpuisnotnull(quarter#261))\n", + " : : : : : : : : : +- GpuFileGpuScan parquet [loan_id#230L,monthly_reporting_period#231,mod_flag#241,quarter#261] Batched: true, DataFilters: [NOT quarter#261 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#230L), isnotnull(quarter#261)], Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", " : : : : : : : : +- GpuCoalesceBatches RequireSingleBatch\n", - " : : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, timestamp_year#996L, timestamp_month#1025L, 160), true, [id=#339]\n", + " : : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, timestamp_year#996L, timestamp_month#1025L, 160), true, [id=#585]\n", " : : : : : : : : +- GpuHashAggregate(keys=[quarter#1173, loan_id#1142L, josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, month_y#937], functions=[]), filters=ArrayBuffer())\n", " : : : : : : : : +- GpuHashAggregate(keys=[quarter#1173, loan_id#1142L, josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, month_y#937], functions=[]), filters=ArrayBuffer())\n", " : : : : : : : : +- GpuProject [quarter#1173, 
FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) AS josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, loan_id#1142L, month_y#937]\n", @@ -703,73 +750,76 @@ " : : : : : : : : +- GpuFilter (gpuisnotnull(CASE WHEN ((((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast(month_y#937 as bigint)) pmod 12) = 0) THEN 12 ELSE (((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast(month_y#937 as bigint)) pmod 12) END) AND gpuisnotnull(FLOOR((cast(((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast((month_y#937 - 1) as bigint)) as double) / 12.0))))\n", " : : : : : : : : +- GpuGenerate false, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [loan_id#1142L, quarter#1173, timestamp_month#740, timestamp_year#776, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686], [month_y#937]\n", " : : : : : : : : +- GpuProject [loan_id#1142L, quarter#1173, timestamp_month#740, timestamp_year#776, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686]\n", - " : : : : : : : : +- GpuShuffledHashJoin [quarter#1173, loan_id#1142L], [quarter#922, loan_id#891L], LeftOuter, BuildRight\n", + " : : : : : : : : +- GpuShuffledHashJoin [quarter#1173, loan_id#1142L], [quarter#922, loan_id#891L], LeftOuter, BuildRight, false\n", " : : : : : : : : :- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, 160), true, [id=#322]\n", - " : : : : : : : : : +- GpuProject [quarter#1173, loan_id#1142L, gpumonth(cast(cast(gpuunixtimestamp(monthly_reporting_period#1143, MM/dd/yyyy, %m/%d/%Y, 
None) as timestamp) as date)) AS timestamp_month#740, gpuyear(cast(cast(gpuunixtimestamp(monthly_reporting_period#1143, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date)) AS timestamp_year#776]\n", - " : : : : : : : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : : : : +- GpuFilter ((NOT quarter#1173 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#1142L)) AND gpuisnotnull(quarter#1173))\n", - " : : : : : : : : : +- GpuFileGpuScan parquet [loan_id#1142L,monthly_reporting_period#1143,quarter#1173] Batched: true, DataFilters: [NOT quarter#1173 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#1142L), isnotnull(quarter#1..., Format: Parquet, Location: InMemoryFileIndex[gs://dataproc-nv-demo/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", + " : : : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, 160), true, [id=#565]\n", + " : : : : : : : : : +- GpuRowToColumnar TargetSize(536870912)\n", + " : : : : : : : : : +- *(2) Project [quarter#1173, loan_id#1142L, month(cast(cast(unix_timestamp(monthly_reporting_period#1143, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_month#740, year(cast(cast(unix_timestamp(monthly_reporting_period#1143, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_year#776]\n", + " : : : : : : : : : +- *(2) GpuColumnarToRow false\n", + " : : : : : : : : : +- GpuFilter ((NOT quarter#1173 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#1142L)) AND gpuisnotnull(quarter#1173))\n", + " : : : : : : : : : +- GpuFileGpuScan parquet [loan_id#1142L,monthly_reporting_period#1143,quarter#1173] Batched: true, DataFilters: [NOT quarter#1173 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#1142L), isnotnull(quarter#1..., Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], 
PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", " : : : : : : : : +- GpuCoalesceBatches RequireSingleBatch\n", " : : : : : : : : +- GpuHashAggregate(keys=[quarter#922, loan_id#891L], functions=[gpumax(current_loan_delinquency_status#901), gpumin(delinquency_30#664), gpumin(delinquency_90#665), gpumin(delinquency_180#666)]), filters=ArrayBuffer(None, None, None, None))\n", " : : : : : : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#922, loan_id#891L, 160), true, [id=#327]\n", + " : : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#922, loan_id#891L, 160), true, [id=#573]\n", " : : : : : : : : +- GpuHashAggregate(keys=[quarter#922, loan_id#891L], functions=[partial_gpumax(current_loan_delinquency_status#901), partial_gpumin(delinquency_30#664), partial_gpumin(delinquency_90#665), partial_gpumin(delinquency_180#666)]), filters=ArrayBuffer(None, None, None, None))\n", - " : : : : : : : : +- GpuProject [quarter#922, loan_id#891L, current_loan_delinquency_status#901, CASE WHEN (current_loan_delinquency_status#901 >= 1) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_30#664, CASE WHEN (current_loan_delinquency_status#901 >= 3) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_90#665, CASE WHEN (current_loan_delinquency_status#901 >= 6) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_180#666]\n", - " : : : : : : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : : : +- GpuFilter (gpuisnotnull(loan_id#891L) AND gpuisnotnull(quarter#922))\n", - " : : : : : : : : +- GpuFileGpuScan parquet 
[loan_id#891L,monthly_reporting_period#892,current_loan_delinquency_status#901,quarter#922] Batched: true, DataFilters: [isnotnull(loan_id#891L), isnotnull(quarter#922)], Format: Parquet, Location: InMemoryFileIndex[gs://dataproc-nv-demo/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct= 1) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_30#664, CASE WHEN (current_loan_delinquency_status#901 >= 3) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_90#665, CASE WHEN (current_loan_delinquency_status#901 >= 6) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_180#666]\n", + " : : : : : : : : +- *(3) GpuColumnarToRow false\n", + " : : : : : : : : +- GpuFilter (gpuisnotnull(loan_id#891L) AND gpuisnotnull(quarter#922))\n", + " : : : : : : : : +- GpuFileGpuScan parquet [loan_id#891L,monthly_reporting_period#892,current_loan_delinquency_status#901,quarter#922] Batched: true, DataFilters: [isnotnull(loan_id#891L), isnotnull(quarter#922)], Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", + " : : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#261, loan_id#230L, cast(timestamp_year#1106 as bigint), cast(timestamp_month#1070 as bigint), 160), true, [id=#558]\n", + " : : : : : : : : +- GpuRowToColumnar TargetSize(536870912)\n", + " : : : : : : : : +- *(1) Project [loan_id#230L, mod_flag#241, quarter#261, month(cast(cast(unix_timestamp(monthly_reporting_period#231, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_month#1070, year(cast(cast(unix_timestamp(monthly_reporting_period#231, MM/dd/yyyy, 
Some(Etc/UTC)) as timestamp) as date)) AS timestamp_year#1106]\n", + " : : : : : : : : +- *(1) GpuColumnarToRow false\n", + " : : : : : : : : +- GpuFilter ((NOT quarter#261 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#230L)) AND gpuisnotnull(quarter#261))\n", + " : : : : : : : : +- GpuFileGpuScan parquet [loan_id#230L,monthly_reporting_period#231,mod_flag#241,quarter#261] Batched: true, DataFilters: [NOT quarter#261 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#230L), isnotnull(quarter#261)], Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", " : : : : : : : +- GpuCoalesceBatches RequireSingleBatch\n", - " : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, timestamp_year#996L, timestamp_month#1025L, 160), true, [id=#339]\n", + " : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, timestamp_year#996L, timestamp_month#1025L, 160), true, [id=#585]\n", " : : : : : : : +- GpuHashAggregate(keys=[quarter#1173, loan_id#1142L, josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, month_y#937], functions=[]), filters=ArrayBuffer())\n", " : : : : : : : +- GpuHashAggregate(keys=[quarter#1173, loan_id#1142L, josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, month_y#937], functions=[]), filters=ArrayBuffer())\n", " : : : : : : : +- GpuProject [quarter#1173, FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) AS josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, loan_id#1142L, month_y#937]\n", @@ -777,73 +827,76 @@ " : : : : : : : +- GpuFilter 
(gpuisnotnull(CASE WHEN ((((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast(month_y#937 as bigint)) pmod 12) = 0) THEN 12 ELSE (((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast(month_y#937 as bigint)) pmod 12) END) AND gpuisnotnull(FLOOR((cast(((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast((month_y#937 - 1) as bigint)) as double) / 12.0))))\n", " : : : : : : : +- GpuGenerate false, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [loan_id#1142L, quarter#1173, timestamp_month#740, timestamp_year#776, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686], [month_y#937]\n", " : : : : : : : +- GpuProject [loan_id#1142L, quarter#1173, timestamp_month#740, timestamp_year#776, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686]\n", - " : : : : : : : +- GpuShuffledHashJoin [quarter#1173, loan_id#1142L], [quarter#922, loan_id#891L], LeftOuter, BuildRight\n", + " : : : : : : : +- GpuShuffledHashJoin [quarter#1173, loan_id#1142L], [quarter#922, loan_id#891L], LeftOuter, BuildRight, false\n", " : : : : : : : :- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, 160), true, [id=#322]\n", - " : : : : : : : : +- GpuProject [quarter#1173, loan_id#1142L, gpumonth(cast(cast(gpuunixtimestamp(monthly_reporting_period#1143, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date)) AS timestamp_month#740, gpuyear(cast(cast(gpuunixtimestamp(monthly_reporting_period#1143, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date)) AS timestamp_year#776]\n", - " : : : : : : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : : : +- GpuFilter ((NOT quarter#1173 INSET 
(2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#1142L)) AND gpuisnotnull(quarter#1173))\n", - " : : : : : : : : +- GpuFileGpuScan parquet [loan_id#1142L,monthly_reporting_period#1143,quarter#1173] Batched: true, DataFilters: [NOT quarter#1173 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#1142L), isnotnull(quarter#1..., Format: Parquet, Location: InMemoryFileIndex[gs://dataproc-nv-demo/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", + " : : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, 160), true, [id=#565]\n", + " : : : : : : : : +- GpuRowToColumnar TargetSize(536870912)\n", + " : : : : : : : : +- *(2) Project [quarter#1173, loan_id#1142L, month(cast(cast(unix_timestamp(monthly_reporting_period#1143, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_month#740, year(cast(cast(unix_timestamp(monthly_reporting_period#1143, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_year#776]\n", + " : : : : : : : : +- *(2) GpuColumnarToRow false\n", + " : : : : : : : : +- GpuFilter ((NOT quarter#1173 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#1142L)) AND gpuisnotnull(quarter#1173))\n", + " : : : : : : : : +- GpuFileGpuScan parquet [loan_id#1142L,monthly_reporting_period#1143,quarter#1173] Batched: true, DataFilters: [NOT quarter#1173 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#1142L), isnotnull(quarter#1..., Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", " : : : : : : : +- GpuCoalesceBatches RequireSingleBatch\n", " : : : : : : : +- GpuHashAggregate(keys=[quarter#922, loan_id#891L], functions=[gpumax(current_loan_delinquency_status#901), 
gpumin(delinquency_30#664), gpumin(delinquency_90#665), gpumin(delinquency_180#666)]), filters=ArrayBuffer(None, None, None, None))\n", " : : : : : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#922, loan_id#891L, 160), true, [id=#327]\n", + " : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#922, loan_id#891L, 160), true, [id=#573]\n", " : : : : : : : +- GpuHashAggregate(keys=[quarter#922, loan_id#891L], functions=[partial_gpumax(current_loan_delinquency_status#901), partial_gpumin(delinquency_30#664), partial_gpumin(delinquency_90#665), partial_gpumin(delinquency_180#666)]), filters=ArrayBuffer(None, None, None, None))\n", - " : : : : : : : +- GpuProject [quarter#922, loan_id#891L, current_loan_delinquency_status#901, CASE WHEN (current_loan_delinquency_status#901 >= 1) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_30#664, CASE WHEN (current_loan_delinquency_status#901 >= 3) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_90#665, CASE WHEN (current_loan_delinquency_status#901 >= 6) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_180#666]\n", - " : : : : : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : : +- GpuFilter (gpuisnotnull(loan_id#891L) AND gpuisnotnull(quarter#922))\n", - " : : : : : : : +- GpuFileGpuScan parquet [loan_id#891L,monthly_reporting_period#892,current_loan_delinquency_status#901,quarter#922] Batched: true, DataFilters: [isnotnull(loan_id#891L), isnotnull(quarter#922)], Format: Parquet, Location: InMemoryFileIndex[gs://dataproc-nv-demo/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct= 1) THEN 
cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_30#664, CASE WHEN (current_loan_delinquency_status#901 >= 3) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_90#665, CASE WHEN (current_loan_delinquency_status#901 >= 6) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_180#666]\n", + " : : : : : : : +- *(3) GpuColumnarToRow false\n", + " : : : : : : : +- GpuFilter (gpuisnotnull(loan_id#891L) AND gpuisnotnull(quarter#922))\n", + " : : : : : : : +- GpuFileGpuScan parquet [loan_id#891L,monthly_reporting_period#892,current_loan_delinquency_status#901,quarter#922] Batched: true, DataFilters: [isnotnull(loan_id#891L), isnotnull(quarter#922)], Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", + " : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#261, loan_id#230L, cast(timestamp_year#1106 as bigint), cast(timestamp_month#1070 as bigint), 160), true, [id=#558]\n", + " : : : : : : : +- GpuRowToColumnar TargetSize(536870912)\n", + " : : : : : : : +- *(1) Project [loan_id#230L, mod_flag#241, quarter#261, month(cast(cast(unix_timestamp(monthly_reporting_period#231, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_month#1070, year(cast(cast(unix_timestamp(monthly_reporting_period#231, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_year#1106]\n", + " : : : : : : : +- *(1) GpuColumnarToRow false\n", + " : : : : : : : +- GpuFilter ((NOT quarter#261 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#230L)) AND gpuisnotnull(quarter#261))\n", + " : : : : : : : +- GpuFileGpuScan parquet [loan_id#230L,monthly_reporting_period#231,mod_flag#241,quarter#261] 
Batched: true, DataFilters: [NOT quarter#261 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#230L), isnotnull(quarter#261)], Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", " : : : : : : +- GpuCoalesceBatches RequireSingleBatch\n", - " : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, timestamp_year#996L, timestamp_month#1025L, 160), true, [id=#339]\n", + " : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, timestamp_year#996L, timestamp_month#1025L, 160), true, [id=#585]\n", " : : : : : : +- GpuHashAggregate(keys=[quarter#1173, loan_id#1142L, josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, month_y#937], functions=[]), filters=ArrayBuffer())\n", " : : : : : : +- GpuHashAggregate(keys=[quarter#1173, loan_id#1142L, josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, month_y#937], functions=[]), filters=ArrayBuffer())\n", " : : : : : : +- GpuProject [quarter#1173, FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) AS josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, loan_id#1142L, month_y#937]\n", @@ -851,73 +904,76 @@ " : : : : : : +- GpuFilter (gpuisnotnull(CASE WHEN ((((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast(month_y#937 as bigint)) pmod 12) = 0) THEN 12 ELSE (((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast(month_y#937 as bigint)) pmod 12) END) AND 
gpuisnotnull(FLOOR((cast(((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast((month_y#937 - 1) as bigint)) as double) / 12.0))))\n", " : : : : : : +- GpuGenerate false, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [loan_id#1142L, quarter#1173, timestamp_month#740, timestamp_year#776, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686], [month_y#937]\n", " : : : : : : +- GpuProject [loan_id#1142L, quarter#1173, timestamp_month#740, timestamp_year#776, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686]\n", - " : : : : : : +- GpuShuffledHashJoin [quarter#1173, loan_id#1142L], [quarter#922, loan_id#891L], LeftOuter, BuildRight\n", + " : : : : : : +- GpuShuffledHashJoin [quarter#1173, loan_id#1142L], [quarter#922, loan_id#891L], LeftOuter, BuildRight, false\n", " : : : : : : :- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, 160), true, [id=#322]\n", - " : : : : : : : +- GpuProject [quarter#1173, loan_id#1142L, gpumonth(cast(cast(gpuunixtimestamp(monthly_reporting_period#1143, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date)) AS timestamp_month#740, gpuyear(cast(cast(gpuunixtimestamp(monthly_reporting_period#1143, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date)) AS timestamp_year#776]\n", - " : : : : : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : : +- GpuFilter ((NOT quarter#1173 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#1142L)) AND gpuisnotnull(quarter#1173))\n", - " : : : : : : : +- GpuFileGpuScan parquet [loan_id#1142L,monthly_reporting_period#1143,quarter#1173] Batched: true, DataFilters: [NOT quarter#1173 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#1142L), isnotnull(quarter#1..., Format: Parquet, Location: 
InMemoryFileIndex[gs://dataproc-nv-demo/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", + " : : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, 160), true, [id=#565]\n", + " : : : : : : : +- GpuRowToColumnar TargetSize(536870912)\n", + " : : : : : : : +- *(2) Project [quarter#1173, loan_id#1142L, month(cast(cast(unix_timestamp(monthly_reporting_period#1143, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_month#740, year(cast(cast(unix_timestamp(monthly_reporting_period#1143, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_year#776]\n", + " : : : : : : : +- *(2) GpuColumnarToRow false\n", + " : : : : : : : +- GpuFilter ((NOT quarter#1173 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#1142L)) AND gpuisnotnull(quarter#1173))\n", + " : : : : : : : +- GpuFileGpuScan parquet [loan_id#1142L,monthly_reporting_period#1143,quarter#1173] Batched: true, DataFilters: [NOT quarter#1173 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#1142L), isnotnull(quarter#1..., Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", " : : : : : : +- GpuCoalesceBatches RequireSingleBatch\n", " : : : : : : +- GpuHashAggregate(keys=[quarter#922, loan_id#891L], functions=[gpumax(current_loan_delinquency_status#901), gpumin(delinquency_30#664), gpumin(delinquency_90#665), gpumin(delinquency_180#666)]), filters=ArrayBuffer(None, None, None, None))\n", " : : : : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#922, loan_id#891L, 160), true, [id=#327]\n", + " : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#922, 
loan_id#891L, 160), true, [id=#573]\n", " : : : : : : +- GpuHashAggregate(keys=[quarter#922, loan_id#891L], functions=[partial_gpumax(current_loan_delinquency_status#901), partial_gpumin(delinquency_30#664), partial_gpumin(delinquency_90#665), partial_gpumin(delinquency_180#666)]), filters=ArrayBuffer(None, None, None, None))\n", - " : : : : : : +- GpuProject [quarter#922, loan_id#891L, current_loan_delinquency_status#901, CASE WHEN (current_loan_delinquency_status#901 >= 1) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_30#664, CASE WHEN (current_loan_delinquency_status#901 >= 3) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_90#665, CASE WHEN (current_loan_delinquency_status#901 >= 6) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_180#666]\n", - " : : : : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : +- GpuFilter (gpuisnotnull(loan_id#891L) AND gpuisnotnull(quarter#922))\n", - " : : : : : : +- GpuFileGpuScan parquet [loan_id#891L,monthly_reporting_period#892,current_loan_delinquency_status#901,quarter#922] Batched: true, DataFilters: [isnotnull(loan_id#891L), isnotnull(quarter#922)], Format: Parquet, Location: InMemoryFileIndex[gs://dataproc-nv-demo/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct= 1) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_30#664, CASE WHEN (current_loan_delinquency_status#901 >= 3) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_90#665, CASE WHEN (current_loan_delinquency_status#901 >= 6) THEN 
cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_180#666]\n", + " : : : : : : +- *(3) GpuColumnarToRow false\n", + " : : : : : : +- GpuFilter (gpuisnotnull(loan_id#891L) AND gpuisnotnull(quarter#922))\n", + " : : : : : : +- GpuFileGpuScan parquet [loan_id#891L,monthly_reporting_period#892,current_loan_delinquency_status#901,quarter#922] Batched: true, DataFilters: [isnotnull(loan_id#891L), isnotnull(quarter#922)], Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", + " : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#261, loan_id#230L, cast(timestamp_year#1106 as bigint), cast(timestamp_month#1070 as bigint), 160), true, [id=#558]\n", + " : : : : : : +- GpuRowToColumnar TargetSize(536870912)\n", + " : : : : : : +- *(1) Project [loan_id#230L, mod_flag#241, quarter#261, month(cast(cast(unix_timestamp(monthly_reporting_period#231, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_month#1070, year(cast(cast(unix_timestamp(monthly_reporting_period#231, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_year#1106]\n", + " : : : : : : +- *(1) GpuColumnarToRow false\n", + " : : : : : : +- GpuFilter ((NOT quarter#261 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#230L)) AND gpuisnotnull(quarter#261))\n", + " : : : : : : +- GpuFileGpuScan parquet [loan_id#230L,monthly_reporting_period#231,mod_flag#241,quarter#261] Batched: true, DataFilters: [NOT quarter#261 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#230L), isnotnull(quarter#261)], Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", " : : : : : +- 
GpuCoalesceBatches RequireSingleBatch\n", - " : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, timestamp_year#996L, timestamp_month#1025L, 160), true, [id=#339]\n", + " : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, timestamp_year#996L, timestamp_month#1025L, 160), true, [id=#585]\n", " : : : : : +- GpuHashAggregate(keys=[quarter#1173, loan_id#1142L, josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, month_y#937], functions=[]), filters=ArrayBuffer())\n", " : : : : : +- GpuHashAggregate(keys=[quarter#1173, loan_id#1142L, josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, month_y#937], functions=[]), filters=ArrayBuffer())\n", " : : : : : +- GpuProject [quarter#1173, FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) AS josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, loan_id#1142L, month_y#937]\n", @@ -925,73 +981,76 @@ " : : : : : +- GpuFilter (gpuisnotnull(CASE WHEN ((((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast(month_y#937 as bigint)) pmod 12) = 0) THEN 12 ELSE (((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast(month_y#937 as bigint)) pmod 12) END) AND gpuisnotnull(FLOOR((cast(((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast((month_y#937 - 1) as bigint)) as double) / 12.0))))\n", " : : : : : +- GpuGenerate false, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [loan_id#1142L, quarter#1173, timestamp_month#740, timestamp_year#776, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, 
delinquency_90#684, delinquency_180#686], [month_y#937]\n", " : : : : : +- GpuProject [loan_id#1142L, quarter#1173, timestamp_month#740, timestamp_year#776, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686]\n", - " : : : : : +- GpuShuffledHashJoin [quarter#1173, loan_id#1142L], [quarter#922, loan_id#891L], LeftOuter, BuildRight\n", + " : : : : : +- GpuShuffledHashJoin [quarter#1173, loan_id#1142L], [quarter#922, loan_id#891L], LeftOuter, BuildRight, false\n", " : : : : : :- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, 160), true, [id=#322]\n", - " : : : : : : +- GpuProject [quarter#1173, loan_id#1142L, gpumonth(cast(cast(gpuunixtimestamp(monthly_reporting_period#1143, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date)) AS timestamp_month#740, gpuyear(cast(cast(gpuunixtimestamp(monthly_reporting_period#1143, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date)) AS timestamp_year#776]\n", - " : : : : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : : +- GpuFilter ((NOT quarter#1173 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#1142L)) AND gpuisnotnull(quarter#1173))\n", - " : : : : : : +- GpuFileGpuScan parquet [loan_id#1142L,monthly_reporting_period#1143,quarter#1173] Batched: true, DataFilters: [NOT quarter#1173 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#1142L), isnotnull(quarter#1..., Format: Parquet, Location: InMemoryFileIndex[gs://dataproc-nv-demo/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", + " : : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, 160), true, [id=#565]\n", + " : : : : : : +- GpuRowToColumnar TargetSize(536870912)\n", + " : : : : : : +- *(2) Project [quarter#1173, loan_id#1142L, 
month(cast(cast(unix_timestamp(monthly_reporting_period#1143, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_month#740, year(cast(cast(unix_timestamp(monthly_reporting_period#1143, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_year#776]\n", + " : : : : : : +- *(2) GpuColumnarToRow false\n", + " : : : : : : +- GpuFilter ((NOT quarter#1173 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#1142L)) AND gpuisnotnull(quarter#1173))\n", + " : : : : : : +- GpuFileGpuScan parquet [loan_id#1142L,monthly_reporting_period#1143,quarter#1173] Batched: true, DataFilters: [NOT quarter#1173 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#1142L), isnotnull(quarter#1..., Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", " : : : : : +- GpuCoalesceBatches RequireSingleBatch\n", " : : : : : +- GpuHashAggregate(keys=[quarter#922, loan_id#891L], functions=[gpumax(current_loan_delinquency_status#901), gpumin(delinquency_30#664), gpumin(delinquency_90#665), gpumin(delinquency_180#666)]), filters=ArrayBuffer(None, None, None, None))\n", " : : : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#922, loan_id#891L, 160), true, [id=#327]\n", + " : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#922, loan_id#891L, 160), true, [id=#573]\n", " : : : : : +- GpuHashAggregate(keys=[quarter#922, loan_id#891L], functions=[partial_gpumax(current_loan_delinquency_status#901), partial_gpumin(delinquency_30#664), partial_gpumin(delinquency_90#665), partial_gpumin(delinquency_180#666)]), filters=ArrayBuffer(None, None, None, None))\n", - " : : : : : +- GpuProject [quarter#922, loan_id#891L, current_loan_delinquency_status#901, CASE WHEN (current_loan_delinquency_status#901 >= 1) THEN 
cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_30#664, CASE WHEN (current_loan_delinquency_status#901 >= 3) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_90#665, CASE WHEN (current_loan_delinquency_status#901 >= 6) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_180#666]\n", - " : : : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : +- GpuFilter (gpuisnotnull(loan_id#891L) AND gpuisnotnull(quarter#922))\n", - " : : : : : +- GpuFileGpuScan parquet [loan_id#891L,monthly_reporting_period#892,current_loan_delinquency_status#901,quarter#922] Batched: true, DataFilters: [isnotnull(loan_id#891L), isnotnull(quarter#922)], Format: Parquet, Location: InMemoryFileIndex[gs://dataproc-nv-demo/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct= 1) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_30#664, CASE WHEN (current_loan_delinquency_status#901 >= 3) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_90#665, CASE WHEN (current_loan_delinquency_status#901 >= 6) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_180#666]\n", + " : : : : : +- *(3) GpuColumnarToRow false\n", + " : : : : : +- GpuFilter (gpuisnotnull(loan_id#891L) AND gpuisnotnull(quarter#922))\n", + " : : : : : +- GpuFileGpuScan parquet [loan_id#891L,monthly_reporting_period#892,current_loan_delinquency_status#901,quarter#922] Batched: true, DataFilters: [isnotnull(loan_id#891L), isnotnull(quarter#922)], Format: Parquet, Location: 
InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", + " : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#261, loan_id#230L, cast(timestamp_year#1106 as bigint), cast(timestamp_month#1070 as bigint), 160), true, [id=#558]\n", + " : : : : : +- GpuRowToColumnar TargetSize(536870912)\n", + " : : : : : +- *(1) Project [loan_id#230L, mod_flag#241, quarter#261, month(cast(cast(unix_timestamp(monthly_reporting_period#231, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_month#1070, year(cast(cast(unix_timestamp(monthly_reporting_period#231, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_year#1106]\n", + " : : : : : +- *(1) GpuColumnarToRow false\n", + " : : : : : +- GpuFilter ((NOT quarter#261 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#230L)) AND gpuisnotnull(quarter#261))\n", + " : : : : : +- GpuFileGpuScan parquet [loan_id#230L,monthly_reporting_period#231,mod_flag#241,quarter#261] Batched: true, DataFilters: [NOT quarter#261 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#230L), isnotnull(quarter#261)], Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", " : : : : +- GpuCoalesceBatches RequireSingleBatch\n", - " : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, timestamp_year#996L, timestamp_month#1025L, 160), true, [id=#339]\n", + " : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, timestamp_year#996L, timestamp_month#1025L, 160), true, [id=#585]\n", " : : : : +- GpuHashAggregate(keys=[quarter#1173, loan_id#1142L, josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, month_y#937], 
functions=[]), filters=ArrayBuffer())\n", " : : : : +- GpuHashAggregate(keys=[quarter#1173, loan_id#1142L, josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, month_y#937], functions=[]), filters=ArrayBuffer())\n", " : : : : +- GpuProject [quarter#1173, FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) AS josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, loan_id#1142L, month_y#937]\n", @@ -999,73 +1058,76 @@ " : : : : +- GpuFilter (gpuisnotnull(CASE WHEN ((((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast(month_y#937 as bigint)) pmod 12) = 0) THEN 12 ELSE (((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast(month_y#937 as bigint)) pmod 12) END) AND gpuisnotnull(FLOOR((cast(((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast((month_y#937 - 1) as bigint)) as double) / 12.0))))\n", " : : : : +- GpuGenerate false, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [loan_id#1142L, quarter#1173, timestamp_month#740, timestamp_year#776, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686], [month_y#937]\n", " : : : : +- GpuProject [loan_id#1142L, quarter#1173, timestamp_month#740, timestamp_year#776, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686]\n", - " : : : : +- GpuShuffledHashJoin [quarter#1173, loan_id#1142L], [quarter#922, loan_id#891L], LeftOuter, BuildRight\n", + " : : : : +- GpuShuffledHashJoin [quarter#1173, loan_id#1142L], [quarter#922, loan_id#891L], LeftOuter, BuildRight, false\n", " : : : : :- GpuCoalesceBatches TargetSize(536870912)\n", 
- " : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, 160), true, [id=#322]\n", - " : : : : : +- GpuProject [quarter#1173, loan_id#1142L, gpumonth(cast(cast(gpuunixtimestamp(monthly_reporting_period#1143, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date)) AS timestamp_month#740, gpuyear(cast(cast(gpuunixtimestamp(monthly_reporting_period#1143, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date)) AS timestamp_year#776]\n", - " : : : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : : +- GpuFilter ((NOT quarter#1173 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#1142L)) AND gpuisnotnull(quarter#1173))\n", - " : : : : : +- GpuFileGpuScan parquet [loan_id#1142L,monthly_reporting_period#1143,quarter#1173] Batched: true, DataFilters: [NOT quarter#1173 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#1142L), isnotnull(quarter#1..., Format: Parquet, Location: InMemoryFileIndex[gs://dataproc-nv-demo/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", + " : : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, 160), true, [id=#565]\n", + " : : : : : +- GpuRowToColumnar TargetSize(536870912)\n", + " : : : : : +- *(2) Project [quarter#1173, loan_id#1142L, month(cast(cast(unix_timestamp(monthly_reporting_period#1143, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_month#740, year(cast(cast(unix_timestamp(monthly_reporting_period#1143, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_year#776]\n", + " : : : : : +- *(2) GpuColumnarToRow false\n", + " : : : : : +- GpuFilter ((NOT quarter#1173 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#1142L)) AND gpuisnotnull(quarter#1173))\n", + " : : : : : +- GpuFileGpuScan parquet [loan_id#1142L,monthly_reporting_period#1143,quarter#1173] Batched: true, DataFilters: [NOT quarter#1173 
IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#1142L), isnotnull(quarter#1..., Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", " : : : : +- GpuCoalesceBatches RequireSingleBatch\n", " : : : : +- GpuHashAggregate(keys=[quarter#922, loan_id#891L], functions=[gpumax(current_loan_delinquency_status#901), gpumin(delinquency_30#664), gpumin(delinquency_90#665), gpumin(delinquency_180#666)]), filters=ArrayBuffer(None, None, None, None))\n", " : : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#922, loan_id#891L, 160), true, [id=#327]\n", + " : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#922, loan_id#891L, 160), true, [id=#573]\n", " : : : : +- GpuHashAggregate(keys=[quarter#922, loan_id#891L], functions=[partial_gpumax(current_loan_delinquency_status#901), partial_gpumin(delinquency_30#664), partial_gpumin(delinquency_90#665), partial_gpumin(delinquency_180#666)]), filters=ArrayBuffer(None, None, None, None))\n", - " : : : : +- GpuProject [quarter#922, loan_id#891L, current_loan_delinquency_status#901, CASE WHEN (current_loan_delinquency_status#901 >= 1) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_30#664, CASE WHEN (current_loan_delinquency_status#901 >= 3) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_90#665, CASE WHEN (current_loan_delinquency_status#901 >= 6) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_180#666]\n", - " : : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : +- GpuFilter (gpuisnotnull(loan_id#891L) AND 
gpuisnotnull(quarter#922))\n", - " : : : : +- GpuFileGpuScan parquet [loan_id#891L,monthly_reporting_period#892,current_loan_delinquency_status#901,quarter#922] Batched: true, DataFilters: [isnotnull(loan_id#891L), isnotnull(quarter#922)], Format: Parquet, Location: InMemoryFileIndex[gs://dataproc-nv-demo/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct= 1) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_30#664, CASE WHEN (current_loan_delinquency_status#901 >= 3) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_90#665, CASE WHEN (current_loan_delinquency_status#901 >= 6) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_180#666]\n", + " : : : : +- *(3) GpuColumnarToRow false\n", + " : : : : +- GpuFilter (gpuisnotnull(loan_id#891L) AND gpuisnotnull(quarter#922))\n", + " : : : : +- GpuFileGpuScan parquet [loan_id#891L,monthly_reporting_period#892,current_loan_delinquency_status#901,quarter#922] Batched: true, DataFilters: [isnotnull(loan_id#891L), isnotnull(quarter#922)], Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", + " : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#261, loan_id#230L, cast(timestamp_year#1106 as bigint), cast(timestamp_month#1070 as bigint), 160), true, [id=#558]\n", + " : : : : +- GpuRowToColumnar TargetSize(536870912)\n", + " : : : : +- *(1) Project [loan_id#230L, mod_flag#241, quarter#261, month(cast(cast(unix_timestamp(monthly_reporting_period#231, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_month#1070, 
year(cast(cast(unix_timestamp(monthly_reporting_period#231, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_year#1106]\n", + " : : : : +- *(1) GpuColumnarToRow false\n", + " : : : : +- GpuFilter ((NOT quarter#261 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#230L)) AND gpuisnotnull(quarter#261))\n", + " : : : : +- GpuFileGpuScan parquet [loan_id#230L,monthly_reporting_period#231,mod_flag#241,quarter#261] Batched: true, DataFilters: [NOT quarter#261 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#230L), isnotnull(quarter#261)], Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", " : : : +- GpuCoalesceBatches RequireSingleBatch\n", - " : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, timestamp_year#996L, timestamp_month#1025L, 160), true, [id=#339]\n", + " : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, timestamp_year#996L, timestamp_month#1025L, 160), true, [id=#585]\n", " : : : +- GpuHashAggregate(keys=[quarter#1173, loan_id#1142L, josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, month_y#937], functions=[]), filters=ArrayBuffer())\n", " : : : +- GpuHashAggregate(keys=[quarter#1173, loan_id#1142L, josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, month_y#937], functions=[]), filters=ArrayBuffer())\n", " : : : +- GpuProject [quarter#1173, FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) AS josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, loan_id#1142L, month_y#937]\n", @@ -1073,73 +1135,76 @@ " : : : +- GpuFilter 
(gpuisnotnull(CASE WHEN ((((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast(month_y#937 as bigint)) pmod 12) = 0) THEN 12 ELSE (((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast(month_y#937 as bigint)) pmod 12) END) AND gpuisnotnull(FLOOR((cast(((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast((month_y#937 - 1) as bigint)) as double) / 12.0))))\n", " : : : +- GpuGenerate false, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [loan_id#1142L, quarter#1173, timestamp_month#740, timestamp_year#776, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686], [month_y#937]\n", " : : : +- GpuProject [loan_id#1142L, quarter#1173, timestamp_month#740, timestamp_year#776, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686]\n", - " : : : +- GpuShuffledHashJoin [quarter#1173, loan_id#1142L], [quarter#922, loan_id#891L], LeftOuter, BuildRight\n", + " : : : +- GpuShuffledHashJoin [quarter#1173, loan_id#1142L], [quarter#922, loan_id#891L], LeftOuter, BuildRight, false\n", " : : : :- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, 160), true, [id=#322]\n", - " : : : : +- GpuProject [quarter#1173, loan_id#1142L, gpumonth(cast(cast(gpuunixtimestamp(monthly_reporting_period#1143, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date)) AS timestamp_month#740, gpuyear(cast(cast(gpuunixtimestamp(monthly_reporting_period#1143, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date)) AS timestamp_year#776]\n", - " : : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : : +- GpuFilter ((NOT quarter#1173 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#1142L)) AND 
gpuisnotnull(quarter#1173))\n", - " : : : : +- GpuFileGpuScan parquet [loan_id#1142L,monthly_reporting_period#1143,quarter#1173] Batched: true, DataFilters: [NOT quarter#1173 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#1142L), isnotnull(quarter#1..., Format: Parquet, Location: InMemoryFileIndex[gs://dataproc-nv-demo/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", + " : : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, 160), true, [id=#565]\n", + " : : : : +- GpuRowToColumnar TargetSize(536870912)\n", + " : : : : +- *(2) Project [quarter#1173, loan_id#1142L, month(cast(cast(unix_timestamp(monthly_reporting_period#1143, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_month#740, year(cast(cast(unix_timestamp(monthly_reporting_period#1143, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_year#776]\n", + " : : : : +- *(2) GpuColumnarToRow false\n", + " : : : : +- GpuFilter ((NOT quarter#1173 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#1142L)) AND gpuisnotnull(quarter#1173))\n", + " : : : : +- GpuFileGpuScan parquet [loan_id#1142L,monthly_reporting_period#1143,quarter#1173] Batched: true, DataFilters: [NOT quarter#1173 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#1142L), isnotnull(quarter#1..., Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", " : : : +- GpuCoalesceBatches RequireSingleBatch\n", " : : : +- GpuHashAggregate(keys=[quarter#922, loan_id#891L], functions=[gpumax(current_loan_delinquency_status#901), gpumin(delinquency_30#664), gpumin(delinquency_90#665), gpumin(delinquency_180#666)]), filters=ArrayBuffer(None, None, None, None))\n", " : : : +- 
GpuCoalesceBatches TargetSize(536870912)\n", - " : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#922, loan_id#891L, 160), true, [id=#327]\n", + " : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#922, loan_id#891L, 160), true, [id=#573]\n", " : : : +- GpuHashAggregate(keys=[quarter#922, loan_id#891L], functions=[partial_gpumax(current_loan_delinquency_status#901), partial_gpumin(delinquency_30#664), partial_gpumin(delinquency_90#665), partial_gpumin(delinquency_180#666)]), filters=ArrayBuffer(None, None, None, None))\n", - " : : : +- GpuProject [quarter#922, loan_id#891L, current_loan_delinquency_status#901, CASE WHEN (current_loan_delinquency_status#901 >= 1) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_30#664, CASE WHEN (current_loan_delinquency_status#901 >= 3) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_90#665, CASE WHEN (current_loan_delinquency_status#901 >= 6) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_180#666]\n", - " : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : +- GpuFilter (gpuisnotnull(loan_id#891L) AND gpuisnotnull(quarter#922))\n", - " : : : +- GpuFileGpuScan parquet [loan_id#891L,monthly_reporting_period#892,current_loan_delinquency_status#901,quarter#922] Batched: true, DataFilters: [isnotnull(loan_id#891L), isnotnull(quarter#922)], Format: Parquet, Location: InMemoryFileIndex[gs://dataproc-nv-demo/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct= 1) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_30#664, CASE WHEN (current_loan_delinquency_status#901 >= 3) THEN 
cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_90#665, CASE WHEN (current_loan_delinquency_status#901 >= 6) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_180#666]\n", + " : : : +- *(3) GpuColumnarToRow false\n", + " : : : +- GpuFilter (gpuisnotnull(loan_id#891L) AND gpuisnotnull(quarter#922))\n", + " : : : +- GpuFileGpuScan parquet [loan_id#891L,monthly_reporting_period#892,current_loan_delinquency_status#901,quarter#922] Batched: true, DataFilters: [isnotnull(loan_id#891L), isnotnull(quarter#922)], Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", + " : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#261, loan_id#230L, cast(timestamp_year#1106 as bigint), cast(timestamp_month#1070 as bigint), 160), true, [id=#558]\n", + " : : : +- GpuRowToColumnar TargetSize(536870912)\n", + " : : : +- *(1) Project [loan_id#230L, mod_flag#241, quarter#261, month(cast(cast(unix_timestamp(monthly_reporting_period#231, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_month#1070, year(cast(cast(unix_timestamp(monthly_reporting_period#231, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_year#1106]\n", + " : : : +- *(1) GpuColumnarToRow false\n", + " : : : +- GpuFilter ((NOT quarter#261 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#230L)) AND gpuisnotnull(quarter#261))\n", + " : : : +- GpuFileGpuScan parquet [loan_id#230L,monthly_reporting_period#231,mod_flag#241,quarter#261] Batched: true, DataFilters: [NOT quarter#261 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#230L), isnotnull(quarter#261)], Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: 
[Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", " : : +- GpuCoalesceBatches RequireSingleBatch\n", - " : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, timestamp_year#996L, timestamp_month#1025L, 160), true, [id=#339]\n", + " : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, timestamp_year#996L, timestamp_month#1025L, 160), true, [id=#585]\n", " : : +- GpuHashAggregate(keys=[quarter#1173, loan_id#1142L, josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, month_y#937], functions=[]), filters=ArrayBuffer())\n", " : : +- GpuHashAggregate(keys=[quarter#1173, loan_id#1142L, josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, month_y#937], functions=[]), filters=ArrayBuffer())\n", " : : +- GpuProject [quarter#1173, FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) AS josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, loan_id#1142L, month_y#937]\n", @@ -1147,73 +1212,76 @@ " : : +- GpuFilter (gpuisnotnull(CASE WHEN ((((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast(month_y#937 as bigint)) pmod 12) = 0) THEN 12 ELSE (((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast(month_y#937 as bigint)) pmod 12) END) AND gpuisnotnull(FLOOR((cast(((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast((month_y#937 - 1) as bigint)) as double) / 12.0))))\n", " : : +- GpuGenerate false, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [loan_id#1142L, quarter#1173, timestamp_month#740, 
timestamp_year#776, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686], [month_y#937]\n", " : : +- GpuProject [loan_id#1142L, quarter#1173, timestamp_month#740, timestamp_year#776, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686]\n", - " : : +- GpuShuffledHashJoin [quarter#1173, loan_id#1142L], [quarter#922, loan_id#891L], LeftOuter, BuildRight\n", + " : : +- GpuShuffledHashJoin [quarter#1173, loan_id#1142L], [quarter#922, loan_id#891L], LeftOuter, BuildRight, false\n", " : : :- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, 160), true, [id=#322]\n", - " : : : +- GpuProject [quarter#1173, loan_id#1142L, gpumonth(cast(cast(gpuunixtimestamp(monthly_reporting_period#1143, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date)) AS timestamp_month#740, gpuyear(cast(cast(gpuunixtimestamp(monthly_reporting_period#1143, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date)) AS timestamp_year#776]\n", - " : : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : : +- GpuFilter ((NOT quarter#1173 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#1142L)) AND gpuisnotnull(quarter#1173))\n", - " : : : +- GpuFileGpuScan parquet [loan_id#1142L,monthly_reporting_period#1143,quarter#1173] Batched: true, DataFilters: [NOT quarter#1173 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#1142L), isnotnull(quarter#1..., Format: Parquet, Location: InMemoryFileIndex[gs://dataproc-nv-demo/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", + " : : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, 160), true, [id=#565]\n", + " : : : +- GpuRowToColumnar TargetSize(536870912)\n", + " : : : +- *(2) Project [quarter#1173, loan_id#1142L, 
month(cast(cast(unix_timestamp(monthly_reporting_period#1143, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_month#740, year(cast(cast(unix_timestamp(monthly_reporting_period#1143, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_year#776]\n", + " : : : +- *(2) GpuColumnarToRow false\n", + " : : : +- GpuFilter ((NOT quarter#1173 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#1142L)) AND gpuisnotnull(quarter#1173))\n", + " : : : +- GpuFileGpuScan parquet [loan_id#1142L,monthly_reporting_period#1143,quarter#1173] Batched: true, DataFilters: [NOT quarter#1173 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#1142L), isnotnull(quarter#1..., Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", " : : +- GpuCoalesceBatches RequireSingleBatch\n", " : : +- GpuHashAggregate(keys=[quarter#922, loan_id#891L], functions=[gpumax(current_loan_delinquency_status#901), gpumin(delinquency_30#664), gpumin(delinquency_90#665), gpumin(delinquency_180#666)]), filters=ArrayBuffer(None, None, None, None))\n", " : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : +- GpuColumnarExchange gpuhashpartitioning(quarter#922, loan_id#891L, 160), true, [id=#327]\n", + " : : +- GpuColumnarExchange gpuhashpartitioning(quarter#922, loan_id#891L, 160), true, [id=#573]\n", " : : +- GpuHashAggregate(keys=[quarter#922, loan_id#891L], functions=[partial_gpumax(current_loan_delinquency_status#901), partial_gpumin(delinquency_30#664), partial_gpumin(delinquency_90#665), partial_gpumin(delinquency_180#666)]), filters=ArrayBuffer(None, None, None, None))\n", - " : : +- GpuProject [quarter#922, loan_id#891L, current_loan_delinquency_status#901, CASE WHEN (current_loan_delinquency_status#901 >= 1) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, 
%m/%d/%Y, None) as timestamp) as date) END AS delinquency_30#664, CASE WHEN (current_loan_delinquency_status#901 >= 3) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_90#665, CASE WHEN (current_loan_delinquency_status#901 >= 6) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_180#666]\n", - " : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : +- GpuFilter (gpuisnotnull(loan_id#891L) AND gpuisnotnull(quarter#922))\n", - " : : +- GpuFileGpuScan parquet [loan_id#891L,monthly_reporting_period#892,current_loan_delinquency_status#901,quarter#922] Batched: true, DataFilters: [isnotnull(loan_id#891L), isnotnull(quarter#922)], Format: Parquet, Location: InMemoryFileIndex[gs://dataproc-nv-demo/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct= 1) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_30#664, CASE WHEN (current_loan_delinquency_status#901 >= 3) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_90#665, CASE WHEN (current_loan_delinquency_status#901 >= 6) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_180#666]\n", + " : : +- *(3) GpuColumnarToRow false\n", + " : : +- GpuFilter (gpuisnotnull(loan_id#891L) AND gpuisnotnull(quarter#922))\n", + " : : +- GpuFileGpuScan parquet [loan_id#891L,monthly_reporting_period#892,current_loan_delinquency_status#901,quarter#922] Batched: true, DataFilters: [isnotnull(loan_id#891L), isnotnull(quarter#922)], Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: 
[IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", + " : : +- GpuColumnarExchange gpuhashpartitioning(quarter#261, loan_id#230L, cast(timestamp_year#1106 as bigint), cast(timestamp_month#1070 as bigint), 160), true, [id=#558]\n", + " : : +- GpuRowToColumnar TargetSize(536870912)\n", + " : : +- *(1) Project [loan_id#230L, mod_flag#241, quarter#261, month(cast(cast(unix_timestamp(monthly_reporting_period#231, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_month#1070, year(cast(cast(unix_timestamp(monthly_reporting_period#231, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_year#1106]\n", + " : : +- *(1) GpuColumnarToRow false\n", + " : : +- GpuFilter ((NOT quarter#261 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#230L)) AND gpuisnotnull(quarter#261))\n", + " : : +- GpuFileGpuScan parquet [loan_id#230L,monthly_reporting_period#231,mod_flag#241,quarter#261] Batched: true, DataFilters: [NOT quarter#261 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#230L), isnotnull(quarter#261)], Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", " : +- GpuCoalesceBatches RequireSingleBatch\n", - " : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, timestamp_year#996L, timestamp_month#1025L, 160), true, [id=#339]\n", + " : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, timestamp_year#996L, timestamp_month#1025L, 160), true, [id=#585]\n", " : +- GpuHashAggregate(keys=[quarter#1173, loan_id#1142L, josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, month_y#937], functions=[]), filters=ArrayBuffer())\n", " : +- GpuHashAggregate(keys=[quarter#1173, loan_id#1142L, josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, 
delinquency_30#682, delinquency_90#684, delinquency_180#686, month_y#937], functions=[]), filters=ArrayBuffer())\n", " : +- GpuProject [quarter#1173, FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) AS josh_mody_n#953L, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686, loan_id#1142L, month_y#937]\n", @@ -1221,38 +1289,40 @@ " : +- GpuFilter (gpuisnotnull(CASE WHEN ((((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast(month_y#937 as bigint)) pmod 12) = 0) THEN 12 ELSE (((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast(month_y#937 as bigint)) pmod 12) END) AND gpuisnotnull(FLOOR((cast(((24000 + (FLOOR((cast(((((timestamp_year#776 * 12) + timestamp_month#740) - 24000) - month_y#937) as double) / 12.0)) * 12)) + cast((month_y#937 - 1) as bigint)) as double) / 12.0))))\n", " : +- GpuGenerate false, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [loan_id#1142L, quarter#1173, timestamp_month#740, timestamp_year#776, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686], [month_y#937]\n", " : +- GpuProject [loan_id#1142L, quarter#1173, timestamp_month#740, timestamp_year#776, ever_30#693, ever_90#694, ever_180#695, delinquency_30#682, delinquency_90#684, delinquency_180#686]\n", - " : +- GpuShuffledHashJoin [quarter#1173, loan_id#1142L], [quarter#922, loan_id#891L], LeftOuter, BuildRight\n", + " : +- GpuShuffledHashJoin [quarter#1173, loan_id#1142L], [quarter#922, loan_id#891L], LeftOuter, BuildRight, false\n", " : :- GpuCoalesceBatches TargetSize(536870912)\n", - " : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, 160), true, [id=#322]\n", - " : : +- GpuProject [quarter#1173, loan_id#1142L, 
gpumonth(cast(cast(gpuunixtimestamp(monthly_reporting_period#1143, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date)) AS timestamp_month#740, gpuyear(cast(cast(gpuunixtimestamp(monthly_reporting_period#1143, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date)) AS timestamp_year#776]\n", - " : : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : : +- GpuFilter ((NOT quarter#1173 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#1142L)) AND gpuisnotnull(quarter#1173))\n", - " : : +- GpuFileGpuScan parquet [loan_id#1142L,monthly_reporting_period#1143,quarter#1173] Batched: true, DataFilters: [NOT quarter#1173 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#1142L), isnotnull(quarter#1..., Format: Parquet, Location: InMemoryFileIndex[gs://dataproc-nv-demo/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", + " : : +- GpuColumnarExchange gpuhashpartitioning(quarter#1173, loan_id#1142L, 160), true, [id=#565]\n", + " : : +- GpuRowToColumnar TargetSize(536870912)\n", + " : : +- *(2) Project [quarter#1173, loan_id#1142L, month(cast(cast(unix_timestamp(monthly_reporting_period#1143, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_month#740, year(cast(cast(unix_timestamp(monthly_reporting_period#1143, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date)) AS timestamp_year#776]\n", + " : : +- *(2) GpuColumnarToRow false\n", + " : : +- GpuFilter ((NOT quarter#1173 INSET (2016Q1,2016Q2,2016Q3,2016Q4) AND gpuisnotnull(loan_id#1142L)) AND gpuisnotnull(quarter#1173))\n", + " : : +- GpuFileGpuScan parquet [loan_id#1142L,monthly_reporting_period#1143,quarter#1173] Batched: true, DataFilters: [NOT quarter#1173 IN (2016Q1,2016Q2,2016Q3,2016Q4), isnotnull(loan_id#1142L), isnotnull(quarter#1..., Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: 
[Not(In(quarter, [2016Q1,2016Q2,2016Q3,2016Q4])), IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct\n", " : +- GpuCoalesceBatches RequireSingleBatch\n", " : +- GpuHashAggregate(keys=[quarter#922, loan_id#891L], functions=[gpumax(current_loan_delinquency_status#901), gpumin(delinquency_30#664), gpumin(delinquency_90#665), gpumin(delinquency_180#666)]), filters=ArrayBuffer(None, None, None, None))\n", " : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : +- GpuColumnarExchange gpuhashpartitioning(quarter#922, loan_id#891L, 160), true, [id=#327]\n", + " : +- GpuColumnarExchange gpuhashpartitioning(quarter#922, loan_id#891L, 160), true, [id=#573]\n", " : +- GpuHashAggregate(keys=[quarter#922, loan_id#891L], functions=[partial_gpumax(current_loan_delinquency_status#901), partial_gpumin(delinquency_30#664), partial_gpumin(delinquency_90#665), partial_gpumin(delinquency_180#666)]), filters=ArrayBuffer(None, None, None, None))\n", - " : +- GpuProject [quarter#922, loan_id#891L, current_loan_delinquency_status#901, CASE WHEN (current_loan_delinquency_status#901 >= 1) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_30#664, CASE WHEN (current_loan_delinquency_status#901 >= 3) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_90#665, CASE WHEN (current_loan_delinquency_status#901 >= 6) THEN cast(cast(gpuunixtimestamp(monthly_reporting_period#892, MM/dd/yyyy, %m/%d/%Y, None) as timestamp) as date) END AS delinquency_180#666]\n", - " : +- GpuCoalesceBatches TargetSize(536870912)\n", - " : +- GpuFilter (gpuisnotnull(loan_id#891L) AND gpuisnotnull(quarter#922))\n", - " : +- GpuFileGpuScan parquet [loan_id#891L,monthly_reporting_period#892,current_loan_delinquency_status#901,quarter#922] Batched: true, DataFilters: [isnotnull(loan_id#891L), isnotnull(quarter#922)], Format: Parquet, Location: 
InMemoryFileIndex[gs://dataproc-nv-demo/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct= 1) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_30#664, CASE WHEN (current_loan_delinquency_status#901 >= 3) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_90#665, CASE WHEN (current_loan_delinquency_status#901 >= 6) THEN cast(cast(unix_timestamp(monthly_reporting_period#892, MM/dd/yyyy, Some(Etc/UTC)) as timestamp) as date) END AS delinquency_180#666]\n", + " : +- *(3) GpuColumnarToRow false\n", + " : +- GpuFilter (gpuisnotnull(loan_id#891L) AND gpuisnotnull(quarter#922))\n", + " : +- GpuFileGpuScan parquet [loan_id#891L,monthly_reporting_period#892,current_loan_delinquency_status#901,quarter#922] Batched: true, DataFilters: [isnotnull(loan_id#891L), isnotnull(quarter#922)], Format: Parquet, Location: InMemoryFileIndex[gs://akshita-snap/mortgage_parquet_gpu/perf], PartitionFilters: [], PushedFilters: [IsNotNull(loan_id), IsNotNull(quarter)], ReadSchema: struct