
Add support for AWS CUR 2.0 format #318

Open · stoiev opened this issue Feb 8, 2024 · 5 comments
@stoiev (Collaborator) commented Feb 8, 2024

Is your feature request related to a problem? Please describe.
No

Describe the solution you'd like
Support for AWS CUR 2.0 format.

Additional context
It's not clear whether AWS will deprecate the current CUR format, but they are classifying it as "Legacy" throughout the documentation and the web console.

For new integrations, it makes sense to start with a format that does not carry this label.

Some implementation ideas have already been raised here, but I think it's better to discuss them in a dedicated issue.

@varunmittal91 (Collaborator)

Hi @stoiev, do you have any ideas on how we can achieve this?

@oll-davidschneider (Contributor) commented Mar 7, 2024

@stoiev in case it is helpful, I was able to create a CUR 2.0 billing file using the new "Data Exports" functionality in AWS, which is exposed through the "bcm-data-exports" client in boto3. One thing I did find necessary: the tool expects a column named "product_region", so I renamed "product_region_code" via a manual QueryStatement definition. I am not sure how the specification expects region data to be represented, but from what I see in the PDF it doesn't seem to be particular about it, so hopefully that isn't an issue.

So, this code block generates CUR 2.0 billing files that I was able to pass to the tool without issue (apologies for any minor bugs caused by removing company-related things):

import boto3


account_id = 'XXXXXXXXXXXX'
# region must be us-east-1 for bcm-data-exports functionality
region_name = 'us-east-1'
# install CLI and use `aws configure`
profile_name = 'supply_your_own_creds'
session = boto3.session.Session(profile_name=profile_name, region_name=region_name)
# requires boto3 >= 1.29.7
client = session.client('bcm-data-exports')

sql_columns = ','.join([
	'bill_bill_type'
	,'bill_billing_entity'
	,'bill_billing_period_end_date'
	,'bill_billing_period_start_date'
	,'bill_invoice_id'
	,'bill_invoicing_entity'
	,'bill_payer_account_id'
	,'bill_payer_account_name'
	,'cost_category'
	,'discount'
	,'discount_bundled_discount'
	,'discount_total_discount'
	,'identity_line_item_id'
	,'identity_time_interval'
	,'line_item_availability_zone'
	,'line_item_blended_cost'
	,'line_item_blended_rate'
	,'line_item_currency_code'
	,'line_item_legal_entity'
	,'line_item_line_item_description'
	,'line_item_line_item_type'
	,'line_item_net_unblended_cost'
	,'line_item_net_unblended_rate'
	,'line_item_normalization_factor'
	,'line_item_normalized_usage_amount'
	,'line_item_operation'
	,'line_item_product_code'
	,'line_item_resource_id'
	,'line_item_tax_type'
	,'line_item_unblended_cost'
	,'line_item_unblended_rate'
	,'line_item_usage_account_id'
	,'line_item_usage_account_name'
	,'line_item_usage_amount'
	,'line_item_usage_end_date'
	,'line_item_usage_start_date'
	,'line_item_usage_type'
	,'pricing_currency'
	,'pricing_lease_contract_length'
	,'pricing_offering_class'
	,'pricing_public_on_demand_cost'
	,'pricing_public_on_demand_rate'
	,'pricing_purchase_option'
	,'pricing_rate_code'
	,'pricing_rate_id'
	,'pricing_term'
	,'pricing_unit'
	,'product'
	,'product_comment'
	,'product_fee_code'
	,'product_fee_description'
	,'product_from_location'
	,'product_from_location_type'
	,'product_from_region_code'
	,'product_instance_family'
	,'product_instance_type'
	,'product_instancesku'
	,'product_location'
	,'product_location_type'
	,'product_operation'
	,'product_pricing_unit'
	,'product_product_family'
	,'product_region_code AS product_region'
	,'product_servicecode'
	,'product_sku'
	,'product_to_location'
	,'product_to_location_type'
	,'product_to_region_code'
	,'product_usagetype'
	,'reservation_amortized_upfront_cost_for_usage'
	,'reservation_amortized_upfront_fee_for_billing_period'
	,'reservation_availability_zone'
	,'reservation_effective_cost'
	,'reservation_end_time'
	,'reservation_modification_status'
	,'reservation_net_amortized_upfront_cost_for_usage'
	,'reservation_net_amortized_upfront_fee_for_billing_period'
	,'reservation_net_effective_cost'
	,'reservation_net_recurring_fee_for_usage'
	,'reservation_net_unused_amortized_upfront_fee_for_billing_period'
	,'reservation_net_unused_recurring_fee'
	,'reservation_net_upfront_value'
	,'reservation_normalized_units_per_reservation'
	,'reservation_number_of_reservations'
	,'reservation_recurring_fee_for_usage'
	,'reservation_reservation_a_r_n'
	,'reservation_start_time'
	,'reservation_subscription_id'
	,'reservation_total_reserved_normalized_units'
	,'reservation_total_reserved_units'
	,'reservation_units_per_reservation'
	,'reservation_unused_amortized_upfront_fee_for_billing_period'
	,'reservation_unused_normalized_unit_quantity'
	,'reservation_unused_quantity'
	,'reservation_unused_recurring_fee'
	,'reservation_upfront_value'
	,'resource_tags'
	,'savings_plan_amortized_upfront_commitment_for_billing_period'
	,'savings_plan_end_time'
	,'savings_plan_instance_type_family'
	,'savings_plan_net_amortized_upfront_commitment_for_billing_period'
	,'savings_plan_net_recurring_commitment_for_billing_period'
	,'savings_plan_net_savings_plan_effective_cost'
	,'savings_plan_offering_type'
	,'savings_plan_payment_option'
	,'savings_plan_purchase_term'
	,'savings_plan_recurring_commitment_for_billing_period'
	,'savings_plan_region'
	,'savings_plan_savings_plan_a_r_n'
	,'savings_plan_savings_plan_effective_cost'
	,'savings_plan_savings_plan_rate'
	,'savings_plan_start_time'
	,'savings_plan_total_commitment_to_date'
	,'savings_plan_used_commitment'
])

data_export = {
        'DataQuery': {
            'QueryStatement': f"SELECT {sql_columns} FROM COST_AND_USAGE_REPORT",
            'TableConfigurations': {
                'COST_AND_USAGE_REPORT': {
                    'INCLUDE_RESOURCES': 'TRUE',
                    'TIME_GRANULARITY': 'HOURLY',
                    'INCLUDE_SPLIT_COST_ALLOCATION_DATA': 'FALSE',
                    'INCLUDE_MANUAL_DISCOUNT_COMPATIBILITY': 'FALSE'
                }
            }
        },
        'DestinationConfigurations': {
            'S3Destination': {
                'S3Bucket': f'awsbilling-{account_id}',
                'S3OutputConfigurations': {
                    'Compression': 'PARQUET',
                    'Format': 'PARQUET',
                    'OutputType': 'CUSTOM',
                    'Overwrite': 'CREATE_NEW_REPORT'
                },
                'S3Prefix': 'ResourceHourlyDetail',
                'S3Region': region_name
            }
        },
        'Name': 'CostUsageReport',
        'RefreshCadence': {
            'Frequency': 'SYNCHRONOUS'
        }
    }

# submit the export definition to create the Data Export
client.create_export(Export=data_export)

I pass the parquet files created by the Data Export to the tool as follows:
focus-converter convert --provider aws --data-path CostUsageReport-00001.snappy.parquet --data-format parquet --parquet-data-format file --export-path ./

@stoiev (Collaborator, Author) commented Mar 11, 2024

Thanks @davidschneider2W ! I've also noticed that product_region is the only column that differs between CUR 1 and CUR 2 in a way that breaks the current implementation of focus_converter.

Instead of changing the source data, I changed the focus_converter configuration file and got the same results (just as a proof of concept). But as workarounds go, your approach is much better.

The migration steps recommended by AWS are here. The SQL approach is the first one listed, and maybe it could be the recommended solution for focus_converter for now (by documenting it). The AWS doc does not mention the product_region rename, and I submitted feedback on that documentation to get more accurate info.
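
To make the SQL approach concrete: the rename amounts to a single aliased column in the Data Export's QueryStatement. A minimal sketch (column list abridged here; the full list is in the script above):

# Sketch only: aliasing product_region_code back to CUR 1's product_region
# name is the one change needed on top of AWS's migration SQL.
query_statement = (
    "SELECT line_item_usage_start_date, line_item_unblended_cost, "
    "product_region_code AS product_region "
    "FROM COST_AND_USAGE_REPORT"
)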

However, we will face issues with data that is represented as scalars (one column per value) in CUR 1 but becomes maps (one column holding all values) in CUR 2, as mentioned in the AWS docs:

You also need to update your data ingestion pipeline and your business intelligence tools to process the following new columns with nested key-values: product, resource_tags, cost_category, and discounts.

At least resource_tags and cost_category - two ways of tagging resources in AWS - will be needed by focus_converter users. I will try to map all the differences between the two formats and feed this issue with more concrete results.
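
As a rough illustration of the flattening this implies, something along these lines could expand those map columns back into per-key scalar columns (a sketch, not focus_converter code; it assumes pyarrow reads parquet map columns as lists of key/value tuples, and flatten_map_column is a hypothetical helper):

import pandas as pd

# Hypothetical helper: expand a CUR 2 map column (e.g. resource_tags) into
# one scalar column per key, mirroring CUR 1's one-column-per-value layout.
def flatten_map_column(frame, column, prefix):
    # each cell is a list of (key, value) tuples, or None when empty
    expanded = frame[column].apply(
        lambda kv: {f"{prefix}{k}": v for k, v in (kv if kv is not None else [])}
    )
    return frame.drop(columns=[column]).join(
        pd.DataFrame(expanded.tolist(), index=frame.index)
    )

df = pd.read_parquet("CostUsageReport-00001.snappy.parquet")
df = flatten_map_column(df, "resource_tags", "resource_tags_")
df = flatten_map_column(df, "cost_category", "cost_category_")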

But as far as I know, the differences between CUR 1 and CUR 2 seem to be minimal. Maybe it is worth investing in a generic conversion configuration that works for both formats.

PS: I'm saying that without knowing how extensible and generic the conversion configurations can be.

@varunmittal91 (Collaborator)

Hi @stoiev and @davidschneider2W, this is awesome. Could either of you please create a pull request with the proposed steps to convert the data?

@stoiev (Collaborator, Author) commented Mar 19, 2024

I've made progress on the idea of a conversion config compatible with both CUR 1 and CUR 2:

#333
