Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(hubspot): provide default lists of properties #311

Merged
merged 7 commits into from
Dec 28, 2023

Conversation

IlyaFaer
Copy link
Contributor

Fixes #305

The error 414 URI too long happens, because we request the list of all the possible properties for an object here:

def _get_property_names(api_key: str, object_type: str) -> List[str]:
"""
Retrieve property names for a given entity from the HubSpot API.
Args:
entity: The entity name for which to retrieve property names.
Returns:
A list of property names.
Raises:
Exception: If an error occurs during the API request.
"""
properties = []
endpoint = f"/crm/v3/properties/{OBJECT_TYPE_PLURAL[object_type]}"
for page in fetch_data(endpoint, api_key):
properties.extend([prop["name"] for prop in page])
return properties

In some cases, object has so many properties, that their list joined by , becomes longer than 8000 of symbols, which causes 414.

Proposed solution:
Instead of reading all the possible properties (most of which are very low-level and unlikely needed, like number of the broker, which processed the record), we create a default list of properties for every endpoint. The default lists are copied from the Hubspot search (see https://developers.hubspot.com/docs/api/crm/search#crm-objects), so for Hubspot users it should be an expected behavior.

However, if user wants, they can initiate a resource with their own list of properties to read, e.g.:

load_info = pipeline.run(
            contacts(api_key="fake_key", include_history=True, props=["prop1", "prop2"])
        )

In case the user still wants to try to read all the possible properties, they can send props=None - in this case, we'll work the old way. The length of the properties list depends on many factors, so it can still work fine in many cases.

sources/hubspot/__init__.py Outdated Show resolved Hide resolved
Copy link
Contributor

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is good. There are a few issues we'll need to fix (some of them legacy)

  • the source binds the parameters to the resources companies(include_history=include_history) this prevents user from binding those params later
    what to do: please how facebook_ads uses inner resource functions and returns a list of unbound resources in the source:
    return campaigns, ads, ad_sets, ad_creatives, ads | leads
    we need to do the same so users can bind parameters etc themselves
  • it would be amazing if you could add another global flag like include_history: this flag would automatically add all user-defined properties to each call
  • we will need to improve our docs and probably add an example how to change the default properties (to the demo pipeline and tests)

@IlyaFaer
Copy link
Contributor Author

@rudolfix, okay, I think I got it. Pushed the code.

There is one hubspot test failing - it's not related, I saw these failures before started the task. After I update the docs, I can take a look at it, if needed. It seems like a test data cleanup error, not functionality.

@IlyaFaer IlyaFaer marked this pull request as ready for review December 25, 2023 08:49
sources/hubspot/__init__.py Outdated Show resolved Hide resolved
Copy link
Contributor

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good and stuff is in place! I have one more request? can we automatically add all the user-defined properties to all endpoints by default?

we could use a global flag to disable it. my take is that our users will love it.

here's how they are added for contacts: https://legacydocs.hubspot.com/docs/methods/contacts/v2/create_contacts_property

they will be listed in _get_property_names I think there's a flag hubspotDefined (but I'm just guessing here)

sources/hubspot/__init__.py Outdated Show resolved Hide resolved
Copy link
Contributor

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test is failing because something got added to the test account. you'll need to update the asserts in the test.

@IlyaFaer
Copy link
Contributor Author

@rudolfix, there is no easy way in Hubspot to read only custom properties, as in new versions of the API they dropped most of the properties filtering functionality. However, in the newest version, all the Hubspot driven properties have prefix hs_ (while for custom properties this prefix is restricted), so we can read all the properties and filter them before sending a request.

Also, I added constants, because all this properties stuff becomes too complex.

Copy link
Contributor

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK let's iterate once more :)

tests/hubspot/test_hubspot_source.py Outdated Show resolved Hide resolved
sources/hubspot/__init__.py Show resolved Hide resolved
sources/hubspot/__init__.py Outdated Show resolved Hide resolved
rudolfix
rudolfix previously approved these changes Dec 27, 2023
Copy link
Contributor

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@IlyaFaer this is really good now!
@AstrakhantsevaAA and @burnash please take a look. we do not try to fetch all possible fields from CRM - just certain defaults and all custom properties

we can merge that at any moment,

Copy link
Collaborator

@AstrakhantsevaAA AstrakhantsevaAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it! I have only one question (see companies resource).
Also I would replace all these function with dlt.resource() to create them dynamically, but we can do it later or keep it like this at all.

sources/hubspot/__init__.py Outdated Show resolved Hide resolved
@AstrakhantsevaAA
Copy link
Collaborator

And don't forget to update the documentation here: dlt-hub/dlt#857

@IlyaFaer
Copy link
Contributor Author

I think it's better to merge it already. It fixes an error - errors should not stay in code for long time, especially considering it's a widespread error and it can't be walked around.

Other changes can be done later.

@AstrakhantsevaAA AstrakhantsevaAA merged commit ffa3ed0 into master Dec 28, 2023
14 checks passed
@AstrakhantsevaAA AstrakhantsevaAA deleted the hubspot_414 branch December 28, 2023 10:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error in HubSpot Pipeline Execution: HTTPError 414 Client Error: URI Too Long
3 participants