# [new]: get_file(destination_table, duckdb_sql) #165

Labels: `new-bigfunction` (Suggest a New BigFunction)

## Comments
In a 2nd phase, run:

```sql
CREATE SECRET secret1 (
    TYPE S3,
    KEY_ID 'AKIAIOSFODNN7EXAMPLE',
    SECRET 'wJalrXUtnFEMI/xxxxxx/bPxRfiCYEXAMPLEKEY',
    REGION 'us-east-1'
);
```

before executing the user's query:

```sql
SELECT *
FROM 's3://my-bucket/file.parquet';
```
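The two-phase flow above (register the secret, then run the user's query) can be sketched as follows. This is a minimal illustration, not the actual BigFunctions implementation: `run_user_query` and the recording connection are hypothetical stand-ins for a real DuckDB connection.

```python
def run_user_query(con, user_sql: str, key_id: str, secret: str, region: str):
    # Phase 1: register the S3 credentials as a DuckDB secret.
    con.execute(f"""CREATE SECRET secret1 (
        TYPE S3,
        KEY_ID '{key_id}',
        SECRET '{secret}',
        REGION '{region}'
    );""")
    # Phase 2: execute the user's query, which can now read s3:// URLs.
    return con.execute(user_sql)


class _RecordingCon:
    """Stand-in connection that records executed statements (illustration only)."""

    def __init__(self):
        self.statements = []

    def execute(self, sql: str):
        self.statements.append(sql.strip())
        return sql.strip()


con = _RecordingCon()
result = run_user_query(
    con,
    "SELECT * FROM 's3://my-bucket/file.parquet';",
    "AKIAIOSFODNN7EXAMPLE",
    "wJalrXUtnFEMI/xxxxxx/bPxRfiCYEXAMPLEKEY",
    "us-east-1",
)
```

With a real `duckdb.connect()` connection the same ordering applies: the `CREATE SECRET` statement must be executed on the same connection before the `s3://` query.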
Other option: configure some file types that we can query through DuckDB:

```sql
select bigfunction.eu.load_file(
  'your_project.your_dataset.random_sales', -- table_name
  'csv',                                    -- src_type
  "https://xxxxx",                          -- url
  '{}'                                      -- optional_args
);
```

Solution: build the SQL query using the ibis dataframe API:

```python
import ibis

con = ibis.duckdb.connect()
```

**Some simple csv**

```sql
select bigfunction.eu.load_file(
  'your_project.your_dataset.random_sales',
  'csv',
  "https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv",
  '{}'
);
```

will be interpreted as:

```python
achats = con.read_csv(
    "https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv"
)
print(achats)
print(achats.to_pandas())
```

**Crappy csv: codes_postaux**

```sql
select bigfunction.eu.load_file(
  'your_project.your_dataset.dim_french_postalcodes',
  'csv',
  "https://www.data.gouv.fr/fr/datasets/r/2f75293b-3ee5-4cb5-971b-93e754dc96ea",
  '''{
    "columns": {
      "code_commune_insee": "VARCHAR",
      "nom_commune_insee": "VARCHAR",
      "code_postal": "VARCHAR",
      "lb_acheminement": "VARCHAR",
      "ligne_5": "VARCHAR"
    },
    "delim": ";",
    "skip": 1
  }'''
);
```

will be interpreted as:

```python
codes_postaux = con.read_csv(
    "https://www.data.gouv.fr/fr/datasets/r/2f75293b-3ee5-4cb5-971b-93e754dc96ea",
    columns={
        "code_commune_insee": "VARCHAR",
        "nom_commune_insee": "VARCHAR",
        "code_postal": "VARCHAR",
        "lb_acheminement": "VARCHAR",
        "ligne_5": "VARCHAR",
    },
    delim=";",
    skip=1,
)
print(codes_postaux)
print(codes_postaux.to_pandas())
```

**Some json**

```sql
select bigfunction.eu.load_file(
  'your_project.your_dataset.dim_french_departements',
  'json',
  "https://geo.api.gouv.fr/departements?fields=nom,code,codeRegion,region",
  '{}'
);
```

will be interpreted as:

```python
dep = con.read_json(
    "https://geo.api.gouv.fr/departements?fields=nom,code,codeRegion,region"
)
print(dep)
print(dep.to_pandas())
```
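The examples above all follow the same translation: `src_type` selects an ibis reader (`read_csv` or `read_json`), and `optional_args` is a JSON string that becomes the reader's keyword arguments. A minimal sketch of that mapping, with a hypothetical `plan_read` helper (the function name and return shape are assumptions, not the actual implementation):

```python
import json


def plan_read(src_type: str, url: str, optional_args: str):
    """Translate load_file arguments into (reader method name, url, kwargs).

    The caller would then invoke: getattr(con, method)(url, **kwargs)
    on an ibis DuckDB connection.
    """
    readers = {"csv": "read_csv", "json": "read_json"}
    if src_type not in readers:
        raise ValueError(f"unsupported src_type: {src_type}")
    # An empty or blank optional_args string means "no extra arguments".
    kwargs = json.loads(optional_args) if optional_args.strip() else {}
    return readers[src_type], url, kwargs


method, url, kwargs = plan_read(
    "csv",
    "https://www.data.gouv.fr/fr/datasets/r/2f75293b-3ee5-4cb5-971b-93e754dc96ea",
    '{"delim": ";", "skip": 1}',
)
```

Note that strict `json.loads` rejects trailing commas, so the `optional_args` payload has to be valid JSON.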
Implemented in a new commit in PR #166 :)
unytics pushed a commit that referenced this issue on Oct 4, 2024.
- Check the idea has not already been suggested
- Edit the title above with a self-explanatory function name and argument names

**BigFunction Description as it would appear in the documentation**

With this function, we will use DuckDB's web reading capability to download data and let BigFunctions load it into a BigQuery `schema.table` of your choice. Function inspired by the recent `bigFunctions.get_csv`.

**Examples of (arguments, expected output) as they would appear in the documentation**

example: