Internal error: The "regex_replace" function can only accept strings #7345

Closed
JayjeetAtGithub opened this issue Aug 21, 2023 · 3 comments · Fixed by #7840
Labels: bug, good first issue

Comments

JayjeetAtGithub (Contributor) commented Aug 21, 2023

Describe the bug

When we run the query below on the ClickBench multi-file dataset,

SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k, AVG(length("Referer")) AS l, COUNT(*) AS c, MIN("Referer") FROM hits WHERE "Referer" <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;

we get this error:

Internal error: The "regex_replace" function can only accept strings.. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
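For context on what the query computes: with the back-reference `\1` as its replacement (as in the ClickBench version of this query), the pattern reduces each `Referer` URL to its host, dropping `http://` or `https://`, an optional leading `www.`, and everything from the first `/` after the host; values that do not match are returned unchanged. The sketch below is a hypothetical, std-only Rust approximation of that transformation (the name `host_of` is illustrative, not a DataFusion API). It assumes single-line input, since `.` in the regex does not match a newline. It only shows what the `k` grouping key contains; the bug itself concerns the column's type, not the pattern.

```rust
/// Hypothetical helper (not a DataFusion API): a std-only approximation of
/// REGEXP_REPLACE(url, '^https?://(?:www\.)?([^/]+)/.*$', '\1') for single-line input.
fn host_of(url: &str) -> String {
    // `^https?://`: the value must start with http:// or https://.
    let rest = match url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))
    {
        Some(r) => r,
        // No match: the value is returned unchanged.
        None => return url.to_string(),
    };
    // `(?:www\.)?` is optional: first try with "www." removed, then without
    // removing it (the regex backtracks if the host would otherwise be empty).
    let candidates = [rest.strip_prefix("www."), Some(rest)];
    for cand in candidates.iter().flatten() {
        // `([^/]+)/`: a non-empty run of non-'/' characters followed by '/'.
        // `.*$` then accepts the remainder of a single-line value.
        if let Some(idx) = cand.find('/') {
            if idx > 0 {
                return cand[..idx].to_string();
            }
        }
    }
    // No '/' after a non-empty host: no match, value unchanged.
    url.to_string()
}

fn main() {
    for url in [
        "https://www.example.com/path?q=1",
        "http://news.example.org/",
        "https://example.com",
    ] {
        println!("{url} -> {}", host_of(url));
    }
}
```

Note that a URL with no `/` after the host (the last example) does not match the pattern, so it becomes its own group key.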

To Reproduce

Download the data with:

 ./benchmarks/bench.sh data clickbench_partitioned

This creates a hits_multi directory containing the Parquet files.

Then run the query above, substituting it for {query}:

datafusion-cli -c "CREATE EXTERNAL TABLE hits STORED AS PARQUET LOCATION 'hits_multi';" "{query}"

Expected behavior

The query should run successfully without an error.

Additional context

DataFusion 29.0.0

JayjeetAtGithub added the bug label Aug 21, 2023
alamb (Contributor) commented Aug 21, 2023

This looks similar to #7039, which @jonahgao fixed by adding a coercion from binary to UTF8 for comparison. I think we could do something similar here.
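The coercion idea can be illustrated with a small, self-contained sketch. It assumes, as the comment implies, that the `Referer` column in the `hits_multi` Parquet files reaches `regexp_replace` as a binary type rather than a string type. The enum `ArgType` and the function `coerce_for_regexp_replace` are hypothetical stand-ins, not DataFusion's actual type-coercion API: binary arguments are mapped to the corresponding string type before the function's own type check, and other non-string types are still rejected.

```rust
/// Hypothetical, simplified stand-in for the Arrow data types relevant here.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArgType {
    Utf8,
    LargeUtf8,
    Binary,
    LargeBinary,
    Int64,
}

/// Hypothetical coercion rule: binary arguments are treated as strings of the
/// matching offset size; every other non-string type is still rejected.
fn coerce_for_regexp_replace(arg: ArgType) -> Result<ArgType, String> {
    match arg {
        // Already a string type: nothing to do.
        ArgType::Utf8 | ArgType::LargeUtf8 => Ok(arg),
        // Binary input (assumed here to be how `Referer` arrives).
        ArgType::Binary => Ok(ArgType::Utf8),
        ArgType::LargeBinary => Ok(ArgType::LargeUtf8),
        // Anything else keeps failing, mirroring the error in this issue.
        other => Err(format!(
            "regexp_replace can only accept strings, got {other:?}"
        )),
    }
}

fn main() {
    for t in [
        ArgType::Utf8,
        ArgType::LargeUtf8,
        ArgType::Binary,
        ArgType::LargeBinary,
        ArgType::Int64,
    ] {
        println!("{t:?} -> {:?}", coerce_for_regexp_replace(t));
    }
}
```

A real coercion would insert a cast whose success depends on the bytes being valid UTF-8, so the actual fix also has to decide how invalid data is handled.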

alamb added the good first issue label Aug 21, 2023
alamb (Contributor) commented Aug 21, 2023

Marking as a good first issue, as there is a reproducer and I think the fix should be relatively straightforward.

Weijun-H (Member) commented:

I am glad to fix it.
