Add remaining non-wrapped functions #767

Open
timsaucer opened this issue Jul 22, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@timsaucer
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

We still have a few modules whose contents do not yet have wrappers, namely datafusion.object_store and datafusion.common. Additionally, datafusion.substrait references LogicalPlan, which is not exposed.

It is also worth reviewing the excellent PR #751 to see how it fits in with the updated Python wrappers.

Describe the solution you'd like
Add the missing wrappers and validate the namespace corrections.

Describe alternatives you've considered
None

Additional context
This is follow-on work to #750.

timsaucer added the enhancement label Jul 22, 2024

@Michael-J-Ward
Contributor

Question: Have you ever used, or do you know of, a tool to run queries over Python / Rust codebases?

It would be nice if we could generate a concrete report of what is not exposed.

@timsaucer
Contributor Author

timsaucer commented Jul 26, 2024

No, but I did write a small script to check and this is what I see missing:

Missing attribute. Object name: datafusion, Attribute name: Catalog
Missing attribute. Object name: datafusion, Attribute name: Database
Missing attribute. Object name: datafusion, Attribute name: ExecutionPlan
Missing attribute. Object name: datafusion, Attribute name: LogicalPlan
Missing attribute. Object name: datafusion, Attribute name: RecordBatch
Missing attribute. Object name: datafusion, Attribute name: RecordBatchStream
Missing attribute. Object name: datafusion, Attribute name: Table
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: runtime
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: Catalog
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: Database
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: Table
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: AggregateUDF
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: LogicalPlan
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: ExecutionPlan
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: RecordBatch
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: RecordBatchStream
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: common
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: expr
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: functions
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: object_store
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: substrait
Missing attribute. Object name: datafusion.common, Attribute name: DFSchema
Missing attribute. Object name: datafusion.common, Attribute name: DataType
Missing attribute. Object name: datafusion.common, Attribute name: DataTypeMap
Missing attribute. Object name: datafusion.common, Attribute name: NullTreatment
Missing attribute. Object name: datafusion.common, Attribute name: PythonType
Missing attribute. Object name: datafusion.common, Attribute name: RexType
Missing attribute. Object name: datafusion.common, Attribute name: SqlFunction
Missing attribute. Object name: datafusion.common, Attribute name: SqlSchema
Missing attribute. Object name: datafusion.common, Attribute name: SqlStatistics
Missing attribute. Object name: datafusion.common, Attribute name: SqlTable
Missing attribute. Object name: datafusion.common, Attribute name: SqlType
Missing attribute. Object name: datafusion.common, Attribute name: SqlView
Missing attribute. Object name: datafusion.common, Attribute name: __all__
Missing attribute. Object name: datafusion.expr, Attribute name: EmptyRelation
Missing attribute. Object name: Expr, Attribute name: __radd__
Missing attribute. Object name: Expr, Attribute name: __rand__
Missing attribute. Object name: Expr, Attribute name: __rmod__
Missing attribute. Object name: Expr, Attribute name: __rmul__
Missing attribute. Object name: Expr, Attribute name: __ror__
Missing attribute. Object name: Expr, Attribute name: __rsub__
Missing attribute. Object name: Expr, Attribute name: __rtruediv__
Missing attribute. Object name: datafusion.expr, Attribute name: IsNull
Missing attribute. Object name: datafusion.expr, Attribute name: Unnest
Missing attribute. Object name: datafusion.expr, Attribute name: Window
Missing attribute. Object name: datafusion.expr, Attribute name: __all__
Missing attribute. Object name: datafusion.functions, Attribute name: __all__
Missing attribute. Object name: datafusion.object_store, Attribute name: AmazonS3
Missing attribute. Object name: datafusion.object_store, Attribute name: GoogleCloud
Missing attribute. Object name: datafusion.object_store, Attribute name: LocalFileSystem
Missing attribute. Object name: datafusion.object_store, Attribute name: MicrosoftAzure
Missing attribute. Object name: datafusion.object_store, Attribute name: __all__
Missing attribute. Object name: datafusion, Attribute name: runtime
Missing attribute. Object name: datafusion.substrait, Attribute name: __all__

Code to generate:

import datafusion
import datafusion.functions
import datafusion.object_store
import datafusion.substrait


def missing_exports(internal_obj, wrapped_obj):
    """Print every name exposed by internal_obj that wrapped_obj lacks."""
    wrapped_names = dir(wrapped_obj)  # computed once instead of per attribute
    for attr in dir(internal_obj):
        if attr not in wrapped_names:
            print(f"Missing attribute. Object name: {wrapped_obj.__name__}, Attribute name: {attr}")
            continue
        internal_attr = getattr(internal_obj, attr)
        wrapped_attr = getattr(wrapped_obj, attr)
        if internal_attr is not None and wrapped_attr is None:
            print(f"Attribute exists but is None. Object name: {wrapped_obj.__name__}, Attribute name: {attr}")
            continue  # nothing further can be compared against None

        # Skip attributes that refer back to the object or its type.
        if attr in ["__self__", "__class__"]:
            continue
        if isinstance(internal_attr, list):
            # For lists such as __all__, report entries the wrapper's list lacks.
            for val in internal_attr:
                if val not in wrapped_attr:
                    print(f"Missing value in list. Object name: {wrapped_obj.__name__}, Attribute name: {attr}, Value: {val}")
        elif hasattr(internal_attr, '__dict__'):
            # Recurse into submodules and classes.
            missing_exports(internal_attr, wrapped_attr)


missing_exports(datafusion._internal, datafusion)

I can work on adding these tomorrow morning and I can also add this code as a unit test.
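
For the unit test idea, one possible shape is sketched below. It is not the test that was added to the repository. It assumes the check is rewritten to return its findings instead of printing them, so that a test can assert on the result. The names collect_missing and test_stand_in_modules are hypothetical, and the stand-in modules only imitate datafusion._internal and datafusion so the sketch runs without the package installed.

```python
import types


def collect_missing(internal_obj, wrapped_obj, _seen=None):
    """Return (object name, attribute name) pairs that exist on
    internal_obj but not on wrapped_obj, recursing into modules and classes."""
    seen = set() if _seen is None else _seen
    if id(internal_obj) in seen:  # guard against reference cycles
        return []
    seen.add(id(internal_obj))

    name = getattr(wrapped_obj, "__name__", repr(wrapped_obj))
    wrapped_names = set(dir(wrapped_obj))
    missing = []
    for attr in dir(internal_obj):
        if attr in ("__self__", "__class__"):
            continue
        if attr not in wrapped_names:
            missing.append((name, attr))
            continue
        internal_attr = getattr(internal_obj, attr)
        # Assumption: only modules and classes are worth descending into.
        if isinstance(internal_attr, (types.ModuleType, type)):
            wrapped_attr = getattr(wrapped_obj, attr)
            missing.extend(collect_missing(internal_attr, wrapped_attr, seen))
    return missing


def test_stand_in_modules():
    # Hypothetical stand-ins for datafusion._internal and datafusion.
    internal = types.ModuleType("internal")
    internal.Catalog = type("Catalog", (), {})
    internal.Table = type("Table", (), {})
    wrapped = types.ModuleType("wrapped")
    wrapped.Table = type("Table", (), {})
    assert collect_missing(internal, wrapped) == [("wrapped", "Catalog")]
```

In the real test the call would presumably be collect_missing(datafusion._internal, datafusion), with the assertion that the returned list is empty or contains only known exceptions.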

@timsaucer
Contributor Author

FWIW I don't know if all of these need to be exported. It's probably worth looking through each one.
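
One way to record the outcome of that review would be an explicit allowlist of names that are deliberately left unwrapped, so that a check only flags the rest. The sketch below is only an illustration: INTENTIONALLY_UNWRAPPED and unexpected_missing are hypothetical names, and the single allowlist entry is a placeholder rather than a decision about what datafusion should export.

```python
# Hypothetical allowlist of (object name, attribute name) pairs that are
# deliberately not re-exported. The entry below is a placeholder only.
INTENTIONALLY_UNWRAPPED = {
    ("datafusion", "runtime"),
}


def unexpected_missing(missing, allowlist=INTENTIONALLY_UNWRAPPED):
    """Drop allowlisted pairs and return the remainder in sorted order."""
    return sorted(set(missing) - set(allowlist))


# Example input shaped like the report above, reduced to two entries.
report = [("datafusion", "runtime"), ("datafusion", "Catalog")]
print(unexpected_missing(report))
```

Keeping the exceptions in one place makes each "not exported on purpose" decision visible in code review.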
