Loaders and Exporters

Lenz edited this page Feb 7, 2021 · 8 revisions

Loader and Exporter Classes

The loaders and exporters in bconv are classes with a shared interface. They are exposed in the top-level dictionaries bconv.LOADERS, bconv.FETCHERS and bconv.EXPORTERS.

For example, the following call

>>> doc = bconv.load(source, fmt=X, **options)

is equivalent to:

>>> loader = bconv.LOADERS[X](**options)
>>> doc = loader.load_one(source)

For many use cases, the top-level functions bconv.load[s], bconv.fetch and bconv.dump[s] are more convenient. In some contexts, a reusable loader or exporter can be more succinct, as it encapsulates the options:

from pathlib import Path

import bconv

# Configure the loader once, then reuse it for every file.
loader = bconv.LOADERS['bioc_xml'](byte_offsets=False)
for path in Path('train').glob('*.xml'):
    for doc in loader.iter_documents(path):
        ...  # process the training documents
for path in Path('test').glob('*.xml'):
    coll = loader.collection(path, id=path.stem)
    ...  # process the test collections

Loaders/Fetchers

All concrete loader and fetcher types extend one of the three base classes DocLoader, CollLoader and DocIterator. As a common interface, all loaders have a load_one method:

DocLoader.load_one(source: str|Path|IO, id: int|str) -> bconv.Document

Load one file into a Document object.

CollLoader.load_one(source: str|Path|IO, id: int|str) -> bconv.Collection
DocIterator.load_one(source: str|Path|IO, id: int|str) -> bconv.Collection

Load one file into a Collection object.

Additionally, the following methods are available:

DocLoader.document(source: str|Path|IO, id: int|str) -> bconv.Document

Load one file into a Document object.

CollLoader.collection(source: str|Path|IO, id: int|str) -> bconv.Collection

Load one file into a Collection object.

CollLoader.iter_documents(source: str|Path|IO, id: int|str) -> Iterator[bconv.Document]
DocIterator.iter_documents(source: str|Path|IO, id: int|str) -> Iterator[bconv.Document]

Iterate over Document objects. Where possible, loading is performed lazily (see the lazy loading format property).
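
Lazy loading can be pictured with a generator. The following is a minimal sketch, not bconv's implementation: it assumes a made-up text format in which documents are separated by blank lines, and the name iter_chunks is hypothetical.

```python
import io
from typing import IO, Iterator


def iter_chunks(stream: IO[str]) -> Iterator[str]:
    """Yield blank-line-separated chunks of text, one at a time."""
    lines = []
    for line in stream:
        if line.strip():
            lines.append(line)
        elif lines:
            # A blank line closes the current chunk.
            yield "".join(lines)
            lines = []
    if lines:
        # Last chunk without a trailing blank line.
        yield "".join(lines)


stream = io.StringIO("first doc\nline 2\n\nsecond doc\n")
chunks = iter_chunks(stream)
first = next(chunks)  # reads only up to the first blank line
```

Because the generator reads only as far as the next blank line, a caller that stops early never reads the rest of the stream.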

As with the top-level functions, the source parameter can be a path (a str or a path-like object) or a readable file-like object. If it is an open file, its type (text or binary) must match what the respective format expects (cf. the stream type format property). The id argument can be omitted, as in the examples above.
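
The two kinds of source can be handled along the following lines. This is only a sketch under assumptions: the helper open_source and its binary flag are hypothetical and not part of bconv, and the UTF-8 encoding is chosen for illustration.

```python
import io
import os
from contextlib import contextmanager


@contextmanager
def open_source(source, binary=False):
    """Yield a readable stream for a path or an already open file."""
    if isinstance(source, (str, os.PathLike)):
        # A path: open it with the stream type the format expects.
        if binary:
            with open(source, "rb") as f:
                yield f
        else:
            with open(source, "r", encoding="utf-8") as f:
                yield f
    else:
        # An open file: its type must match the expectation.
        is_text = isinstance(source, io.TextIOBase)
        if is_text == binary:
            expected = "binary" if binary else "text"
            raise TypeError(f"expected a {expected} stream")
        yield source


with open_source(io.StringIO("some text")) as f:
    text = f.read()
```

A stream passed in by the caller is left open, since the caller owns it; only files opened from a path are closed by the helper.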

The mode parameter of the top-level functions is not part of the loader interface. Rather, it is a proxy for the choice of the loader method (e.g. mode="native" corresponds to the load_one method).
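
The idea of mode as a proxy for the method choice can be sketched as a lookup table. Only the pairing of "native" with load_one is taken from the text above; the other mode names, the StubLoader class and the load_with_mode function are placeholders for illustration and should not be read as bconv's actual values.

```python
class StubLoader:
    """Stand-in for a loader; returns labels instead of parsed content."""

    def load_one(self, source):
        return ("load_one", source)

    def collection(self, source):
        return ("collection", source)

    def iter_documents(self, source):
        return ("iter_documents", source)


# "native" -> load_one is documented; the other keys are assumptions.
METHOD_FOR_MODE = {
    "native": "load_one",
    "collection": "collection",
    "lazy": "iter_documents",
}


def load_with_mode(loader, source, mode="native"):
    """Call the loader method that corresponds to `mode`."""
    try:
        method_name = METHOD_FOR_MODE[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
    return getattr(loader, method_name)(source)


result = load_with_mode(StubLoader(), "corpus.xml")
```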

Exporters

All concrete exporter types extend Formatter. There are three public methods, one of which (write) has an alias (dump):

Formatter.export(content: bconv.Collection|bconv.Document, dest: str|Path = '.')

Write the Document or Collection object content to disk. The destination dest is specified as a path (type str or path-like). If dest points to an existing directory, a file name is constructed based on content.id or (if it is None/empty) content.filename. Otherwise dest is interpreted as the complete path.
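
The rule for interpreting dest can be restated in a few lines. This is a sketch of the rule as described above, not bconv's code: the function resolve_dest and the ext parameter are hypothetical, and how bconv actually picks the file extension is not covered here.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


def resolve_dest(content, dest=".", ext=".txt"):
    """Return the output path for `content` (illustrative only)."""
    dest = Path(dest)
    if dest.is_dir():
        # Existing directory: build a file name from the ID,
        # falling back to the file name if the ID is None or empty.
        stem = content.id or content.filename
        return dest / f"{stem}{ext}"
    # Anything else is taken as the complete path.
    return dest


# Stand-in for a Document with the two attributes used above.
doc = SimpleNamespace(id=None, filename="abstract42")
with tempfile.TemporaryDirectory() as tmp:
    in_dir = resolve_dest(doc, tmp)                     # name is constructed
    as_path = resolve_dest(doc, Path(tmp) / "out.txt")  # used as given
```

Note that the `or` in this sketch also treats an ID of 0 as empty; whether bconv does the same is not stated here.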

Formatter.write(content: bconv.Collection|bconv.Document, stream: IO)
Formatter.dump(content: bconv.Collection|bconv.Document, stream: IO)  # alias

Write the Document or Collection object content to a writable file-like object stream. The stream type (text or binary) must match the expectation of the format (cf. the stream type format property).

Formatter.dumps(content: bconv.Collection|bconv.Document) -> str|bytes

Serialise the Document or Collection object content to a str or bytes object. The type of the return value depends on the format (cf. the stream type format property).
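
How write and dumps relate to the stream type can be illustrated with a toy formatter. Everything in this sketch is a stand-in: ToyFormatter, ToyBinaryFormatter and the binary flag are hypothetical, the content is a plain list of strings rather than a Document, and bconv's Formatter may be implemented differently.

```python
import io


class ToyFormatter:
    """Stand-in text formatter; not bconv's Formatter."""

    binary = False  # hypothetical flag for the format's stream type

    def write(self, content, stream):
        """Write the lines in `content` to a writable stream."""
        data = "\n".join(content) + "\n"
        stream.write(data.encode("utf-8") if self.binary else data)

    def dumps(self, content):
        """Serialise to str or bytes, depending on the stream type."""
        buffer = io.BytesIO() if self.binary else io.StringIO()
        self.write(content, buffer)
        return buffer.getvalue()


class ToyBinaryFormatter(ToyFormatter):
    """Same output, but for a format that expects binary streams."""

    binary = True


as_text = ToyFormatter().dumps(["line 1", "line 2"])         # a str
as_bytes = ToyBinaryFormatter().dumps(["line 1", "line 2"])  # a bytes object
```

Passing a stream of the wrong type fails in this sketch: writing a str to an io.BytesIO, or bytes to an io.StringIO, raises a TypeError.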

All formatters accept both Document and Collection objects. However, not all formats are equally well suited to represent both levels. For example, when serialising a collection to txt plain-text with brat stand-off annotations, the document boundaries are lost.