Loaders and Exporters
The loaders and exporters in bconv are classes with a shared interface.
They are exposed in the top-level dictionaries bconv.LOADERS, bconv.FETCHERS and bconv.EXPORTERS.
For example, the following call
>>> doc = bconv.load(source, fmt=X, **options)
is equivalent to:
>>> loader = bconv.LOADERS[X](**options)
>>> doc = loader.load_one(source)
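The equivalence above follows a plain registry pattern. The sketch below illustrates that pattern with stdlib-only code; it is not bconv's implementation. `ToyLoader`, `TOY_LOADERS`, `toy_load` and the `strip` option are hypothetical stand-ins invented for this example.

```python
import io


class ToyLoader:
    """Hypothetical stand-in for a loader class; not part of bconv."""

    def __init__(self, strip=False):
        # Options are stored once and reused for every call.
        self.strip = strip

    def load_one(self, source):
        # In this sketch, `source` is always a readable text stream.
        text = source.read()
        return text.strip() if self.strip else text


# Registry mapping format names to loader classes (assumed layout).
TOY_LOADERS = {"toy_txt": ToyLoader}


def toy_load(source, fmt, **options):
    """Convenience wrapper: look up the class, instantiate it, load."""
    loader = TOY_LOADERS[fmt](**options)
    return loader.load_one(source)


# Both routes produce the same result.
via_function = toy_load(io.StringIO("  some text \n"), fmt="toy_txt", strip=True)
via_loader = TOY_LOADERS["toy_txt"](strip=True).load_one(io.StringIO("  some text \n"))
```

The second route pays off when the same options apply to many inputs, since the loader instance can be created once and reused.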
For many use cases, the top-level functions bconv.load[s], bconv.fetch and bconv.dump[s] are more convenient.
In some contexts, a reusable loader or exporter can be more succinct, as it encapsulates the options:
from pathlib import Path

import bconv

loader = bconv.LOADERS['bioc_xml'](byte_offsets=False)
for path in Path('train').glob('*.xml'):
    for doc in loader.iter_documents(path):
        ...  # process the training documents
for path in Path('test').glob('*.xml'):
    coll = loader.collection(path, id=path.stem)
    ...  # process the test collections
All concrete loader and fetcher types extend one of the three base classes DocLoader, CollLoader and DocIterator.
As a common interface, all loaders have a load_one method:
DocLoader.load_one(source: str|Path|IO, id: int|str) -> bconv.Document
Load one file into a Document object.
CollLoader.load_one(source: str|Path|IO, id: int|str) -> bconv.Collection
DocIterator.load_one(source: str|Path|IO, id: int|str) -> bconv.Collection
Load one file into a Collection object.
Additionally, the following methods are available:
DocLoader.document(source: str|Path|IO, id: int|str) -> bconv.Document
Load one file into a Document object.
CollLoader.collection(source: str|Path|IO, id: int|str) -> bconv.Collection
Load one file into a Collection object.
CollLoader.iter_documents(source: str|Path|IO, id: int|str) -> Iterator[bconv.Document]
DocIterator.iter_documents(source: str|Path|IO, id: int|str) -> Iterator[bconv.Document]
Iterate over Document objects. Where possible, loading is performed lazily (see the lazy loading format property).
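To illustrate the difference between lazy iteration and loading a whole collection, here is a minimal sketch with a toy format in which documents are separated by blank lines. The functions `toy_iter_documents` and `toy_collection` are hypothetical and only mimic the shape of the interface; bconv's loaders work on real formats and return `Document` and `Collection` objects.

```python
import io


def toy_iter_documents(stream):
    """Lazily yield one toy document (a list of lines) per block."""
    block = []
    for line in stream:
        line = line.rstrip("\n")
        if line:
            block.append(line)
        elif block:
            # A blank line closes the current document.
            yield block
            block = []
    if block:
        # The last document needs no trailing blank line.
        yield block


def toy_collection(stream):
    """Eager counterpart: read every document before returning."""
    return list(toy_iter_documents(stream))


data = "line 1\nline 2\n\nline 3\n"
first = next(toy_iter_documents(io.StringIO(data)))  # parses one document only
everything = toy_collection(io.StringIO(data))
```

With the generator, a consumer can stop early or process large files without holding all documents in memory at once.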
As with the top-level functions, the parameter source can be a path (i.e. a str or a path-like object) or a readable file-like object.
If it is an open file, its type (text or binary) must match the expectation of the respective format (cf. the stream type format property).
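The rule for the source parameter can be sketched as follows. This is an illustration of the idea under stated assumptions, not bconv's code: the helper `open_source` and its `binary` flag are hypothetical, and the check treats every stream that is not an `io.TextIOBase` as binary.

```python
import io
import os


def open_source(source, binary=False):
    """Return (stream, should_close) for a path or an open file.

    Hypothetical helper; `binary` stands in for the format's stream type.
    """
    if isinstance(source, (str, os.PathLike)):
        # Paths are opened in the mode that the format expects.
        if binary:
            return open(source, "rb"), True
        return open(source, "r", encoding="utf-8"), True
    # Already an open file: verify that its type fits the format.
    is_text = isinstance(source, io.TextIOBase)
    if is_text == binary:
        expected = "binary" if binary else "text"
        raise TypeError(f"expected a {expected} stream")
    # The caller owns the stream, so it is not closed here.
    return source, False


stream, should_close = open_source(io.StringIO("some text"))
```

Passing a path avoids the mismatch entirely, since the stream is then opened in the right mode for the format.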
The mode parameter of the top-level functions is not part of the loader interface.
Rather, it is a proxy for the choice of the loader method (e.g. mode="native" corresponds to the load_one method).
All concrete exporter types extend Formatter.
There are three public methods:
Formatter.export(content: bconv.Collection|bconv.Document, dest: str|Path = '.')
Write the Document or Collection object content to disk. The destination dest is specified as a path (type str or path-like). If dest points to an existing directory, a file name is constructed based on content.id or (if it is None/empty) content.filename. Otherwise dest is interpreted as the complete path.
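The destination rule can be pictured with a short sketch. The helper `resolve_dest` is hypothetical and not part of bconv; in particular, appending a format-specific extension `ext` to the constructed file name is an assumption made for this example.

```python
import tempfile
from pathlib import Path


def resolve_dest(dest, content_id, filename, ext):
    """Hypothetical helper mirroring the rule described above."""
    dest = Path(dest)
    if dest.is_dir():
        # Existing directory: build a name from the ID,
        # falling back to the filename if the ID is None or empty.
        stem = content_id if content_id not in (None, "") else filename
        return dest / f"{stem}.{ext}"  # the extension is an assumption
    # Anything else is taken as the complete target path.
    return dest


with tempfile.TemporaryDirectory() as tmp:
    by_id = resolve_dest(tmp, "doc1", "fallback", "txt")
    by_filename = resolve_dest(tmp, None, "fallback", "txt")
    complete = resolve_dest(Path(tmp) / "out.txt", "doc1", "fallback", "txt")
```

Note that a path to a directory that does not exist yet falls into the second branch and is treated as a file path.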
Formatter.write(content: bconv.Collection|bconv.Document, stream: IO)
Formatter.dump(content: bconv.Collection|bconv.Document, stream: IO) # alias
Write the Document or Collection object content to a writable file-like object stream. The stream type (text or binary) must match the expectation of the format (cf. the stream type format property).
Formatter.dumps(content: bconv.Collection|bconv.Document) -> str|bytes
Serialise the Document or Collection object content to a str or bytes object. The type of the return value depends on the format (cf. the stream type format property).
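One way to picture the relationship between write and dumps is a dumps that writes into an in-memory buffer of the matching type. This is a sketch of that general technique, not a description of how bconv implements it; `ToyFormatter`, `ToyBinaryFormatter` and the `binary` flag are hypothetical, and content is simplified to a list of strings.

```python
import io


class ToyFormatter:
    """Hypothetical text-based exporter; not part of bconv."""

    binary = False  # assumed flag standing in for the stream type property

    def write(self, content, stream):
        # Serialise into any writable text stream.
        for line in content:
            stream.write(line + "\n")

    def dumps(self, content):
        # Pick an in-memory buffer whose type matches the format.
        buffer = io.BytesIO() if self.binary else io.StringIO()
        self.write(content, buffer)
        return buffer.getvalue()


class ToyBinaryFormatter(ToyFormatter):
    """Same toy format, but producing bytes."""

    binary = True

    def write(self, content, stream):
        for line in content:
            stream.write(line.encode("utf-8") + b"\n")


as_text = ToyFormatter().dumps(["first", "second"])
as_bytes = ToyBinaryFormatter().dumps(["first"])
```

The return type follows from the buffer type, which is why callers need to know whether a format is text-based or binary.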
All formatters accept both Document and Collection objects.
However, not all formats are equally well suited to represent both levels.
For example, when serialising a collection to txt plain-text with brat stand-off annotations, the document boundaries are lost.
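The loss of document boundaries can be shown with a toy example. The serialiser `toy_dump_txt` is hypothetical and assumes that a plain-text export simply concatenates the text of all documents; the actual output of bconv's formats may differ in detail.

```python
def toy_dump_txt(collection):
    """Hypothetical plain-text serialiser: one output line per sentence."""
    return "".join(sentence + "\n" for doc in collection for sentence in doc)


# Two collections with the same sentences but different document splits.
coll_a = [["First sentence.", "Second sentence."], ["Third sentence."]]
coll_b = [["First sentence."], ["Second sentence.", "Third sentence."]]

# Both serialise to the same text, so the split cannot be recovered.
same_output = toy_dump_txt(coll_a) == toy_dump_txt(coll_b)
```

If the document structure matters downstream, a format with an explicit collection level is the safer choice.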