Nordic EPUB3/DTBook Migrator


The main goal of this project is to provide an EPUB3 to DTBook conversion tool for the libraries in the Nordic countries that provide accessible literature to visually impaired readers (NLB, MTM, SPSM, Celia, Nota and SBS). The conversion will be implemented in XProc and XSLT and provided as a DAISY Pipeline 2 script. This conversion will allow the organizations to continue using their respective DTBook-based tools for the production of braille and synthetic speech for as long as those tools are necessary.

This tool attempts to map EPUB3 to DTBook with as little loss as possible (a 1:1 mapping).

While the EPUB3 will consist of multiple HTML files internally, an intermediate single-page HTML representation is useful for converting to and from DTBook.
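To make the idea of a single-page representation concrete, here is a minimal Python sketch that concatenates the body content of several XHTML content documents into one document. It is only an illustration under stated assumptions: the function name merge_content_documents is hypothetical, and the project's own scripts are written in XProc and XSLT and handle far more than this (for instance metadata and cross-document links), none of which is modelled here.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize XHTML elements without a prefix.
ET.register_namespace("", XHTML_NS)


def merge_content_documents(documents):
    """Concatenate the <body> children of several XHTML strings.

    Hypothetical helper for illustration only; it is not part of the
    Nordic migrator and ignores metadata, links and resources.
    """
    html = ET.Element(f"{{{XHTML_NS}}}html")
    ET.SubElement(html, f"{{{XHTML_NS}}}head")
    body = ET.SubElement(html, f"{{{XHTML_NS}}}body")
    for source in documents:
        root = ET.fromstring(source)
        source_body = root.find(f"{{{XHTML_NS}}}body")
        if source_body is None:
            continue  # skip documents without a body
        # Keep the original reading order: append in input order.
        body.extend(list(source_body))
    return html
```

Calling ET.tostring(merge_content_documents(docs), encoding="unicode") on a list of XHTML strings would return the merged document as text.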

Scripts

This project provides the following Pipeline 2 scripts:

  • EPUB3 to DTBook
  • EPUB3 to HTML
  • EPUB3 Validator
  • EPUB3 ASCIIMath to MathML
  • HTML to EPUB3
  • HTML to DTBook
  • HTML Validator
  • DTBook to EPUB3
  • DTBook to HTML
  • DTBook Validator

The EPUB3 to DTBook script will be used to allow new EPUB3 files to be used with legacy DTBook-based systems.

The DTBook to EPUB3 script allows legacy DTBooks to be upgraded to new EPUB3-based production systems.

Scripts for converting to and from the intermediary single-HTML representation of the publications are also provided. These are useful for debugging, or when a single-document HTML representation is needed as input to or output from an HTML-based production system.

Validators for EPUB3, DTBook and single-document HTML files are provided. The EPUB3 validator checks that new EPUB3 files are valid according to the Nordic markup guidelines. The DTBook and HTML validators can be useful for DTBook- or HTML-based production systems.

In the Nordic markup guidelines, math is marked up using ASCIIMath. An experimental script for converting this ASCIIMath to MathML is provided.

The grammar used in the EPUB3, HTML and DTBook files is a strict subset of EPUB3, HTML and DTBook, and is defined in the Nordic markup guidelines. Most DTBooks will work with these scripts, since there are few limitations on the input DTBook grammar. The HTML/EPUB3 grammar is more restricted, because everything in it must be convertible to DTBook. Most notably, multimedia such as audio and video is currently not allowed in these EPUB3s.
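As a rough illustration of the multimedia restriction, the following Python sketch scans one XHTML content document for audio and video elements. It is not the project's validator and does not reflect the full Nordic markup guidelines; the function name find_multimedia and the choice to check only these two elements are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Assumption: only the two elements named in the text are checked.
DISALLOWED_TAGS = {f"{{{XHTML_NS}}}{name}" for name in ("audio", "video")}


def find_multimedia(xhtml_source):
    """Return the local names of audio/video elements, in document order.

    Hypothetical helper for illustration; an empty list means none
    of the checked elements were found.
    """
    root = ET.fromstring(xhtml_source)
    found = []
    for element in root.iter():
        if element.tag in DISALLOWED_TAGS:
            # Strip the "{namespace}" prefix to report the local name.
            found.append(element.tag.split("}", 1)[1])
    return found
```

A content document would pass this particular check when find_multimedia returns an empty list.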

Building

The Nordic migrator can be built with Maven, either directly (for instance with mvn clean package) or indirectly through Docker (for instance with docker build .).

Releasing

A brief overview.

As a Maven artifact:

As a Docker image:

  • commits tagged with a version are built and tagged automatically
  • the tip of the master branch is built as :latest

References

See the project homepage for more information.
