USFM Parsing and Translation

John Lambert edited this page Apr 9, 2024 · 6 revisions

Serval aims to mirror how Paratext parses and interprets USFM files. It supports USFM as documented and also accommodates some non-standard formats.

USFM Automatically Assigned Unique Identifiers

  • Unique Identifiers are generated to reference a specific text segment in a scripture text and act as a primary anchor point.
  • The reference is serialized in the following format: [verse reference]/[path element 1]/[path element 2]/....
  • Verse references follow the standard USFM identification and naming conventions.
  • Non-verse paths are identified by [localized instance #]:[USFM tag]
    • For example, the reference for the section header that occurs directly after MAT 1:1 would be represented as MAT 1:1/1:s.
  • Positions are 1-based (the position 0 is used when a position is not specified or unknown).
  • Some non-verse text segments can be nested in another element.
    • For example, a table cell might be represented as MAT 1:1/1:tr/1:tc1.
  • Introductory material that occurs at the beginning of a book before the first verse is referenced by the 1:0 verse reference.
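
The serialization rules above can be illustrated with a small parser. This is a minimal sketch, not Serval's implementation: the names `PathElement`, `SegmentRef`, `parse_segment_ref`, and `format_segment_ref` are hypothetical, the verse part is assumed to have the simple `BOOK chapter:verse` shape, and a path element written without a position is assumed to map to position 0.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PathElement:
    position: int  # 1-based; 0 means "not specified or unknown"
    tag: str       # USFM marker, e.g. "s", "tr", "tc1"


@dataclass(frozen=True)
class SegmentRef:
    book: str
    chapter: int
    verse: str                     # kept as text so values such as "0" or "1-2" survive
    path: Tuple[PathElement, ...]  # empty for plain verse text


def parse_segment_ref(text: str) -> SegmentRef:
    """Parse '[verse reference]/[path element 1]/[path element 2]/...'."""
    verse_part, *path_parts = text.split("/")
    book, chapter_verse = verse_part.split(" ", 1)
    chapter, verse = chapter_verse.split(":", 1)
    path = []
    for part in path_parts:
        position, sep, tag = part.partition(":")
        if not sep:
            # Assumption: a missing position is treated as 0 (unknown).
            position, tag = "0", part
        path.append(PathElement(int(position), tag))
    return SegmentRef(book, int(chapter), verse, tuple(path))


def format_segment_ref(ref: SegmentRef) -> str:
    """Serialize a SegmentRef back to its string form."""
    parts = [f"{ref.book} {ref.chapter}:{ref.verse}"]
    parts.extend(f"{element.position}:{element.tag}" for element in ref.path)
    return "/".join(parts)
```

For example, `parse_segment_ref("MAT 1:1/1:tr/1:tc1")` yields a reference whose path holds two elements, the first table row and the first `tc1` cell inside it, and `format_segment_ref` turns that value back into the same string.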

Translation of Scripture Texts

When projects are read in, they are put in the original versification. The source and target verse ranges are then merged with each other. All text from a verse range (or range of segments) is placed on the first verse or segment of that range.
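
The merging step can be pictured with a short sketch. It is an illustration under stated assumptions rather than Serval's code: verses are plain integers within a single chapter, ranges are inclusive `(first, last)` pairs, and the helper names `merge_ranges` and `collapse_onto_first` are hypothetical. Leaving the remaining verses of a range empty is also an assumption of this sketch.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # inclusive (first_verse, last_verse) within one chapter


def merge_ranges(source: List[Range], target: List[Range]) -> List[Range]:
    """Union the source and target verse ranges, joining any that overlap."""
    merged: List[Range] = []
    for start, end in sorted(source + target):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def collapse_onto_first(verses: Dict[int, str], ranges: List[Range]) -> Dict[int, str]:
    """Move all text in each range onto the first verse of that range."""
    result = dict(verses)
    for start, end in ranges:
        parts = [verses[v] for v in range(start, end + 1) if verses.get(v)]
        for v in range(start, end + 1):
            if v in result:
                result[v] = ""  # assumption: other verses in the range are left empty
        result[start] = " ".join(parts)
    return result
```

If the source combines verses 1-2 and the target combines verses 2-3, `merge_ranges([(1, 2)], [(2, 3)])` gives the single range `(1, 3)`. Applying `collapse_onto_first` with that range to each side then puts all text for verses 1 through 3 on verse 1, so the source and target stay aligned.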

Translation of Non-Scripture Texts

Non-scripture text within the USFM structure is also translated. This includes:

  • Section Headers (any USFM paragraph type)
  • Footnotes, endnotes, etc.
  • Tables (USFM cells)

Note that tables and paragraphs are stripped out when they occur inside a verse (segment) or a footnote. Paragraph and table formatting is otherwise preserved.