ODD file for generating the TAN-TEI schema for the Text Alignment Network format

Joel Kalvesmaki. Revised 2020-05-24

Insofar as this ODD file constitutes an original work (see below), all material is released under a GNU General Public License, https://opensource.org/licenses/GPL-3.0

The notice in the next paragraph is preserved from the template upon which this ODD file is based.

TEI material can be licensed differently depending on the use you intend to make of it. Hence it is made available under both the CC+BY and BSD-2 licences. The CC+BY licence is generally appropriate for usages which treat TEI content as data or documentation. The BSD-2 licence is generally appropriate for usage of TEI content in a software environment. For further information or clarification, please contact the TEI Consortium (info@tei-c.org).

TEI All, adapted for the Text Alignment Network

This ODD describes parameters needed to turn any document that validates against TEI All (P5) into a form that can be used with the Text Alignment Network (TAN). The next two paragraphs are preserved from the template upon which this ODD file is based.

This TEI customization describes a schema that includes all of the TEI (P5) modules. This is a very useful starting place for manually creating your own customizations — it is much easier to delete the modules you do not want than to add the modules you do. Furthermore this customization often proves helpful for testing TEI software.

However, this particular TEI customization is not recommended for actual use for encoding documents. It produces schemas and reference documentation that will be much larger, and include many more elements, than almost anyone could conceivably ever need. Tempting though it may be simply to have absolutely everything, and just ignore elements not required, experience has shown that their presence makes the documentation harder to read and use, and makes a schema that is far more lax than desired.

Two headers are required, teiHeader and TAN's head, because the two are very different animals. The teiHeader is quite expansive, allowing you to spend time talking about all sorts of things that may be only very loosely related to the body, and it is generally designed to be written and read by humans. The TAN header, on the other hand, is restricted to metadata directly related to the data itself, and is designed on every level to be RDF-ready. See the TAN guidelines for more.

A tag URN is required in the root element, uniquely identifying the document. Revisions need not be renamed, since ISO-compliant dates will be checked to determine the version of the document. Pattern:

tag:([\-a-zA-Z0-9._%+]+@)?[\-a-zA-Z0-9.]+\.[A-Za-z]{2,4},\d{4}(-(0\d|1[0-2]))?(-([0-2]\d|3[01]))?:[\-a-zA-Z0-9._~:%@/?!$&'*+,;=]+

A TAN version number is required. Value: 2020

The body contains either <div>s or empty elements (like <milestone>).

Every <div> either contains purely <div>s or empty elements (like <milestone>), or it does not (leaf <div>s). This element is redefined to be much more like the HTML div, which may be any unit whatsoever, even those that are inline stretches of text. It does the job of <ab>, <p>, <l>, and all other text segmentation elements, leaving to @type the job of defining exactly what kind(s) of division is (are) intended.

Reference to agent or agents who have edited (added or modified) an element or its content. Pattern: .+

Reference to a date or time when an element or its content was edited (added or modified).

This attribute signals that the parent element is to be replaced by all elements of the same name found in the file referred to by the corresponding inclusion.

@type may take multiple values, space delimited. Each value is an idref or a name, pointing to a vocabulary item that provides the IRIs, names, and descriptions of the textual division. Pattern: .+

@n names a <div> or <group>, or refers to a <div>'s @n.
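
As an illustration of the tag URN pattern required in the root element, the following Python sketch checks candidate identifiers against it. The function name is_tag_urn and the sample identifiers are hypothetical. The sketch assumes the pattern must match the entire value, as schema patterns normally do, and Python's regular-expression dialect is only an approximation of the one the schema uses.

```python
import re

# Pattern copied from the text above. It is assumed to be implicitly
# anchored (as schema patterns usually are), so re.fullmatch is used.
TAG_URN = re.compile(
    r"tag:([\-a-zA-Z0-9._%+]+@)?[\-a-zA-Z0-9.]+\.[A-Za-z]{2,4},"
    r"\d{4}(-(0\d|1[0-2]))?(-([0-2]\d|3[01]))?:"
    r"[\-a-zA-Z0-9._~:%@/?!$&'*+,;=]+"
)


def is_tag_urn(value: str) -> bool:
    """Return True if the whole string matches the tag URN pattern."""
    return TAG_URN.fullmatch(value) is not None


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    for candidate in (
        "tag:example.org,2020:my-text",
        "tag:jane@example.org,2020-05-24:text/1",
        "urn:example.org,2020:my-text",
        "tag:example.org,2020:",
    ):
        print(candidate, is_tag_urn(candidate))
```

The first two samples satisfy the pattern. The third lacks the tag: scheme and the fourth has an empty final part, so both are rejected.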

@n may consist of one or more values, space delimited, which are to be treated as synonyms.

Each synonymous value of @n may be simple or complex. A simple value of @n is a set of word characters (or the underbar). A complex value of @n consists of word characters (or the underbar) separated by commas and hyphens. A complex value of @n refers to a range of references. The sequence of items in a complex value is significant. For example, n="6, 8" signifies that the text straddles reference 6 then 8, but n="8, 6" signifies the converse. In the context of a <div>, the implication is that in neither case can the text be securely disentangled so as to create one <div> for 6 and another for 8.

The hyphen-minus, - (U+002D, the most common form of hyphen), is reserved to specify a range. This feature is useful for cases where a <div> straddles more than one standard reference number (e.g., a translation of Aristotle that cannot be easily tied to Bekker numbers).

If you need to use a hyphen-like character in an @n that does not specify a range, consider ‐ (U+2010 HYPHEN), ‑ (U+2011 NON-BREAKING HYPHEN), ‒ (U+2012 FIGURE DASH), – (U+2013 EN DASH), or − (U+2212 MINUS SIGN).

The comma is reserved to specify a sequence of references.

The space is reserved to separate synonymous values, or to pad commas and hyphens. If you wish to use a value of @n that would normally use word spaces, use the underbar, _, instead.

@n does not permit other non-word characters reserved by @ref, i.e., the period/full stop or colon, which delimit a hierarchy of @n's.

Because @n is used to construct @ref, it is indirectly cumulatively inheritable. See main.xml#inheritable_attributes.

Extra TAN vocabulary is available for @n, to provide built-in aliases. For more on this feature see main.xml#extra_n_vocabulary.
For specific extra vocabulary see main.xml#vocabularies-n-bible-eng, main.xml#vocabularies-n-bible-spa, main.xml#vocabularies-n-quran-eng-ara, and main.xml#vocabularies-n-unlabeled-divs-1-eng.

Pattern: [\w\._]+([\- ,]+[\w\._]+)*
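
To make the rules for @n concrete, here is a minimal Python sketch that validates a value against the pattern above and then splits it into synonyms, sequences, and ranges. The function parse_n and its nested-list output are hypothetical and are not part of TAN. Python's \w only approximates the schema's notion of a word character, so edge cases may differ.

```python
import re

# Pattern copied from the text above, assumed to match the whole value.
N_PATTERN = re.compile(r"[\w\._]+([\- ,]+[\w\._]+)*")


def parse_n(value: str) -> list[list[tuple[str, ...]]]:
    """Split an @n value into synonyms, each a sequence of items.

    Each item is a 1-tuple (a single reference) or a longer tuple
    (the endpoints of a hyphen-delimited range).
    """
    if N_PATTERN.fullmatch(value) is None:
        raise ValueError(f"not a valid @n value: {value!r}")
    # Spaces may pad commas and hyphens; remove that padding first so
    # that any remaining whitespace separates synonyms.
    compact = re.sub(r"\s*([,-])\s*", r"\1", value)
    synonyms = []
    for synonym in compact.split():
        # Commas delimit a sequence whose order is significant.
        sequence = [
            tuple(part for part in item.split("-") if part)
            for item in synonym.split(",")
            if item
        ]
        synonyms.append(sequence)
    return synonyms


if __name__ == "__main__":
    for sample in ("6, 8", "8, 6", "1 one", "4 - 6"):
        print(repr(sample), parse_n(sample))
```

Under these assumptions, "6, 8" and "8, 6" each yield one synonym with two items in opposite orders, "1 one" yields two synonyms, and "4 - 6" yields a single range from 4 to 6. A value such as "6:8" is rejected because the colon is not allowed by the pattern.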