TAN Applications

Standard TAN applications are designed to take TAN or TEI files and create output that allows users to study particular aspects of the text through interaction, statistics, and visualization. These are advanced, complex programs, and not all the intended features may have been implemented. Because of their power, these applications have numerous parameters for configuration. You are encouraged to read closely the documentation in the application to determine how to make the application work for your particular goals.

Each section below is generated automatically from the master file that drives the process. Any global parameters that are referred to in the discussion are explained in the file itself.

Diff+

Location: applications/Diff+/Diff+.xsl
Version 2021-09-06

Take any number of versions of a text, compare them, and view and study all the text differences in an HTML page. The HTML output allows you to see precisely where one version differs from the other. A small JavaScript library allows you to change focus, remove versions, and explore statistics that show quantitatively how close the versions are to each other. Parameters allow you to make normalizations before making the comparison, and to weigh statistics accordingly.

This application has been used not only for individual comparisons, but for more demanding needs: to analyze changes in documents passing through a multistep editorial workflow, to compare the quality of OCR results, and to study the relationship between ancient/medieval manuscripts (stemmatology).

Examples of output:
https://textalign.net/output/CFR-2017-title1-vol1-compared.xml XML master output file, comparing four years of the United States Code of Federal Regulations, vol. 1
https://textalign.net/output/CFR-2017-title1-vol1-compared.html HTML comparison of four years of the United States Code of Federal Regulations, vol. 1
https://textalign.net/output/diff-grc-2021-02-08-five-versions.html Comparison of results from four OCR processes against a benchmark, classical Greek
https://textalign.net/clio/darwin-3diff.html Comparison of three editions of Darwin's works, sample
https://textalign.net/clio/hom-01-coll-ignore-uv.html Comparison of five versions of Griffolini's translation of John Chrysostom's Homily 1 on the Gospel of John

This master stylesheet is the public interface for the application. The parameters you will most likely want to change are listed and documented below, to help you customize the application to suit your needs. If you are relatively new to XSLT, or TAN applications, see Using TAN Applications and Utilities in the TAN Guidelines for general instructions. If you want to avoid changing the master application file, use the accompanying configuration file. Or make a copy of this file and edit and run it directly. Or create and configure a transformation scenario in Oxygen, defining the relevant parameters as you like. If you are comfortable with XSLT, try creating your own stylesheet, then import this one, and customize the process. To access the code base, follow the link in the <xsl:include> at the bottom of this file.

Description
This is a MIRU Stylesheet (MIRU = Main Input Resolved URIs)
Primary input: any XML file, including this one (input is ignored)
Secondary input: one or more files
Primary output: perhaps diagnostics
Secondary output: for each detectable language in the secondary input: (1) an XML file with the results of tan:diff() or tan:collate(), infused with select statistical analyses; (2) a rendering of #1 in an interactive, visually engaging HTML form

Nota bene:
This application is useful only if the input files have different versions of the same text in the same language.
The XML output is a straightforward result of tan:diff() or tan:collate(), perhaps wrapped by an element that also includes prepended statistical analysis.
The HTML output has been designed to work with specific JavaScript and CSS files, and the HTML output will not render correctly unless you have set up dependencies correctly. Currently, the HTML output is directed to the TAN output subdirectory, with the HTML pointing to the appropriate JavaScript and CSS files in the js and css directories.

Warning: certain features have yet to be implemented
Revise process that reinfuses a class 1 file with a diff/collate into a standard extra TAN function.
Add parameter to allow serialization of input XML, for closer comparison of XML structures.

This application currently just scratches the surface of what is possible. New features are planned! Some desiderata:
Support a single TAN-A as the catalyst or MIRU provider, allowing <alias> to define the groups.
Support MIRUs that point to non-TAN files, e.g., plain text, docx, xml.
Allow one to decide whether Venn diagrams should adjust the common area or not.
Enhance options on statistics.
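The general idea of normalizing versions before comparing them, and then quantifying how close they are, can be sketched outside of XSLT. The following Python is only an illustration, not the implementation of tan:diff() or tan:collate(): it substitutes the standard library's difflib for the TAN functions, and the function names and the normalization choices (Unicode NFC, whitespace collapsing, case folding) are assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations
import unicodedata


def normalize(text, ignore_case=True):
    """Apply simple normalizations before comparing (hypothetical choices)."""
    text = unicodedata.normalize("NFC", text)
    text = " ".join(text.split())  # collapse runs of whitespace
    return text.lower() if ignore_case else text


def pairwise_similarity(versions):
    """Return a similarity ratio between 0 and 1 for every pair of versions."""
    cleaned = {name: normalize(text) for name, text in versions.items()}
    return {
        (a, b): SequenceMatcher(None, cleaned[a], cleaned[b], autojunk=False).ratio()
        for a, b in combinations(sorted(cleaned), 2)
    }


def differences(a, b):
    """List the spans where two normalized versions do not match."""
    a, b = normalize(a), normalize(b)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Illustrative input only; not taken from any TAN example file.
versions = {
    "v1": "The quick brown fox",
    "v2": "the  quick brown fox",
    "v3": "The quick red fox",
}
scores = pairwise_similarity(versions)
```

Here v1 and v2 differ only in case and spacing, so after normalization they count as identical, while v3 still registers a difference. This is the kind of effect the normalization parameters are meant to control.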

Parabola

Location: applications/Parabola/Parabola.xsl
Version 2021-07-20

This application allows you to take a library of TAN/TEI files with multiple versions of each work and present them in an interactive HTML page.

This master stylesheet is the public interface for the application. The parameters you will most likely want to change are listed and documented below, to help you customize the application to suit your needs. If you are relatively new to XSLT, or TAN applications, see Using TAN Applications and Utilities in the TAN Guidelines for general instructions. If you want to avoid changing the master application file, use the accompanying configuration file. Or make a copy of this file and edit and run it directly. Or create and configure a transformation scenario in Oxygen, defining the relevant parameters as you like. If you are comfortable with XSLT, try creating your own stylesheet, then import this one, and customize the process. To access the code base, follow the link in the <xsl:include> at the bottom of this file.

Examples of output:
http://textalign.net/output/aristotle-categories-ref-bekker-page-col-line.html Aristotle, Categories, in eight versions, six languages
https://textalign.net/output/cpg%204425.TAN-A-div-2018-03-09.html Homilies on the Gospel of John, John Chrysostom, four versions, two languages
https://evagriusponticus.net/cpg2430/cpg2430-full-for-reading.html The Praktikos by Evagrius of Pontus, three languages, with Bible quotations
https://textalign.net/quran/quran.ara+grc+syr+lat+deu+eng.html Qur'an in eighteen versions, six languages

Description
Primary input: a TAN-A file
Secondary input: its sources expanded
Primary output: an interactive HTML page with the versions of the chosen work grouped and arranged in parallel, with annotations
Secondary output: none

This flagship TAN application was the catalyst for TAN itself. It was developed not only for highly polished, finalized web publication, but to support complex editorial processes. The impetus was a project of five scholars translating into English an ancient text that survives only fragmentarily in its original Greek, and that was translated into Syriac several times. The team intended to translate into English the Greek fragments that survive, as well as the Syriac translations, and to do so with rigorous consistency. In passages where the author (Evagrius of Pontus) quoted from Scripture or Aristotle, they needed to be able to consult the Greek or Syriac text behind the quoted source. Such demands required a shared digital infrastructure to coordinate roughly forty different versions, including the team's working English translations, which were changing week to week. Parabola was indispensable.

Nota bene:
This application has many fine-tuned configuration options. Read through the whole file to see what is available.
This application processes a single work, assumed to be that of the first <source> in the catalyzing TAN-A file. If you want a different source, move the relevant <source> to the first position.

Warning: certain features have yet to be implemented
Simplify the routine. This was converted from an inferior workflow, and still takes too many passes to get to the output.
Annotations need a lot of work. They should be placed into the merge early. In fact, the whole workflow needs to be revised, with most structural work finished before attempting to convert to HTML.
Develop output option using nested HTML divs, to parallel the existing output that uses HTML tables.
Integrate diff/collate into cells, on both the global and local level.
Develop the CSS bar to allow users to click source id labels on and off.
Add labels for divs higher than version wrappers.
Consider merging based upon the resolved file, not its expansion.
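To make concrete what it means to arrange versions "in parallel", here is a minimal Python sketch that lines up versions by shared reference and renders the result as an HTML table. It is not Parabola's merge routine: the data structure (a mapping from version id to a mapping from reference to text), the function names, and the first-seen ordering of references are all assumptions chosen for illustration, and the sample strings are placeholders.

```python
from html import escape


def align_by_reference(versions):
    """Return rows of (reference, {version_id: text}) in first-seen reference order."""
    order = []
    for divisions in versions.values():
        for ref in divisions:
            if ref not in order:
                order.append(ref)
    return [
        (ref, {vid: divs[ref] for vid, divs in versions.items() if ref in divs})
        for ref in order
    ]


def to_html_table(versions):
    """Render the aligned rows as an HTML table, one column per version."""
    ids = list(versions)
    head = "".join(f"<th>{escape(v)}</th>" for v in ids)
    body = []
    for ref, cells in align_by_reference(versions):
        tds = "".join(f"<td>{escape(cells.get(v, ''))}</td>" for v in ids)
        body.append(f"<tr><th>{escape(ref)}</th>{tds}</tr>")
    return f"<table><tr><th>ref</th>{head}</tr>{''.join(body)}</table>"


# Placeholder content; a version may lack a division that another version has.
versions = {
    "grc": {"1.1": "Greek text of 1.1", "1.2": "Greek text of 1.2"},
    "eng": {"1.1": "English text of 1.1", "1.3": "English text of 1.3 & note"},
}
rows = align_by_reference(versions)
table = to_html_table(versions)
```

A version that lacks a given division simply leaves an empty cell in that row, which is one reason a table layout is a natural fit for this kind of output.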

TAN Out

Location: applications/TAN%20Out/TAN%20Out.xsl
Version 2021-09-06

This utility exports a TAN or TEI file to other media. Currently only HTML is supported: the utility quickly renders a TAN or TEI file as HTML, optimized for the JavaScript and CSS within the output/js and output/css directories in the TAN file structure.

This master stylesheet is the public interface for the application. The parameters you will most likely want to change are listed and documented below, to help you customize the application to suit your needs. If you are relatively new to XSLT, or TAN applications, see Using TAN Applications and Utilities in the TAN Guidelines for general instructions. If you want to avoid changing the master application file, use the accompanying configuration file. Or make a copy of this file and edit and run it directly. Or create and configure a transformation scenario in Oxygen, defining the relevant parameters as you like. If you are comfortable with XSLT, try creating your own stylesheet, then import this one, and customize the process. To access the code base, follow the link in the <xsl:include> at the bottom of this file.

Description
Primary input: any TAN or TEI file
Secondary input: none
Primary output: if no destination filename is specified, an HTML file
Secondary output: if a destination filename is specified, an HTML file at the target location

Nota bene:
This application can be used to generate primary or secondary output, depending upon how parameters are configured (see below).

Warning: certain features have yet to be implemented
Need to wholly overhaul the default CSS and JavaScript files in output/css and output/js.
Need to build parameters to allow users to drop elements from the HTML DOM.
Need to enrich output message with parameter settings.
Need to support export to odt.
Need to support export to docx.
Need to support export to plain text.

Tangram

Location: applications/Tangram/Tangram.xsl

This application searches for and scores clusters of words shared across two groups of texts, allowing you to look for quotations, paraphrases, or shared topics. When configured correctly, Tangram can also find idioms and collocations.

Each input file, which may come in a variety of formats (TAN, TEI, other XML formats, plain text, Word documents), must be assigned to one or both of two groups, each group representing a work. Members of a work-group can be from different languages. Users can specify how many ngrams ("words") should be found, and how far apart they can be from each other. Ngram order is disregarded (e.g., ngram "shear", "blue", "sheep" would match ngram "sheep", "blue", "shear").

Tangram first normalizes and tokenizes each text according to language rules. Each token is converted to one or more aliases. If lexico-morphological data is available through a TAN-A-lm file, or if there is a TAN-A-lm language library for the language of the text being processed, a token may be replaced by multiple lexemes (e.g., "rung" would attract aliases "ring" and "rung"); otherwise, a case-insensitive generic form of the word is used. Then each text in group 1 is compared to each text in group 2 that shares the same language. For each pair of texts, Tangram identifies clusters of tokens that share the same alias. It then consolidates adjacent clusters of ngrams, and scores the results based upon several criteria. Grouped clusters are then converted into a primitive TAN-A file consisting of claims that identify parallel passages of each pair of texts, and the output is rendered as sortable HTML, to facilitate better study of the results.

Tangram was written primarily to support quotation detection in ancient Greek and Latin texts, which has rather demanding requirements. Because of these objectives, Tangram frequently operates in quadratic or cubic time, so it can be quite time-consuming to run.
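The matching idea described above (tokens mapped to aliases, ngrams whose members may be separated by a limited distance, and order disregarded) can be illustrated with a short Python sketch. This is a naive illustration under stated assumptions, not Tangram's algorithm: the LEXEMES table is a hypothetical stand-in for TAN-A-lm data, the window parameter is an invented name for "how far apart" the tokens may be, and no consolidation or scoring is attempted.

```python
from itertools import combinations, product

# Hypothetical alias table standing in for lexico-morphological (TAN-A-lm) data.
LEXEMES = {"rung": {"ring", "rung"}}


def aliases(token):
    """Map a token to its aliases: known lexemes, else a case-insensitive form."""
    key = token.casefold()
    return frozenset(LEXEMES.get(key, {key}))


def unordered_ngrams(tokens, n, window):
    """Map each order-independent alias combination to the positions producing it.

    An ngram here is n tokens that all fall within `window` consecutive positions.
    """
    found = {}
    for start in range(len(tokens)):
        later = range(start + 1, min(len(tokens), start + window))
        for rest in combinations(later, n - 1):
            positions = (start, *rest)
            # One alias is chosen per token; sorting the choice discards order.
            for choice in product(*(aliases(tokens[p]) for p in positions)):
                found.setdefault(tuple(sorted(choice)), []).append(positions)
    return found


def shared_ngrams(text_a, text_b, n, window):
    """Return the ngrams found in both token lists, with their positions in each."""
    a = unordered_ngrams(text_a, n, window)
    b = unordered_ngrams(text_b, n, window)
    return {key: (a[key], b[key]) for key in a.keys() & b.keys()}


trigrams = shared_ngrams(
    "shear blue sheep".split(), "sheep blue shear now".split(), n=3, window=3
)
bigrams = shared_ngrams(
    "ring the bell".split(), "bell was rung".split(), n=2, window=3
)
```

In the first call the three words match despite their reversed order. In the second, "rung" matches "ring" only because the alias table supplies that lexeme. Enumerating combinations this way grows quickly with text length and window size, which gives some intuition for why this kind of search is expensive.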
A feature allows the user to save intermediate stages as temporary files, to reduce processing time.

Version 2021-09-06

This master stylesheet is the public interface for the application. The parameters you will most likely want to change are listed and documented below, to help you customize the application to suit your needs. If you are relatively new to XSLT, or TAN applications, see Using TAN Applications and Utilities in the TAN Guidelines for general instructions. If you want to avoid changing the master application file, use the accompanying configuration file. Or make a copy of this file and edit and run it directly. Or create and configure a transformation scenario in Oxygen, defining the relevant parameters as you like. If you are comfortable with XSLT, try creating your own stylesheet, then import this one, and customize the process. To access the code base, follow the link in the <xsl:include> at the bottom of this file.

Description
Primary input: any XML file, including this one (input is ignored)
Secondary input: one or more files allocated to two groups; perhaps temporary files; perhaps TAN-A-lm files, either associated with secondary input, or part of a language catalog
Primary output: perhaps diagnostics
Secondary output: (1) an XML file with TAN-A claims identifying quotations or parallels, with the most likely at the top; (2) an HTML file that renders #1 in a more legible format.

Warning: certain features have yet to be implemented
Support the method pioneered by Shmidman, Koppel, and Porat: https://arxiv.org/abs/1602.08715v2
Make sure texts run against themselves work.
Incorporate simpler tablesorter JavaScript.

Nota bene:
This application is one of the most experimental, and may not perform as expected. It has been successfully tested on several dozen classical Greek texts.
A file may be placed in both groups, to explore cases of self-quotation or repetition.
This process can take a very long time for lengthy texts, particularly at the stage where a 1gram gets added to an Ngram, because the process takes quadratic time. Many messages could appear during tan:add-1gram(), updating progress through perhaps long routines. It is recommended that you save intermediate steps, to avoid having to repeat steps on subsequent runs.
Processing time example: two texts in group 1 of about 4.4K and 2.6K words against a single text in group 2 of about 137K words took 319 seconds to build up to a set of unconsolidated token aliases. One text from group 1 had an associated TAN-A-lm annotation and the text from group 2 did as well. There was a TAN-A-lm library associated with the language (Greek). When the program was run again without changing parameters, it took only 11 seconds to get to that same stage, because of the saved temporary files. That same set of texts took 1,219 seconds (about 20 minutes) to develop into a 3gram, with chops at common Greek stop words and skipping the most common 1% of token aliases. When run again, based on temporary files, it took only 23 seconds. That is, saving intermediate steps could save you hours of time.
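The benefit of saving intermediate steps can be illustrated with a generic caching pattern. The Python below is a sketch of the principle only. It does not reflect how Tangram names, stores, or validates its temporary files; the function cached_stage, the JSON storage format, and the fingerprinting of inputs and parameters are all assumptions made for the example.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cached_stage(name, inputs, params, compute, cache_dir):
    """Run compute(inputs, params) once; reuse the saved result on later runs.

    Returns (result, from_cache). The cache key covers the stage name, the
    inputs, and the parameters, so changing any of them forces a fresh run.
    """
    payload = json.dumps(
        {"stage": name, "inputs": inputs, "params": params}, sort_keys=True
    )
    fingerprint = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    path = Path(cache_dir) / f"{name}-{fingerprint[:16]}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8")), True
    result = compute(inputs, params)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result, False


def tokenize(inputs, params):
    """Stand-in for an expensive stage; here it merely lowercases and splits."""
    return [token.lower() for text in inputs for token in text.split()]


with tempfile.TemporaryDirectory() as cache_dir:
    first, first_cached = cached_stage("tokens", ["Alpha beta"], {"n": 3}, tokenize, cache_dir)
    second, second_cached = cached_stage("tokens", ["Alpha beta"], {"n": 3}, tokenize, cache_dir)
```

The second call returns the stored result without recomputing it. This mirrors the timings reported above, where a rerun with unchanged parameters reached the same stage in a fraction of the original time.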