TAN keywords for types of token definitions
Definitive list of key terms used to name standard token definitions.

Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/deed.en_US). This license is granted independent of rights and licenses associated with the source.

Joel Kalvesmaki (http://viaf.org/viaf/299582703; tag:textalign.net,2015:agent:kalvesmaki:joel), creator (http://schema.org/creator)
Changes: Started file. Revised to suit new <token-definition>.

letters, letters only, general-words-only-1, general-words-only, gwo: General tokenization pattern for any language, words only. Non-letters such as punctuation are ignored.

letters and punctuation, general-1, general, gen: General tokenization pattern for any language. Each series of letters is a word token, and so is each individual non-letter character (e.g., a punctuation mark).

nonspace: General tokenization pattern for any language, treating any contiguous run of non-space characters as a word.
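
The three definitions above differ only in what counts as a token, which a short comparison may make concrete. The following is a minimal sketch in Python, not the patterns that TAN itself ships: the regular expressions, the PATTERNS table, and the tokenize helper are all assumptions made for illustration, written to approximate the prose descriptions.

```python
import re

# Hypothetical regular expressions approximating the three definitions.
# These are illustrative assumptions, not the patterns defined by TAN.
PATTERNS = {
    # Runs of letters only; digits, underscores, and punctuation are ignored.
    "letters": r"[^\W\d_]+",
    # Runs of letters, or any single non-letter, non-space character.
    "letters and punctuation": r"[^\W\d_]+|\S",
    # Any contiguous run of non-space characters.
    "nonspace": r"\S+",
}


def tokenize(text, name):
    """Return the tokens of `text` under the named (hypothetical) pattern."""
    return re.findall(PATTERNS[name], text)


sample = "Hello, world!"
for name in PATTERNS:
    print(name, tokenize(sample, name))
```

Under these assumed patterns, the sample string yields two tokens for "letters" (Hello, world), four for "letters and punctuation" (Hello, the comma, world, the exclamation mark), and two for "nonspace" (Hello with its comma attached, world! with its exclamation mark attached).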