Encoding Guidelines for the ELTeC: level 2

This reference document describes the European Literary Text Collection (ELTeC), a major deliverable of COST Action 16204, Distant Reading. The ELTeC is a principled collection of literary text corpora, uniformly encoded in TEI XML, and representing the production of novels in different European languages for the period 1840 to 1920. The present document begins with a description of the principles and sampling methods used to construct the collection, and is followed by detailed technical documentation of the TEI XML schema used to encode the textual components and the metadata of the collection. All texts included in the ELTeC conform to the schema described by this document.

1. Corpus Design

1.1. Principles and sampling method

The goal of CA16204 is to create a benchmark corpus of literature from 1840-1920 adequate to the needs of many different computational distant reading methods for corpus annotation and analysis. The corpus design should support comparison of texts and individual sub-collections selected according to the metadata associated with each text. It should be possible to sample sub-collections from the ELTeC for specific tasks and research questions. In a first step, we focus on the development of clear, operationalized, transparent and motivated selection criteria for the corpus.

It is important to stress that we do not intend to define what a novel is by defining what kind of selection criteria we will use for ELTeC. The category novel may be divided into three groups where at least one of the following core criteria is met: a) textual: length (>10.000 words), prose, fiction, narrative structure b) paratextual (the term ‘novel’ or equivalent appears in the title or subtitle of the text) and c) contextual: the text is bibliographically listed with the UDC: 82-31 Novels. Full-length stories.

We follow a non-normative, metadata-based approach to sampling, grounded in corpus design. Corpus sampling criteria are usually shaped by the research questions and/or the context of the group creating the corpus. In CA16204, we have neither a single research question nor a fixed, previously known group of corpus creators. The research context of the Action is more interested in knowledge production in a methodological sense and does not prefer a single method, model or theory. Furthermore, the membership of the Action will fluctuate and consist of researchers from different disciplines with different theoretical and cultural contexts. Thus, we need to build the corpus design on a methodical basis. With this method, we will be able to select canonical texts, but not exclusively.

Representativeness is an ideal which we would like to pursue but which cannot be achieved as a whole. We will therefore aim to represent the variety of a population. In line with the MoU, the ELTeC will be designed as a monitor corpus where texts (from different languages and periods) can be added over time. We then need to decide how each criterion is balanced and how it interplays with other criteria.

1.2. Requirements of sampling criteria

According to the MoU, the corpus design should be balanced with respect to language and publication date of the texts. This means that the corpus should not be based solely on chronological criteria, meaning that we need a text from each year of the period in question. The main sampling criterion ‘language’ requires that translations are not included at all. We prefer the first edition of a novel, but later editions may also be used. We prefer the book edition of a novel, and hence do not prefer novels printed only in periodicals, unless a particular literary tradition only features novels printed in serial format. If we consider later editions of a novel, these should be freely available (under free licences permitting reuse). The first edition is more interesting from a philological point of view, since it represents the authentic text of the author. Dealing with historical texts might require some cleaning up or normalization. We will merge all word forms which are separated by line breaks. At the moment, we must assume that sufficiently good normalization tools do not exist for every language. Later editions of a novel may already be normalized in some way. This might lead to different text representations in ELTeC, which should be indicated in the metadata.

Considering later, freely available editions of a novel as well has two advantages. First, members of the Action can already provide machine-readable text documents (HTML, TEI etc.) of later editions. Second, in some languages it might be easier to find later editions which already exist in a machine-readable format, so that no effort has to be put into digitizing them.

Electronic availability should not be a leading sampling criterion, although availability is a limiting factor. A text should not be excluded from ELTeC because it is not digitized, but it should be excluded if the text cannot be made freely available in ELTeC. If we only use availability as a selection criterion, we are at risk of copying projects such as ‘Gutenberg’. The issue remains of finding additional funds to digitise non-canonical books. Until that moment, the solution would be to create pilot corpora (that can later be supplemented or substituted by an alternative) for literatures that do not have a significant number of digitized texts.

We then need additional criteria which can be applied without having to know (read) the texts in question. The criteria should be checked without a deep knowledge about the texts. Otherwise, this will oppose the goal of the whole Action and the methodical approach of distant reading. The criteria should be operationalizable, meaning decidable from text metadata. Here, we define text metadata in a wider scope than only the classical bibliographical metadata. In this way corpus design interacts with metadata. Some of the text’s metadata can be used as sampling criteria. These criteria are text-external and -internal criteria (cf. Hunston 2008) on which we then need to rely. The selection criteria may be assisted by bibliographical overviews (wherever available) for each language in order to avoid possible canon-derived bias.

We suggest using an online table as a means of collecting nominations for inclusion in the ELTeC but other methods are feasible.

1.3. Sampling criteria

Creating a language collection involves two steps. The first is selection: identifying text candidates. The second is balancing: determining their proportions within the corpus. Both steps are defined in this document.

Principles:

Organization

Criteria: Eligibility.

In order to be included (selection), a text must...

Criteria: Composition.

Among the novels in each language the subcollection must contain...

Date : 1840 to 1920 (first iteration)

We will divide into four groups

  • group A (1840-1859): code T1
  • group B (1860-1879): code T2
  • group C (1880-1899): code T3
  • group D (1900-1920): code T4
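
The four date groups above can be applied mechanically once the year of first publication is known. The following is only a minimal sketch: the function name `time_slot` is an assumption made for illustration, and only the year ranges and the codes T1 to T4 come from the list above.

```python
def time_slot(year: int) -> str:
    """Return the time-slot code for a year of first publication."""
    # (first year, last year, code), exactly as listed in the guidelines
    slots = [
        (1840, 1859, "T1"),
        (1860, 1879, "T2"),
        (1880, 1899, "T3"),
        (1900, 1920, "T4"),
    ]
    for first, last, code in slots:
        if first <= year <= last:
            return code
    # Years outside the first iteration of the collection have no code
    raise ValueError(f"{year} is outside the period 1840-1920")


print(time_slot(1886))
```

Raising an error for years outside 1840 to 1920 is a design choice of this sketch, not a rule stated in the guidelines.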

Language of the text

The MoU defines the languages to be sampled. It does not propose distinguishing regional variation (e.g. in German), nor geographical variation (e.g. the French spoken in Belgium, France, or Switzerland). It assumes only European varieties, so English excludes US English; French excludes Quebecois.

We follow a language-based approach (not country-based). This means for example that we include Swiss German texts in the German language collection. We prefer standard varieties over dialect varieties if sampling criteria for text candidates are met.

Reprint count

We propose to use the number of times a work is reprinted as an objective measure of its reception during the period 1970-2009, using categories like the following:

  • low: few or no reprints found during this period,
  • high: many reprints found during this period

Note that the reprint count does not include digitizations of texts.

Author gender

We use the following categories for actual (not claimed) author gender:

  • male
  • female
  • mixed (more than one author)
  • undefined

Length

We include a variety of lengths

  • short (10k-50k word tokens)
  • medium (50k-100k word tokens)
  • long (>100k word tokens)
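
As an illustration, the length categories can be derived from a word count as sketched below. The function name `size_category` is hypothetical. The list above does not say to which category a count of exactly 50,000 or 100,000 belongs; this sketch assumes upper bounds are inclusive, which is consistent with ‘long’ being defined as strictly more than 100k. Rejecting counts below 10,000 is likewise an assumption.

```python
def size_category(word_count: int) -> str:
    """Map a word-token count to short / medium / long."""
    if word_count < 10_000:
        # Assumption: texts below the shortest listed range get no category
        raise ValueError("fewer than 10,000 word tokens")
    if word_count <= 50_000:
        return "short"
    if word_count <= 100_000:
        return "medium"
    return "long"


for count in (12_000, 75_000, 240_000):
    print(count, size_category(count))
```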

2. Corpus Encoding

2.1. Encoding Principles

The MoU for the project notes that ‘Distant Reading methods cover a wide range of computational methods for literary text analysis, such as authorship attribution, topic modelling, character network analysis, or stylistic analysis.’ The focus of the ELTeC encoding scheme is therefore not to represent texts in all their original complexity of structure or appearance, but rather to facilitate a richer and better-informed distant reading than a transcription of lexical content alone would permit. In designing this encoding scheme, we have applied the following principles:

The goal is not to duplicate the work of scholarly editors or to produce (yet another) digital edition of a specific source document. Rather it is to ensure that the ELTeC texts can be processed satisfactorily, even by simple-minded (but XML-aware) systems primarily concerned with lexis, and to make life easier for the developers of such systems.

In selecting features for inclusion in the markup scheme, we have been guided, but not limited, by existing practice as far as possible. Our main goal has been to identify a small core set of textual features which can be readily (preferably automatically) identified in existing digital transcriptions, or easily and consistently provided by new transcriptions.

We distinguish three ‘levels’ of encoding, referred to below as level zero, level one and level two. All ELTeC texts are made available at level zero, the basic encoding format. Some texts may additionally be made available at levels one or two, which provide a richer set of encoded features. For example: a level one text will include semantic information missing from a level zero text; a level two text will include tokenization information missing from a level one text. As far as possible conversion between levels will be automatically scripted, but this is not possible in the general case.

This document lists all the textual features which are to be distinguished in an ELTeC conformant transcription at one of these three levels. Whenever a given feature exists in a text, it will be marked up as indicated here. No other features will be captured by the markup: if some textual feature not provided for here is identified by a marked up source text, that markup will be removed (though it may be retained in a version of the text encoded at a different level).

All ELTeC documents are TEI conformant, and therefore include a TEI Header, as discussed in section 3.1. Metadata in the TEI Header below.

2.2. Basic Transcription Guidelines (all levels)

The basic unit of the ELTeC collection is a single novel, represented by a single <TEI> element, consisting of a <teiHeader> element containing metadata specific to that novel and a <text> element containing a normalised transcription of the text itself. We propose no mechanism (other than metadata) to encode units larger than a single novel, such as multipart novel series like Proust's A la recherche du temps perdu or Zola's Les Rougon-Macquart. Each text should be transcribed in full from a specific identifiable edition, typically the first, and the source documented in the TEI Header. The original spelling and punctuation of the source should be retained, but details of typography are not required: hence words hyphenated across line or page breaks should be silently reassembled.

To facilitate checking of a transcription against its source during production, the <pb> element may be used to mark the point in a transcription where a new page begins in the source. An identifier for each <pb> element may be provided in a level 1 text to facilitate linkage to a page image of the corresponding source page. This element is not required in a level 0 text.

Running titles, page footers, catchwords and other forms of printed paratext are all omitted from an ELTeC transcription, with the exception of the page number, which may be supplied as the value of the n attribute on <pb>. Note that this attribute supplies the page number as specified by the source. If the source gives no page number, the value supplied should be enclosed in brackets.

If a page begins with the second part of a hyphenated word, the <pb> tag may appear after that word in order to simplify lexical processing. Otherwise its position should be the same in transcription and source.
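
The following sketch shows one way a processing script might read page numbers from <pb> elements. It rests on assumptions: the sample fragment is invented, it omits the TEI namespace for brevity, and a bracketed value is read as a number not printed in the source, which is one interpretation of the rule above.

```python
import xml.etree.ElementTree as ET

# Invented fragment; real ELTeC files declare the TEI namespace
SAMPLE = """<div type="chapter">
  <p>First paragraph.<pb n="12"/> The text continues on a new page.</p>
  <p>Second paragraph.<pb n="[13]"/> No number is printed on this page.</p>
</div>"""


def page_numbers(xml_text):
    """Return (number, supplied) pairs for each <pb>, in document order."""
    root = ET.fromstring(xml_text)
    pages = []
    for pb in root.iter("pb"):
        n = pb.get("n", "")
        # Brackets are taken to mark a number absent from the source
        supplied = n.startswith("[") and n.endswith("]")
        pages.append((n.strip("[]"), supplied))
    return pages


print(page_numbers(SAMPLE))
```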

As well as a titlepage or a table of contents, a published novel often includes material such as forewords or appendixes additional to the text of the novel itself. This liminal matter is included in an ELTeC text only if it is believed to be authorial. Material before the body of the text begins is collected within a <front> element, and material following the body in a <back> element. In either case, distinct sections of the material, if encoded, are represented by a <div> with its type attribute set to liminal.

At level zero, titlepages and tables of contents are omitted. At level one, they are replaced by a <gap> element. Non-authorial liminal material is silently omitted at all levels.

Within the body of a text, major structural divisions (parts, sections, chapters etc.) will be captured using the generic <div> element, with attributes type, xml:lang, xml:id and n used as further detailed below.

The names used for hierarchic structural divisions of a novel above the chapter are arbitrary, culture-specific, and often inconsistent : in some novels things called ‘part’ contain things called ‘book’ and in others the reverse. We propose to follow TEI in using a single element (<div>) for every hierarchical structural division, down to the level of ‘chapter’.

The type attribute is used to indicate the function of a structural division. It should have one of the following values:

  • liminal: authorial preface, foreword, back matter, etc.
  • titlepage (within front or back): contains title page text; level 1 only
  • notes (within front or back): contains authorial notes; level 1 only
  • part: any structural subdivision of a text larger than a chapter e.g. part, volume etc.
  • chapter: smallest structural subdivision of the body of a text

A short novel may have no subdivision at all, in which case the <div> element should not be used. No further subdivisions within a <div type='chapter'> are permitted. If the text of a chapter is subdivided in some way, for example by means of a number, a row of stars, a horizontal rule, or similar device, this should be indicated in the markup by means of a <milestone> element. If a chapter contains an embedded text of some kind, for example a quoted letter or other narrative, this should be marked using the <quote> element.
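
The structural rules above lend themselves to a simple automated check. The sketch below is not the project's validation (which is done by the ELTeC schemas); it only illustrates two of the rules on an invented, namespace-free fragment: every <div> carries one of the listed type values, and a chapter contains no further <div>. The function name `check_divs` and the attribute values on <milestone> are assumptions.

```python
import xml.etree.ElementTree as ET

ALLOWED_TYPES = {"liminal", "titlepage", "notes", "part", "chapter"}

SAMPLE = """<body>
  <div type="part">
    <div type="chapter">
      <head>I</head>
      <p>Text.</p>
      <milestone unit="section"/>
      <p>More text after a row of stars.</p>
    </div>
    <div type="chapter"><head>II</head><p>Text.</p></div>
  </div>
</body>"""


def check_divs(xml_text):
    """Return a list of problems found in the <div> structure."""
    root = ET.fromstring(xml_text)
    problems = []
    for div in root.iter("div"):
        div_type = div.get("type")
        if div_type not in ALLOWED_TYPES:
            problems.append(f"unexpected type: {div_type!r}")
        # A chapter is the smallest division: no <div> may be nested in it
        if div_type == "chapter" and div.find(".//div") is not None:
            problems.append("chapter contains a nested <div>")
    return problems


print(check_divs(SAMPLE))
```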

The (human) language in which a text is expressed is indicated explicitly by the xml:lang attribute which supplies the ISO 639-2 letter code for the language concerned. This attribute will always be supplied on the <text> element to specify a default, and may also appear on other elements to indicate passages where the language changes. The various different languages used in a given text will be itemized in its metadata (see <langUsage> element in the header).

A single reference scheme will be defined for the whole corpus, with the following components:

The identifier will be supplied as the value of an xml:id attribute on each <text>, <div>, <front>, <back>, or <s> element as appropriate. Adding this identifier is an easily automated task built into the workflow for accession to the ELTeC.

Note that these identifiers will not necessarily correspond with the numbering used in a particular source text. In a work where the first twelve chapters are considered to form part one, and the next twelve constitute part two, the first chapter of the second part will have an identifier ending 013, even though it may be numbered 1 in a source text.
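
The consecutive numbering described above can be sketched as follows. Only the behaviour (chapters counted across parts, with a three-digit suffix such as 013) is taken from the text; the text identifier `XX0001`, the underscore separator, and the function name `chapter_ids` are hypothetical, since the exact identifier syntax is not given here.

```python
def chapter_ids(text_id, chapters_per_part):
    """Number chapters consecutively across parts, ignoring source numbering."""
    ids = []
    counter = 0
    for chapter_count in chapters_per_part:
        part_ids = []
        for _ in range(chapter_count):
            counter += 1  # the counter is never reset at a part boundary
            part_ids.append(f"{text_id}_{counter:03d}")
        ids.append(part_ids)
    return ids


# Two parts of twelve chapters each, as in the example above
ids = chapter_ids("XX0001", [12, 12])
print(ids[1][0])
```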

We do not preserve the lineation of running prose in our source texts, since this is always purely an artefact of the source edition. For the same reason we reassemble words broken across a line break, silently removing any hyphen present. (This will make it impossible to use our texts for hyphenation studies. So be it.)

The title of a chapter, or of any other subdivision, as given in the source should be encoded using the TEI <head> element. There may be more than one such element at the start of a <div> element. Novels occasionally include other initial matter, such as a quotation, or a summary of the content of the chapter; these are not specially treated in ELTeC texts.

The chapters of a novel mostly consist of prose, arranged in paragraphs. It is not unusual to find other structures however, specifically verse, or passages of dialogue presented as if in a play, with speaker labels and even stage directions. Less frequently, novels may contain material presented in list or tabular formats. Graphics with their own associated heading or other text are also frequent.

Novels are also full of direct speech, represented using various different conventions, but almost always distinguished from the narrative voice. The first person narrative is also common, but may be regarded as a special case. How exactly different narrative strands are articulated in a novel, and the extent to which they may be characterised by their lexis has been a preoccupation of many ‘distant reading’ style analyses. Although it might be helpful to distinguish material purporting to be direct speech from material purporting to be narrative in our basic encoding, doing so consistently and accurately would be problematic. ELTeC texts therefore do no more than preserve existing punctuation. The <p> element is used for everything which is typeset as a separate block on the page, including both paragraphs and list items; the <l> element is used for verse lines or similar, typically set off from the rest of the text. Illustrations and any associated text such as a title or heading are excluded. Passages set as if in drama are not specially treated.

Printed texts typically deploy a number of conventions which can cause problems for linguistic analyses of even the most basic kind. Changes of font or style (italicization or use of superscript, for example) usually signal something, which an analysis should take into account. However, determining the function of such typographic variation is not always straightforward. ELTeC texts (at level 0) therefore simply indicate the presence of typographic salience, using the <hi> element.

2.3. Transcription Guidelines (level 1)

ELTeC encoding at level zero aims above all for consistency and transparency in what is reliably achievable, leaving most problematic issues to be addressed by linguistic annotation.

If however a text has been derived from a digital version in which a more ambitious range of textual features has already been captured, whether by means of TEI-style markup or styling information such as that provided by Word, or if there are sufficient resources available to provide a slightly richer encoded version, a novel may be encoded at ELTeC level 1, using additional elements discussed in this section. Note that a level one text can always automatically be converted to a level zero text, if this is necessary for compatibility or for some other reason. (A script to do this is available on the project website.) The reverse conversion, from level zero to level one, requires human intervention.

At level 1, the element <gap> should be used to indicate when something has been omitted from the encoding. For example, a suppressed graphic, or foreword, which at level zero are silently omitted, should be represented at level 1 by an explicit <gap> element, with attributes indicating what has been omitted from the encoding.

At level 1, the following additional elements can be used to mark up the significance of some stretch of text which would otherwise simply be marked as typographically salient using <hi>:

  • <foreign>: a word or phrase in a foreign language
  • <title>: the title of a book, piece of music, etc.
  • <label>: any form of heading or label found within a paragraph of text
  • <emph>: a word or phrase being linguistically emphasized e.g. by being shouted

The <hi> element is also sometimes used for indications of superscript characters (such as French ‘14ᵉ’); these should simply be removed.

When a sequence of verse lines, or a passage from some other narrative level such as a letter is quoted within a text, the <l> or <p> elements representing it should be wrapped in the <quote> element made available by the ELTeC level one schema.

It is not unusual to find special devices such as a row of stars or a rule in the middle of a chapter, usually indicating a discontinuity in the narrative timeline, or structural shift. Such indications may be simply ignored in an ELTeC level zero text; at level 1, the special purpose <milestone> element may be used to mark their presence.

Level 1 texts may also represent authorial notes, if these are present, using the element <note> to contain the body of the note, and the element <ref> to represent the point of attachment for the note. Wherever they appear in the text, notes are always separated from it and encoded in a separate <div> element within the <back> element. See examples.

Occasionally, the text being transcribed contains self-evident errors. Where these are caused by the encoding process (e.g. an OCR error), they are always silently corrected in the transcription. Where, however, the original itself is faulty, and the transcriber (or a textual editor) has corrected it, the correction should be signalled by using the <corr> element available at level one.

2.4. Transcription Guidelines (level 2)

At ELTeC level 2, all existing elements are retained and two new elements <s> and <w> are introduced to support segmentation of running text into sentence-like and word-like sequences respectively. Individual tokens are marked using the <w> element, and decorated with one or more of the TEI-defined linguistic attributes pos, lemma, and join. Both words and punctuation marks are considered to be ‘tokens’ in this sense, although the TEI suggests distinguishing the two cases using <w> and <pc> respectively. The <s> (segment) element is used to provide an end-to-end tessellating segmentation of the whole sequence of <w> elements, based on orthographic form. This provides a convenient extension of the existing text-body-div hierarchy within which tokens are located. The elements <p>, <head>, and <l> (which contain just text at levels 0 and 1) at level 2 can contain a sequence of <s> elements. Empty elements <gap>, <milestone>, <pb> or <ref> are also permitted within text content at any point, but these are disregarded when segmentation is carried out. Each <s> element can contain a sequence of <w> elements, either directly, or wrapped in one of the sub-paragraph elements <corr>, <emph>, <foreign>, <hi>, <label>, or <title>. To this list we add the element <rs> (referring string), provided by the TEI for the encoding of any form of entity name, such as a Named Entity Recognition procedure might produce.

This approach implies that <w> elements may appear at two levels in the hierarchy which may upset some software; it also implies that <w> elements must be properly contained within one of these elements, without overlap.
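
To make the two-level placement of <w> concrete, the sketch below collects tokens from an <s> element whether they are direct children or wrapped in a sub-paragraph element, and then rebuilds the surface string. The sample sentence, the part-of-speech values and the function names are invented for illustration, and the TEI namespace is omitted. The handling of join assumes the TEI reading in which join="right" means the token is adjacent to the following token.

```python
import xml.etree.ElementTree as ET

# Invented level 2 fragment; one <w> is wrapped in <title>
SAMPLE = """<p>
  <s>
    <w pos="PRON" lemma="she">She</w>
    <w pos="VERB" lemma="read">read</w>
    <title><w pos="PROPN" lemma="Waverley" join="right">Waverley</w></title>
    <w pos="PUNCT" lemma=".">.</w>
  </s>
</p>"""


def tokens(s_element):
    """Collect (form, pos, lemma, join) for every <w> inside an <s>."""
    # iter() descends into wrappers such as <title>, so both levels are covered
    return [
        (w.text, w.get("pos"), w.get("lemma"), w.get("join", "no"))
        for w in s_element.iter("w")
    ]


def detokenize(token_list):
    """Rebuild the surface string, using join to suppress spaces."""
    out = []
    for i, (form, _pos, _lemma, join) in enumerate(token_list):
        out.append(form)
        last = i == len(token_list) - 1
        next_join = "no" if last else token_list[i + 1][3]
        glued = join in ("right", "both") or next_join in ("left", "both")
        if not last and not glued:
            out.append(" ")
    return "".join(out)


sentence = ET.fromstring(SAMPLE).find("s")
print(detokenize(tokens(sentence)))
```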

This TEI XML format is equally applicable to the production of training data for applications using machine learning techniques and to the outputs of such systems. However, since such machine learning applications typically operate on text content in a tabular format only, XSLT filters which transform (or generate) the XML markup discussed here from such tabular formats without loss of information are envisaged. At the time of writing, however, Working Group 2 has yet to put this proposed architecture to the test.

2.5. TEI Elements and textual features

The following summary table lists the textual features which every ELTeC text must capture, together with an indication of how that feature should be represented.

Textual Feature | Encoding | Note
Page break | <pb/> | n attribute gives attested number of page; optional at level 0
Title page | <div type="titlepage"> within <front> | Optional at level 0
Authorial preface, foreword, appendix, etc | <div type="liminal"> within <front> or <back> as appropriate | Non-authorial matter is silently omitted
volume, chapter etc. | <div> nested as necessary within <body> | type may be chapter, or group (for anything else); n may indicate original numbering
Heading or title | <head> at start of <div>; <trailer> at end |
Running title/page footer | Omitted | Page number only may be included in <pb>
Prose paragraph or list item | <p> | Discard any formatting information
Verse line | <l> | Use only for verse lines in display blocks

Other textual features are treated differently at different encoding levels. They are listed in the following table:

Textual Feature | Level 0 Encoding | Level 1 Encoding | Note
Table of contents, errata list, other liminal matter | omitted | <gap> | use unit and quantity to specify what has been omitted
Mid-chapter structural marker | omitted | <milestone/> | use unit and type to supply further detail
Authorial footnote | omitted | transcribe text of note text within a <note> within <div type="notes"> inside <back>; mark point of attachment with a <ref> | use target on <ref> to point to <note>
Font change | Mark with <hi> (no attributes) | Replace with <foreign>, <title>, <label>, <emph> as appropriate |
Graphic or illustration | omitted | <gap unit="graphic"> |
Quotation or display block | <p> (or series of <l>) | <quote> containing one or more <p> or <l> | ?
Editorial correction | unmarked | <corr> | Use when encoded text differs from printed original
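
The table above implies a mechanical mapping from level 1 back to level 0. The sketch below illustrates part of that mapping and is not the conversion script mentioned in section 2.3. It assumes a namespace-free <text> element as input (the header also uses <title>, so the header must not be passed in), and it deliberately leaves <quote> and <corr> untouched, because unwrapping them needs more care than fits here. All names in the code are hypothetical.

```python
import xml.etree.ElementTree as ET

SALIENT = {"foreign", "title", "label", "emph"}  # become plain <hi>
DROPPED = {"gap", "milestone", "ref"}            # omitted at level 0

SAMPLE = (
    '<text><body><div type="chapter">'
    '<p>He said <foreign xml:lang="fr">bonjour</foreign>'
    '<ref target="#n1"/> and left.</p>'
    '<milestone unit="section" type="stars"/>'
    '<p>Later, <emph>much</emph> later.</p>'
    '</div></body>'
    '<back><div type="notes"><note xml:id="n1">A greeting.</note></div></back>'
    '</text>'
)


def remove_keeping_tail(parent, child):
    """Remove child but keep the text that follows it in the parent."""
    tail = child.tail or ""
    index = list(parent).index(child)
    if index == 0:
        parent.text = (parent.text or "") + tail
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + tail
    parent.remove(child)


def to_level_zero(root):
    """Apply the level 0 column of the table to a level 1 <text> element."""
    for parent in list(root.iter()):
        for child in list(parent):
            is_notes = child.tag == "div" and child.get("type") == "notes"
            if child.tag in DROPPED or is_notes:
                remove_keeping_tail(parent, child)
            elif child.tag in SALIENT:
                child.tag = "hi"
                child.attrib.clear()  # level 0 <hi> carries no attributes
    return root


root = to_level_zero(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```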

3. Corpus Metadata

3.1. Metadata in the TEI Header

This section describes the metadata associated with each text (title, authorship, date etc.) and with the collection as a whole. The intention is to provide this in a standardised way to facilitate subsetting of the collection, using (for example) coded values for the descriptive selection criteria associated with the text. As far as possible, our text should represent the first complete printed edition of each novel selected.

The TEI Header provides a very large number of possibilities for encoding such metadata. We will provide a checklist of the TEI Header elements which are always to be provided for each text, possibly in the form of a template. As in the body of the text, the intention is to provide a guaranteed minimal level of information, consistent across all parts of the ELTeC.

Note that metadata may be supplied at (at least) two levels: the level of the ELTeC as a whole, and that of individual texts within it. Information which applies uniformly to all parts of the collection should be supplied in the ELTeC header; information specific to a particular document in the text header.

Every ELTeC text includes a TEI Header supplying metadata to describe it. There is also a TEI Header for the whole collection, which has additional information common to all the texts. This section lists the header components which should be supplied for every text, indicating briefly specific usage rules.

Here is the basic template for an individual ELTeC header:

<teiHeader type="novelHeader">
  <fileDesc>
    <titleStmt>
      <title> <!-- title of work --> </title>
      <author> <!-- information about the author --> </author>
      <respStmt> <!-- information about the encoder --> </respStmt>
    </titleStmt>
    <extent> <!-- size of the text, in pages and words --> </extent>
    <publicationStmt> <!-- standard text about status as part of ELTeC --> </publicationStmt>
    <sourceDesc>
      <bibl> <!-- bibliographic description of the printed source --> </bibl>
    </sourceDesc>
  </fileDesc>
  <profileDesc> <!-- additional descriptive information, selection criteria, etc. --> </profileDesc>
  <revisionDesc> <!-- revision information --> </revisionDesc>
</teiHeader>

Each of the TEI elements shown above must be provided and used as described below. The ELTeC schemas will reject as invalid a document in which these conventions are not followed.
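
Before running full schema validation, a script can cheaply confirm that the components of the template are present. The sketch below is an illustration only and does not replace the ELTeC schemas: the list of paths is read off the template above, the function name `missing_components` is hypothetical, and the sample header omits the TEI namespace (http://www.tei-c.org/ns/1.0) that real files declare.

```python
import xml.etree.ElementTree as ET

# Paths relative to <teiHeader>, taken from the template
REQUIRED_PATHS = [
    "fileDesc/titleStmt/title",
    "fileDesc/titleStmt/author",
    "fileDesc/titleStmt/respStmt",
    "fileDesc/extent",
    "fileDesc/publicationStmt",
    "fileDesc/sourceDesc/bibl",
    "profileDesc",
    "revisionDesc",
]

# Invented header that lacks <revisionDesc>
INCOMPLETE = """<teiHeader>
  <fileDesc>
    <titleStmt><title>T</title><author>A</author><respStmt/></titleStmt>
    <extent/>
    <publicationStmt/>
    <sourceDesc><bibl/></sourceDesc>
  </fileDesc>
  <profileDesc/>
</teiHeader>"""


def missing_components(header_xml):
    """Return the template paths that are absent from a header."""
    header = ET.fromstring(header_xml)
    return [path for path in REQUIRED_PATHS if header.find(path) is None]


print(missing_components(INCOMPLETE))
```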

The natural language used for text in the Header should be that of the language collection to which the text belongs, e.g. French if the text is in French. The attribute @xml:lang may be supplied on any element to indicate its content is in some other language where necessary.

3.1.1. The file description (<fileDesc>)

3.1.1.1. The title statement (<titleStmt>)

This must supply :

  • at least one <title> element containing the standard title of the work, followed by the phrase "ELTeC edition" (or its equivalent in the appropriate language). Multiple title elements should be supplied only if the title has been translated into some other language.
  • at least one <author> element containing the standard name of the author, in the form SURNAME, FORENAMES (BIRTHYEAR - DEATHYEAR). If either date is unknown, it should be replaced by a question-mark. The author name should be that under which the author is conventionally catalogued, not necessarily that found on the title page.
  • at least one <respStmt> element containing one or more <name> elements and one or more <resp> elements indicating the person/s responsible for producing the ELTeC edition, and the nature of their responsibility respectively.

Here is a simple example, for a German text :

<titleStmt>
  <title>Auf zwei Planeten : edizion ELTeC</title>
  <title xml:lang="en">Two planets</title>
  <author>Laßwitz, Kurd (1848-1910)</author>
  <respStmt>
    <resp>editor</resp>
    <name>Alexander Geyken</name>
  </respStmt>
</titleStmt>

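The author convention SURNAME, FORENAMES (BIRTHYEAR - DEATHYEAR) is regular enough to generate and to parse. The following sketch assumes four-digit years and tolerates optional spaces around the hyphen and before the parenthesis; the function names and the regular expression are illustrative, not part of the ELTeC tooling.

```python
import re

AUTHOR_PATTERN = re.compile(
    r"^(?P<surname>[^,]+),\s*(?P<forenames>.+?)\s*"
    r"\((?P<birth>\d{4}|\?)\s*-\s*(?P<death>\d{4}|\?)\)$"
)


def format_author(surname, forenames, birth=None, death=None):
    """Build the <author> content; an unknown date becomes a question mark."""
    birth_text = "?" if birth is None else str(birth)
    death_text = "?" if death is None else str(death)
    return f"{surname}, {forenames} ({birth_text}-{death_text})"


def parse_author(value):
    """Split an <author> string into its four components."""
    match = AUTHOR_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"not in the expected form: {value!r}")
    return match.groupdict()


print(format_author("Laßwitz", "Kurd", 1848, 1910))
print(parse_author("Laßwitz, Kurd (1848-1910)"))
```

Names catalogued without a comma (for example a single pseudonym) would be rejected by this pattern and need separate handling.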
3.1.1.2. The extent statement (<extent>)

This must supply :

  • one or more <measure> elements, indicating the size of the text. The size must be given as numeric content, and the type of measurement must be either words or pages, as indicated by the @unit attribute.

Here is a simple example, for a French text :

<extent>
  <measure unit="words">12000</measure>
  <measure unit="pages">256</measure>
</extent>

The page count may be derived from an external bibliographic source, and may not therefore correspond with the actual number of <pb> elements in the transcription. If no page count is available, no <measure unit="pages"> should be supplied.
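
The two constraints on <measure> (numeric content, and a unit of either words or pages) can be checked as sketched below. The function name `read_extent` is hypothetical and the input is assumed to be a namespace-free <extent> element serialized as a string.

```python
import xml.etree.ElementTree as ET

ALLOWED_UNITS = {"words", "pages"}


def read_extent(extent_xml):
    """Return a {unit: size} dictionary from an <extent> element."""
    extent = ET.fromstring(extent_xml)
    sizes = {}
    for measure in extent.findall("measure"):
        unit = measure.get("unit")
        if unit not in ALLOWED_UNITS:
            raise ValueError(f"unit must be words or pages, not {unit!r}")
        # int() raises ValueError if the content is not a whole number
        sizes[unit] = int((measure.text or "").strip())
    if not sizes:
        raise ValueError("at least one <measure> is required")
    return sizes


EXAMPLE = (
    '<extent><measure unit="words">12000</measure>'
    '<measure unit="pages">256</measure></extent>'
)
print(read_extent(EXAMPLE))
```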

3.1.1.3. The publication statement (<publicationStmt>)

This must contain a <distributor> element naming the project itself, a <date> element showing when the text was added to the Collection, and an <availability> element containing a standardised statement concerning its licence. One or more <ref> elements may also follow specifying a URL from which the text may be downloaded. These elements must be given in the order specified, as shown in this example:

<publicationStmt>
  <distributor>COST Action ELTeC</distributor>
  <date when="2018-11-23"/>
  <availability>
    <licence target="https://creativecommons.org/licenses/by/4.0/">Licenced under CC-BY 4.0</licence>
  </availability>
  <ref type="doi" target="10.5281/zenodo.8468"/>
</publicationStmt>

3.1.1.4. The source description (<sourceDesc>)

This must contain at least one <bibl> element containing a bibliographic description of the source text from which the ELTeC version has been derived. This description might include any or all of the following standard TEI bibliographic elements:

Here is a French example:

<sourceDesc>
  <bibl type="digitalSource">
    <ref target="http://gallica.bnf.fr/ark:/12148/bpt6k931128v">Tatiana Leïlof roman parisien (édition numerisée)</ref>
    <publisher>gallica.bnf.fr / Bibliothèque nationale de France</publisher>
    <idno type="ARK">12148/bpt6k931128v</idno>
  </bibl>
  <bibl type="firstEdition">
    <title>Tatiana Leïlof , roman parisien, par Édouard Rod</title>
    <publisher>E. Plon, Nourrit et Cie</publisher>
    <pubPlace>Paris</pubPlace>
    <date>1886</date>
  </bibl>
</sourceDesc>

This encoding shows that the source of the ELTeC text is the digital facsimile provided under the title and ARK identifier indicated, and that the first edition of this work was published in Paris in 1886.

3.1.2. The profile description (<profileDesc>)

This must contain a <langUsage> element detailing the language or languages used in the text, followed optionally by a <textClass> element providing descriptive keywords, and by a mandatory <textDesc> element providing the sampling criteria applicable to this text.

The <textDesc> element contains one of each of the following elements from the model.textDescPart class in the order indicated:

  • <authorGender>
  • <canonicity>
  • <size>
  • <timeSlot>

These elements are used to represent the sampling criteria applicable to the current text. They are specific to the ELTeC project, and are therefore taken from the ELTeC namespace (http://distantreading.net/eltec/ns) rather than the TEI namespace.

The optional <textClass> element may contain one or more <keywords> elements. Each <keywords> element may contain one or more <term> elements, each describing some aspect of the text. At present the descriptive keywords may be freely chosen. All the terms in a given <keywords> list should use the same language, which should be that of the text itself, unless otherwise specified by means of an @xml:lang attribute.

The <langUsage> element should contain one or more <language> elements, one for each of the human languages used in the text. The @ident attribute of this element identifies the language using the ISO 639-2 code in the same way as the @xml:lang attribute. The @usage attribute may be used to indicate approximately what percentage of the text uses this language, or otherwise qualify it by means of a brief description.

Here is an imaginary Italian example:
<profileDesc    xmlns:e="http://distantreading.net/eltec/ns">  <langUsage>   <language ident="it" usage="80"/>   <language ident="en" usage="10">citazioni inglesi</language>   <language ident="de" usage="10">citazioni tedeschi</language>  </langUsage>  <textDesc>   <e:authorGender key="M"/>   <e:canonicity key="low"/>   <e:size key="large"/>   <e:timeSlot key="T3"/>  </textDesc> </profileDesc>
This text is mostly in Italian, but approximately a tenth of it is taken up with quotations in English, and another tenth with quotations in German. Its description indicates that it has a male author, is of low canonicity (i.e. few reprints have been found during the period 1970-2009), contains more than 100,000 words, and was first published between 1880 and 1899.
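
A minimal sketch of how such a profile description might be read with Python's standard library follows. The function names are hypothetical, the fragment is parsed without the TEI namespace, and @usage is assumed to hold a plain percentage, which the guidelines do not require.

```python
import xml.etree.ElementTree as ET

ELTEC_NS = "http://distantreading.net/eltec/ns"

sample = """<profileDesc xmlns:e="http://distantreading.net/eltec/ns">
 <langUsage>
  <language ident="it" usage="80"/>
  <language ident="en" usage="10">citazioni inglesi</language>
  <language ident="de" usage="10">citazioni tedeschi</language>
 </langUsage>
 <textDesc>
  <e:authorGender key="M"/>
  <e:canonicity key="low"/>
  <e:size key="large"/>
  <e:timeSlot key="T3"/>
 </textDesc>
</profileDesc>"""

def sampling_criteria(profile_xml):
    """Map each ELTeC-namespace child of <textDesc> to its @key value."""
    text_desc = ET.fromstring(profile_xml).find("textDesc")
    prefix = "{" + ELTEC_NS + "}"  # ElementTree writes tags as {uri}localname
    return {child.tag[len(prefix):]: child.get("key")
            for child in text_desc if child.tag.startswith(prefix)}

def language_usage(profile_xml):
    """Map each @ident to its @usage, assumed here to be an integer percentage."""
    root = ET.fromstring(profile_xml)
    return {lang.get("ident"): int(lang.get("usage"))
            for lang in root.iter("language")}

print(sampling_criteria(sample))
print(language_usage(sample))
```

Matching on the namespace URI rather than on the e: prefix means the same code works whichever prefix a given file happens to declare.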

3.1.3. The revision description (<revisionDesc>)

This contains at least one <change> element, documenting significant revisions or versions of the associated text. Each <change> element has a @when attribute which gives the date of the change in W3C format (YYYY-MM-DD), and the <change> elements are listed in reverse chronological order, most recent first. The content of each element should be a brief sentence indicating what was done and who was responsible for doing it, using the language of the text.

Here is an example from a Spanish text:
<revisionDesc>  <change when="2018-04-27">Revisión en formato ELTeC: Lou Burnard</change>  <change when="2004-06-30">Revisión del etiquetado : Mari    Carmen Jerez Gaona (Correctora)</change>  <change when="2001-11-27">Etiquetado del texto en XML-TEI : Ana López Díaz    (Correctora)</change>  <change when="2001-10-04">Supervisión del texto : Marisa Payá (Supervisora)</change>  <change when="2001-10-04">Revisión del formato : Josefina Carrión    (Supervisora)</change> </revisionDesc>
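
The two constraints on <change> elements (a YYYY-MM-DD date, and most recent first) can be sketched as a simple check. This is an illustration, assuming Python's standard library and a fragment without the TEI namespace; check_revision_desc is a hypothetical name, and the check looks only at the shape of the date, not at whether the month and day are valid.

```python
import re
import xml.etree.ElementTree as ET

W3C_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

def check_revision_desc(xml_text):
    """Return a list of problems found in a <revisionDesc> fragment."""
    problems = []
    dates = []
    for change in ET.fromstring(xml_text).findall("change"):
        when = change.get("when", "")
        if W3C_DATE.fullmatch(when):
            dates.append(when)  # dates in this form sort correctly as strings
        else:
            problems.append(f"not YYYY-MM-DD: {when!r}")
    if dates != sorted(dates, reverse=True):
        problems.append("changes are not listed most recent first")
    return problems

good = ('<revisionDesc><change when="2018-04-27">a</change>'
        '<change when="2004-06-30">b</change></revisionDesc>')
print(check_revision_desc(good))
```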

Appendix A Formal specifications

The ELTeC encoding scheme defined by this document is a TEI-conformant customization, from which user documentation and formal RELAX NG or DTD specifications are generated automatically.

Appendix A.1 Elements

Appendix A.1.1 <TEI>

<TEI> (TEI document) contains a single TEI-conformant document, combining a single TEI header with one or more members of the model.resource class. Multiple <TEI> elements may be combined within a <TEI> (or <teiCorpus>) element. [4. Default Text Structure 15.1. Varieties of Composite Text]
Module textstructure
Attributes att.typed (@type) att.global (xml:id, xml:lang, @n, @xml:base, @xml:space) att.global.rendition (@rend)
xml:id (identifier) provides a unique identifier for the element bearing the attribute.
Derived from att.global
Status Required
Datatype ID
xml:lang (language) indicates the language of the element content using a ‘tag’ generated according to BCP 47.
Derived from att.global
Status Required
Datatype teidata.language
Contained by
textstructure: TEI
May contain
header: teiHeader
textstructure: TEI text
Note

In ELTeC schemas, the attributes xml:lang and xml:id must be supplied for each TEI element. Identifiers should have a common alphabetic prefix followed by up to 5 digits. Language codes should conform to ISO 639-2.

Example
<TEI xml:id="SPA2001" xml:lang="SPA" xmlns="http://www.tei-c.org/ns/1.0"> <!-- --> </TEI>
This text in the Spanish language has the identifier SPA2001.
Schematron
<sch:ns prefix="tei"  uri="http://www.tei-c.org/ns/1.0"/> <sch:ns prefix="xs"  uri="http://www.w3.org/2001/XMLSchema"/>
Schematron
<sch:ns prefix="rng"  uri="http://relaxng.org/ns/structure/1.0"/>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <elementRef key="teiHeader"/>
  <alternate minOccurs="1" maxOccurs="1">
   <sequence minOccurs="1" maxOccurs="1">
    <classRef key="model.resource"
     minOccurs="1" maxOccurs="unbounded"/>
    <elementRef key="TEI" minOccurs="0"
     maxOccurs="unbounded"/>
   </sequence>
   <elementRef key="TEI" minOccurs="1"
    maxOccurs="unbounded"/>
  </alternate>
 </sequence>
</content>
    
Schema Declaration
element TEI
{
   att.global.attribute.n,
   att.global.attribute.xmlbase,
   att.global.attribute.xmlspace,
   att.global.rendition.attribute.rend,
   att.typed.attributes,
   attribute xml:id { text },
   attribute xml:lang { text },
   ( teiHeader, ( ( model.resource+, TEI* ) | TEI+ ) )
}
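
The note above on identifiers can be illustrated with a short check. This is a sketch under stated assumptions: it uses Python's standard library, the name check_tei_root is hypothetical, and "up to 5 digits" is read here as one to five digits following a purely alphabetic prefix.

```python
import re
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
# Assumed reading of the note: letters, then between one and five digits.
ID_PATTERN = re.compile(r"[A-Za-z]+[0-9]{1,5}")

def check_tei_root(xml_text):
    """Return a list of problems with xml:id and xml:lang on the root element."""
    root = ET.fromstring(xml_text)
    problems = []
    ident = root.get("{" + XML_NS + "}id")
    lang = root.get("{" + XML_NS + "}lang")
    if ident is None:
        problems.append("missing xml:id")
    elif not ID_PATTERN.fullmatch(ident):
        problems.append(f"unexpected identifier form: {ident!r}")
    if lang is None:
        problems.append("missing xml:lang")
    return problems

sample = '<TEI xml:id="SPA2001" xml:lang="SPA" xmlns="http://www.tei-c.org/ns/1.0"/>'
print(check_tei_root(sample))
```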

Appendix A.1.2 <author>

<author> (author) in a bibliographic reference, contains the name(s) of an author, personal or corporate, of a work; for example in the same form as that provided by a recognized bibliographic name authority. [3.12.2.2. Titles, Authors, and Editors 2.2.1. The Title Statement]
Module core
Attributes att.canonical (@ref) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Member of
Contained by
core: bibl
header: titleStmt
May contain
Character data only
Note

The ref attribute should be used to reference one or more externally defined identifiers for the author, as defined by an authority file such as VIAF.

Example
When used within a <titleStmt>, an author's name is given in a standardized format (surname, forename/s, (YYYY-YYYY)) as shown in this example.
<author ref="viaf:31996364">Forster, Edward Morgan (1879-1970)</author>
Example
When used within the <sourceDesc>, an author's name is given in the format used by the source in question, as shown in this example.
<author>E.M. Forster</author>
Example
In cases of multiple authorship, the <author> element within <titleStmt> should be repeated.
<titleStmt>  <title>The Diary of a Nobody : ELTeC edition </title>  <author>Grossmith, George (1847-1912)</author>  <author>Grossmith, Walter Weedon (1854-1919)</author> </titleStmt>
Content model
<content>
 <textNode/>
</content>
    
Schema Declaration
element author { att.canonical.attributes, att.global.attributes, text }

Appendix A.1.3 <authorGender>

<authorGender> specifies the sex of the author where this is known
Namespace http://distantreading.net/eltec/ns
Module derived-module-ELTeC
Attributes
key
Status Required
Datatype teidata.enumerated
Legal values are:
M
male author
F
female author
U
author sex unknown
X
author sex mixed
Note

indicates the biological sex of the author, not the sex claimed or implied by the author's name

Contained by
corpus: textDesc
May contain
Empty element
Example
Indicates that the author of the novel to be described is male (M).
<profileDesc    xmlns:e="http://distantreading.net/eltec/ns">  <textDesc>   <authorGender xmlns="http://distantreading.net/eltec/ns" key="M"/> <!-- ... -->  </textDesc> </profileDesc>
Example
Indicates that the gender of the author of the novel to be described cannot be specified (U).
<profileDesc    xmlns:e="http://distantreading.net/eltec/ns">  <textDesc>   <authorGender xmlns="http://distantreading.net/eltec/ns" key="U"/> <!-- ... -->  </textDesc> </profileDesc>
Content model
<content>
 <empty/>
</content>
    
Schema Declaration
element authorGender { attribute key { "M" | "F" | "U" | "X" }, empty }

Appendix A.1.4 <availability>

<availability> (availability) supplies information about the availability of a text, for example any restrictions on its use or distribution, its copyright status, any licence applying to it, etc. [2.2.4. Publication, Distribution, Licensing, etc.]
Module header
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Contained by
May contain
header: licence
Note

All ELTeC texts comprise a text which is in the public domain and markup which is licenced under the Creative Commons Attribution licence indicated (CC-BY 4.0).

Example
<availability>  <licence target="https://creativecommons.org/licenses/by/4.0/">   <p>The TEI mark up is licenced with Creative Commons Attribution (CC-BY 4.0).</p>  </licence> </availability>
Content model
<content>
 <elementRef key="licence"/>
</content>
    
Schema Declaration
element availability { att.global.attributes, licence }

Appendix A.1.5 <back>

<back> (back matter) contains any appendixes, etc. following the main part of a text. [4.7. Back Matter 4. Default Text Structure]
Module textstructure
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Contained by
textstructure: text
May contain
analysis: span spanGrp
textstructure: div trailer
Note

Because cultural conventions differ as to which elements are grouped as back matter and which as front matter, the content models for the <back> and <front> elements are identical.

Example
<back>  <div type="liminal">   <head>Appendix</head>   <p> <!-- additional text here -->   </p>  </div>  <div type="notes">   <head>Authorial Notes</head>   <note xml:id="ENG18700_N23"> <!-- text of footnote here -->   </note>  </div> </back>
Schematron
<sch:assert test="child::tei:div[@type='notes'] or child::tei:div[@type='liminal']"  role="ERROR">The back matter of a text must contain either liminal or notes divisions</sch:assert>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <alternate minOccurs="0"
   maxOccurs="unbounded">
   <classRef key="model.frontPart"/>
   <classRef key="model.pLike.front"/>
   <classRef key="model.pLike"/>
   <classRef key="model.listLike"/>
   <classRef key="model.global"/>
  </alternate>
  <alternate minOccurs="0" maxOccurs="1">
   <sequence minOccurs="1" maxOccurs="1">
    <classRef key="model.div1Like"/>
    <alternate minOccurs="0"
     maxOccurs="unbounded">
     <classRef key="model.frontPart"/>
     <classRef key="model.div1Like"/>
     <classRef key="model.global"/>
    </alternate>
   </sequence>
   <sequence minOccurs="1" maxOccurs="1">
    <classRef key="model.divLike"/>
    <alternate minOccurs="0"
     maxOccurs="unbounded">
     <classRef key="model.frontPart"/>
     <classRef key="model.divLike"/>
     <classRef key="model.global"/>
    </alternate>
   </sequence>
  </alternate>
  <sequence minOccurs="0" maxOccurs="1">
   <classRef key="model.divBottomPart"/>
   <alternate minOccurs="0"
    maxOccurs="unbounded">
    <classRef key="model.divBottomPart"/>
    <classRef key="model.global"/>
   </alternate>
  </sequence>
 </sequence>
</content>
    
Schema Declaration
element back
{
   att.global.attributes,
   (
      (
         model.frontPart
       | model.pLike.front
       | model.pLike
       | model.listLike
       | model.global
      )*,
      (
         (
            model.div1Like,
            ( model.frontPart | model.div1Like | model.global )*
         )
       | ( model.divLike, ( model.frontPart | model.divLike | model.global )* )
      )?,
      ( model.divBottomPart, ( model.divBottomPart | model.global )* )?
   )
}

Appendix A.1.6 <bibl>

<bibl> (bibliographic citation) contains a loosely-structured bibliographic citation of which the sub-components may or may not be explicitly tagged. [3.12.1. Methods of Encoding Bibliographic References and Lists of References 2.2.7. The Source Description 15.3.2. Declarable Elements]
Module core
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.sortable (@sortKey)
type characterizes the element in some sense, using any convenient classification scheme or typology.
Derived from att.typed
Status Required
Datatype teidata.enumerated
Legal values are:
firstEdition
describes the first complete print source edition published in the author's lifetime
printSource
describes a print source edition used as the source of the encoding which is not the first edition
digitalSource
describes a digital edition used as the source of the encoding, which may be derived from the first or another print edition
unspecified
the status of this reference has to be determined
Member of
Contained by
core: bibl
header: sourceDesc
May contain
Note

Contains phrase-level elements, together with any combination of elements from the model.biblPart class

Example
Shows a full source description with a digital source which represents the first edition.
<sourceDesc>  <bibl type="digitalSource">   <title>Wuthering Heights (1st edition) : wikisource edition</title>   <ref target="https://en.wikisource.org/wiki/Wuthering_Heights_(1st_edition)"/>  </bibl>  <bibl type="firstEdition">   <title>Wuthering Heights</title>   <title>A novel by</title>   <author>Ellis Bell</author>   <publisher>London: T. C. Newby</publisher>   <date>1847</date>  </bibl> </sourceDesc>
Example
<sourceDesc>  <bibl type="printSource">   <title>Opera omnia</title>   <title>Romanzi</title>   <author>Svevo, Italo</author>   <respStmt>    <resp>editor</resp>    <name>Maier, Bruno</name>   </respStmt>   <publisher>dall'Oglio</publisher>   <pubPlace>Milano</pubPlace>   <date>1969</date>   <note>Contiene: Una vita ; Senilita ; La coscienza di Zeno</note>  </bibl>  <bibl type="firstEdition">   <date>1892</date>  </bibl> </sourceDesc>
The ELTeC text derives from a print edition published in 1969. The first edition of the work concerned was published in 1892. We do not know whether or not the print edition used the first edition as a source.
Content model
<content>
 <alternate minOccurs="0"
  maxOccurs="unbounded">
  <textNode/>
  <classRef key="model.gLike"/>
  <classRef key="model.highlighted"/>
  <classRef key="model.pPart.data"/>
  <classRef key="model.pPart.edit"/>
  <classRef key="model.segLike"/>
  <classRef key="model.ptrLike"/>
  <classRef key="model.biblPart"/>
  <classRef key="model.global"/>
 </alternate>
</content>
    
Schema Declaration
element bibl
{
   att.global.attributes,
   att.sortable.attributes,
   attribute type
   {
      "firstEdition" | "printSource" | "digitalSource" | "unspecified"
   },
   (
      text
    | model.gLike
    | model.highlighted
    | model.pPart.data
    | model.pPart.edit
    | model.segLike
    | model.ptrLike
    | model.biblPart
    | model.global
   )*
}

Appendix A.1.7 <body>

<body> (text body) contains the whole body of a single unitary text, excluding any front or back matter. [4. Default Text Structure]
Module textstructure
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Contained by
textstructure: text
May contain
analysis: span spanGrp
textstructure: div trailer
Example
<body>  <div type="chapter">   <head>I.</head>   <p>Utanfor, i Vest, bryt Have paa mot ei sju Milir lang laag Sandstrand.</p>   <p>Det er sjølve Have. Nordhave breidt og fritt, ukløyvt og utøymt, endelaust....</p> <!-- ... -->  </div> <!-- more chapters here -->  <trailer>Slutten</trailer> </body>
Schematron
<sch:assert test="descendant::tei:div[@type='chapter' or @type='letter']"  role="ERROR">The body of a text must contain at least one chapter or letter</sch:assert>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <classRef key="model.global"
   minOccurs="0" maxOccurs="unbounded"/>
  <sequence minOccurs="0" maxOccurs="1">
   <classRef key="model.divTop"/>
   <alternate minOccurs="0"
    maxOccurs="unbounded">
    <classRef key="model.global"/>
    <classRef key="model.divTop"/>
   </alternate>
  </sequence>
  <sequence minOccurs="0" maxOccurs="1">
   <classRef key="model.divGenLike"/>
   <alternate minOccurs="0"
    maxOccurs="unbounded">
    <classRef key="model.global"/>
    <classRef key="model.divGenLike"/>
   </alternate>
  </sequence>
  <alternate minOccurs="1" maxOccurs="1">
   <sequence minOccurs="1"
    maxOccurs="unbounded">
    <classRef key="model.divLike"/>
    <alternate minOccurs="0"
     maxOccurs="unbounded">
     <classRef key="model.global"/>
     <classRef key="model.divGenLike"/>
    </alternate>
   </sequence>
   <sequence minOccurs="1"
    maxOccurs="unbounded">
    <classRef key="model.div1Like"/>
    <alternate minOccurs="0"
     maxOccurs="unbounded">
     <classRef key="model.global"/>
     <classRef key="model.divGenLike"/>
    </alternate>
   </sequence>
   <sequence minOccurs="1" maxOccurs="1">
    <sequence minOccurs="1"
     maxOccurs="unbounded">
     <classRef key="model.common"/>
     <classRef key="model.global"
      minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
    <alternate minOccurs="0" maxOccurs="1">
     <sequence minOccurs="1"
      maxOccurs="unbounded">
      <classRef key="model.divLike"/>
      <alternate minOccurs="0"
       maxOccurs="unbounded">
       <classRef key="model.global"/>
       <classRef key="model.divGenLike"/>
      </alternate>
     </sequence>
     <sequence minOccurs="1"
      maxOccurs="unbounded">
      <classRef key="model.div1Like"/>
      <alternate minOccurs="0"
       maxOccurs="unbounded">
       <classRef key="model.global"/>
       <classRef key="model.divGenLike"/>
      </alternate>
     </sequence>
    </alternate>
   </sequence>
  </alternate>
  <sequence minOccurs="0"
   maxOccurs="unbounded">
   <classRef key="model.divBottom"/>
   <classRef key="model.global"
    minOccurs="0" maxOccurs="unbounded"/>
  </sequence>
 </sequence>
</content>
    
Schema Declaration
element body
{
   att.global.attributes,
   (
      model.global*,
      ( model.divTop, ( model.global | model.divTop )* )?,
      ( model.divGenLike, ( model.global | model.divGenLike )* )?,
      (
         ( model.divLike, ( model.global | model.divGenLike )* )+
       | ( model.div1Like, ( model.global | model.divGenLike )* )+
       | (
            ( model.common, model.global* )+,
            (
               ( model.divLike, ( model.global | model.divGenLike )* )+
             | ( model.div1Like, ( model.global | model.divGenLike )* )+
            )?
         )
      ),
      ( model.divBottom, model.global* )*
   )
}
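
The Schematron assertion for <body> can be approximated outside a Schematron processor. The following is a minimal sketch, assuming Python's standard library; the function name body_has_chapter_or_letter is hypothetical, and the Schematron rule itself remains the normative test.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

def body_has_chapter_or_letter(body_xml):
    """Mirror descendant::tei:div[@type='chapter' or @type='letter']."""
    body = ET.fromstring(body_xml)
    return any(div.get("type") in ("chapter", "letter")
               for div in body.iter(TEI + "div"))

sample = ('<body xmlns="http://www.tei-c.org/ns/1.0">'
          '<div type="group"><div type="chapter"><head>I.</head><p>...</p></div></div>'
          '</body>')
print(body_has_chapter_or_letter(sample))
```

Because the test looks at descendants rather than children, a chapter nested inside a group division satisfies it.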

Appendix A.1.8 <canonicity>

<canonicity> indicates the degree to which the text has become part of a literary canon
Namespace http://distantreading.net/eltec/ns
Module derived-module-ELTeC
Attributes
key
Status Required
Datatype teidata.enumerated
Legal values are:
high
text has been republished very frequently since its original appearance
low
text has not been reprinted since its original appearance
unspecified
information about the number of reprints not yet determined
Note

Frequency of publication is assessed with reference to time periods of twenty years. A work which has been republished in more than four such periods, i.e. over a reasonably long time since its first appearance, is considered highly canonical. A work which appears to have been published only in a single such period is considered of low canonicity.

Contained by
corpus: textDesc
May contain
Empty element
Example
<textDesc    xmlns:e="http://distantreading.net/eltec/ns"> <!-- ... -->  <canonicity xmlns="http://distantreading.net/eltec/ns" key="low"/> <!-- ... --> </textDesc>
Content model
<content>
 <empty/>
</content>
    
Schema Declaration
element canonicity { attribute key { "high" | "low" | "unspecified" }, empty }
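
The rule in the note above can be sketched as a small function. This is only one possible reading: the guidelines do not say where the twenty-year periods begin, so the sketch assumes they are counted from the year of first publication, and it returns None for cases the note does not cover. The function name canonicity_from_years is hypothetical.

```python
def canonicity_from_years(first_year, publication_years):
    """Suggest a @key value from the years in which a work was published.

    Assumption: twenty-year periods are counted from first_year.
    """
    periods = {(year - first_year) // 20
               for year in publication_years if year >= first_year}
    if len(periods) > 4:
        return "high"  # republished in more than four periods
    if len(periods) <= 1:
        return "low"   # published in a single period only
    return None        # intermediate case, not covered by the note

print(canonicity_from_years(1847, [1847, 1870, 1895, 1920, 1950, 1990]))
```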

Appendix A.1.9 <change>

<change> (change) documents a change or set of changes made during the production of a source document, or during the revision of an electronic file. [2.6. The Revision Description 2.4.1. Creation 11.7. Identifying Changes and Revisions]
Module header
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.typed (@type) att.datable.w3c (when, @notBefore, @notAfter, @from, @to)
when supplies the value of the date or time in a standard form, e.g. yyyy-mm-dd.
Derived from att.datable.w3c
Status Required
Datatype teidata.temporal.w3c
Contained by
header: revisionDesc
May contain
Note

In ELTeC texts, the when attribute must be supplied and should indicate a date in the format YYYY-MM-DD.

Example
<change when="2018-11-01">Conversion with CLIGStoELTeC stylesheet for ELTeC-1</change>
Content model
<content>
 <macroRef key="macro.specialPara"/>
</content>
    
Schema Declaration
element change
{
   att.datable.w3c.attribute.notBefore,
   att.datable.w3c.attribute.notAfter,
   att.datable.w3c.attribute.from,
   att.datable.w3c.attribute.to,
   att.global.attributes,
   att.typed.attributes,
   attribute when { text },
   macro.specialPara
}

Appendix A.1.10 <corr>

<corr> (correction) contains the correct form of a passage apparently erroneous in the copy text. [3.5.1. Apparent Errors]
Module core
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.typed (@type)
Member of
Contained by
May contain
Example
In this text, the words "al" and "hombre" have been added by the editor/transcriber, to replace an original which omits these words for some reason or represents them with a non-standard or erroneous orthography.
<p>... me había presentado aún ocasión de asombrar <corr>al</corr> mundo con ningún hecho heroico; pero el oírme llamar <corr>hombre</corr> me llenó de orgul...</p>
Content model
<content>
 <macroRef key="macro.paraContent"/>
</content>
    
Schema Declaration
element corr { att.global.attributes, att.typed.attributes, macro.paraContent }

Appendix A.1.11 <date>

<date> (date) contains a date in any format. [3.6.4. Dates and Times 2.2.4. Publication, Distribution, Licensing, etc. 2.6. The Revision Description 3.12.2.4. Imprint, Size of a Document, and Reprint Information 15.2.3. The Setting Description 13.4. Dates]
Module core
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.canonical (@ref) att.datable (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) att.dimensions (@unit, @quantity, @extent) att.typed (@type)
Member of
Contained by
May contain
analysis: pc s span spanGrp w
character data
Note

<date> is used within <publicationStmt> and within <bibl>.

Example
Indicates the date of the publication of a novel in ELTeC within <publicationStmt>.
<publicationStmt>  <availability>   <licence target="https://creativecommons.org/licenses/by/4.0/">    <p> <!-- description -->    </p>   </licence>  </availability>  <p> Published as part of ELTeC <date>2018-11-01</date>  </p> </publicationStmt>
Example
Indicates the date of the first edition of a novel within <sourceDesc>.
<sourceDesc>  <bibl type="firstEdition"> <!-- -->   <date>1871</date> <!-- -->  </bibl> </sourceDesc>
Schematron
<sch:assert test="ancestor::tei:teiHeader"  role="ERROR"> The date element should not be used outside the TEI Header </sch:assert>
Content model
<content>
 <alternate minOccurs="0"
  maxOccurs="unbounded">
  <textNode/>
  <classRef key="model.gLike"/>
  <classRef key="model.phrase"/>
  <classRef key="model.global"/>
 </alternate>
</content>
    
Schema Declaration
element date
{
   att.global.attributes,
   att.canonical.attributes,
   att.datable.attributes,
   att.dimensions.attributes,
   att.typed.attributes,
   ( text | model.gLike | model.phrase | model.global )*
}
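
The Schematron assertion restricting <date> to the header can be approximated as follows. This is a sketch only, assuming Python's standard library and a single, non-nested <TEI> document; the function name dates_outside_header is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

def dates_outside_header(tei_xml):
    """Return the text of every <date> that is not inside <teiHeader>."""
    found = []
    for child in ET.fromstring(tei_xml):
        if child.tag == TEI + "teiHeader":
            continue  # dates are permitted anywhere in the header
        found.extend(d.text for d in child.iter(TEI + "date"))
    return found

sample = ('<TEI xmlns="http://www.tei-c.org/ns/1.0">'
          '<teiHeader><date>1871</date></teiHeader>'
          '<text><body><p>Written on <date>1 May 1871</date></p></body></text>'
          '</TEI>')
print(dates_outside_header(sample))
```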

Appendix A.1.12 <distributor>

<distributor> (distributor) supplies the name of a person or other agency responsible for the distribution of a text. [2.2.4. Publication, Distribution, Licensing, etc.]
Module header
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.canonical (@ref)
Member of
Contained by
core: bibl
May contain
analysis: pc s span spanGrp w
character data
Example
<distributor>Oxford Text Archive</distributor> <distributor>Redwood and Burn Ltd</distributor>
Content model
<content>
 <macroRef key="macro.phraseSeq"/>
</content>
    
Schema Declaration
element distributor
{
   att.global.attributes,
   att.canonical.attributes,
   macro.phraseSeq
}

Appendix A.1.13 <div>

<div> (text division) contains a subdivision of the front, body, or back of a text. [4.1. Divisions of the Body]
Module textstructure
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
type characterizes the element in some sense, using any convenient classification scheme or typology.
Derived from att.typed
Status Optional
Datatype teidata.enumerated
Legal values are:
titlepage
title page transcription (level 1 only)
notes
collects authorial notes (level 1 only)
liminal
authorial preface, foreword, back matter etc.
chapter
any textual division which is not further sub-divided except by milestone markers and which contains one or more paragraphs
letter
a textual division containing one or more paragraphs and not further sub-divided except by milestone markers which is presented as a letter
group
any grouping of chapters or groups, e.g. part, volume, etc.
Member of
Contained by
textstructure: back body div front
May contain
analysis: span spanGrp
textstructure: div trailer
Note

A division of type=letter should only be used if the entire novel is composed of a sequence of letters. When a letter is quoted within a chapter containing other material, the letter should be marked using the <quote> element.

Example
<body>  <div type="chapter">   <head>III</head>   <p>Em casa do dr. Carvalho, Claudio pouco fallou com Emilia, elle prezo a uma meza do      "whist", para ser agradavel ao juiz que sem isso se aborrecia, ella dansando sempre.      ...</p> <!-- more paragraphs here -->  </div> </body>
Schematron
A div of type chapter should not be further subdivided.
<sch:report test="@type='chapter' and child::tei:div"> A div of type 'chapter' may not be further subdivided (except by milestones) </sch:report>
Schematron
<s:report test="ancestor::tei:l"> Abstract model violation: Lines may not contain higher-level structural elements such as div. </s:report>
Schematron
<s:report test="ancestor::tei:p or ancestor::tei:ab and not(ancestor::tei:floatingText)"> Abstract model violation: p and ab may not contain higher-level structural elements such as div. </s:report>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <alternate minOccurs="0"
   maxOccurs="unbounded">
   <classRef key="model.divTop"/>
   <classRef key="model.global"/>
  </alternate>
  <sequence minOccurs="0" maxOccurs="1">
   <alternate minOccurs="1" maxOccurs="1">
    <sequence minOccurs="1"
     maxOccurs="unbounded">
     <alternate minOccurs="1" maxOccurs="1">
      <classRef key="model.divLike"/>
      <classRef key="model.divGenLike"/>
     </alternate>
     <classRef key="model.global"
      minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
    <sequence minOccurs="1" maxOccurs="1">
     <sequence minOccurs="1"
      maxOccurs="unbounded">
      <classRef key="model.common"/>
      <classRef key="model.global"
       minOccurs="0" maxOccurs="unbounded"/>
     </sequence>
     <sequence minOccurs="0"
      maxOccurs="unbounded">
      <alternate minOccurs="1"
       maxOccurs="1">
       <classRef key="model.divLike"/>
       <classRef key="model.divGenLike"/>
      </alternate>
      <classRef key="model.global"
       minOccurs="0" maxOccurs="unbounded"/>
     </sequence>
    </sequence>
   </alternate>
   <sequence minOccurs="0"
    maxOccurs="unbounded">
    <classRef key="model.divBottom"/>
    <classRef key="model.global"
     minOccurs="0" maxOccurs="unbounded"/>
   </sequence>
  </sequence>
 </sequence>
</content>
    
Schema Declaration
element div
{
   att.global.attributes,
   attribute type
   {
      "titlepage" | "notes" | "liminal" | "chapter" | "letter" | "group"
   }?,
   (
      ( model.divTop | model.global )*,
      (
         (
            ( ( model.divLike | model.divGenLike ), model.global* )+
          | (
               ( model.common, model.global* )+,
               ( ( model.divLike | model.divGenLike ), model.global* )*
            )
         ),
         ( model.divBottom, model.global* )*
      )?
   )
}
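
The first Schematron report above (a chapter may not contain further divisions) can be approximated with a short check. This is a minimal sketch, assuming Python's standard library; the function name subdivided_chapters is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

def subdivided_chapters(xml_text):
    """Return every div of type 'chapter' that has a child div."""
    root = ET.fromstring(xml_text)
    return [div for div in root.iter(TEI + "div")
            if div.get("type") == "chapter"
            and div.find(TEI + "div") is not None]

sample = ('<body xmlns="http://www.tei-c.org/ns/1.0">'
          '<div type="chapter"><div type="letter"><p>...</p></div></div>'
          '<div type="chapter"><p>...</p></div>'
          '</body>')
print(len(subdivided_chapters(sample)))
```

Divisions of type group are expected to contain other divisions, so only the chapter type is tested.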

Appendix A.1.14 <emph>

<emph> (emphasized) marks words or phrases which are stressed or emphasized for linguistic or rhetorical effect. [3.3.2.2. Emphatic Words and Phrases 3.3.2. Emphasis, Foreign Words, and Unusual Language]
Module core
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Member of
Contained by
May contain
Example
The editor/transcriber wishes to show that the word "my" is linguistically emphasized in the original source:
Oh—don’t mind <emph>my</emph> feelings—call me a mangy monkey—I’ve tried hard enough to look like one!
Content model
<content>
 <macroRef key="macro.paraContent"/>
</content>
    
Schema Declaration
element emph { att.global.attributes, macro.paraContent }

Appendix A.1.15 <encodingDesc>

<encodingDesc> (encoding description) documents the relationship between an electronic text and the source or sources from which it was derived. [2.3. The Encoding Description 2.1.1. The TEI Header and Its Components]
Module header
Attributes att.global (n, @xml:id, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend)
n (number) gives a number (or other label) for an element, which is not necessarily unique within the document.
Derived from att.global
Status Required
Datatype teidata.enumerated
Legal values are:
eltec-0
eltec-1
eltec-2
Contained by
header: teiHeader
May contain
core: p
Example
Describes the level of encoding of the TEI document, either level 0, 1 or 2.
<encodingDesc n="eltec-0">  <p>Encoded to ELTeC level zero</p> </encodingDesc>
Content model
<content>
 <elementRef key="p"/>
</content>
    
Schema Declaration
element encodingDesc
{
   att.global.attribute.xmlid,
   att.global.attribute.xmllang,
   att.global.attribute.xmlbase,
   att.global.attribute.xmlspace,
   att.global.rendition.attribute.rend,
   attribute n { "eltec-0" | "eltec-1" | "eltec-2" },
   p
}

Appendix A.1.16 <extent>

<extent> (extent) describes the approximate size of a text stored on some carrier medium or of some other object, digital or non-digital, specified in any convenient units. [2.2.3. Type and Extent of File 2.2. The File Description 3.12.2.4. Imprint, Size of a Document, and Reprint Information 10.7.1. Object Description]
Module header
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Member of
Contained by
core: bibl
header: fileDesc
May contain
core: measure
Note

Must contain at least one <measure> element, indicating the word count. Other indications of size are optional.

Example
A book of 235 pages, containing 102,345 words.
<extent>  <measure unit="words">102345</measure>  <measure unit="pages">235</measure> </extent>
Schematron
<sch:assert test="child::tei:measure[@unit eq 'words']">You must provide a word count</sch:assert>
Content model
<content>
 <elementRef key="measure" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element extent { att.global.attributes, measure+ }
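
The Schematron rule above can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the project's validator: the function name has_word_count is hypothetical, and the sample input assumes the TEI namespace is declared on the element.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def has_word_count(extent: ET.Element) -> bool:
    """Return True if the <extent> has a <measure unit="words"> child."""
    return any(
        measure.get("unit") == "words"
        for measure in extent.findall("tei:measure", TEI_NS)
    )


SAMPLE = (
    '<extent xmlns="http://www.tei-c.org/ns/1.0">'
    '<measure unit="words">102345</measure>'
    '<measure unit="pages">235</measure>'
    "</extent>"
)

print(has_word_count(ET.fromstring(SAMPLE)))
```

Only direct children are inspected, mirroring the child:: axis used in the Schematron test.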

Appendix A.1.17 <fileDesc>

<fileDesc> (file description) contains a full bibliographic description of an electronic file. [2.2. The File Description 2.1.1. The TEI Header and Its Components]
Moduleheader
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Contained by
header: teiHeader
May contain
Note

The major source of information for those seeking to create a catalogue entry or bibliographic citation for an electronic file. As such, it provides a title and statements of responsibility together with details of the publication or distribution of the file, of any series to which it belongs, and detailed bibliographic notes for matters not addressed elsewhere in the header. It also contains a full bibliographic description for the source or sources from which the electronic text was derived.

Example
<fileDesc>  <titleStmt> <!-- information about the title of the work -->  </titleStmt>  <extent> <!-- information about the size of the work -->  </extent>  <publicationStmt>   <p>Adicionado à coleção ELTeC <date>20 de novembro de 2018</date>. </p>  </publicationStmt>  <sourceDesc>   <bibl> <!-- bibliographic description of the source/s of the work -->   </bibl>  </sourceDesc> </fileDesc>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <elementRef key="titleStmt"/>
  <elementRef key="extent"/>
  <elementRef key="publicationStmt"/>
  <elementRef key="sourceDesc"/>
 </sequence>
</content>
    
Schema Declaration
element fileDesc
{
   att.global.attributes,
   ( titleStmt, extent, publicationStmt, sourceDesc )
}

Appendix A.1.18 <foreign>

<foreign> (foreign) identifies a word or phrase as belonging to some language other than that of the surrounding text. [3.3.2.1. Foreign Words or Expressions]
Modulecore
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Member of
Contained by
May contain
analysis: pc s span spanGrp w
character data
Note

The global xml:lang attribute should be supplied for this element to identify the language of the word or phrase marked. As elsewhere, its value should be a language tag as defined in 6.1. Language Identification.

This element is intended for use only where no other element is available to mark the phrase or words concerned. The global xml:lang attribute should be used in preference to this element where it is intended to mark the language of the whole of some text element.

The <distinct> element may be used to identify phrases belonging to sublanguages or registers not generally regarded as true languages.

ExampleThe Latin phrase "Ab urbe condita" is not in the same language (Portuguese) as the rest of the paragraph
<p>E calcando a espada debaixo do pé esquerdo, curvou-a: <foreign xml:lang="la">Ab urbe condita</foreign>, da fundação de Roma, no ano seiscentos e três. </p>
ExampleIn this example, the whole quotation is given in a different language (Spanish). The foreign language concerned is specified by means of the xml:lang attribute. The <foreign> element can only be used to enclose words and phrases directly, rather than to enclose <l> or <quote> elements, and must therefore be repeated for the content of each line.
<p>E cá fóra veriamos o velho mendigo no mesmo lugar ainda, cantando ao som da sanfona:</p> <quote>  <l>   <foreign xml:lang="es">«Rosa fresca, rosa fresca,</foreign>  </l>  <l>   <foreign xml:lang="es">tan garrida y con amor;</foreign>  </l>  <l>   <foreign xml:lang="es">quando vos tuve em mis braços,</foreign>  </l>  <l>   <foreign xml:lang="es">no vos supe servir, no,</foreign>  </l>  <l>   <foreign xml:lang="es">y agora que os serviria</foreign>  </l>  <l>   <foreign xml:lang="es">no vos puedo aver no.</foreign>   <ref target="#note2"/>  </l> </quote>
ExampleAn alternative and more economical encoding for the foregoing example:
<p>E cá fóra veriamos o velho mendigo no mesmo lugar ainda, cantando ao som da sanfona:</p> <quote xml:lang="es">  <l>«Rosa fresca, rosa fresca,</l>  <l>tan garrida y con amor;</l> <!-- ... etc. --> </quote>
Content model
<content>
 <macroRef key="macro.phraseSeq"/>
</content>
    
Schema Declaration
element foreign { att.global.attributes, macro.phraseSeq }

Appendix A.1.19 <front>

<front> (front matter) contains any prefatory matter (headers, abstracts, title page, prefaces, dedications, etc.) found at the start of a document, before the main body. [4.6. Title Pages 4. Default Text Structure]
Moduletextstructure
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Contained by
textstructure: text
May contain
analysis: span spanGrp
textstructure: div trailer
Note

Because cultural conventions differ as to which elements are grouped as front matter and which as back matter, the content models for the <front> and <back> elements are identical.

Example
<front>  <div type="titlepage">   <p>Bucura Dumbravă</p>   <p>HAIDUCUL</p>   <p>Tradus</p>   <p>de</p>   <p>Teodor Nica</p>   <p>ediția a IV-a</p>   <p>București</p>   <p>Editura Librăriei Școlelor C. Sfetea</p>   <p>63-64, - Calea Moșilor, - 62-64</p>   <p>1919</p>  </div>  <div type="liminal">   <head>PREFAȚĂ LA EDȚIA ÎNTÂIA</head>   <p>Isvoarele, de cari m'am slujit la studiul vieții lin Iancu Jianu și a timpului său,      sunt cele următoare...</p>   <p>BUCURA DUMBRAVĂ.</p>   <p>București, 1911.</p>  </div> </front>
Schematron
<sch:assert test="child::tei:div[@type='titlepage'] or child::tei:div[@type='liminal']"  role="ERROR">The front matter of a text must contain either liminal or titlepage divisions</sch:assert>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <alternate minOccurs="0"
   maxOccurs="unbounded">
   <classRef key="model.frontPart"/>
   <classRef key="model.pLike"/>
   <classRef key="model.pLike.front"/>
   <classRef key="model.global"/>
  </alternate>
  <sequence minOccurs="0" maxOccurs="1">
   <alternate minOccurs="1" maxOccurs="1">
    <sequence minOccurs="1" maxOccurs="1">
     <classRef key="model.div1Like"/>
     <alternate minOccurs="0"
      maxOccurs="unbounded">
      <classRef key="model.div1Like"/>
      <classRef key="model.frontPart"/>
      <classRef key="model.global"/>
     </alternate>
    </sequence>
    <sequence minOccurs="1" maxOccurs="1">
     <classRef key="model.divLike"/>
     <alternate minOccurs="0"
      maxOccurs="unbounded">
      <classRef key="model.divLike"/>
      <classRef key="model.frontPart"/>
      <classRef key="model.global"/>
     </alternate>
    </sequence>
   </alternate>
   <sequence minOccurs="0" maxOccurs="1">
    <classRef key="model.divBottom"/>
    <alternate minOccurs="0"
     maxOccurs="unbounded">
     <classRef key="model.divBottom"/>
     <classRef key="model.global"/>
    </alternate>
   </sequence>
  </sequence>
 </sequence>
</content>
    
Schema Declaration
element front
{
   att.global.attributes,
   (
      ( model.frontPart | model.pLike | model.pLike.front | model.global )*,
      (
         (
            (
               model.div1Like,
               ( model.div1Like | model.frontPart | model.global )*
            )
          | (
               model.divLike,
               ( model.divLike | model.frontPart | model.global )*
            )
         ),
         ( model.divBottom, ( model.divBottom | model.global )* )?
      )?
   )
}
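
As an illustration of the Schematron constraint on <front>, the following Python sketch tests whether a front matter element has at least one child division of type titlepage or liminal. It is an assumption-laden sketch rather than the project's own check: front_is_valid and the sample input are hypothetical, and the type values are taken verbatim from the Schematron test above.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Division types required by the Schematron assertion on <front>.
REQUIRED_TYPES = {"titlepage", "liminal"}


def front_is_valid(front: ET.Element) -> bool:
    """Return True if <front> has a child <div> of type titlepage or liminal."""
    found = {div.get("type") for div in front.findall("tei:div", TEI_NS)}
    return bool(found & REQUIRED_TYPES)


SAMPLE = (
    '<front xmlns="http://www.tei-c.org/ns/1.0">'
    '<div type="liminal"><head>PREFACE</head><p>...</p></div>'
    "</front>"
)

print(front_is_valid(ET.fromstring(SAMPLE)))
```

The comparison is case-sensitive, so a value such as titlePage would not satisfy it.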

Appendix A.1.20 <gap>

<gap> (gap) indicates a point where material has been omitted in a transcription, whether for editorial reasons described in the TEI header, as part of sampling practice, or because the material is illegible, invisible, or inaudible. [3.5.3. Additions, Deletions, and Omissions]
Modulecore
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.dimensions (@unit, @quantity, @extent)
Member of
Contained by
May containEmpty element
Note

In an ELTeC level 1 transcription, the unit attribute of this element may be used to indicate what has been omitted from a transcription.

ExampleTwo consecutive graphic components omitted from transcription:
<gap unit="graphic" quantity="2"/>
ExampleTable of contents omitted from transcription:
<gap unit="toc"/>
Content model
<content>
 <alternate minOccurs="0"
  maxOccurs="unbounded">
  <classRef key="model.descLike"/>
  <classRef key="model.certLike"/>
 </alternate>
</content>
    
Schema Declaration
element gap
{
   att.global.attributes,
   att.dimensions.attributes,
   ( model.descLike | model.certLike )*
}

Appendix A.1.21 <head>

<head> (heading) contains any type of heading, for example the title of a section, or the heading of a list, glossary, manuscript description, etc. [4.2.1. Headings and Trailers]
Modulecore
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.typed (@type)
Member of
Contained by
textstructure: back body div front
May contain
Note

The <head> element is used for headings at all levels; software which treats (e.g.) chapter headings, section headings, and list titles differently must determine the proper processing of a <head> element based on its structural position. A <head> occurring as the first element of a list is the title of that list; one occurring as the first element of a <div1> is the title of that chapter or section.

Example
<div type="part">  <head>BOOK I.</head>  <head>MISS BROOKE.</head>  <div type="chapter">   <head>CHAPTER I.</head>   <quote> Since I can do no good because a woman, Reach constantly at something that is near      it. —The Maid's Tragedy: BEAUMONT AND FLETCHER. </quote>   <p>Miss Brooke had that kind of beauty which seems to be thrown into relief by poor      dress.... </p> <!-- ... -->  </div> <!-- ... --> </div>
A heading of any kind at the start of a division of any kind may be marked using <head>. In this example, there are two headings at the start of the first part, and one at the start of the first chapter. The epigraph at the start of the first chapter is marked up as a quotation and is not a heading.
Content model
<content>
 <alternate minOccurs="0"
  maxOccurs="unbounded">
  <textNode/>
  <elementRef key="lg"/>
  <classRef key="model.gLike"/>
  <classRef key="model.phrase"/>
  <classRef key="model.inter"/>
  <classRef key="model.lLike"/>
  <classRef key="model.global"/>
 </alternate>
</content>
    
Schema Declaration
element head
{
   att.global.attributes,
   att.typed.attributes,
   (
      text
    | lg
    | model.gLike
    | model.phrase
    | model.inter
    | model.lLike
    | model.global
   )*
}

Appendix A.1.22 <hi>

<hi> (highlighted) marks a word or phrase as graphically distinct from the surrounding text, for reasons concerning which no claim is made. [3.3.2.2. Emphatic Words and Phrases 3.3.2. Emphasis, Foreign Words, and Unusual Language]
Modulecore
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Member of
Contained by
May contain
Note

This element is used at level 0 for any kind of typographic salience recorded by the encoder. At level 1 it is replaced by an element carrying a semantic interpretation (for example <emph> or <foreign>).

Example
<p>Ha önök ismernék a <hi>Spleen</hi>-t, a <hi>Végtelenség </hi> prométheüszi gőgjét, a <hi>Rejtelmek</hi> fásultságát és magas röptét, szóval, ha önök ismernék <hi>Icarus</hi>t, alkalmasint ilyenformán méltóztatnának okoskodni...</p>
Content model
<content>
 <macroRef key="macro.paraContent"/>
</content>
    
Schema Declaration
element hi { att.global.attributes, macro.paraContent }

Appendix A.1.23 <idno>

<idno> (identifier) supplies any form of identifier used to identify some object, such as a bibliographic item, a person, a title, an organization, etc. in a standardized way. [13.3.1. Basic Principles 2.2.4. Publication, Distribution, Licensing, etc. 2.2.5. The Series Statement 3.12.2.4. Imprint, Size of a Document, and Reprint Information]
Moduleheader
Attributes
typecategorizes the identifier, for example as an ISBN, Social Security number, etc.
Derived fromatt.typed
StatusOptional
Datatypeteidata.enumerated
Suggested values include:
ISBN
International Standard Book Number: a 13- or (if assigned prior to 2007) 10-digit identifying number assigned by the publishing industry to a published book or similar item, registered with the International ISBN Agency.
ISSN
International Standard Serial Number: an eight-digit number to uniquely identify a serial publication.
DOI
Digital Object Identifier: a unique string of letters and numbers assigned to an electronic document.
URI
Uniform Resource Identifier: a string of characters to uniquely identify a resource which usually contains indication of the means of accessing that resource, the name of its host, and its filepath.
VIAF
A data number in the Virtual International Authority File assigned to link different names in catalogs around the world for the same entity.
ESTC
English Short-Title Catalogue number: an identifying number assigned to a document in English printed in the British Isles or North America before 1801.
OCLC
OCLC control number (record number) for the union catalog record in WorldCat, a union catalog for member libraries in the Online Computer Library Center global cooperative.
Member of
Contained by
core: bibl
header: idno
May contain
header: idno
character data
Note

<idno> should be used for labels which identify an object or concept in a formal cataloguing system such as a database or an RDF store, or in a distributed system such as the World Wide Web. Some suggested values for type on <idno> are ISBN, ISSN, DOI, and URI.

Example
<idno type="ISBN">978-1-906964-22-1</idno> <idno type="ISSN">0143-3385</idno> <idno type="DOI">10.1000/123</idno> <idno type="URI">http://www.worldcat.org/oclc/185922478</idno> <idno type="URI">http://authority.nzetc.org/463/</idno> <idno type="LT">Thomason Tract E.537(17)</idno> <idno type="Wing">C695</idno> <idno type="oldCat">  <g ref="#sym"/>345 </idno>
In the last case, the identifier includes a non-Unicode character which is defined elsewhere by means of a <glyph> or <char> element referenced here as #sym.
Content model
<content>
 <alternate minOccurs="0"
  maxOccurs="unbounded">
  <textNode/>
  <classRef key="model.gLike"/>
  <elementRef key="idno"/>
 </alternate>
</content>
    
Schema Declaration
element idno
{
   attribute type
   {
      "ISBN" | "ISSN" | "DOI" | "URI" | "VIAF" | "ESTC" | "OCLC"
   }?,
   ( text | model.gLike | idno )*
}

Appendix A.1.24 <keywords>

<keywords> (keywords) contains a list of keywords or phrases identifying the topic or nature of a text. [2.4.3. The Text Classification]
Moduleheader
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
schemeidentifies the controlled vocabulary within which the set of keywords concerned is defined, for example by a <taxonomy> element, or by some other resource.
StatusOptional
Datatypeteidata.pointer
Contained by
header: textClass
May contain
core: term
Note

In ELTeC texts, this element may only be used within a <textClass> element, and may contain only a sequence of <term> elements. Its usage is optional.

Example
<textClass>  <keywords>   <term xml:lang="eng">juvenile literature</term>   <term xml:lang="deu">bildungsroman</term>  </keywords> </textClass>
Content model
<content>
 <alternate minOccurs="1" maxOccurs="1">
  <elementRef key="term" minOccurs="1"
   maxOccurs="unbounded"/>
  <elementRef key="list"/>
 </alternate>
</content>
    
Schema Declaration
element keywords
{
   att.global.attributes,
   attribute scheme { text }?,
   ( term+ | list )
}

Appendix A.1.25 <l>

<l> (verse line) contains a single, possibly incomplete, line of verse. [3.13.1. Core Tags for Verse 3.13. Passages of Verse or Drama 7.2.5. Speech Contents]
Modulecore
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Member of
Contained by
header: change
textstructure: body div trailer
May contain
Example
<p>Mimi chante avec un sourire gracieux et un désintéressement tout particulier des couplets dont voici le refrain: <quote>   <l>Notre bonheur est accompli</l>   <l>Voilà le culte rétabli.</l>  </quote> </p>
Schematron
<s:report test="ancestor::tei:l[not(.//tei:note//tei:l[. = current()])]"> Abstract model violation: Lines may not contain lines or lg elements. </s:report>
Content model
<content>
 <alternate minOccurs="0"
  maxOccurs="unbounded">
  <textNode/>
  <classRef key="model.gLike"/>
  <classRef key="model.phrase"/>
  <classRef key="model.inter"/>
  <classRef key="model.global"/>
 </alternate>
</content>
    
Schema Declaration
element l
{
   att.global.attributes,
   ( text | model.gLike | model.phrase | model.inter | model.global )*
}

Appendix A.1.26 <label>

<label> (label) contains any label or heading used to identify part of a text, typically but not exclusively in a list or glossary. [3.8. Lists]
Modulecore
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.typed (@type)
Member of
Contained by
header: change
textstructure: body div trailer
May contain
analysis: pc s span spanGrp w
character data
Example
<p>  <label>April 5.</label>-Two shoulders of mutton arrived, Carrie having arranged with another butcher without consulting me. Gowing called, and fell over scraper coming in. <hi>Must</hi> get that scraper removed. </p> <p>  <label>April 6.</label>-Eggs for breakfast simply shocking; sent them back to Borset with my compliments, and he needn't call any more for orders. </p>
Content model
<content>
 <macroRef key="macro.phraseSeq"/>
</content>
    
Schema Declaration
element label { att.global.attributes, att.typed.attributes, macro.phraseSeq }

Appendix A.1.27 <langUsage>

<langUsage> (language usage) describes the languages, sublanguages, registers, dialects, etc. represented within a text. [2.4.2. Language Usage 2.4. The Profile Description 15.3.2. Declarable Elements]
Moduleheader
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Contained by
header: profileDesc
May contain
core: p
header: language
Exampleprovides information about the language of the novel, using <language>
<langUsage>  <language ident="fra">French</language> </langUsage>
Content model
<content>
 <alternate minOccurs="1" maxOccurs="1">
  <classRef key="model.pLike" minOccurs="1"
   maxOccurs="unbounded"/>
  <elementRef key="language" minOccurs="1"
   maxOccurs="unbounded"/>
 </alternate>
</content>
    
Schema Declaration
element langUsage { att.global.attributes, ( model.pLike+ | language+ ) }

Appendix A.1.28 <language>

<language> (language) characterizes a single language or sublanguage used within a text. [2.4.2. Language Usage]
Moduleheader
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
ident(identifier) Supplies a language code constructed as defined in BCP 47 which is used to identify the language documented by this element, and which is referenced by the global xml:lang attribute.
StatusRequired
Datatypeteidata.language
usagespecifies the approximate percentage (by volume) of the text which uses this language.
StatusOptional
DatatypenonNegativeInteger
Contained by
header: langUsage
May contain
analysis: span spanGrp
character data
Note

Particularly for sublanguages, an informal prose characterization should be supplied as content for the element.

Example
<langUsage>  <language ident="en-US" usage="75">modern American English</language>  <language ident="i-az-Arab" usage="20">Azerbaijani in Arabic script</language>  <language ident="x-lap" usage="05">Pig Latin</language> </langUsage>
Content model
<content>
 <macroRef key="macro.phraseSeq.limited"/>
</content>
    
Schema Declaration
element language
{
   att.global.attributes,
   attribute ident { text },
   attribute usage { text }?,
   macro.phraseSeq.limited
}

Appendix A.1.29 <licence>

<licence> contains information about a licence or other legal agreement applicable to the text. [2.2.4. Publication, Distribution, Licensing, etc.]
Moduleheader
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
targetspecifies the destination of the reference by supplying one or more URI References
Derived fromatt.pointing
StatusRequired
Datatype1–∞ occurrences of teidata.pointer separated by whitespace
Contained by
header: availability
May contain
core: p
Note

The TEI XML markup added to all components of ELTeC is made available under a CC-BY licence. The textual content is in the public domain.

Example
<licence target="https://creativecommons.org/licenses/by/4.0/">  <p>The TEI mark up is licenced with Creative Commons Attribution (CC-BY 4.0).</p> </licence>
Content model
<content>
 <elementRef key="p" minOccurs="0"/>
</content>
    
Schema Declaration
element licence
{
   att.global.attributes,
   attribute target { list { teidata.pointer+ } },
   p?
}

Appendix A.1.30 <measure>

<measure> (measure) contains a word or phrase referring to some quantity of an object or commodity, usually comprising a number, a unit, and a commodity name. [3.6.3. Numbers and Measures]
Modulecore
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.measurement (unit, @unitRef, @quantity, @commodity)
unit(unit) specifies the units used for this measurement
Derived fromatt.measurement
StatusRequired
Datatypeteidata.enumerated
Legal values are:
pages
number of pages in the whole of the source text
words
number of space-delimited tokens in the transcribed source text, excluding the header
vols
number of volumes in the original source text
Contained by
header: extent
May contain
XSD token
Note

An indication of the number of words is mandatory. Indicating the number of pages is optional. If information for page or volume count is not available the relevant <measure> element should be absent.

Neither spaces nor punctuation marks are permitted in the content of the <measure> element: the value must be given as a bare number.

Exampledescribes two measurements for <extent>: the number of words and the number of pages
<extent>  <measure unit="words">71043</measure>  <measure unit="pages">364</measure> </extent>
Content model
<content>
 <dataRef key="teidata.numeric"/>
</content>
    
Schema Declaration
element measure
{
   att.global.attributes,
   att.measurement.attribute.unitRef,
   att.measurement.attribute.quantity,
   att.measurement.attribute.commodity,
   attribute unit { "pages" | "words" | "vols" },
   teidata.numeric
}
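
The constraints on <measure> (a closed list of units and purely numeric content) can be sketched as a small Python check. This is a hypothetical helper, not part of the ELTeC schema or tooling; it additionally assumes that counts are whole numbers, which is stricter than the teidata.numeric datatype named in the content model.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Legal values of @unit, copied from the schema declaration.
LEGAL_UNITS = {"pages", "words", "vols"}


def measure_problems(measure: ET.Element) -> List[str]:
    """Return a list of problems found in one <measure> element."""
    problems = []
    unit = measure.get("unit")
    if unit not in LEGAL_UNITS:
        problems.append(f"illegal or missing unit: {unit!r}")
    content = measure.text or ""
    # Assumption: a count is a run of digits with no spaces or separators.
    if not re.fullmatch(r"[0-9]+", content):
        problems.append(f"content is not a plain whole number: {content!r}")
    return problems


GOOD = '<measure xmlns="http://www.tei-c.org/ns/1.0" unit="words">71043</measure>'
BAD = '<measure xmlns="http://www.tei-c.org/ns/1.0" unit="words">71,043</measure>'

for sample in (GOOD, BAD):
    print(measure_problems(ET.fromstring(sample)))
```

Returning a list of messages rather than a boolean lets a caller report every problem in a header in one pass.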

Appendix A.1.31 <milestone>

<milestone> (milestone) marks a boundary point separating any kind of section of a text, typically but not necessarily indicating a point at which some part of a standard reference system changes, where the change is not represented by a structural element. [3.11.3. Milestone Elements]
Modulecore
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.milestoneUnit (@unit) att.typed (@type)
Member of
Contained by
May containEmpty element
Note

The unit attribute should describe the kind of unit delimited by the tag, for example "subSection"; it is mandatory. If the milestone numbers or labels the unit in question, the n attribute may be used to carry the name or number given. The type attribute should describe the kind of milestone indication found in the source, for example "stars", "line", "numbering", etc.; it is optional.

Example
<milestone unit="subSection"  type="asterisk"/>
Example
<div type="group">  <head>BOOK THE FIRST</head>  <head>THE DAYS BEFORE TONO-BUNGAY WAS INVENTED</head>  <div type="chapter">   <head>CHAPTER THE FIRST</head>   <milestone unit="subSection" n="I"/>   <p>Most people in this world seem to live "in character" ... </p> <!-- ... -->   <p>....of an altogether different sort from that of Tono-Bungay.</p>   <milestone unit="subSection" n="II"/>   <p>I write that much and look at it, and wonder ... </p>  </div> </div>
Content model
<content>
 <empty/>
</content>
    
Schema Declaration
element milestone
{
   att.global.attributes,
   att.milestoneUnit.attributes,
   att.typed.attributes,
   empty
}

Appendix A.1.32 <name>

<name> (name, proper noun) contains a proper noun or noun phrase. [3.6.1. Referring Strings]
Modulecore
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.datable (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) att.typed (@type)
Member of
Contained by
core: respStmt
May contain
analysis: pc s span spanGrp w
character data
Note

Not permitted outside the header

Exampleprovides the name of a person who is neither an author nor a publisher
<name>Christof Schöch</name>
Content model
<content>
 <macroRef key="macro.phraseSeq"/>
</content>
    
Schema Declaration
element name
{
   att.global.attributes,
   att.datable.attributes,
   att.typed.attributes,
   macro.phraseSeq
}

Appendix A.1.33 <note>

<note> (note) contains a note or annotation. [3.9.1. Notes and Simple Annotation 2.2.6. The Notes Statement 3.12.2.8. Notes and Statement of Language 9.3.5.4. Notes within Entries]
Modulecore
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.pointing (@target) att.typed (@type)
Member of
Contained by
May contain
ExampleIn this example, two authorial footnotes have been encoded. Each note is encoded with a <note> element, and carries a unique identifier on its xml:id attribute. A <ref> element replaces the siglum in the running text indicating the point where the note is attached.
<body> <!-- ... -->  <p>Au milieu de ces spirituels convives on remarquait une figure angélique, c'était celle    de la fille de madame de Condorcet, de cette ravissante Eliza <ref target="#FR0726_N1">[1]</ref> qui, à peine dans l'âge de l'adolescence, avait déjà la taille et les traits    réguliers d'une statue grecque.</p> <!-- ... -->  <p>—Je ne sortirai point aujourd'hui, j'ai mal à la tête, une longue coiffure me    fatiguerait; Ellénore arrangera mes cheveux, et me mettra ma baigneuse <ref target="#FR0726_N2">[2]</ref>.</p> </body> <back>  <div type="notes">   <note xml:id="FR0726_N1">[Note 1: Elle a épousé depuis M. O'Connor.]</note>   <note xml:id="FR0726_N2">[Note 2: Sorte de bonnet négligé, qui était à la mode en ce      temps.]</note>  </div> </back>
Schematron
<sch:assert test="parent::tei:div[@type='notes']"  role="ERROR">Notes must be given out of line and inside a div of type 'notes'</sch:assert>
Content model
<content>
 <macroRef key="macro.specialPara"/>
</content>
    
Schema Declaration
element note
{
   att.global.attributes,
   att.pointing.attributes,
   att.typed.attributes,
   macro.specialPara
}
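
The two conventions illustrated above (notes live in a div of type notes, and each <ref> in the running text points at a note by its xml:id) can be checked together. The following Python sketch is illustrative only: check_notes and the sample document are hypothetical, and the check on <ref> targets is an assumption drawn from the example rather than a rule stated in the schema.

```python
import xml.etree.ElementTree as ET
from typing import List

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def check_notes(text: ET.Element) -> List[str]:
    """Report misplaced <note> elements and <ref> targets with no matching note."""
    # ElementTree has no parent pointers, so build a child-to-parent map first.
    parent_of = {child: parent for parent in text.iter() for child in parent}
    problems = []
    note_ids = set()
    for note in text.iter(TEI + "note"):
        parent = parent_of.get(note)
        in_notes_div = (
            parent is not None
            and parent.tag == TEI + "div"
            and parent.get("type") == "notes"
        )
        if not in_notes_div:
            problems.append(f"note {note.get(XML_ID)!r} is not inside div type 'notes'")
        if note.get(XML_ID):
            note_ids.add(note.get(XML_ID))
    for ref in text.iter(TEI + "ref"):
        target = ref.get("target", "")
        if target.startswith("#") and target[1:] not in note_ids:
            problems.append(f"ref target {target!r} matches no note")
    return problems


SAMPLE = (
    '<text xmlns="http://www.tei-c.org/ns/1.0">'
    '<body><p>Eliza <ref target="#FR0726_N1">[1]</ref></p></body>'
    '<back><div type="notes">'
    '<note xml:id="FR0726_N1">[Note 1: ...]</note>'
    "</div></back>"
    "</text>"
)

print(check_notes(ET.fromstring(SAMPLE)))
```

An empty list means that every note is out of line and every local reference resolves.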

Appendix A.1.34 <p>

<p> (paragraph) marks paragraphs in prose. [3.1. Paragraphs 7.2.5. Speech Contents]
Modulecore
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Member of
Contained by
May contain
Example
<div type="chapter">  <head> 1</head>  <p>Fräulein Lotti war soeben erwacht. ....</p>  <p>Frau Katze schüttelt den Kopf, schließt die Augen, leckt die fadendünnen Lippen und    gähnt wie ein Tiger.</p>  <p>Ihre Gebieterin hakt den Fensterflügel ein, damit die Spaziergängerin bequem eintreten    könne, wenn es ihr genehm sein würde heimzukehren....</p> <!-- ... --> </div>
Content model
<content>
 <macroRef key="macro.paraContent"/>
</content>
    
Schema Declaration
element p { att.global.attributes, macro.paraContent }

Appendix A.1.35 <pb>

<pb> (page beginning) marks the beginning of a new page in a paginated document. [3.11.3. Milestone Elements]
Modulecore
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.typed (@type)
facslink to a graphic image of the page beginning here
StatusOptional
DatatypeanyURI
Member of
Contained by
May containEmpty element
Note

A <pb> element should appear at the start of the page which it identifies. The global n attribute indicates the number or other value associated with this page. This will normally be the page number or signature printed on it, since the physical sequence number is implicit in the presence of the <pb> element itself.

ExampleA page break may be associated with a facsimile image of the page it introduces by means of the facs attribute
<body>  <pb n="1" facs="page1.png"/> <!-- page1.png contains an image of the page; the text it contains is encoded here -->  <p> <!-- ... -->  </p>  <pb n="2" facs="page2.png"/> <!-- similarly, for page 2 -->  <p> <!-- ... -->  </p> </body>
ExampleIf a page break interrupts a word, the <pb> element is placed before the word, and the word's fragments are reassembled after it.
<p>My own relations too were nobly generous and by their kindness I have been <pb n="100"/> established in this shop, and for the last year have carried on this little business.... </p>
Content model
<content>
 <empty/>
</content>
    
Schema Declaration
element pb
{
   att.global.attributes,
   att.typed.attributes,
   attribute facs { text }?,
   empty
}

Appendix A.1.36 <pc>

<pc> (punctuation character) contains a character or string of characters regarded as constituting a single punctuation mark. [17.1.2. Below the Word Level 17.4.2. Lightweight Linguistic Annotation]
Moduleanalysis
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.segLike (@function) att.typed (@type) att.linguistic (@lemma, @lemmaRef, @pos, @msd, @join)
forceindicates the extent to which this punctuation mark conventionally separates words or phrases
StatusOptional
Datatypeteidata.enumerated
Legal values are:
strong
the punctuation mark is a word separator
weak
the punctuation mark is not a word separator
inter
the punctuation mark may or may not be a word separator
unitprovides a name for the kind of unit delimited by this punctuation mark.
StatusOptional
Datatypeteidata.enumerated
preindicates whether this punctuation mark precedes or follows the unit it delimits.
StatusOptional
Datatypeteidata.truthValue
Member of
Contained by
May contain
core: corr
character data
Example
<phr>  <w>do</w>  <w>you</w>  <w>understand</w>  <pc type="interrogative">?</pc> </phr>
ExampleAn encoding of the German sentence "Wir fahren in den Urlaub.", using attributes from att.linguistic, which are discussed in section 17.4.2. Lightweight Linguistic Annotation.
<s>  <w pos="PPER" msd="1.Pl.*.Nom">Wir</w>  <w pos="VVFIN" msd="1.Pl.Pres.Ind">fahren</w>  <w pos="APPR" msd="--">in</w>  <w pos="ART" msd="Def.Masc.Akk.Sg.">den</w>  <w pos="NN" msd="Masc.Akk.Sg.">Urlaub</w>  <pc pos="$." msd="--" join="left">.</pc> </s>
Content model
<content>
 <alternate minOccurs="0"
  maxOccurs="unbounded">
  <textNode/>
  <classRef key="model.gLike"/>
  <elementRef key="c"/>
  <classRef key="model.pPart.edit"/>
 </alternate>
</content>
    
Schema Declaration
element pc
{
   att.global.attributes,
   att.segLike.attributes,
   att.typed.attributes,
   att.linguistic.attributes,
   attribute force { "strong" | "weak" | "inter" }?,
   attribute unit { text }?,
   attribute pre { text }?,
   ( text | model.gLike | c | model.pPart.edit )*
}

Appendix A.1.37 <profileDesc>

<profileDesc> (text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting. [2.4. The Profile Description 2.1.1. The TEI Header and Its Components]
Moduleheader
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Contained by
header: teiHeader
May contain
Note

Although the content model permits it, it is rarely meaningful to supply multiple occurrences for any of the child elements of <profileDesc> unless these are documenting multiple texts.

Example
<profileDesc    xmlns:e="http://distantreading.net/eltec/ns">  <langUsage>   <language ident="fra">French</language>  </langUsage>  <textDesc>   <e:authorGender key="M"/>   <e:size key="long"/>   <e:reprintCount key="high"/>   <e:timeSlot key="T1"/>  </textDesc> </profileDesc>
Profile for a French text, with a male author, containing more than 100,000 words, of high reprintCount, first published between 1840 and 1859.
Example
<profileDesc    xmlns:e="http://distantreading.net/eltec/ns">  <langUsage>   <language ident="de">German</language>  </langUsage>  <textDesc>   <authorGender xmlns="http://distantreading.net/eltec/ns" key="F"/>   <size xmlns="http://distantreading.net/eltec/ns" key="long"/>   <reprintCount xmlns="http://distantreading.net/eltec/ns" key="low"/>   <timeSlot xmlns="http://distantreading.net/eltec/ns" key="T4"/>  </textDesc> </profileDesc>
Profile for a German text, with a female author, containing more than 100,000 words, of low reprintCount, first published between 1900 and 1920.
Example
If descriptive keywords are available for a text, these may be included within a <textClass> element placed before the <textDesc>, as in this example:
<profileDesc    xmlns:e="http://distantreading.net/eltec/ns">  <langUsage>   <language ident="de">German</language>  </langUsage>  <textClass>   <keywords>    <term>bildungsroman</term>   </keywords>  </textClass>  <textDesc>   <authorGender xmlns="http://distantreading.net/eltec/ns" key="F"/>   <size xmlns="http://distantreading.net/eltec/ns" key="long"/>   <reprintCount xmlns="http://distantreading.net/eltec/ns" key="low"/>   <timeSlot xmlns="http://distantreading.net/eltec/ns" key="T4"/>  </textDesc> </profileDesc>
Content model
<content>
 <elementRef key="langUsage" minOccurs="1"
  maxOccurs="1"/>
 <elementRef key="textClass" minOccurs="0"
  maxOccurs="1"/>
 <elementRef key="textDesc" minOccurs="1"
  maxOccurs="1"/>
</content>
    
Schema Declaration
element profileDesc { att.global.attributes, langUsage, textClass?, textDesc }

Appendix A.1.38 <pubPlace>

<pubPlace> (publication place) contains the name of the place where a bibliographic item was published. [3.12.2.4. Imprint, Size of a Document, and Reprint Information]
Module: core
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Member of
Contained by
core: bibl
May contain
analysis: pc s span spanGrp w
character data
Example
<bibl type="firstEdition">  <title>Herança de lágrimas</title>  <author>Lopo de Souza</author>  <publisher>Redação do Vimaranense-Editora</publisher>  <pubPlace>Guimarães</pubPlace>  <date>1871</date> </bibl>
Content model
<content>
 <macroRef key="macro.phraseSeq"/>
</content>
    
Schema Declaration
element pubPlace { att.global.attributes, macro.phraseSeq }

Appendix A.1.39 <publicationStmt>

<publicationStmt> (publication statement) groups information concerning the publication or distribution of an electronic or other text. [2.2.4. Publication, Distribution, Licensing, etc. 2.2. The File Description]
Module: header
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Contained by
header: fileDesc
May contain
Note

In a published ELTeC text, the publication statement always has the components shown in the example below: a <publisher> naming the project itself, a <distributor> specifying the Zenodo community within which the text is made available, a <date> showing when the text was released, and an <availability> element indicating the licence under which it is made available. One or more <ref> elements may follow, each specifying a URL from which the text may be downloaded. These details are added during the publication process, if not already present.

Example
<publicationStmt>  <publisher ref="https://distant-reading.net">COST Action "Distant Reading for European    Literary History" (CA16204)</publisher>  <distributor ref="https://zenodo.org/communities/eltec/">Zenodo.org</distributor>  <date when="{$today}"/>  <availability>   <licence target="https://creativecommons.org/licenses/by/4.0/"/>  </availability>  <ref type="doi"   target="10.5281/zenodo.8468"/>  <ref type="raw"   target="https://raw.githubusercontent.com/COST-ELTeC/ELTeC-eng/master/level1/ENG18440_Disraeli.xml"/> </publicationStmt>
Content model
<content>
 <alternate minOccurs="1" maxOccurs="1">
  <elementRef key="p"/>
  <sequence minOccurs="1" maxOccurs="1">
   <elementRef key="publisher"/>
   <elementRef key="distributor"/>
   <elementRef key="date"/>
   <elementRef key="availability"/>
   <elementRef key="ref"
    maxOccurs="unbounded" minOccurs="0"/>
  </sequence>
 </alternate>
</content>
    
Schema Declaration
element publicationStmt
{
   att.global.attributes,
   ( p | ( publisher, distributor, date, availability, ref* ) )
}
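
The component order described in the note above can be checked mechanically. The following is a minimal Python sketch, not part of the ELTeC tooling: the function names are hypothetical, and it covers only the structured alternative of the content model (a publication statement consisting of a single <p> is deliberately not handled).

```python
import xml.etree.ElementTree as ET

# Required children, in order, for a published ELTeC text.
REQUIRED = ["publisher", "distributor", "date", "availability"]


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def follows_eltec_pattern(publication_stmt: ET.Element) -> bool:
    """True if the children are publisher, distributor, date, availability, ref*."""
    names = [local_name(child.tag) for child in publication_stmt]
    head, tail = names[: len(REQUIRED)], names[len(REQUIRED):]
    return head == REQUIRED and all(name == "ref" for name in tail)


complete = ET.fromstring(
    "<publicationStmt><publisher/><distributor/><date/>"
    "<availability/><ref/><ref/></publicationStmt>"
)
incomplete = ET.fromstring(
    "<publicationStmt><publisher/><date/><availability/></publicationStmt>"
)
```

Here `follows_eltec_pattern(complete)` holds, while `incomplete` fails because the <distributor> is missing.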

Appendix A.1.40 <publisher>

<publisher> (publisher) provides the name of the organization responsible for the publication or distribution of a bibliographic item. [3.12.2.4. Imprint, Size of a Document, and Reprint Information 2.2.4. Publication, Distribution, Licensing, etc.]
Module: core
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.canonical (@ref)
Member of
Contained by
core: bibl
May contain
analysis: pc s span spanGrp w
character data
Note

Not permitted outside the header

Example
<bibl type="firstEdition">  <title>La baronne trépassée</title>  <publisher>Baudry</publisher>  <pubPlace>Paris</pubPlace>  <date>1852</date> </bibl>
Content model
<content>
 <macroRef key="macro.phraseSeq"/>
</content>
    
Schema Declaration
element publisher
{
   att.global.attributes,
   att.canonical.attributes,
   macro.phraseSeq
}

Appendix A.1.41 <quote>

<quote> (quotation) contains a phrase or passage attributed by the narrator or author to some agency external to the text. [3.3.3. Quotation 4.3.1. Grouped Texts]
Module: core
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.typed (@type) att.notated (@notation)
Member of
Contained by
May contain
Note

In ELTeC this element is used for any kind of quotation or pseudo quotation appearing in the body of a text, including epigraphs, citations, etc.

Example
In this example, the two lines of verse are quoted and do not form part of the narrative:
<p>О, многозначајне ли су речи покојног Његуша II:</p> <quote>  <l>„Благо томе ко довијек живи,</l>  <l>имао се рашта и родити!...</l> </quote>
Content model
<content>
 <macroRef key="macro.specialPara"/>
</content>
    
Schema Declaration
element quote
{
   att.global.attributes,
   att.typed.attributes,
   att.notated.attributes,
   macro.specialPara
}

Appendix A.1.42 <ref>

<ref> (reference) defines a reference to another location, possibly modified by additional text or comment. [3.7. Simple Links and Cross-References 16.1. Links]
Module: core
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.pointing (@target) att.typed (@type)
Member of
Contained by
May contain
Note

In ELTeC the <ref> element is used only to provide a link from the body of a text to an associated authorial note. Its content is conventionalised as shown and may be removed in a level2 version of the text.

Example
<p>"May happen <ref target="#ENG18482_N21">[21]</ref> yo'd better take him, Alice;...</p>
Content model
<content>
 <macroRef key="macro.paraContent"/>
</content>
    
Schema Declaration
element ref
{
   att.global.attributes,
   att.pointing.attributes,
   att.typed.attributes,
   macro.paraContent
}

Appendix A.1.43 <reprintCount>

<reprintCount> indicates how frequently the title has been reprinted.
Namespace: http://distantreading.net/eltec/ns
Module: derived-module-ELTeC
Attributes
key
Status: Required
Datatype: teidata.enumerated
Legal values are:
high
text reprinted very frequently during the period 1970 - 2009
low
text reprinted only occasionally or not at all during the period 1970 - 2009
unspecified
information about the number of reprints not yet determined
Note

The number of times a work is reprinted may be considered some indication of the extent to which it is regarded as canonical or relevant, though other factors may also play a significant part.

Contained by
corpus: textDesc
May contain: Empty element
Example
<textDesc    xmlns:e="http://distantreading.net/eltec/ns"> <!-- ... -->  <reprintCount xmlns="http://distantreading.net/eltec/ns" key="high"/> <!-- ... --> </textDesc>
Content model
<content>
 <empty/>
</content>
    
Schema Declaration
element reprintCount
{
   attribute key { "high" | "low" | "unspecified" },
   empty
}

Appendix A.1.44 <resp>

<resp> (responsibility) contains a phrase describing the nature of a person's intellectual responsibility, or an organization's role in the production or distribution of a work. [3.12.2.2. Titles, Authors, and Editors 2.2.1. The Title Statement 2.2.2. The Edition Statement 2.2.5. The Series Statement]
Module: core
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.canonical (@ref) att.datable (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to))
Contained by
core: respStmt
May contain
analysis: span spanGrp
character data
Note

The attribute ref, inherited from the class att.canonical may be used to indicate the kind of responsibility in a normalized form by referring directly to a standardized list of responsibility types, such as that maintained by a naming authority, for example the list maintained at http://www.loc.gov/marc/relators/relacode.html for bibliographic usage.

Example
<respStmt>  <resp ref="http://id.loc.gov/vocabulary/relators/com.html">compiler</resp>  <name>Edward Child</name> </respStmt>
Content model
<content>
 <macroRef key="macro.phraseSeq.limited"/>
</content>
    
Schema Declaration
element resp
{
   att.global.attributes,
   att.canonical.attributes,
   att.datable.attributes,
   macro.phraseSeq.limited
}

Appendix A.1.45 <respStmt>

<respStmt> (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply. May also be used to encode information about individuals or organizations which have played a role in the production or distribution of a bibliographic work. [3.12.2.2. Titles, Authors, and Editors 2.2.1. The Title Statement 2.2.2. The Edition Statement 2.2.5. The Series Statement]
Module: core
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.canonical (@ref)
Member of
Contained by
core: bibl
header: titleStmt
May contain
core: name note resp
Note

In ELTeC this element is used either within the <titleStmt>, where it documents responsibility for some aspect of the ELTeC text's creation, or within a <bibl>, where it documents responsibility for the bibliographic item concerned (other than authorship).

Example
<respStmt>  <resp>ELTeC conversion</resp>  <name>Leonard Konle</name> </respStmt>
Example
When several names are associated with the same responsibility, they may be grouped within a single <respStmt> as in the following example:
<respStmt>  <resp>Original data capture</resp>  <name>Meredith Bach</name>  <name>Mary Meehan</name>  <name>Online Distributed Proofreading Team</name> </respStmt>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <alternate minOccurs="1" maxOccurs="1">
   <sequence minOccurs="1" maxOccurs="1">
    <elementRef key="resp" minOccurs="1"
     maxOccurs="unbounded"/>
    <classRef key="model.nameLike.agent"
     minOccurs="1" maxOccurs="unbounded"/>
   </sequence>
   <sequence minOccurs="1" maxOccurs="1">
    <classRef key="model.nameLike.agent"
     minOccurs="1" maxOccurs="unbounded"/>
    <elementRef key="resp" minOccurs="1"
     maxOccurs="unbounded"/>
   </sequence>
  </alternate>
  <elementRef key="note" minOccurs="0"
   maxOccurs="unbounded"/>
 </sequence>
</content>
    
Schema Declaration
element respStmt
{
   att.global.attributes,
   att.canonical.attributes,
   (
      ( ( resp+, model.nameLike.agent+ ) | ( model.nameLike.agent+, resp+ ) ),
      note*
   )
}

Appendix A.1.46 <revisionDesc>

<revisionDesc> (revision description) summarizes the revision history for a file. [2.6. The Revision Description 2.1.1. The TEI Header and Its Components]
Module: header
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Contained by
header: teiHeader
May contain
header: change
Note

When several significant changes are recorded, each should be documented using a separate <change> element, given in reverse chronological order, i.e. most recent first.

Example
<revisionDesc>  <change when="2018-12-12">Spell check completed</change>  <change when="2018-11-01">Initial conversion to ELTeC-1 using CLIGStoELTeC stylesheet  </change> </revisionDesc>
Content model
<content>
 <alternate minOccurs="1" maxOccurs="1">
  <elementRef key="list"/>
  <elementRef key="listChange"/>
  <elementRef key="change" minOccurs="1"
   maxOccurs="unbounded"/>
 </alternate>
</content>
    
Schema Declaration
element revisionDesc
{
   att.global.attributes,
   ( list | listChange | change+ )
}
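
The ordering recommended in the note above (most recent <change> first) can be verified with a short script. This is a minimal Python sketch under stated assumptions: the function name is hypothetical, every relevant <change> is a direct child of <revisionDesc>, and each carries a when attribute in YYYY-MM-DD form, so that plain string comparison agrees with date order.

```python
import xml.etree.ElementTree as ET


def most_recent_first(revision_desc: ET.Element) -> bool:
    """True if the @when values of the <change> children never increase."""
    dates = [
        child.get("when")
        for child in revision_desc
        # Compare on the local name so the check works with or without a namespace.
        if child.tag.rsplit("}", 1)[-1] == "change" and child.get("when")
    ]
    return dates == sorted(dates, reverse=True)


ordered = ET.fromstring(
    "<revisionDesc>"
    '<change when="2018-12-12">Spell check completed</change>'
    '<change when="2018-11-01">Initial conversion to ELTeC-1</change>'
    "</revisionDesc>"
)
misordered = ET.fromstring(
    "<revisionDesc>"
    '<change when="2018-11-01">Initial conversion to ELTeC-1</change>'
    '<change when="2018-12-12">Spell check completed</change>'
    "</revisionDesc>"
)
```

Changes without a when attribute are ignored by this sketch rather than reported.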

Appendix A.1.47 <rs>

<rs> (referencing string) contains a general purpose name or referring string. [13.2.1. Personal Names 3.6.1. Referring Strings]
Module: core
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.typed (@type)
Member of
Contained by
May contain
analysis: pc s span spanGrp w
character data
Example
<q>My dear <rs type="person">Mr. Bennet</rs>, </q> said <rs type="person">his lady</rs> to him one day, <q>have you heard that <rs type="place">Netherfield Park</rs> is let at last?</q>
Content model
<content>
 <macroRef key="macro.phraseSeq"/>
</content>
    
Schema Declaration
element rs { att.global.attributes, att.typed.attributes, macro.phraseSeq }

Appendix A.1.48 <s>

<s> (s-unit) contains a sentence-like division of a text. [17.1. Linguistic Segment Categories 8.4.1. Segmentation]
Module: analysis
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.segLike (@function) att.typed (@type) att.notated (@notation)
Member of
Contained by
May contain
Note

The <s> element may be used to mark orthographic sentences, or any other segmentation of a text, provided that the segmentation is end-to-end, complete, and non-nesting. For segmentation which is partial or recursive, the <seg> element should be used instead.

The type attribute may be used to indicate the type of segmentation intended, according to any convenient typology.

Example
<s>  <w pos="DET">Here</w>  <w pos="AUX" join="left" lemma="be">'s</w>  <w pos="DET">a</w>  <emph>   <w pos="ADV">really</w>   <w pos="ADJ">silly</w>  </emph>  <w pos="NOUN">example</w>  <pc join="left">.</pc> </s>
Schematron
<s:report test="tei:s">You may not nest one s element within another: use seg instead</s:report>
Content model
<content>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="w"/>
  <elementRef key="pc"/>
  <classRef key="model.global"/>
  <classRef key="model.pPart.edit"/>
  <classRef key="model.limitedPhrase"/>
 </alternate>
</content>
    
Schema Declaration
element s
{
   att.global.attributes,
   att.segLike.attributes,
   att.typed.attributes,
   att.notated.attributes,
   ( w | pc | model.global | model.pPart.edit | model.limitedPhrase )+
}
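
The non-nesting constraint expressed by the Schematron rule above can also be tested outside a Schematron processor. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes the document uses the standard TEI namespace http://www.tei-c.org/ns/1.0.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
S_TAG = f"{{{TEI_NS}}}s"


def count_nesting_s(root: ET.Element) -> int:
    """Count <s> elements that have another <s> as a direct child."""
    # find() with a bare tag name looks at direct children only,
    # which mirrors the Schematron test "tei:s" in the context of <s>.
    return sum(1 for s in root.iter(S_TAG) if s.find(S_TAG) is not None)


flat = ET.fromstring(
    f'<p xmlns="{TEI_NS}"><s><w>One</w></s><s><w>Two</w></s></p>'
)
nested = ET.fromstring(
    f'<p xmlns="{TEI_NS}"><s><w>One</w><s><w>Two</w></s></s></p>'
)
```

A result of zero means no <s> directly contains another <s>.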

Appendix A.1.49 <size>

<size> indicates the size group to which the text belongs.
Namespace: http://distantreading.net/eltec/ns
Module: derived-module-ELTeC
Attributes
key
Status: Required
Datatype: teidata.enumerated
Legal values are:
long
more than 100,000 words
medium
50,000 to 100,000 words
short
10,000 to 50,000 words
Contained by
corpus: textDesc
May contain: Empty element
Example
Indicates that a novel contains more than 100,000 words (long):
<textDesc    xmlns:e="http://distantreading.net/eltec/ns"> <!-- ... -->  <size xmlns="http://distantreading.net/eltec/ns" key="long"/> <!-- ... --> </textDesc>
Content model
<content>
 <empty/>
</content>
    
Schema Declaration
element size { attribute key { "long" | "medium" | "short" }, empty }
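
The size bands listed above can be applied mechanically once a word count is known. The following is a minimal Python sketch: the function name is hypothetical, and the handling of the boundary values is an assumption, since the definitions above do not say to which band a text of exactly 50,000 or exactly 100,000 words belongs.

```python
def size_key(word_count: int) -> str:
    """Map a word count to a value for the key attribute of <size>.

    Assumption: exactly 50,000 and exactly 100,000 words both count as
    "medium"; "long" is read strictly as "more than 100,000 words".
    """
    if word_count < 10_000:
        # Below the shortest band defined for <size>.
        raise ValueError("fewer than 10,000 words: no size band applies")
    if word_count > 100_000:
        return "long"
    if word_count >= 50_000:
        return "medium"
    return "short"
```

For instance, `size_key(120_000)` gives "long" and `size_key(30_000)` gives "short".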

Appendix A.1.50 <sourceDesc>

<sourceDesc> (source description) describes the source(s) from which an electronic text was derived or generated, typically a bibliographic description in the case of a digitized text, or a phrase such as "born digital" for a text which has no previous existence. [2.2.7. The Source Description]
Module: header
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Contained by
header: fileDesc
May contain
core: bibl p
Example
In ELTeC, the source or sources of a text are documented using one or more <bibl> elements of appropriate types. In this case, the ELTeC text derives from a digital version published by Éfélé in 2014 which is believed to be derived from a first edition published in Paris in 1848.
<sourceDesc>  <bibl type="digitalSource">   <publisher> Éfélé</publisher>, <date>2014</date>   <ref target="http://efele.net/ebooks/livres/000067"/>  </bibl>  <bibl type="firstEdition">Paris: Furne, J.-J. Dubochet et Cie, J. Hetzel et Paulin,  <date>1848</date>. </bibl> </sourceDesc>
Content model
<content>
 <alternate minOccurs="1" maxOccurs="1">
  <classRef key="model.pLike" minOccurs="1"
   maxOccurs="unbounded"/>
  <alternate minOccurs="1"
   maxOccurs="unbounded">
   <classRef key="model.biblLike"/>
   <classRef key="model.sourceDescPart"/>
   <classRef key="model.listLike"/>
  </alternate>
 </alternate>
</content>
    
Schema Declaration
element sourceDesc
{
   att.global.attributes,
   (
      model.pLike+
    | ( model.biblLike | model.sourceDescPart | model.listLike )+
   )
}

Appendix A.1.51 <span>

<span> associates an interpretative annotation directly with a span of text. [17.3. Spans and Interpretations]
Module: analysis
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.pointing (@target) att.interpLike (@inst)
type: indicates what kind of phenomenon is being noted in the passage.
Status: Recommended
Datatype: teidata.enumerated
Sample values include:
image
identifies an image in the passage.
character
identifies a character associated with the passage.
theme
identifies a theme in the passage.
allusion
identifies an allusion to another text.
from: gives the identifier of the node which is the starting point of the span of text being annotated; if not accompanied by a to attribute, gives the identifier of the node of the entire span of text being annotated.
Status: Optional
Datatype: teidata.pointer
to: gives the identifier of the node which is the end-point of the span of text being annotated.
Status: Optional
Datatype: teidata.pointer
Member of
Contained by
May contain
analysis: span spanGrp
character data
Example
<p xml:id="para2">(The "aftermath" starts here)</p> <p xml:id="para3">(The "aftermath" continues here)</p> <p xml:id="para4">(The "aftermath" ends in this paragraph)</p> <!-- ... --> <span type="structure" from="#para2"  to="#para4">aftermath</span>
Schematron
<s:report test="@from and @target">Only one of the attributes @target and @from may be supplied on <s:name/> </s:report>
Schematron
<s:report test="@to and @target">Only one of the attributes @target and @to may be supplied on <s:name/> </s:report>
Schematron
<s:report test="@to and not(@from)">If @to is supplied on <s:name/>, @from must be supplied as well</s:report>
Schematron
<s:report test="contains(normalize-space(@to),' ') or contains(normalize-space(@from),' ')">The attributes @to and @from on <s:name/> may each contain only a single value</s:report>
Content model
<content>
 <macroRef key="macro.phraseSeq.limited"/>
</content>
    
Schema Declaration
element span
{
   att.global.attributes,
   att.interpLike.attribute.inst,
   att.pointing.attributes,
   attribute type { text }?,
   attribute from { text }?,
   attribute to { text }?,
   macro.phraseSeq.limited
}
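
The four Schematron rules given above for <span> constrain how target, from, and to may be combined. They can be restated as a small Python sketch; the function name and the message wording are hypothetical, and the attributes are assumed to be available as a plain dictionary.

```python
def span_problems(attrs: dict) -> list:
    """Return a message for each <span> attribute rule that is broken."""
    problems = []
    if "from" in attrs and "target" in attrs:
        problems.append("only one of @target and @from may be supplied")
    if "to" in attrs and "target" in attrs:
        problems.append("only one of @target and @to may be supplied")
    if "to" in attrs and "from" not in attrs:
        problems.append("if @to is supplied, @from must be supplied as well")
    for name in ("from", "to"):
        # Counterpart of XPath normalize-space(): trim and collapse whitespace.
        value = " ".join(attrs.get(name, "").split())
        if " " in value:
            problems.append(f"@{name} may contain only a single value")
    return problems
```

An empty list means none of the four rules is violated; unlike the single Schematron message, this sketch reports multi-valued from and to attributes separately.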

Appendix A.1.52 <spanGrp>

<spanGrp> (span group) collects together span tags. [17.3. Spans and Interpretations]
Module: analysis
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.interpLike (@inst)
type: indicates what kind of phenomenon is being noted in the passage.
Status: Recommended
Datatype: teidata.enumerated
Sample values include:
image
identifies an image in the passage.
character
identifies a character associated with the passage.
theme
identifies a theme in the passage.
allusion
identifies an allusion to another text.
Member of
Contained by
May contain
analysis: span
Example
<u xml:id="UU1">Can I have ten oranges and a kilo of bananas please?</u> <u xml:id="UU2">Yes, anything else?</u> <u xml:id="UU3">No thanks.</u> <u xml:id="UU4">That'll be dollar forty.</u> <u xml:id="UU5">Two dollars</u> <u xml:id="UU6">Sixty, eighty, two dollars. <anchor xml:id="UU6e"/>Thank you.<anchor xml:id="UU6f"/> </u> <spanGrp type="transactions">  <span from="#UU1">sale request</span>  <span from="#UU2" to="#UU3">sale compliance</span>  <span from="#UU4">sale</span>  <span from="#UU5" to="#UU6">purchase</span>  <span from="#UU6e" to="#UU6f">purchase closure</span> </spanGrp>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <classRef key="model.descLike"
   minOccurs="0" maxOccurs="unbounded"/>
  <elementRef key="span" minOccurs="0"
   maxOccurs="unbounded"/>
 </sequence>
</content>
    
Schema Declaration
element spanGrp
{
   att.global.attributes,
   att.interpLike.attribute.inst,
   attribute type { text }?,
   ( model.descLike*, span* )
}

Appendix A.1.53 <teiHeader>

<teiHeader> (TEI header) supplies descriptive and declarative metadata associated with a digital resource or set of resources. [2.1.1. The TEI Header and Its Components 15.1. Varieties of Composite Text]
Module: header
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Contained by
textstructure: TEI
May contain
Note

One of the few elements unconditionally required in any TEI document.

Example
<teiHeader>  <fileDesc>   <titleStmt> <!-- information about the author and title -->   </titleStmt>   <extent> <!-- information about the size of the text-->   </extent>   <publicationStmt>    <availability> <!-- information about licensing and publication of the ELTeC text-->    </availability>   </publicationStmt>   <sourceDesc> <!-- information about the source(s) from which the ELTeC text was derived -->   </sourceDesc>  </fileDesc>  <encodingDesc n="eltec-1"> <!-- indication of the encoding level -->  </encodingDesc>  <profileDesc>   <langUsage> <!-- indication of the language -->   </langUsage>   <textDesc> <!-- classification of the text according to the ELTeC sampling criteria -->   </textDesc>  </profileDesc>  <revisionDesc> <!-- Change log for the digital file -->  </revisionDesc> </teiHeader>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <elementRef key="fileDesc"/>
  <elementRef key="encodingDesc"/>
  <elementRef key="profileDesc"/>
  <elementRef key="revisionDesc"/>
 </sequence>
</content>
    
Schema Declaration
element teiHeader
{
   att.global.attributes,
   ( fileDesc, encodingDesc, profileDesc, revisionDesc )
}

Appendix A.1.54 <term>

<term> (term) contains a single-word, multi-word, or symbolic designation which is regarded as a technical term. [3.4.1. Terms and Glosses]
Module: core
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.typed (@type) att.canonical (@ref)
Contained by
header: keywords
May contain
analysis: pc s span spanGrp w
character data
Note

In ELTeC this element is used only in the header, to specify a descriptive keyword for the text being documented.

Example
<keywords xml:lang="en">  <term>silver fork</term>  <term>society</term> </keywords>
Schematron
<s:assert test="child::* or child::text()[normalize-space()]"  role="ERROR">A <term> must contain some text!</s:assert>
Content model
<content>
 <macroRef key="macro.phraseSeq"/>
</content>
    
Schema Declaration
element term
{
   att.global.attributes,
   att.typed.attributes,
   att.canonical.attributes,
   macro.phraseSeq
}

Appendix A.1.55 <text>

<text> (text) contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample. [4. Default Text Structure 15.1. Varieties of Composite Text]
Module: textstructure
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.typed (@type)
Member of
Contained by
textstructure: TEI
May contain
analysis: span spanGrp
textstructure: back body front
Example
<text>  <front> <!-- front matter e.g. titlepage -->  </front>  <body> <!-- body of the text -->  </body>  <back> <!-- back matter e.g. notes-->  </back> </text>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <classRef key="model.global"
   minOccurs="0" maxOccurs="unbounded"/>
  <sequence minOccurs="0" maxOccurs="1">
   <elementRef key="front"/>
   <classRef key="model.global"
    minOccurs="0" maxOccurs="unbounded"/>
  </sequence>
  <alternate minOccurs="1" maxOccurs="1">
   <elementRef key="body"/>
   <elementRef key="group"/>
  </alternate>
  <classRef key="model.global"
   minOccurs="0" maxOccurs="unbounded"/>
  <sequence minOccurs="0" maxOccurs="1">
   <elementRef key="back"/>
   <classRef key="model.global"
    minOccurs="0" maxOccurs="unbounded"/>
  </sequence>
 </sequence>
</content>
    
Schema Declaration
element text
{
   att.global.attributes,
   att.typed.attributes,
   (
      model.global*,
      ( front, model.global* )?,
      ( body | group ),
      model.global*,
      ( back, model.global* )?
   )
}

Appendix A.1.56 <textClass>

<textClass> (text classification) groups information which describes the nature or topic of a text in terms of a standard classification scheme, thesaurus, etc. [2.4.3. The Text Classification]
Module: header
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Contained by
header: profileDesc
May contain
header: keywords
Example
<textClass>  <keywords>   <term xml:lang="eng">juvenile literature</term>   <term xml:lang="deu">bildungsroman</term>  </keywords> </textClass>
Content model
<content>
 <elementRef key="keywords"/>
</content>
    
Schema Declaration
element textClass { att.global.attributes, keywords }

Appendix A.1.57 <textDesc>

<textDesc> (text description) provides a description of a text in terms of its situational parameters. [15.2.1. The Text Description]
Module: corpus
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Contained by
header: profileDesc
May contain
Note

The elements <authorGender>, <size>, <reprintCount>, and <timeSlot> are not TEI elements, and do not belong to the TEI namespace. Their namespace must be specified, either in full as in this example, or by means of a namespace prefix defined on some hierarchically superior element.

Each of these four elements must be supplied exactly once, and in the order specified.

Example
<textDesc    xmlns:e="http://distantreading.net/eltec/ns">  <authorGender xmlns="http://distantreading.net/eltec/ns" key="F"/>  <size xmlns="http://distantreading.net/eltec/ns" key="long"/>  <reprintCount xmlns="http://distantreading.net/eltec/ns" key="high"/>  <timeSlot xmlns="http://distantreading.net/eltec/ns" key="T2"/> </textDesc>
Profile for a text with a female author, containing over 100,000 words, of high reprintCount, first published between 1860 and 1879.
Schematron
<sch:report test="child::*:canonicity">The element formerly known as "canonicity" has now been renamed "reprintCount"</sch:report>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <elementRef key="authorGender"/>
  <elementRef key="size"/>
  <alternate minOccurs="1" maxOccurs="1">
   <elementRef key="canonicity"/>
   <elementRef key="reprintCount"/>
  </alternate>
  <elementRef key="timeSlot"/>
 </sequence>
</content>
    
Schema Declaration
element textDesc
{
   att.global.attributes,
   ( authorGender, size, ( canonicity | reprintCount ), timeSlot )
}
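
The requirement stated in the notes above (four elements from the ELTeC namespace, each exactly once and in a fixed order) lends itself to a simple check. The following Python sketch is illustrative: the function name is hypothetical, and it treats the legacy <canonicity> element as a problem, in line with the Schematron report, even though the content model still admits it.

```python
import xml.etree.ElementTree as ET

ELTEC_NS = "http://distantreading.net/eltec/ns"
EXPECTED = ["authorGender", "size", "reprintCount", "timeSlot"]


def text_desc_problems(text_desc: ET.Element) -> list:
    """Return an empty list if the children match the expected sequence."""
    expected = [f"{{{ELTEC_NS}}}{name}" for name in EXPECTED]
    found = [child.tag for child in text_desc]
    if found == expected:
        return []
    return [f"expected {EXPECTED} in the ELTeC namespace, found {found}"]


valid = ET.fromstring(
    '<textDesc xmlns="http://www.tei-c.org/ns/1.0" '
    f'xmlns:e="{ELTEC_NS}">'
    '<e:authorGender key="F"/><e:size key="long"/>'
    '<e:reprintCount key="high"/><e:timeSlot key="T2"/>'
    "</textDesc>"
)
unqualified = ET.fromstring(
    '<textDesc xmlns="http://www.tei-c.org/ns/1.0">'
    '<authorGender key="F"/><size key="long"/>'
    '<reprintCount key="high"/><timeSlot key="T2"/>'
    "</textDesc>"
)
```

The second sample fails because its children fall into the TEI namespace rather than the ELTeC one, which is exactly the mistake the first note warns against.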

Appendix A.1.58 <timeSlot>

<timeSlot> specifies the time period during which the work was first published as a single volume.
Namespace: http://distantreading.net/eltec/ns
Module: derived-module-ELTeC
Attributes
key
Status: Required
Datatype: teidata.enumerated
Legal values are:
T1
work first published between 1840 and 1859
T2
work first published between 1860 and 1879
T3
work first published between 1880 and 1899
T4
work first published between 1900 and 1920
Contained by
corpus: textDesc
May contain: Empty element
Example
Indicates that the novel described was first published between 1840 and 1859 (T1):
<textDesc    xmlns:e="http://distantreading.net/eltec/ns"> <!-- ... -->  <timeSlot xmlns="http://distantreading.net/eltec/ns" key="T1"/> <!-- ... --> </textDesc>
Content model
<content>
 <empty/>
</content>
    
Schema Declaration
element timeSlot { attribute key { "T1" | "T2" | "T3" | "T4" }, empty }
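
The four time slots defined above partition the period 1840 to 1920, so the key can be derived directly from the year of first publication. A minimal Python sketch follows; the function and constant names are hypothetical, and the slot boundaries are taken from the legal values listed above.

```python
# (key, first year, last year), both years inclusive.
TIME_SLOTS = (
    ("T1", 1840, 1859),
    ("T2", 1860, 1879),
    ("T3", 1880, 1899),
    ("T4", 1900, 1920),
)


def time_slot(year: int) -> str:
    """Map a year of first publication to a value for the key of <timeSlot>."""
    for key, first, last in TIME_SLOTS:
        if first <= year <= last:
            return key
    raise ValueError(f"{year} lies outside the ELTeC period 1840-1920")
```

For example, `time_slot(1848)` gives "T1" and `time_slot(1920)` gives "T4".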

Appendix A.1.59 <title>

<title> (title) contains a title for any kind of work. [3.12.2.2. Titles, Authors, and Editors 2.2.1. The Title Statement 2.2.5. The Series Statement]
Module: core
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.canonical (@ref)
level: indicates the bibliographic level for a title, that is, whether it identifies an article, book, journal, series, or unpublished material.
Status: Optional
Datatype: teidata.enumerated
Legal values are:
a
(analytic) the title applies to an analytic item, such as an article, poem, or other work published as part of a larger item.
m
(monographic) the title applies to a monograph such as a book or other item considered to be a distinct publication, including single volumes of multi-volume works
j
(journal) the title applies to any serial or periodical publication such as a journal, magazine, or newspaper
s
(series) the title applies to a series of otherwise distinct publications such as a collection
u
(unpublished) the title applies to any unpublished material (including theses and dissertations unless published by a commercial press)
Note

The level of a title is sometimes implied by its context: for example, a title appearing directly within an <analytic> element is ipso facto of level ‘a’, and one appearing within a <series> element of level ‘s’. For this reason, the level attribute is not required in contexts where its value can be unambiguously inferred. Where it is supplied in such contexts, its value should not contradict the value implied by its parent element.

Member of
Contained by
May contain
Note

In ELTeC, this element is available only as metadata within the TEI header.

Example
<titleStmt>  <title>Wuthering Heights : ELTeC edition</title> <!-- ... --> </titleStmt>
Example
The ref attribute may optionally be used to reference an authority file entry for the title, in this case in VIAF:
<titleStmt>  <title ref="viaf:194763311">El Señor de Bembibre : edición ELTeC</title> <!-- ... --> </titleStmt>
Schematron
<s:assert test="child::* or child::text()[normalize-space()]"  role="ERROR">provide a title for each novel followed by the phrase "ELTeC edition" (or a similar expression in the language of the text)</s:assert>
Content model
<content>
 <macroRef key="macro.paraContent"/>
</content>
    
Schema Declaration
element title
{
   att.global.attributes,
   att.canonical.attributes,
   attribute level { "a" | "m" | "j" | "s" | "u" }?,
   macro.paraContent
}

Appendix A.1.60 <titleStmt>

<titleStmt> (title statement) groups information about the title of a work and those responsible for its content. [2.2.1. The Title Statement 2.2. The File Description]
Module header
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
Contained by
header: fileDesc
May contain
Example
<titleStmt>  <title>Silas Marner: The Weaver of Raveloe : ELTeC edition</title>  <author ref="viaf:89000553">Eliot, George (pseud.) (1819-1880)</author>  <respStmt> <!-- ... -->  </respStmt> </titleStmt>
Example
<titleStmt>  <title ref="viaf:194763311">El Señor de Bembibre : edición ELTeC</title>  <author ref="viaf:27087132">Gil y Carrasco, Enrique (1815-1846)</author>  <respStmt> <!-- ... -->  </respStmt> </titleStmt>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <elementRef key="title" minOccurs="1"
   maxOccurs="unbounded"/>
  <classRef key="model.respLike"
   minOccurs="0" maxOccurs="unbounded"/>
 </sequence>
</content>
    
Schema Declaration
element titleStmt { att.global.attributes, ( title+, model.respLike* ) }
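The content model above requires one or more <title> elements followed by any number of model.respLike members (<author> or <respStmt> in this schema). As a rough illustration, the ordering constraint could be checked in Python as follows; the function names are hypothetical and the sketch ignores attributes entirely:

```python
import xml.etree.ElementTree as ET

# Members of model.respLike as listed in this schema.
RESP_LIKE = {"author", "respStmt"}

def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]

def title_stmt_is_valid(title_stmt):
    """Return True if the children match ( title+, model.respLike* )."""
    names = [local_name(child.tag) for child in title_stmt]
    i = 0
    while i < len(names) and names[i] == "title":
        i += 1
    if i == 0:  # at least one title is required
        return False
    return all(name in RESP_LIKE for name in names[i:])
```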

Appendix A.1.61 <trailer>

<trailer> contains a closing title or footer appearing at the end of a division of a text. [4.2.4. Content of Textual Divisions 4.2. Elements Common to All Divisions]
Module textstructure
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.typed (@type)
Member of
Contained by
textstructure: back body div front
May contain
Example
<div type="volume" n="1"> <!-- more chapters here -->  <div type="chapter" n="23"> <!-- more paragraphs here -->   <p>.... and to think of the money it cost!</p>  </div>  <trailer>End of the first volume.</trailer> </div>
Content model
<content>
 <alternate minOccurs="0"
  maxOccurs="unbounded">
  <textNode/>
  <elementRef key="lg"/>
  <classRef key="model.gLike"/>
  <classRef key="model.phrase"/>
  <classRef key="model.inter"/>
  <classRef key="model.lLike"/>
  <classRef key="model.global"/>
 </alternate>
</content>
    
Schema Declaration
element trailer
{
   att.global.attributes,
   att.typed.attributes,
   (
      text
    | lg
    | model.gLike
    | model.phrase
    | model.inter
    | model.lLike
    | model.global
   )*
}

Appendix A.1.62 <w>

<w> (word) represents a grammatical (not necessarily orthographic) word. [17.1. Linguistic Segment Categories 17.4.2. Lightweight Linguistic Annotation]
Module analysis
Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.segLike (@function) att.typed (@type) att.notated (@notation) att.linguistic (pos, @lemma, @lemmaRef, @msd, @join)
Member of
Contained by
May contain
analysis: w
character data
Example
<s>  <w pos="DET">Here</w>  <w pos="AUX" join="left" lemma="be">'s</w>  <w pos="DET">a</w>  <emph>   <w pos="ADV">really</w>   <w pos="ADJ">silly</w>  </emph>  <w pos="NOUN">example</w>  <w pos="PUNCT" join="left">.</w> </s>
Example
<s>  <w pos="NOUN">Carte</w>  <w pos="ADP" lemma="des">   <w pos="PART">de</w>   <w pos="DET">les</w>  </w>  <w pos="NOUN">vins</w> </s>
Content model
<content>
 <alternate minOccurs="0"
  maxOccurs="unbounded">
  <textNode/>
  <classRef key="model.gLike"/>
  <elementRef key="w"/>
 </alternate>
</content>
    
Schema Declaration
element w
{
   att.global.attributes,
   att.segLike.attributes,
   att.typed.attributes,
   att.linguistic.attribute.lemma,
   att.linguistic.attribute.lemmaRef,
   att.linguistic.attribute.msd,
   att.linguistic.attribute.join,
   att.notated.attributes,
   attribute pos { text }?,
   ( text | model.gLike | w )*
}

Appendix A.2 Model classes

Appendix A.2.1 model.attributable

model.attributable groups elements that contain a word or phrase that can be attributed to a source.
Module tei
Used by
Members model.quoteLike[quote]

Appendix A.2.2 model.biblLike

model.biblLike groups elements containing a bibliographic description.
Module tei
Used by
Members bibl

Appendix A.2.3 model.biblPart

model.biblPart groups elements which represent components of a bibliographic description.
Module tei
Used by
Members model.imprintPart[distributor pubPlace publisher] model.respLike[author respStmt] bibl extent idno title

Appendix A.2.4 model.common

model.common groups common chunk- and inter-level elements.
Module tei
Used by
Members model.divPart[model.lLike[l] model.pLike[p]] model.inter[model.attributable[model.quoteLike[quote]] model.egLike model.labelLike[label] model.listLike model.oddDecl model.stageLike]
Note

This class defines the set of chunk- and inter-level elements; it is used in many content models, including those for textual divisions.

Appendix A.2.5 model.dateLike

model.dateLike groups elements containing temporal expressions.
Module tei
Used by
Members date

Appendix A.2.6 model.divBottom

model.divBottom groups elements appearing at the end of a text division.
Module tei
Used by
Members model.divBottomPart[trailer] model.divWrapper

Appendix A.2.7 model.divBottomPart

model.divBottomPart groups elements which can occur only at the end of a text division.
Module tei
Used by
Members trailer

Appendix A.2.8 model.divLike

model.divLike groups elements used to represent un-numbered generic structural divisions.
Module tei
Used by
Members div

Appendix A.2.9 model.divPart

model.divPart groups paragraph-level elements appearing directly within divisions.
Module tei
Used by
Members model.lLike[l] model.pLike[p]
Note

Note that this element class does not include members of the model.inter class, which can appear either within or between paragraph-level items.

Appendix A.2.10 model.divTop

model.divTop groups elements appearing at the beginning of a text division.
Module tei
Used by
Members model.divTopPart[model.headLike[head]] model.divWrapper

Appendix A.2.11 model.divTopPart

model.divTopPart groups elements which can occur only at the beginning of a text division.
Module tei
Used by
Members model.headLike[head]

Appendix A.2.12 model.emphLike

model.emphLike groups phrase-level elements which are typographically distinct and to which a specific function can be attributed.
Module tei
Used by
Members emph foreign title

Appendix A.2.13 model.frontPart

model.frontPart groups elements which appear at the level of divisions within front or back matter.
Module tei
Used by
Members model.frontPart.drama

Appendix A.2.14 model.global

Appendix A.2.15 model.global.edit

model.global.edit groups globally available elements which perform a specifically editorial function.
Module tei
Used by
Members gap

Appendix A.2.16 model.global.meta

model.global.meta groups globally available elements which describe the status of other elements.
Module tei
Used by
Members span spanGrp
Note

Elements in this class are typically used to hold groups of links or of abstract interpretations, or to provide indications of certainty etc. It may be convenient to localize all metadata elements, for example to contain them within the same division as the elements that they relate to, or to locate them all in a division of their own. They may however appear at any point in a TEI text.

Appendix A.2.17 model.headLike

model.headLike groups elements used to provide a title or heading at the start of a text division.
Module tei
Used by
Members head

Appendix A.2.18 model.hiLike

model.hiLike groups phrase-level elements which are typographically distinct but to which no specific function can be attributed.
Module tei
Used by
Members hi

Appendix A.2.19 model.highlighted

model.highlighted groups phrase-level elements which are typographically distinct.
Module tei
Used by
Members model.emphLike[emph foreign title] model.hiLike[hi]

Appendix A.2.20 model.imprintPart

model.imprintPart groups the bibliographic elements which occur inside imprints.
Module tei
Used by
Members distributor pubPlace publisher

Appendix A.2.21 model.inter

model.inter groups elements which can appear either within or between paragraph-like elements.
Module tei
Used by
Members model.attributable[model.quoteLike[quote]] model.egLike model.labelLike[label] model.listLike model.oddDecl model.stageLike

Appendix A.2.22 model.lLike

model.lLike groups elements representing metrical components such as verse lines.
Module tei
Used by
Members l

Appendix A.2.23 model.labelLike

model.labelLike groups elements used to gloss or explain other parts of a document.
Module tei
Used by
Members label

Appendix A.2.24 model.limitedPhrase

model.limitedPhrase groups phrase-level elements excluding those elements primarily intended for transcription of existing sources.
Module tei
Used by
Members model.emphLike[emph foreign title] model.hiLike[hi] model.pPart.data[model.addressLike model.dateLike[date] model.measureLike model.nameLike[model.offsetLike model.placeStateLike[model.placeNamePart] rs]] model.pPart.editorial model.pPart.msdesc model.phrase.xml model.ptrLike[ref]

Appendix A.2.25 model.milestoneLike

model.milestoneLike groups milestone-style elements used to represent reference systems.
Module tei
Used by
Members milestone pb

Appendix A.2.26 model.nameLike

model.nameLike groups elements which name or refer to a person, place, or organization.
Module tei
Used by
Members model.offsetLike model.placeStateLike[model.placeNamePart] rs
Note

A superset of the naming elements that may appear in datelines, addresses, statements of responsibility, etc.

Appendix A.2.27 model.nameLike.agent

model.nameLike.agent groups elements which contain names of individuals or corporate bodies.
Module tei
Used by
Members name
Note

This class is used in the content model of elements which reference names of people or organizations.

Appendix A.2.28 model.noteLike

model.noteLike groups globally-available note-like elements.
Module tei
Used by
Members note

Appendix A.2.29 model.pLike

model.pLike groups paragraph-like elements.
Module tei
Used by
Members p

Appendix A.2.30 model.pLike.front

model.pLike.front groups paragraph-like elements which can occur as direct constituents of front matter.
Module tei
Used by
Members head

Appendix A.2.31 model.pPart.data

model.pPart.data groups phrase-level elements containing names, dates, numbers, measures, and similar data.
Module tei
Used by
Members model.addressLike model.dateLike[date] model.measureLike model.nameLike[model.offsetLike model.placeStateLike[model.placeNamePart] rs]

Appendix A.2.32 model.pPart.edit

model.pPart.edit groups phrase-level elements for simple editorial correction and transcription.
Module tei
Used by
Members model.pPart.editorial model.pPart.transcriptional[corr]

Appendix A.2.33 model.pPart.transcriptional

model.pPart.transcriptional groups phrase-level elements used for editorial transcription of pre-existing source materials.
Module tei
Used by
Members corr

Appendix A.2.34 model.phrase

model.phrase groups elements which can occur at the level of individual words or phrases.
Module tei
Used by
Members model.graphicLike model.highlighted[model.emphLike[emph foreign title] model.hiLike[hi]] model.lPart model.pPart.data[model.addressLike model.dateLike[date] model.measureLike model.nameLike[model.offsetLike model.placeStateLike[model.placeNamePart] rs]] model.pPart.edit[model.pPart.editorial model.pPart.transcriptional[corr]] model.pPart.msdesc model.phrase.xml model.ptrLike[ref] model.segLike[pc s w] model.specDescLike
Note

This class of elements can occur within paragraphs, list items, lines of verse, etc.

Appendix A.2.35 model.placeStateLike

model.placeStateLike groups elements which describe changing states of a place.
Module tei
Used by
Members model.placeNamePart

Appendix A.2.36 model.ptrLike

model.ptrLike groups elements used for purposes of location and reference.
Module tei
Used by
Members ref

Appendix A.2.37 model.quoteLike

model.quoteLike groups elements used to directly contain quotations.
Module tei
Used by
Members quote

Appendix A.2.38 model.resource

model.resource groups separate elements which constitute the content of a digital resource, as opposed to its metadata.
Module tei
Used by
Members text

Appendix A.2.39 model.respLike

model.respLike groups elements which are used to indicate intellectual or other significant responsibility, for example within a bibliographic element.
Module tei
Used by
Members author respStmt

Appendix A.2.40 model.segLike

model.segLike groups elements used for arbitrary segmentation.
Module tei
Used by
Members pc s w
Note

The principles on which segmentation is carried out, and any special codes or attribute values used, should be defined explicitly in the <segmentation> element of the <encodingDesc> within the associated TEI header.

Appendix A.3 Attribute classes

Appendix A.3.1 att.canonical

att.canonical provides attributes which can be used to associate a representation such as a name or title with canonical information about the object being named or referenced.
Module tei
Members author date distributor publisher resp respStmt term title
Attributes
ref (reference) provides an explicit means of locating a full definition or identity for the entity being named by means of one or more URIs.
Status Optional
Datatype 1–∞ occurrences of teidata.pointer separated by whitespace
adds a reference for a standard authority database, e.g. viaf
<author ref="viaf:1234">Some Geyser</author>
Note

The value must point directly to one or more XML elements or other resources by means of one or more URIs, separated by whitespace. If more than one is supplied the implication is that the name identifies several distinct entities.

Appendix A.3.2 att.datable

att.datable provides attributes for normalization of elements that contain dates, times, or datable events.
Module tei
Members change date name resp
Attributes att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)
Note

This ‘superclass’ provides attributes that can be used to provide normalized values of temporal information. By default, the attributes from the att.datable.w3c class are provided. If the module for names & dates is loaded, this class also provides attributes from the att.datable.iso and att.datable.custom classes. In general, the possible values of attributes restricted to the W3C datatypes form a subset of those values available via the ISO 8601 standard. However, the greater expressiveness of the ISO datatypes may not be needed, and there exists much greater software support for the W3C datatypes.

Appendix A.3.3 att.datable.w3c

att.datable.w3c provides attributes for normalization of elements that contain datable events conforming to the W3C XML Schema Part 2: Datatypes Second Edition.
Module tei
Members att.datable[change date name resp]
Attributes
when supplies the value of the date or time in a standard form, e.g. yyyy-mm-dd.
Status Optional
Datatype teidata.temporal.w3c
Examples of W3C date, time, and date & time formats.
<p>  <date when="1945-10-24">24 Oct 45</date>  <date when="1996-09-24T07:25:00Z">September 24th, 1996 at 3:25 in the morning</date>  <time when="1999-01-04T20:42:00-05:00">Jan 4 1999 at 8 pm</time>  <time when="14:12:38">fourteen twelve and 38 seconds</time>  <date when="1962-10">October of 1962</date>  <date when="--06-12">June 12th</date>  <date when="---01">the first of the month</date>  <date when="--08">August</date>  <date when="2006">MMVI</date>  <date when="0056">AD 56</date>  <date when="-0056">56 BC</date> </p>
This list begins in the year 1632, more precisely on Trinity Sunday, i.e. the Sunday after Pentecost, in that year the <date calendar="#julian"  when="1632-06-06">27th of May (old style)</date>.
<opener>  <dateline>   <placeName>Dorchester, Village,</placeName>   <date when="1828-03-02">March 2d. 1828.</date>  </dateline>  <salute>To    Mrs. Cornell,</salute> Sunday <time when="12:00:00">noon.</time> </opener>
notBefore specifies the earliest possible date for the event in standard form, e.g. yyyy-mm-dd.
Status Optional
Datatype teidata.temporal.w3c
notAfter specifies the latest possible date for the event in standard form, e.g. yyyy-mm-dd.
Status Optional
Datatype teidata.temporal.w3c
from indicates the starting point of the period in standard form, e.g. yyyy-mm-dd.
Status Optional
Datatype teidata.temporal.w3c
to indicates the ending point of the period in standard form, e.g. yyyy-mm-dd.
Status Optional
Datatype teidata.temporal.w3c
Schematron
<sch:rule context="tei:*[@when]"> <sch:report test="@notBefore|@notAfter|@from|@to"  role="nonfatal">The @when attribute cannot be used with any other att.datable.w3c attributes.</sch:report> </sch:rule>
Schematron
<sch:rule context="tei:*[@from]"> <sch:report test="@notBefore"  role="nonfatal">The @from and @notBefore attributes cannot be used together.</sch:report> </sch:rule>
Schematron
<sch:rule context="tei:*[@to]"> <sch:report test="@notAfter"  role="nonfatal">The @to and @notAfter attributes cannot be used together.</sch:report> </sch:rule>
Example
<date from="1863-05-28" to="1863-06-01">28 May through 1 June 1863</date>
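The three Schematron rules above report incompatible combinations of the att.datable.w3c attributes. The following Python sketch restates them over a plain dictionary of attribute names and values; the function name `datable_conflicts` and the message wording are illustrative assumptions, not part of the schema:

```python
def datable_conflicts(attrs):
    """Return a message for each combination the Schematron rules report."""
    problems = []
    others = {"notBefore", "notAfter", "from", "to"}
    # Rule 1: @when excludes every other att.datable.w3c attribute.
    if "when" in attrs and others.intersection(attrs):
        problems.append("@when used with another att.datable.w3c attribute")
    # Rule 2: @from and @notBefore cannot be used together.
    if "from" in attrs and "notBefore" in attrs:
        problems.append("@from used with @notBefore")
    # Rule 3: @to and @notAfter cannot be used together.
    if "to" in attrs and "notAfter" in attrs:
        problems.append("@to used with @notAfter")
    return problems
```

A period expressed with from and to alone, as in the example above, produces no messages.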
Note

The value of these attributes should be a normalized representation of the date, time, or combined date & time intended, in any of the standard formats specified by XML Schema Part 2: Datatypes Second Edition, using the Gregorian calendar.

The most commonly-encountered format for the date portion of a temporal attribute is yyyy-mm-dd, but yyyy, --mm, ---dd, yyyy-mm, or --mm-dd may also be used. For the time part, the form hh:mm:ss is used.
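As an informal illustration of the date formats listed above, the following Python sketch tests only the shape of a date value (it covers neither time values nor range checks on month and day). The names `DATE_SHAPE` and `looks_like_w3c_date` are hypothetical, and the rejection of year 0000 anticipates the restriction stated in the next paragraph:

```python
import re

# Shapes of the date portion only: times and time zones are not covered.
DATE_SHAPE = re.compile(
    r"(?:"
    r"(?P<year>-?\d{4})(?:-\d{2}(?:-\d{2})?)?"  # yyyy, yyyy-mm, yyyy-mm-dd
    r"|--\d{2}(?:-\d{2})?"                      # --mm, --mm-dd
    r"|---\d{2}"                                # ---dd
    r")"
)

def looks_like_w3c_date(value):
    """Return True if value has one of the date shapes listed above."""
    match = DATE_SHAPE.fullmatch(value)
    if match is None:
        return False
    year = match.group("year")
    # The year 0000 is not permitted; 1 BCE is written -0001.
    return year is None or int(year) != 0
```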

Note that this format does not currently permit use of the value 0000 to represent the year 1 BCE; instead the value -0001 should be used.

Appendix A.3.4 att.dimensions

att.dimensions provides attributes for describing the size of physical objects.
Module tei
Members date gap
Attributes
unit names the unit used for the measurement
Status Optional
Datatype teidata.enumerated
Suggested values include:
cm
(centimetres)
mm
(millimetres)
in
(inches)
line
lines of text
char
(characters) characters of text
quantity specifies the length in the units specified
Status Optional
Datatype teidata.numeric
extent indicates the size of the object concerned using a project-specific vocabulary combining quantity and units in a single string of words.
Status Optional
Datatype teidata.text
<gap extent="5 words"/>
<height extent="half the page"/>

Appendix A.3.5 att.global

att.global provides attributes common to all elements in the TEI encoding scheme.
Module tei
Members TEI author availability back bibl body change corr date distributor div emph encodingDesc extent fileDesc foreign front gap head hi keywords l label langUsage language licence measure milestone name note p pb pc profileDesc pubPlace publicationStmt publisher quote ref resp respStmt revisionDesc rs s sourceDesc span spanGrp teiHeader term text textClass textDesc title titleStmt trailer w
Attributes att.global.rendition (@rend)
xml:id (identifier) provides a unique identifier for the element bearing the attribute.
Status Optional
Datatype ID
Note

The xml:id attribute may be used to specify a canonical reference for an element; see section 3.11. Reference Systems.

n (number) gives a number (or other label) for an element, which is not necessarily unique within the document.
Status Optional
Datatype teidata.text
Note

The value of this attribute is always understood to be a single token, even if it contains space or other punctuation characters, and need not be composed of numbers only. It is typically used to specify the numbering of chapters, sections, list items, etc.; it may also be used in the specification of a standard reference system for the text.

xml:lang (language) indicates the language of the element content using a ‘tag’ generated according to BCP 47.
Status Optional
Datatype teidata.language
<p> … The consequences of this rapid depopulation were the loss of the last <foreign xml:lang="rap">ariki</foreign> or chief (Routledge 1920:205,210) and their connections to ancestral territorial organization.</p>
Note

The xml:lang value will be inherited from the immediately enclosing element, or from its parent, and so on up the document hierarchy. It is generally good practice to specify xml:lang at the highest appropriate level, noticing that a different default may be needed for the <teiHeader> from that needed for the associated resource element or elements, and that a single TEI document may contain texts in many languages.

The authoritative list of registered language subtags is maintained by IANA and is available at http://www.iana.org/assignments/language-subtag-registry. For a good general overview of the construction of language tags, see http://www.w3.org/International/articles/language-tags/, and for a practical step-by-step guide, see https://www.w3.org/International/questions/qa-choosing-language-tags.en.php.

The value used must conform with BCP 47. If the value is a private use code (i.e., starts with x- or contains -x-), a <language> element with a matching value for its ident attribute should be supplied in the TEI header to document this value. Such documentation may also optionally be supplied for non-private-use codes, though these must remain consistent with their IETF (Internet Engineering Task Force) definitions.
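The inheritance rule for xml:lang described in the note above can be sketched in Python as follows. The sketch assumes the standard library's ElementTree, which keeps no parent pointers, so a child-to-parent map is built first; the function name `effective_lang` is illustrative:

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def effective_lang(root, element):
    """Return the nearest xml:lang on element or its ancestors, else None."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = element
    while node is not None:
        lang = node.get(XML_LANG)
        if lang is not None:
            return lang
        node = parent_of.get(node)  # None once the root has been passed
    return None
```

Applied to a paragraph in an English text that contains a <foreign xml:lang="rap"> element, the function returns the inherited value for the paragraph and "rap" for the foreign phrase.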

xml:base provides a base URI reference with which applications can resolve relative URI references into absolute URI references.
Status Optional
Datatype teidata.pointer
<div type="bibl">  <head>Bibliography</head>  <listBibl xml:base="http://www.lib.ucdavis.edu/BWRP/Works/">   <bibl>    <author>     <name>Landon, Letitia Elizabeth</name>    </author>    <ref target="LandLVowOf.sgm">     <title>The Vow of the Peacock</title>    </ref>   </bibl>   <bibl>    <author>     <name>Compton, Margaret Clephane</name>    </author>    <ref target="NortMIrene.sgm">     <title>Irene, a Poem in Six Cantos</title>    </ref>   </bibl>   <bibl>    <author>     <name>Taylor, Jane</name>    </author>    <ref target="TaylJEssay.sgm">     <title>Essays in Rhyme on Morals and Manners</title>    </ref>   </bibl>  </listBibl> </div>
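To illustrate how an application might resolve the relative target values in the example above against the xml:base value, here is a short Python sketch using the standard library's `urljoin`. The wrapper name `resolve_target` is an assumption made for illustration:

```python
from urllib.parse import urljoin

# The xml:base value from the <listBibl> in the example above.
BASE = "http://www.lib.ucdavis.edu/BWRP/Works/"

def resolve_target(base, target):
    """Resolve a relative URI reference against a base URI."""
    return urljoin(base, target)

print(resolve_target(BASE, "LandLVowOf.sgm"))
# prints http://www.lib.ucdavis.edu/BWRP/Works/LandLVowOf.sgm
```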
xml:space signals an intention about how white space should be managed by applications.
Status Optional
Datatype teidata.enumerated
Legal values are:
default
signals that the application's default white-space processing modes are acceptable
preserve
indicates the intent that applications preserve all white space
Note

The XML specification provides further guidance on the use of this attribute. Note that many parsers may not handle xml:space correctly.

Appendix A.3.6 att.global.rendition

att.global.rendition provides rendering attributes common to all elements in the TEI encoding scheme.
Module tei
Members att.global[TEI author availability back bibl body change corr date distributor div emph encodingDesc extent fileDesc foreign front gap head hi keywords l label langUsage language licence measure milestone name note p pb pc profileDesc pubPlace publicationStmt publisher quote ref resp respStmt revisionDesc rs s sourceDesc span spanGrp teiHeader term text textClass textDesc title titleStmt trailer w]
Attributes
rend (rendition) indicates how the element in question was rendered or presented in the source text.
Status Optional
Datatype 1–∞ occurrences of teidata.word separated by whitespace
<head rend="align(center) case(allcaps)">  <lb/>To The <lb/>Duchesse <lb/>of <lb/>Newcastle, <lb/>On Her <lb/>  <hi rend="case(mixed)">New Blazing-World</hi>. </head>
Note

These Guidelines make no binding recommendations for the values of the rend attribute; the characteristics of visual presentation vary too much from text to text and the decision to record or ignore individual characteristics varies too much from project to project. Some potentially useful conventions are noted from time to time at appropriate points in the Guidelines. The values of the rend attribute are a set of sequence-indeterminate individual tokens separated by whitespace.
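Because the tokens of a rend value are sequence-indeterminate, software comparing two values should not depend on token order. A minimal Python sketch, with the hypothetical helper `rend_tokens`, treats the value as a set of whitespace-separated tokens:

```python
def rend_tokens(value):
    """Split a rend value into an unordered set of tokens."""
    return frozenset(value.split())

# Two values with the same tokens in a different order compare equal.
same = rend_tokens("align(center) case(allcaps)") == rend_tokens("case(allcaps) align(center)")
```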

Appendix A.3.7 att.linguistic

att.linguistic provides a set of attributes concerning linguistic features of tokens, for usage within token-level elements, specifically <w> and <pc> in the analysis module.
Module analysis
Members pc w
Attributes
lemma provides a lemma (base form) for the word, typically uninflected and serving both as an identifier (e.g. in dictionary contexts, as a headword), and as a basis for potential inflections.
Status Optional
Datatype teidata.text
<w lemma="wife">wives</w>
<w lemma="Arznei">Artzeneyen</w>
lemmaRef provides a pointer to a definition of the lemma for the word, for example in an online lexicon.
Status Optional
Datatype teidata.pointer
<w type="verb"  lemma="hit"  lemmaRef="http://www.example.com/lexicon/hitvb.xml">hitt<m type="suffix">ing</m> </w>
pos (part of speech) indicates the part of speech assigned to a token (i.e. information on whether it is a noun, adjective, or verb), usually according to some official reference vocabulary (e.g. for German: STTS, for English: CLAWS, for Polish: NKJP, etc.).
Status Optional
Datatype teidata.text
The German sentence ‘Wir fahren in den Urlaub.’ tagged with the Stuttgart-Tuebingen-Tagset (STTS).
<s>  <w pos="PPER">Wir</w>  <w pos="VVFIN">fahren</w>  <w pos="APPR">in</w>  <w pos="ART">den</w>  <w pos="NN">Urlaub</w>  <w pos="$.">.</w> </s>
The English sentence ‘We're going to Brazil.’ tagged with the CLAWS-5 tagset, arranged inline (with significant whitespace).
<p><w pos="PNP">We</w><w pos="VBB">'re</w> <w pos="VVG">going</w> <w pos="PRP">to</w> <w pos="NP0">Brazil</w><pc pos="PUN">.</pc></p>         
The English sentence ‘We're going on vacation to Brazil for a month!’ tagged with the CLAWS-7 tagset and arranged sequentially.
<p>  <w pos="PPIS2">We</w>  <w pos="VBR">'re</w>  <w pos="VVG">going</w>  <w pos="II">on</w>  <w pos="NN1">vacation</w>  <w pos="II">to</w>  <w pos="NP1">Brazil</w>  <w pos="IF">for</w>  <w pos="AT1">a</w>  <w pos="NNT1">month</w>  <pc pos="!">!</pc> </p>
msd (morphosyntactic description) supplies morphosyntactic information for a token, usually according to some official reference vocabulary (e.g. for German: STTS-large tagset; for a feature description system designed as (pragmatically) universal, see Universal Features).
Status Optional
Datatype teidata.text
<ab>  <w pos="PPER"   msd="1.Pl.*.Nom">Wir</w>  <w pos="VVFIN"   msd="1.Pl.Pres.Ind">fahren</w>  <w pos="APPR"   msd="--">in</w>  <w pos="ART"   msd="Def.Masc.Akk.Sg">den</w>  <w pos="NN"   msd="Masc.Akk.Sg">Urlaub</w>  <pc pos="$."   msd="--">.</pc> </ab>
join when present, it provides information on whether the token in question is adjacent to another, and if so, on which side. The definition of this attribute is adapted from ISO MAF (Morpho-syntactic Annotation Framework), ISO 24611:2012.
Status Optional
Datatype teidata.text
Legal values are:
no
(the token is not adjacent to another)
left
(there is no whitespace on the left side of the token)
right
(there is no whitespace on the right side of the token)
both
(there is no whitespace on either side of the token)
overlap
(the token overlaps with another; other devices (specifying the extent and the area of overlap) are needed to more precisely locate this token in the character stream)
The example below assumes that the lack of whitespace is marked redundantly, by using the appropriate values of join.
<s>  <pc join="right">"</pc>  <w join="left">Friends</w>  <w>will</w>  <w>be</w>  <w join="right">friends</w>  <pc join="both">.</pc>  <pc join="left">"</pc> </s>
Note that a project may make a decision to only indicate lack of whitespace in one direction, or do that non-redundantly. The existing proposal is the broadest possible, on the assumption that we adopt the "streamable view", where all the information on the current element needs to be represented locally.
The English sentence ‘We're going on vacation.’ tagged with the CLAWS-5 tagset, arranged sequentially, tagged on the assumption that only the lack of the preceding whitespace is indicated.
<p>  <w pos="PNP">We</w>  <w pos="VBB"   join="left">'re</w>  <w pos="VVG">going</w>  <w pos="PRP">on</w>  <w pos="NN1">vacation</w>  <pc pos="PUN"   join="left">.</pc> </p>
Note

These attributes make it possible to encode simple language corpora and to add a layer of linguistic information to any tokenized resource. See section 17.4.2. Lightweight Linguistic Annotation for discussion.
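As an illustration of how the join attribute supports reconstruction of the running text, the following Python sketch rebuilds a sentence from a flat sequence of <w> and <pc> tokens. It assumes the convention of the CLAWS-5 example above, in which only the absence of preceding whitespace is marked (join="left"); the function name `surface_text` is hypothetical, and nested tokens and the other join values are not handled:

```python
import xml.etree.ElementTree as ET

def surface_text(sentence):
    """Join direct child tokens, omitting the space before join="left"."""
    parts = []
    for token in sentence:
        # Insert a space unless this is the first token or it joins leftwards.
        if parts and token.get("join") != "left":
            parts.append(" ")
        parts.append(token.text or "")
    return "".join(parts)

example = ET.fromstring(
    '<p><w pos="PNP">We</w><w pos="VBB" join="left">\'re</w>'
    '<w pos="VVG">going</w><w pos="PRP">on</w>'
    '<w pos="NN1">vacation</w><pc pos="PUN" join="left">.</pc></p>'
)
print(surface_text(example))  # prints We're going on vacation.
```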

Appendix A.3.8 att.milestoneUnit

att.milestoneUnit provides an attribute to indicate the type of section which is changing at a specific milestone.
Module core
Members milestone
Attributes
unit provides a conventional name for the kind of section changing at this milestone.
Status Required
Datatype teidata.enumerated
Suggested values include:
page
physical page breaks (synonymous with the <pb> element).
column
column breaks.
line
line breaks (synonymous with the <lb> element).
book
any units termed book, liber, etc.
poem
individual poems in a collection.
canto
cantos or other major sections of a poem.
speaker
changes of speaker or narrator.
stanza
stanzas within a poem, book, or canto.
act
acts within a play.
scene
scenes within a play or act.
section
sections of any kind.
absent
passages not present in the reference edition.
unnumbered
passages present in the text, but not to be included as part of the reference.
<milestone n="23"  ed="La"  unit="Dreissiger"/> ... <milestone n="24"  ed="AV"  unit="verse"/> ...
Note

If the milestone marks the beginning of a piece of text not present in the reference edition, the special value absent may be used as the value of unit. The normal interpretation is that the reference edition does not contain the text which follows, until the next <milestone> tag for the edition in question is encountered.

In addition to the values suggested, other terms may be appropriate (e.g. Stephanus for the Stephanus numbers in Plato).

The type attribute may be used to characterize the unit boundary in any respect other than simply identifying the type of unit, for example as word-breaking or not.

Appendix A.3.9 att.notated

att.notated provides an attribute to indicate any specialised notation used for element content.
Module tei
Members quote s w
Attributes
notation names the notation used for the content of the element.
Status Optional
Datatype teidata.enumerated

Appendix A.3.10 att.pointing

att.pointing provides a set of attributes used by all elements which point to other elements by means of one or more URI references.
Module tei
Members licence note ref span
Attributes
target specifies the destination of the reference by supplying one or more URI References
Status Optional
Datatype 1–∞ occurrences of teidata.pointer separated by whitespace
Note

One or more syntactically valid URI references, separated by whitespace. Because whitespace is used to separate URIs, no whitespace is permitted inside a single URI. If a whitespace character is required in a URI, it should be escaped with the normal mechanism, e.g. TEI%20Consortium.
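The whitespace rule above can be illustrated with a short Python sketch. The helper names `split_targets` and `join_targets` are assumptions for illustration; only the space character is escaped here, which is a simplification of full URI escaping:

```python
def split_targets(value):
    """Split a target value into its individual URI references."""
    return value.split()

def join_targets(uris):
    """Build a target value, escaping any space inside a single URI."""
    return " ".join(uri.replace(" ", "%20") for uri in uris)
```

For instance, joining "#ch1" and "http://example.org/TEI Consortium" (an invented address) yields "#ch1 http://example.org/TEI%20Consortium", which splits back into exactly two references.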

Appendix A.3.11 att.segLike

att.segLike provides attributes for elements used for arbitrary segmentation.
Module tei
Members pc s w
Attributes
function (function) characterizes the function of the segment.
Status Optional
Datatype teidata.enumerated
Note

Attribute values will often vary depending on the type of element to which they are attached. For example, a <cl> may take values such as coordinate, subject, adverbial etc. For a <phr>, such values as subject, predicate etc. may be more appropriate. Such constraints will typically be implemented by a project-defined customization.

Appendix A.3.12 att.sortable

att.sortable provides attributes for elements in lists or groups that are sortable, but whose sorting key cannot be derived mechanically from the element content.
Module tei
Members bibl
Attributes
sortKey supplies the sort key for this element in an index, list or group which contains it.
Status Optional
Datatype teidata.word
David's other principal backer, Josiah ha-Kohen <index indexName="NAMES">  <term sortKey="Azarya_Josiah_Kohen">Josiah ha-Kohen b. Azarya</term> </index> b. Azarya, son of one of the last gaons of Sura was David's own first cousin.
Note

The sort key is used to determine the sequence and grouping of entries in an index. It provides a sequence of characters which, when sorted with the other values, will produce the desired order; specifics of sort key construction are application-dependent.

Dictionary order often differs from the collation sequence of machine-readable character sets; in English-language dictionaries, an entry for 4-H will often appear alphabetized under ‘fourh’, and McCoy may be alphabetized under ‘maccoy’, while A1, A4, and A5 may all appear in numeric order ‘alphabetized’ between ‘a-’ and ‘AA’. The sort key is required if the orthography of the dictionary entry does not suffice to determine its location.
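The difference between sorting on the display form and sorting on an explicit key can be sketched in Python as follows. The entry list and the function name `sort_entries` are invented for illustration; only the keys ‘fourh’ and ‘maccoy’ come from the note above.

```python
# (display form, sort key) pairs; "Mabel" and "Fountain" are invented filler entries.
entries = [
    ("McCoy", "maccoy"),
    ("4-H", "fourh"),
    ("Mabel", "mabel"),
    ("Fountain", "fountain"),
]


def sort_entries(pairs):
    """Order entries by their explicit sort key, not by their display form."""
    ordered = sorted(pairs, key=lambda pair: pair[1])
    return [display for display, _ in ordered]


by_key = sort_entries(entries)
# Plain code-point order of the display forms, for comparison.
by_codepoint = sorted(display for display, _ in entries)

print(by_key)
print(by_codepoint)
```

With the keys supplied, 4-H files after Fountain, whereas code-point order places it first because digits precede letters.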

Appendix A.3.13 att.typed

att.typed provides attributes which can be used to classify or subclassify elements in any way.
Module: tei
Members: att.interpLike [span spanGrp] TEI bibl change corr date div head idno label milestone name note pb pc quote ref rs s term text trailer w
Attributes
type: characterizes the element in some sense, using any convenient classification scheme or typology.
Status: Optional
Datatype: teidata.enumerated
<div type="verse">
 <head>Night in Tarras</head>
 <lg type="stanza">
  <l>At evening tramping on the hot white road</l>
  <l></l>
 </lg>
 <lg type="stanza">
  <l>A wind sprang up from nowhere as the sky</l>
  <l></l>
 </lg>
</div>
Note

The type attribute is present on a number of elements, not all of which are members of att.typed, usually because these elements restrict the possible values for the attribute in a specific way.

Schematron
<sch:rule context="tei:*[@subtype]">
 <sch:assert test="@type">The <sch:name/> element should not be categorized in detail with @subtype unless also categorized in general with @type</sch:assert>
</sch:rule>
Note

When appropriate, values from an established typology should be used. Alternatively a typology may be defined in the associated TEI header. If values are to be taken from a project-specific list, this should be defined using the <valList> element in the project-specific schema description, as described in section 23.3.1.3, Modification of Attribute and Attribute Value Lists, of the TEI Guidelines.
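The constraint expressed by the Schematron rule above can also be checked outside a Schematron processor. The following Python sketch uses only the standard library; the function name `subtype_without_type` and the attribute values in the sample are hypothetical and are not taken from the ELTeC schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def subtype_without_type(xml_text):
    """Return the local names of TEI elements that carry subtype but no type."""
    root = ET.fromstring(xml_text)
    offenders = []
    for element in root.iter():
        # The rule context is tei:*, so only elements in the TEI namespace count.
        if not element.tag.startswith("{" + TEI_NS + "}"):
            continue
        if "subtype" in element.attrib and "type" not in element.attrib:
            offenders.append(element.tag.split("}", 1)[1])
    return offenders


# Hypothetical sample: the div satisfies the rule, the note does not.
sample = (
    '<div xmlns="http://www.tei-c.org/ns/1.0" type="chapter" subtype="epistolary">'
    '<note subtype="editorial">flagged</note>'
    "</div>"
)
print(subtype_without_type(sample))
```

This reports violations only; it does not replace validation against the schema, which also restricts the values each attribute may take.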

Appendix A.4 Macros

Appendix A.4.1 macro.paraContent

macro.paraContent (paragraph content) defines the content of paragraphs and similar elements.
Module: tei
Used by
Content model
<content>
 <alternate minOccurs="0"
  maxOccurs="unbounded">
  <textNode/>
  <classRef key="model.gLike"/>
  <classRef key="model.phrase"/>
  <classRef key="model.inter"/>
  <classRef key="model.global"/>
  <elementRef key="lg"/>
  <classRef key="model.lLike"/>
 </alternate>
</content>
    
Declaration
macro.paraContent =
   (
      text
    | model.gLike
    | model.phrase
    | model.inter
    | model.global
    | lg
    | model.lLike
   )*

Appendix A.4.2 macro.phraseSeq

macro.phraseSeq (phrase sequence) defines a sequence of character data and phrase-level elements.
Module: tei
Used by
Content model
<content>
 <alternate minOccurs="0"
  maxOccurs="unbounded">
  <textNode/>
  <classRef key="model.gLike"/>
  <classRef key="model.attributable"/>
  <classRef key="model.phrase"/>
  <classRef key="model.global"/>
 </alternate>
</content>
    
Declaration
macro.phraseSeq =
   ( text | model.gLike | model.attributable | model.phrase | model.global )*

Appendix A.4.3 macro.phraseSeq.limited

macro.phraseSeq.limited (limited phrase sequence) defines a sequence of character data and those phrase-level elements that are not typically used for transcribing extant documents.
Module: tei
Used by
Content model
<content>
 <alternate minOccurs="0"
  maxOccurs="unbounded">
  <textNode/>
  <classRef key="model.limitedPhrase"/>
  <classRef key="model.global"/>
 </alternate>
</content>
    
Declaration
macro.phraseSeq.limited = ( text | model.limitedPhrase | model.global )*

Appendix A.4.4 macro.specialPara

macro.specialPara ('special' paragraph content) defines the content model of elements such as notes or list items, which either contain a series of component-level elements or else have the same structure as a paragraph, containing a series of phrase-level and inter-level elements.
Module: tei
Used by
Content model
<content>
 <alternate minOccurs="0"
  maxOccurs="unbounded">
  <textNode/>
  <classRef key="model.gLike"/>
  <classRef key="model.phrase"/>
  <classRef key="model.inter"/>
  <classRef key="model.divPart"/>
  <classRef key="model.global"/>
 </alternate>
</content>
    
Declaration
macro.specialPara =
   (
      text
    | model.gLike
    | model.phrase
    | model.inter
    | model.divPart
    | model.global
   )*

Appendix A.5 Datatypes

Appendix A.5.1 teidata.certainty

teidata.certainty defines the range of attribute values expressing a degree of certainty.
Module: tei
Used by
Content model
<content>
 <valList type="closed">
  <valItem ident="high"/>
  <valItem ident="medium"/>
  <valItem ident="low"/>
  <valItem ident="unknown"/>
 </valList>
</content>
    
Declaration
teidata.certainty = "high" | "medium" | "low" | "unknown"
Note

Certainty may be expressed by one of the predefined symbolic values high, medium, or low. The value unknown should be used in cases where the encoder does not wish to assert an opinion about the matter.

Appendix A.5.2 teidata.enumerated

teidata.enumerated defines the range of attribute values expressed as a single XML name taken from a list of documented possibilities.
Module: tei
Used by
Element:
Content model
<content>
 <dataRef key="teidata.word"/>
</content>
    
Declaration
teidata.enumerated = teidata.word
Note

Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace.

Typically, the list of documented possibilities will be provided (or exemplified) by a value list in the associated attribute specification, expressed with a <valList> element.

Appendix A.5.3 teidata.language

teidata.language defines the range of attribute values used to identify a particular combination of human language and writing system.
Module: tei
Used by
Element:
Content model
<content>
 <alternate minOccurs="1" maxOccurs="1">
  <dataRef name="language"/>
  <valList>
   <valItem ident=""/>
  </valList>
 </alternate>
</content>
    
Declaration
teidata.language = xsd:language | ( "" )
Note

The values for this attribute are language ‘tags’ as defined in BCP 47. Currently BCP 47 comprises RFC 5646 and RFC 4647; over time, other IETF documents may succeed these as the best current practice.

A ‘language tag’, per BCP 47, is assembled from a sequence of components or subtags separated by the hyphen character (-, U+002D). The tag is made of the following subtags, in the following order. Every subtag except the first is optional. If present, each occurs only once, except the fourth and fifth components (variant and extension), which are repeatable.

language
The IANA-registered code for the language. This is almost always the same as the ISO 639 2-letter language code if there is one. The list of available registered language subtags can be found at http://www.iana.org/assignments/language-subtag-registry. It is recommended that this code be written in lower case.
script
The ISO 15924 code for the script. These codes consist of 4 letters, and it is recommended they be written with an initial capital, the other three letters in lower case. The canonical list of codes is maintained by the Unicode Consortium, and is available at http://unicode.org/iso15924/iso15924-codes.html. The IETF recommends this code be omitted unless it is necessary to make a distinction you need.
region
Either an ISO 3166 country code or a UN M.49 region code that is registered with IANA (not all such codes are registered, e.g. UN codes for economic groupings or codes for countries for which there is already an ISO 3166 2-letter code are not registered). The former consist of 2 letters, and it is recommended they be written in upper case; the list of codes can be searched or browsed at https://www.iso.org/obp/ui/#search/code/. The latter consist of 3 digits; the list of codes can be found at http://unstats.un.org/unsd/methods/m49/m49.htm.
variant
An IANA-registered variation. These codes are used to indicate additional, well-recognized variations that define a language or its dialects that are not covered by other available subtags.
extension
An extension has the format of a single letter followed by a hyphen followed by additional subtags. These exist to allow for future extension to BCP 47, but as of this writing no such extensions are in use.
private use
An extension that uses the initial subtag of the single letter x (i.e., starts with x-) has no meaning except as negotiated among the parties involved. These should be used with great care, since they interfere with the interoperability that use of BCP 47 is intended to promote. In order for a document that makes use of these subtags to be TEI-conformant, a corresponding <language> element must be present in the TEI header.

There are two exceptions to the above format. First, there are language tags in the IANA registry that do not match the above syntax, but are present because they have been ‘grandfathered’ from previous specifications.

Second, an entire language tag can consist of only a private use subtag. These tags start with x-, and do not need to follow any further rules established by the IETF and endorsed by these Guidelines. Like all language tags that make use of private use subtags, the language in question must be documented in a corresponding <language> element in the TEI header.

Examples include

sn
Shona
zh-TW
Chinese as used in Taiwan
zh-Hant-HK
Chinese written in traditional script as used in Hong Kong
en-SL
English as spoken in Sierra Leone
pl
Polish
es-MX
Spanish as spoken in Mexico
es-419
Spanish as spoken in Latin America

The W3C Internationalization Activity has published a useful introduction to BCP 47, Language tags in HTML and XML.
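The subtag order described above can be made concrete with a deliberately simplified Python sketch. It recognises only the language, script and region subtags; variants, extensions, private-use subtags and grandfathered tags are not handled, and no lookup against the IANA registry is performed. The names `TAG` and `parse_tag` are hypothetical.

```python
import re

# Simplified pattern: language, then optional script, then optional region.
TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def parse_tag(tag):
    """Split a tag into subtags and apply the recommended casing, or return None."""
    match = TAG.match(tag)
    if match is None:
        return None
    parts = match.groupdict()
    return {
        # Language in lower case, script with an initial capital, region in upper case.
        "language": parts["language"].lower(),
        "script": parts["script"].title() if parts["script"] else None,
        "region": parts["region"].upper() if parts["region"] else None,
    }


for example in ("sn", "zh-Hant-HK", "es-419", "x-private"):
    print(example, parse_tag(example))
```

A tag that returns None here is not necessarily invalid: it may simply use one of the subtag kinds that this sketch leaves out.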

Appendix A.5.4 teidata.numeric

teidata.numeric defines the range of attribute values used for numeric values.
Module: tei
Used by
Content model
<content rend="replace">
 <dataRef name="token"
  restriction="([\d]+)"/>
</content>
    
Declaration
teidata.numeric = token { pattern = "([\d]+)" }
Note

We restrict all numeric data to unsigned integer values: one or more digits, with no sign, decimal point or exponent.
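The effect of the pattern can be checked with a few lines of Python. This is a sketch only: the function name `is_teidata_numeric` is hypothetical, and the whitespace normalisation that the token datatype applies before matching is not modelled. Note that the pattern accepts any non-empty run of digits, so 0 and values with leading zeros also match.

```python
import re

# The same pattern as in the declaration, applied to the whole value.
NUMERIC = re.compile(r"[\d]+")


def is_teidata_numeric(value):
    """Return True if the value consists of one or more digits and nothing else."""
    return NUMERIC.fullmatch(value) is not None


for candidate in ("1840", "0", "-5", "3.14", ""):
    print(repr(candidate), is_teidata_numeric(candidate))
```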

Appendix A.5.5 teidata.pointer

teidata.pointer defines the range of attribute values used to provide a single URI, absolute or relative, pointing to some other resource, either within the current document or elsewhere.
Module: tei
Used by
Element:
Content model
<content>
 <dataRef name="anyURI"/>
</content>
    
Declaration
teidata.pointer = xsd:anyURI
Note

The range of syntactically valid values is defined by RFC 3986 Uniform Resource Identifier (URI): Generic Syntax. Note that the values themselves are encoded using RFC 3987 Internationalized Resource Identifiers (IRIs) mapping to URIs. For example, https://secure.wikimedia.org/wikipedia/en/wiki/% is encoded as https://secure.wikimedia.org/wikipedia/en/wiki/%25 while http://موقع.وزارة-الاتصالات.مصر/ is encoded as http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/

Appendix A.5.6 teidata.probability

teidata.probability defines the range of attribute values expressing a probability.
Module: tei
Used by
Content model
<content>
 <dataRef name="double"/>
</content>
    
Declaration
teidata.probability = xsd:double
Note

Probability is expressed as a real number between 0 and 1; 0 representing certainly false and 1 representing certainly true.
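The declaration itself admits any double, so the 0 to 1 range stated in the note has to be checked separately. A minimal Python sketch, assuming a hypothetical helper name `is_probability`; Python's `float()` is slightly more permissive than the lexical form of xsd:double, so this is an approximation rather than a validator.

```python
import math


def is_probability(value):
    """Return True if the string parses as a finite number between 0 and 1 inclusive."""
    try:
        number = float(value)
    except ValueError:
        return False
    # NaN and the infinities parse as floats but are not probabilities.
    return math.isfinite(number) and 0.0 <= number <= 1.0


for candidate in ("0.75", "1", "1.5", "NaN", "high"):
    print(candidate, is_probability(candidate))
```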

Appendix A.5.7 teidata.temporal.w3c

teidata.temporal.w3c defines the range of attribute values expressing a temporal expression such as a date, a time, or a combination of them, that conform to the W3C XML Schema Part 2: Datatypes Second Edition specification.
Module: tei
Used by
Element:
Content model
<content>
 <alternate minOccurs="1" maxOccurs="1">
  <dataRef name="date"/>
  <dataRef name="gYear"/>
  <dataRef name="gMonth"/>
  <dataRef name="gDay"/>
  <dataRef name="gYearMonth"/>
  <dataRef name="gMonthDay"/>
  <dataRef name="time"/>
  <dataRef name="dateTime"/>
 </alternate>
</content>
    
Declaration
teidata.temporal.w3c =
   xsd:date
 | xsd:gYear
 | xsd:gMonth
 | xsd:gDay
 | xsd:gYearMonth
 | xsd:gMonthDay
 | xsd:time
 | xsd:dateTime
Note

If it is likely that the value used is to be compared with another, then a time zone indicator should always be included, and only the dateTime representation should be used.
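The reason for this advice becomes visible when two values are actually compared. The Python sketch below uses a hypothetical helper, `parse_comparable`, which rejects values without a time zone offset; the two timestamps are invented. It relies on `datetime.fromisoformat`, which in Python versions before 3.11 accepts a numeric offset such as +01:00 but not the Z suffix.

```python
from datetime import datetime


def parse_comparable(value):
    """Parse a dateTime string, insisting on an explicit time zone offset."""
    moment = datetime.fromisoformat(value)
    if moment.tzinfo is None:
        raise ValueError("no time zone indicator: " + value)
    return moment


# 09:00 at +01:00 is 08:00 UTC, so it precedes 08:30 at +00:00.
earlier = parse_comparable("2019-02-01T09:00:00+01:00")
later = parse_comparable("2019-02-01T08:30:00+00:00")
print(earlier < later)
```

Comparing the two strings as text, or comparing the clock times without their offsets, would give the opposite order.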

Appendix A.5.8 teidata.text

teidata.text defines the range of attribute values used to express some kind of identifying string as a single sequence of Unicode characters possibly including whitespace.
Module: tei
Used by
Content model
<content>
 <dataRef name="string"/>
</content>
    
Declaration
teidata.text = string
Note

Attributes using this datatype must contain a single ‘token’ in which whitespace and other punctuation characters are permitted.

Appendix A.5.9 teidata.truthValue

teidata.truthValue defines the range of attribute values used to express a truth value.
Module: tei
Used by
Element:
Content model
<content>
 <dataRef name="boolean"/>
</content>
    
Declaration
teidata.truthValue = xsd:boolean
Note

The possible values of this datatype are 1 or true, or 0 or false.

This datatype applies only for cases where uncertainty is inappropriate; if the attribute concerned may have a value other than true or false, e.g. unknown, or inapplicable, it should have the extended version of this datatype: teidata.xTruthValue.
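Software reading these attributes needs to accept all four lexical forms. A minimal Python sketch, with the hypothetical function name `parse_truth_value`; the forms are case-sensitive, and surrounding whitespace is not handled here.

```python
def parse_truth_value(value):
    """Map the four permitted lexical forms onto a Python bool."""
    if value in ("1", "true"):
        return True
    if value in ("0", "false"):
        return False
    # Anything else, including "TRUE" or "unknown", is outside the datatype.
    raise ValueError("not a truth value: " + repr(value))


print(parse_truth_value("1"), parse_truth_value("false"))
```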

Appendix A.5.10 teidata.word

teidata.word defines the range of attribute values expressed as a single word or token.
Module: tei
Used by
Content model
<content>
 <dataRef name="token"
  restriction="[^\p{C}\p{Z}]+"/>
</content>
    
Declaration
teidata.word = token { pattern = "[^\p{C}\p{Z}]+" }
Note

Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace.
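The pattern excludes every character in the Unicode general categories C (control, format and similar) and Z (separators). Python's standard `re` module has no `\p{...}` classes, so the sketch below tests the categories directly with `unicodedata`; the function name `is_teidata_word` is hypothetical.

```python
import unicodedata


def is_teidata_word(value):
    """Return True if the value is non-empty and has no C or Z category characters."""
    if not value:
        return False
    # unicodedata.category() returns a two-letter code; the first letter is the major class.
    return all(unicodedata.category(ch)[0] not in ("C", "Z") for ch in value)


for candidate in ("chapter", "Azarya_Josiah_Kohen", "two words", "tab\there"):
    print(repr(candidate), is_teidata_word(candidate))
```

Because teidata.enumerated is declared as teidata.word, the same check applies to values of attributes such as type and function.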

Cost Action CA16204 – WG1. Date: 2019-02-01