This reference document describes the European Literary Text Collection (ELTeC), a major deliverable of COST Action 16204, Distant Reading. The ELTeC is a principled collection of literary text corpora, uniformly encoded in TEI XML, and representing the production of novels in different European languages for the period 1840 to 1920. The present document begins with a description of the principles and sampling methods used to construct the collection, followed by detailed technical documentation of the TEI XML schema used to encode the textual components and the metadata of the collection. All texts included in the ELTeC conform to the schema described by this document.
The goal of CA16204 is to create a benchmark corpus of literature from 1840-1920 adequate to the needs of many different computational distant reading methods for corpus annotation and analysis. The corpus design should support comparison of texts and individual sub-collections selected according to the metadata associated with each text. It should be possible to sample sub-collections from the ELTeC for specific tasks and research questions. In a first step, we focus on the development of clear, operationalized, transparent and motivated selection criteria for the corpus.
It is important to stress that we do not intend to define what a novel is by defining the selection criteria we will use for ELTeC. A text may be assigned to the category novel if it meets at least one of the following three groups of core criteria: a) textual: length (more than 10,000 words), prose, fiction, narrative structure; b) paratextual: the term ‘novel’ or equivalent appears in the title or subtitle of the text; and c) contextual: the text is bibliographically listed with the UDC: 82-31 Novels. Full-length stories.
We follow a non-normative, metadata-based approach to sampling, grounded in corpus design. Corpus sampling criteria are usually shaped by the research question and/or the context of the group creating the corpus. In CA16204, we have neither a single research question nor a fixed, previously known group of corpus creators. The research context of the Action is more concerned with knowledge production in a methodological sense and does not prefer a single method, model or theory. Furthermore, the membership of the Action will fluctuate and consist of researchers from different disciplines with different theoretical and cultural contexts. Thus, we need to build the corpus design on a methodical basis. This approach also allows us to select canonical texts, though not exclusively.
Representativeness is an ideal which we would like to pursue but which cannot be achieved as a whole. We will therefore aim to represent the variety of a population. In line with the MoU, the ELTeC will be designed as a monitor corpus where texts (from different languages and periods) can be added over time. We then need to decide how each criterion is to be balanced and how it interacts with the other criteria.
According to the MoU, the corpus design should be balanced with respect to language and publication date of the texts. This means that the corpus should not be based solely on chronological criteria (that is, on having a text from each year of the period in question). The main sampling criterion ‘language’ requires that translations are not included at all. We prefer the first edition of a novel, but later editions may also be considered. We also prefer the book edition of a novel, and hence do not favour novels printed only in periodicals, unless a particular literary tradition features novels only in serial format. If we consider later editions of a novel, these editions should be freely available (with free licences permitting reuse). The first edition is more interesting from a philological point of view, since it represents the authentic text of the author. Dealing with historical texts might require some cleaning up or normalization. We will merge all word forms which are separated by line breaks. At the moment, we must assume that (sufficiently good) normalization tools do not exist for every language. Later editions of a novel may already be normalized in some way. This might lead to different text representations in ELTeC, which should be indicated in the metadata.
Considering later freely available editions of a novel as well has two advantages: first, members of the Action can already provide machine-readable text documents (HTML, TEI etc.) of later editions; second, in some languages it might be easier to find later editions which already exist in a machine-readable format (so that we do not have to put effort into digitizing them).
Electronic availability should not be a leading sampling criterion, although availability is a limiting factor. A text should not be excluded from ELTeC because it is not digitized, but it should be excluded if the text cannot be made freely available in ELTeC. If we use availability as the only selection criterion, we risk duplicating projects such as ‘Gutenberg’. The issue remains of finding additional funds to digitise non-canonical books. Until then, the solution would be to create pilot corpora (that can later be supplemented or substituted by an alternative) for literatures that do not have a significant number of digitized texts.
We then need additional criteria which can be applied without having to know (read) the texts in question. The criteria should be checked without a deep knowledge about the texts. Otherwise, this will oppose the goal of the whole Action and the methodical approach of distant reading. The criteria should be operationalizable, meaning decidable from text metadata. Here, we define text metadata in a wider scope than only the classical bibliographical metadata. In this way corpus design interacts with metadata. Some of the text’s metadata can be used as sampling criteria. These criteria are text-external and -internal criteria (cf. Hunston 2008) on which we then need to rely. The selection criteria may be assisted by bibliographical overviews (wherever available) for each language in order to avoid possible canon-derived bias.
We suggest using an online table as a means of collecting nominations for inclusion in the ELTeC but other methods are feasible.
Creating a language collection involves two steps. The first is selection: identifying text candidates. The second is balancing: determining proportions within the corpus. Both steps are defined in this document.
Principles:
Organization
Criteria: Eligibility.
In order to be included (selection), a text must...
Criteria: Composition.
Among the novels in each language the subcollection must contain...
We will divide into four groups
The MoU defines the languages to be sampled. It does not propose distinguishing regional variation (e.g. in German), nor geographical variation (e.g. the French spoken in Belgium, France, or Switzerland). It assumes only European varieties, so English excludes US English; French excludes Quebecois.
We follow a language-based approach (not country-based). This means for example that we include Swiss German texts in the German language collection. We prefer standard varieties over dialect varieties if sampling criteria for text candidates are met.
We propose to use the number of times a work is reprinted as an objective measure of its reception during the period 1970-2009, using categories like the following:
We use the following three categories for actual (not claimed) author gender
We include a variety of lengths
The MoU for the project notes that ‘Distant Reading methods cover a wide range of computational methods for literary text analysis, such as authorship attribution, topic modelling, character network analysis, or stylistic analysis.’ The focus of the ELTeC encoding scheme is therefore not to represent texts in all their original complexity of structure or appearance, but rather to facilitate a richer and better-informed distant reading than a transcription of lexical content alone would permit. In designing this encoding scheme, we have applied the following principles:
The goal is not to duplicate the work of scholarly editors or to produce (yet another) digital edition of a specific source document. Rather it is to ensure that the ELTeC texts can be processed satisfactorily, even by simple minded (but XML-aware) systems primarily concerned with lexis, and to make life easier for the developers of such systems.
In selecting features for inclusion in the markup scheme, we have been guided, but not limited, by existing practice as far as possible. Our main goal has been to identify a small core set of textual features which can be readily (preferably automatically) identified in existing digital transcriptions, or easily and consistently provided by new transcriptions.
We distinguish three ‘levels’ of encoding, referred to below as level zero, level one and level two. All ELTeC texts are made available at level zero, the basic encoding format. Some texts may additionally be made available at levels one or two, which provide a richer set of encoded features. For example: a level one text will include semantic information missing from a level zero text; a level two text will include tokenization information missing from a level one text. As far as possible conversion between levels will be automatically scripted, but this is not possible in the general case.
This document lists all the textual features which are to be distinguished in an ELTeC conformant transcription at one of these three levels. Whenever a given feature exists in a text, it will be marked up as indicated here. No other features will be captured by the markup: if some textual feature not provided for here is identified by a marked up source text, that markup will be removed (though it may be retained in a version of the text encoded at a different level).
All ELTeC documents are TEI conformant, and therefore include a TEI Header, as discussed below in section 3.1, Metadata in the TEI Header.
The basic unit of the ELTeC collection is a single novel, represented by a single <TEI> element, consisting of a <teiHeader> element containing metadata specific to that novel and a <text> element containing a normalised transcription of the text itself. We propose no mechanism (other than metadata) to encode units larger than a single novel, such as multipart novel series like Proust's A la recherche du temps perdu or Zola's Les Rougon-Macquart. Each text should be transcribed in full from a specific identifiable edition, typically the first, and the source documented in the TEI Header. The original spelling and punctuation of the source should be retained, but details of typography are not required: hence words hyphenated across line or page breaks should be silently reassembled.
To facilitate checking of a transcription against its source during production, the <pb> element may be used to mark the point in a transcription where a new page begins in the source. An identifier for each <pb> element may be provided in a level 1 text to facilitate linkage to a page image of the corresponding source page. This element is not required in a level 0 text.
Running titles, page footers, catchwords and other forms of printed paratext are all omitted from an ELTeC transcription, with the exception of a page number, which may be supplied as value of the n attribute. Note that this attribute supplies the page number as specified by the source. If no page number is given, the value should be enclosed in brackets.
If a page begins with the second part of a hyphenated word, the <pb> tag may appear after that word in order to simplify lexical processing. Otherwise its position should be the same in transcription and source.
As well as a titlepage or a table of contents, a published novel often includes material such as forewords or appendixes additional to the text of the novel itself. This liminal matter is included in an ELTeC text only if it is believed to be authorial. Material before the body of the text begins is collected within a <front> element, and material following the body in a <back> element. In either case, distinct sections of the material, if encoded, are represented by a <div> with its type attribute set to liminal.
At level zero, titlepages and tables of contents are omitted. At level one, they are replaced by a <gap> element. Non-authorial liminal material is silently omitted at all levels.
Within the body of a text, major structural divisions (parts, sections, chapters etc.) will be captured using the generic <div> element, with attributes type, xml:lang, xml:id and n used as further detailed below.
The names used for hierarchic structural divisions of a novel above the chapter are arbitrary, culture-specific, and often inconsistent: in some novels things called ‘part’ contain things called ‘book’ and in others the reverse. We propose to follow TEI in using a single element (<div>) for every hierarchical structural division, down to the level of ‘chapter’.
The type attribute is used to indicate the function of a structural division. It should have one of the following values:
A short novel may have no subdivision at all, in which case the <div> element should not be used. No further subdivisions within a <div type='chapter'> are permitted. If the text of a chapter is subdivided in some way, for example by means of a number, a row of stars, a horizontal rule, or similar device, this should be indicated in the markup by means of a <milestone> element. If a chapter contains an embedded text of some kind, for example a quoted letter or other narrative, this should be marked using the <quote> element.
The (human) language in which a text is expressed is indicated explicitly by the xml:lang attribute, which supplies the ISO 639-2 code for the language concerned. This attribute will always be supplied on the <text> element to specify a default, and may also appear on other elements to indicate passages where the language changes. The various different languages used in a given text will be itemized in its metadata (see <langUsage> element in the header).
A single reference scheme will be defined for the whole corpus, with the following components:
FRA00042
FR042012 is the twelfth chapter of the 42nd French novel. The identifier will be supplied as the value of an xml:id attribute on each <text>, <div>, <front>, <back>, or <s> element as appropriate. Adding this identifier is an easily automated task built into the workflow for accession to the ELTeC.
Note that these identifiers will not necessarily correspond with the numbering used in a particular source text. In a work where the first twelve chapters are considered to form part one, and the next twelve constitute part two, the first chapter of the second part will have an identifier ending 013, even though it may be numbered 1 in a source text.
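The continuous numbering described above can be sketched as follows. This is a minimal illustration rather than the project's accession workflow: the function name `chapter_ids`, the text prefix `FR042`, and the three-digit zero-padded suffix are assumptions inferred from the FR042012 example.

```python
def chapter_ids(text_prefix, chapters_per_part):
    """Return one identifier per chapter, numbered continuously across parts.

    text_prefix and the three-digit suffix are assumptions based on the
    FR042012 example; they are not taken from the ELTeC workflow itself.
    """
    ids = []
    counter = 0
    for part_number, chapter_count in enumerate(chapters_per_part, start=1):
        for source_number in range(1, chapter_count + 1):
            counter += 1
            # Keep the source numbering alongside the generated identifier.
            ids.append((f"{text_prefix}{counter:03d}", part_number, source_number))
    return ids


# Two parts of twelve chapters each, as in the case described above.
ids = chapter_ids("FR042", [12, 12])
print(ids[12])  # first chapter of part two
```

The tuple keeps the part number and the source's own chapter number next to the generated identifier, so the original numbering could still be recorded elsewhere (for example in an n attribute).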
We do not preserve the lineation of running prose in our source texts, since this is always purely an artefact of the source edition. For the same reason we reassemble words broken across a line break, silently removing any hyphen present. (This will make it impossible to use our texts for hyphenation studies. So be it.)
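A minimal sketch of this reassembly step is given below, assuming plain-text input with one source line per text line. The function name `unwrap_lines` and the regular expressions are illustrative assumptions, not part of any ELTeC tool. As the paragraph above states, every line-end hyphen is removed, so a compound that happened to break at its own hyphen loses that hyphen too.

```python
import re

# A word fragment, a hyphen, a line break, optional indentation, then the rest.
BROKEN_WORD = re.compile(r"(\w+)-[ \t]*\n[ \t]*(\w+)")


def unwrap_lines(text):
    """Rejoin words hyphenated across line breaks, then drop the lineation."""
    joined = BROKEN_WORD.sub(r"\1\2", text)
    # Remaining line breaks inside the passage become single spaces.
    return re.sub(r"\s*\n\s*", " ", joined).strip()


sample = "Il était une fois un voya-\ngeur qui traversait\nla forêt."
print(unwrap_lines(sample))
```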
The title of a chapter, or of any other subdivision, as given in the source should be encoded using the TEI <head> element. There may be more than one such element at the start of a <div> element. Novels occasionally include other initial matter, such as a quotation, or a summary of the content of the chapter; these are not specially treated in ELTeC texts.
The chapters of a novel mostly consist of prose, arranged in paragraphs. It is not unusual to find other structures however, specifically verse, or passages of dialogue presented as if in a play, with speaker labels and even stage directions. Less frequently, novels may contain material presented in list or tabular formats. Graphics with their own associated heading or other text are also frequent.
Novels are also full of direct speech, represented using various different conventions, but almost always distinguished from the narrative voice. The first person narrative is also common, but may be regarded as a special case. How exactly different narrative strands are articulated in a novel, and the extent to which they may be characterised by their lexis, has been a preoccupation of many ‘distant reading’ style analyses. Although it might be helpful to distinguish material purporting to be direct speech from material purporting to be narrative in our basic encoding, doing so consistently and accurately would be problematic. ELTeC texts therefore do no more than preserve existing punctuation. The <p> element is used for everything which is typeset as a separate block on the page, including both paragraphs and list items; the <l> element is used for verse lines or similar, typically set off from the rest of the text. Illustrations and any associated text such as a title or heading are excluded. Passages set as if in drama are not specially treated.
Printed texts typically deploy a number of conventions which can cause problems for linguistic analyses of even the most basic kind. Changes of font or style (italicization or use of superscript, for example) usually signal something, which an analysis should take into account. However, determining the function of such typographic variation is not always straightforward. ELTeC texts (at level 0) therefore simply indicate the presence of typographic salience, using the <hi> element.
ELTeC encoding at level zero aims above all for consistency and transparency in what is reliably achievable, leaving most problematic issues to be addressed by linguistic annotation.
If however a text has been derived from a digital version in which a more ambitious range of textual features has already been captured, whether by means of TEI-style markup or styling information such as that provided by Word, or if there are sufficient resources available to provide a slightly richer encoded version, a novel may be encoded at ELTeC level 1, using additional elements discussed in this section. Note that a level one text can always automatically be converted to a level zero text, if this is necessary for compatibility or for some other reason. (A script to do this is available on the project website.) The reverse conversion, from level zero to level one, requires human intervention.
At level 1, the element <gap> should be used to indicate when something has been omitted from the encoding. For example, a suppressed graphic, or foreword, which at level zero are silently omitted, should be represented at level 1 by an explicit <gap> element, with attributes indicating what has been omitted from the encoding.
At level 1, the following additional elements can be used to mark up the significance of some stretch of text which would otherwise simply be marked as typographically salient using <hi>:
The <hi> element is also sometimes used for indications of superscript characters (such as French ‘14ᵉ’); these indications should simply be removed.
When a sequence of verse lines, or a passage from some other narrative level such as a letter is quoted within a text, the <l> or <p> elements representing it should be wrapped in the <quote> element made available by the ELTeC level one schema.
It is not unusual to find special devices such as a row of stars or a rule in the middle of a chapter, usually indicating a discontinuity in the narrative timeline, or structural shift. Such indications may be simply ignored in an ELTeC level zero text; at level 1, the special purpose <milestone> element may be used to mark their presence.
Level 1 texts may also represent authorial notes, if these are present, using the element <note> to contain the body of the note, and the element <ref> to represent the point of attachment for the note. Wherever they appear in the text, notes are always separated from it and encoded in a separate <div> element within the <back> element. See examples.
Occasionally, the text being transcribed contains self-evident errors. Where these are caused by the encoding process (e.g. an OCR error), they are always silently corrected in the transcription. Where, however, the original itself is faulty, and the transcriber (or a textual editor) has corrected it, the correction should be signalled by using the <corr> element available at level one.
At ELTeC level 2, all existing elements are retained and two new elements <s> and <w> are introduced to support segmentation of running text into sentence-like and word-like sequences respectively. Individual tokens are marked using the <w> element, and decorated with one or more of the TEI-defined linguistic attributes pos, lemma, and join. Both words and punctuation marks are considered to be ‘tokens’ in this sense, although the TEI suggests distinguishing the two cases using <w> and <pc> respectively. The <s> (s-unit) element is used to provide an end-to-end tessellating segmentation of the whole sequence of <w> elements, based on orthographic form. This provides a convenient extension of the existing text-body-div hierarchy within which tokens are located. The elements <p>, <head>, and <l> (which contain just text at levels 0 and 1) can, at level 2, contain a sequence of <s> elements. Empty elements <gap>, <milestone>, <pb> or <ref> are also permitted within text content at any point, but these are disregarded when segmentation is carried out. Each <s> element can contain a sequence of <w> elements, either directly, or wrapped in one of the sub-paragraph elements <corr>, <emph>, <foreign>, <hi>, <label>, or <title>. To this list we add the element <rs> (referring string), provided by the TEI for the encoding of any form of entity name, such as a Named Entity Recognition procedure might produce.
This approach implies that <w> elements may appear at two levels in the hierarchy which may upset some software; it also implies that <w> elements must be properly contained within one of these elements, without overlap.
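The nesting of <w> within <s> within <p> can be illustrated with a short sketch. Everything below is an assumption made for illustration: the naive regular-expression sentence splitter and tokenizer stand in for a real linguistic annotation tool, the function name `segment_paragraph` is hypothetical, and the TEI namespace and the pos, lemma and join attributes are omitted for brevity.

```python
import re
import xml.etree.ElementTree as ET

# Naive tokenizer: runs of word characters, or single punctuation marks.
TOKEN = re.compile(r"\w+|[^\w\s]")
# Naive sentence splitter: break after ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def segment_paragraph(text):
    """Build a <p> holding <s> elements, each holding <w> tokens."""
    p = ET.Element("p")
    for sentence in SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        s = ET.SubElement(p, "s")
        for token in TOKEN.findall(sentence):
            # Words and punctuation marks are both encoded as <w>.
            w = ET.SubElement(s, "w")
            w.text = token
    return p


p = segment_paragraph("She smiled. He did not!")
print(ET.tostring(p, encoding="unicode"))
```

Because the sketch puts every token directly inside an <s>, it does not show the second case mentioned above, where <w> elements are wrapped in a sub-paragraph element such as <hi> or <rs>.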
This TEI XML format is equally applicable to the production of training data for applications using machine learning techniques and to the outputs of such systems. However, since such machine learning applications typically operate on text content in a tabular format only, XSLT filters which transform (or generate) the XML markup discussed here from such tabular formats without loss of information are envisaged. At the time of writing, however, Working Group 2 has yet to put this proposed architecture to the test.
The following summary table lists the textual features which every ELTeC text must capture, together with an indication of how that feature should be represented.
Textual Feature | Encoding | Note |
---|---|---|
Page break | <pb/> | n attribute gives attested number of page; optional at level 0 |
Title page | <div type="titlepage"> within <front> | Optional at level 0 |
Authorial preface, foreword, appendix, etc | <div type="liminal"> within <front> or <back> as appropriate | Non-authorial matter is silently omitted |
volume, chapter etc. | <div> nested as necessary within <body> | type may be chapter, or group (for anything else); n may indicate original numbering |
Heading or title | <head> at start of <div>; <trailer> at end | |
Running title/page footer | Omitted | Page number only may be included in <pb> |
Prose paragraph or list item | <p> | Discard any formatting information |
Verse line | <l> | Use only for verse lines in display blocks |
Other textual features are treated differently at different encoding levels. They are listed in the following table:
Textual Feature | Level 0 Encoding | Level 1 Encoding | Note |
---|---|---|---|
Table of contents, errata list, other liminal matter | omitted | <gap> | use unit and quantity to specify what has been omitted |
Mid-chapter structural marker | omitted | <milestone/> | use unit and type to supply further detail |
Authorial footnote | omitted | transcribe text of note within a <note> within <div type="notes"> inside <back>; mark point of attachment with a <ref> | use target on <ref> to point to <note> |
Font change | Mark with <hi> (no attributes) | Replace with <foreign>, <title>, <label>, <emph> as appropriate | |
Graphic or illustration | omitted | <gap unit="graphic"> | |
Quotation or display block | <p> (or series of <l>) | <quote> containing one or more <p> or <l> | ? |
Editorial correction | unmarked | <corr> | Use when encoded text differs from printed original |
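As a partial illustration of how a level 1 text might be reduced to level 0, the sketch below covers only the "Font change" row of the table above: it renames <foreign>, <title>, <label> and <emph> back to an attribute-free <hi>. It is not the project's conversion script. The function name `flatten_salience` is hypothetical, namespaces are omitted, and the other rows (<gap>, <milestone>, notes, <quote>, <corr>) are not handled.

```python
import xml.etree.ElementTree as ET

# Level 1 elements that replace <hi>, per the "Font change" row of the table.
SALIENCE_ELEMENTS = {"foreign", "title", "label", "emph"}


def flatten_salience(root):
    """Rename level 1 salience elements back to attribute-free <hi>, in place."""
    for element in root.iter():
        if element.tag in SALIENCE_ELEMENTS:
            element.tag = "hi"
            element.attrib.clear()
    return root


p = ET.fromstring(
    "<p>He read <title>Le Figaro</title> with <emph>great</emph> care.</p>"
)
print(ET.tostring(flatten_salience(p), encoding="unicode"))
```

A function like this would need to be applied to the <text> part of a document only, since <title> also occurs in the TEI Header with a different role.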
This section describes the metadata associated with each text (title, authorship, date etc.) and with the collection as a whole. The intention is to provide this in a standardised way to facilitate subsetting of the collection, using (for example) coded values for the descriptive selection criteria associated with the text. As far as possible, our text should represent the first complete printed edition of each novel selected.
The TEI Header provides a very large number of possibilities for encoding such metadata. We will provide a checklist of the TEI Header elements which are always to be provided for each text, possibly in the form of a template. As in the body of the text, the intention is to provide a guaranteed minimal level of information, consistent across all parts of the ELTeC.
Note that metadata may be supplied at (at least) two levels: the level of the ELTeC as a whole, and that of individual texts within it. Information which applies uniformly to all parts of the collection should be supplied in the ELTeC header; information specific to a particular document in the text header.
Every ELTeC text includes a TEI Header supplying metadata to describe it. There is also a TEI Header for the whole collection, which has additional information common to all the texts. This section lists the header components which should be supplied for every text, indicating briefly specific usage rules.
Each of the TEI elements shown above must be provided and used as described below. The ELTeC schemas will reject as invalid a document in which these conventions are not followed.
The natural language used for text in the Header should be that of the language collection to which the text belongs, e.g. French if the text is in French. The attribute @xml:lang may be supplied on any element to indicate its content is in some other language where necessary.
This must supply :
This must supply :
Here is a simple example, for a French text :
The page count may be derived from an external bibliographic source, and may not therefore correspond with the actual number of <pb> elements in the transcription. If no page count is available, no <measure unit="pages"> should be supplied.
This must contain at least one <bibl> element containing a bibliographic description of the source text from which the ELTeC version has been derived. This description might include any or all of the following standard TEI bibliographic elements:
This encoding shows that the source of the ELTeC text is the digital facsimile provided under the title and ARK identifier indicated, and that the first edition of this work was published in Paris in 1886.
This must contain a <langUsage> element detailing the language or languages used in the text, followed optionally by a <textClass> element providing descriptive keywords, and by a mandatory <textDesc> element providing the sampling criteria applicable to this text.
The <textDesc> element contains one of each of the following elements from the model.textDescPart class in the order indicated:
These elements are used to represent the sampling criteria applicable to the current text. They are specific to the ELTeC project, and are therefore taken from the ELTeC namespace (http://distantreading.net/eltec/ns) rather than the TEI namespace.
The optional <textClass> element may contain one or more <keywords> elements. Each <keywords> element may contain one or more <term> elements, each describing some aspect of the text. At present the descriptive keywords may be freely chosen. All the terms in a given <keywords> list should use the same language, which should be that of the text itself, unless otherwise specified by means of an @xml:lang attribute.
The <langUsage> element should contain one or more <language> elements, one for each of the human languages used in the text. The @ident attribute of this element identifies the language using the ISO 639-2 code in the same way as the @xml:lang attribute. The @usage attribute may be used to indicate approximately what percentage of the text uses this language, or otherwise qualify it by means of a brief description.
This contains at least one <change> element, documenting significant revisions or versions of the associated text. Each <change> element has a @when attribute which gives the date of the change in W3C format (YYYY-MM-DD), and the <change> elements are given in reverse chronological order (most recent first). The content of the element should be a brief sentence indicating what was done and who was responsible for doing it, using the language of the text.
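These constraints lend themselves to a simple automated check. The sketch below is an assumption-laden illustration, not part of the ELTeC schemas: the function name `check_changes` is hypothetical, the TEI namespace is omitted, and only full YYYY-MM-DD dates are expected in @when.

```python
from datetime import date
import xml.etree.ElementTree as ET


def check_changes(revision_desc):
    """Return a list of problems found in the <change> children."""
    problems = []
    changes = revision_desc.findall("change")
    if not changes:
        problems.append("no <change> element")
    dates = []
    for change in changes:
        when = change.get("when")
        try:
            dates.append(date.fromisoformat(when))
        except (TypeError, ValueError):
            problems.append(f"bad or missing @when: {when!r}")
    # Most recent first means the dates must never increase down the list.
    if any(first < second for first, second in zip(dates, dates[1:])):
        problems.append("changes are not ordered most recent first")
    return problems


sample = ET.fromstring(
    "<revisionDesc>"
    '<change when="2020-03-15">Converted to level 1</change>'
    '<change when="2019-11-02">Initial version</change>'
    "</revisionDesc>"
)
print(check_changes(sample))
```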
The ELTeC encoding scheme defined by this document is a TEI-conformant customization, from which user documentation and formal RELAX NG or DTD specifications are generated automatically.
<TEI> (TEI document) contains a single TEI-conformant document, combining a single TEI header with one or more members of the model.resource class. Multiple <TEI> elements may be combined within a <TEI> (or <teiCorpus>) element. [4. Default Text Structure 15.1. Varieties of Composite Text] | |
Module | textstructure |
Attributes | Attributes att.typed (@type) att.global (xml:id, xml:lang, @n, @xml:base, @xml:space) att.global.rendition (@rend) |
Contained by | textstructure: TEI |
May contain | |
Note | In ELTeC schemas, the attributes xml:lang and xml:id must be supplied for each TEI element. Identifiers should have a common alphabetic prefix followed by up to 5 digits. Language codes should conform to ISO 639-2. |
Example | <TEI xml:id="SPA2001" xml:lang="SPA" xmlns="http://www.tei-c.org/ns/1.0">
<!-- -->
</TEI> This text in the Spanish language has the identifier SPA2001 |
Schematron |
<sch:ns prefix="tei"
uri="http://www.tei-c.org/ns/1.0"/>
<sch:ns prefix="xs"
uri="http://www.w3.org/2001/XMLSchema"/> |
Schematron |
<sch:ns prefix="rng"
uri="http://relaxng.org/ns/structure/1.0"/> |
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <elementRef key="teiHeader"/> <alternate minOccurs="1" maxOccurs="1"> <sequence minOccurs="1" maxOccurs="1"> <classRef key="model.resource" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="TEI" minOccurs="0" maxOccurs="unbounded"/> </sequence> <elementRef key="TEI" minOccurs="1" maxOccurs="unbounded"/> </alternate> </sequence> </content> |
Schema Declaration | element TEI { att.global.attribute.n, att.global.attribute.xmlbase, att.global.attribute.xmlspace, att.global.rendition.attribute.rend, att.typed.attributes, attribute xml:id { text }, attribute xml:lang { text }, ( teiHeader, ( ( model.resource+, TEI* ) | TEI+ ) ) } |
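The Note in the <TEI> specification above can be turned into a rough shape check on the root element. This is a sketch under stated assumptions: the two regular expressions are one possible reading of "a common alphabetic prefix followed by up to 5 digits" and of a three-letter ISO 639-2 code, the function name `check_tei_root` is hypothetical, and the check does not consult the actual ISO 639-2 code list.

```python
import re
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
# Assumed reading of the Note: letters, then one to five digits.
ID_SHAPE = re.compile(r"[A-Za-z]+[0-9]{1,5}")
# Shape check only: three letters, as in the "SPA" example.
LANG_SHAPE = re.compile(r"[A-Za-z]{3}")


def check_tei_root(root):
    """Return a list of problems with xml:id and xml:lang on a <TEI> element."""
    problems = []
    text_id = root.get(f"{{{XML_NS}}}id")
    lang = root.get(f"{{{XML_NS}}}lang")
    if text_id is None or not ID_SHAPE.fullmatch(text_id):
        problems.append(f"unexpected xml:id: {text_id!r}")
    if lang is None or not LANG_SHAPE.fullmatch(lang):
        problems.append(f"unexpected xml:lang: {lang!r}")
    return problems


root = ET.fromstring(
    '<TEI xml:id="SPA2001" xml:lang="SPA" xmlns="http://www.tei-c.org/ns/1.0"/>'
)
print(check_tei_root(root))
```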
<author> (author) in a bibliographic reference, contains the name(s) of an author, personal or corporate, of a work; for example in the same form as that provided by a recognized bibliographic name authority. [3.12.2.2. Titles, Authors, and Editors 2.2.1. The Title Statement] | |
Module | core |
Attributes | Attributes att.canonical (@ref) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Member of | |
Contained by | |
May contain | Character data only |
Note | The ref attribute should be used to reference one or more externally defined identifiers for the author, as defined by an authority file such as VIAF. |
Example | When used within a <titleStmt>, an author's name is given in a standardized format (surname, forename/s, (YYYY-YYYY)) as shown in this example. <author ref="viaf:31996364">Forster, Edward Morgan (1879-1970)</author> |
Example | When used within the <sourceDesc>, an author's name is given in the format used by the source in question, as shown in this example. <author>E.M. Forster</author> |
Example | In cases of multiple authorship, the <author> element within <titleStmt> should be repeated: <titleStmt>
<title>The Diary of a Nobody : ELTeC edition </title>
<author>Grossmith, George (1847-1912)</author>
<author>Grossmith, Walter Weedon (1854-1919)</author>
</titleStmt> |
Content model | <content> <textNode/> </content> |
Schema Declaration | element author { att.canonical.attributes, att.global.attributes, text } |
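The standardized form used for <author> within <titleStmt> (surname, forename/s, (YYYY-YYYY)) lends itself to simple parsing. The sketch below is illustrative only: the regular expression and the function name `parse_title_stmt_author` are assumptions, and names that depart from the pattern (for example, an author with an unknown year of death) simply yield `None`.

```python
import re

# Assumed shape: "Surname, Forename(s) (YYYY-YYYY)".
AUTHOR_PATTERN = re.compile(
    r"(?P<surname>[^,]+),\s*(?P<forenames>.+?)\s*"
    r"\((?P<birth>[0-9]{4})-(?P<death>[0-9]{4})\)"
)


def parse_title_stmt_author(text: str):
    """Split a standardized author string into its parts, or return None."""
    match = AUTHOR_PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    return {
        "surname": match["surname"].strip(),
        "forenames": match["forenames"],
        "birth": int(match["birth"]),
        "death": int(match["death"]),
    }
```

The form used within <sourceDesc>, such as "E.M. Forster", follows the source rather than this pattern, so it is not expected to parse.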
<authorGender> specifies the sex of the author where this is known | |||||||||||
Namespace | http://distantreading.net/eltec/ns | ||||||||||
Module | derived-module-ELTeC | ||||||||||
Attributes | Attributes
| ||||||||||
Contained by | corpus: textDesc | ||||||||||
May contain | Empty element | ||||||||||
Example | indicates that the author of the novel to be described is male (M) <profileDesc
xmlns:e="http://distantreading.net/eltec/ns">
<textDesc>
<authorGender xmlns="http://distantreading.net/eltec/ns" key="M"/>
<!-- ... -->
</textDesc>
</profileDesc> | ||||||||||
Example | indicates that the gender of the author of the novel to be described cannot be specified (U) <profileDesc
xmlns:e="http://distantreading.net/eltec/ns">
<textDesc>
<authorGender xmlns="http://distantreading.net/eltec/ns" key="U"/>
<!-- ... -->
</textDesc>
</profileDesc> | ||||||||||
Content model | <content> <empty/> </content> | ||||||||||
Schema Declaration | element authorGender { attribute key { "M" | "F" | "U" | "X" }, empty } |
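Because <authorGender> lives in the ELTeC namespace rather than the TEI namespace, code that reads it must use the namespace URI given above. The following sketch, which assumes the element appears somewhere below the element passed in, reads the key attribute and checks it against the values in the schema declaration; the function name `author_gender_key` is hypothetical.

```python
import xml.etree.ElementTree as ET

ELTEC_NS = "http://distantreading.net/eltec/ns"

# Legal values copied from the schema declaration for authorGender.
ALLOWED_KEYS = {"M", "F", "U", "X"}


def author_gender_key(text_desc_xml: str):
    """Return the key of the first authorGender descendant, or None.

    None is returned both when the element is absent and when its key
    is not one of the declared values.
    """
    root = ET.fromstring(text_desc_xml)
    element = root.find(f".//{{{ELTEC_NS}}}authorGender")
    if element is None:
        return None
    key = element.get("key")
    return key if key in ALLOWED_KEYS else None
```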
<availability> (availability) supplies information about the availability of a text, for example any restrictions on its use or distribution, its copyright status, any licence applying to it, etc. [2.2.4. Publication, Distribution, Licensing, etc.] | |
Module | header |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Contained by | header: publicationStmt |
May contain | header: licence |
Note | All ELTeC texts comprise a text which is in the public domain and markup which is licensed under the Creative Commons Attribution licence indicated (CC-BY 4.0). |
Example | <availability>
<licence target="https://creativecommons.org/licenses/by/4.0/">
<p>The TEI mark up is licenced with Creative Commons Attribution (CC-BY 4.0).</p>
</licence>
</availability> |
Content model | <content> <elementRef key="licence"/> </content> |
Schema Declaration | element availability { att.global.attributes, licence } |
<back> (back matter) contains any appendixes, etc. following the main part of a text. [4.7. Back Matter 4. Default Text Structure] | |
Module | textstructure |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Contained by | textstructure: text |
May contain | |
Note | Because cultural conventions differ as to which elements are grouped as back matter and which as front matter, the content models for the <back> and <front> elements are identical. |
Example | <back>
<div type="liminal">
<head>Appendix</head>
<p>
<!-- additional text here -->
</p>
</div>
<div type="notes">
<head>Authorial Notes</head>
<note xml:id="ENG18700_N23">
<!-- text of footnote here -->
</note>
</div>
</back> |
Schematron |
<sch:assert test="child::tei:div[@type='notes'] or child::tei:div[@type='liminal']"
role="ERROR">The back matter of a text must contain either liminal or notes
divisions</sch:assert> |
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.frontPart"/> <classRef key="model.pLike.front"/> <classRef key="model.pLike"/> <classRef key="model.listLike"/> <classRef key="model.global"/> </alternate> <alternate minOccurs="0" maxOccurs="1"> <sequence minOccurs="1" maxOccurs="1"> <classRef key="model.div1Like"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.frontPart"/> <classRef key="model.div1Like"/> <classRef key="model.global"/> </alternate> </sequence> <sequence minOccurs="1" maxOccurs="1"> <classRef key="model.divLike"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.frontPart"/> <classRef key="model.divLike"/> <classRef key="model.global"/> </alternate> </sequence> </alternate> <sequence minOccurs="0" maxOccurs="1"> <classRef key="model.divBottomPart"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.divBottomPart"/> <classRef key="model.global"/> </alternate> </sequence> </sequence> </content> |
Schema Declaration | element back { att.global.attributes, ( ( model.frontPart | model.pLike.front | model.pLike | model.listLike | model.global )*, ( ( model.div1Like, ( model.frontPart | model.div1Like | model.global )* ) | ( model.divLike, ( model.frontPart | model.divLike | model.global )* ) )?, ( model.divBottomPart, ( model.divBottomPart | model.global )* )? ) } |
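The Schematron assertion for <back> can be approximated outside a Schematron processor. This is a minimal sketch using Python's standard library, assuming the <back> element has already been parsed; the function name `back_has_required_div` is hypothetical, and the test mirrors the assertion rather than replacing schema validation.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def back_has_required_div(back: ET.Element) -> bool:
    """Mirror the rule: a child div of type 'notes' or 'liminal' must exist."""
    return (
        back.find("tei:div[@type='notes']", TEI_NS) is not None
        or back.find("tei:div[@type='liminal']", TEI_NS) is not None
    )
```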
<bibl> (bibliographic citation) contains a loosely-structured bibliographic citation of which the sub-components may or may not be explicitly tagged. [3.12.1. Methods of Encoding Bibliographic References and Lists of References 2.2.7. The Source Description 15.3.2. Declarable Elements] | |||||||||||
Module | core | ||||||||||
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.sortable (@sortKey)
| ||||||||||
Member of | |||||||||||
Contained by | core: bibl header: sourceDesc | ||||||||||
May contain | |||||||||||
Note | Contains phrase-level elements, together with any combination of elements from the model.biblPart class | ||||||||||
Example | shows a full source description with a digital source which represents the first edition <sourceDesc>
<bibl type="digitalSource">
<title>Wuthering Heights (1st edition) : wikisource edition</title>
<ref target="https://en.wikisource.org/wiki/Wuthering_Heights_(1st_edition)"/>
</bibl>
<bibl type="firstEdition">
<title>Wuthering Heights</title>
<title>A novel by</title>
<author>Ellis Bell</author>
<publisher>London: T. C. Newby</publisher>
<date>1847</date>
</bibl>
</sourceDesc> | ||||||||||
Example | <sourceDesc>
<bibl type="printSource">
<title>Opera omnia</title>
<title>Romanzi</title>
<author>Svevo, Italo</author>
<respStmt>
<resp>editor</resp>
<name>Maier, Bruno</name>
</respStmt>
<publisher>dall'Oglio</publisher>
<pubPlace>Milano</pubPlace>
<date>1969</date>
<note>Contiene: Una vita ; Senilita ; La coscienza di Zeno</note>
</bibl>
<bibl type="firstEdition">
<date>1892</date>
</bibl>
</sourceDesc> The ELTeC text derives from a print edition published in 1969. The first edition of the work concerned was published in 1892. We do not know whether or not the print edition used the first edition as a source. | ||||||||||
Content model | <content> <alternate minOccurs="0" maxOccurs="unbounded"> <textNode/> <classRef key="model.gLike"/> <classRef key="model.highlighted"/> <classRef key="model.pPart.data"/> <classRef key="model.pPart.edit"/> <classRef key="model.segLike"/> <classRef key="model.ptrLike"/> <classRef key="model.biblPart"/> <classRef key="model.global"/> </alternate> </content> | ||||||||||
Schema Declaration | element bibl { att.global.attributes, att.sortable.attributes, attribute type { "firstEdition" | "printSource" | "digitalSource" | "unspecified" }, ( text | model.gLike | model.highlighted | model.pPart.data | model.pPart.edit | model.segLike | model.ptrLike | model.biblPart | model.global )* } |
<body> (text body) contains the whole body of a single unitary text, excluding any front or back matter. [4. Default Text Structure] | |
Module | textstructure |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Contained by | textstructure: text |
May contain | |
Example | <body>
<div type="chapter">
<head>I.</head>
<p>Utanfor, i Vest, bryt Have paa mot ei sju Milir lang laag Sandstrand.</p>
<p>Det er sjølve Have. Nordhave breidt og fritt, ukløyvt og utøymt, endelaust....</p>
<!-- ... -->
</div>
<!-- more chapters here -->
<trailer>Slutten</trailer>
</body> |
Schematron |
<sch:assert test="descendant::tei:div[@type='chapter' or @type='letter']"
role="ERROR">The
body of a text must contain at least one chapter or letter</sch:assert> |
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> <sequence minOccurs="0" maxOccurs="1"> <classRef key="model.divTop"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.global"/> <classRef key="model.divTop"/> </alternate> </sequence> <sequence minOccurs="0" maxOccurs="1"> <classRef key="model.divGenLike"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.global"/> <classRef key="model.divGenLike"/> </alternate> </sequence> <alternate minOccurs="1" maxOccurs="1"> <sequence minOccurs="1" maxOccurs="unbounded"> <classRef key="model.divLike"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.global"/> <classRef key="model.divGenLike"/> </alternate> </sequence> <sequence minOccurs="1" maxOccurs="unbounded"> <classRef key="model.div1Like"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.global"/> <classRef key="model.divGenLike"/> </alternate> </sequence> <sequence minOccurs="1" maxOccurs="1"> <sequence minOccurs="1" maxOccurs="unbounded"> <classRef key="model.common"/> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> </sequence> <alternate minOccurs="0" maxOccurs="1"> <sequence minOccurs="1" maxOccurs="unbounded"> <classRef key="model.divLike"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.global"/> <classRef key="model.divGenLike"/> </alternate> </sequence> <sequence minOccurs="1" maxOccurs="unbounded"> <classRef key="model.div1Like"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.global"/> <classRef key="model.divGenLike"/> </alternate> </sequence> </alternate> </sequence> </alternate> <sequence minOccurs="0" maxOccurs="unbounded"> <classRef key="model.divBottom"/> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> </sequence> </sequence> </content> |
Schema Declaration | element body { att.global.attributes, ( model.global*, ( model.divTop, ( model.global | model.divTop )* )?, ( model.divGenLike, ( model.global | model.divGenLike )* )?, ( ( model.divLike, ( model.global | model.divGenLike )* )+ | ( model.div1Like, ( model.global | model.divGenLike )* )+ | ( ( model.common, model.global* )+, ( ( model.divLike, ( model.global | model.divGenLike )* )+ | ( model.div1Like, ( model.global | model.divGenLike )* )+ )? ) ), ( model.divBottom, model.global* )* ) } |
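The Schematron assertion for <body> tests descendants rather than children, so a chapter nested inside a grouping division also satisfies it. A sketch of the same test in Python, with the hypothetical function name `body_has_chapter_or_letter`:

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def body_has_chapter_or_letter(body: ET.Element) -> bool:
    """Return True if any descendant div has type 'chapter' or 'letter'."""
    return any(
        div.get("type") in {"chapter", "letter"}
        for div in body.iterfind(".//tei:div", TEI_NS)
    )
```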
<canonicity> indicates the degree to which the text has become part of a literary canon | |||||||||||
Namespace | http://distantreading.net/eltec/ns | ||||||||||
Module | derived-module-ELTeC | ||||||||||
Attributes | Attributes
| ||||||||||
Contained by | corpus: textDesc | ||||||||||
May contain | Empty element | ||||||||||
Example | <textDesc
xmlns:e="http://distantreading.net/eltec/ns">
<!-- ... -->
<canonicity xmlns="http://distantreading.net/eltec/ns" key="high"/>
<!-- ... -->
</textDesc> | ||||||||||
Content model | <content> <empty/> </content> | ||||||||||
Schema Declaration | element canonicity { attribute key { "high" | "low" | "unspecified" }, empty } |
<change> (change) documents a change or set of changes made during the production of a source document, or during the revision of an electronic file. [2.6. The Revision Description 2.4.1. Creation 11.7. Identifying Changes and Revisions] | |||||||||
Module | header | ||||||||
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.typed (@type) att.datable.w3c (when, @notBefore, @notAfter, @from, @to)
| ||||||||
Contained by | header: revisionDesc | ||||||||
May contain | |||||||||
Note | In ELTeC texts, the when attribute must be supplied and should indicate a date in the format YYYY-MM-DD. | ||||||||
Example | <change when="2018-11-01">Conversion with CLIGStoELTeC stylesheet for ELTeC-1</change> | ||||||||
Content model | <content> <macroRef key="macro.specialPara"/> </content> | ||||||||
Schema Declaration | element change { att.datable.w3c.attribute.notBefore, att.datable.w3c.attribute.notAfter, att.datable.w3c.attribute.from, att.datable.w3c.attribute.to, att.global.attributes, att.typed.attributes, attribute when { text }, macro.specialPara } |
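The schema declares the when attribute on <change> as plain text, so the date format required by the note is not enforced by the schema declaration itself. A minimal sketch of an external check follows; the function name `is_valid_when` is hypothetical, and the check combines a shape test with a calendar test so that impossible dates are also rejected.

```python
import re
from datetime import datetime

# Four-digit year, two-digit month, two-digit day, ASCII digits only.
WHEN_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_valid_when(value: str) -> bool:
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    if not WHEN_SHAPE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True
```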
<corr> (correction) contains the correct form of a passage apparently erroneous in the copy text. [3.5.1. Apparent Errors] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.typed (@type) |
Member of | |
Contained by | |
May contain | |
Example | In this text, the words "al" and "hombre" have been added by the editor/transcriber, to replace an original which omits these words for some reason or represents them with a non-standard or erroneous orthography. <p>... me había presentado aún ocasión de asombrar <corr>al</corr> mundo con ningún hecho
heroico; pero el oírme llamar <corr>hombre</corr> me llenó de orgul...</p> |
Content model | <content> <macroRef key="macro.paraContent"/> </content> |
Schema Declaration | element corr { att.global.attributes, att.typed.attributes, macro.paraContent } |
<date> (date) contains a date in any format. [3.6.4. Dates and Times 2.2.4. Publication, Distribution, Licensing, etc. 2.6. The Revision Description 3.12.2.4. Imprint, Size of a Document, and Reprint Information 15.2.3. The Setting Description 13.4. Dates] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.canonical (@ref) att.datable (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) att.dimensions (@unit, @quantity, @extent) att.typed (@type) |
Member of | |
Contained by | |
May contain | |
Note | <date> is used within <publicationStmt> and within <bibl>. |
Example | indicates the date of the publication of a novel in ELTeC within <publicationStmt> <publicationStmt>
<availability>
<licence target="https://creativecommons.org/licenses/by/4.0/">
<p>
<!-- description -->
</p>
</licence>
</availability>
<p> Published as part of ELTeC <date>2018-11-01</date>
</p>
</publicationStmt> |
Example | indicating the date of the first edition of a novel within <sourceDesc> <sourceDesc>
<bibl type="firstEdition">
<!-- -->
<date>1871</date>
<!-- -->
</bibl>
</sourceDesc> |
Schematron |
<sch:assert test="ancestor::tei:teiHeader"
role="ERROR"> The date element should not be used
outside the TEI Header </sch:assert> |
Content model | <content> <alternate minOccurs="0" maxOccurs="unbounded"> <textNode/> <classRef key="model.gLike"/> <classRef key="model.phrase"/> <classRef key="model.global"/> </alternate> </content> |
Schema Declaration | element date { att.global.attributes, att.canonical.attributes, att.datable.attributes, att.dimensions.attributes, att.typed.attributes, ( text | model.gLike | model.phrase | model.global )* } |
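The Schematron assertion restricts <date> to the TEI header. Python's ElementTree does not expose parent links, so the sketch below collects the dates found inside <teiHeader> and reports every other <date> in the document. It assumes the element passed in is the <TEI> root; the function name `dates_outside_header` is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_URI = "http://www.tei-c.org/ns/1.0"


def dates_outside_header(root: ET.Element) -> list:
    """Return all date elements that are not descendants of teiHeader."""
    date_tag = f"{{{TEI_URI}}}date"
    header = root.find(f"{{{TEI_URI}}}teiHeader")
    # Identity of every date element that sits inside the header.
    in_header = set()
    if header is not None:
        in_header = {id(d) for d in header.iter(date_tag)}
    return [d for d in root.iter(date_tag) if id(d) not in in_header]
```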
<distributor> (distributor) supplies the name of a person or other agency responsible for the distribution of a text. [2.2.4. Publication, Distribution, Licensing, etc.] | |
Module | header |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.canonical (@ref) |
Member of | |
Contained by | core: bibl header: publicationStmt |
May contain | |
Example | <distributor>Oxford Text Archive</distributor>
<distributor>Redwood and Burn Ltd</distributor> |
Content model | <content> <macroRef key="macro.phraseSeq"/> </content> |
Schema Declaration | element distributor { att.global.attributes, att.canonical.attributes, macro.phraseSeq } |
<div> (text division) contains a subdivision of the front, body, or back of a text. [4.1. Divisions of the Body] | |||||||||||
Module | textstructure | ||||||||||
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend))
| ||||||||||
Member of | |||||||||||
Contained by | |||||||||||
May contain | |||||||||||
Note | A division of type= | ||||||||||
Example | <body>
<div type="chapter">
<head>III</head>
<p>Em casa do dr. Carvalho, Claudio pouco fallou com Emilia, elle prezo a uma meza do
"whist", para ser agradavel ao juiz que sem isso se aborrecia, ella dansando sempre.
...</p>
<!-- more paragraphs here -->
</div>
</body> | ||||||||||
Schematron | div of type chapter should not be further subdivided
<sch:report test="@type='chapter' and child::tei:div"> A div of type 'chapter' may not be
further subdivided (except by milestones) </sch:report> | ||||||||||
Schematron |
<s:report test="ancestor::tei:l"> Abstract model violation: Lines may not contain higher-level structural elements such as div.
</s:report> | ||||||||||
Schematron |
<s:report test="ancestor::tei:p or ancestor::tei:ab and not(ancestor::tei:floatingText)"> Abstract model violation: p and ab may not contain higher-level structural elements such as div.
</s:report> | ||||||||||
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.divTop"/> <classRef key="model.global"/> </alternate> <sequence minOccurs="0" maxOccurs="1"> <alternate minOccurs="1" maxOccurs="1"> <sequence minOccurs="1" maxOccurs="unbounded"> <alternate minOccurs="1" maxOccurs="1"> <classRef key="model.divLike"/> <classRef key="model.divGenLike"/> </alternate> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> </sequence> <sequence minOccurs="1" maxOccurs="1"> <sequence minOccurs="1" maxOccurs="unbounded"> <classRef key="model.common"/> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> </sequence> <sequence minOccurs="0" maxOccurs="unbounded"> <alternate minOccurs="1" maxOccurs="1"> <classRef key="model.divLike"/> <classRef key="model.divGenLike"/> </alternate> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> </sequence> </sequence> </alternate> <sequence minOccurs="0" maxOccurs="unbounded"> <classRef key="model.divBottom"/> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> </sequence> </sequence> </sequence> </content> | ||||||||||
Schema Declaration | element div { att.global.attributes, attribute type { "titlepage" | "notes" | "liminal" | "chapter" | "letter" | "group" }?, ( ( model.divTop | model.global )*, ( ( ( ( model.divLike | model.divGenLike ), model.global* )+ | ( ( model.common, model.global* )+, ( ( model.divLike | model.divGenLike ), model.global* )* ) ), ( model.divBottom, model.global* )* )? ) } |
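The first Schematron rule for <div> reports any chapter that directly contains another division. The same condition can be sketched in Python as follows; the function name `subdivided_chapters` is hypothetical, and the function returns the offending elements rather than a message.

```python
import xml.etree.ElementTree as ET

TEI_URI = "http://www.tei-c.org/ns/1.0"


def subdivided_chapters(root: ET.Element) -> list:
    """Return every div of type 'chapter' that has a child div."""
    div_tag = f"{{{TEI_URI}}}div"
    return [
        div
        for div in root.iter(div_tag)
        if div.get("type") == "chapter" and div.find(div_tag) is not None
    ]
```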
<emph> (emphasized) marks words or phrases which are stressed or emphasized for linguistic or rhetorical effect. [3.3.2.2. Emphatic Words and Phrases 3.3.2. Emphasis, Foreign Words, and Unusual Language] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Member of | |
Contained by | |
May contain | |
Example | The editor/transcriber wishes to show that the word "my" is linguistically emphasized in the original source: Oh—don’t mind <emph>my</emph> feelings—call
me a mangy monkey—I’ve tried hard enough to look like one! |
Content model | <content> <macroRef key="macro.paraContent"/> </content> |
Schema Declaration | element emph { att.global.attributes, macro.paraContent } |
<encodingDesc> (encoding description) documents the relationship between an electronic text and the source or sources from which it was derived. [2.3. The Encoding Description 2.1.1. The TEI Header and Its Components] | |||||||||||
Module | header | ||||||||||
Attributes | Attributes att.global (n, @xml:id, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend)
| ||||||||||
Contained by | header: teiHeader | ||||||||||
May contain | core: p | ||||||||||
Example | describes the level of encoding of the TEI document, either level 0, 1 or 2 <encodingDesc n="eltec-0">
<p>Encoded to ELTeC level zero</p>
</encodingDesc> | ||||||||||
Content model | <content> <elementRef key="p"/> </content> | ||||||||||
Schema Declaration | element encodingDesc { att.global.attribute.xmlid, att.global.attribute.xmllang, att.global.attribute.xmlbase, att.global.attribute.xmlspace, att.global.rendition.attribute.rend, attribute n { "eltec-0" | "eltec-1" | "eltec-2" }, p } |
<extent> (extent) describes the approximate size of a text stored on some carrier medium or of some other object, digital or non-digital, specified in any convenient units. [2.2.3. Type and Extent of File 2.2. The File Description 3.12.2.4. Imprint, Size of a Document, and Reprint Information 10.7.1. Object Description] | |
Module | header |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Member of | |
Contained by | |
May contain | core: measure |
Note | Must contain at least one <measure> element, indicating the word count. Other indications of size are optional. |
Example | A book of 235 pages, containing 102,345 words <extent>
<measure unit="words">102345</measure>
<measure unit="pages">235</measure>
</extent> |
Schematron |
<sch:assert test="child::tei:measure[@unit eq 'words']">You must provide a word
count</sch:assert> |
Content model | <content> <elementRef key="measure" minOccurs="1" maxOccurs="unbounded"/> </content> |
Schema Declaration | element extent { att.global.attributes, measure+ } |
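Since every <extent> must carry a word count, that count is a convenient value to extract when sampling sub-collections by length. The sketch below assumes the content of the relevant <measure> is a plain integer, as in the example above; the function name `word_count` is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def word_count(extent: ET.Element):
    """Return the integer word count from an extent element, or None.

    None means no measure with unit='words' was found, which is the
    condition the Schematron assertion reports.
    """
    for measure in extent.findall("tei:measure", TEI_NS):
        if measure.get("unit") == "words":
            # Assumes plain integer content, e.g. "102345".
            return int((measure.text or "").strip())
    return None
```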
<fileDesc> (file description) contains a full bibliographic description of an electronic file. [2.2. The File Description 2.1.1. The TEI Header and Its Components] | |
Module | header |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Contained by | header: teiHeader |
May contain | header: extent publicationStmt sourceDesc titleStmt |
Note | The major source of information for those seeking to create a catalogue entry or bibliographic citation for an electronic file. As such, it provides a title and statements of responsibility together with details of the publication or distribution of the file, of any series to which it belongs, and detailed bibliographic notes for matters not addressed elsewhere in the header. It also contains a full bibliographic description for the source or sources from which the electronic text was derived. |
Example | <fileDesc>
<titleStmt>
<!-- information about the title of the work -->
</titleStmt>
<extent>
<!-- information about the size of the work -->
</extent>
<publicationStmt>
<p>Adicionado à coleção ELTeC <date>20 de novembro de 2018</date>. </p>
</publicationStmt>
<sourceDesc>
<bibl>
<!-- bibliographic description of the source/s of the work
-->
</bibl>
</sourceDesc>
</fileDesc> |
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <elementRef key="titleStmt"/> <elementRef key="extent"/> <elementRef key="publicationStmt"/> <elementRef key="sourceDesc"/> </sequence> </content> |
Schema Declaration | element fileDesc { att.global.attributes, ( titleStmt, extent, publicationStmt, sourceDesc ) } |
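The content model for <fileDesc> is a fixed sequence, so both the presence and the order of its four children matter. A minimal sketch of that check, with the hypothetical names `EXPECTED_CHILDREN` and `file_desc_children_ok`; comments are discarded by the default parser and therefore do not affect the result.

```python
import xml.etree.ElementTree as ET

# Order taken from the content model of fileDesc.
EXPECTED_CHILDREN = ["titleStmt", "extent", "publicationStmt", "sourceDesc"]


def file_desc_children_ok(file_desc: ET.Element) -> bool:
    """Return True if the children are exactly the expected sequence."""
    # Strip the "{namespace}" part of each tag to get the local name.
    names = [child.tag.split("}")[-1] for child in file_desc]
    return names == EXPECTED_CHILDREN
```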
<foreign> (foreign) identifies a word or phrase as belonging to some language other than that of the surrounding text. [3.3.2.1. Foreign Words or Expressions] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Member of | |
Contained by | |
May contain | |
Note | The global xml:lang attribute should be supplied for this element to identify the language of the word or phrase marked. As elsewhere, its value should be a language tag as defined in 6.1. Language Identification. This element is intended for use only where no other element is available to mark the phrase or words concerned. The global xml:lang attribute should be used in preference to this element where it is intended to mark the language of the whole of some text element. The <distinct> element may be used to identify phrases belonging to sublanguages or registers not generally regarded as true languages. |
Example | The Latin phrase "Ab urbe condita" is not in the same language (Portuguese) as the rest of the paragraph <p>E calcando a espada debaixo do pé esquerdo, curvou-a: <foreign>Ab urbe condita</foreign>,
da fundação de Roma, no ano seiscentos e três. </p> |
Example | In this example, the whole quotation is given in a different language (Spanish). The foreign language concerned is specified by means of the xml:lang attribute. The <foreign> element can only be used to enclose words and phrases directly, rather than to enclose <l> or <quote> elements, and must therefore be repeated for the content of each line. <p>E cá fóra veriamos o velho mendigo no mesmo lugar ainda, cantando ao som da sanfona:</p>
<quote>
<l>
<foreign xml:lang="es">«Rosa fresca, rosa fresca,</foreign>
</l>
<l>
<foreign xml:lang="es">tan garrida y con amor;</foreign>
</l>
<l>
<foreign xml:lang="es">quando vos tuve em mis braços,</foreign>
</l>
<l>
<foreign xml:lang="es">no vos supe servir, no,</foreign>
</l>
<l>
<foreign xml:lang="es">y agora que os serviria</foreign>
</l>
<l>
<foreign xml:lang="es">no vos puedo aver no.</foreign>
<ref target="#note2"/>
</l>
</quote> |
Example | An alternative and more economical encoding for the foregoing example: <p>E cá fóra veriamos o velho mendigo no mesmo lugar ainda, cantando ao som da sanfona:</p>
<quote xml:lang="es">
<l>«Rosa fresca, rosa fresca,</l>
<l>tan garrida y con amor;</l>
<!-- ... etc. -->
</quote> |
Content model | <content> <macroRef key="macro.phraseSeq"/> </content> |
Schema Declaration | element foreign { att.global.attributes, macro.phraseSeq } |
<front> (front matter) contains any prefatory matter (headers, abstracts, title page, prefaces, dedications, etc.) found at the start of a document, before the main body. [4.6. Title Pages 4. Default Text Structure] | |
Module | textstructure |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Contained by | textstructure: text |
May contain | |
Note | Because cultural conventions differ as to which elements are grouped as front matter and which as back matter, the content models for the <front> and <back> elements are identical. |
Example | <front>
<div type="titlepage">
<p>Bucura Dumbravă</p>
<p>HAIDUCUL</p>
<p>Tradus</p>
<p>de</p>
<p>Teodor Nica</p>
<p>ediția a IV-a</p>
<p>București</p>
<p>Editura Librăriei Școlelor C. Sfetea</p>
<p>63-64, - Calea Moșilor, - 62-64</p>
<p>1919</p>
</div>
<div type="liminal">
<head>PREFAȚĂ LA EDȚIA ÎNTÂIA</head>
<p>Isvoarele, de cari m'am slujit la studiul vieții lin Iancu Jianu și a timpului său,
sunt cele următoare...</p>
<p>BUCURA DUMBRAVĂ.</p>
<p>București, 1911.</p>
</div>
</front> |
Schematron |
<sch:assert test="child::tei:div[@type='titlepage'] or child::tei:div[@type='liminal']"
role="ERROR">The front matter of a text must contain either liminal or titlepage
divisions</sch:assert> |
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.frontPart"/> <classRef key="model.pLike"/> <classRef key="model.pLike.front"/> <classRef key="model.global"/> </alternate> <sequence minOccurs="0" maxOccurs="1"> <alternate minOccurs="1" maxOccurs="1"> <sequence minOccurs="1" maxOccurs="1"> <classRef key="model.div1Like"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.div1Like"/> <classRef key="model.frontPart"/> <classRef key="model.global"/> </alternate> </sequence> <sequence minOccurs="1" maxOccurs="1"> <classRef key="model.divLike"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.divLike"/> <classRef key="model.frontPart"/> <classRef key="model.global"/> </alternate> </sequence> </alternate> <sequence minOccurs="0" maxOccurs="1"> <classRef key="model.divBottom"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.divBottom"/> <classRef key="model.global"/> </alternate> </sequence> </sequence> </sequence> </content> |
Schema Declaration | element front { att.global.attributes, ( ( model.frontPart | model.pLike | model.pLike.front | model.global )*, ( ( ( model.div1Like, ( model.div1Like | model.frontPart | model.global )* ) | ( model.divLike, ( model.divLike | model.frontPart | model.global )* ) ), ( model.divBottom, ( model.divBottom | model.global )* )? )? ) } |
<gap> (gap) indicates a point where material has been omitted in a transcription, whether for editorial reasons described in the TEI header, as part of sampling practice, or because the material is illegible, invisible, or inaudible. [3.5.3. Additions, Deletions, and Omissions] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.dimensions (@unit, @quantity, @extent) |
Member of | |
Contained by | |
May contain | Empty element |
Note | In an ELTeC level 1 transcription, the unit attribute of this element may be used to indicate what has been omitted from a transcription. |
Example | Two consecutive graphic components omitted from transcription: <gap unit="graphic" quantity="2"/> |
Example | Table of contents omitted from transcription: <gap unit="toc"/> |
Content model | <content> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.descLike"/> <classRef key="model.certLike"/> </alternate> </content> |
Schema Declaration | element gap { att.global.attributes, att.dimensions.attributes, ( model.descLike | model.certLike )* } |
<head> (heading) contains any type of heading, for example the title of a section, or the heading of a list, glossary, manuscript description, etc. [4.2.1. Headings and Trailers] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.typed (@type) |
Member of | |
Contained by | |
May contain | |
Note | The <head> element is used for headings at all levels; software which treats (e.g.) chapter headings, section headings, and list titles differently must determine the proper processing of a <head> element based on its structural position. A <head> occurring as the first element of a list is the title of that list; one occurring as the first element of a <div1> is the title of that chapter or section. |
Example | <div type="group">
<head>BOOK I.</head>
<head>MISS BROOKE.</head>
<div type="chapter">
<head>CHAPTER I.</head>
<quote> Since I can do no good because a woman, Reach constantly at something that is near
it. —The Maid's Tragedy: BEAUMONT AND FLETCHER. </quote>
<p>Miss Brooke had that kind of beauty which seems to be thrown into relief by poor
dress.... </p>
<!-- ... -->
</div>
<!-- ... -->
</div> A heading of any kind at the start of a division of any kind may be marked using <head>. In this example, there are two headings at the start of the first part, and one at the start of the first chapter. The epigraph at the start of the first chapter is marked up as a quotation and is not a heading. |
Content model | <content> <alternate minOccurs="0" maxOccurs="unbounded"> <textNode/> <elementRef key="lg"/> <classRef key="model.gLike"/> <classRef key="model.phrase"/> <classRef key="model.inter"/> <classRef key="model.lLike"/> <classRef key="model.global"/> </alternate> </content> |
Schema Declaration | element head { att.global.attributes, att.typed.attributes, ( text | lg | model.gLike | model.phrase | model.inter | model.lLike | model.global )* } |
<hi> (highlighted) marks a word or phrase as graphically distinct from the surrounding text, for reasons concerning which no claim is made. [3.3.2.2. Emphatic Words and Phrases 3.3.2. Emphasis, Foreign Words, and Unusual Language] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Member of | |
Contained by | |
May contain | |
Note | This element is used at level0 for any kind of typographic salience recorded by the encoder. At level1 a semantic interpretation (for example using <emph>, <foreign> etc.) replaces it. |
Example | <p>Ha önök ismernék a <hi>Spleen</hi>-t, a <hi>Végtelenség </hi> prométheüszi gőgjét, a
<hi>Rejtelmek</hi> fásultságát és magas röptét, szóval, ha önök ismernék <hi>Icarus</hi>t,
alkalmasint ilyenformán méltóztatnának okoskodni...</p> |
Content model | <content> <macroRef key="macro.paraContent"/> </content> |
Schema Declaration | element hi { att.global.attributes, macro.paraContent } |
<idno> (identifier) supplies any form of identifier used to identify some object, such as a bibliographic item, a person, a title, an organization, etc. in a standardized way. [13.3.1. Basic Principles 2.2.4. Publication, Distribution, Licensing, etc. 2.2.5. The Series Statement 3.12.2.4. Imprint, Size of a Document, and Reprint Information] | |||||||||||
Module | header | ||||||||||
Attributes | Attributes @type (optional; one of ISBN, ISSN, DOI, URI, VIAF, ESTC, OCLC) | ||||||||||
Member of | |||||||||||
Contained by | |||||||||||
May contain | header: idno character data | ||||||||||
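Note | <idno> should be used for labels which identify an object or concept in a formal cataloguing system such as a database or an RDF store, or in a distributed system such as the World Wide Web. Some suggested values for type on <idno> are ISBN, ISSN, DOI, and URI. The ELTeC schema restricts type to ISBN, ISSN, DOI, URI, VIAF, ESTC, or OCLC, so the other values shown in the generic TEI example (LT, Wing, oldCat) are not valid in ELTeC texts. | ||||||||||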
Example | <idno type="ISBN">978-1-906964-22-1</idno>
<idno type="ISSN">0143-3385</idno>
<idno type="DOI">10.1000/123</idno>
<idno type="URI">http://www.worldcat.org/oclc/185922478</idno>
<idno type="URI">http://authority.nzetc.org/463/</idno>
<idno type="LT">Thomason Tract E.537(17)</idno>
<idno type="Wing">C695</idno>
<idno type="oldCat">
<g ref="#sym"/>345
</idno> In the last case, the identifier includes a non-Unicode character which is defined elsewhere by means of a <glyph> or <char> element referenced here as #sym. | ||||||||||
Content model | <content> <alternate minOccurs="0" maxOccurs="unbounded"> <textNode/> <classRef key="model.gLike"/> <elementRef key="idno"/> </alternate> </content> | ||||||||||
Schema Declaration | element idno { attribute type { "ISBN" | "ISSN" | "DOI" | "URI" | "VIAF" | "ESTC" | "OCLC" }?, ( text | model.gLike | idno )* } |
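The closed list of type values in the schema declaration above can be checked mechanically. The following is a minimal sketch in Python, not part of the ELTeC tooling; the function name `invalid_idno_types` and the sample fragment are illustrative assumptions, and only the standard library is used.

```python
import xml.etree.ElementTree as ET

# Closed list of @type values, copied from the schema declaration for <idno>.
ALLOWED_IDNO_TYPES = {"ISBN", "ISSN", "DOI", "URI", "VIAF", "ESTC", "OCLC"}


def invalid_idno_types(xml_text):
    """Return the @type values of <idno> elements that the closed list rejects."""
    root = ET.fromstring(xml_text)
    bad = []
    for el in root.iter():
        # Compare local names so the check works with or without a namespace.
        local = el.tag.rsplit("}", 1)[-1]
        if local == "idno":
            value = el.get("type")
            # @type is optional, so a missing attribute is not an error.
            if value is not None and value not in ALLOWED_IDNO_TYPES:
                bad.append(value)
    return bad


# Hypothetical fragment mixing a permitted and a non-permitted value.
sample = """<bibl xmlns="http://www.tei-c.org/ns/1.0">
  <idno type="ISBN">978-1-906964-22-1</idno>
  <idno type="Wing">C695</idno>
  <idno>no type given</idno>
</bibl>"""

print(invalid_idno_types(sample))  # ['Wing']
```

A full validation against the ELTeC schema would of course report the same problem; the sketch is only meant to make the restriction concrete.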
<keywords> (keywords) contains a list of keywords or phrases identifying the topic or nature of a text. [2.4.3. The Text Classification] | |||||||
Module | header | ||||||
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) @scheme (optional) | ||||||
Contained by | header: textClass | ||||||
May contain | core: term | ||||||
Note | In ELTeC texts, this element may only be used within a <textClass> element, and may contain only a sequence of <term> elements. Its usage is optional. | ||||||
Example | <textClass>
<keywords>
<term xml:lang="eng">juvenile literature</term>
<term xml:lang="deu">bildungsroman</term>
</keywords>
</textClass> | ||||||
Content model | <content> <alternate minOccurs="1" maxOccurs="1"> <elementRef key="term" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="list"/> </alternate> </content> | ||||||
Schema Declaration | element keywords { att.global.attributes, attribute scheme { text }?, ( term+ | list ) } |
<l> (verse line) contains a single, possibly incomplete, line of verse. [3.13.1. Core Tags for Verse 3.13. Passages of Verse or Drama 7.2.5. Speech Contents] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Member of | |
Contained by | |
May contain | |
Example | <p>Mimi chante avec un sourire gracieux et un désintéressement tout particulier des couplets
dont voici le refrain: <quote>
<l>Notre bonheur est accompli</l>
<l>Voilà le culte rétabli.</l>
</quote>
</p> |
Schematron |
<s:report test="ancestor::tei:l[not(.//tei:note//tei:l[. = current()])]"> Abstract model violation: Lines may not contain lines or lg elements.
</s:report> |
Content model | <content> <alternate minOccurs="0" maxOccurs="unbounded"> <textNode/> <classRef key="model.gLike"/> <classRef key="model.phrase"/> <classRef key="model.inter"/> <classRef key="model.global"/> </alternate> </content> |
Schema Declaration | element l { att.global.attributes, ( text | model.gLike | model.phrase | model.inter | model.global )* } |
<label> (label) contains any label or heading used to identify part of a text, typically but not exclusively in a list or glossary. [3.8. Lists] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.typed (@type) |
Member of | |
Contained by | |
May contain | |
Example | <p>
<label>April 5.</label>-Two shoulders of mutton arrived, Carrie having arranged with
another butcher without consulting me. Gowing called, and fell over scraper coming in.
<hi>Must</hi> get that scraper removed.
</p>
<p>
<label>April 6.</label>-Eggs for breakfast simply shocking; sent them back to Borset with
my compliments, and he needn't call any more for orders.
</p> |
Content model | <content> <macroRef key="macro.phraseSeq"/> </content> |
Schema Declaration | element label { att.global.attributes, att.typed.attributes, macro.phraseSeq } |
<langUsage> (language usage) describes the languages, sublanguages, registers, dialects, etc. represented within a text. [2.4.2. Language Usage 2.4. The Profile Description 15.3.2. Declarable Elements] | |
Module | header |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Contained by | header: profileDesc |
May contain | |
Example | provides information about the language of the novel, using <language> <langUsage>
<language ident="fra">French</language>
</langUsage> |
Content model | <content> <alternate minOccurs="1" maxOccurs="1"> <classRef key="model.pLike" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="language" minOccurs="1" maxOccurs="unbounded"/> </alternate> </content> |
Schema Declaration | element langUsage { att.global.attributes, ( model.pLike+ | language+ ) } |
<language> (language) characterizes a single language or sublanguage used within a text. [2.4.2. Language Usage] | |||||||||||||
Module | header | ||||||||||||
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) @ident (required) @usage (optional) | ||||||||||||
Contained by | header: langUsage | ||||||||||||
May contain | |||||||||||||
Note | Particularly for sublanguages, an informal prose characterization should be supplied as content for the element. | ||||||||||||
Example | <langUsage>
<language ident="en-US" usage="75">modern American English</language>
<language ident="i-az-Arab" usage="20">Azerbaijani in Arabic script</language>
<language ident="x-lap" usage="05">Pig Latin</language>
</langUsage> | ||||||||||||
Content model | <content> <macroRef key="macro.phraseSeq.limited"/> </content> | ||||||||||||
Schema Declaration | element language { att.global.attributes, attribute ident { text }, attribute usage { text }?, macro.phraseSeq.limited } |
<licence> contains information about a licence or other legal agreement applicable to the text. [2.2.4. Publication, Distribution, Licensing, etc.] | |||||||||
Module | header | ||||||||
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) @target (required) | ||||||||
Contained by | header: availability | ||||||||
May contain | core: p | ||||||||
Note | The TEI XML markup added to all components of ELTeC is made available under a CC-BY licence. The textual content is in the public domain. | ||||||||
Example | <licence target="https://creativecommons.org/licenses/by/4.0/">
<p>The TEI markup is licensed under Creative Commons Attribution (CC-BY 4.0).</p>
</licence> | ||||||||
Content model | <content> <elementRef key="p" minOccurs="0"/> </content> | ||||||||
Schema Declaration | element licence { att.global.attributes, attribute target { list { + } }, p? } |
<measure> (measure) contains a word or phrase referring to some quantity of an object or commodity, usually comprising a number, a unit, and a commodity name. [3.6.3. Numbers and Measures] | |||||||||||
Module | core | ||||||||||
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.measurement (@unit, @unitRef, @quantity, @commodity); @unit is required and takes one of pages, words, vols | ||||||||||
Contained by | header: extent | ||||||||||
May contain | XSD token | ||||||||||
Note | An indication of the number of words is mandatory. Indicating the number of pages is optional. If the page or volume count is not available, the relevant <measure> element should be omitted. Spaces and other punctuation marks are not permitted as content of the <measure> element. | ||||||||||
Example | describes two measurements for <extent>: the number of words and the number of pages <extent>
<measure unit="words">71043</measure>
<measure unit="pages">364</measure>
</extent> | ||||||||||
Content model | <content> <dataRef key="teidata.numeric"/> </content> | ||||||||||
Schema Declaration | element measure { att.global.attributes, att.measurement.attribute.unitRef, att.measurement.attribute.quantity, att.measurement.attribute.commodity, attribute unit { "pages" | "words" | "vols" }, teidata.numeric } |
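The rules stated in the note for <measure> (a mandatory word count, a closed list of units, and purely numeric content) can be illustrated with a short Python sketch. The helper name `read_extent` is hypothetical, and the digits-only test is an assumption that is deliberately stricter than teidata.numeric.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
# Values permitted for @unit by the schema declaration above.
ALLOWED_UNITS = {"pages", "words", "vols"}


def read_extent(extent_xml):
    """Return {unit: int} for an <extent>, raising ValueError on rule violations."""
    extent = ET.fromstring(extent_xml)
    counts = {}
    for measure in extent.findall(TEI + "measure"):
        unit = measure.get("unit")
        if unit not in ALLOWED_UNITS:
            raise ValueError(f"unexpected unit: {unit!r}")
        text = measure.text or ""
        # ASCII digits only: no spaces, thousands separators or other punctuation.
        if not (text.isascii() and text.isdigit()):
            raise ValueError(f"non-numeric content for {unit}: {text!r}")
        counts[unit] = int(text)
    if "words" not in counts:
        raise ValueError("a word count is mandatory")
    return counts


sample = """<extent xmlns="http://www.tei-c.org/ns/1.0">
  <measure unit="words">71043</measure>
  <measure unit="pages">364</measure>
</extent>"""

print(read_extent(sample))  # {'words': 71043, 'pages': 364}
```

Raising an exception rather than returning a partial result is a design choice of this sketch; a batch checker might prefer to collect messages instead.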
<milestone> (milestone) marks a boundary point separating any kind of section of a text, typically but not necessarily indicating a point at which some part of a standard reference system changes, where the change is not represented by a structural element. [3.11.3. Milestone Elements] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.milestoneUnit (@unit) att.typed (@type) |
Member of | |
Contained by | |
May contain | Empty element |
Note | The unit attribute should describe the kind of unit delimited by the tag, for example "subSection"; it is mandatory. If the milestone numbers or labels the unit in question, the n attribute may be used to carry the name or number given. The type attribute should describe the kind of milestone indication found in the source, for example "stars", "line", "numbering", etc.; it is optional. |
Example | <milestone unit="subSection"
type="asterisk"/> |
Example | <div type="group">
<head>BOOK THE FIRST</head>
<head>THE DAYS BEFORE TONO-BUNGAY WAS INVENTED</head>
<div type="chapter">
<head>CHAPTER THE FIRST</head>
<milestone unit="subSection" n="I"/>
<p>Most people in this world seem to live "in character" ... </p>
<!-- ... -->
<p>....of an altogether different sort from that of Tono-Bungay.</p>
<milestone unit="subSection" n="II"/>
<p>I write that much and look at it, and wonder ... </p>
</div>
</div> |
Content model | <content> <empty/> </content> |
Schema Declaration | element milestone { att.global.attributes, att.milestoneUnit.attributes, att.typed.attributes, empty } |
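Because <milestone> is an empty element, the subsections it delimits are implicit in document order rather than in the element hierarchy. As a sketch of how a processor might recover them (the function name `subsections` is an assumption, and paragraphs preceding the first milestone are simply ignored here):

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def subsections(chapter_xml):
    """Group the <p> children of a chapter under the @n of the preceding milestone."""
    chapter = ET.fromstring(chapter_xml)
    groups = []
    current = None
    for child in chapter:
        if child.tag == TEI + "milestone" and child.get("unit") == "subSection":
            # A milestone opens a new group; it has no content of its own.
            current = (child.get("n"), [])
            groups.append(current)
        elif child.tag == TEI + "p" and current is not None:
            current[1].append("".join(child.itertext()).strip())
    return groups


sample = """<div xmlns="http://www.tei-c.org/ns/1.0" type="chapter">
  <head>CHAPTER THE FIRST</head>
  <milestone unit="subSection" n="I"/>
  <p>Most people in this world seem to live "in character" ... </p>
  <p>....of an altogether different sort from that of Tono-Bungay.</p>
  <milestone unit="subSection" n="II"/>
  <p>I write that much and look at it, and wonder ... </p>
</div>"""

for label, paragraphs in subsections(sample):
    print(label, len(paragraphs))
```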
<name> (name, proper noun) contains a proper noun or noun phrase. [3.6.1. Referring Strings] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.datable (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) att.typed (@type) |
Member of | |
Contained by | core: respStmt |
May contain | |
Note | Not permitted outside the header |
Example | provides the name of a person who is neither an author nor a publisher <name>Christof Schöch</name> |
Content model | <content> <macroRef key="macro.phraseSeq"/> </content> |
Schema Declaration | element name { att.global.attributes, att.datable.attributes, att.typed.attributes, macro.phraseSeq } |
<note> (note) contains a note or annotation. [3.9.1. Notes and Simple Annotation 2.2.6. The Notes Statement 3.12.2.8. Notes and Statement of Language 9.3.5.4. Notes within Entries] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.pointing (@target) att.typed (@type) |
Member of | |
Contained by | |
May contain | |
Example | In this example, two authorial footnotes have been encoded. Each note is encoded with a <note> element, and carries a unique identifier on its xml:id attribute. A <ref> element replaces the siglum in the running text indicating the point where the note is attached. <body>
<!-- ... -->
<p>Au milieu de ces spirituels convives on remarquait une figure angélique, c'était celle
de la fille de madame de Condorcet, de cette ravissante Eliza <ref target="#FR0726_N1">[1]</ref> qui, à peine dans l'âge de l'adolescence, avait déjà la taille et les traits
réguliers d'une statue grecque.</p>
<!-- ... -->
<p>—Je ne sortirai point aujourd'hui, j'ai mal à la tête, une longue coiffure me
fatiguerait; Ellénore arrangera mes cheveux, et me mettra ma baigneuse <ref target="#FR0726_N2">[2]</ref>.</p>
</body>
<back>
<div type="notes">
<note xml:id="FR0726_N1">[Note 1: Elle a épousé depuis M. O'Connor.]</note>
<note xml:id="FR0726_N2">[Note 2: Sorte de bonnet négligé, qui était à la mode en ce
temps.]</note>
</div>
</back> |
Schematron |
<sch:assert test="parent::tei:div[@type='notes']"
role="ERROR">Notes must be given out of
line and inside a div of type 'notes'</sch:assert> |
Content model | <content> <macroRef key="macro.specialPara"/> </content> |
Schema Declaration | element note { att.global.attributes, att.pointing.attributes, att.typed.attributes, macro.specialPara } |
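The Schematron constraint above, together with the convention that each <ref> points at a note by its xml:id, can be approximated outside a Schematron processor. The following Python sketch is illustrative only: `check_notes` is a hypothetical helper, it is meant to be applied to a <text> element rather than a whole file, and the second <ref> in the sample is invented to show a dangling target.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def check_notes(text_xml):
    """Return a list of problems with note placement and note references."""
    root = ET.fromstring(text_xml)
    # ElementTree has no parent pointers, so build a child-to-parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    problems = []
    note_ids = set()
    for note in root.iter(TEI + "note"):
        note_ids.add(note.get(XML_ID))
        parent = parent_of.get(note)
        in_notes_div = (
            parent is not None
            and parent.tag == TEI + "div"
            and parent.get("type") == "notes"
        )
        if not in_notes_div:
            problems.append(f"note {note.get(XML_ID)} is not inside div[@type='notes']")
    for ref in root.iter(TEI + "ref"):
        target = ref.get("target", "")
        # Only local pointers of the form #id are checked against the notes.
        if target.startswith("#") and target[1:] not in note_ids:
            problems.append(f"ref target {target} matches no note")
    return problems


sample = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <p>Eliza <ref target="#FR0726_N1">[1]</ref> and a stray <ref target="#FR0726_N9">[9]</ref>.</p>
  </body>
  <back>
    <div type="notes">
      <note xml:id="FR0726_N1">[Note 1: Elle a épousé depuis M. O'Connor.]</note>
    </div>
  </back>
</text>"""

print(check_notes(sample))
```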
<p> (paragraph) marks paragraphs in prose. [3.1. Paragraphs 7.2.5. Speech Contents] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Member of | |
Contained by | |
May contain | |
Example | <div type="chapter">
<head> 1</head>
<p>Fräulein Lotti war soeben erwacht. ....</p>
<p>Frau Katze schüttelt den Kopf, schließt die Augen, leckt die fadendünnen Lippen und
gähnt wie ein Tiger.</p>
<p>Ihre Gebieterin hakt den Fensterflügel ein, damit die Spaziergängerin bequem eintreten
könne, wenn es ihr genehm sein würde heimzukehren....</p>
<!-- ... -->
</div> |
Content model | <content> <macroRef key="macro.paraContent"/> </content> |
Schema Declaration | element p { att.global.attributes, macro.paraContent } |
<pb> (page beginning) marks the beginning of a new page in a paginated document. [3.11.3. Milestone Elements] | |||||||
Module | core | ||||||
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.typed (@type) @facs (optional) | ||||||
Member of | |||||||
Contained by | |||||||
May contain | Empty element | ||||||
Note | A <pb> element should appear at the start of the page which it identifies. The global n attribute indicates the number or other value associated with this page. This will normally be the page number or signature printed on it, since the physical sequence number is implicit in the presence of the <pb> element itself. | ||||||
Example | A page break may be associated with a facsimile image of the page it introduces by means of the facs attribute. <body>
<pb n="1" facs="page1.png"/>
<!-- page1.png contains an image of the page;
the text it contains is encoded here -->
<p>
<!-- ... -->
</p>
<pb n="2" facs="page2.png"/>
<!-- similarly, for page 2 -->
<p>
<!-- ... -->
</p>
</body> | ||||||
Example | If a page break interrupts a word, the word fragments should be reassembled following it. <p>My own relations too were nobly generous and by their kindness I have been
<pb n="100"/> established in this shop, and for the last year have carried on this little
business.... </p> | ||||||
Content model | <content> <empty/> </content> | ||||||
Schema Declaration | element pb { att.global.attributes, att.typed.attributes, attribute facs { text }?, empty } |
<pc> (punctuation character) contains a character or string of characters regarded as constituting a single punctuation mark. [17.1.2. Below the Word Level 17.4.2. Lightweight Linguistic Annotation] | |||||||||||||||||||||
Module | analysis | ||||||||||||||||||||
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.segLike (@function) att.typed (@type) att.linguistic (@lemma, @lemmaRef, @pos, @msd, @join) @force (optional; one of strong, weak, inter) @unit (optional) @pre (optional) | ||||||||||||||||||||
Member of | |||||||||||||||||||||
Contained by | |||||||||||||||||||||
May contain | core: corr character data | ||||||||||||||||||||
Example | <phr>
<w>do</w>
<w>you</w>
<w>understand</w>
<pc type="interrogative">?</pc>
</phr> | ||||||||||||||||||||
Example | Encoding of the German sentence "Wir fahren in den Urlaub.", using attributes from att.linguistic (see 17.4.2. Lightweight Linguistic Annotation). <s>
<w pos="PPER" msd="1.Pl.*.Nom">Wir</w>
<w pos="VVFIN" msd="1.Pl.Pres.Ind">fahren</w>
<w pos="APPR" msd="--">in</w>
<w pos="ART" msd="Def.Masc.Akk.Sg.">den</w>
<w pos="NN" msd="Masc.Akk.Sg.">Urlaub</w>
<pc pos="$." msd="--" join="left">.</pc>
</s> | ||||||||||||||||||||
Content model | <content> <alternate minOccurs="0" maxOccurs="unbounded"> <textNode/> <classRef key="model.gLike"/> <elementRef key="c"/> <classRef key="model.pPart.edit"/> </alternate> </content> | ||||||||||||||||||||
Schema Declaration | element pc { att.global.attributes, att.segLike.attributes, att.typed.attributes, att.linguistic.attributes, attribute force { "strong" | "weak" | "inter" }?, attribute unit { text }?, attribute pre { text }?, ( text | model.gLike | c | model.pPart.edit )* } |
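The join attribute on <w> and <pc> records whether a token is attached to its neighbour without intervening whitespace, which is what allows the running text to be rebuilt from a tokenized sentence. A minimal sketch, assuming the TEI values left, right and both for join (the function name `detokenize` is hypothetical):

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def detokenize(s_xml):
    """Rebuild the running text of an <s> from its <w> and <pc> tokens."""
    s = ET.fromstring(s_xml)
    out = ""
    glue_next = False  # True when the previous token had join="right" or "both"
    for tok in s.iter():
        if tok.tag not in (TEI + "w", TEI + "pc"):
            continue  # skip the <s> itself and wrappers such as <emph>
        join = tok.get("join", "")
        glue_here = glue_next or join in ("left", "both")
        if out and not glue_here:
            out += " "
        out += (tok.text or "").strip()
        glue_next = join in ("right", "both")
    return out


sample = """<s xmlns="http://www.tei-c.org/ns/1.0">
  <w pos="PPER" msd="1.Pl.*.Nom">Wir</w>
  <w pos="VVFIN" msd="1.Pl.Pres.Ind">fahren</w>
  <w pos="APPR" msd="--">in</w>
  <w pos="ART" msd="Def.Masc.Akk.Sg.">den</w>
  <w pos="NN" msd="Masc.Akk.Sg.">Urlaub</w>
  <pc pos="$." msd="--" join="left">.</pc>
</s>"""

print(detokenize(sample))  # Wir fahren in den Urlaub.
```

Tokens without a join attribute are separated by a single space, which is an assumption of this sketch rather than a rule stated in the schema.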
<profileDesc> (text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting. [2.4. The Profile Description 2.1.1. The TEI Header and Its Components] | |
Module | header |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Contained by | header: teiHeader |
May contain | |
Note | Although the content model permits it, it is rarely meaningful to supply multiple occurrences for any of the child elements of <profileDesc> unless these are documenting multiple texts. |
Example | <profileDesc
xmlns:e="http://distantreading.net/eltec/ns">
<langUsage>
<language ident="fra">French</language>
</langUsage>
<textDesc>
<e:authorGender key="M"/>
<e:size key="long"/>
<e:reprintCount key="high"/>
<e:timeSlot key="T1"/>
</textDesc>
</profileDesc> Profile for a French text, with a male author, containing more than 100,000 words, of high reprintCount, first published between 1840 and 1859. |
Example | <profileDesc
xmlns:e="http://distantreading.net/eltec/ns">
<langUsage>
<language ident="de">German</language>
</langUsage>
<textDesc>
<authorGender xmlns="http://distantreading.net/eltec/ns" key="F"/>
<size xmlns="http://distantreading.net/eltec/ns" key="long"/>
<reprintCount xmlns="http://distantreading.net/eltec/ns" key="low"/>
<timeSlot xmlns="http://distantreading.net/eltec/ns" key="T4"/>
</textDesc>
</profileDesc> Profile for a German text, with a female author, containing more than 100,000 words (size "long"), of low reprintCount, first published between 1900 and 1920. |
Example | If descriptive keywords are available for a text, these may be included within a <textClass> element placed before the <textDesc>, as in this example: <profileDesc
xmlns:e="http://distantreading.net/eltec/ns">
<langUsage>
<language ident="de">German</language>
</langUsage>
<textClass>
<keywords>
<term>bildungsroman</term>
</keywords>
</textClass>
<textDesc>
<authorGender xmlns="http://distantreading.net/eltec/ns" key="F"/>
<size xmlns="http://distantreading.net/eltec/ns" key="long"/>
<reprintCount xmlns="http://distantreading.net/eltec/ns" key="low"/>
<timeSlot xmlns="http://distantreading.net/eltec/ns" key="T4"/>
</textDesc>
</profileDesc> |
Content model | <content> <elementRef key="langUsage" minOccurs="1" maxOccurs="1"/> <elementRef key="textClass" minOccurs="0" maxOccurs="1"/> <elementRef key="textDesc" minOccurs="1" maxOccurs="1"/> </content> |
Schema Declaration | element profileDesc { att.global.attributes, langUsage, textClass?, textDesc } |
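The examples above use two equivalent ways of placing the ELTeC-specific elements in the http://distantreading.net/eltec/ns namespace: a declared prefix (e:) or a default namespace redeclared on each element. A namespace-aware parser treats both identically, as the following Python sketch suggests. The function name `read_profile` and the returned dictionary layout are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Prefixes used only inside this script; they need not match those in the file.
NS = {
    "tei": "http://www.tei-c.org/ns/1.0",
    "e": "http://distantreading.net/eltec/ns",
}


def read_profile(profile_xml):
    """Return the language code and the four ELTeC balance criteria of a text."""
    profile = ET.fromstring(profile_xml)
    language = profile.find("tei:langUsage/tei:language", NS)
    result = {"language": language.get("ident") if language is not None else None}
    for name in ("authorGender", "size", "reprintCount", "timeSlot"):
        el = profile.find(f"tei:textDesc/e:{name}", NS)
        result[name] = el.get("key") if el is not None else None
    return result


sample = """<profileDesc xmlns="http://www.tei-c.org/ns/1.0"
    xmlns:e="http://distantreading.net/eltec/ns">
  <langUsage>
    <language ident="fra">French</language>
  </langUsage>
  <textDesc>
    <e:authorGender key="M"/>
    <e:size key="long"/>
    <e:reprintCount key="high"/>
    <e:timeSlot key="T1"/>
  </textDesc>
</profileDesc>"""

print(read_profile(sample))
```

Selecting sub-collections by metadata, as described in the introduction, then reduces to filtering on the returned values.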
<pubPlace> (publication place) contains the name of the place where a bibliographic item was published. [3.12.2.4. Imprint, Size of a Document, and Reprint Information] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Member of | |
Contained by | core: bibl |
May contain | |
Example | <bibl type="firstEdition">
<title>Herança de lágrimas</title>
<author>Lopo de Souza</author>
<publisher>Redação do Vimaranense-Editora</publisher>
<pubPlace>Guimarães</pubPlace>
<date>1871</date>
</bibl> |
Content model | <content> <macroRef key="macro.phraseSeq"/> </content> |
Schema Declaration | element pubPlace { att.global.attributes, macro.phraseSeq } |
<publicationStmt> (publication statement) groups information concerning the publication or distribution of an electronic or other text. [2.2.4. Publication, Distribution, Licensing, etc. 2.2. The File Description] | |
Module | header |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Contained by | header: fileDesc |
May contain | core: date p publisher ref header: availability distributor |
Note | In a published ELTeC text, the publication statement always has the components shown in the example: a <publisher> naming the project itself, a <distributor> specifying the Zenodo community within which the text is made available, a <date> showing when the text was released, and an <availability> element indicating the licence under which it is made available. One or more <ref> elements may also follow, specifying a URL from which the text may be downloaded. These details are added during the publication process, if not already present. |
Example | <publicationStmt>
<publisher ref="https://distant-reading.net">COST Action "Distant Reading for European
Literary History" (CA16204)</publisher>
<distributor ref="https://zenodo.org/communities/eltec/">Zenodo.org</distributor>
<date when="{$today}"/>
<availability>
<licence target="https://creativecommons.org/licenses/by/4.0/"/>
</availability>
<ref type="doi"
target="10.5281/zenodo.8468"/>
<ref type="raw"
target="https://raw.githubusercontent.com/COST-ELTeC/ELTeC-eng/master/level1/ENG18440_Disraeli.xml"/>
</publicationStmt> |
Content model | <content> <alternate minOccurs="1" maxOccurs="1"> <elementRef key="p"/> <sequence minOccurs="1" maxOccurs="1"> <elementRef key="publisher"/> <elementRef key="distributor"/> <elementRef key="date"/> <elementRef key="availability"/> <elementRef key="ref" maxOccurs="unbounded" minOccurs="0"/> </sequence> </alternate> </content> |
Schema Declaration | element publicationStmt { att.global.attributes, ( p | ( publisher, distributor, date, availability, ref* ) ) } |
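The fixed sequence required for a published text (publisher, distributor, date, availability, then any number of ref elements) is easy to test for. This Python sketch checks only that sequence and does not cover the alternative single <p> form that the content model also allows; the function name `follows_published_pattern` and the date value in the sample are placeholders.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
# Order of the mandatory children, as given in the schema declaration above.
REQUIRED = ["publisher", "distributor", "date", "availability"]


def follows_published_pattern(stmt_xml):
    """True if the children are publisher, distributor, date, availability, ref*."""
    stmt = ET.fromstring(stmt_xml)
    names = [child.tag.replace(TEI, "") for child in stmt]
    head, tail = names[:len(REQUIRED)], names[len(REQUIRED):]
    return head == REQUIRED and all(name == "ref" for name in tail)


# The date below is a placeholder, not a real release date.
sample = """<publicationStmt xmlns="http://www.tei-c.org/ns/1.0">
  <publisher ref="https://distant-reading.net">COST Action "Distant Reading for European
    Literary History" (CA16204)</publisher>
  <distributor ref="https://zenodo.org/communities/eltec/">Zenodo.org</distributor>
  <date when="2020-01-01"/>
  <availability>
    <licence target="https://creativecommons.org/licenses/by/4.0/"/>
  </availability>
  <ref type="doi" target="10.5281/zenodo.8468"/>
</publicationStmt>"""

print(follows_published_pattern(sample))  # True
```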
<publisher> (publisher) provides the name of the organization responsible for the publication or distribution of a bibliographic item. [3.12.2.4. Imprint, Size of a Document, and Reprint Information 2.2.4. Publication, Distribution, Licensing, etc.] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.canonical (@ref) |
Member of | |
Contained by | core: bibl header: publicationStmt |
May contain | |
Note | Not permitted outside the header |
Example | <bibl type="firstEdition">
<title>La baronne trépassée</title>
<publisher>Baudry</publisher>
<pubPlace>Paris</pubPlace>
<date>1852</date>
</bibl> |
Content model | <content> <macroRef key="macro.phraseSeq"/> </content> |
Schema Declaration | element publisher { att.global.attributes, att.canonical.attributes, macro.phraseSeq } |
<quote> (quotation) contains a phrase or passage attributed by the narrator or author to some agency external to the text. [3.3.3. Quotation 4.3.1. Grouped Texts] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.typed (@type) att.notated (@notation) |
Member of | |
Contained by | |
May contain | |
Note | In ELTeC this element is used for any kind of quotation or pseudo quotation appearing in the body of a text, including epigraphs, citations, etc. |
Example | In this example, the two lines of verse are quoted and do not form part of the narrative: <p>О, многозначајне ли су речи покојног Његуша II:</p>
<quote>
<l>„Благо томе ко довијек живи,</l>
<l>имао се рашта и родити!...</l>
</quote> |
Content model | <content> <macroRef key="macro.specialPara"/> </content> |
Schema Declaration | element quote { att.global.attributes, att.typed.attributes, att.notated.attributes, macro.specialPara } |
<ref> (reference) defines a reference to another location, possibly modified by additional text or comment. [3.7. Simple Links and Cross-References 16.1. Links] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.pointing (@target) att.typed (@type) |
Member of | |
Contained by | |
May contain | |
Note | In ELTeC the <ref> element is used only to provide a link from the body of a text to an associated authorial note. Its content is conventionalised as shown and may be removed in a level2 version of the text. |
Example | <p>"May happen <ref target="#ENG18482_N21">[21]</ref> yo'd better take him, Alice;...</p> |
Content model | <content> <macroRef key="macro.paraContent"/> </content> |
Schema Declaration | element ref { att.global.attributes, att.pointing.attributes, att.typed.attributes, macro.paraContent } |
<reprintCount> indicates how frequently the title has been reprinted | |||||||||||
Namespace | http://distantreading.net/eltec/ns | ||||||||||
Module | derived-module-ELTeC | ||||||||||
Attributes | Attributes @key (required; one of high, low, unspecified) | ||||||||||
Contained by | corpus: textDesc | ||||||||||
May contain | Empty element | ||||||||||
Example | <textDesc
xmlns:e="http://distantreading.net/eltec/ns">
<!-- ... -->
<reprintCount xmlns="http://distantreading.net/eltec/ns" key="high"/>
<!-- ... -->
</textDesc> | ||||||||||
Content model | <content> <empty/> </content> | ||||||||||
Schema Declaration | element reprintCount { attribute key { "high" | "low" | "unspecified" }, empty } |
<resp> (responsibility) contains a phrase describing the nature of a person's intellectual responsibility, or an organization's role in the production or distribution of a work. [3.12.2.2. Titles, Authors, and Editors 2.2.1. The Title Statement 2.2.2. The Edition Statement 2.2.5. The Series Statement] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.canonical (@ref) att.datable (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) |
Contained by | core: respStmt |
May contain | |
Note | The attribute ref, inherited from the class att.canonical, may be used to indicate the kind of responsibility in a normalized form by referring directly to a standardized list of responsibility types, such as that maintained by a naming authority, for example the list maintained at http://www.loc.gov/marc/relators/relacode.html for bibliographic usage. |
Example | <respStmt>
<resp ref="http://id.loc.gov/vocabulary/relators/com.html">compiler</resp>
<name>Edward Child</name>
</respStmt> |
Content model | <content> <macroRef key="macro.phraseSeq.limited"/> </content> |
Schema Declaration | element resp { att.global.attributes, att.canonical.attributes, att.datable.attributes, macro.phraseSeq.limited } |
<respStmt> (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply. May also be used to encode information about individuals or organizations which have played a role in the production or distribution of a bibliographic work. [3.12.2.2. Titles, Authors, and Editors 2.2.1. The Title Statement 2.2.2. The Edition Statement 2.2.5. The Series Statement] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.canonical (@ref) |
Member of | |
Contained by | |
May contain | |
Note | In ELTeC this element is used either within the <titleStmt>, where it documents responsibility for some aspect of the ELTeC text's creation, or within a <bibl>, where it documents responsibility for the bibliographic item concerned (other than authorship). |
Example | <respStmt>
<resp>ELTeC conversion</resp>
<name>Leonard Konle</name>
</respStmt> |
Example | When several names are associated with the same responsibility, they may be grouped within a single <respStmt> as in the following example: <respStmt>
<resp>Original data capture</resp>
<name>Meredith Bach</name>
<name>Mary Meehan</name>
<name>Online Distributed Proofreading Team</name>
</respStmt> |
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <alternate minOccurs="1" maxOccurs="1"> <sequence minOccurs="1" maxOccurs="1"> <elementRef key="resp" minOccurs="1" maxOccurs="unbounded"/> <classRef key="model.nameLike.agent" minOccurs="1" maxOccurs="unbounded"/> </sequence> <sequence minOccurs="1" maxOccurs="1"> <classRef key="model.nameLike.agent" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="resp" minOccurs="1" maxOccurs="unbounded"/> </sequence> </alternate> <elementRef key="note" minOccurs="0" maxOccurs="unbounded"/> </sequence> </content> |
Schema Declaration | element respStmt { att.global.attributes, att.canonical.attributes, ( ( ( resp+, model.nameLike.agent+ ) | ( model.nameLike.agent+, resp+ ) ), note* ) } |
<revisionDesc> (revision description) summarizes the revision history for a file. [2.6. The Revision Description 2.1.1. The TEI Header and Its Components] | |
Module | header |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Contained by | header: teiHeader |
May contain | header: change |
Note | When several significant changes are recorded, each should be documented using a separate <change> element, given in reverse chronological order i.e. most recent first. |
Example | <revisionDesc>
<change when="2018-12-12">Spell check completed</change>
<change when="2018-11-01">Initial conversion to ELTeC-1 using CLIGStoELTeC stylesheet
</change>
</revisionDesc> |
Content model | <content> <alternate minOccurs="1" maxOccurs="1"> <elementRef key="list"/> <elementRef key="listChange"/> <elementRef key="change" minOccurs="1" maxOccurs="unbounded"/> </alternate> </content> |
Schema Declaration | element revisionDesc { att.global.attributes, ( list | listChange | change+ ) } |
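The recommendation that <change> elements be listed most recent first can be verified from their when attributes. A brief sketch, assuming that every when value is a full ISO date of the form YYYY-MM-DD (the function name `changes_newest_first` is hypothetical):

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def changes_newest_first(revision_xml):
    """True if the dated <change> elements appear in reverse chronological order."""
    rev = ET.fromstring(revision_xml)
    # Changes without @when are ignored rather than treated as errors.
    dates = [c.get("when") for c in rev.findall(TEI + "change") if c.get("when")]
    # Dates in the same YYYY-MM-DD format sort correctly as plain strings.
    return dates == sorted(dates, reverse=True)


sample = """<revisionDesc xmlns="http://www.tei-c.org/ns/1.0">
  <change when="2018-12-12">Spell check completed</change>
  <change when="2018-11-01">Initial conversion to ELTeC-1 using CLIGStoELTeC stylesheet</change>
</revisionDesc>"""

print(changes_newest_first(sample))  # True
```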
<rs> (referencing string) contains a general purpose name or referring string. [13.2.1. Personal Names 3.6.1. Referring Strings] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.typed (@type) |
Member of | |
Contained by | |
May contain | |
Example | <q>My dear <rs type="person">Mr. Bennet</rs>, </q> said <rs type="person">his lady</rs>
to him one day,
<q>have you heard that <rs type="place">Netherfield Park</rs> is let at
last?</q> |
Content model | <content> <macroRef key="macro.phraseSeq"/> </content> |
Schema Declaration | element rs { att.global.attributes, att.typed.attributes, macro.phraseSeq } |
<s> (s-unit) contains a sentence-like division of a text. [17.1. Linguistic Segment Categories 8.4.1. Segmentation] | |
Module | analysis |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.segLike (@function) att.typed (@type) att.notated (@notation) |
Member of | |
Contained by | |
May contain | |
Note | The <s> element may be used to mark orthographic sentences, or any other segmentation of a text, provided that the segmentation is end-to-end, complete, and non-nesting. For segmentation which is partial or recursive, the <seg> should be used instead. The type attribute may be used to indicate the type of segmentation intended, according to any convenient typology. |
Example | <s>
<w pos="DET">Here</w>
<w pos="AUX" join="left" lemma="be">'s</w>
<w pos="DET">a</w>
<emph>
<w pos="ADV">really</w>
<w pos="ADJ">silly</w>
</emph>
<w pos="NOUN">example</w>
<pc join="left">.</pc>
</s> |
Schematron |
<s:report test="tei:s">You may not nest one s element within
another: use seg instead</s:report> |
Content model | <content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="w"/> <elementRef key="pc"/> <classRef key="model.global"/> <classRef key="model.pPart.edit"/> <classRef key="model.limitedPhrase"/> </alternate> </content> |
Schema Declaration | element s { att.global.attributes, att.segLike.attributes, att.typed.attributes, att.notated.attributes, ( w | pc | model.global | model.pPart.edit | model.limitedPhrase )+ } |
<size> indicates the size group to which the text belongs | |||||||||
Namespace | http://distantreading.net/eltec/ns | ||||||||
Module | derived-module-ELTeC | ||||||||
Attributes | Attributes @key (one of long, medium, short)
| ||||||||
Contained by | corpus: textDesc | ||||||||
May contain | Empty element | ||||||||
Example | indicates that a novel contains more than 100,000 words (long) <textDesc
xmlns:e="http://distantreading.net/eltec/ns">
<!-- ... -->
<size xmlns="http://distantreading.net/eltec/ns" key="long"/>
<!-- ... -->
</textDesc> | ||||||||
Content model | <content> <empty/> </content> | ||||||||
Schema Declaration | element size { attribute key { "long" | "medium" | "short" }, empty } |
<sourceDesc> (source description) describes the source(s) from which an electronic text was derived or generated, typically a bibliographic description in the case of a digitized text, or a phrase such as "born digital" for a text which has no previous existence. [2.2.7. The Source Description] | |
Module | header |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Contained by | header: fileDesc |
May contain | |
Example | In ELTeC, the source or sources of a text are documented using one or more <bibl> elements of appropriate types. In this case, the ELTeC text derives from a digital version published by Éfélé in 2014 which is believed to be derived from a first edition published in Paris in 1848. <sourceDesc>
<bibl type="digitalSource">
<publisher> Éfélé</publisher>, <date>2014</date>
<ref target="http://efele.net/ebooks/livres/000067"/>
</bibl>
<bibl type="firstEdition">Paris: Furne, J.-J. Dubochet et Cie, J. Hetzel et Paulin,
<date>1848</date>. </bibl>
</sourceDesc> |
Content model | <content> <alternate minOccurs="1" maxOccurs="1"> <classRef key="model.pLike" minOccurs="1" maxOccurs="unbounded"/> <alternate minOccurs="1" maxOccurs="unbounded"> <classRef key="model.biblLike"/> <classRef key="model.sourceDescPart"/> <classRef key="model.listLike"/> </alternate> </alternate> </content> |
Schema Declaration | element sourceDesc { att.global.attributes, ( model.pLike+ | ( model.biblLike | model.sourceDescPart | model.listLike )+ ) } |
<span> associates an interpretative annotation directly with a span of text. [17.3. Spans and Interpretations] | |||||||||||||||||||||
Module | analysis | ||||||||||||||||||||
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.pointing (@target) att.interpLike (@inst)
| ||||||||||||||||||||
Member of | |||||||||||||||||||||
Contained by | |||||||||||||||||||||
May contain | |||||||||||||||||||||
Example | <p xml:id="para2">(The "aftermath" starts here)</p>
<p xml:id="para3">(The "aftermath" continues here)</p>
<p xml:id="para4">(The "aftermath" ends in this paragraph)</p>
<!-- ... -->
<span type="structure" from="#para2"
to="#para4">aftermath</span> | ||||||||||||||||||||
Schematron |
<s:report test="@from and @target">Only one of the attributes @target and @from may be supplied on <s:name/>
</s:report> | ||||||||||||||||||||
Schematron |
<s:report test="@to and @target">Only one of the attributes @target and @to may be supplied on <s:name/>
</s:report> | ||||||||||||||||||||
Schematron |
<s:report test="@to and not(@from)">If @to is supplied on <s:name/>, @from must be supplied as well</s:report> | ||||||||||||||||||||
Schematron |
<s:report test="contains(normalize-space(@to),' ') or contains(normalize-space(@from),' ')">The attributes @to and @from on <s:name/> may each contain only a single value</s:report> | ||||||||||||||||||||
Content model | <content> <macroRef key="macro.phraseSeq.limited"/> </content> | ||||||||||||||||||||
Schema Declaration | element span { att.global.attributes, att.interpLike.attribute.inst, att.pointing.attributes, attribute type { text }?, attribute from { text }?, attribute to { text }?, macro.phraseSeq.limited } |
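The four Schematron rules above constrain how @target, @from, and @to may be combined. The following sketch illustrates them; the identifier para7 and the annotation text are hypothetical, and the reported cases are shown inside comments so that the fragment itself stays acceptable.

```xml
<p xml:id="para7">(A digression occupies this paragraph)</p>
<!-- ... -->
<!-- Accepted: @target on its own identifies the annotated element -->
<span type="structure" target="#para7">digression</span>
<!-- Accepted: @from and @to together delimit a range -->
<span type="structure" from="#para2" to="#para4">aftermath</span>
<!-- Reported: @target combined with @from
     <span target="#para7" from="#para2">...</span> -->
<!-- Reported: @to supplied without @from
     <span to="#para4">...</span> -->
<!-- Reported: more than one value in @from
     <span from="#para2 #para3" to="#para4">...</span> -->
```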
<spanGrp> (span group) collects together span tags. [17.3. Spans and Interpretations] | |||||||||
Module | analysis | ||||||||
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.interpLike (@inst)
| ||||||||
Member of | |||||||||
Contained by | |||||||||
May contain | analysis: span | ||||||||
Example | <u xml:id="UU1">Can I have ten oranges and a kilo of bananas please?</u>
<u xml:id="UU2">Yes, anything else?</u>
<u xml:id="UU3">No thanks.</u>
<u xml:id="UU4">That'll be dollar forty.</u>
<u xml:id="UU5">Two dollars</u>
<u xml:id="UU6">Sixty, eighty, two dollars.
<anchor xml:id="UU6e"/>Thank you.<anchor xml:id="UU6f"/>
</u>
<spanGrp type="transactions">
<span from="#UU1">sale request</span>
<span from="#UU2" to="#UU3">sale compliance</span>
<span from="#UU4">sale</span>
<span from="#UU5" to="#UU6">purchase</span>
<span from="#UU6e" to="#UU6f">purchase closure</span>
</spanGrp> | ||||||||
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <classRef key="model.descLike" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="span" minOccurs="0" maxOccurs="unbounded"/> </sequence> </content> | ||||||||
Schema Declaration | element spanGrp { att.global.attributes, att.interpLike.attribute.inst, attribute type { text }?, ( model.descLike*, span* ) } |
<teiHeader> (TEI header) supplies descriptive and declarative metadata associated with a digital resource or set of resources. [2.1.1. The TEI Header and Its Components 15.1. Varieties of Composite Text] | |
Module | header |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Contained by | textstructure: TEI |
May contain | header: encodingDesc fileDesc profileDesc revisionDesc |
Note | One of the few elements unconditionally required in any TEI document. |
Example | <teiHeader>
<fileDesc>
<titleStmt>
<!-- information about the author and title -->
</titleStmt>
<extent>
<!-- information about the size of the text-->
</extent>
<publicationStmt>
<availability>
<!-- information about licensing and
publication of the ELTeC text-->
</availability>
</publicationStmt>
<sourceDesc>
<!-- information about the source(s) from which the
ELTeC text was derived -->
</sourceDesc>
</fileDesc>
<encodingDesc n="eltec-1">
<!-- indication of the encoding level -->
</encodingDesc>
<profileDesc>
<langUsage>
<!-- indication of the language -->
</langUsage>
<textDesc>
<!-- classification of the text according to the ELTeC
sampling criteria -->
</textDesc>
</profileDesc>
<revisionDesc>
<!-- Change log for the digital file -->
</revisionDesc>
</teiHeader> |
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <elementRef key="fileDesc"/> <elementRef key="encodingDesc"/> <elementRef key="profileDesc"/> <elementRef key="revisionDesc"/> </sequence> </content> |
Schema Declaration | element teiHeader { att.global.attributes, ( fileDesc, encodingDesc, profileDesc, revisionDesc ) } |
<term> (term) contains a single-word, multi-word, or symbolic designation which is regarded as a technical term. [3.4.1. Terms and Glosses] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.typed (@type) att.canonical (@ref) |
Contained by | header: keywords |
May contain | |
Note | In ELTeC this element is used only in the header, to specify a descriptive keyword for the text being documented. |
Example | <keywords xml:lang="en">
<term>silver fork</term>
<term>society</term>
</keywords> |
Schematron |
<s:assert test="child::* or child::text()[normalize-space()]"
role="ERROR">A <term> must
contain some text!</s:assert> |
Content model | <content> <macroRef key="macro.phraseSeq"/> </content> |
Schema Declaration | element term { att.global.attributes, att.typed.attributes, att.canonical.attributes, macro.phraseSeq } |
<text> (text) contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample. [4. Default Text Structure 15.1. Varieties of Composite Text] | |
Module | textstructure |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.typed (@type) |
Member of | |
Contained by | textstructure: TEI |
May contain | |
Example | <text>
<front>
<!-- front matter e.g. titlepage -->
</front>
<body>
<!-- body of the text -->
</body>
<back>
<!-- back matter e.g. notes-->
</back>
</text> |
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> <sequence minOccurs="0" maxOccurs="1"> <elementRef key="front"/> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> </sequence> <alternate minOccurs="1" maxOccurs="1"> <elementRef key="body"/> <elementRef key="group"/> </alternate> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> <sequence minOccurs="0" maxOccurs="1"> <elementRef key="back"/> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> </sequence> </sequence> </content> |
Schema Declaration | element text { att.global.attributes, att.typed.attributes, ( model.global*, ( front, model.global* )?, ( body | group ), model.global*, ( back, model.global* )? ) } |
<textClass> (text classification) groups information which describes the nature or topic of a text in terms of a standard classification scheme, thesaurus, etc. [2.4.3. The Text Classification] | |
Module | header |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Contained by | header: profileDesc |
May contain | header: keywords |
Example | <textClass>
<keywords>
<term xml:lang="eng">juvenile literature</term>
<term xml:lang="deu">bildungsroman</term>
</keywords>
</textClass> |
Content model | <content> <elementRef key="keywords"/> </content> |
Schema Declaration | element textClass { att.global.attributes, keywords } |
<textDesc> (text description) provides a description of a text in terms of its situational parameters. [15.2.1. The Text Description] | |
Module | corpus |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Contained by | header: profileDesc |
May contain | derived-module-ELTeC: authorGender canonicity reprintCount size timeSlot |
Note | The elements <authorGender>, <size>, <reprintCount>, and <timeSlot> are not TEI elements, and do not belong to the TEI namespace. Their namespace must be specified, either in full as in this example, or by means of a namespace prefix defined on some hierarchically superior element. Each of these four elements must be supplied exactly once, and in the order specified. |
Example | <textDesc
xmlns:e="http://distantreading.net/eltec/ns">
<authorGender xmlns="http://distantreading.net/eltec/ns" key="F"/>
<size xmlns="http://distantreading.net/eltec/ns" key="long"/>
<reprintCount xmlns="http://distantreading.net/eltec/ns" key="high"/>
<timeSlot xmlns="http://distantreading.net/eltec/ns" key="T2"/>
</textDesc> Profile for a text with a female author, containing over 100,000 words, of high reprintCount, first published between 1860 and 1879. |
Schematron |
<sch:report test="child::*:canonicity">The element formerly known as "canonicity" has now
been renamed "reprintCount"</sch:report> |
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <elementRef key="authorGender"/> <elementRef key="size"/> <alternate minOccurs="1" maxOccurs="1"> <elementRef key="canonicity"/> <elementRef key="reprintCount"/> </alternate> <elementRef key="timeSlot"/> </sequence> </content> |
Schema Declaration | element textDesc { att.global.attributes, ( authorGender, size, ( canonicity | reprintCount ), timeSlot ) } |
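The Note above allows the ELTeC namespace to be supplied through a prefix declared on a hierarchically superior element instead of being repeated in full on each child. A minimal sketch of that alternative, assuming the prefix e is declared on <textDesc> (the prefix name is arbitrary, and the key values are those of the example above):

```xml
<textDesc xmlns:e="http://distantreading.net/eltec/ns">
  <!-- Each child is bound to the ELTeC namespace through the e: prefix -->
  <e:authorGender key="F"/>
  <e:size key="long"/>
  <e:reprintCount key="high"/>
  <e:timeSlot key="T2"/>
</textDesc>
```

Under the XML namespaces rules, both notations bind the four elements to the same namespace, so the order and cardinality constraints apply to either form in the same way.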
<timeSlot> specifies the time period during which the work was first published as a single volume | |||||||||
Namespace | http://distantreading.net/eltec/ns | ||||||||
Module | derived-module-ELTeC | ||||||||
Attributes | Attributes @key (one of T1, T2, T3, T4)
| ||||||||
Contained by | corpus: textDesc | ||||||||
May contain | Empty element | ||||||||
Example | indicates that the novel described was first published between 1840 and 1859 (T1) <textDesc
xmlns:e="http://distantreading.net/eltec/ns">
<!-- ... -->
<timeSlot xmlns="http://distantreading.net/eltec/ns" key="T1"/>
<!-- ... -->
</textDesc> | ||||||||
Content model | <content> <empty/> </content> | ||||||||
Schema Declaration | element timeSlot { attribute key { "T1" | "T2" | "T3" | "T4" }, empty } |
<title> (title) contains a title for any kind of work. [3.12.2.2. Titles, Authors, and Editors 2.2.1. The Title Statement 2.2.5. The Series Statement] | |||||||||||
Module | core | ||||||||||
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.canonical (@ref)
| ||||||||||
Member of | |||||||||||
Contained by | |||||||||||
May contain | |||||||||||
Note | In ELTeC, this element is available only as metadata within the TEI header. | ||||||||||
Example | <titleStmt>
<title>Wuthering Heights : ELTeC edition</title>
<!-- ... -->
</titleStmt> | ||||||||||
Example | The ref attribute may optionally be used to reference an authority file entry for the title; in this case in VIAF <titleStmt>
<title ref="viaf:194763311">El Señor de Bembibre : edición ELTeC</title>
<!-- ... -->
</titleStmt> | ||||||||||
Schematron |
<s:assert test="child::* or child::text()[normalize-space()]"
role="ERROR">provide a title for each
novel followed by the phrase "ELTeC edition" (or a similar expression in the language of
the text)</s:assert> | ||||||||||
Content model | <content> <macroRef key="macro.paraContent"/> </content> | ||||||||||
Schema Declaration | element title { att.global.attributes, att.canonical.attributes, attribute level { "a" | "m" | "j" | "s" | "u" }?, macro.paraContent } |
<titleStmt> (title statement) groups information about the title of a work and those responsible for its content. [2.2.1. The Title Statement 2.2. The File Description] | |
Module | header |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) |
Contained by | header: fileDesc |
May contain | |
Example | <titleStmt>
<title>Silas Marner: The Weaver of Raveloe : ELTeC edition</title>
<author ref="viaf:89000553">Eliot, George (pseud.) (1819-1880)</author>
<respStmt>
<!-- ... -->
</respStmt>
</titleStmt> |
Example | <titleStmt>
<title ref="viaf:194763311">El Señor de Bembibre : edición ELTeC</title>
<author ref="viaf:27087132">Gil y Carrasco, Enrique (1815-1846)</author>
<respStmt>
<!-- ... -->
</respStmt>
</titleStmt> |
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <elementRef key="title" minOccurs="1" maxOccurs="unbounded"/> <classRef key="model.respLike" minOccurs="0" maxOccurs="unbounded"/> </sequence> </content> |
Schema Declaration | element titleStmt { att.global.attributes, ( title+, model.respLike* ) } |
<trailer> contains a closing title or footer appearing at the end of a division of a text. [4.2.4. Content of Textual Divisions 4.2. Elements Common to All Divisions] | |
Module | textstructure |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.typed (@type) |
Member of | |
Contained by | |
May contain | |
Example | <div type="volume" n="1">
<!-- more chapters here -->
<div type="chapter" n="23">
<!-- more paragraphs here -->
<p>.... and to think of the money it cost!</p>
</div>
<trailer>End of the first volume.</trailer>
</div> |
Content model | <content> <alternate minOccurs="0" maxOccurs="unbounded"> <textNode/> <elementRef key="lg"/> <classRef key="model.gLike"/> <classRef key="model.phrase"/> <classRef key="model.inter"/> <classRef key="model.lLike"/> <classRef key="model.global"/> </alternate> </content> |
Schema Declaration | element trailer { att.global.attributes, att.typed.attributes, ( text | lg | model.gLike | model.phrase | model.inter | model.lLike | model.global )* } |
<w> (word) represents a grammatical (not necessarily orthographic) word. [17.1. Linguistic Segment Categories 17.4.2. Lightweight Linguistic Annotation] | |||||||||
Module | analysis | ||||||||
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend)) att.segLike (@function) att.typed (@type) att.notated (@notation) att.linguistic (pos, @lemma, @lemmaRef, @msd, @join)
| ||||||||
Member of | |||||||||
Contained by | |||||||||
May contain | analysis: w character data | ||||||||
Example | <s>
<w pos="DET">Here</w>
<w pos="AUX" join="left" lemma="be">'s</w>
<w pos="DET">a</w>
<emph>
<w pos="ADV">really</w>
<w pos="ADJ">silly</w>
</emph>
<w pos="NOUN">example</w>
<w pos="PUNCT" join="left">.</w>
</s> | ||||||||
Example | <s>
<w pos="NOUN">Carte</w>
<w pos="ADP" lemma="des">
<w pos="PART">de</w>
<w pos="DET">les</w>
</w>
<w pos="NOUN">vins</w>
</s> | ||||||||
Content model | <content> <alternate minOccurs="0" maxOccurs="unbounded"> <textNode/> <classRef key="model.gLike"/> <elementRef key="w"/> </alternate> </content> | ||||||||
Schema Declaration | element w { att.global.attributes, att.segLike.attributes, att.typed.attributes, att.linguistic.attribute.lemma, att.linguistic.attribute.lemmaRef, att.linguistic.attribute.msd, att.linguistic.attribute.join, att.notated.attributes, attribute pos { text }?, ( text | model.gLike | w )* } |
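The examples above use only pos, lemma, and join, although the schema declaration also admits msd and lemmaRef. The following sketch shows msd carrying morphosyntactic features for one token of the second example. The feature notation and values are illustrative assumptions, not something this document prescribes.

```xml
<!-- msd holds a morphosyntactic description; the notation used here is an assumption -->
<w pos="NOUN" lemma="vin" msd="Gender=Masc|Number=Plur">vins</w>
```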
model.attributable groups elements that contain a word or phrase that can be attributed to a source. | |
Module | tei |
Used by | |
Members | model.quoteLike[quote] |
model.biblLike groups elements containing a bibliographic description. | |
Module | tei |
Used by | |
Members | bibl |
model.biblPart groups elements which represent components of a bibliographic description. | |
Module | tei |
Used by | |
Members | model.imprintPart[distributor pubPlace publisher] model.respLike[author respStmt] bibl extent idno title |
model.common groups common chunk- and inter-level elements. | |
Module | tei |
Used by | |
Members | model.divPart[model.lLike[l] model.pLike[p]] model.inter[model.attributable[model.quoteLike[quote]] model.egLike model.labelLike[label] model.listLike model.oddDecl model.stageLike] |
Note | This class defines the set of chunk- and inter-level elements; it is used in many content models, including those for textual divisions. |
model.dateLike groups elements containing temporal expressions. | |
Module | tei |
Used by | |
Members | date |
model.divBottom groups elements appearing at the end of a text division. | |
Module | tei |
Used by | |
Members | model.divBottomPart[trailer] model.divWrapper |
model.divBottomPart groups elements which can occur only at the end of a text division. | |
Module | tei |
Used by | |
Members | trailer |
model.divPart groups paragraph-level elements appearing directly within divisions. | |
Module | tei |
Used by | |
Members | model.lLike[l] model.pLike[p] |
Note | Note that this element class does not include members of the model.inter class, which can appear either within or between paragraph-level items. |
model.divTop groups elements appearing at the beginning of a text division. | |
Module | tei |
Used by | |
Members | model.divTopPart[model.headLike[head]] model.divWrapper |
model.divTopPart groups elements which can occur only at the beginning of a text division. | |
Module | tei |
Used by | |
Members | model.headLike[head] |
model.global groups elements which may appear at any point within a TEI text. | |
Module | tei |
Used by | |
Members | model.global.edit[gap] model.global.meta[span spanGrp] model.milestoneLike[milestone pb] model.noteLike[note] |
model.global.edit groups globally available elements which perform a specifically editorial function. | |
Module | tei |
Used by | |
Members | gap |
model.global.meta groups globally available elements which describe the status of other elements. | |
Module | tei |
Used by | |
Members | span spanGrp |
Note | Elements in this class are typically used to hold groups of links or of abstract interpretations, or to provide indications of certainty etc. It may be convenient to localize all metadata elements, for example to contain them within the same division as the elements that they relate to; or to locate them all to a division of their own. They may however appear at any point in a TEI text. |
model.headLike groups elements used to provide a title or heading at the start of a text division. | |
Module | tei |
Used by | |
Members | head |
model.hiLike groups phrase-level elements which are typographically distinct but to which no specific function can be attributed. | |
Module | tei |
Used by | |
Members | hi |
model.highlighted groups phrase-level elements which are typographically distinct. | |
Module | tei |
Used by | |
Members | model.emphLike[emph foreign title] model.hiLike[hi] |
model.imprintPart groups the bibliographic elements which occur inside imprints. | |
Module | tei |
Used by | |
Members | distributor pubPlace publisher |
model.inter groups elements which can appear either within or between paragraph-like elements. | |
Module | tei |
Used by | |
Members | model.attributable[model.quoteLike[quote]] model.egLike model.labelLike[label] model.listLike model.oddDecl model.stageLike |
model.lLike groups elements representing metrical components such as verse lines. | |
Module | tei |
Used by | |
Members | l |
model.labelLike groups elements used to gloss or explain other parts of a document. | |
Module | tei |
Used by | |
Members | label |
model.limitedPhrase groups phrase-level elements excluding those elements primarily intended for transcription of existing sources. | |
Module | tei |
Used by | |
Members | model.emphLike[emph foreign title] model.hiLike[hi] model.pPart.data[model.addressLike model.dateLike[date] model.measureLike model.nameLike[model.offsetLike model.placeStateLike[model.placeNamePart] rs]] model.pPart.editorial model.pPart.msdesc model.phrase.xml model.ptrLike[ref] |
model.nameLike groups elements which name or refer to a person, place, or organization. | |
Module | tei |
Used by | |
Members | model.offsetLike model.placeStateLike[model.placeNamePart] rs |
Note | A superset of the naming elements that may appear in datelines, addresses, statements of responsibility, etc. |
model.noteLike groups globally-available note-like elements. | |
Module | tei |
Used by | |
Members | note |
model.pLike groups paragraph-like elements. | |
Module | tei |
Used by | |
Members | p |
model.pPart.data groups phrase-level elements containing names, dates, numbers, measures, and similar data. | |
Module | tei |
Used by | |
Members | model.addressLike model.dateLike[date] model.measureLike model.nameLike[model.offsetLike model.placeStateLike[model.placeNamePart] rs] |
model.pPart.edit groups phrase-level elements for simple editorial correction and transcription. | |
Module | tei |
Used by | |
Members | model.pPart.editorial model.pPart.transcriptional[corr] |
model.pPart.transcriptional groups phrase-level elements used for editorial transcription of pre-existing source materials. | |
Module | tei |
Used by | |
Members | corr |
model.phrase groups elements which can occur at the level of individual words or phrases. | |
Module | tei |
Used by | |
Members | model.graphicLike model.highlighted[model.emphLike[emph foreign title] model.hiLike[hi]] model.lPart model.pPart.data[model.addressLike model.dateLike[date] model.measureLike model.nameLike[model.offsetLike model.placeStateLike[model.placeNamePart] rs]] model.pPart.edit[model.pPart.editorial model.pPart.transcriptional[corr]] model.pPart.msdesc model.phrase.xml model.ptrLike[ref] model.segLike[pc s w] model.specDescLike |
Note | This class of elements can occur within paragraphs, list items, lines of verse, etc. |
model.placeStateLike groups elements which describe changing states of a place. | |
Module | tei |
Used by | |
Members | model.placeNamePart |
model.ptrLike groups elements used for purposes of location and reference. | |
Module | tei |
Used by | |
Members | ref |
model.quoteLike groups elements used to directly contain quotations. | |
Module | tei |
Used by | |
Members | quote |
model.segLike groups elements used for arbitrary segmentation. | |
Module | tei |
Used by | |
Members | pc s w |
Note | The principles on which segmentation is carried out, and any special codes or attribute values used, should be defined explicitly in the <segmentation> element of the <encodingDesc> within the associated TEI header. |
att.canonical provides attributes which can be used to associate a representation such as a name or title with canonical information about the object being named or referenced. | |||||||||||
Module | tei | ||||||||||
Members | author date distributor publisher resp respStmt term title | ||||||||||
Attributes | Attributes
|
att.datable provides attributes for normalization of elements that contain dates, times, or datable events. | |
Module | tei |
Members | change date name resp |
Attributes | Attributes att.datable.w3c (@when, @notBefore, @notAfter, @from, @to) |
Note | This ‘superclass’ provides attributes that can be used to provide normalized values of temporal information. By default, the attributes from the att.datable.w3c class are provided. If the module for names & dates is loaded, this class also provides attributes from the att.datable.iso and att.datable.custom classes. In general, the possible values of attributes restricted to the W3C datatypes form a subset of those values available via the ISO 8601 standard. However, the greater expressiveness of the ISO datatypes may not be needed, and there exists much greater software support for the W3C datatypes. |
att.datable.w3c provides attributes for normalization of elements that contain datable events conforming to the W3C XML Schema Part 2: Datatypes Second Edition. | |||||||||||||||||||||||||||||||||||||
Module | tei | ||||||||||||||||||||||||||||||||||||
Members | att.datable[change date name resp] | ||||||||||||||||||||||||||||||||||||
Attributes | Attributes
| ||||||||||||||||||||||||||||||||||||
Schematron |
<sch:rule context="tei:*[@when]">
<sch:report test="@notBefore|@notAfter|@from|@to"
role="nonfatal">The @when attribute cannot be used with any other att.datable.w3c attributes.</sch:report>
</sch:rule> | ||||||||||||||||||||||||||||||||||||
Schematron |
<sch:rule context="tei:*[@from]">
<sch:report test="@notBefore"
role="nonfatal">The @from and @notBefore attributes cannot be used together.</sch:report>
</sch:rule> | ||||||||||||||||||||||||||||||||||||
Schematron |
<sch:rule context="tei:*[@to]">
<sch:report test="@notAfter"
role="nonfatal">The @to and @notAfter attributes cannot be used together.</sch:report>
</sch:rule> | ||||||||||||||||||||||||||||||||||||
Example | <date from="1863-05-28" to="1863-06-01">28 May through 1 June 1863</date> | ||||||||||||||||||||||||||||||||||||
Note | The value of these attributes should be a normalized representation of the date, time, or combined date & time intended, in any of the standard formats specified by XML Schema Part 2: Datatypes Second Edition, using the Gregorian calendar. The most commonly-encountered format for the date portion of a temporal attribute is yyyy-mm-dd, but yyyy, --mm, ---dd, yyyy-mm, or --mm-dd may also be used. For the time part, the form is hh:mm:ss. Note that this format does not currently permit use of the value 0000 to represent the year 1 BCE; instead the value -0001 should be used. |
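The three Schematron rules for this class limit which attributes may appear together. A short sketch of accepted and reported combinations on <date>; the years and descriptive text in the first two cases are illustrative only, and the reported cases are kept inside comments.

```xml
<!-- Accepted: a single point in time -->
<date when="1848">1848</date>
<!-- Accepted: earliest and latest possible dates -->
<date notBefore="1840" notAfter="1859">between 1840 and 1859</date>
<!-- Accepted: a period with a known start and end -->
<date from="1863-05-28" to="1863-06-01">28 May through 1 June 1863</date>
<!-- Reported (nonfatal): @when combined with another attribute of the class
     <date when="1848" notAfter="1859">...</date> -->
<!-- Reported (nonfatal): @from combined with @notBefore
     <date from="1840" notBefore="1840">...</date> -->
```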
att.dimensions provides attributes for describing the size of physical objects. | |||||||||||||||||||||||||
Module | tei | ||||||||||||||||||||||||
Members | date gap | ||||||||||||||||||||||||
Attributes | Attributes
|
att.global provides attributes common to all elements in the TEI encoding scheme. | |||||||||||||||||||||||||||||||||||||||||||||
Module | tei | ||||||||||||||||||||||||||||||||||||||||||||
Members | TEI author availability back bibl body change corr date distributor div emph encodingDesc extent fileDesc foreign front gap head hi keywords l label langUsage language licence measure milestone name note p pb pc profileDesc pubPlace publicationStmt publisher quote ref resp respStmt revisionDesc rs s sourceDesc span spanGrp teiHeader term text textClass textDesc title titleStmt trailer w | ||||||||||||||||||||||||||||||||||||||||||||
Attributes | Attributes att.global.rendition (@rend)
|
att.global.rendition provides rendering attributes common to all elements in the TEI encoding scheme. | |||||||||||
Module | tei | ||||||||||
Members | att.global[TEI author availability back bibl body change corr date distributor div emph encodingDesc extent fileDesc foreign front gap head hi keywords l label langUsage language licence measure milestone name note p pb pc profileDesc pubPlace publicationStmt publisher quote ref resp respStmt revisionDesc rs s sourceDesc span spanGrp teiHeader term text textClass textDesc title titleStmt trailer w] | ||||||||||
Attributes | Attributes
|
att.linguistic provides a set of attributes concerning linguistic features of tokens, for usage within token-level elements, specifically <w> and <pc> in the analysis module. | |||||||||||||||||||||||||||||||||||||||||||||||||||
Module | analysis | ||||||||||||||||||||||||||||||||||||||||||||||||||
Members | pc w | ||||||||||||||||||||||||||||||||||||||||||||||||||
Attributes | Attributes
| ||||||||||||||||||||||||||||||||||||||||||||||||||
Note | These attributes make it possible to encode simple language corpora and to add a layer of linguistic information to any tokenized resource. See section 17.4.2. Lightweight Linguistic Annotation for discussion. |
att.milestoneUnit provides an attribute to indicate the type of section which is changing at a specific milestone. | |||||||||||||
Module | core | ||||||||||||
Members | milestone | ||||||||||||
Attributes | Attributes
|
att.notated provides an attribute to indicate any specialised notation used for element content. | |||||||
Module | tei | ||||||
Members | quote s w | ||||||
Attributes | Attributes
|
att.pointing provides a set of attributes used by all elements which point to other elements by means of one or more URI references. | |||||||||
Module | tei | ||||||||
Members | licence note ref span | ||||||||
Attributes | Attributes
|
att.segLike provides attributes for elements used for arbitrary segmentation. | |||||||||
Module | tei | ||||||||
Members | pc s w | ||||||||
Attributes | Attributes
|
att.sortable provides attributes for elements in lists or groups that are sortable, but whose sorting key cannot be derived mechanically from the element content. | |||||||||||
Module | tei | ||||||||||
Members | bibl | ||||||||||
Attributes | Attributes
|
att.typed provides attributes which can be used to classify or subclassify elements in any way. | |
Module | tei |
Members | att.interpLike[span spanGrp] TEI bibl change corr date div head idno label milestone name note pb pc quote ref rs s term text trailer w |
Attributes | Attributes
|
Schematron |
<sch:rule context="tei:*[@subtype]">
  <sch:assert test="@type">The <sch:name/> element should not be categorized in detail with @subtype unless also categorized in general with @type</sch:assert>
</sch:rule> |
Note | When appropriate, values from an established typology should be used. Alternatively a typology may be defined in the associated TEI header. If values are to be taken from a project-specific list, this should be defined using the <valList> element in the project-specific schema description, as described in 23.3.1.3. Modification of Attribute and Attribute Value Lists . |
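The Schematron rule for att.typed can be approximated outside a Schematron processor. The following is a minimal Python sketch, not part of the ELTeC tooling: the function name `subtype_without_type` and the attribute values `chapter` and `epilogue` are hypothetical and chosen only for illustration. It reports every element in the TEI namespace that carries @subtype without @type.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def subtype_without_type(xml_text: str) -> list[str]:
    """Return the local names of TEI elements that have @subtype but no @type."""
    root = ET.fromstring(xml_text)
    offenders = []
    for el in root.iter():
        # The rule context is tei:*, so only TEI-namespace elements are checked.
        if not isinstance(el.tag, str) or not el.tag.startswith("{" + TEI_NS + "}"):
            continue
        if "subtype" in el.attrib and "type" not in el.attrib:
            offenders.append(el.tag.split("}", 1)[1])
    return offenders


# Illustrative sample; the type and subtype values are assumptions.
sample = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <div type="chapter" subtype="epilogue"><p>accepted</p></div>
    <div subtype="epilogue"><p>flagged</p></div>
  </body>
</text>"""

print(subtype_without_type(sample))
```

Only the second `<div>` is reported, because it is subcategorized without being categorized.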
macro.paraContent (paragraph content) defines the content of paragraphs and similar elements. | |
Module | tei |
Used by | |
Content model | <content> <alternate minOccurs="0" maxOccurs="unbounded"> <textNode/> <classRef key="model.gLike"/> <classRef key="model.phrase"/> <classRef key="model.inter"/> <classRef key="model.global"/> <elementRef key="lg"/> <classRef key="model.lLike"/> </alternate> </content> |
Declaration | macro.paraContent = ( text | model.gLike | model.phrase | model.inter | model.global | lg | model.lLike )* |
macro.phraseSeq (phrase sequence) defines a sequence of character data and phrase-level elements. | |
Module | tei |
Used by | |
Content model | <content> <alternate minOccurs="0" maxOccurs="unbounded"> <textNode/> <classRef key="model.gLike"/> <classRef key="model.attributable"/> <classRef key="model.phrase"/> <classRef key="model.global"/> </alternate> </content> |
Declaration | macro.phraseSeq = ( text | model.gLike | model.attributable | model.phrase | model.global )* |
macro.phraseSeq.limited (limited phrase sequence) defines a sequence of character data and those phrase-level elements that are not typically used for transcribing extant documents. | |
Module | tei |
Used by | |
Content model | <content> <alternate minOccurs="0" maxOccurs="unbounded"> <textNode/> <classRef key="model.limitedPhrase"/> <classRef key="model.global"/> </alternate> </content> |
Declaration | macro.phraseSeq.limited = ( text | model.limitedPhrase | model.global )* |
macro.specialPara ('special' paragraph content) defines the content model of elements such as notes or list items, which either contain a series of component-level elements or else have the same structure as a paragraph, containing a series of phrase-level and inter-level elements. | |
Module | tei |
Used by | |
Content model | <content> <alternate minOccurs="0" maxOccurs="unbounded"> <textNode/> <classRef key="model.gLike"/> <classRef key="model.phrase"/> <classRef key="model.inter"/> <classRef key="model.divPart"/> <classRef key="model.global"/> </alternate> </content> |
Declaration | macro.specialPara = ( text | model.gLike | model.phrase | model.inter | model.divPart | model.global )* |
teidata.certainty defines the range of attribute values expressing a degree of certainty. | |
Module | tei |
Used by | |
Content model | <content> <valList type="closed"> <valItem ident="high"/> <valItem ident="medium"/> <valItem ident="low"/> <valItem ident="unknown"/> </valList> </content> |
Declaration | teidata.certainty = "high" | "medium" | "low" | "unknown" |
Note | Certainty may be expressed by one of the predefined symbolic values high, medium, or low. The value unknown should be used in cases where the encoder does not wish to assert an opinion about the matter. |
teidata.enumerated defines the range of attribute values expressed as a single XML name taken from a list of documented possibilities. | |
Module | tei |
Used by | Element:
|
Content model | <content> <dataRef key="teidata.word"/> </content> |
Declaration | teidata.enumerated = teidata.word |
Note | Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace. Typically, the list of documented possibilities will be provided (or exemplified) by a value list in the associated attribute specification, expressed with a <valList> element. |
teidata.language defines the range of attribute values used to identify a particular combination of human language and writing system. | |
Module | tei |
Used by | |
Content model | <content> <alternate minOccurs="1" maxOccurs="1"> <dataRef name="language"/> <valList> <valItem ident=""/> </valList> </alternate> </content> |
Declaration | teidata.language = xsd:language | ( "" ) |
Note | The values for this attribute are language ‘tags’ as defined in BCP 47. Currently BCP 47 comprises RFC 5646 and RFC 4647; over time, other IETF documents may succeed these as the best current practice. A ‘language tag’, per BCP 47, is assembled from a sequence of components or subtags separated by the hyphen character (-, U+002D). The tag is made of the following subtags, in the following order: language, script, region, variant, extension, and private use. Every subtag except the first is optional. If present, each occurs only once, except the fourth and fifth components (variant and extension), which are repeatable.
There are two exceptions to the above format. First, there are language tags in the IANA registry that do not match the above syntax, but are present because they have been ‘grandfathered’ from previous specifications. Second, an entire language tag can consist of only a private use subtag. These tags start with x-.
The W3C Internationalization Activity has published a useful introduction to BCP 47, Language tags in HTML and XML. |
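The subtag order defined by BCP 47 (language, script, region, variant, extension, private use) can be illustrated with a simplified Python sketch. It is an approximation under stated assumptions: extended language subtags and grandfathered tags are ignored, subtags are not checked against the IANA registry, and the names `LANG_TAG`, `PRIVATE_ONLY` and `split_language_tag` are hypothetical.

```python
import re

# Simplified shape: language[-script][-region](-variant)*(-extension)*[-privateuse]
LANG_TAG = re.compile(
    r"""(?P<language>[A-Za-z]{2,3})
        (?:-(?P<script>[A-Za-z]{4}))?
        (?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?
        (?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)
        (?P<extensions>(?:-[0-9A-WY-Za-wy-z](?:-[A-Za-z0-9]{2,8})+)*)
        (?:-(?P<privateuse>[xX](?:-[A-Za-z0-9]{1,8})+))?""",
    re.VERBOSE,
)
# A whole tag may consist of a private use subtag only.
PRIVATE_ONLY = re.compile(r"[xX](?:-[A-Za-z0-9]{1,8})+")


def split_language_tag(tag: str):
    """Split a tag into its subtags; return None if the shape is not recognised."""
    if tag == "":
        return {}  # the declaration of teidata.language also allows ""
    if PRIVATE_ONLY.fullmatch(tag):
        return {"privateuse": tag}
    m = LANG_TAG.fullmatch(tag)
    if m is None:
        return None
    parts = {"language": m["language"]}
    for key in ("script", "region", "privateuse"):
        if m[key]:
            parts[key] = m[key]
    if m["variants"]:
        parts["variants"] = m["variants"].strip("-").split("-")
    if m["extensions"]:
        parts["extensions"] = m["extensions"].strip("-")
    return parts


for tag in ["en", "sr-Latn-RS", "de-CH-1901", "x-eltec", "english"]:
    print(tag, split_language_tag(tag))
```

A production check would rely on xsd:language validation plus the IANA Language Subtag Registry rather than on this pattern alone.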
teidata.numeric defines the range of attribute values used for numeric values. | |
Module | tei |
Used by | |
Content model | <content rend="replace"> <dataRef name="token" restriction="([\d]+)"/> </content> |
Declaration | teidata.numeric = token { pattern = "([\d]+)" } |
Note | We restrict all numeric data to non-negative integer values only: the pattern admits digits alone, so signs, decimal points, and exponents are excluded. |
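The effect of the pattern in the declaration of teidata.numeric can be checked with a short Python sketch. The helper name `is_teidata_numeric` is hypothetical; the regular expression is the one given in the declaration, applied to the whole value.

```python
import re

# Same pattern as in the declaration: one or more digits, nothing else.
NUMERIC = re.compile(r"[\d]+")


def is_teidata_numeric(value: str) -> bool:
    """True if the whole value consists of digits only."""
    return NUMERIC.fullmatch(value) is not None


for candidate in ["1840", "0", "007", "-5", "3.14", "1e3", ""]:
    print(repr(candidate), is_teidata_numeric(candidate))
```

The values `1840`, `0` and `007` are accepted; the signed, decimal, exponent and empty forms are rejected.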
teidata.pointer defines the range of attribute values used to provide a single URI, absolute or relative, pointing to some other resource, either within the current document or elsewhere. | |
Module | tei |
Used by | |
Content model | <content> <dataRef name="anyURI"/> </content> |
Declaration | teidata.pointer = xsd:anyURI |
Note | The range of syntactically valid values is defined by RFC 3986 Uniform Resource Identifier (URI): Generic Syntax. Note that the values themselves are encoded using RFC 3987 Internationalized Resource Identifiers (IRIs) mapping to URIs. |
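The mapping from an IRI to a URI can be sketched in Python for the path component, where non-ASCII characters are encoded as UTF-8 and then percent-escaped. This is a partial illustration only: the function name `iri_path_to_uri` and the example.org address are hypothetical, and host names, queries and fragments are passed through unchanged (internationalized host names need separate IDNA handling).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_path_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters in the path of an IRI (UTF-8 based)."""
    parts = urlsplit(iri)
    # '%' stays in the safe set so that existing escapes are not encoded twice.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(iri_path_to_uri("https://example.org/romans/Germinal_édition.xml#ch1"))
print(iri_path_to_uri("#note1"))
```

A relative pointer such as `#note1`, which refers to an element inside the current document, is returned unchanged.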
teidata.probability defines the range of attribute values expressing a probability. | |
Module | tei |
Used by | |
Content model | <content> <dataRef name="double"/> </content> |
Declaration | teidata.probability = xsd:double |
Note | Probability is expressed as a real number between 0 and 1; 0 representing certainly false and 1 representing certainly true. |
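The declaration of teidata.probability (xsd:double) does not itself enforce the range given in the note, so a value such as 1.5 would pass schema validation. A minimal Python sketch of the additional range check follows; the helper name `is_probability` is hypothetical, and the special xsd:double values INF and NaN are deliberately rejected.

```python
import re

# Plain decimal or scientific notation; INF and NaN are excluded on purpose.
_DECIMAL = re.compile(r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?")


def is_probability(value: str) -> bool:
    """True if value is a decimal number between 0 and 1 inclusive."""
    if _DECIMAL.fullmatch(value) is None:
        return False
    return 0.0 <= float(value) <= 1.0


for candidate in ["0.75", "1", "5e-1", "1.5", "-0.1", "NaN", "high"]:
    print(repr(candidate), is_probability(candidate))
```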
teidata.temporal.w3c defines the range of attribute values expressing a temporal expression such as a date, a time, or a combination of them, that conform to the W3C XML Schema Part 2: Datatypes Second Edition specification. | |
Module | tei |
Used by | Element:
|
Content model | <content> <alternate minOccurs="1" maxOccurs="1"> <dataRef name="date"/> <dataRef name="gYear"/> <dataRef name="gMonth"/> <dataRef name="gDay"/> <dataRef name="gYearMonth"/> <dataRef name="gMonthDay"/> <dataRef name="time"/> <dataRef name="dateTime"/> </alternate> </content> |
Declaration | teidata.temporal.w3c = xsd:date | xsd:gYear | xsd:gMonth | xsd:gDay | xsd:gYearMonth | xsd:gMonthDay | xsd:time | xsd:dateTime |
Note | If it is likely that the value used is to be compared with another, then a time zone indicator should always be included, and only the dateTime representation should be used. |
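The eight lexical forms admitted by teidata.temporal.w3c can be told apart by their shape. The Python sketch below is a simplification: it checks only the layout of digits and separators, not value ranges (a month of 13 would pass), and the names `FORMS` and `w3c_form` are hypothetical.

```python
import re

# Optional time zone indicator: Z or a signed hh:mm offset.
TZ = r"(?:Z|[+-][0-9]{2}:[0-9]{2})?"
YEAR = r"-?[0-9]{4,}"
TIME = r"[0-9]{2}:[0-9]{2}:[0-9]{2}(?:\.[0-9]+)?"

FORMS = {
    "date": YEAR + r"-[0-9]{2}-[0-9]{2}" + TZ,
    "gYear": YEAR + TZ,
    "gYearMonth": YEAR + r"-[0-9]{2}" + TZ,
    "gMonth": r"--[0-9]{2}" + TZ,
    "gDay": r"---[0-9]{2}" + TZ,
    "gMonthDay": r"--[0-9]{2}-[0-9]{2}" + TZ,
    "time": TIME + TZ,
    "dateTime": YEAR + r"-[0-9]{2}-[0-9]{2}T" + TIME + TZ,
}


def w3c_form(value: str):
    """Return the name of the matching W3C lexical form, or None."""
    for name, pattern in FORMS.items():
        if re.fullmatch(pattern, value):
            return name
    return None


for candidate in ["1840", "1920-11", "1857-04-15", "1857-04-15T10:30:00Z", "15 April 1857"]:
    print(repr(candidate), w3c_form(candidate))
```

Only the dateTime form with a time zone indicator, as in the fourth value, satisfies the recommendation for values that are to be compared.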
teidata.text defines the range of attribute values used to express some kind of identifying string as a single sequence of Unicode characters possibly including whitespace. | |
Module | tei |
Used by | |
Content model | <content> <dataRef name="string"/> </content> |
Declaration | teidata.text = string |
Note | Attributes using this datatype must contain a single ‘token’ in which whitespace and other punctuation characters are permitted. |
teidata.truthValue defines the range of attribute values used to express a truth value. | |
Module | tei |
Used by | Element:
|
Content model | <content> <dataRef name="boolean"/> </content> |
Declaration | teidata.truthValue = xsd:boolean |
Note | The possible values of this datatype are 1 or true, or 0 or false. This datatype applies only for cases where uncertainty is inappropriate; if the attribute concerned may have a value other than true or false, e.g. unknown, or inapplicable, it should have the extended version of this datatype: teidata.xTruthValue. |
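The four lexical forms of xsd:boolean map onto two truth values. A minimal Python sketch of that mapping follows; the function name `parse_truth_value` is hypothetical. The forms are case-sensitive, so `TRUE` is not accepted.

```python
_XSD_BOOLEAN = {"true": True, "1": True, "false": False, "0": False}


def parse_truth_value(value: str) -> bool:
    """Map an xsd:boolean lexical form to a Python bool; raise ValueError otherwise."""
    # Surrounding XML whitespace is dropped before the lookup.
    key = value.strip(" \t\r\n")
    if key not in _XSD_BOOLEAN:
        raise ValueError(f"not an xsd:boolean lexical form: {value!r}")
    return _XSD_BOOLEAN[key]


print(parse_truth_value("1"), parse_truth_value("false"))
```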
teidata.word defines the range of attribute values expressed as a single word or token. | |
Module | tei |
Used by | |
Content model | <content> <dataRef name="token" restriction="[^\p{C}\p{Z}]+"/> </content> |
Declaration | teidata.word = token { pattern = "[^\p{C}\p{Z}]+" } |
Note | Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace. |
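The pattern `[^\p{C}\p{Z}]+` excludes every character in the Unicode "Other" (C) and "Separator" (Z) categories. Python's `re` module does not support `\p{...}` classes, so the sketch below approximates the pattern with `unicodedata.category`; the helper name `is_teidata_word` is hypothetical. Because teidata.enumerated is declared as teidata.word, the same check applies to it.

```python
import unicodedata


def is_teidata_word(value: str) -> bool:
    """True if value is non-empty and has no control or separator characters."""
    if not value:
        return False
    # General categories starting with C (Other) or Z (Separator) are excluded.
    return all(unicodedata.category(ch)[0] not in ("C", "Z") for ch in value)


for candidate in ["chapter", "T-low", "first edition", "no\u00a0break", "tab\there", ""]:
    print(repr(candidate), is_teidata_word(candidate))
```

The no-break space (U+00A0) is rejected as well as the ordinary space, since both belong to the Separator category.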