Distant Reading; Institute for Documentology and Scholarly Editing; Centre for Information Modelling - Austrian Centre for Digital Humanities, University of Graz

An Introduction to Markup, XML, and TEI

Martina Scholger

Centre for Information Modelling - Austrian Centre for Digital Humanities, University of Graz

Sahle's text wheel

© IDE

Where is the text?

From print to ELTeC edition

Left: Fanny Lewald, Clementine; Right: Fanny Lewald, ELTeC edition

What is this?

Lennä, lennä, leppäkerttu,
ison kiven juureen.
Lennä leikkikedon kautta
unipuuhuun suureen.
Kulta-kultalehden alla
äiti puuron keittää.
Unituutu leppäkertun
lämpimästi peittää.
Laula, laula, unilintu,
tuosku, tuomenterttu.
Nuku, punapaitulainen,
pikku leppäkerttu.

What does markup do?

Types of markup

Descriptive markup

Markup as a scholarly activity

Separation of form and content

Separation of form and content

Example from Letters 1916 (http://letters1916.maynoothuniversity.ie/item/1261)

XML – Extensible Markup Language

XML representation

© Women Writers Project

XML elements

XML attributes

More details on XML

Well-formed XML documents

XML Terminology

© James Cummings

Test yourself!

Which of the following examples are well-formed?

 <name>Fanny Lewald</name> 
<persName><forename>Fanny</forename><surname>Lewald</surname></persName> 
<persName><forename>Fanny<surname></forename>Lewald</surname></persName> 
  <name type="person">Fanny Lewald</name> 

Test yourself!

Which of the following examples are well-formed?

 <name type=person>Fanny Lewald</name> 
 <name type="person">Fanny Lewald</Name> 
 <1name type="person">Fanny Lewald</1name> 
 <name>Fanny Lewald<person/></name> 
 <name type="person" type="writer">Fanny Lewald</name> 
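One way to check your answers to both quizzes is to hand each fragment to an XML parser: a conforming parser must reject anything that is not well-formed. The following is a minimal sketch using Python's standard library; the helper name is_well_formed is an illustrative assumption, not part of the workshop materials, and the check says nothing about validity against a schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as one well-formed XML element."""
    try:
        ET.fromstring(fragment.strip())
        return True
    except ET.ParseError:
        # The parser stops at the first well-formedness error it meets.
        return False


# The nine fragments from the two "Test yourself!" slides, in order.
examples = [
    '<name>Fanny Lewald</name>',
    '<persName><forename>Fanny</forename><surname>Lewald</surname></persName>',
    '<persName><forename>Fanny<surname></forename>Lewald</surname></persName>',
    '<name type="person">Fanny Lewald</name>',
    '<name type=person>Fanny Lewald</name>',
    '<name type="person">Fanny Lewald</Name>',
    '<1name type="person">Fanny Lewald</1name>',
    '<name>Fanny Lewald<person/></name>',
    '<name type="person" type="writer">Fanny Lewald</name>',
]

results = [is_well_formed(fragment) for fragment in examples]

for fragment, ok in zip(examples, results):
    label = "well-formed    " if ok else "NOT well-formed"
    print(label, fragment)
```

Running the script prints one verdict per fragment, so you can compare it with your own answers before looking at the explanation for each error.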

Validity means

Namespaces

Exercise 1

Shakespeare's Sonnet 18

oXygen XML Editor

Exercise 1: Mark up a poem in XML

Exercise 1: Mark up a poem in XML

Useful shortcuts in oXygen

Mac                        PC                      Description
Command + N                CTRL + N                open a new document
Command + S                CTRL + S                save document
Command + E                CTRL + E                encloses selected content in a tag
Command + Shift + Comma    CTRL + Shift + Comma    surrounds selected content in a comment
Command + Shift + P        CTRL + Shift + P        formats the document (pretty code)
Command + Shift + V        CTRL + Shift + V        validates your document
Command + Shift + W        CTRL + Shift + W        checks if your document is well-formed

What kind of document can the TEI cope with?

books, journals, manuscripts, letters, rolls of papyrus, coins, notebooks, postcards, inscription tablets, web pages, etc.

Dramatic text

Shakespeare's First Folio, The Tragedie of Hamlet

Diaries and letters

Left: William Godwin's Diary; Right: Letter from Richard Quiney to William Shakespeare

Dictionaries

Oxford English Dictionary

Medieval calendar and account books

Left: record of daily menu of a monastery; Right: Account books of Basel

Medieval manuscripts (witnesses)

Saint Patrick's Confessio

Inscriptions (squeeze) and seals

Left: Epigraphic collection; Right: Seals of Salzburg Bishops

For the encoding of epigraphic documents see EpiDoc

Poems

Left: Peace and War by Guillaume Apollinaire; Right: William Shakespeare's Sonnets

Postcards

Visual Archive of Southeastern Europe

Print and online journals

Left: Spectators; Right: RIDE: A review journal for digital editions and resources

Authorial manuscripts

Left: Shelley-Godwin Archive; Right: Notebooks Hartmut Skerbisch

What is the TEI?

What does the TEI offer?

The TEI Community

The modules

Reading the TEI Guidelines

The elements

TEI Basic structure

<TEI xmlns="http://www.tei-c.org/ns/1.0">
 <teiHeader> <!-- required: the document's metadata -->
  <fileDesc>
   <titleStmt>
    <title> <!-- Title of the electronic document -->
    </title>
   </titleStmt>
   <publicationStmt>
    <p> <!-- Publication information -->
    </p>
   </publicationStmt>
   <sourceDesc>
    <p> <!-- Source description -->
    </p>
   </sourceDesc>
  </fileDesc>
 </teiHeader>
 <text>
  <front> <!-- optional front matter of a unitary text -->
  </front>
  <body>
   <div> <!-- required body of first unitary text -->
   </div>
  </body>
  <back> <!-- optional back matter of a unitary text -->
  </back>
 </text>
</TEI>

TEI Infrastructure (module 1: infrastructure)

The module introduces the conceptual framework of the TEI.

Default text structure (module 4: textstructure)

The module describes the default high-level structure for TEI documents. A full (valid) TEI document combines metadata represented in the <teiHeader> and the document represented by a <text> element (and/or <facsimile> and/or <sourceDoc>).
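The top-level rule described here can be sketched as a small structural check: the root must be a TEI element in the TEI namespace, with a teiHeader child and at least one of text, facsimile, or sourceDoc. This is only a rough illustration using Python's standard library; the function name has_basic_tei_structure and the sample document are assumptions made for this sketch, and the check is not a substitute for validating against a TEI schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def has_basic_tei_structure(xml_source: str) -> bool:
    """Check the top-level shape of a TEI document (not full validity)."""
    try:
        root = ET.fromstring(xml_source)
    except ET.ParseError:
        return False  # not even well-formed
    # ElementTree spells namespaced names as {namespace-uri}localname.
    if root.tag != f"{{{TEI_NS}}}TEI":
        return False
    has_header = root.find(f"{{{TEI_NS}}}teiHeader") is not None
    has_content = any(
        root.find(f"{{{TEI_NS}}}{name}") is not None
        for name in ("text", "facsimile", "sourceDoc")
    )
    return has_header and has_content


# Hypothetical minimal document used only to exercise the check.
sample = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample</title></titleStmt>
      <publicationStmt><p>Unpublished</p></publicationStmt>
      <sourceDesc><p>Born digital</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>Hello</p></body></text>
</TEI>
"""

print(has_basic_tei_structure(sample))
```

The same document without the xmlns declaration fails the check, because its elements are then not in the TEI namespace at all.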

TEI Header (module 2: header)

The TEI header module provides elements for the description of the encoded work's metadata.

Elements available in all TEI documents (module 3: core)

The module describes elements which may appear in any kind of text.
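Two such elements are lg (line group) and l (verse line), which are enough to mark up a simple poem. As a rough sketch, the script below wraps the first four lines of the poem shown earlier in one lg with four l children. Treating those four lines as one stanza, the type="stanza" value, and the helper name stanza_to_lg are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements with a default namespace instead of an ns0: prefix.
ET.register_namespace("", TEI_NS)


def stanza_to_lg(lines):
    """Wrap verse lines in a TEI <lg> element with one <l> per line."""
    lg = ET.Element(f"{{{TEI_NS}}}lg", {"type": "stanza"})
    for line in lines:
        verse_line = ET.SubElement(lg, f"{{{TEI_NS}}}l")
        verse_line.text = line.strip()
    return lg


# Assumption: the first four lines form one stanza.
stanza = [
    "Lennä, lennä, leppäkerttu,",
    "ison kiven juureen.",
    "Lennä leikkikedon kautta",
    "unipuuhuun suureen.",
]

lg = stanza_to_lg(stanza)
xml_text = ET.tostring(lg, encoding="unicode")
print(xml_text)
```

In a full TEI document the resulting lg would sit inside the body of the text element, typically within a div.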

Exercise 2: Mark up a poem in TEI

Exercise 2: Mark up a poem in TEI

TEI Stylesheets: HTML Output

Use the 'TEI P5 XML' transformation scenario in oXygen

TEI Customization

All TEI modules

Modules needed for our poem

Credits

Thanks to Lou Burnard, Syd Bauman, James Cummings, Sebastian Rahtz, and the whole TEI Community for sharing workshop materials!

Some important links