
ELTeC Summary Page

As well as the following summary statistics, this page provides links to human-readable versions of each text currently included in the European Literary Text Collection (ELTeC). Click on a language code in the table below to see a list of texts now available in that language. Then click on the identifier of a text to see a simple rendering of the text as produced by CETEIcean. The original source files are stored in a GitHub repository at COST-ELTeC, and may be downloaded freely from there.

The following tables list three different flavours of ELTeC corpus. All ELTeC corpora are encoded in TEI-XML according to one of the ELTeC schemas. The ELTeC core corpora are, as far as possible, comparable in size and composition. Each contains a balanced selection of 100 texts respecting all the criteria defined for the ELTeC project. The ELTeC plus corpora contain smaller collections of texts which cover the same period of time as the core corpora, but which do not meet the balance criteria defined for the project. In some cases, the criteria simply could not be satisfied because the required mixture of texts did not exist; in other cases, future iterations of the collection may contain additional texts. The ELTeC extended corpora (listed below as ELTeC-extension) are ELTeC-conformant in their encoding, but selected according to different design criteria, either to provide additional texts for the same time period, or to provide coverage of a different time period.
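
Since every text is distributed as TEI-XML, a downloaded file can be inspected with any XML library. The following is a minimal Python sketch, not part of the ELTeC tooling: the sample document is a hypothetical, heavily simplified TEI file, the `summarise` helper is an invented name, and the naive whitespace word count is an assumption that need not agree with the Words figures reported on this page.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; all TEI elements live in it.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily simplified TEI document used only for illustration.
# Real ELTeC files carry a much richer header.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>An Example Novel</title>
      </titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="chapter">
        <p>It was a dark night.</p>
        <p>Nothing stirred.</p>
      </div>
    </body>
  </text>
</TEI>"""


def summarise(tei_xml):
    """Return the title and a naive word count for one TEI document."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(
        "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title",
        default="",
        namespaces=TEI_NS,
    )
    body = root.find("tei:text/tei:body", TEI_NS)
    words = 0
    if body is not None:
        # Join all text nodes with spaces and split on whitespace.
        # Words interrupted by inline markup would be counted twice.
        words = len(" ".join(body.itertext()).split())
    return {"title": title.strip(), "words": words}


result = summarise(SAMPLE)
print(result["title"], result["words"])
```

To apply the same function to a file from the repository, read the file contents into a string first and pass that string to `summarise`.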

The E5C column gives the conformance score calculated for each repository; it is displayed in green if conformance is high. The other columns give counts for each of the four balance criteria (authorship, length, time slot and reprint count), with numbers in red indicating that the criterion is not satisfied. Hovering over the last figure in each group of columns displays the E5C score calculated for that criterion.

This remains a work in progress! Comments and reports of any problems are much appreciated: send them to the WG1 Issue Tracker.

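ELTeC-core

Column groups: AUTHORSHIP (Male, Female, 1-title, 3-title); LENGTH (Short, Medium, Long); TIME SLOT (1840-59, 1860-79, 1880-99, 1900-20, range); REPRINT COUNT (Frequent, Rare).

| Language | Last update | Texts | Words | Male | Female | 1-title | 3-title | Short | Medium | Long | 1840-59 | 1860-79 | 1880-99 | 1900-20 | range | Frequent | Rare | E5C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| cze | 2021-04-09 | 100 | 5621667 | 88 | 12 | 62 | 6 | 43 | 49 | 8 | 12 | 21 | 39 | 28 | 27 | 1 | 19 | 80.00 |
| deu | 2022-04-19 | 100 | 12738842 | 67 | 33 | 35 | 9 | 20 | 37 | 43 | 25 | 25 | 25 | 25 | 0 | 48 | 46 | 96.92 |
| eng | 2022-03-19 | 100 | 12227703 | 49 | 51 | 70 | 10 | 27 | 27 | 46 | 21 | 22 | 31 | 26 | 10 | 32 | 68 | 100.00 |
| fra | 2022-01-24 | 100 | 8712219 | 66 | 34 | 58 | 10 | 32 | 38 | 30 | 25 | 25 | 25 | 25 | 0 | 44 | 56 | 101.54 |
| hun | 2022-01-24 | 100 | 6948590 | 79 | 21 | 71 | 9 | 47 | 31 | 22 | 22 | 21 | 27 | 30 | 9 | 32 | 67 | 100.00 |
| pol | 2022-04-21 | 100 | 8500172 | 58 | 42 | 1 | 33 | 33 | 35 | 32 | 8 | 11 | 35 | 46 | 38 | 39 | 61 | 80.00 |
| por | 2022-03-15 | 100 | 6799385 | 83 | 17 | 73 | 9 | 40 | 41 | 19 | 13 | 37 | 19 | 31 | 24 | 26 | 60 | 94.62 |
| rom | 2022-05-31 | 100 | 5951910 | 79 | 16 | 59 | 9 | 49 | 31 | 20 | 6 | 21 | 25 | 48 | 42 | 24 | 76 | 83.08 |
| slv | 2022-02-02 | 100 | 5682120 | 89 | 11 | 26 | 5 | 53 | 39 | 8 | 2 | 13 | 36 | 49 | 47 | 48 | 52 | 78.46 |
| spa | 2022-05-16 | 100 | 8737928 | 78 | 22 | 46 | 10 | 34 | 35 | 31 | 23 | 22 | 29 | 26 | 7 | 46 | 54 | 100.00 |
| srp | 2022-03-17 | 100 | 4931503 | 92 | 8 | 48 | 11 | 55 | 39 | 6 | 2 | 18 | 40 | 40 | 38 | 38 | 62 | 80.77 |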
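
ELTeC-plus

Column groups: AUTHORSHIP (Male, Female, 1-title, 3-title); LENGTH (Short, Medium, Long); TIME SLOT (1840-59, 1860-79, 1880-99, 1900-20, range); REPRINT COUNT (Frequent, Rare).

| Language | Last update | Texts | Words | Male | Female | 1-title | 3-title | Short | Medium | Long | 1840-59 | 1860-79 | 1880-99 | 1900-20 | range | Frequent | Rare | E5C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gle | 2022-04-08 | 1 | 24471 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1.54 |
| gre | 2022-01-24 | 17 | 98607 | 13 | 4 | 14 | 1 | 17 | 0 | 0 | 0 | 2 | 8 | 7 | 8 | 4 | 7 | 52.31 |
| gsw | 2022-04-11 | 47 | 3017496 | 29 | 18 | 18 | 7 | 13 | 28 | 6 | 1 | 2 | 15 | 29 | 28 | 0 | 0 | 50.00 |
| hrv | 2022-01-26 | 21 | 1440018 | 21 | 0 | 4 | 0 | 6 | 12 | 3 | 6 | 12 | 2 | 1 | 11 | 1 | 0 | 23.08 |
| ita | 2022-05-05 | 70 | 5535905 | 59 | 11 | 29 | 5 | 26 | 30 | 14 | 8 | 18 | 21 | 23 | 15 | 39 | 4 | 70.77 |
| lit | 2022-05-25 | 32 | 947634 | 25 | 7 | 18 | 1 | 24 | 3 | 5 | 6 | 4 | 6 | 16 | 12 | 9 | 23 | 60.00 |
| lav | 2022-04-28 | 31 | 2553907 | 27 | 4 | 14 | 1 | 10 | 14 | 7 | 0 | 2 | 6 | 23 | 23 | 4 | 26 | 52.31 |
| nor | 2022-04-28 | 57 | 3527715 | 40 | 17 | 21 | 12 | 28 | 19 | 10 | 4 | 3 | 32 | 18 | 29 | 32 | 25 | 70.00 |
| swe | 2021-04-11 | 58 | 4960085 | 29 | 28 | 18 | 8 | 16 | 24 | 18 | 15 | 3 | 20 | 20 | 17 | 17 | 41 | 76.92 |
| ukr | 2021-04-09 | 50 | 1840062 | 37 | 13 | 23 | 7 | 34 | 13 | 3 | 5 | 10 | 11 | 24 | 19 | 30 | 20 | 70.77 |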
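
ELTeC-extension

Column groups: AUTHORSHIP (Male, Female, 1-title, 3-title); LENGTH (Short, Medium, Long); TIME SLOT (1840-59, 1860-79, 1880-99, 1900-20, range); REPRINT COUNT (Frequent, Rare).

| Language | Last update | Texts | Words | Male | Female | 1-title | 3-title | Short | Medium | Long | 1840-59 | 1860-79 | 1880-99 | 1900-20 | range | Frequent | Rare | E5C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| nor-ext | 2022-04-27 | 5 | 187124 | 2 | 3 | 5 | 0 | 4 | 1 | 0 | 0 | 0 | 2 | 3 | 3 | 3 | 2 | 35.38 |
| fra-ext1 | 2022-04-07 | 370 | 32942955 | 20 | 8 | 38 | 8 | 81 | 161 | 128 | 98 | 150 | 115 | 7 | 143 | 11 | 17 | 54.86 |
| fra-ext2 | 2022-03-25 | 100 | 7549824 | 80 | 18 | 49 | 3 | 48 | 30 | 22 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 76.92 |
| fra-ext3 | 2022-04-07 | 17 | 1220673 | 0 | 0 | 5 | 1 | 5 | 9 | 3 | 17 | 0 | 0 | 0 | 17 | 0 | 0 | 22.31 |
| eng-ext | 2022-03-26 | 14 | 1798258 | 7 | 7 | 9 | 1 | 3 | 2 | 9 | 1 | 5 | 0 | 8 | 8 | 5 | 6 | 68.46 |
| srp-ext | 2022-03-09 | 20 | 331568 | 17 | 3 | 12 | 0 | 20 | 0 | 0 | 0 | 1 | 9 | 10 | 10 | 6 | 14 | 48.46 |
| por-ext | 2021-09-22 | 21 | 894495 | 18 | 3 | 21 | 0 | 13 | 5 | 3 | 1 | 5 | 8 | 7 | 7 | 5 | 9 | 56.92 |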
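
The figures within a row are related, so a row from any of the tables above can be checked mechanically. The following is a minimal Python sketch, not the script used to produce this page: the field names and helper functions are hypothetical, and the grouping of columns into criteria is an assumption read off the table headers. It compares the counts for three criteria with the Texts column and tests an observation drawn from the rows shown here, namely that the range figure equals the difference between the largest and smallest time-slot counts.

```python
# Column order as shown in the table header. The Python field names are
# hypothetical labels chosen for this sketch.
COLUMNS = [
    "language", "last_update", "texts", "words",
    "male", "female", "one_title", "three_title",
    "short", "medium", "long",
    "y1840_59", "y1860_79", "y1880_99", "y1900_20", "range",
    "frequent", "rare", "e5c",
]

# Columns whose counts are compared with the Texts column (assumed grouping).
GROUPS = {
    "gender": ["male", "female"],
    "length": ["short", "medium", "long"],
    "time_slot": ["y1840_59", "y1860_79", "y1880_99", "y1900_20"],
}


def parse_row(line):
    """Turn one table row into a dict keyed by column name."""
    # Accept whitespace-separated rows as well as pipe-delimited ones.
    values = line.replace("|", " ").split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    row = dict(zip(COLUMNS, values))
    for name in COLUMNS[2:-1]:  # every field between the date and E5C is a count
        row[name] = int(row[name])
    row["e5c"] = float(row["e5c"])
    return row


def groups_not_covering_all_texts(row):
    """List the groups whose counts do not add up to the Texts column."""
    return [
        group
        for group, columns in GROUPS.items()
        if sum(row[c] for c in columns) != row["texts"]
    ]


def range_matches_time_slots(row):
    """Check that range equals max minus min of the four time-slot counts."""
    slots = [row[c] for c in GROUPS["time_slot"]]
    return row["range"] == max(slots) - min(slots)


cze = parse_row("cze 2021-04-09 100 5621667 88 12 62 6 43 49 8 12 21 39 28 27 1 19 80.00")
rom = parse_row("rom 2022-05-31 100 5951910 79 16 59 9 49 31 20 6 21 25 48 42 24 76 83.08")

print(groups_not_covering_all_texts(cze))  # []
print(groups_not_covering_all_texts(rom))  # ['gender']
print(range_matches_time_slots(cze))       # True
```

A group that does not add up to the number of texts, as with the Male and Female counts for rom, may simply mean that the relevant metadata is not recorded for every text; the sketch only reports the mismatch and does not interpret it.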

Summary produced: 2022-05-31