1 Transcription

The aim of a diplomatic transcription is to represent the text of a primary source with as little editorial intervention as possible.

Encodable textual features include:

page layout
orthography
word division
punctuation
abbreviations
additions and deletions
errors and omissions
variant letter forms

1.1 Text

Manuscript text should be transcribed as UTF-8-encoded electronic text. No entities (e.g. &aacute; for á) need be used.

Each distinct word, and any markup associated with it, should be contained within a <w> element. The following example demonstrates this method, as well as several other aspects of text encoding discussed in this section:

<w>
<choice>
  <reg>konungasynir</reg>
  <orig>k<choice>
    <am>̅</am>
    <ex>on</ex>
   </choice>ga<g ref="#slong"/>yner</orig>
</choice>
</w>

1.1.1 Orthography and spacing

The text should be transcribed exactly as it is with respect to orthography and spacing. With the exception of small capitals, used to denote geminates (principally N and R, but potentially also D, G, M, S and T), variant forms of the same letter (allographs) need not be distinguished. It may, in some cases, be deemed necessary also to distinguish between:

high and round s
ordinary and round r (r-rotunda)
ordinary and insular forms of f and v
ordinary and uncial forms of d, e, m and t

The means of encoding these characters are discussed below (1.1.6 Variant letter forms).

Note that only ligatures with an independent phonemic value (the a + e ligature æ, the double-a ligature ꜳ, etc.) are to be represented; ligatures which are the result of graphic economy (high s + t, for example) should be treated as two separate characters.

If desired, a <hi> element may be used to indicate text which has been emphasised by the scribe (e.g. large initials):
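
A minimal sketch of such an initial, assuming a hypothetical rend value of "2-line-initial" (the exact token is a project decision and is not fixed by this section):

```xml
<!-- the rend value "2-line-initial" is an assumed, illustrative token -->
<w><hi rend="2-line-initial">H</hi>elgi</w>
```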

Here the value of the rend attribute indicates that the initial is two lines high.

1.1.2 Abbreviations and expansions

Expand abbreviations in accordance with the normal spelling of the scribe in question. Use <am> to encode the abbreviation marker and <ex> to indicate supplied letters. Surround both of these with <choice>:

H<choice>
<am>:</am>
<ex>elgi</ex>
</choice>

The <am> element is optional, and if it is not present, then <choice> may also be omitted. Encoding abbreviation expansions is required:
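
A minimal sketch of the shorter form, reusing the name from the previous example purely for illustration:

```xml
<!-- expansion only: no <am>, so no <choice> wrapper is needed -->
<w>H<ex>elgi</ex></w>
```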

1.1.3 Corrections and emendations

Use the <supplied> element to indicate letters which have been added by the editor. The reason attribute is used to indicate the reason for the editorial intervention. Typical values for reason include:

illegible: letters or words which are now unreadable but assumed to have originally been in the manuscript (which in a printed edition would be placed in square brackets)
omitted: letters or words assumed to have been inadvertently omitted by the scribe (which in a printed edition would be placed in angle brackets)
damaged: letters or words assumed to have originally been in the manuscript but are now illegible due to damage

maklig<supplied reason="omitted">t</supplied>

lid<supplied reason="illegible">z</supplied>

If the supplied text has been taken from another authority, such as a printed edition, this information should be encoded within the source attribute. The value of this attribute should correspond to a valid xml:id from the project bibliography:
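
A minimal sketch, reusing the illegible-letter example above and assuming that KaaGri2000 (the identifier cited in the <corr> example in this section) is a valid xml:id in the project bibliography:

```xml
<!-- source points to an assumed bibliography entry with xml:id="KaaGri2000" -->
lid<supplied reason="illegible" source="#KaaGri2000">z</supplied>
```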

The <supplied> element should only be used when the missing text can be reconstructed with a very high degree of certainty. When this is not the case, <gap> should be used instead, with both a reason and an extent attribute. The extent should be given as the number of characters presumed missing, which can then be made to display as a series of small noughts, as is customary in a printed edition.
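
A minimal sketch of an unrecoverable passage, assuming that three characters are presumed missing and that the reason values listed for <supplied> are also used for <gap>:

```xml
<!-- three characters presumed lost; the count is illustrative -->
<gap reason="illegible" extent="3"/>
```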

Editorial corrections may be encoded using <sic>, which contains the text from the manuscript, and <corr>, which contains the editor's correction. The <corr> element must always have either a source or resp (e.g. resp="#BES") attribute:

<w>
<choice>
  <sic>
   <choice>
    <orig>g<choice>
      <am>͛</am>
      <ex>ior</ex>
     </choice>it</orig>
    <reg>giorit</reg>
   </choice>
  </sic>
  <corr source="#KaaGri2000">
   <choice>
    <orig>g<choice>
      <am>͛</am>
      <ex>ior</ex>
     </choice>ir</orig>
    <reg>gerir</reg>
   </choice>
  </corr>
</choice>
</w>

1.1.4 Additions and deletions

Additions and deletions made in the manuscript by the scribe or in another hand should be indicated with the <add> and <del> elements; further information may (but need not) be given as attribute values. The placement of additions on the page is indicated with the place attribute:
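
A minimal sketch of an interlinear addition. The value "above" is an assumption taken from the values suggested by the TEI Guidelines, and the word is reused from the quotation example in 1.1.7 purely for illustration:

```xml
<!-- place="above" is an assumed value: the word is written above the line -->
<add place="above"><w>þangad</w></add>
```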

For deletions, typical values of the type attribute include:

overstrike: the text has been struck through
overwrite: the text has been overwritten
erase: the text has been erased
subpunction: dots have been placed below the text so as to indicate deletion
superpunction: dots have been placed above the text so as to indicate deletion

If a portion of text has been substituted by the scribe, then the deletions and additions should be placed within a <subst> element:
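
A minimal sketch of a substitution, in which a struck-through word is replaced by one written above the line. Both words are reused from the quotation example in 1.1.7 and the place value is an assumption:

```xml
<subst>
  <!-- the scribe's original reading, struck through -->
  <del type="overstrike"><w>leitid</w></del>
  <!-- the replacement; place="above" is an assumed value -->
  <add place="above"><w>þangad</w></add>
</subst>
```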

The <surplus> element may be used to indicate text deemed superfluous by an editor, for example in the case of a dittography. A reason attribute should indicate why the editor deems the material extraneous:
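
A minimal sketch of a dittography, assuming a hypothetical reason value of "repeated" and reusing a word from the quotation example in 1.1.7:

```xml
<w>þangad</w>
<!-- the same word written a second time; reason="repeated" is an assumed token -->
<surplus reason="repeated"><w>þangad</w></surplus>
```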

1.1.5 Punctuation

The <pc> element is used to encode punctuation characters. Encoding the original punctuation of the manuscript is optional, but a level of normalised punctuation is required. When using normalised punctuation, modern standards for capitalisation (e.g. at sentence beginnings) should be observed:
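
A minimal sketch of normalised punctuation and capitalisation, reusing words from the quotation example in 1.1.7. The sentence boundary and the full stop are assumptions made for illustration:

```xml
<w>
  <choice>
    <!-- capital letter supplied in the normalised form only -->
    <reg>Leitið</reg>
    <orig>leitid</orig>
  </choice>
</w>
<w>
  <choice>
    <reg>þangat</reg>
    <orig>þangad</orig>
  </choice>
</w>
<!-- normalised sentence-final punctuation -->
<pc>.</pc>
```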

1.1.6 Variant letter forms

In virtually all cases, Unicode code points are available to encode the abstract characters found in a manuscript. However, in some cases it may be desirable to provide the option of different renderings of a given character. In such cases, the <g> element may be used, with a ref attribute pointing to a definition of the required character or glyph in the <encodingDesc> section of the document header. Variant letter forms are optional for semi-diplomatic transcriptions.

In the text:

<g ref="#slong"/>mijdi

In the document header:

<glyph xml:id="slong">
<glyphName>LATIN SMALL LETTER LONG S</glyphName>
<mapping type="dipl">ſ</mapping>
<mapping type="codepoint">U+017F</mapping>
<mapping type="norm">s</mapping>
</glyph>
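
Glyph definitions are assumed here to sit inside a <charDecl> element within <encodingDesc>, as in standard TEI. The sketch below also defines the insular f referenced as #fins in 1.1.7; its glyph name and mappings are assumptions modelled on the long s entry above:

```xml
<encodingDesc>
  <charDecl>
    <!-- assumed definition for the "#fins" reference, modelled on "slong" -->
    <glyph xml:id="fins">
      <glyphName>LATIN SMALL LETTER INSULAR F</glyphName>
      <mapping type="dipl">ꝼ</mapping>
      <mapping type="codepoint">U+A77C</mapping>
      <mapping type="norm">f</mapping>
    </glyph>
  </charDecl>
</encodingDesc>
```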

1.1.7 Quotations

Direct speech may be encoded by using <q>:

<q>
<w>
  <choice>
   <reg>Leitið</reg>
   <orig>leitid</orig>
  </choice>
</w>
<w>
  <choice>
   <reg>þangat</reg>
   <orig>þangad</orig>
  </choice>
</w>
<w>
  <choice>
   <reg>fyrst</reg>
   <orig>
    <g ref="#fins"/>yr<g ref="#slong"/>t</orig>
  </choice>
</w>
<pc>,</pc>
</q>

Note: This element is optional.

1.1.8 Proper names

Proper names (e.g. of people or places) should be encoded with the <name> element. The type of name (person, place, animal or object) should be indicated with the type attribute. In semi-diplomatic transcriptions, capitalisation may be normalised:

<name type="person">
<w>
  <choice>
   <orig>haughne</orig>
   <reg>Högni</reg>
  </choice>
</w>
</name>

1.2 Structure

1.2.1 Boundaries

Indicate line-, column- and page-boundaries using the empty milestone tags <lb>, <cb> and <pb>, giving a number for each as the value of the n attribute.

These elements should come at the beginning of the line/column/page to which they refer.
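
A minimal sketch of the start of a page. The n values (folio "1r", column "a", lines 1 and 2) are illustrative assumptions, and the words are reused from earlier examples:

```xml
<!-- each milestone precedes the page, column or line it numbers -->
<pb n="1r"/>
<cb n="a"/>
<lb n="1"/><w>leitid</w> <w>þangad</w>
<lb n="2"/><w>haughne</w>
```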

1.2.2 Divisions

Large structural divisions in the text, such as chapters, should be encoded using the <div> element and the type and n attributes:
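
A minimal sketch of a chapter division, assuming a type value of "chapter" and reusing words from earlier examples as placeholder content:

```xml
<!-- type="chapter" is an assumed value; n gives the chapter number -->
<div type="chapter" n="1">
  <p>
    <w>leitid</w>
    <w>þangad</w>
  </p>
</div>
```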

Each <div> will contain one or more <p> elements.

1.2.3 Headings

If desired, chapter headings may be tagged using <head>, which is placed immediately after <div> and before the first <p>. The nature of the <head>, i.e. whether it is found in the manuscript itself or supplied by an editor, should be indicated in the value of the type attribute, as in the following examples:

<head type="rubric">I. Cap<choice>
<am>.</am>
<ex>itulum</ex>
</choice>
</head>

<head type="supplied">Chapter 3</head>

1.2.4 Forme work, catchwords

Use the <fw> element to indicate catchwords. The type attribute is used to indicate the type of catchword. Typical values for type include:

catch: the catchword is located at the bottom of a page and matches the first word of the next page as normal
pseudocatch: a word is positioned like a catchword, but the first word on the following page does not match
last-line-catch: the catchword is written as part of the last line of text and matches the first word of the next page
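
A minimal sketch of an ordinary catchword followed by the page break and the matching first word of the next page. The page number and the word are illustrative assumptions:

```xml
<!-- catchword at the foot of the page -->
<fw type="catch"><w>þangad</w></fw>
<pb n="1v"/>
<!-- first word of the next page matches the catchword -->
<lb n="1"/><w>þangad</w>
```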

1.3 Normalisation

The original orthography from the manuscript must be encoded, but an additional level of normalisation may be added. The <reg> element is used to mark the normalised form of a word and the <orig> element to indicate the original spelling. These elements are grouped as alternatives using the <choice> element:

<choice>
<reg>hallardyra</reg>
<orig>hallardira</orig>
</choice>

Where the two forms are the same there is obviously no need to use <orig> and <reg>.

More complex encoding is possible, using other elements discussed previously in this section, such as <am> and <ex>:

<choice>
<reg>konungr</reg>
<orig>k<choice>
   <am>:</am>
   <ex>ongur</ex>
  </choice>
</orig>
</choice>