Metrics: Overview
This page gives a simplified overview of the main stages of the metrical analysis
process, from the source text to the observed data, with their rich query and
reporting features.
Text extraction and parsing
Reading the schema from top left to bottom right, we first need texts to analyze.
The system's components can get text from several sources, mainly the Packard
Humanities CD-ROMs, automatically converting their Beta code into Unicode (for a
demo cf. Ibis; I have also created a specialized application which browses, extracts
and converts texts from such CD-ROMs). Other digital formats are of course allowed,
whatever their encoding, and text can even be typed in directly (cf. the Word
add-in Theuth).
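
To give an idea of what converting Beta code into Unicode involves, here is a
minimal sketch in Python. It is not the converter used by the system: the function
name beta_to_unicode and the two lookup tables are hypothetical, and only the basic
letters, the common diacritics and the asterisk capital marker are covered. Final
sigma is guessed from the word boundary, which will be wrong before an elision
apostrophe, and real parentheses in the input would be read as breathings.

```python
import re
import unicodedata

# Beta code letter -> Greek lowercase letter (basic alphabet only).
LETTERS = dict(zip("ABGDEZHQIKLMNCOPRSTUFXYW", "αβγδεζηθικλμνξοπρστυφχψω"))

# Beta code diacritic -> Unicode combining mark.
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}


def beta_to_unicode(beta: str) -> str:
    """Convert a Beta code string to precomposed (NFC) Unicode Greek."""
    out = []
    capital = False
    pending = []  # diacritics written between '*' and the capital letter
    for ch in beta.upper():
        if ch == "*":
            capital = True
        elif ch in MARKS:
            # After '*', marks precede their letter; otherwise they follow it.
            (pending if capital else out).append(MARKS[ch])
        elif ch in LETTERS:
            letter = LETTERS[ch]
            out.append(letter.upper() if capital else letter)
            out.extend(pending)
            pending.clear()
            capital = False
        else:
            out.append(ch)  # spaces, punctuation, anything unknown
    text = unicodedata.normalize("NFC", "".join(out))
    # Simplification: a sigma not followed by a word character is final.
    return re.sub(r"σ(?!\w)", "ς", text)


print(beta_to_unicode("MH=NIN A)/EIDE QEA/"))
```

Emitting combining marks and letting NFC normalization compose them keeps the
tables small; a full converter would also need the remaining Beta code escapes
(numbered sigma forms, brackets, critical signs), which are left out here.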
Once we have a text, a smart parser removes all the irrelevant noise characters
from it: some of them (such as line numbers) are used to collect metadata, while
others (non-textual characters such as diplai, parentheses, etc.) are simply
discarded, though they are restored in their original place when output is
generated. Finally, the filtered text is normalized, so that the phonemic analysis
receives an input from which all the confusing or irrelevant variants (uppercase
vs. lowercase, final or lunate sigma, etc.) have been removed.
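
The filtering and normalization just described can be sketched as follows. This is
only an illustration under stated assumptions, not the system's parser: the names
ParsedLine, parse_line and restore, the NOISE character set and the "leading number
means line number" convention are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed set of non-textual characters to discard (brackets, critical signs).
NOISE = set("()[]<>{}⟨⟩†")

# Fold final and lunate sigma into the plain medial form.
SIGMAS = str.maketrans({"ς": "σ", "ϲ": "σ"})


@dataclass
class ParsedLine:
    number: Optional[int]  # line number collected as metadata, if present
    text: str              # filtered and normalized text
    removed: List[Tuple[int, str]] = field(default_factory=list)


def parse_line(raw: str) -> ParsedLine:
    """Split off a leading line number, drop noise, normalize the rest."""
    match = re.match(r"\s*(\d+)\s+", raw)
    number = int(match.group(1)) if match else None
    body = raw[match.end():] if match else raw

    kept, removed = [], []
    for offset, ch in enumerate(body):
        if ch in NOISE:
            removed.append((offset, ch))  # remember where it was
        else:
            kept.append(ch)

    # Lowercase first (this also maps capital lunate sigma to its lowercase
    # form), then fold every sigma variant into one.
    text = "".join(kept).lower().translate(SIGMAS)
    return ParsedLine(number, text, removed)


def restore(parsed: ParsedLine) -> str:
    """Put the discarded characters back at their original offsets."""
    chars = list(parsed.text)
    for offset, ch in parsed.removed:  # offsets are in ascending order
        chars.insert(offset, ch)
    return "".join(chars)


line = parse_line("12 ΛΟΓΟΣ (ΘΕΟΣ)")
print(line.number, line.text, restore(line))
```

Recording each discarded character with its offset is one simple way to make the
removal reversible at output time; it relies on normalization not changing the
length of the text, which holds for the case and sigma folding used here.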