After prosodies, the second big subsystem is the syntactic
one. No serious metrical analysis can do without word classification: as in
any language, we must distinguish between the purely graphical notion of word and
its linguistic reality. For instance, nobody would treat appositives like “a”, “in”
and “the” as full “words” in the sentence “a dog barks in the house”:
here the true words are rather “a dog”, “barks” and “in the house”, so instead of
5 graphical wordends we have only 2 true wordends. For metrics, this means not only that word types define
true and false (i.e. purely graphical) wordends, but also that several phenomena (e.g.
accent, elision, sentence position) are closely related to them.
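The grouping in the English example can be sketched in a few lines of Python. This is a minimal illustration, not the system's algorithm. The appositive lists are hypothetical toy data. The sketch attaches each proclitic to the following word and each enclitic to the preceding one, then counts the internal wordends.

```python
# Hypothetical sketch: these appositive lists are illustrative, not the system's data.
PROCLITICS = {"a", "the", "in"}   # lean on the following word (to the right)
ENCLITICS: set[str] = set()       # would lean on the preceding word (to the left)

def true_words(tokens):
    """Group graphical tokens into 'true' words by attaching appositives to their host."""
    groups = []
    pending = []                  # proclitics waiting for their host on the right
    for tok in tokens:
        if tok in PROCLITICS:
            pending.append(tok)
        elif tok in ENCLITICS and groups and not pending:
            groups[-1].append(tok)        # enclitic joins the word on its left
        else:
            groups.append(pending + [tok])
            pending = []
    if pending:                   # dangling proclitics: keep them as their own group
        groups.append(pending)
    return [" ".join(g) for g in groups]

def internal_wordends(words):
    """Number of wordends between words (the final one is not counted)."""
    return max(len(words) - 1, 0)

tokens = "a dog barks in the house".split()
words = true_words(tokens)
print(words)                      # ['a dog', 'barks', 'in the house']
print(internal_wordends(tokens))  # 5 graphical wordends
print(internal_wordends(words))   # 2 true wordends
```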
Thus, for Greek we classify words by distinguishing between the so-called
lexical words (the “full” semantic words) and the crucial class of appositives,
which usually have a higher textual frequency and a very small size. Appositives in turn
include words with and without accent (the latter being clitics), and clitics finally split into enclitics
and proclitics according to whether they lean on the word to their left or to their right.
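The classification hierarchy can be modeled as a small enum. The class names, the `classify` helper and the toy Greek lexicon below are illustrative assumptions. The real system reads its appositive data from a database. This sketch does not subdivide accented appositives any further.

```python
from enum import Enum

class WordClass(Enum):
    """Hypothetical word classes following the hierarchy described above."""
    LEXICAL = "lexical"                          # full semantic word
    ACCENTED_APPOSITIVE = "accented appositive"  # appositive with its own accent
    PROCLITIC = "proclitic"                      # unaccented, leans to the right
    ENCLITIC = "enclitic"                        # unaccented, leans to the left

    @property
    def is_appositive(self) -> bool:
        return self is not WordClass.LEXICAL

    @property
    def is_clitic(self) -> bool:
        return self in (WordClass.PROCLITIC, WordClass.ENCLITIC)

    @property
    def leans(self):
        """Attachment direction for clitics, None for other classes."""
        if self is WordClass.PROCLITIC:
            return "right"
        if self is WordClass.ENCLITIC:
            return "left"
        return None

# Toy lexicon (illustrative only; the real data come from a database).
LEXICON = {
    "ὁ": WordClass.PROCLITIC,     # article
    "ἐν": WordClass.PROCLITIC,    # preposition
    "τε": WordClass.ENCLITIC,
    "γάρ": WordClass.ACCENTED_APPOSITIVE,
}

def classify(form: str) -> WordClass:
    """Look up a form; anything not listed is treated as a lexical word."""
    return LEXICON.get(form, WordClass.LEXICAL)
```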
The syntactic subsystem uses very complex syntagmatic algorithms to account for
all the surface changes of these words and to detect their nature. It takes all the
data about appositives from a relational database, and it enables the metrical subsystem
to deal with some 16 types of wordends, defined by the combination of 4 binary factors (2^4 = 16):
true or false (i.e. merely graphical) wordend, presence of hiatus between the words,
presence of aspiration in hiatus, and presence of elision in hiatus. You can test these
differences online by querying the sample database.
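Because each of the 4 factors is binary, the wordend types are simply their Cartesian product, and each type can be packed into a 4-bit code. The `WordendType` name and the bit order are hypothetical. The count is purely combinatorial, so not every combination need actually occur in the texts.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class WordendType:
    """Hypothetical wordend descriptor built from the 4 binary factors."""
    true_end: bool    # true wordend vs merely graphical
    hiatus: bool      # hiatus between the two words
    aspiration: bool  # aspiration in hiatus
    elision: bool     # elision in hiatus

    @property
    def code(self) -> int:
        """Pack the four factors into a 4-bit code (0..15)."""
        return (self.true_end << 3) | (self.hiatus << 2) | (self.aspiration << 1) | self.elision

# Enumerate every combination of the four flags: 2 ** 4 = 16 types.
ALL_TYPES = [WordendType(*flags) for flags in product((False, True), repeat=4)]
print(len(ALL_TYPES))  # 16
```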
As usual, all of its data are stored in the data layers linked to the text segments.
A trivial sample is enough to show how different the results are when we apply this
linguistic analysis. Here are two charts showing the distribution of purely graphical
wordends (top) and the distribution of ‘true’ wordends (bottom) in the sample hexameter
text for this site (Aratus' Phaenomena):
As you can see, there are peaks at the locations corresponding to the main caesurae,
but they are very uneven (e.g. the trithemimeres is even higher than the much more
important penthemimeres!), and we even find peaks in unexpected positions. Even
the strictest bridges, like Lehrs' and Hermann's, show an unexpectedly high number
of violations. If instead we take into account only ‘true’ wordends (bottom), everything
changes: the different caesurae show the expected balance, and the bridges correspond
to much deeper valleys. The syntactic stage is therefore crucial for a correct metrical analysis.
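The two charts amount to counting wordends per metrical position twice: once over all graphical wordends, and once over true wordends only. The sketch below is minimal. It assumes that each verse has already been annotated with (position, is_true) pairs for its graphical wordends. The position labels and the toy input are invented for illustration and are not counts from Aratus.

```python
from collections import Counter

def wordend_profiles(lines):
    """Count wordends per metrical position.

    `lines` is a list of verses; each verse is a list of (position, is_true)
    pairs, one per graphical wordend (hypothetical input format).
    Returns (graphical, true) Counters keyed by position.
    """
    graphical, true = Counter(), Counter()
    for wordends in lines:
        for position, is_true in wordends:
            graphical[position] += 1      # every graphical wordend counts here
            if is_true:
                true[position] += 1       # only linguistically true wordends
    return graphical, true

# Toy input (invented): labels name the caesura or bridge location.
toy = [
    [("trithemimeral", False), ("penthemimeral", True), ("Hermann's bridge", False)],
    [("trithemimeral", True), ("penthemimeral", True)],
    [("trithemimeral", False), ("penthemimeral", True)],
]
graphical, true = wordend_profiles(toy)
# trithemimeral: 3 graphical vs 1 true; Hermann's bridge: 1 graphical vs 0 true
print(graphical)
print(true)
```

Plotting the two counters side by side reproduces the kind of comparison described above. On this toy input, the graphical profile puts the trithemimeral and penthemimeral positions level. The true profile lowers the first and empties the bridge.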