The 'Black Box' Model: Associative And Cyclic Behaviour
A full description of this model is beyond the scope of this demonstrative site,
and the details of its structure should not be published to a wide audience while
it is part of a product still under development. Still, I'll try to give you the
idea, which in its general terms is rather easy for scholars to grasp,
as it's just the definition of a historical grammar in computer terms. No inflected
form is generated by practical rules empirically derived from the pattern
of the final (surface) word and designed to fill a specific 'cell' of an ideal grid.
Rather than a grid, you should imagine a 'black box' into which various types of transformational
rules are stuffed without any predefined order or outcome in mind.
Each of these rules gets an input form, defines the scope and conditions for its
application and the transformation it applies to any input form; additionally, it
carries a number of attributes defining its position in the history of the language,
both in the vertical axis (relative and absolute chronology, where available) and
in the horizontal one (attestation and synchronic classifications like 'vulgar',
'classical', etc.).
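As a rough illustration of the rule anatomy just described, the following Python sketch models a rule as a simple record. All the names used here (`Rule`, `applies_to`, `transform`, `register`, `attestation` and so on) are hypothetical and chosen for this example only; the actual structure of the rules is not published here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical shape of a single rule in the 'black box'."""
    name: str
    kind: str                             # e.g. "phonology", "morphology"
    applies_to: Callable[[str], bool]     # scope and conditions of application
    transform: Callable[[str], str]       # the transformation itself
    relative_order: Optional[int] = None  # vertical axis: relative chronology
    absolute_date: Optional[str] = None   # vertical axis: absolute chronology
    register: str = "classical"           # horizontal axis: 'vulgar', 'classical'...
    attestation: str = "attested"         # horizontal axis: attestation status


# Illustrative instance: add the ending -ī to a theme in -ā.
genitive_i = Rule(
    name="first declension genitive singular in -ī",
    kind="morphology",
    applies_to=lambda form: form.endswith("ā"),
    transform=lambda form: form + "ī",
    register="archaic",
    attestation="partial",
)

if genitive_i.applies_to("rosā"):
    print(genitive_i.transform("rosā"))
```

Keeping the condition and the transformation as separate fields is only one possible design; it makes it easy to ask whether a rule applies without running it.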
There are many different types of rules, covering all the traditional grounds of
a historical grammar: phonology, morphology, lexicon, morphophonology, etc. Each
of these types has its peculiar and well-defined behaviour in terms of order, conditions
and scopes of its application; there are even meta-rules, whose function is not
to transform the rules' output but to alter their behaviour or their attributes in specific
contexts. What is crucial, anyway, is that such rules can be inserted into this box
in any order at any time, and the software itself will be able to use them all as
required by each single inflection request. There is no predefined path to follow:
there are just rules, and the software applies all the ones which satisfy the input
in the way and in the order defined by its designed behaviour. In this sense the
software operates in an 'associative' manner, and the path followed by the chain of
transformations from the word's theme up to the final form is not defined in advance but discovered
on the way. The software will continue transforming a form until it no longer satisfies
the input conditions of any of the existing and applicable rules; in this sense, its behaviour
is cyclic.
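The associative and cyclic behaviour described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the function `derive`, the tuple encoding of a rule and the "first match wins" policy are all invented for the example, and the rules operate on abstract symbols rather than real word forms.

```python
from typing import Callable, List, Tuple

# A rule here is just (name, condition, transformation); all names are hypothetical.
Rule = Tuple[str, Callable[[str], bool], Callable[[str], str]]


def derive(form: str, box: List[Rule], max_steps: int = 100) -> List[str]:
    """Apply rules until none matches, returning every form in the chain."""
    chain = [form]
    for _ in range(max_steps):  # guard against rules that feed each other forever
        applicable = [rule for rule in box if rule[1](form)]
        if not applicable:
            break  # no rule accepts this form: the cycle ends here
        # Placeholder policy: take the first match. The real ordering logic,
        # which depends on the rule types, is not described in this text.
        _, _, transform = applicable[0]
        form = transform(form)
        chain.append(form)
    return chain


# Toy rules on abstract symbols, deliberately inserted 'backwards'.
box: List[Rule] = [
    ("C to D", lambda f: f.endswith("C"), lambda f: f[:-1] + "D"),
    ("A to B", lambda f: f.endswith("A"), lambda f: f[:-1] + "B"),
    ("B to C", lambda f: f.endswith("B"), lambda f: f[:-1] + "C"),
]

print(derive("xA", box))
print(derive("xA", list(reversed(box))))
```

Both calls walk the same chain from `xA` to `xD`, because each step is chosen by what the current form matches and not by the position of the rule in the box.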
Model Advantages
This kind of approach has, among others, two big advantages, both practical and theoretical.
First, it's possible to build such a complex grammar model in successive stages,
simply by adding new rules to the existing box; for any given word, the software
will be able to generate inflected forms up to the stage which can be reached by
the rules present in the box. For instance, if we ask the software to inflect a
verb but there is still no rule for verb inflection, the program will just reply
that it has not yet 'learnt' the rules to do it, just like a student who has not
yet studied the chapters on the verb in his grammar.
As soon as we add one rule for the verb the software will apply it, bringing the
output forms up to the most recent stage which can be reached with the existing
rules. Of course, the output will not coincide with the desired one until all the
necessary rules are in the box: for instance, if we just have the rule which adds
the ending -ī to the theme rŏs-ā- to form the genitive singular of the first declension,
the software will apply it and then stop, thus generating a (trisyllabic) form rŏs-ā-ī
marked as archaic and of 'partial' attestation (i.e. forms in -āī are attested,
but this does not imply that rosāī itself is actually one of them). If
later we add the (morpho)phonological rule which transforms this final -āī into
a 'long' diphthong -āi, the phonological rule which shortens the first element of any
final 'long' diphthong (giving -ai), and the phonological rule which transforms this -ai
into -ae, the final outcome will be the (desired) rosae.
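The incremental growth just described can be replayed with a small Python sketch. The encoding of a rule as an (old ending, new ending, stage label) triple and the function `derive` are assumptions made for this example; only the four transformations and the 'archaic'/'partial' and 'classical' markings come from the text above.

```python
from typing import List, Tuple

# Each rule: (ending it requires, ending it produces, stage label).
# This encoding is hypothetical and much simpler than a real rule.
Rule = Tuple[str, str, str]

ADD_GENITIVE_I: Rule = ("ā", "āī", "archaic, partial attestation")
LATER_RULES: List[Rule] = [
    ("āī", "āi", "long diphthong"),
    ("āi", "ai", "shortened diphthong"),
    ("ai", "ae", "classical"),
]


def derive(form: str, box: List[Rule], max_steps: int = 50) -> List[Tuple[str, str]]:
    """Rewrite the final segment until no rule in the box matches."""
    chain = [(form, "theme")]
    for _ in range(max_steps):  # safety bound for ill-formed rule sets
        match = next((rule for rule in box if form.endswith(rule[0])), None)
        if match is None:
            break
        old, new, stage = match
        form = form[: len(form) - len(old)] + new
        chain.append((form, stage))
    return chain


box = [ADD_GENITIVE_I]
print(derive("rosā", box))   # stops at the archaic trisyllabic form

box += LATER_RULES
print(derive("rosā", box))   # now runs on to the classical form
```

With the first rule alone the chain has two entries; once the other three rules are in the box, the same request runs through five stages without any change to `derive`.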
Of course the definition of 'desired' is just the effect of the practical decision
to stop at the 'classical' stage of the Latin language: yet we could add the rule
which transforms -ae into -e and go past the 'classical' era as well, but this is
not the chronological stage targeted by the commercial product.
The beauty of this approach is not only that the software immediately adopts any
newly inserted rule without a single line of code having to be changed or added, but
also that the new rule will automatically be used in ALL the contexts where it is
applicable. For instance, the same rules cited above, inserted once and for all
in the box, will automatically be applied to generate not only the genitive singular,
but also the dative singular and the nominative-vocative plural. Thus, the more rules are
added, the more effective the software is in generating the full chain of transformations
leading from the theme to the desired output stage: sometimes I only have to add
a start rule which simply adds an ending, and this transformation is enough to trigger
a long chain of events up to the final form.
Also, by simply adding new rules we can get more outputs for the same 'cell', not
only on the diachronic axis but also as final outcomes of the inflection process:
for instance, by adding a couple of new rules for the first declension genitive
singular which add -aes and -s to the theme, the outcomes for the same 'cell' (first
declension genitive singular) will be three: besides rŏs-ā-ī > ... > rŏsae,
also rŏsaes (an archaic pattern attested in some inscriptions of the Republican age)
and rŏsās (from *-eH2-e/os). Of course these are outcomes dating to different ages
(and possibly regions), and they are marked as such by the software, but all these
forms stand at the end of various transformational chains, so they are all reported
here. It's easy to imagine the complexity introduced by such variants in the flat
'grid' model cited above, where a single form would have to branch within the same
single cell to generate all the variants from the same process. With the 'box' model,
we just 'blindly' add two rules and
the software will automatically react to the new 'grammar' defined by them and their
interaction with existing rules.
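A hedged Python sketch of this branching follows. The names `START_RULES`, `SOUND_RULES`, `run_to_end` and `genitive_singular` are invented for the example, and the way the -aes ending is attached (replacing the thematic vowel) is a simplification assumed here, since the text does not spell out that rule.

```python
from typing import Callable, List, Tuple

# Start rules for one 'cell' (first declension genitive singular).
# Labels paraphrase the text; the functions themselves are hypothetical.
START_RULES: List[Tuple[str, Callable[[str], str]]] = [
    ("chain leading to the classical form", lambda theme: theme + "ī"),
    # Assumption: -aes is modelled as replacing the thematic vowel.
    ("archaic, attested in inscriptions", lambda theme: theme[: -len("ā")] + "aes"),
    ("archaic", lambda theme: theme + "s"),
]

# Shared sound rules on final segments: (old ending, new ending).
SOUND_RULES: List[Tuple[str, str]] = [("āī", "āi"), ("āi", "ai"), ("ai", "ae")]


def run_to_end(form: str, max_steps: int = 50) -> str:
    """Apply the shared sound rules until none of them matches."""
    for _ in range(max_steps):
        match = next((r for r in SOUND_RULES if form.endswith(r[0])), None)
        if match is None:
            break
        old, new = match
        form = form[: len(form) - len(old)] + new
    return form


def genitive_singular(theme: str) -> List[Tuple[str, str]]:
    """One outcome per start rule, each followed to the end of its own chain."""
    return [(run_to_end(start(theme)), label) for label, start in START_RULES]


for form, label in genitive_singular("rosā"):
    print(form, "-", label)
```

Adding a variant means appending one entry to `START_RULES`; the shared sound rules are reused by every branch that happens to match them.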
All these rules are defined in plain XML files, loaded by the software on startup:
so once the software architecture is operative, all I need to do to add inflection
capabilities is to edit an XML file. This way the tool can be effectively used as
a true interactive laboratory for the definition of a generative historical grammar
of Classical languages: we can "experiment" (in the true sense of the term) with
rule definitions and interactions to find out which are the simplest or work best;
after each change the software can regenerate the full inflection of all the words,
with all its historical stages, either attested or just reconstructed.
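Since the actual XML schema of the rule files is not published, the following Python sketch invents a minimal one purely for illustration: the element and attribute names (`rules`, `rule`, `match`, `replace`, `ending`, `with`, `stage`) are all assumptions. It shows how rules declared as data could be loaded with the standard `xml.etree.ElementTree` module, so that extending the grammar means editing the XML and not the code.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical rule file content; the real schema is not described in the text.
RULES_XML = """
<rules>
  <rule id="final-long-i" type="morphophonology" stage="archaic">
    <match ending="āī"/>
    <replace with="āi"/>
  </rule>
  <rule id="shorten-long-diphthong" type="phonology" stage="archaic">
    <match ending="āi"/>
    <replace with="ai"/>
  </rule>
  <rule id="ai-to-ae" type="phonology" stage="classical">
    <match ending="ai"/>
    <replace with="ae"/>
  </rule>
</rules>
"""


def load_rules(xml_text: str) -> List[Dict[str, str]]:
    """Parse rule definitions into plain dictionaries."""
    root = ET.fromstring(xml_text)
    rules = []
    for node in root.findall("rule"):
        rules.append({
            "id": node.get("id"),
            "type": node.get("type"),
            "stage": node.get("stage"),
            "ending": node.find("match").get("ending"),
            "replacement": node.find("replace").get("with"),
        })
    return rules


def apply_all(form: str, rules: List[Dict[str, str]], max_steps: int = 50) -> str:
    """Keep rewriting the ending while any loaded rule matches."""
    for _ in range(max_steps):
        rule = next((r for r in rules if form.endswith(r["ending"])), None)
        if rule is None:
            break
        form = form[: len(form) - len(rule["ending"])] + rule["replacement"]
    return form


rules = load_rules(RULES_XML)
print(apply_all("rosāī", rules))
```

Experimenting with a different formulation of a rule then amounts to changing an attribute in the XML text and running the derivation again.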
To give a practical example, once we add the rule for rhotacism in Latin the software
will generate monstra like ˣrorārum
from *rŏs-ā-sŏm until we add a lexical constraint against the application
of rhotacism to this word's root (which just reflects the fact that this
is not an original Latin word, like most 'Mediterranean' substrate words), or like
ˣPoplerio from Poplosio until
we define the relative chronology of the same rule.
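The effect of such a lexical constraint can be sketched in Python as follows. The function `rhotacize`, the set `EXEMPT_ROOTS` and the idea of protecting a root by its length are assumptions made for this example only. The sketch models nothing but the change of intervocalic s into r, so the vowel of the final syllable is left as it is, and the relative chronology mentioned above is not modelled at all.

```python
import unicodedata

VOWELS = set("aeiouāēīōū")
# Hypothetical lexical constraint: roots exempt from rhotacism.
EXEMPT_ROOTS = {"ros"}


def rhotacize(form: str, root: str = "") -> str:
    """Turn intervocalic s into r, skipping the root if it is exempt."""
    # Normalise so that a vowel with macron is a single character.
    text = unicodedata.normalize("NFC", form)
    chars = list(text)
    protected = root in EXEMPT_ROOTS and text.startswith(root)
    start = len(root) if protected else 1
    for i in range(max(start, 1), len(chars) - 1):
        if chars[i] == "s" and chars[i - 1] in VOWELS and chars[i + 1] in VOWELS:
            chars[i] = "r"
    return "".join(chars)


print(rhotacize("rosāsom"))              # no constraint: both s are affected
print(rhotacize("rosāsom", root="ros"))  # root protected: only the ending changes
```

Without the constraint the s of the root is changed as well, which is the kind of monstrum described above; with the root listed as exempt, only the s of the ending is affected.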
This leads to a fascinating project where computer science must join forces with historical
linguistics to generate all and only the 'grammatical' (in the technical sense) word
forms up to a predefined chronological stage, which requires sharp and unambiguous formulations
of all the rules in full detail. As with metrics,
this is another facet of the need for a full formalization of theoretical aspects, which
is one of the most challenging, difficult and yet rewarding sides of applying
computer science to the humanities.