Mathematical Linguistics



a mathematical discipline that develops a formal apparatus for describing the structure of natural languages and of some formal languages.

Mathematical linguistics arose in the 1950s in response to the urgent need to clarify basic concepts in linguistics. It chiefly makes use of algebra, the theory of algorithms, and the theory of automata. Although not a part of linguistics, mathematical linguistics has developed in close relation to it. The term “mathematical linguistics” is also sometimes applied to any linguistic field of investigation that employs mathematics.

The mathematical description of language is founded on F. de Saussure’s concept of language as a mechanism whose functioning is revealed in the speech habits of its users. Speech results in “correct texts,” that is, sequences of speech units that adhere to definite laws, many of which can be described mathematically. The study of methods for mathematically describing correct texts (primarily sentences) is one branch of mathematical linguistics and is called the theory of descriptive methods for syntactic structures. To describe the syntactic structure of a sentence, we may either isolate its constituents (word groups that function as complete syntactic units) or indicate for each word those words (if any) that are directly dependent on it. Thus, in the sentence loshadi kushaiut oves (horses eat oats), a description using the first method will yield the following constituents: the entire sentence I, each separate word, and the phrase C = kushaiut oves (eat oats; see Figure 1).

Figure 1. (I) sentence, (C) phrase. The Russian sentence translates “Horses eat oats.” The arrows indicate direct embedding.

The second method yields the scheme shown in Figure 2. The mathematical means used to describe sentence structure is called a tree of constituents (first method) or a tree of syntactic subordination (second method).

Figure 2. The Russian sentence translates “Horses eat oats.”
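The two descriptions can be sketched as small data structures. The following is only an illustrative sketch: the nested-tuple encoding and the helper names (`leaves`, `all_constituents`, `roots`) are hypothetical, and the dependency scheme assumes the conventional analysis in which the verb governs both nouns, since Figure 2 itself is not reproduced here.

```python
# First method: a tree of constituents, encoded as (label, child, child, ...).
# Single words are plain strings; I is the sentence, C the phrase "kushaiut oves".
constituent_tree = ("I", "loshadi", ("C", "kushaiut", "oves"))


def leaves(node):
    """Return the words covered by a constituent, left to right."""
    if isinstance(node, str):
        return [node]
    _label, *children = node
    return [word for child in children for word in leaves(child)]


def all_constituents(node):
    """List every constituent as the sequence of words it covers."""
    if isinstance(node, str):
        return [[node]]
    _label, *children = node
    found = [leaves(node)]
    for child in children:
        found.extend(all_constituents(child))
    return found


# Second method: a tree of syntactic subordination. Each word is mapped to
# the words directly dependent on it.
# ASSUMPTION: the verb is taken as the root governing both nouns.
dependents = {
    "kushaiut": ["loshadi", "oves"],
    "loshadi": [],
    "oves": [],
}


def roots(dep):
    """Return the words that depend on no other word."""
    governed = {word for deps in dep.values() for word in deps}
    return [word for word in dep if word not in governed]


for words in all_constituents(constituent_tree):
    print(" ".join(words))
print("root:", roots(dependents))
```

The first structure records which word groups are embedded in which; the second records only word-to-word dependencies and has no node for the phrase C.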

Another branch of mathematical linguistics, and one that occupies a central place in it, is the theory of formal grammars, whose chief proponent is N. Chomsky. Chomsky studies methods of describing the lawlike regularities that characterize not only isolated texts, but the entire set of correct texts in a given language. These lawlike regularities are described by constructing a “formal grammar”—an abstract device that can produce, by means of a uniform procedure, correct texts in a given language and that permits the description of the structure of these texts.

The most widely used type of formal grammar is generative, or Chomskian, grammar. A generative grammar is an ordered system Γ = 〈V, W, I, R〉, where V and W are disjoint finite sets, I is an element of W, and R is a finite set of rules of the type φ → ψ, where φ and ψ are chains (finite sequences) of elements in V and W. If φ → ψ is a rule of grammar Γ and ω1 and ω2 are chains of elements in V and W, we say that the chain ω1ψω2 can be immediately derived in Γ from ω1φω2. If ξ0, ξ1, …, ξn are chains and the chain ξi is immediately derivable from ξi−1 for every i = 1, …, n, we say that ξn is derivable from ξ0 in Γ. The set of chains of elements in V derivable in Γ from I is called the language generated by the grammar Γ. If all rules in Γ have the form A → ψ, where A is an element of W, Γ is called a contextless, or context-free, grammar.
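The definitions of immediate derivation, derivability, and the generated language can be illustrated with a short sketch. Everything named here is an assumption made for illustration: chains are encoded as Python tuples, the function names are hypothetical, and the toy grammar (I → a I b, I → a b) is not taken from the article. Because the generated language may be infinite, the search is cut off at a maximum chain length.

```python
from collections import deque


def immediate_derivations(chain, rules):
    """Yield every chain w1 + psi + w2 obtained from chain = w1 + phi + w2
    by applying one rule (phi, psi) at one position."""
    for phi, psi in rules:
        n = len(phi)
        for i in range(len(chain) - n + 1):
            if chain[i:i + n] == phi:
                yield chain[:i] + psi + chain[i + n:]


def generated_language(V, I, rules, max_len):
    """Collect the chains over V derivable from I, exploring only chains of
    length <= max_len. The cut-off is safe only when no rule shortens a
    chain, which holds for the toy grammar below."""
    start = (I,)
    seen = {start}
    queue = deque([start])
    language = set()
    while queue:
        chain = queue.popleft()
        if all(symbol in V for symbol in chain):
            language.add(chain)
        for nxt in immediate_derivations(chain, rules):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return language


def is_context_free(W, rules):
    """Check that every rule has the form A -> psi with A an element of W."""
    return all(len(phi) == 1 and phi[0] in W for phi, _psi in rules)


# Hypothetical toy grammar: V = {a, b}, W = {I}, rules I -> a I b and I -> a b.
V = {"a", "b"}
W = {"I"}
rules = [(("I",), ("a", "I", "b")), (("I",), ("a", "b"))]

for chain in sorted(generated_language(V, "I", rules, max_len=6), key=len):
    print("".join(chain))
print("context-free:", is_context_free(W, rules))
```

The breadth-first search simply follows chains of immediate derivations starting from I, which mirrors the definition of derivability given above.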

Interpreted linguistically, the elements of V are generally words, the elements of W are symbols for grammatical categories, and I is the symbol for the category “sentence.” In a context-free grammar, the derivation of a sentence yields a tree of constituents in which each constituent consists of words that derive from a single element of W, so that the grammatical category of each constituent is indicated. Thus, if a given grammar contains the rules

(1) I → S_{x,y,nom} V_y

(2) V_y → V_y^t S_{x,y,acc}

(3) S_{masc,sing,acc} → oves

(4) S_{fem,pl,nom} → loshadi

(5) V_{pl}^t → kushaiut

where V_y denotes the category “verb group in number y,” V_y^t denotes the category “transitive verb in number y,” and S_{x,y,z} denotes the category “noun of gender x, in number y, and in case z,” the sentence loshadi kushaiut oves has the derivation depicted in Figure 3. Formal grammars are used to describe not only natural languages but also formal languages, particularly programming languages.

Figure 3. (I) sentence, (V) verb, (S) substantive, (pl) plural, (fem) feminine, (nom) nominative, (t) transitive, (masc) masculine, (sing) singular, (acc) accusative. The Russian sentence translates “Horses eat oats.”
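The derivation of loshadi kushaiut oves from rules (1) through (5) can be traced step by step. The sketch below is hedged in several ways: the category names written as strings such as `"S[fem,pl,nom]"` are a hypothetical encoding, the rule schemas are replaced by the particular instances needed for this one sentence, and the instance of rule (2) assumes that the number of the object noun is chosen independently of the verb's number, since the example pairs a plural verb with a singular object.

```python
# Ground instances of rules (1)-(5), keyed by the category being rewritten.
# ASSUMPTION: in the instance of rule (2) the object noun is singular while
# the verb group is plural, as the example sentence requires.
RULES = {
    "I": ["S[fem,pl,nom]", "V[pl]"],            # rule (1)
    "V[pl]": ["Vt[pl]", "S[masc,sing,acc]"],    # rule (2)
    "S[masc,sing,acc]": ["oves"],               # rule (3)
    "S[fem,pl,nom]": ["loshadi"],               # rule (4)
    "Vt[pl]": ["kushaiut"],                     # rule (5)
}


def leftmost_derivation(start):
    """Return the chains xi_0, xi_1, ..., xi_n obtained by rewriting the
    leftmost category at each step until only words remain."""
    chain = [start]
    steps = [list(chain)]
    while any(symbol in RULES for symbol in chain):
        i = next(k for k, symbol in enumerate(chain) if symbol in RULES)
        chain = chain[:i] + RULES[chain[i]] + chain[i + 1:]
        steps.append(list(chain))
    return steps


def constituent_tree(symbol):
    """Build the tree of constituents as (category, [children]); words are leaves."""
    if symbol not in RULES:
        return symbol
    return (symbol, [constituent_tree(child) for child in RULES[symbol]])


def words(tree):
    """Return the words covered by a constituent, left to right."""
    if isinstance(tree, str):
        return [tree]
    _category, children = tree
    return [word for child in children for word in words(child)]


for step in leftmost_derivation("I"):
    print(" ".join(step))
```

Each printed line is immediately derivable from the one before it, and each node of the tree returned by `constituent_tree("I")` carries the grammatical category of the constituent it spans.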

Mathematical linguistics also deals with the study of analytic models of language. In these models, formal constructions are produced on the basis of the intuitive knowledge of certain speech data (for example, sets of correct sentences); formal constructions provide information about the structure of the language. The application of mathematical linguistics to real languages is part of the study of linguistics.

REFERENCES

Chomsky, N. Sintakticheskie struktury. In Novoe v lingvistike, issue 2. Moscow, 1962. (Translated from English.)
Gladkii, A. V., and I. A. Mel’chuk. Elementy matematicheskoi lingvistiki. Moscow, 1969.
Marcus, S. Teoretiko-mnozhestvennye modeli iazykov. Moscow, 1970. (Translated from English.)
Gladkii, A. V. Formal’nye grammatiki i iazyki. Moscow, 1973.

A. V. GLADKII