Proceedings NEMLAP/CONLL98, Sydney, 185-194, 1998

Modularity in Inductively-Learned Word Pronunciation Systems

Antal van den Bosch(1), Ton Weijters(2), Walter Daelemans(1)

(1) ILK / Computational Linguistics, Tilburg University, P.O. Box 90153, NL-5000 LE Tilburg, The Netherlands
(2) Department of Information Technology, Eindhoven University of Technology, P.O. Box 513, NL-5600 MB Eindhoven, The Netherlands

antalb, A.J.M.M.Weijters@tm.tue.nl

Abstract

Leading morpho-phonological theories and state-of-the-art text-to-speech systems assume that word pronunciation cannot be learned or performed without intermediate analyses at several levels of abstraction (e.g., morphological, graphemic, phonemic, syllabic, and stress levels). We challenge this assumption for the case of English word pronunciation. Using IGTree, an inductive-learning decision-tree algorithm, we train and test three word-pronunciation systems in which the number of abstraction levels (implemented as sequenced modules) is reduced from five, via three, to one. The last of these systems classifies letter strings directly as mapping to phonemes with stress markers. It yields significantly better generalisation accuracies than the two multi-module systems. Analyses of the empirical results indicate that the positive utility effects of sequencing modules are outweighed by cascading errors passed on between modules.

1 Introduction

Learning word pronunciation can be a hard task when the relation between the spelling of a language and its corresponding pronunciation is many-to-many. The English writing system and its pronunciation are a notoriously complex example, owing to an apparent conflict between analogy and inconsistency:

Analogy. When two words or word chunks have a similar spelling, they tend to have a similar pronunciation. This tendency (which generalises to other language tasks as well) is usually referred to as the analogy principle (De Saussure, 1916; Yvon, 1996; Daelemans, 1996).

This research was partially performed by the first and seco


