Deriving and Tabulating English Spelling-to-Sound Correspondences

Richard L. Venezky

A series of programs has been written for deriving and tabulating English spelling-to-sound and sound-to-spelling correspondences. One program scans the spelling and pronunciation of a word and decides which sounds in the pronunciation are derived from which letters in the spelling. Alternating vowel and consonant clusters are formed in each symbol string (spelling and pronunciation) and are matched by a routine that searches for such irregularities as silent letters and vocalic consonants. The input to this program is a dictionary containing the spelling and pronunciation of approximately 20,000 English words. The output is a separate record for each spelling-to-sound correspondence in a word, including the spelling of the word, its code number in the dictionary, its frequency of occurrence, and the correspondence type (consonant or vowel). These records are then sorted and passed to another program that tabulates the frequency of occurrence of every spelling-to-sound and sound-to-spelling correspondence found by the first program. Separate tabulations are made for each position that the spelling cluster occupies in the words in which it appears (initial, medial, and final). This information, along with the words that make up each correspondence statistic, is then listed. Separate listings of the same data for the 5,000 most common words in English and for the graphic monosyllables have also been obtained, along with reversed and alphabetized lists of spellings and of pronunciations. Data obtained from this study represent the most complete tabulation of spelling-to-sound and sound-to-spelling correspondences ever compiled. These results have formed the basis of a recently published analysis of English orthography and are being used in a number of universities in research aimed at the improvement of reading and spelling instruction.
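
The two-stage pipeline described above can be illustrated with a minimal Python sketch. It is not the original programs, which are not reproduced here. It assumes a simplified six-letter vowel set, takes the spelling-to-sound alignment as given (hand-written, in an invented pronunciation notation) rather than deriving it with the matching routine, and omits the dictionary code number and word frequency fields. All names (clusters, position, correspondence_records, tabulate, ALIGNED) are hypothetical.

```python
from collections import Counter
from itertools import groupby

# Assumption: a simplified vowel-letter set; the original rules are not given.
VOWEL_LETTERS = set("aeiouy")


def clusters(symbols, vowels=VOWEL_LETTERS):
    """Split a symbol string into alternating vowel (V) and consonant (C) clusters."""
    return [("V" if is_vowel else "C", "".join(group))
            for is_vowel, group in groupby(symbols, key=lambda s: s in vowels)]


def position(index, total):
    """Classify a cluster as initial, medial, or final within its word.

    A word with a single cluster is reported as initial (a choice made here).
    """
    if index == 0:
        return "initial"
    if index == total - 1:
        return "final"
    return "medial"


def correspondence_records(word, aligned):
    """Yield one record per spelling-to-sound correspondence in a word.

    `aligned` is a list of (spelling_cluster, sound) pairs supplied by hand;
    the silent-letter and vocalic-consonant matching is not attempted here.
    """
    total = len(aligned)
    for i, (spelling, sound) in enumerate(aligned):
        kind = "vowel" if spelling[0] in VOWEL_LETTERS else "consonant"
        yield (word, spelling, sound, kind, position(i, total))


def tabulate(records):
    """Count each (spelling, sound, position) correspondence across all records."""
    return Counter((sp, snd, pos) for _word, sp, snd, _kind, pos in records)


# Hand-aligned toy dictionary in an invented pronunciation notation.
ALIGNED = {
    "school": [("sch", "sk"), ("oo", "u"), ("l", "l")],
    "cool": [("c", "k"), ("oo", "u"), ("l", "l")],
}

records = [rec for word, pairs in ALIGNED.items()
           for rec in correspondence_records(word, pairs)]
table = tabulate(records)
```

With this toy input, clusters("school") gives the clusters sch, oo, and l, and table counts the medial correspondence of oo with u twice. Keying the counter on (sound, spelling, position) instead would give the sound-to-spelling tabulation from the same records.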

