Abstract: Language Identification (LID) is the automated process of identifying which language is being spoken in a speech sample from an unknown speaker. In this work we present a web-based LID system that uses Shifted Delta Cepstral (SDC) features, derived from Mel-Frequency Cepstral Coefficients (MFCC), to capture relevant acoustic information from speech signals, and Gaussian Mixture Models (GMM) as the classifier.
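The abstract does not state the SDC configuration. The following is a minimal sketch that assumes the widely used N-d-P-k = 7-1-3-7 setting and no edge padding. For frame t, the SDC vector concatenates k delta blocks, c(t + iP + d) - c(t + iP - d) for i = 0, ..., k-1, computed from the first N MFCCs of each frame. The function name `shifted_delta_cepstra` is hypothetical, and MFCC extraction is assumed to have already been done.

```python
from typing import List

Frame = List[float]


def shifted_delta_cepstra(cepstra: List[Frame], n: int = 7, d: int = 1,
                          p: int = 3, k: int = 7) -> List[Frame]:
    """Compute N-d-P-k shifted delta cepstra (hypothetical helper).

    cepstra: one MFCC vector per frame; only the first n coefficients are used.
    Returns one (n * k)-dimensional vector for every frame t whose neighbours
    t + i*p - d and t + i*p + d (i = 0..k-1) all exist; edge frames are dropped.
    """
    num_frames = len(cepstra)
    first = d                                   # earliest t with t - d >= 0
    last = num_frames - 1 - (k - 1) * p - d     # latest t with all neighbours
    features: List[Frame] = []
    for t in range(first, last + 1):            # empty if the input is too short
        vec: Frame = []
        for i in range(k):
            ahead = cepstra[t + i * p + d]
            behind = cepstra[t + i * p - d]
            # Delta block i: difference of frames spaced 2d apart, shifted by i*p.
            vec.extend(ahead[j] - behind[j] for j in range(n))
        features.append(vec)
    return features
```

With the default parameters, each output vector has N·k = 49 dimensions. Some systems also append the N static cepstra to each vector, which this sketch omits.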
Speech corpora for four languages (English, Spanish, French, and German) were built from recordings of audio media found on the Internet. The web implementation uses current web technologies, with GNU Octave running on the server side to perform the numerical computations. Results showed system accuracy ranging from 72.5% to 80%, depending on the duration of the test speech segments.
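The abstract does not say how a test segment is scored against the GMMs. The sketch below uses a common GMM decision rule, assumed here rather than taken from the paper. One GMM is trained per language (for example with EM, not shown), and a test segment receives the label of the language whose model gives the highest average per-frame log-likelihood over the segment's feature frames. Longer segments average over more frames, which may be one reason accuracy varies with segment duration. Diagonal covariances are an assumption, and `DiagonalGMM` and `identify_language` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DiagonalGMM:
    weights: List[float]          # mixture weights (assumed to sum to 1)
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one diagonal variance vector per component

    def frame_log_likelihood(self, x: Sequence[float]) -> float:
        """log p(x) = log sum_m w_m N(x; mu_m, diag(var_m)), via log-sum-exp."""
        log_terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            log_gauss = -0.5 * sum(
                math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                for xi, m, v in zip(x, mu, var)
            )
            log_terms.append(math.log(w) + log_gauss)
        top = max(log_terms)  # subtract the max for numerical stability
        return top + math.log(sum(math.exp(t - top) for t in log_terms))

    def average_log_likelihood(self, frames: Sequence[Sequence[float]]) -> float:
        if not frames:
            raise ValueError("test segment contains no frames")
        return sum(self.frame_log_likelihood(f) for f in frames) / len(frames)


def identify_language(frames: Sequence[Sequence[float]],
                      models: Dict[str, DiagonalGMM]) -> str:
    """Return the language whose GMM gives the highest average frame log-likelihood."""
    scores = {lang: gmm.average_log_likelihood(frames) for lang, gmm in models.items()}
    return max(scores, key=scores.get)
```

For example, with `models = {"English": gmm_en, "Spanish": gmm_es, ...}`, calling `identify_language(sdc_frames, models)` returns the key of the best-scoring model.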
Index Terms: GMM, language identification, MFCC, SDC.
The authors are with the Faculty of Engineering, National Autonomous
University of Mexico, Coyoacán, Mexico City, 04510, Mexico (e-mail:
Cite: Mauricio M. Olvera, Angel Sánchez, and Larry H. Escobar, "Web-Based Automatic Language Identification System," International Journal of Information and Electronics Engineering, vol. 6, no. 5, pp. 304-307, 2016.