Statistical Properties of Turkish Words

Contemporary Printed Turkish Word Characteristics and Smoothing Techniques

DALKILIÇ, GÖKHAN

LAP Lambert Academic Publishing

ISBN/EAN: 9783838351582

Umbreit No.: 4733978

Language: English

Extent: 140 pp.

Dimensions in cm: 0.9 x 22 x 15

Binding: Paperback

Published: 15 April 2010

Edition: 1/2010

€ 59,00

(including VAT)

Available within 1-2 weeks

Description
- Determining the structural properties of a natural language is essential for applications such as speech recognition and OCR. These properties can be analyzed under two categories: morphological and statistical. Statistical analysis requires a corpus that is a representative sample of the language. The word n-gram frequencies of that corpus can be determined with suitable algorithms, and missing n-grams can be estimated with smoothing techniques. In this study, a corpus named TurCo was created in order to apply smoothing techniques to Turkish and compare them. Different algorithms were tested for calculating word n-grams, and the characteristics of the resulting word n-gram lists were analyzed. For generalization, Zipf's Law was applied; to improve on its accuracy, Mandelbrot's Law was applied by finding appropriate values for its constants. Because the corpus could not be large enough to represent the whole language, smoothing techniques were used to estimate the unseen word n-grams. This study can help professionals working on speech recognition, cryptanalysis, and author recognition in Turkish.
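As a rough illustration of the ideas named in the description, the sketch below counts word n-grams, applies add-one (Laplace) smoothing so that unseen bigrams still receive a nonzero probability, and evaluates the Zipf-Mandelbrot formula f(r) = c / (r + b)^a. It is not the book's method: the toy sentence, the function names, and the choice of add-one smoothing are all assumptions made for illustration, and the book may use different algorithms and smoothing techniques.

```python
from collections import Counter

def word_ngrams(tokens, n):
    """Return all contiguous word n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def laplace_bigram_prob(w1, w2, unigrams, bigrams, vocab_size):
    """Add-one estimate of P(w2 | w1); nonzero even for unseen bigrams."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)

def zipf_mandelbrot(rank, c, b, a):
    """Predicted frequency c / (rank + b) ** a; b = 0 and a = 1 give plain Zipf."""
    return c / (rank + b) ** a

# Toy text, an assumption for illustration only (not taken from TurCo).
tokens = "bir gün bir adam bir gün geldi".split()
unigrams = Counter(tokens)
bigrams = Counter(word_ngrams(tokens, 2))
vocab_size = len(unigrams)

# A bigram that occurs in the toy text and one that does not.
seen = laplace_bigram_prob("bir", "gün", unigrams, bigrams, vocab_size)
unseen = laplace_bigram_prob("adam", "geldi", unigrams, bigrams, vocab_size)
print(seen, unseen)
```

On a real corpus the constants c, b, and a would be fitted to the observed rank-frequency list rather than chosen by hand, and add-one smoothing would typically be replaced by a stronger technique.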
About the authors
- Feristah Örücü: She received B.S. and M.S. degrees in Comp Eng from DEU, Turkey. She is a Ph.D. student and a Res Asst in the Dept of Comp Eng at DEU. Gökhan Dalkiliç: He received M.S. degrees in Comp Sci from USC and from Ege Univ CI, and a Ph.D. degree in Comp Eng from DEU. He is an Asst Prof in the Dept of Comp Eng at DEU.