Geniş Metin Koleksiyonlarından İteratif Bilgi Çıkarımı / Iterative Information Extraction from Large Text Collections

Gürkan Şahin; Fatih Amasyalı

Year 2014, Volume: 4 Issue: 7, 13 - 20, 30.12.2014

Abstract

There are various methods for extracting information from large texts, and the template method is one of them. We developed an automatic system that uses templates to produce pairs between which a specific semantic relation holds. We worked separately with morphologically analyzed and unanalyzed datasets, and the morphologically analyzed dataset yielded better-structured templates. Our experiments showed that when a continuously increasing number of templates was used to produce pairs, the accuracy of the produced pairs decreased. Better results were obtained by applying a fixed number of more reliable templates to a growing dataset.
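The abstract describes the template method only at a high level. As an illustration, here is a minimal sketch of a generic bootstrapping loop in the spirit of Hearst-style lexico-syntactic patterns and never-ending learners such as NELL (both cited). It assumes that a template is simply the literal word sequence found between the two members of a known pair, that sentences are whitespace-tokenized, and that no morphological analysis is applied (the paper reports better templates from morphologically analyzed text, for example with a tool such as Zemberek listed in the references; this sketch skips that step). The function names, the `max_gap` and `top_k` parameters, and the toy sentences are hypothetical and are not the authors' implementation.

```python
from collections import Counter

# Hypothetical toy corpus; a real run would use a large, preprocessed text collection.
SENTENCES = [
    "a dog is a kind of animal",
    "a rose is a kind of flower",
    "a cat is a type of animal",
    "the cat is a kind of animal",
    "a sparrow is a type of bird",
    "a tulip is a kind of flower",
]


def learn_templates(sentences, pairs, max_gap=4):
    """Count the word sequences that appear between the two members of a known pair."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for x, y in pairs:
            if x in tokens and y in tokens:
                # Simplification: only the first occurrence of each word is considered.
                i, j = tokens.index(x), tokens.index(y)
                if 0 < j - i - 1 <= max_gap:
                    counts[" ".join(tokens[i + 1:j])] += 1
    return counts


def apply_templates(sentences, templates):
    """Return the (left word, right word) pairs surrounding each template occurrence."""
    found = set()
    for sentence in sentences:
        tokens = sentence.split()
        for template in templates:
            t_tokens = template.split()
            n = len(t_tokens)
            # k is chosen so that both tokens[k - 1] and tokens[k + n] exist.
            for k in range(1, len(tokens) - n):
                if tokens[k:k + n] == t_tokens:
                    found.add((tokens[k - 1], tokens[k + n]))
    return found


def bootstrap(sentences, seeds, iterations=2, top_k=2):
    """Alternate template learning and pair extraction, keeping at most top_k templates."""
    pairs = set(seeds)
    for _ in range(iterations):
        counts = learn_templates(sentences, pairs)
        templates = [t for t, _ in counts.most_common(top_k)]
        pairs |= apply_templates(sentences, templates)
    return pairs


if __name__ == "__main__":
    for pair in sorted(bootstrap(SENTENCES, {("dog", "animal")})):
        print(pair)
```

Starting from the single seed ("dog", "animal"), the first iteration learns only "is a kind of" and extracts, among others, ("cat", "animal"). That new pair exposes "is a type of" in the second iteration, which then yields ("sparrow", "bird"). Capping the template set with `top_k` is one simple way to mirror the abstract's observation that an ever-growing set of templates lowers pair accuracy; how the paper itself scores template reliability is not described here.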

Keywords

Natural Language Processing, Information Extraction, Template Method, Morphological Analysis, Semantic Relation

References

Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., and Miller, K., "Introduction to WordNet: An On-line Lexical Database", 1993.
Yazıcı, E. and Amasyalı, M. F., "Automatic Extraction of Semantic Relationships Using Turkish Dictionary Definitions", EMO Bilimsel Dergi, Vol. 1, No. 1, pp. 1-13, 2011.
Amasyalı, M. F., "Türkçe Wordnet'in Otomatik Olarak Oluşturulması" [Automatic Construction of a Turkish WordNet], SIU 2005, 2005.
http://lucene.apache.org/core/
http://tr.wikipedia.org/wiki/Lucene
Hearst, M., "Automated Discovery of WordNet Relations", in WordNet: An Electronic Lexical Database, C. Fellbaum (ed.), MIT Press, 1998.
http://maya.cs.depaul.edu/~classes/etc584/papers/brin.pdf
http://rtw.ml.cmu.edu/rtw/
Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Hruschka Jr., E. R., and Mitchell, T. M., "Toward an Architecture for Never-Ending Language Learning".
http://tika.apache.org/
http://www.kemik.yildiz.edu.tr/?id=28
http://tr.wikipedia.org/wiki/Morfoloji
http://tr.wikipedia.org/wiki/Zemberek_%28yaz%C4%B1l%C4%B1m%29

There are 15 citations in total.

Details

Primary Language	Turkish
Journal Section	Academic and/or technological scientific article
Authors	Gürkan Şahin, Fatih Amasyalı
Publication Date	December 30, 2014
Submission Date	October 9, 2013
Published in Issue	Year 2014 Volume: 4 Issue: 7

Cite

APA	Şahin, G., & Amasyalı, F. (2014). Geniş Metin Koleksiyonlarından İteratif Bilgi Çıkarımı Iterative Information Extraction from Large Text Collections. EMO Bilimsel Dergi, 4(7), 13-20.
AMA	Şahin G, Amasyalı F. Geniş Metin Koleksiyonlarından İteratif Bilgi Çıkarımı Iterative Information Extraction from Large Text Collections. EMO Bilimsel Dergi. December 2014;4(7):13-20.
Chicago	Şahin, Gürkan, and Fatih Amasyalı. “Geniş Metin Koleksiyonlarından İteratif Bilgi Çıkarımı Iterative Information Extraction from Large Text Collections”. EMO Bilimsel Dergi 4, no. 7 (December 2014): 13-20.
EndNote	Şahin G, Amasyalı F (December 1, 2014) Geniş Metin Koleksiyonlarından İteratif Bilgi Çıkarımı Iterative Information Extraction from Large Text Collections. EMO Bilimsel Dergi 4 7 13–20.
IEEE	G. Şahin and F. Amasyalı, “Geniş Metin Koleksiyonlarından İteratif Bilgi Çıkarımı Iterative Information Extraction from Large Text Collections”, EMO Bilimsel Dergi, vol. 4, no. 7, pp. 13–20, 2014.
ISNAD	Şahin, Gürkan - Amasyalı, Fatih. “Geniş Metin Koleksiyonlarından İteratif Bilgi Çıkarımı Iterative Information Extraction from Large Text Collections”. EMO Bilimsel Dergi 4/7 (December 2014), 13-20.
JAMA	Şahin G, Amasyalı F. Geniş Metin Koleksiyonlarından İteratif Bilgi Çıkarımı Iterative Information Extraction from Large Text Collections. EMO Bilimsel Dergi. 2014;4:13–20.
MLA	Şahin, Gürkan and Fatih Amasyalı. “Geniş Metin Koleksiyonlarından İteratif Bilgi Çıkarımı Iterative Information Extraction from Large Text Collections”. EMO Bilimsel Dergi, vol. 4, no. 7, 2014, pp. 13-20.
Vancouver	Şahin G, Amasyalı F. Geniş Metin Koleksiyonlarından İteratif Bilgi Çıkarımı Iterative Information Extraction from Large Text Collections. EMO Bilimsel Dergi. 2014;4(7):13-20.

EMO BİLİMSEL DERGİ
Peer-Reviewed Scientific Journal of Electrical, Electronics, Computer, Biomedical and Control Engineering
TMMOB Elektrik Mühendisleri Odası (TMMOB Chamber of Electrical Engineers)
IHLAMUR SOKAK NO:10 KIZILAY/ANKARA
Tel: +90 (312) 425 32 72 (PBX) / Fax: +90 (312) 417 38 18
bilimseldergi@emo.org.tr