keronsummit.blogg.se - Part of speech tagger

#Part of speech tagger software#

I examine what would be necessary to move part-of-speech tagging performance from its current level of about 97.3% token accuracy (56% sentence accuracy) to close to 100% accuracy. I suggest that it must still be possible to greatly increase tagging performance, and I examine some useful improvements that have recently been made to the Stanford Part-of-Speech Tagger. However, an error analysis of some of the remaining errors suggests that there is limited further mileage to be had either from better machine learning or from better features in a discriminative sequence classifier. The prospects for further gains from semi-supervised learning also seem quite limited.

Rather, I suggest and begin to demonstrate that the largest opportunity for further progress comes from improving the taxonomic basis of the linguistic resources from which taggers are trained, that is, from improved descriptive linguistics. However, I conclude by suggesting that there are also limits to this process.

The status of some words may not be adequately captured by assigning them to one of a small number of categories. While conventions can be used in such cases to improve tagging consistency, they lack a strong linguistic basis.
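The argument rests on two evaluation ideas: token accuracy versus sentence accuracy, and error analysis of the remaining mistakes. The sketch below is illustrative only. It assumes gold and predicted tags are given as parallel lists of Penn Treebank-style tag sequences. The function names and toy data are hypothetical and are not taken from the Stanford tagger.

The code computes both accuracy figures and counts (gold, predicted) confusion pairs for error analysis. It also shows the rough sentence-accuracy estimate p^n. That estimate holds only under the simplifying assumption that each token is tagged correctly independently with probability p.

```python
from collections import Counter
from typing import Sequence


def tagging_scores(gold: Sequence[Sequence[str]],
                   pred: Sequence[Sequence[str]]) -> tuple[float, float, Counter]:
    """Return (token accuracy, sentence accuracy, confusion counts).

    gold and pred are parallel lists of sentences; each sentence is a list of
    POS tags, one per token. Errors are counted as (gold_tag, predicted_tag).
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("need the same, non-zero number of gold and predicted sentences")
    correct_tokens = total_tokens = correct_sentences = 0
    confusions: Counter = Counter()
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("gold and predicted sentence lengths differ")
        sentence_ok = True
        for g, p in zip(g_sent, p_sent):
            total_tokens += 1
            if g == p:
                correct_tokens += 1
            else:
                # A single wrong tag makes the whole sentence count as wrong.
                sentence_ok = False
                confusions[(g, p)] += 1
        correct_sentences += int(sentence_ok)
    return (correct_tokens / total_tokens,
            correct_sentences / len(gold),
            confusions)


def independent_error_sentence_accuracy(token_acc: float, length: int) -> float:
    """Sentence accuracy expected IF each of `length` tokens were tagged
    correctly independently with probability token_acc (an assumption;
    real tagging errors need not be independent)."""
    return token_acc ** length


if __name__ == "__main__":
    # Hypothetical toy data: the second sentence has one particle (RP)
    # mistagged as a preposition (IN).
    gold = [["DT", "NN", "VBZ", "JJ", "."],
            ["PRP", "VBD", "RP", "DT", "NN", "."]]
    pred = [["DT", "NN", "VBZ", "JJ", "."],
            ["PRP", "VBD", "IN", "DT", "NN", "."]]
    tok, sent, conf = tagging_scores(gold, pred)
    print(f"token accuracy {tok:.3f}, sentence accuracy {sent:.3f}")
    print("most frequent confusions:", conf.most_common(3))
    # Back-of-the-envelope estimate for a hypothetical 25-token sentence.
    print(f"p^n estimate: {independent_error_sentence_accuracy(0.973, 25):.3f}")
```

The toy run shows how one wrong tag among eleven tokens still costs half of the sentences. That gap is why a token accuracy in the high 90s can coexist with a much lower sentence accuracy. Sorting the confusion counter by frequency is one simple way to decide which remaining error types deserve closer linguistic inspection.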