Keyphrase Digger (KD) is a rule-based system for keyphrase extraction. It is a Java re-implementation of the KX tool (Pianta and Tonelli, 2010) with a new architecture and new features. KD combines statistical measures with linguistic information from PoS patterns to identify and extract weighted keyphrases from texts.
Extraction of multi-words
Multilinguality (EN, IT, and DE)
Easily extensible to other languages
Higher customizability than KX
High processing speed
Clustering of keyphrases under the same lemma
Various accepted formats and PoS tagsets: Stanford PoS Tagger (EN), TreeTagger (IT and EN), TextPro (IT and EN)
Boost of specific PoS patterns
Integration of Apache Lucene Library
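The general idea behind several of the features above (multi-word extraction, boosting of specific PoS patterns, and clustering of keyphrases under the same lemma) can be illustrated with a minimal Java sketch. This is not KD's actual API or scoring formula: the `KeyphraseSketch` class, the `Token` type, the pattern table, the boost values, and the additive scoring are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: NOT KD's real API or weighting scheme.
class KeyphraseSketch {

    // Hypothetical token type: surface form, lemma, and PoS tag.
    static final class Token {
        final String word;
        final String lemma;
        final String pos;

        Token(String word, String lemma, String pos) {
            this.word = word;
            this.lemma = lemma;
            this.pos = pos;
        }
    }

    // Scans all n-grams up to maxLen tokens. An n-gram is kept when its PoS
    // sequence appears in patternBoost; its boost is added to the score of
    // its lemma sequence, so inflected variants share a single entry.
    static Map<String, Double> extract(List<Token> tokens,
                                       Map<String, Double> patternBoost,
                                       int maxLen) {
        Map<String, Double> scores = new HashMap<>();
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder pattern = new StringBuilder();
            StringBuilder lemmas = new StringBuilder();
            for (int len = 1; len <= maxLen && start + len <= tokens.size(); len++) {
                Token t = tokens.get(start + len - 1);
                if (len > 1) {
                    pattern.append(' ');
                    lemmas.append(' ');
                }
                pattern.append(t.pos);
                lemmas.append(t.lemma);
                Double boost = patternBoost.get(pattern.toString());
                if (boost != null) {
                    scores.merge(lemmas.toString(), boost, Double::sum);
                }
            }
        }
        // Rank by descending score, breaking ties alphabetically.
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        entries.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        Map<String, Double> ranked = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : entries) {
            ranked.put(e.getKey(), e.getValue());
        }
        return ranked;
    }

    // Small hand-tagged example: "linguistic patterns help a linguistic pattern".
    static Map<String, Double> demo() {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("linguistic", "linguistic", "JJ"));
        tokens.add(new Token("patterns", "pattern", "NNS"));
        tokens.add(new Token("help", "help", "VBP"));
        tokens.add(new Token("a", "a", "DT"));
        tokens.add(new Token("linguistic", "linguistic", "JJ"));
        tokens.add(new Token("pattern", "pattern", "NN"));

        // Assumed boosts: adjective-noun phrases count double.
        Map<String, Double> patternBoost = new HashMap<>();
        patternBoost.put("NN", 1.0);
        patternBoost.put("NNS", 1.0);
        patternBoost.put("JJ NN", 2.0);
        patternBoost.put("JJ NNS", 2.0);

        return extract(tokens, patternBoost, 2);
    }

    public static void main(String[] args) {
        // Prints {linguistic pattern=4.0, pattern=2.0}
        System.out.println(demo());
    }
}
```

In this sketch, "linguistic patterns" and "linguistic pattern" are merged under the lemma sequence "linguistic pattern", and the adjective-noun pattern outranks the bare noun because of its higher assumed boost.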
Moretti, G., Sprugnoli, R., Tonelli, S. “Digging in the Dirt: Extracting Keyphrases from Texts with KD”. In Proceedings of the Second Italian Conference on Computational Linguistics (CLiC-it 2015), Trento, Italy.
DOWNLOAD KD SOFTWARE PACKAGE.
[Current release v1.2: adds German support, a new function for adding a new language, and bug fixes.]
TRY THE ONLINE DEMO.