A good dictionary/corpus to crosscheck plural nouns
I am using "nltk" to identify nouns and then "inflect" to find the plural form of each noun. I have added a safeguard: the plural form is crosschecked against a dictionary/corpus, and if it is not present, "(s)" is appended to the singular instead of using the plural form. The following is a small part of the code (the crosschecking part).
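import inflect

word = input()
p = inflect.engine()
pluralized = p.plural(word)

# Load the word list and split it into whitespace-separated tokens
with open("words.rtf") as f:
    text = f.read().strip().split()

# Use the plural only if the word list contains it
if pluralized in text:
    newword = pluralized
else:
    newword = word + "(s)"
print(word, " : ", newword)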
The problem is that the dictionary/corpus I am using, "words.rtf", doesn't contain the plural forms of most words. Is there a text file with more plural forms, or a better way to crosscheck? I want to reject plurals of abbreviations and acronyms and accept only plurals of proper English words. For example:
knife: knives
ID: ID(s) #not IDS
Solution 1:[1]
If you're looking for something to help with inflections, you can check out pyInflect or LemmInflect. These will do a much better job for you than NLTK.
If you're really just looking for a list of words, check out the Debian package wamerican. If you're on Linux, it's probably already installed in /usr/share/dict. On Windows, I believe you can use 7-Zip or several other programs to extract the .deb file, then just use the word list inside the archive.
There are also larger lists, such as wamerican-large, -huge, and -insane, as well as wbritish versions (see similar packages on the right side of the wamerican page).
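As a rough sketch of how such a word list could replace "words.rtf" in the crosscheck from the question: the helper names `load_word_set` and `checked_plural` are hypothetical, and the sketch assumes a plain-text list with one word per line. The candidate plural is passed in as an argument so the sketch does not depend on any particular inflection library.

```python
def load_word_set(path):
    """Read a one-word-per-line text file into a set (hypothetical helper).

    A set gives constant-time membership tests, unlike scanning a list.
    """
    with open(path, encoding="utf-8", errors="ignore") as f:
        return {line.strip() for line in f if line.strip()}


def checked_plural(word, plural, known_words):
    """Return `plural` if the word list contains it, else `word + "(s)"`.

    The lookup is case-sensitive, so a candidate such as "IDS" is only
    accepted if the list contains exactly that spelling.
    """
    if plural in known_words:
        return plural
    return word + "(s)"
```

With the question's setup, this might be called as `checked_plural(word, p.plural(word), load_word_set("/usr/share/dict/words"))`. The exact file name under /usr/share/dict is an assumption and can differ between systems, so check which file your installation provides.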
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | bivouac0 |