python 3.x - How to create a categorized tagged corpus reader
I have a bunch of files, with their categories listed in cats.txt in the same folder. I want to create a CategorizedTaggedCorpusReader from this.
This is how the files look.
I tried many ways in NLTK but failed to create the CategorizedTaggedCorpusReader. Inside cats.txt I have the filename and the category names, separated by spaces; each filename can have multiple categories.
For instance:
mail_1_adapter adapter
mail_1_alert alert
messagebody_24862499 others
etc...
Can someone please show me a better way to create the corpus and make use of it?
Your file format is fine. How did you try to create the reader, and what didn't work? Since you don't show your code, there's no telling what you're doing wrong. You need to tell the reader that it should read the categories from the file cats.txt, e.g. like this:
from nltk.corpus.reader import CategorizedTaggedCorpusReader
reader = CategorizedTaggedCorpusReader(<path>, r"^[^.]*$", cat_file="cats.txt")
Since your categories file cats.txt is not part of the corpus, I used the regexp ^[^.]*$, which matches any file name not containing a dot. If that doesn't correctly describe your files, change the definition as needed so that it includes the corpus files but excludes cats.txt.
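To check the whole setup end to end, here is a minimal sketch that builds a tiny throwaway corpus in a temporary directory and reads it back. The file names follow the question, but the file contents (word/TAG tokens, one sentence per line) and the extra "others" category on the second file are invented for illustration; your real files may be tagged differently.

```python
import os
import tempfile

from nltk.corpus.reader import CategorizedTaggedCorpusReader

# Made-up corpus files: names follow the question, contents are invented.
files = {
    "mail_1_adapter": "restart/VB the/DT adapter/NN\n",
    "mail_1_alert": "disk/NN alert/NN raised/VBD\n",
}

# Write the corpus files into a temporary directory used as the corpus root.
root = tempfile.mkdtemp()
for name, text in files.items():
    with open(os.path.join(root, name), "w", encoding="utf8") as f:
        f.write(text)

# cats.txt: file name first, then one or more categories, space separated.
with open(os.path.join(root, "cats.txt"), "w", encoding="utf8") as f:
    f.write("mail_1_adapter adapter\nmail_1_alert alert others\n")

# The regexp selects file names without a dot, so cats.txt is not a corpus file.
reader = CategorizedTaggedCorpusReader(root, r"^[^.]*$", cat_file="cats.txt")

print(reader.fileids())                     # corpus files only
print(reader.categories())                  # all categories found in cats.txt
print(reader.fileids(categories="others"))  # files in one category
print(list(reader.tagged_words(categories="adapter")))
```

Every file name listed in cats.txt has to match the file-name regexp, otherwise the reader raises an error when it first loads the categories. If your tags use a separator other than "/", pass it with the sep argument.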