Python 3.x - How to create a categorized tagged corpus reader


I have a bunch of files, with their categories listed in cats.txt in the same folder. I want to create a CategorizedTaggedCorpusReader from this.

I tried many ways in NLTK and failed to create a CategorizedTaggedCorpusReader. Inside cats.txt, each line has a filename and a category name separated by a space, and each filename can have multiple categories.

For instance:
mail_1_adapter adapter
mail_1_alert alert
messagebody_24862499 others
etc...

Can you please show me a better way to create the corpus and make use of it?

Your file format is fine. How did you try to create the reader, and what didn't work? If you don't show your code, there's no telling what you're doing wrong. You need to tell the reader that it should read the categories from the file cats.txt, e.g. like this:

    from nltk.corpus.reader import CategorizedTaggedCorpusReader
    reader = CategorizedTaggedCorpusReader(<path>, r"^[^.]*$", cat_file="cats.txt")
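To see the call in action end to end, here is a minimal sketch that builds a throwaway corpus in a temporary directory and reads it back. The file contents, the `word/TAG` tagging and the extra `others` category on the second file are made up for illustration; only the file names and the cats.txt layout follow the question.

```python
import tempfile
from pathlib import Path

from nltk.corpus.reader import CategorizedTaggedCorpusReader

# Hypothetical corpus: two tagged files plus the category map.
root = Path(tempfile.mkdtemp())
(root / "mail_1_adapter").write_text("The/DT adapter/NN failed/VBD\n", encoding="utf-8")
(root / "mail_1_alert").write_text("Disk/NN alert/NN raised/VBD\n", encoding="utf-8")
# One line per file: file name, then one or more categories, space separated.
(root / "cats.txt").write_text(
    "mail_1_adapter adapter\nmail_1_alert alert others\n", encoding="utf-8"
)

# The pattern selects file names without a dot, so cats.txt is not a corpus file.
reader = CategorizedTaggedCorpusReader(str(root), r"^[^.]*$", cat_file="cats.txt")

print(reader.categories())
print(reader.fileids(categories="alert"))
print(list(reader.tagged_words(categories="adapter")))
```

Every file name listed in cats.txt must also be matched by the file pattern. Otherwise the reader raises an error the first time the categories are accessed.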

Your categories file cats.txt is not part of the corpus, so I used the regexp ^[^.]*$, which matches any name not containing a dot. If that doesn't correctly describe your files, change the definition as needed to include your corpus files but exclude cats.txt.
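A quick way to check a candidate pattern before handing it to the reader is to run it over the file names with plain `re`. This is only a sketch: `split_fileids`, the `notes.tagged` name and the alternative pattern are hypothetical, and it assumes the reader matches the pattern from the start of each file name.

```python
import re

# Pattern from the answer: names that contain no dot at all.
NO_DOT = r"^[^.]*$"
# Hypothetical alternative for corpus files that do have extensions:
# accept everything except a file named exactly cats.txt.
NOT_CATS = r"(?!cats\.txt$).*$"


def split_fileids(names, pattern):
    """Return (matched, skipped) file names for a regexp."""
    regex = re.compile(pattern)
    matched = [name for name in names if regex.match(name)]
    skipped = [name for name in names if not regex.match(name)]
    return matched, skipped


names = ["mail_1_adapter", "mail_1_alert", "messagebody_24862499",
         "cats.txt", "notes.tagged"]

matched, skipped = split_fileids(names, NO_DOT)
print("corpus files:", matched)
print("ignored:", skipped)

alt_matched, alt_skipped = split_fileids(names, NOT_CATS)
print("corpus files (alternative pattern):", alt_matched)
```

With the first pattern, any file with an extension is left out along with cats.txt. The second keeps such files and drops only cats.txt.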

