python - Using downloaded NLTK data on AWS Elastic Beanstalk
I have a Django app running on AWS Elastic Beanstalk. I use an NLTK corpus package (stopwords) that I obtain using the NLTK downloader.
As a quick hack, I ran the NLTK downloader on the current (single) Elastic Beanstalk EC2 instance and saved the needed corpus to /usr/local/share/nltk_data. This works on the single instance, but the data will be gone when the load balancer decides to create new instances (it does survive deployments).
My question is: what is the best approach here for this data?
Should I store it on S3 and tie that in to Elastic Beanstalk?
Or is it easier (and better) to write a (Python?) script, called from the EB configuration, that downloads the data on each new instance and places it in a folder accessible to the app (for the life of the instance)? That way, if I need to add other corpus downloads or other Python-specific or NLTK-specific things, it all happens in Python and doesn't require manual S3 work.
If EB supports writing such a script in the EB configuration, an example would be great, as I'm not sure exactly how to do it.
Thanks!
It is easy to use S3 for this specific use case (in combination with IAM and EC2 instance roles).
Even for fast-changing data (NLTK corpora are slow-changing, I assume), one can manually sync the differences to an existing S3 location so that instances have the new data available when they need it.
The key is to give your instances IAM roles, using instance profiles. With the proper policy, they can securely access S3 without you having to define AWS credentials manually, or in a script that needs access to the AWS CLI on instance start, etc.
There are significant security benefits to using instance profiles for IAM permissions to AWS resources, as it eliminates hard-coding credentials into scripts, git code, etc.
Then, assuming the AWS CLI is installed on Linux via apt, pip, etc.:
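To make "the proper policy" a bit more concrete, here is a minimal sketch of a read-only policy document that could be attached to the instance role. The helper name `nltk_bucket_read_policy` is hypothetical, and the bucket name `mybucket` and prefix `nltk_data` are assumptions taken from the example commands in this answer. It only covers pulling data down; uploading from another machine would additionally need `s3:PutObject`.

```python
import json


def nltk_bucket_read_policy(bucket: str, prefix: str) -> dict:
    """Build a read-only IAM policy document for one S3 prefix.

    Hypothetical helper: bucket and prefix are placeholders, not real resources.
    """
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action; a sync needs it to compare files.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading objects is an object-level action, limited to the prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
        ],
    }


# Render the JSON that would be pasted into (or uploaded as) the role's policy.
policy_json = json.dumps(nltk_bucket_read_policy("mybucket", "nltk_data"), indent=2)
print(policy_json)
```

The printed JSON is what you would attach to the role behind the instance profile; nothing in it contains a credential, which is the point of using roles here.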
    # Create the bucket (once).
    # Put it in the same region as your EC2 instances
    # to minimize data transfer.
    # You can run these from wherever the bucket / data is.
    aws s3 mb s3://mybucket --region us-west-1

    # Sync from wherever the data lives, the first time and whenever needed.
    # The nltk_data prefix must match the one used when pulling below.
    aws s3 sync /usr/local/share/nltk_data s3://mybucket/nltk_data

    # You can then run the command below on the instances.
    #
    # Put it in the instance startup script after the install of awscli etc.,
    # or in a myscript.sh file on the instance (even from a gist),
    # wherever you want the instance to have the data or sync it.
    aws s3 sync s3://mybucket/nltk_data /path/where/i/need

The nice thing is that the sync command will not copy over files that have not been modified, both when putting to S3 and when pulling down. That makes it super handy for things like common datasets, backups, etc.
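Since the question asked for a Python script, here is a rough sketch of the pull step wrapped in Python. It simply shells out to the same `aws s3 sync` command, so it assumes the AWS CLI is installed and the instance role allows reading the bucket. The function names and the `BUCKET`, `PREFIX` and `DEST` values are assumptions for illustration; `DEST` is the directory from the question, which is one of the locations NLTK searches by default on Linux.

```python
import os
import subprocess

# Assumed values taken from the example commands; adjust to your setup.
BUCKET = "mybucket"
PREFIX = "nltk_data"
DEST = "/usr/local/share/nltk_data"


def build_sync_command(bucket: str, prefix: str, dest: str) -> list:
    """Return the `aws s3 sync` invocation that pulls the prefix into dest."""
    return ["aws", "s3", "sync", f"s3://{bucket}/{prefix.strip('/')}", dest]


def sync_nltk_data(bucket=BUCKET, prefix=PREFIX, dest=DEST, run=subprocess.run):
    """Create dest if needed, then pull the data with the AWS CLI.

    `run` is injectable so the function can be exercised without the CLI;
    with the default, a failed sync raises subprocess.CalledProcessError.
    """
    os.makedirs(dest, exist_ok=True)
    command = build_sync_command(bucket, prefix, dest)
    run(command, check=True)
    return command
```

Calling `sync_nltk_data()` from whatever runs at instance start would then fetch the corpora. One option that may fit Elastic Beanstalk is to invoke such a script from a `container_commands` entry in an `.ebextensions` config file, but treat that as a suggestion to check against your environment. If you sync to a directory NLTK does not search by default, setting the `NLTK_DATA` environment variable or appending the directory to `nltk.data.path` should let NLTK find it.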