Hive - Streaming IIS logs to Hadoop in real time
I am trying out a POC for log aggregation in Hadoop. We have multiple IIS servers hosting at least 100 sites. I want to stream the logs continuously to HDFS, parse the data, and store it in Hive for further analytics.
1) Is Apache Kafka the correct choice, or Apache Flume?
2) After streaming, is it better to use Apache Storm to ingest the data into Hive?
Please share any suggestions or information on this kind of problem statement.
Thanks
You can use either Kafka or Flume.
You can also combine both to get the data into HDFS.
But with those you need to write code. There are open-source data flow management tools available for which you don't need to write code, e.g. NiFi or StreamSets.
With these you don't need separate ingestion tools; you can use the data flow tool directly to put the data into a Hive table. Once the table is created in Hive, you can do the analytics by running queries against it.
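Whichever tool moves the data, the parsing step looks roughly the same. Here is a minimal Python sketch, assuming the servers write the W3C extended log format (the IIS default), where a "#Fields:" line names the space-separated columns that follow. The function names, the sample lines, and the choice of tab-delimited output are only illustrative assumptions, not something a particular tool requires.

```python
from typing import Dict, Iterable, Iterator, List


def parse_w3c_log(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per request line of a W3C extended log (assumed format)."""
    fields: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive lines; "#Fields:" names the columns for the lines after it.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if not fields or len(values) != len(fields):
            continue  # skip lines that do not match the current header
        yield dict(zip(fields, values))


def to_tsv(record: Dict[str, str], columns: List[str]) -> str:
    """Render the chosen columns as one tab-delimited row ("-" if missing)."""
    return "\t".join(record.get(col, "-") for col in columns)


# Hypothetical sample lines, made up for illustration only.
SAMPLE = [
    "#Software: Microsoft Internet Information Services 8.5",
    "#Fields: date time cs-method cs-uri-stem sc-status time-taken",
    "2016-05-01 10:15:02 GET /index.html 200 15",
    "2016-05-01 10:15:03 POST /api/login 500 230",
]

records = list(parse_w3c_log(SAMPLE))
rows = [to_tsv(r, ["date", "time", "cs-uri-stem", "sc-status"]) for r in records]
```

Rows like these could be written to files in HDFS and read through a Hive table declared with tab as the field delimiter. That is one possible layout, not the only one.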
Let me know if you need anything else on this.