amazon s3 - Is DynamoDB suitable as an S3 metadata index?


I want to store and query a large quantity of raw event data. The architecture I would like to use is a 'data lake' architecture, where S3 holds the actual event data and DynamoDB is used to index it and provide metadata. This architecture is discussed and recommended in many places.

However, I am struggling to understand how to use DynamoDB for the purpose of querying the event data in S3. In the AWS blog post linked above, they use the example of storing customer events produced by multiple different servers:

s3 path format: [4-digit hash]/[server id]/[year]-[month]-[day]-[hour]-[minute]/[customer id]-[epoch timestamp].data

Example: a5b2/i-31cc02/2015-07-05-00-25/87423-1436055953839.data
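A minimal sketch of how such a key could be built, assuming the epoch timestamp is in milliseconds (UTC) and assuming the 4-digit hash is the first four hex characters of an MD5 digest of the customer ID. The blog's actual hashing scheme is not stated here, so `hash_prefix` and `build_s3_key` are hypothetical helpers, and the prefix they produce will not necessarily match `a5b2`.

```python
import hashlib
from datetime import datetime, timezone


def hash_prefix(customer_id: str) -> str:
    # Hypothetical scheme: first 4 hex chars of MD5(customer_id).
    # The source does not say how the 4-digit hash is derived.
    return hashlib.md5(customer_id.encode("utf-8")).hexdigest()[:4]


def build_s3_key(customer_id: str, server_id: str, epoch_ms: int) -> str:
    # Layout: [hash]/[server id]/[YYYY-MM-DD-HH-MM]/[customer id]-[epoch ms].data
    ts = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    minute_folder = ts.strftime("%Y-%m-%d-%H-%M")
    return f"{hash_prefix(customer_id)}/{server_id}/{minute_folder}/{customer_id}-{epoch_ms}.data"


if __name__ == "__main__":
    print(build_s3_key("87423", "i-31cc02", 1436055953839))
```

The timestamp 1436055953839 falls at 2015-07-05 00:25:53 UTC, which is why the minute folder in the example is `2015-07-05-00-25`.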

And the schema used to record this event in DynamoDB looks like this:

| Customer ID (partition key) | Timestamp-Server (sort key) | S3-Key | Size |
|---|---|---|---|
| 87423 | 1436055953839-i-31cc02 | a5b2/i-31cc02/2015-07-05-00-25/87423-1436055953839.data | 1234 |
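To make the mapping between the S3 key and the index row concrete, here is a minimal sketch of one index item derived from an S3 key. `EventIndexItem` and its attribute names are hypothetical, chosen to mirror the table columns; nothing here calls the AWS SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventIndexItem:
    customer_id: str       # partition key
    timestamp_server: str  # sort key: "<epoch ms>-<server id>"
    s3_key: str
    size: int

    @classmethod
    def from_s3_key(cls, s3_key: str, size: int) -> "EventIndexItem":
        # Expected layout: [hash]/[server id]/[minute folder]/[customer id]-[epoch ms].data
        _prefix, server_id, _minute, filename = s3_key.split("/")
        stem = filename[: -len(".data")]
        # rsplit keeps working even if a customer id ever contains "-"
        customer_id, epoch_ms = stem.rsplit("-", 1)
        return cls(customer_id, f"{epoch_ms}-{server_id}", s3_key, size)
```

Putting the timestamp first in the sort key means string order equals time order, as long as every timestamp has the same number of digits (13-digit millisecond values do, for dates between 2001 and 2286). That is what makes "events for one customer in a time range" a cheap sort-key range query.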

I want to perform a query such as: "Get me all customer events produced by all servers in the last 24 hours." As far as I understand, it's impossible to query DynamoDB efficiently without using the partition key, and I cannot specify a partition key for this kind of query.
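The difference between the two access paths can be illustrated with a toy in-memory model. `ToyTable` is hypothetical and is not the DynamoDB API; it only mimics the rule that a Query reads one partition (equality on the partition key plus an optional sort-key range), while a Scan reads every item and filters afterwards.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table keyed by (customer_id, timestamp_server)."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # customer_id -> {sort_key: item}

    def put(self, item: dict) -> None:
        self._partitions[item["customer_id"]][item["timestamp_server"]] = item

    def query(self, customer_id: str, sk_low: str, sk_high: str) -> list:
        # Equality on the partition key, range on the sort key.
        # With 13-digit ms timestamps this behaves like the window [low, high).
        part = self._partitions.get(customer_id, {})
        return [part[sk] for sk in sorted(part) if sk_low <= sk <= sk_high]

    def scan(self, predicate) -> tuple:
        # Every item in every partition is read, then filtered.
        examined, matches = 0, []
        for part in self._partitions.values():
            for item in part.values():
                examined += 1
                if predicate(item):
                    matches.append(item)
        return matches, examined


def demo():
    now_ms = 1436055953839
    day_ms = 24 * 3600 * 1000
    table = ToyTable()
    for cust, hours_ago, server in [("87423", 1, "i-31cc02"), ("87423", 30, "i-31cc02"),
                                    ("11111", 2, "i-9f00aa"), ("22222", 50, "i-31cc02")]:
        ts = now_ms - hours_ago * 3600 * 1000
        table.put({"customer_id": cust, "timestamp_server": f"{ts}-{server}"})

    # One customer, last 24 hours: a single-partition Query.
    recent = table.query("87423", str(now_ms - day_ms), str(now_ms))

    # All customers, last 24 hours: no partition key, so every item is examined.
    def in_window(item):
        ts = int(item["timestamp_server"].split("-", 1)[0])
        return now_ms - day_ms <= ts < now_ms

    matches, examined = table.scan(in_window)
    return recent, matches, examined


if __name__ == "__main__":
    print(demo())
```

In the toy data, the per-customer query returns one item, while the cross-customer question returns two items but has to examine all four. On a real table the examined count grows with the whole table, which is why the Scan route gets expensive.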

Given this requirement, should I use a database other than DynamoDB to record where my events are stored in S3? Or do I simply need a different kind of DynamoDB schema?

The architecture looks fine and is feasible with DynamoDB as the database. The DynamoDBMapper class (in the AWS SDK for Java) can be used to create a model, and it has some useful methods for getting data from S3.

From the DynamoDBMapper documentation:

getS3ClientCache(): returns the underlying S3ClientCache for accessing S3.

A DynamoDB table can't be queried without the partition key; if the partition key is not available, you have to scan the whole table. However, you can create a global secondary index (GSI) on the date/time field and query that index for your use case.

In simple terms, a GSI is similar to an index in an RDBMS. The difference is that you query the GSI directly rather than the main table. A GSI is normally needed when you have to query DynamoDB for a use case where the table's partition key is not available. You can choose to project all attributes of the main table into the GSI, or only selected ones (the ALL, KEYS_ONLY, and INCLUDE projection types).
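Conceptually, a GSI behaves like a second copy of the table that DynamoDB keeps re-keyed by other attributes. The sketch below models that in memory, assuming a hypothetical derived attribute `event_date` (the UTC day of the event) as the index partition key. `build_gsi` and `event_date` are illustrative helpers, not SDK calls; a real GSI is declared through `GlobalSecondaryIndexes` in CreateTable or UpdateTable.

```python
from collections import defaultdict
from datetime import datetime, timezone

TABLE_KEYS = ("customer_id", "timestamp_server")


def event_date(item: dict) -> str:
    # Hypothetical derived attribute: UTC day of the event, e.g. "2015-07-05".
    epoch_ms = int(item["timestamp_server"].split("-", 1)[0])
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")


def build_gsi(items, gsi_pk, gsi_sk, projected=None):
    """Model a GSI as a re-keyed copy of the table.

    projected=None behaves like ALL; a tuple of attribute names behaves like INCLUDE.
    Table keys and index keys are always copied, as with every DynamoDB projection.
    """
    index = defaultdict(list)
    for item in items:
        if gsi_pk not in item or gsi_sk not in item:
            continue  # items missing an index key attribute are not in the index
        if projected is None:
            copy = dict(item)
        else:
            keep = set(TABLE_KEYS) | {gsi_pk, gsi_sk} | set(projected)
            copy = {k: v for k, v in item.items() if k in keep}
        index[item[gsi_pk]].append(copy)
    for bucket in index.values():
        bucket.sort(key=lambda it: it[gsi_sk])
    return index


def demo():
    rows = [
        ("87423", "1436055953839-i-31cc02", "a5b2/i-31cc02/2015-07-05-00-25/87423-1436055953839.data", 1234),
        ("11111", "1436059553839-i-9f00aa", "hypothetical/key-1.data", 10),
        ("22222", "1435969553839-i-31cc02", "hypothetical/key-2.data", 20),
    ]
    items = []
    for cust, sk, s3_key, size in rows:
        item = {"customer_id": cust, "timestamp_server": sk, "s3_key": s3_key, "size": size}
        item["event_date"] = event_date(item)
        items.append(item)
    # INCLUDE-style projection: keep s3_key, drop size.
    return build_gsi(items, "event_date", "timestamp_server", projected=("s3_key",))


if __name__ == "__main__":
    print(demo()["2015-07-05"])
```

Querying this index with `event_date = "2015-07-05"` returns events from every customer for that day, which is the cross-customer access path the base table lacks. The condition on the index partition key is still an equality.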

References: Global Secondary Indexes (GSI); Difference between Scan and Query in DynamoDB.

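Yes, for your use case it looks like a GSI alone can't help, because the use case requires a range query on the partition key. DynamoDB supports only the equality operator on the partition key. In a Query, range conditions on the sort key, and filter conditions on non-key attributes, are allowed only once the partition key is specified. You may have to scan the table to fulfill your use case, which is a costly operation.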
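The rule above can be summarized as a small check. This is a hypothetical helper, not part of any SDK; it encodes the comparison operators that a Query's `KeyConditionExpression` accepts for each key.

```python
# Operators DynamoDB accepts in a Query KeyConditionExpression.
PARTITION_KEY_OPS = {"="}
SORT_KEY_OPS = {"=", "<", "<=", ">", ">=", "BETWEEN", "begins_with"}


def can_query(partition_op, sort_op=None):
    """Return True if this key condition can be served by Query rather than Scan."""
    if partition_op not in PARTITION_KEY_OPS:
        return False  # a range on the partition key is never a valid key condition
    return sort_op is None or sort_op in SORT_KEY_OPS


if __name__ == "__main__":
    print(can_query("=", "BETWEEN"))  # one customer, time range
    print(can_query(">="))            # "all partitions after time T"
```

"Last 24 hours for one customer" is `customer_id = :c AND timestamp_server BETWEEN :a AND :b`, which passes. "Last 24 hours for everyone" amounts to a range on whatever serves as the partition key, which fails on both the table and a GSI keyed by timestamp.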

You either have to think of an alternate data model in which you can query by partition key, or use some other database.
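One such alternate model, sketched here as an assumption rather than something from the answers above, is to write a coarse time bucket onto each item (for example a hypothetical `event_hour` attribute such as `2015-07-05-00`) and make it the partition key of a GSI, with `timestamp_server` as the GSI sort key. A "last 24 hours" request then becomes a fixed number of equality Queries, one per bucket, each with a sort-key range. The helpers below only compute that query plan; they do not call DynamoDB.

```python
from datetime import datetime, timedelta, timezone


def hour_buckets(end: datetime, hours: int = 24) -> list:
    """Return the UTC hour buckets ("YYYY-MM-DD-HH") overlapping [end - hours, end)."""
    start = end - timedelta(hours=hours)
    bucket = start.replace(minute=0, second=0, microsecond=0)
    buckets = []
    while bucket < end:
        buckets.append(bucket.strftime("%Y-%m-%d-%H"))
        bucket += timedelta(hours=1)
    return buckets


def plan_queries(end_ms: int, hours: int = 24) -> list:
    """One (bucket, sort_key_low, sort_key_high) tuple per Query against the hypothetical GSI."""
    end = datetime.fromtimestamp(end_ms / 1000, tz=timezone.utc)
    start_ms = end_ms - hours * 3600 * 1000
    # GSI partition key = bucket (equality); sort key BETWEEN the window bounds.
    return [(b, str(start_ms), str(end_ms)) for b in hour_buckets(end, hours)]


if __name__ == "__main__":
    for step in plan_queries(1436055953839):
        print(step)
```

A 24-hour window that does not start on the hour touches 25 hourly buckets, so the client issues 25 small Queries and merges the results. The trade-off is that all writes for the current hour share one GSI partition key, which can become a hot partition at high event rates; appending a small shard suffix to the bucket, and querying every shard, is a common way to spread that load.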

