hadoop - Filtering time based data in Pig


I'm using Pig 0.11.1 in local mode for now, loading data from a CSV.

So far, I've been able to load our data set and perform the required calculations on it. The next step is to take samples of the data and perform the same calculations. To replicate existing processes, we want to grab a data point every fifteen minutes.

This is where the trouble comes in. I can write a filter in Pig that matches if a data point is exactly on a fifteen-minute interval, but how do I grab data points that are only near the fifteen-minute boundary?

I need to look at the fifteen-minute mark and grab the record that's there. If there is no record right on the mark (most likely), then I need to grab the next record after the mark.

I think I'll need to write my own filter UDF, but it seems the UDF would need to be stateful so that it knows when it has found the first match after the time interval. I haven't been able to find any examples of stateful UDFs, and from what I can tell it's probably a bad idea, given that we won't know how the data is mapped/reduced when this runs against Hadoop.

I could do this in a couple of steps, by storing key/timestamp values and writing a Python script to parse those. I'd like to keep as much of the process in Pig as possible, though.

Edit: The data at its most basic looks like this: {id:long, timestamp:long}. The timestamp is in milliseconds. Each set of data is sorted on timestamp. If record X falls on a 15-minute boundary after the minimum timestamp (start time), grab it. Otherwise, grab the next record after that 15-minute boundary, whenever that might be. I don't have a good example of expected results because I haven't had time to sort through the data by hand.
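For reference, the selection rule described above can be sketched in plain Python as a single pass over sorted timestamps. This only illustrates the intended output and is not a Pig solution. The function name `sample_every` is made up for the example, and it assumes that boundaries are counted from the minimum timestamp and that a record is emitted at most once, even if it is the next record for several empty boundaries.

```python
INTERVAL_MS = 15 * 60 * 1000  # 900000 ms


def sample_every(timestamps, interval_ms=INTERVAL_MS):
    """Return the first timestamp at or after each boundary.

    Boundaries are start, start + interval, start + 2 * interval, ...
    where start is the minimum timestamp. Input must be sorted ascending.
    """
    if not timestamps:
        return []
    start = timestamps[0]
    picked = []
    next_boundary = start
    for ts in timestamps:
        if ts >= next_boundary:
            picked.append(ts)
            # Skip past every boundary this record already covers.
            elapsed = ts - start
            next_boundary = start + (elapsed // interval_ms + 1) * interval_ms
    return picked
```

Because the scan carries state (`next_boundary`) from one record to the next, it only works when all records of a data set pass through one process in order, which is the concern with a stateful UDF.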

It might be tricky in MapReduce to satisfy the condition "Otherwise, grab the next record after that 15-minute boundary, whenever that might be", but if you change it a little to "grab the previous record before the 15-minute boundary" then it becomes quite easy. The idea is that 15 minutes is 900000 milliseconds, so we can group the records into groups that each cover 900000 milliseconds, sort them, and take the top one. Here is an example of the script, off the top of my head:

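inpt = load '....' using PigStorage(',') as (id:long, timestamp:long);
-- 900000 ms = 15 minutes; long / long is integer division, so this yields a bucket number
intervals = foreach inpt generate id, timestamp, timestamp / 900000L as interval;
grp = group intervals by interval;
result = foreach grp {
    -- latest record in each 15-minute bucket
    ordered = order intervals by timestamp desc;
    latest = limit ordered 1;
    generate flatten(latest);
};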
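The bucketing idea can be checked outside Pig with a small Python sketch. It mirrors the script above under the same assumption, namely that buckets are aligned to multiples of 900000 ms since the epoch rather than to the first timestamp. The names `last_per_bucket` and `first_per_bucket` and the sample records are illustrative only.

```python
from itertools import groupby

INTERVAL_MS = 900000  # 15 minutes in milliseconds


def bucket(record):
    """Integer bucket index for an (id, timestamp) record."""
    return record[1] // INTERVAL_MS


def last_per_bucket(records):
    """Latest record in each bucket (order by timestamp desc, limit 1)."""
    ordered = sorted(records, key=lambda r: r[1])
    return [list(group)[-1] for _, group in groupby(ordered, key=bucket)]


def first_per_bucket(records):
    """Earliest record in each bucket (the ascending-order variant)."""
    ordered = sorted(records, key=lambda r: r[1])
    return [next(group) for _, group in groupby(ordered, key=bucket)]


# Hypothetical (id, timestamp) records; the bucket starting at 1800000 is empty.
records = [(1, 0), (2, 100), (3, 899999), (4, 900000), (5, 2700005)]
print(last_per_bucket(records))
print(first_per_bucket(records))
```

Sorting ascending instead of descending returns the first record at or after the start of each non-empty bucket, which may be closer to what the question asks for. Empty buckets produce no row in either variant.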
