Python Pandas value-dependent column creation


I have a pandas DataFrame with columns "time" and "a". For each row, df["time"] is an integer timestamp and df["a"] is a float. I want to create a new column "b" that holds the value of df["a"] from the row that occurs at or before 5 seconds in the future. I can do this iteratively as:

for i in df.index:
    df.loc[i, "b"] = df["a"][max(df[df["time"] <= df["time"][i]+5].index)]

However, the df has tens of thousands of records, so this takes far too long, and I need to run it a few hundred times, so my solution isn't really an option. I'm somewhat new to pandas (and only somewhat less new to programming in general), so I'm not sure if there's an obvious solution to this that pandas supports.

It would help if I had a way of referencing the specific value of df["time"] in each row while creating the column, something like:

df["b"] = df["a"][max(df[df["time"] <= df["time"][corresponding_row]+5].index)] 

Thanks.

Edit: here's an example of what my goal is. If the DataFrame is as follows:

      a
time
0     0
1     1
4     2
7     3
8     4
10    5
12    6
15    7
18    8
20    9

then the result should be:

      a  b
time
0     0  2
1     1  2
4     2  4
7     3  6
8     4  6
10    5  7
12    6  7
15    7  9
18    8  9
20    9  9

where each line in b comes from the value of a in the row whose time is greater by at most 5. So, if time is the index as well, df["b"][0] = df["a"][4], since 4 is the largest time that is at most 5 greater than 0. In code, 4 = max(df["time"][df["time"] <= 0+5]), which is why df["b"][0] is df["a"][4].
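To make the rule above concrete, here is a minimal sketch that rebuilds the example frame and applies the rule row by row. It assumes "time" is the index and "a" is the only column; the helper name `lookahead_slow` is made up for illustration and is not a pandas function. It is the slow reference version, useful mainly for checking a faster solution against.

```python
import pandas as pd

# Example data from the question: "time" is the index, "a" is the only column.
df = pd.DataFrame(
    {"a": range(10)},
    index=pd.Index([0, 1, 4, 7, 8, 10, 12, 15, 18, 20], name="time"),
)


def lookahead_slow(frame, window=5):
    """Hypothetical helper: for each time t, take "a" at the largest time <= t + window."""
    times = frame.index
    values = [frame.loc[times[times <= t + window].max(), "a"] for t in times]
    return pd.Series(values, index=times, name="b")


df["b"] = lookahead_slow(df)
print(df)
```

Because every row scans the whole index, the cost grows quadratically with the number of rows, which is the performance problem described in the question.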

Use tshift. You may need to resample first to fill in any missing values. I don't have time to test this, but try this.

df['b'] = df.resample('s', how='ffill').tshift(5, freq='s').reindex_like(df) 

And a tip for getting help here: if you provide a few rows of sample data and an example of your desired result, it's easy for us to copy/paste and try out a solution for you.

Edit

OK, looking at your example data, let's leave the time column as integers.

In [59]: df
Out[59]:
      a
time
0     0
1     1
4     2
7     3
8     4
10    5
12    6
15    7
18    8
20    9

Make an array containing the first and last time values and all the integers in between.

In [60]: index = np.arange(df.index.values.min(), df.index.values.max() + 1)

Make a new DataFrame with all the gaps filled in.

In [61]: df1 = df.reindex(index, method='ffill')

Make a new column that has the same data shifted up by 5, that is, looking forward in time by 5 seconds.

In [62]: df1['b'] = df1.shift(-5)

And drop all the filled-in times we added, taking only the values from the original time index.

In [63]: df1.reindex(df.index)
Out[63]:
      a   b
time
0     0   2
1     1   2
4     2   4
7     3   6
8     4   6
10    5   7
12    6   7
15    7   9
18    8 NaN
20    9 NaN

How you fill in the last values, for which there is no "five seconds later," is up to you. Judging from your desired output, maybe use fillna with a constant value set to the last value in column a.
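Putting the steps together, the following is a rough end-to-end sketch on the example data. It assumes integer timestamps in a sorted "time" index, and it assumes the trailing rows should take the last value of a, which is only a guess based on the desired output. The names `full_index`, `filled`, `ahead`, and `result` are illustrative.

```python
import numpy as np
import pandas as pd

# Example data: "time" is the (sorted, integer) index, "a" is the only column.
df = pd.DataFrame(
    {"a": range(10)},
    index=pd.Index([0, 1, 4, 7, 8, 10, 12, 15, 18, 20], name="time"),
)

# Every integer second between the first and last timestamp.
full_index = np.arange(df.index.min(), df.index.max() + 1)

# Forward-fill "a" onto the gap-free index, then look 5 rows (= 5 seconds) ahead.
filled = df["a"].reindex(full_index, method="ffill")
ahead = filled.shift(-5)

# Keep only the original timestamps. The tail has no value 5 seconds later,
# so fall back to the last value of "a" (an assumption, see above).
b = ahead.reindex(df.index).fillna(df["a"].iloc[-1]).astype(df["a"].dtype)
result = df.assign(b=b)
print(result)
```

The shift only equals "5 seconds ahead" because the filled index has exactly one row per second; with non-integer or irregular timestamps this approach would need a different lookup.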

