Python Pandas value-dependent column creation
I have a pandas DataFrame with columns "time" and "a". For each row, df["time"] is an integer timestamp and df["a"] is a float. I want to create a new column "b" that holds the value of df["a"] from the latest row that occurs at or before 5 seconds in the future. I can do this iteratively as:
for i in df.index:
    df.loc[i, "b"] = df["a"][max(df[df["time"] <= df["time"][i] + 5].index)]
However, the df has tens of thousands of records, so this takes far too long, and I need to run it a few hundred times, so my solution isn't really an option. I'm somewhat new to pandas (and only slightly less new to programming in general), so I'm not sure whether there's an obvious solution supported by pandas.
It would help if I had a way of referencing the specific value of df["time"] in each row while creating the column, something like:
df["b"] = df["a"][max(df[df["time"] <= df["time"][corresponding_row]+5].index)]
Thanks.
Edit: here's an example of what my goal is. If the DataFrame is as follows:
time  a
0     0
1     1
4     2
7     3
8     4
10    5
12    6
15    7
18    8
20    9
Then the result would be:
time  a  b
0     0  2
1     1  2
4     2  4
7     3  6
8     4  6
10    5  7
12    6  7
15    7  9
18    8  9
20    9  9
where each value in b comes from the value of a in the row whose time is greater by at most 5. So if time is the index as well, df["b"][0] = df["a"][4], since 4 is the largest time that is at most 5 greater than 0. In code, 4 = max(df["time"][df["time"] <= 0+5]), which is why df["b"][0] is df["a"][4].
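To pin the rule down, here is a minimal sketch that reproduces the table above without a Python-level loop. It is not part of the original post: it swaps the loop for numpy.searchsorted, and it assumes the time column is sorted in ascending order.

```python
import numpy as np
import pandas as pd

# Sample data from the example above.
df = pd.DataFrame({
    "time": [0, 1, 4, 7, 8, 10, 12, 15, 18, 20],
    "a": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
})

times = df["time"].to_numpy()

# For each row, find the position of the last row whose time is <= time + 5.
# side="right" returns the insertion point after any equal values, so
# subtracting 1 gives the last row that satisfies the condition.
# Assumption: "time" is sorted ascending.
pos = np.searchsorted(times, times + 5, side="right") - 1

# Pick the matching values of "a" by position.
df["b"] = df["a"].to_numpy()[pos]

print(df)
```

Because every row satisfies time <= time + 5 for itself, pos is never negative, so the last rows simply pick up the final value of a.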
Use tshift. You may need to resample first to fill in any missing values. I don't have time to test this, but try this:
df['b'] = df.resample('s', how='ffill').tshift(5, freq='s').reindex_like(df)
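Note that on recent pandas versions tshift and the how= argument of resample have been removed, so the line above will not run there. A rough sketch of the same idea with current calls follows. It assumes the integer timestamps are seconds and converts them to a DatetimeIndex (my assumption, the question keeps them as integers), and it shifts by -5 seconds so that each row sees the value five seconds ahead.

```python
import pandas as pd

# Sample data from the question; the integer times are assumed to be seconds.
df = pd.DataFrame(
    {"a": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]},
    index=pd.to_datetime([0, 1, 4, 7, 8, 10, 12, 15, 18, 20], unit="s"),
)

# One row per second; seconds with no data take the last known value.
filled = df["a"].resample("s").last().ffill()

# Move every label back by 5 seconds, so the value observed at t + 5
# ends up labelled t.
shifted = filled.shift(-5, freq="s")

# Keep only the original timestamps. Rows with nothing 5 seconds ahead
# come out as NaN.
df["b"] = shifted.reindex(df.index)

print(df)
```

The last two rows stay NaN because there is no data five seconds after them.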
And a tip for getting help here: if you provide a few rows of sample data and an example of your desired result, it's easy for us to copy/paste and try out a solution for you.
Edit
OK, looking at your example data, let's leave the time column as integers.
In [59]: df
Out[59]:
      a
time
0     0
1     1
4     2
7     3
8     4
10    5
12    6
15    7
18    8
20    9
Make an array containing the first and last time values and all the integers in between.
In [60]: index = np.arange(df.index.values.min(), df.index.values.max() + 1)
Make a new DataFrame with all the gaps filled in.
In [61]: df1 = df.reindex(index, method='ffill')
Make a new column with the same data shifted up by 5, that is, looking forward in time by 5 seconds.
In [62]: df1['b'] = df1['a'].shift(-5)
And now drop all the filled-in times we added, taking only the values from the original time index.
In [63]: df1.reindex(df.index)
Out[63]:
      a    b
time
0     0    2
1     1    2
4     2    4
7     3    6
8     4    6
10    5    7
12    6    7
15    7    9
18    8  NaN
20    9  NaN
How you fill in the last values, for which there is no "five seconds later", is up to you. Judging from your desired output, maybe use fillna with a constant value set to the last value in column a.
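Putting the steps together, here is a minimal end-to-end sketch. It assumes time is the (sorted, integer) index as in the example, and the variable names full_index and result are just illustrative.

```python
import numpy as np
import pandas as pd

# Example data with "time" as an integer index.
df = pd.DataFrame(
    {"a": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]},
    index=pd.Index([0, 1, 4, 7, 8, 10, 12, 15, 18, 20], name="time"),
)

# Every integer time from the first to the last, inclusive.
full_index = np.arange(df.index.min(), df.index.max() + 1)

# Fill the gaps by carrying the last known value forward.
df1 = df.reindex(full_index, method="ffill")

# Look 5 seconds ahead: row t receives the value from row t + 5.
df1["b"] = df1["a"].shift(-5)

# Keep only the original times.
result = df1.reindex(df.index)

# The last rows have no value 5 seconds ahead; fill them with the
# last value of "a", as the desired output suggests.
result["b"] = result["b"].fillna(df["a"].iloc[-1])

print(result)
```

Column b comes out as floats because of the intermediate NaN values; add .astype(int) after the fillna if integers are needed.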