python - Conditional multi column matching (reviewed with new example) -
i tried rework question in order match quality criteria , spent more time in trying achieve result on own.
given 2 dataframes
a = dataframe({"id" : ["id1"] * 3 + ["id2"] * 3 + ["id3"] * 3, "left" : [6, 2, 5, 2, 1, 4, 5, 2, 4], "right" : [1, 3, 4, 6, 5, 3, 6, 3, 2] }) b = dataframe({"id" : ["id1"] * 6 + ["id2"] * 6 + ["id3"] * 6, "left_and_right" : range(1,7) * 3, "boolen" : [0, 0, 1, 0, 1, 0, 1, 0, 0 , 1, 1, 0, 0, 0, 1, 0, 0, 1] })
the expected result is
result = dataframe({"id" : ["id1"] * 3 + ["id2"] * 3 + ["id3"] * 3, "left" : [6, 2, 5, 2, 1, 4, 5, 2, 4], "right" : [1, 3, 4, 6, 5, 3, 6, 3, 2], "new": [0, 1, 1, 0, 1, 1, 1, 1, 0] })
so want check in each row of dataframe b if there row in dataframe a.id == b.id , if b.left_and_right in (==) a.left or a.rigtht.
if such row found , b.boolen true/1 either value of a.left or a.right, value of a.new in row should true/1.
hope example illustrates better words.
to sum up: want if in each row id same both dataframes whether b.boolen true/1 value in b.left_and_right , if value in a.left or in a.right, new value in a.new should true/1.
i have tried using pd.match() , pd.merge() function in combination & , | operators not achieve wanted result.
some time ago had asked simillar question dealing simillar problem in r (data organized in other way, bit different) fail using same approach in python.
related question: conditional matching of 2 lists multi-column data.frames
thanks
just use boolean masks & (and) , | (or):
in [11]: (a.a == b.a) & ((a.b == b.e) | (a.c == b.e)) # satisfy requirement! out[11]: 0 true 1 true 2 true 3 true dtype: bool in [12]: b.d[(a.a == b.a) & ((a.b == b.e) | (a.c == b.e))] out[12]: 0 0 1 1 2 0 3 0 name: d, dtype: int64 in [13]: a['new'] = b.d[(a.a == b.a) & ((a.b == b.e) | (a.c == b.e))] in [14]: out[14]: b c new 0 foo 1 4 0 1 goo 2 3 1 2 doo 3 1 0 3 boo 4 2 0
update different question:
in [21]: merged = pd.merge(a, b, on='id') in [22]: matching = merged[(merged.left == merged.left_and_right) | (merged.right == merged.left_and_right)] in [23]: (matching.groupby(['id', 'left', 'right'])['boolen'].sum()).reset_index() out[23]: id left right boolen 0 id1 2 3 1 1 id1 5 4 1 2 id1 6 1 0 3 id2 1 5 2 4 id2 2 6 0 5 id2 4 3 1 6 id3 2 3 1 7 id3 4 2 0 8 id3 5 6 1
note there 2 here, perhaps want care > 0.
in [24]: (matching.groupby(['id', 'left', 'right'])['boolen'].sum() > 0).reset_index() out[24]: id left right boolen 0 id1 2 3 true 1 id1 5 4 true 2 id1 6 1 false 3 id2 1 5 true 4 id2 2 6 false 5 id2 4 3 true 6 id3 2 3 true 7 id3 4 2 false 8 id3 5 6 true
you may want rename boolen column new.
Comments
Post a Comment