python - Conditional multi column matching (reviewed with new example) -


i tried rework question in order match quality criteria , spent more time in trying achieve result on own.

given 2 dataframes

a = dataframe({"id" : ["id1"] * 3 + ["id2"] * 3 + ["id3"] * 3,                "left" : [6, 2, 5, 2, 1, 4, 5, 2, 4],                "right" : [1, 3, 4, 6, 5, 3, 6, 3, 2]                             })  b = dataframe({"id" : ["id1"] * 6 + ["id2"] * 6 + ["id3"] * 6,                "left_and_right" : range(1,7) * 3,                  "boolen" : [0, 0, 1, 0, 1, 0, 1, 0, 0 , 1, 1, 0, 0, 0, 1, 0, 0, 1]              }) 

the expected result is

result = dataframe({"id" : ["id1"] * 3 + ["id2"] * 3 + ["id3"] * 3,                     "left"  : [6, 2, 5, 2, 1, 4, 5, 2, 4],                     "right"  : [1, 3, 4, 6, 5, 3, 6, 3, 2],                     "new": [0, 1, 1, 0, 1, 1, 1, 1, 0]                                  }) 

so want check in each row of dataframe b if there row in dataframe a.id == b.id , if b.left_and_right in (==) a.left or a.rigtht.
if such row found , b.boolen true/1 either value of a.left or a.right, value of a.new in row should true/1.

hope example illustrates better words.

to sum up: want if in each row id same both dataframes whether b.boolen true/1 value in b.left_and_right , if value in a.left or in a.right, new value in a.new should true/1.

i have tried using pd.match() , pd.merge() function in combination & , | operators not achieve wanted result.

some time ago had asked simillar question dealing simillar problem in r (data organized in other way, bit different) fail using same approach in python.
related question: conditional matching of 2 lists multi-column data.frames

thanks

just use boolean masks & (and) , | (or):

in [11]: (a.a == b.a) & ((a.b == b.e) | (a.c == b.e)) # satisfy requirement! out[11]:  0    true 1    true 2    true 3    true dtype: bool  in [12]: b.d[(a.a == b.a) & ((a.b == b.e) | (a.c == b.e))] out[12]:  0    0 1    1 2    0 3    0 name: d, dtype: int64  in [13]: a['new'] = b.d[(a.a == b.a) & ((a.b == b.e) | (a.c == b.e))]  in [14]: out[14]:        b  c  new 0  foo  1  4    0 1  goo  2  3    1 2  doo  3  1    0 3  boo  4  2    0 

update different question:

in [21]: merged = pd.merge(a, b, on='id')  in [22]: matching = merged[(merged.left == merged.left_and_right) | (merged.right == merged.left_and_right)]  in [23]: (matching.groupby(['id', 'left', 'right'])['boolen'].sum()).reset_index() out[23]:      id  left  right  boolen 0  id1     2      3       1 1  id1     5      4       1 2  id1     6      1       0 3  id2     1      5       2 4  id2     2      6       0 5  id2     4      3       1 6  id3     2      3       1 7  id3     4      2       0 8  id3     5      6       1 

note there 2 here, perhaps want care > 0.

in [24]: (matching.groupby(['id', 'left', 'right'])['boolen'].sum() > 0).reset_index() out[24]:      id  left  right boolen 0  id1     2      3   true 1  id1     5      4   true 2  id1     6      1  false 3  id2     1      5   true 4  id2     2      6  false 5  id2     4      3   true 6  id3     2      3   true 7  id3     4      2  false 8  id3     5      6   true 

you may want rename boolen column new.


Comments

Popular posts from this blog

How to remove text and logo OR add Overflow on Android ActionBar using AppCompat on API 8? -

html - How to style widget with post count different than without post count -

url rewriting - How to redirect a http POST with urlrewritefilter -