python - Conditional multi column matching (reviewed with new example) -

i tried rework question in order match quality criteria , spent more time in trying achieve result on own.

given 2 dataframes

a = dataframe({"id" : ["id1"] * 3 + ["id2"] * 3 + ["id3"] * 3,                "left" : [6, 2, 5, 2, 1, 4, 5, 2, 4],                "right" : [1, 3, 4, 6, 5, 3, 6, 3, 2]                             })  b = dataframe({"id" : ["id1"] * 6 + ["id2"] * 6 + ["id3"] * 6,                "left_and_right" : range(1,7) * 3,                  "boolen" : [0, 0, 1, 0, 1, 0, 1, 0, 0 , 1, 1, 0, 0, 0, 1, 0, 0, 1]              })

the expected result is

result = dataframe({"id" : ["id1"] * 3 + ["id2"] * 3 + ["id3"] * 3,                     "left"  : [6, 2, 5, 2, 1, 4, 5, 2, 4],                     "right"  : [1, 3, 4, 6, 5, 3, 6, 3, 2],                     "new": [0, 1, 1, 0, 1, 1, 1, 1, 0]                                  })

so want check in each row of dataframe b if there row in dataframe a.id == b.id , if b.left_and_right in (==) a.left or a.rigtht.
if such row found , b.boolen true/1 either value of a.left or a.right, value of a.new in row should true/1.

hope example illustrates better words.

to sum up: want if in each row id same both dataframes whether b.boolen true/1 value in b.left_and_right , if value in a.left or in a.right, new value in a.new should true/1.

i have tried using pd.match() , pd.merge() function in combination & , | operators not achieve wanted result.

some time ago had asked simillar question dealing simillar problem in r (data organized in other way, bit different) fail using same approach in python.
related question: conditional matching of 2 lists multi-column data.frames

thanks

just use boolean masks & (and) , | (or):

in [11]: (a.a == b.a) & ((a.b == b.e) | (a.c == b.e)) # satisfy requirement! out[11]:  0    true 1    true 2    true 3    true dtype: bool  in [12]: b.d[(a.a == b.a) & ((a.b == b.e) | (a.c == b.e))] out[12]:  0    0 1    1 2    0 3    0 name: d, dtype: int64  in [13]: a['new'] = b.d[(a.a == b.a) & ((a.b == b.e) | (a.c == b.e))]  in [14]: out[14]:        b  c  new 0  foo  1  4    0 1  goo  2  3    1 2  doo  3  1    0 3  boo  4  2    0

update different question:

in [21]: merged = pd.merge(a, b, on='id')  in [22]: matching = merged[(merged.left == merged.left_and_right) | (merged.right == merged.left_and_right)]  in [23]: (matching.groupby(['id', 'left', 'right'])['boolen'].sum()).reset_index() out[23]:      id  left  right  boolen 0  id1     2      3       1 1  id1     5      4       1 2  id1     6      1       0 3  id2     1      5       2 4  id2     2      6       0 5  id2     4      3       1 6  id3     2      3       1 7  id3     4      2       0 8  id3     5      6       1

note there 2 here, perhaps want care > 0.

in [24]: (matching.groupby(['id', 'left', 'right'])['boolen'].sum() > 0).reset_index() out[24]:      id  left  right boolen 0  id1     2      3   true 1  id1     5      4   true 2  id1     6      1  false 3  id2     1      5   true 4  id2     2      6  false 5  id2     4      3   true 6  id3     2      3   true 7  id3     4      2  false 8  id3     5      6   true

you may want rename boolen column new.

Search This Blog

Brazell

python - Conditional multi column matching (reviewed with new example) -

Comments

Post a Comment

Popular posts from this blog

How to remove text and logo OR add Overflow on Android ActionBar using AppCompat on API 8? -

html - How to style widget with post count different than without post count -

url rewriting - How to redirect a http POST with urlrewritefilter -