Intersection of arrays in PHP. Finding matching Pairs -
i've got array this:
$a = array( array(2 => 1, 4 => 2, 9 => 3), array(3 => 7, 4 => 5, 7 => 3), array(1 => 6, 4 => 5), ... );
so array contains huge amount of sub arrays integer key => integer value. want find subarrays share no keys or if share key value of key must same.
example: $a[1] , $a[2] match because $a[1][4] == $a[2][4] , no other keys match. $a[0] , $a[1] not match because $a[0][4] != $a[1][4].
the number of elements in subarrays may vary.
is there efficient way ? way can think of check each possible pair in nested loop resulting in o(n^2).
if has idea more meaningful title feel free edit it.
maybe code makes more clear: (naive implementation)
$pairs = array(); for($i = 0; $i < count($a); $i++) for($j = $i+1; $j < count($a); $j++) if(array_intersect_key($a[$i], $a[$j]) == array_intersect_assoc($a[$i], $a[$j])) $pairs[] = array($i, $j);
alternative:
$matching = array(); for($i = 0; $i < count($a); $i++) for($j = $i+1; $j < count($a); $j++) if(array_intersect_key($a[$i], $a[$j]) == array_intersect_assoc($a[$i], $a[$j])) list($matching[$i][], $matching[$j][]) = array($j, $i);
there might ways it, depends on if know how many matches (or general 'matchyness' of data). if there's more matches not might better start assuming matches , eliminating.
in case, think can pre-process data. i'm not sure if faster -- depends on distribution of data, i'd start trying , work there:
$a = array( array(2 => 1, 4 => 2, 9 => 3), array(3 => 7, 4 => 5, 7 => 3), array(1 => 6, 4 => 5), array(1 => 6, 4 => 5, 7 => 5), array(2 => 1, 4 => 2, 9 => 3) ); // 1 , 2 match, 2 , 3 match, 0 , 4 match $keydata = array(); ($i = 0; $i < count($a); $i++) { foreach($a[$i] $k => $v) { if (!isset($keydata[$k])) { $keydata[$k] = array(); } if (!isset($keydata[$k][$v])) { $keydata[$k][$v] = array(); } $keydata[$k][$v][] = $i; } } $potentialmatches = array(); foreach ($keydata $key => $values) { // ignore single key/value pairs if (count($values) > 1) { foreach ($values $value => $arrayindices) { ($i = 0; $i < count($arrayindices); $i ++) { ($j = $i + 1; $j < count($arrayindices); $j ++) { $potentialmatches[] = array($arrayindices[$i], $arrayindices[$j]); } } } } } // might need ... /* foreach ($potentialmatches &$m) { array_unique($m); } */ $pairs = array(); foreach ($potentialmatches $m) { if(array_intersect_key($a[$m[0]], $a[$m[1]]) == array_intersect_assoc($a[$m[0]], $a[$m[1]])) { $pairs[] = $m; } } print_r($pairs);
output:
array ( [0] => array ( [0] => 0 [1] => 4 ) [1] => array ( [0] => 1 [1] => 2 ) [2] => array ( [0] => 2 [1] => 3 ) )
edit
as said in comment, doesn't catch arrays don't share any keys -- consider match. code below this, although i'm not sure if it's faster nested solution (and it's going use ton of memory)
// new test data cover case missed $a = array( array(2 => 1, 4 => 2, 9 => 3), array(3 => 7, 4 => 5, 7 => 3), array(1 => 6, 4 => 5), array(1 => 6, 4 => 5, 7 => 5), array(2 => 1, 4 => 2, 9 => 3), array(8 => 3) ); // 1 , 2 match, 2 , 3 match, 0 , 4 match, 5 matches // first assume match, build array of: // indicies => array of potential matches $potentialmatches = array_fill(0, count($a), array_keys($a)); // build data each key, indicies contain key // , indicies each value of key $keydata = array(); ($i = 0; $i < count($a); $i++) { foreach($a[$i] $k => $v) { if (!isset($keydata[$k])) { $keydata[$k] = array(); } if (!isset($keydata[$k][$v])) { $keydata[$k][$v] = array(); } $keydata[$k]['all'][] = $i; $keydata[$k][$v][] = $i; } } // print_r($keydata); // go through key data , eliminate indicies // can't match foreach ($keydata $key => $values) { if (count($values) > 2) { // ignore single key/value pairs // 2 indecies not match if appear in seperate value lists // first list of indicies have key $all = array_unique($values['all']); unset($values['all']); // go through value lists foreach ($values $value => $arrayindices) { // indicies value cannot match other // indices in system, i.e. list $cantmatch = array_diff($all, $arrayindices); // remove indicies can't match potentials list foreach ($arrayindices $index) { $potentialmatches[$index] = array_diff($potentialmatches[$index], $cantmatch); } } } } //print_r($potentialmatches); // said didn't mind output format, that's enough // array contains (x,x) pointless , both (x,y) , (y,x) // can 1 final bit of processing print out in nicer way $pairs = array(); foreach ($potentialmatches $x => $matches) { foreach ($matches $y) { if ( ($x < $y) ) { $pairs[] = array($x, $y); } } } print_r($pairs);
output
array ( [0] => array ( [0] => 0 [1] => 4 ) [1] => array ( [0] => 0 [1] => 5 ) [2] => array ( [0] => 1 [1] => 2 ) [3] => array ( [0] => 1 [1] => 5 ) [4] => array ( [0] => 2 [1] => 3 ) [5] => array ( [0] => 2 [1] => 5 ) [6] => array ( [0] => 3 [1] => 5 ) [7] => array ( [0] => 4 [1] => 5 ) )
Comments
Post a Comment