sql server - Group All Related Records in Many to Many Relationship, SQL graph connected components -

hopefully i'm missing simple solution this.

i have 2 tables. 1 contains list of companies. second contains list of publishers. mapping between 2 many many. bundle or group of companies in table have relationship publisher in table b , vise versa.

the final result (groupid key field). row 1 , 2 in same group because share same company. row 3 in same group because publisher y mapped on company a. row 4 in group because company b mapped group 1 through publisher y.

said simply, time there kind of shared relationship across company , publisher, pair should assigned same group.

row   groupid     company     publisher 1     1                     y 2     1                     x 3     1           b           y 4     1           b           z 5     2           c           w 6     2           c           p 7     2           d           w

fiddle

update:
bounty version: given table in fiddle above of company , publisher pairs, populate groupid field above. think of creating family id encompasses related parents/children.

sql server 2012

i thought using recursive cte, but, far know, it's not possible in sql server use union connect anchor member , recursive member of recursive cte (i think it's possible in postgresql), it's not possible eliminate duplicates.

declare @i int  cte (      select          groupid,          row_number() over(order company) rn      table1 ) update cte set groupid = rn  select @i = @@rowcount  -- while rows updated while @i > 0 begin     update t1 set         groupid = t2.groupid     table1 t1         inner join (             select t2.company, min(t2.groupid) groupid             table1 t2             group t2.company         ) t2 on t2.company = t1.company     t1.groupid > t2.groupid      select @i = @@rowcount      update t1 set         groupid = t2.groupid     table1 t1         inner join (             select t2.publisher, min(t2.groupid) groupid             table1 t2             group t2.publisher         ) t2 on t2.publisher = t1.publisher     t1.groupid > t2.groupid      -- > 0 if rows updated     select @i = @i + @@rowcount end  ;with cte (      select          groupid,          dense_rank() over(order groupid) rn      table1 ) update cte set groupid = rn

sql fiddle demo

i've tried breadth first search algorithm. thought faster (it's better in terms of complexity), i'll provide solution here. i've found it's not faster sql approach, though:

declare @company nvarchar(2), @publisher nvarchar(2), @groupid int  declare @queue table (     company nvarchar(2), publisher nvarchar(2), id int identity(1, 1),     primary key(company, publisher) )  select @groupid = 0  while 1 = 1 begin     select top 1 @company = company, @publisher = publisher     table1     groupid null      if @@rowcount = 0 break      select @groupid = @groupid + 1      insert @queue(company, publisher)     select @company, @publisher      while 1 = 1     begin         select top 1 @company = company, @publisher = publisher         @queue         order id asc          if @@rowcount = 0 break          update table1 set             groupid = @groupid         company = @company , publisher = @publisher          delete @queue company = @company , publisher = @publisher          ;with cte (             select company, publisher table1 company = @company , groupid null             union             select company, publisher table1 publisher = @publisher , groupid null         )         insert @queue(company, publisher)         select distinct c.company, c.publisher         cte c         not exists (select * @queue q q.company = c.company , q.publisher = c.publisher)    end end

sql fiddle demo

i've tested version , gordon linoff's check how it's perform. looks cte worse, couldn't wait while it's complete on more 1000 rows.

here's sql fiddle demo random data. results were:
128 rows:
my rbar solution: 190ms
my sql solution: 27ms
gordon linoff's solution: 958ms
256 rows:
my rbar solution: 560ms
my sql solution: 1226ms
gordon linoff's solution: 45371ms

it's random data, results may not consistent. think timing changed indexes, don't think change whole picture.

old version - using temporary table, calculating groupid without touching initial table:

declare @i int  -- creating table gather possible groupid each row create table #temp (     company varchar(1), publisher varchar(1), groupid varchar(1),     primary key (company, publisher, groupid) )  -- initializing data insert #temp (company, publisher, groupid) select company, publisher, company table1  select @i = @@rowcount  -- while rows inserted #temp while @i > 0 begin     -- expand #temp in both directions     ;with cte (         select             t2.company, t1.publisher,             t1.groupid groupid1, t2.groupid groupid2         #temp t1             inner join #temp t2 on t2.company = t1.company         union         select             t1.company, t2.publisher,             t1.groupid groupid1, t2.groupid groupid2         #temp t1             inner join #temp t2 on t2.publisher = t1.publisher             ), cte2 (         select             company, publisher,             case when groupid1 < groupid2 groupid1 else groupid2 end groupid         cte     )     insert #temp     select company, publisher, groupid     cte2     -- don't insert duplicates     except     select company, publisher, groupid     #temp      -- > 0 if row inserted     select @i = @@rowcount end  select     company, publisher,     dense_rank() over(order min(groupid)) groupid #temp group company, publisher

=> sql fiddle example

Search This Blog

Brazell

sql server - Group All Related Records in Many to Many Relationship, SQL graph connected components -

Comments

Post a Comment

Popular posts from this blog

How to remove text and logo OR add Overflow on Android ActionBar using AppCompat on API 8? -

html - How to style widget with post count different than without post count -

url rewriting - How to redirect a http POST with urlrewritefilter -