sql server - Group All Related Records in Many to Many Relationship, SQL graph connected components -
hopefully i'm missing simple solution this.
i have 2 tables. 1 contains list of companies. second contains list of publishers. mapping between 2 many many. bundle or group of companies in table have relationship publisher in table b , vise versa.
the final result (groupid key field). row 1 , 2 in same group because share same company. row 3 in same group because publisher y mapped on company a. row 4 in group because company b mapped group 1 through publisher y.
said simply, time there kind of shared relationship across company , publisher, pair should assigned same group.
row groupid company publisher 1 1 y 2 1 x 3 1 b y 4 1 b z 5 2 c w 6 2 c p 7 2 d w update:
bounty version: given table in fiddle above of company , publisher pairs, populate groupid field above. think of creating family id encompasses related parents/children.
sql server 2012
i thought using recursive cte, but, far know, it's not possible in sql server use union connect anchor member , recursive member of recursive cte (i think it's possible in postgresql), it's not possible eliminate duplicates.
declare @i int cte ( select groupid, row_number() over(order company) rn table1 ) update cte set groupid = rn select @i = @@rowcount -- while rows updated while @i > 0 begin update t1 set groupid = t2.groupid table1 t1 inner join ( select t2.company, min(t2.groupid) groupid table1 t2 group t2.company ) t2 on t2.company = t1.company t1.groupid > t2.groupid select @i = @@rowcount update t1 set groupid = t2.groupid table1 t1 inner join ( select t2.publisher, min(t2.groupid) groupid table1 t2 group t2.publisher ) t2 on t2.publisher = t1.publisher t1.groupid > t2.groupid -- > 0 if rows updated select @i = @i + @@rowcount end ;with cte ( select groupid, dense_rank() over(order groupid) rn table1 ) update cte set groupid = rn i've tried breadth first search algorithm. thought faster (it's better in terms of complexity), i'll provide solution here. i've found it's not faster sql approach, though:
declare @company nvarchar(2), @publisher nvarchar(2), @groupid int declare @queue table ( company nvarchar(2), publisher nvarchar(2), id int identity(1, 1), primary key(company, publisher) ) select @groupid = 0 while 1 = 1 begin select top 1 @company = company, @publisher = publisher table1 groupid null if @@rowcount = 0 break select @groupid = @groupid + 1 insert @queue(company, publisher) select @company, @publisher while 1 = 1 begin select top 1 @company = company, @publisher = publisher @queue order id asc if @@rowcount = 0 break update table1 set groupid = @groupid company = @company , publisher = @publisher delete @queue company = @company , publisher = @publisher ;with cte ( select company, publisher table1 company = @company , groupid null union select company, publisher table1 publisher = @publisher , groupid null ) insert @queue(company, publisher) select distinct c.company, c.publisher cte c not exists (select * @queue q q.company = c.company , q.publisher = c.publisher) end end i've tested version , gordon linoff's check how it's perform. looks cte worse, couldn't wait while it's complete on more 1000 rows.
here's sql fiddle demo random data. results were:
128 rows:
my rbar solution: 190ms
my sql solution: 27ms
gordon linoff's solution: 958ms
256 rows:
my rbar solution: 560ms
my sql solution: 1226ms
gordon linoff's solution: 45371ms
it's random data, results may not consistent. think timing changed indexes, don't think change whole picture.
old version - using temporary table, calculating groupid without touching initial table:
declare @i int -- creating table gather possible groupid each row create table #temp ( company varchar(1), publisher varchar(1), groupid varchar(1), primary key (company, publisher, groupid) ) -- initializing data insert #temp (company, publisher, groupid) select company, publisher, company table1 select @i = @@rowcount -- while rows inserted #temp while @i > 0 begin -- expand #temp in both directions ;with cte ( select t2.company, t1.publisher, t1.groupid groupid1, t2.groupid groupid2 #temp t1 inner join #temp t2 on t2.company = t1.company union select t1.company, t2.publisher, t1.groupid groupid1, t2.groupid groupid2 #temp t1 inner join #temp t2 on t2.publisher = t1.publisher ), cte2 ( select company, publisher, case when groupid1 < groupid2 groupid1 else groupid2 end groupid cte ) insert #temp select company, publisher, groupid cte2 -- don't insert duplicates except select company, publisher, groupid #temp -- > 0 if row inserted select @i = @@rowcount end select company, publisher, dense_rank() over(order min(groupid)) groupid #temp group company, publisher
Comments
Post a Comment