SQL server modulus operator to skip to every n'th row on a large table -
i have large table 100,000,000 rows. i'd select every n'th row table. first instinct use this:
select id,name table id%125000=0
to retrieve spread of 800 rows (id clustered index)
this technique works fine on smaller data sets larger table query takes 2.5 minutes. assume because modulus operation applied every row. there more optimal method of row skipping ?
if id
in index, thinking of along these lines:
with ids ( select 1 id union select id + 125000 ids id <= 100000000 ) select ids.id, (select name table t t.id = ids.id) name ids option (maxrecursion 1000);
i think formulation use index on table.
edit:
as think approach, can use actual random ids in table, rather evenly spaced ones:
with ids ( select 1 cnt, abs(convert(bigint,convert(binary(8), newid()))) % 100000000 id union select cnt + 1, abs(convert(bigint,convert(binary(8), newid()))) % 100000000 ids cnt < 800 ) select ids.id, (select name table t t.id = ids.id) name ids option (maxrecursion 1000);
the code actual random number generator came here.
edit:
due quirks in sql server, can still non-contiguous ids, in scenario. accepted answer explains cause. in short, identity values not allocated 1 @ time, rather in groups. server can fail , unused values skipped.
one reason wanted random sampling avoid problem. presumably, above situation rather rare on systems. can use random sampling generate 900 ids. these, should able find 800 available sample.
Comments
Post a Comment