SQL server modulus operator to skip to every n'th row on a large table -


i have large table 100,000,000 rows. i'd select every n'th row table. first instinct use this:

select id,name table id%125000=0 

to retrieve spread of 800 rows (id clustered index)

this technique works fine on smaller data sets larger table query takes 2.5 minutes. assume because modulus operation applied every row. there more optimal method of row skipping ?

if id in index, thinking of along these lines:

with ids (       select 1 id       union       select id + 125000       ids       id <= 100000000   ) select ids.id,        (select name table t t.id = ids.id) name ids option (maxrecursion 1000); 

i think formulation use index on table.

edit:

as think approach, can use actual random ids in table, rather evenly spaced ones:

with ids (       select 1 cnt,              abs(convert(bigint,convert(binary(8), newid()))) % 100000000 id       union       select cnt + 1, abs(convert(bigint,convert(binary(8), newid()))) % 100000000       ids       cnt < 800   )  select ids.id,        (select name table t t.id = ids.id) name ids option (maxrecursion 1000); 

the code actual random number generator came here.

edit:

due quirks in sql server, can still non-contiguous ids, in scenario. accepted answer explains cause. in short, identity values not allocated 1 @ time, rather in groups. server can fail , unused values skipped.

one reason wanted random sampling avoid problem. presumably, above situation rather rare on systems. can use random sampling generate 900 ids. these, should able find 800 available sample.


Comments

Popular posts from this blog

html - How to style widget with post count different than without post count -

How to remove text and logo OR add Overflow on Android ActionBar using AppCompat on API 8? -

javascript - storing input from prompt in array and displaying the array -