Randomly selecting rows from a table

  • Hi guys

    I have a table from which I need to select a random number of records, e.g. 70 records, based on a specific set of criteria.

     

    can anyone tell me the bets way to do this?

     

    thanks again guys

     


    If riding in a plane is flying, then riding in a boat must be swimming. To experience the element, get out of the vehicle. Skydive!

  • ...ORDER BY NEWID() is a good randomizer, though not really supported for that.

    Here's an example using a cursor I picked up from SQL Server MVP Steve Kass

    declare @keys table (

      pk int primary key

    )

    declare C cursor fast_forward for

      select OrderID from Northwind..Orders

    declare @pk int

    declare @r float

    open C

    fetch next from C into @pk

    while @@fetch_status = 0 begin

      set @r = rand()

      if @r < 0.0125

        insert into @keys values (@pk)

      fetch next from C into @pk

    end

    close C

    deallocate C

    select * from Northwind..Orders O join @keys K

    on K.pk = O.OrderID

    The only disadvantage here is, that you have no real control over the number of rows inserted into the table variable.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • cheers frank, much appreciated.....

    .....would NEWID() work for this kind of thing? do yu have any examples of this in action?

     


    If riding in a plane is flying, then riding in a boat must be swimming. To experience the element, get out of the vehicle. Skydive!

  • Doesn't make too much sense now, but have a look at this and you'll get the idea:

    USE PUBS

    SELECT TOP 10

     au_fname

     , au_lname

    FROM

     authors

    WHERE

     au_fname > 'L%'

    ORDER BY

     NEWID()

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Nice one, NEWID() seems to work fine. Do I need to pass any values into NEWID(x)?

     

    Seems to work fine without though......

    Cheers

     


    If riding in a plane is flying, then riding in a boat must be swimming. To experience the element, get out of the vehicle. Skydive!

  • Try it out, and you'll see

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Great! I have the following code. there is also an age field included which I want to use to split the results of the following query, i.e. 35 out of the resulting 75 in one age group and 35 in the other age group. Can I add a cluase into this code to do this all in one go?

    Any ideas of the best way of doing this? is it possible to split on a certian field?

    select  top 75 *

    into t_Female

    from t_Branch

    where Gender = 'Female'

    order by NEWID()

    go

     

    cheers


    If riding in a plane is flying, then riding in a boat must be swimming. To experience the element, get out of the vehicle. Skydive!

  • Hm, it would be better to explicitely create the table first and then to use two separate statements to select n rows from one group and the rest from the other group. Unfortunately you can't use a UNION or UNION ALL here, since you can have only one ORDER BY clause in such a statement. So something like this doesn't work

    use pubs

    go

    select top 2 au_lname, au_fname, zip

    from authors

    where  zip <= '50000'

    order by newid()

    union all

    select top 2 au_lname, au_fname, zip

    from  authors

    where zip >= '50000'

    order by newid()

    However, if you remove the union all the query works just fine. And, btw, SELECT INTO can cause trouble and performance degradation so it's better to be explicite in your DDL.

     

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • cheers Frank.

    Still leaves me with the same problem really I think.

    Table of thousands of people, male/female of two different age groups.

    need a random sample of 75 male/75 female.

    Of the 75 male, I need to split this 50/50 by the two age groups. Just cant see how I can manipiulate by original query to get this to work.

     


    If riding in a plane is flying, then riding in a boat must be swimming. To experience the element, get out of the vehicle. Skydive!

  • Since each sample should be taken randomly, I assume, I think you must bite the bullet and write 4 queries SELECT ... BY NEWID(). Sorry, I don't know any other way (though that doesn't mean none exists).

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply