Getting Random Results

  • LOL Jeff

    Actually, the results from my tests above don't show that the data is 'more random' with checksum - only that the queries are faster for my data.

    I got the 'more random' point from the post below from Adam Machanic, who got the idea from Itzik Ben-Gan (both good guys, as you probably know).

    http://sqlblogcasts.com/blogs/tonyrogerson/archive/2007/05/18/using-top-and-order-by-with-union-union-all.aspx#1820

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • Adrian,

    I've just had a go at implementing your suggestion on my data (I really don't have time, but I can't help myself ). I've called it method #4.

    It's quite long, so I'm going to post 3 messages - this one, 1 for the code I used (so you can tell me if I've implemented anything incorrectly), and 1 for results (so you can see the raw data for yourselves).

    The upshot (for my data) is that #3 is faster in general, but #4 is pretty much the same for small data sets (it's generally slower for 1-10 rows, and slightly faster for 10-about 2000 rows).

    Please post if you do tests on your own data, I'd be interested to see any differing results.

    Ryan

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • Here's the code I used (revised from the previous code to include method #4)...

     

    --table to hold results

    CREATE TABLE #Results (NumberOfTableRows INT, NumberOfRows INT, ProbabilityOfEachRow FLOAT,

          Method1_TimeInMs INT, Method2_TimeInMs INT, Method3_TimeInMs INT, Method4_TimeInMs INT)

     

    SET NOCOUNT ON

    DECLARE @Counter TINYINT

    SET @Counter = 0

    WHILE @Counter <= 100 --perform the comparison this many times

    BEGIN

          DECLARE @NumberOfRows INT

          SET @NumberOfRows = power(10, rand()*5) --random sample size to test with

          --

          DECLARE @d DATETIME, @t1 INT, @t2 INT, @t3 INT, @t4 INT

          SET @d = GETDATE();

     

          --Method 1

          SELECT TOP (@NumberOfRows) MessageId FROM dbo.Messages ORDER BY NEWID()

     

          --

          SET @t1 = DATEDIFF(ms, @d, GETDATE()); SET @d = GETDATE();

     

          --Method 2

          SELECT TOP (@NumberOfRows) MessageId FROM dbo.Messages ORDER BY CHECKSUM(NEWID())

     

          --

          SET @t2 = DATEDIFF(ms, @d, GETDATE()); SET @d = GETDATE();

     

          --Method 3

          DECLARE @NumberOfTableRows INT, @ProbabilityOfEachRow FLOAT

          SELECT @NumberOfTableRows = COUNT(*) from dbo.Messages

          SET @ProbabilityOfEachRow = (1+SQRT(1.0/@NumberOfRows)*10) * @NumberOfRows / @NumberOfTableRows --weighted overestimate

     

          SELECT TOP (@NumberOfRows) MessageId FROM Messages --select required number from overestimated sample

          WHERE @ProbabilityOfEachRow >= CAST(CHECKSUM(NEWID(), MessageId) & 0x7fffffff AS FLOAT) / CAST (0x7fffffff AS INT)

     

          --

        DECLARE @TV table(rand_num int)

        DELETE FROM @TV

          SET @t3 = DATEDIFF(ms, @d, GETDATE()); SET @d = GETDATE();

     

        --Method 4

        SELECT @NumberOfTableRows = COUNT(*) from dbo.Messages;

     

        DECLARE @i int

        SET @i = 0

        WHILE (@i < @NumberOfRows)

        BEGIN

            INSERT @TV values (ceiling(rand() * @NumberOfTableRows))

            SET @i = @i + 1

        END

     

        SELECT MessageId FROM

            (SELECT ROW_NUMBER() OVER (ORDER BY MessageId) row, * from dbo.Messages) a

        WHERE a.row in (SELECT rand_num from @TV)

     

          --

          SET @t4 = DATEDIFF(ms, @d, GETDATE()); SET @d = GETDATE();

     

          --Insert results

          INSERT #Results SELECT @NumberOfTableRows, @NumberOfRows, @ProbabilityOfEachRow, @t1, @t2, @t3, @t4

          --

          SET @Counter = @Counter + 1

    END

     

    --select results and tidy up

    PRINT 'Summary'

    SELECT AVG(Method1_TimeInMs) AS 'Method1', AVG(Method2_TimeInMs) AS 'Method2',

           AVG(Method3_TimeInMs) AS 'Method3', AVG(Method4_TimeInMs) AS 'Method4'

    FROM #Results

     

    PRINT 'Summary: NumberOfRows < 2000'

    SELECT AVG(Method1_TimeInMs) AS 'Method1', AVG(Method2_TimeInMs) AS 'Method2',

           AVG(Method3_TimeInMs) AS 'Method3', AVG(Method4_TimeInMs) AS 'Method4'

    FROM #Results WHERE NumberOfRows < 2000

     

    PRINT 'Summary: NumberOfRows BETWEEN 10 AND 2000'

    SELECT AVG(Method1_TimeInMs) AS 'Method1', AVG(Method2_TimeInMs) AS 'Method2',

           AVG(Method3_TimeInMs) AS 'Method3', AVG(Method4_TimeInMs) AS 'Method4'

    FROM #Results WHERE NumberOfRows BETWEEN 10 AND 2000

     

    PRINT 'Summary: NumberOfRows < 10'

    SELECT AVG(Method1_TimeInMs) AS 'Method1', AVG(Method2_TimeInMs) AS 'Method2',

           AVG(Method3_TimeInMs) AS 'Method3', AVG(Method4_TimeInMs) AS 'Method4'

    FROM #Results WHERE NumberOfRows < 10

     

    PRINT 'All data (100 rows)'

    SELECT NumberOfTableRows, NumberOfRows, Method1_TimeInMs, Method2_TimeInMs, Method3_TimeInMs, Method4_TimeInMs FROM #Results

    --DROP TABLE #Results

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • And here's the results data (for 100 separate tests)...

    /*

    Summary

    Method1     Method2     Method3     Method4

    ----------- ----------- ----------- -----------

    2539        1976        1057        2476

     

    Summary: NumberOfRows < 2000

    Method1     Method2     Method3     Method4

    ----------- ----------- ----------- -----------

    2214        1624        719         780

     

    Summary: NumberOfRows BETWEEN 10 AND 2000

    Method1     Method2     Method3     Method4

    ----------- ----------- ----------- -----------

    2235        1629        847         808

     

    Summary: NumberOfRows < 10

    Method1     Method2     Method3     Method4

    ----------- ----------- ----------- -----------

    2151        1608        317         694

     

    All data (100 rows)

    NumberOfTableRows NumberOfRows Method1_TimeInMs Method2_TimeInMs Method3_TimeInMs Method4_TimeInMs

    ----------------- ------------ ---------------- ---------------- ---------------- ----------------

    1079894           9            2140             1563             623              610

    1079894           9943         2533             2046             1356             2766

    1079894           12562        2656             2233             1376             4563

    1079894           85           2106             1563             733              843

    1079894           1850         2093             1626             1203             1016

    1079894           1            2170             1580             153              283

    1079894           13529        2936             2170             1393             4296

    1079894           221          2186             1640             953              860

    1079894           349          2186             1623             1016             733

    1079894           9233         2533             1966             1393             2296

    1079894           15335        2826             2266             1406             3560

    1079894           9720         2486             2170             1360             2563

    1079894           26           2153             1566             826              780

    1079894           32           2173             1610             686              640

    1079894           73           2156             1656             780              640

    1079894           19701        3220             2780             1423             4716

    1079894           1171         2236             1716             1186             1266

    1079894           26           2143             1576             623              673

    1079894           1427         2106             1596             1216             1046

    1079894           11913        2593             2046             1390             3626

    1079894           295          2203             1610             1000             890

    1079894           9            2093             1563             593              656

    1079894           2            2076             1546             186              330

    1079894           10604        2500             2013             1376             2813

    1079894           1777         2156             1640             1250             1140

    1079894           10           2220             1530             423              653

    1079894           1653         2470             1640             1186             1050

    1079894           19           2200             1610             563              640

    1079894           4            2126             2030             326              610

    1079894           20           2656             1593             593              626

    1079894           232          2076             1576             1000             660

    1079894           3564         2420             1686             1283             1403

    1079894           3986         2393             1793             1330             1813

    1079894           17862        3000             2656             1420             6390

    1079894           4            2080             1546             280              873

    1079894           3955         2220             1690             1310             2080

    1079894           73174        5543             4813             3860             16436

    1079894           4032         2390             1783             1313             2560

    1079894           29           2283             1593             716              703

    1079894           63555        5766             4906             2750             13513

    1079894           6            2233             1580             360              1560

    1079894           25           2156             1580             530              640

    1079894           69           2283             1576             780              656

    1079894           2895         2203             1640             1296             1486

    1079894           102          2296             1716             796              690

    1079894           19           2076             1563             500              640

    1079894           11           2203             1610             326              623

    1079894           236          2160             1606             923              733

    1079894           9            2140             1563             296              610

    1079894           2166         2173             1640             1280             1093

    1079894           247          2156             1593             920              690

    1079894           12269        2640             2123             1393             3903

    1079894           20852        3360             2593             1470             4843

    1079894           8740         2516             2046             1360             2623

    1079894           13           2140             1563             390              733

    1079894           70059        5000             4783             3406             16530

    1079894           17           2906             2123             610              1733

    1079894           1109         2093             1593             1203             906

    1079894           9069         2673             1920             1376             2936

    1079894           1            2140             1580             360              780

    1079894           119          2060             1593             800              640

    1079894           111          2123             1626             920              656

    1079894           10           2140             1610             593              640

    1079894           2            2203             1563             296              453

    1079894           4            2156             1576             313              546

    1079894           4            2173             1640             406              500

    1079894           14           2360             1560             376              610

    1079894           478          2543             1596             1093             750

    1079894           577          2233             1563             1046             813

    1079894           168          2623             2110             936              656

    1079894           49114        4390             3890             2406             10346

    1079894           1            2123             1530             250              1156

    1079894           127          2216             1626             860              656

    1079894           78744        6170             5656             3000             17216

    1079894           18300        2876             2326             1406             5390

    1079894           4427         2236             1780             1343             1890

    1079894           68           2190             1593             703              733

    1079894           1493         2280             1626             1296             983

    1079894           253          2500             1703             970              686

    1079894           1            2186             1720             156              653

    1079894           11           2283             1576             406              623

    1079894           21056        2813             2860             1470             5093

    1079894           1293         2170             1533             1203             1186

    1079894           32914        3326             3030             1673             6983

    1079894           1            2313             1593             156              830

    1079894           20495        2983             2483             1436             4406

    1079894           2            2076             1563             330              656

    1079894           4490         2263             1796             1330             1796

    1079894           6762         2376             1886             1376             2360

    1079894           64           2140             1640             720              733

    1079894           1377         2216             1783             1216             890

    1079894           11596        2720             2190             1390             4013

    1079894           1703         2233             1596             1263             1236

    1079894           50737        4653             4236             2576             17970

    1079894           723          2140             1576             1173             1563

    1079894           748          2153             1580             1156             860

    1079894           35           2140             1563             653              656

    1079894           218          2236             1576             890              656

    1079894           13           2250             1796             593              640

    1079894           27           2250             1580             593              640

    1079894           54905        4890             4516             2640             12296

    */

     

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

Viewing 4 posts - 16 through 18 (of 18 total)

You must be logged in to reply to this topic. Login to reply