SELECT 1 random row query help, please

  • Kiril,

    at least your script can run a good deal faster, if you'll choose to go this way. You needn't copy the whole source table into #TBL and needn't so much scans on it.

    DECLARE @CategoryId INT = 6565;

    DECLARE @DS1MinValue INT= 1; --row_number() start

    DECLARE @DS1MaxValue INT;

    DECLARE @DS2MinValue INT= 1;

    DECLARE @DS2MaxValue INT;

    SELECT Id

    ,rnV = case when categoryId is not null then row_number() over(partition by categoryId order by (select 1)) end

    ,rnN = case when categoryId is null then row_number() over(partition by categoryId order by (select 1)) end

    ,CategoryId, SKU

    INTO #TBL

    FROM @t

    WHERE CategoryId = @CategoryId OR categoryId IS NULL;

    SELECT @DS1MaxValue= MAX(rnV),@DS2MaxValue=MAX(rnN)

    FROM #TBL;

    IF @DS1MaxValue IS NOT NULL

    BEGIN

    -- pick a random row from data set where CategoryId IS NOT NULL

    SELECT * FROM #TBL WHERE rnv = ROUND(((@DS1MaxValue - @DS1MinValue) * RAND() + @DS1MinValue), 0);

    END

    ELSE

    BEGIN

    -- try pick a random row from data set where CategoryId IS NULL

    SELECT * FROM #TBL WHERE rnN = ROUND(((@DS2MaxValue - @DS2MinValue) * RAND() + @DS2MinValue), 0);

    END;

    DROP TABLE #TBL;

  • kiril.lazarov.77 (12/10/2015)


    TheSQLGuru (12/9/2015)


    Handling gaps with my mechanism really is simple and efficient. Been there, done that, got the tshirt.

    But your new requirement about category makes it much more likely you will need to do the sort thing, although possibly with an exists check first. If row exists with your category, do the sort/rownumber thing for that category. If it doesn't exist do the sort/rownumber thing for null categories. As Scott said, these would hopefully all be narrow-index hits.

    Hi Kevin, Many thanks for your help. So basically you're saying I should use the script I attached a couple of comments above? Any way I can make that faster? That's very slow when I run it on my db. Thanks

    Sorry - I am on an international trip and pretty brain-dead right now, so I can't give detailed information or a script because it may well come out as gibberish! :hehe: But key points are:

    1) don't put stuff into a temp table when you don't need to, and here I cannot see that it is necessary

    2) you must do as-narrow-as-possible index seek/scan to accomplish your query

    3) I still think (but cannot be certain without much more information about probabilities of hitting both a category and a gap) it would be faster to do an index seek first to see if ANY row exists with a non-null category passed in. If you get a hit, acquire the row using a full row_number random thing to get one row. If you don't get a hit, do the iterative check for a computed ID like I mentioned. If you have huge gaps of missing data this could be less efficient some of the time but likely still more efficient overall.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 2 posts - 16 through 16 (of 16 total)

You must be logged in to reply to this topic. Login to reply