Loading dummy data

  • If I want to populate a table with dummy data to test procs and then add indexes we needed, do I have to add unique values in each row or will it be the same if the there was the same row 1 000 000 times. Basically is 1 000 000 unique rows processed differently from 1 000 000 unique rows.


    The Fastest Methods aren't always the Quickest Methods

  • you need to try to generate different values for each row - or at least in the columns which are important. At best you need data representative of actual data if you want to test - I've seen many a failure due to not having "typical" data sets.

    There's lots of ways to do this but usually a loop does the trick - in fact I'm just doing some testing of that type ( well generating data for testing of a specific kind ) I populate tables with 1 million rows each for this.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • do you have any known methods to generate a lot of random rows, I could do it through a C# app but is there a good way SQL


    The Fastest Methods aren't always the Quickest Methods

  • well I wrote up my benchmark tests so you can see how I attempted to generate data. I use this type of approach.

    http://www.grumpyolddba.co.uk/infrastructure/sqlbenchmark.htm

    I used this script to generate character data ( 17,500 rows unique ) which was used for testing foreign key lookups performance. ( I needed a reasonable amount of unique values for a lookup table ) you can also cast a guid to varchar - from an indexing point of view you just need the minimum uniqueness.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • sorry forgot to paste!

    set nocount on

    declare @scount tinyint=65,@scount2 tinyint=70,@scount3 tinyint=68

    declare @loop int =1

    while @loop<17577

    begin

    --insert into dbo.LookupCharOne(Squirrel)

    select CHAR(@scount)+CHAR(@scount2)+CHAR(@scount3)+'sqlsquirrel'

    set @scount+=1;

    if @scount>90

    BEGIN

    set @scount=65

    set @scount2+=1

    END

    if @scount2>90

    BEGIN

    set @scount2=65

    set @scount3+=1

    END

    if @scount3>90 set @scount3=65

    end

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • thank you, so it doesn't have to be completely different values just that they are unique?


    The Fastest Methods aren't always the Quickest Methods

  • INSERT INTO TAB1 values('Jahanzaib','Pakistan')

    GO 400000

    INSERT INTO TAB1 values('Colin Roberts','UK')

    GO 400000

    INSERT INTO TAB1 values('Binary','South Africa')

    GO 100000

    INSERT INTO TAB1 values('SuperMan','USA')

    GO 100000

    that is an example to load the dummy data just execute it and then apply queries to find the indexes requirement.Note It ,I assumed that there is an Identity column ID in this table,you can do same as I define with your table structure as well,create 5 to 10 different records and then iterate it with your requirement with GO operator

    Regards,
    Syed Jahanzaib Bin Hassan
    BSCS | MCTS | MCITP | OCA | OCP | OCE | SCJP | IBMCDBA

    My Blog
    www.aureus-salah.com

  • BinaryDigit (5/13/2011)


    thank you, so it doesn't have to be completely different values just that they are unique?

    One or the other will do just fine 😉

    Jeff Moden has a sample data generator which has been used for years on this forum. Here's a very small adjustment to his procedure which generates 100,000 rows of mixed-datatype data:

    USE [Testing]

    GO

    /****** Object: StoredProcedure [dbo].[CreateSampleData] Script Date: 05/13/2011 12:25:45 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    -- =============================================

    -- Author: Jeff Moden

    -- Create date: <Create Date,,>

    -- Description: <Description,,>

    -- =============================================

    ALTER PROCEDURE [dbo].[CreateSampleData]

    AS

    --===== Preset the environment for appearance and speed

    SET NOCOUNT ON

    --===== If the test table already exists, drop it in case we need to rerun.

    -- The 3 part naming is overkill, but prevents accidents on real tables.

    IF OBJECT_ID('TempDB.dbo.TransactionDetail') IS NOT NULL

    DROP TABLE TempDB.dbo.TransactionDetail

    --===== Create the test table (TransactionDetail) with a NON clustered PK

    CREATE TABLE TempDB.dbo.TransactionDetail (

    TransactionDetailID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

    [Date] DATETIME,

    AccountID INT,

    Amount MONEY,

    Sentence VARCHAR(200),

    AccountRunningTotal MONEY, --Running total across each account

    AccountRunningCount INT, --Like "Rank" across each account

    NCID INT)

    INSERT INTO TempDB.dbo.TransactionDetail (Date, AccountID, Amount, Sentence)

    SELECT TOP 100000

    --10 years worth of dates with times from 1/1/2000 to 12/31/2009

    CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME) AS Date,

    --100 different account numbers

    ABS(CHECKSUM(NEWID()))%100+1,

    --Dollar amounts from -99.99 to + 99.99

    CAST(CHECKSUM(NEWID())%10000 /100.0 AS MONEY),

    --Randomised number of words

    iTVF.Sentence

    FROM Master.dbo.SysColumns sc1

    CROSS JOIN Master.dbo.SysColumns sc2

    CROSS APPLY (

    SELECT Word AS 'data()'

    FROM (

    SELECT TOP ((sc1.colorder*sc2.colorder)%9+1) word

    FROM (

    SELECT word = CAST('the' AS VARCHAR(200)) UNION ALL

    SELECT 'quick' UNION ALL

    SELECT 'brown' UNION ALL

    SELECT 'fox' UNION ALL

    SELECT 'jumped' UNION ALL

    SELECT 'over' UNION ALL

    SELECT 'the' UNION ALL

    SELECT 'lazy' UNION ALL

    SELECT 'dog'

    ) Words ORDER BY NEWID()) u2

    FOR XML PATH('')

    ) iTVF(Sentence)

    RETURN 0

    SELECT * FROM TempDB.dbo.TransactionDetail


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • thanks a lot this will come in handy 🙂


    The Fastest Methods aren't always the Quickest Methods

  • what I did as a random test was the below 🙂

    INSERT INTO randomTest

    (Name, Surname, Age)

    values

    ('John'+ DATENAME(millisecond,getdate()),'Smith' + DATENAME(millisecond,getdate()), DATENAME(millisecond,getdate())),

    ('Paul'+ DATENAME(millisecond,getdate()) ,'Foster' + DATENAME(millisecond,getdate()), DATENAME(millisecond,getdate())),

    ('Lee' + DATENAME(millisecond,getdate()),'Carey' + DATENAME(millisecond,getdate()), DATENAME(millisecond,getdate()))

    GO 9999999999

    add about 10 people and then each row should be unique 😀


    The Fastest Methods aren't always the Quickest Methods

  • Is 29,999,999,997 rows representative of the data you wish to model? Or, with 10 names to start, 99,999,999,990 rows? Is the data itself representative?


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply