Loading dummy data

Question

Post reply

Loading dummy data

BinaryDigit

SSCrazy

Points: 2523
More actions
May 13, 2011 at 2:15 am

#237400

If I want to populate a table with dummy data to test procs and then add indexes we needed, do I have to add unique values in each row or will it be the same if the there was the same row 1 000 000 times. Basically is 1 000 000 unique rows processed differently from 1 000 000 unique rows.

The Fastest Methods aren't always the Quickest Methods

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

colin.Leversuch-Roberts SSC Guru Points: 52643 More actions · Answer 1

you need to try to generate different values for each row - or at least in the columns which are important. At best you need data representative of actual data if you want to test - I've seen many a failure due to not having "typical" data sets.

There's lots of ways to do this but usually a loop does the trick - in fact I'm just doing some testing of that type ( well generating data for testing of a specific kind ) I populate tables with 1 million rows each for this.

[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/

BinaryDigit SSCrazy Points: 2523 More actions · Answer 2

BinaryDigit

SSCrazy

Points: 2523

May 13, 2011 at 3:32 am

#1324144

do you have any known methods to generate a lot of random rows, I could do it through a C# app but is there a good way SQL

The Fastest Methods aren't always the Quickest Methods

colin.Leversuch-Roberts SSC Guru Points: 52643 More actions · Answer 3

well I wrote up my benchmark tests so you can see how I attempted to generate data. I use this type of approach.

http://www.grumpyolddba.co.uk/infrastructure/sqlbenchmark.htm

I used this script to generate character data ( 17,500 rows unique ) which was used for testing foreign key lookups performance. ( I needed a reasonable amount of unique values for a lookup table ) you can also cast a guid to varchar - from an indexing point of view you just need the minimum uniqueness.

[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/

colin.Leversuch-Roberts SSC Guru Points: 52643 More actions · Answer 4

sorry forgot to paste!



set nocount on

declare @scount tinyint=65,@scount2 tinyint=70,@scount3 tinyint=68

declare @loop int =1

while @loop<17577

begin

--insert into dbo.LookupCharOne(Squirrel)

select CHAR(@scount)+CHAR(@scount2)+CHAR(@scount3)+'sqlsquirrel'

set @scount+=1;

if @scount>90

BEGIN

	set @scount=65

	set @scount2+=1

END

if @scount2>90

BEGIN

set @scount2=65

set @scount3+=1

END

if @scount3>90 set @scount3=65

end

[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/

BinaryDigit SSCrazy Points: 2523 More actions · Answer 5

BinaryDigit

SSCrazy

Points: 2523

May 13, 2011 at 4:50 am

#1324184

thank you, so it doesn't have to be completely different values just that they are unique?

The Fastest Methods aren't always the Quickest Methods

Syed Jahanzaib Bin hassan SSChampion Points: 11410 More actions · Answer 6

INSERT INTO TAB1 values('Jahanzaib','Pakistan')

GO 400000

INSERT INTO TAB1 values('Colin Roberts','UK')

GO 400000

INSERT INTO TAB1 values('Binary','South Africa')

GO 100000

INSERT INTO TAB1 values('SuperMan','USA')

GO 100000

that is an example to load the dummy data just execute it and then apply queries to find the indexes requirement.Note It ,I assumed that there is an Identity column ID in this table,you can do same as I define with your table structure as well,create 5 to 10 different records and then iterate it with your requirement with GO operator

My Blog
www.aureus-salah.com

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 7

BinaryDigit (5/13/2011)

thank you, so it doesn't have to be completely different values just that they are unique?

One or the other will do just fine 😉

Jeff Moden has a sample data generator which has been used for years on this forum. Here's a very small adjustment to his procedure which generates 100,000 rows of mixed-datatype data:

USE [Testing]



GO

/****** Object:  StoredProcedure [dbo].[CreateSampleData]    Script Date: 05/13/2011 12:25:45 ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

-- =============================================

-- Author:		Jeff Moden

-- Create date: <Create Date,,>

-- Description:	<Description,,>

-- =============================================

ALTER PROCEDURE [dbo].[CreateSampleData] 

AS

--===== Preset the environment for appearance and speed

SET NOCOUNT ON

--===== If the test table already exists, drop it in case we need to rerun.

-- The 3 part naming is overkill, but prevents accidents on real tables.

IF OBJECT_ID('TempDB.dbo.TransactionDetail') IS NOT NULL

DROP TABLE TempDB.dbo.TransactionDetail

--===== Create the test table (TransactionDetail) with a NON clustered PK

CREATE TABLE TempDB.dbo.TransactionDetail        (

	TransactionDetailID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

	[Date]              DATETIME,

	AccountID           INT,

	Amount              MONEY,

	Sentence			VARCHAR(200),

	AccountRunningTotal MONEY,  --Running total across each account

	AccountRunningCount INT,    --Like "Rank" across each account

	NCID                INT)

INSERT INTO TempDB.dbo.TransactionDetail (Date, AccountID, Amount, Sentence)

     SELECT TOP 100000

     --10 years worth of dates with times from 1/1/2000 to 12/31/2009

     CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME) AS Date,

     --100 different account numbers

     ABS(CHECKSUM(NEWID()))%100+1,

     --Dollar amounts from -99.99 to + 99.99

     CAST(CHECKSUM(NEWID())%10000 /100.0 AS MONEY),

     --Randomised number of words

     iTVF.Sentence

FROM Master.dbo.SysColumns sc1

CROSS JOIN Master.dbo.SysColumns sc2

CROSS APPLY (

	SELECT Word AS 'data()'

	FROM (

		SELECT TOP ((sc1.colorder*sc2.colorder)%9+1) word

		FROM (

		SELECT word = CAST('the' AS VARCHAR(200)) UNION ALL

		SELECT 'quick' UNION ALL

		SELECT 'brown' UNION ALL

		SELECT 'fox' UNION ALL

		SELECT 'jumped' UNION ALL

		SELECT 'over' UNION ALL

		SELECT 'the' UNION ALL

		SELECT 'lazy' UNION ALL

		SELECT 'dog'

		) Words ORDER BY NEWID()) u2

	FOR XML PATH('') 

) iTVF(Sentence)  

RETURN 0

SELECT * FROM TempDB.dbo.TransactionDetail

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

BinaryDigit SSCrazy Points: 2523 More actions · Answer 8

BinaryDigit

SSCrazy

Points: 2523

May 13, 2011 at 5:36 am

#1324207

thanks a lot this will come in handy 🙂

The Fastest Methods aren't always the Quickest Methods

BinaryDigit SSCrazy Points: 2523 More actions · Answer 9

what I did as a random test was the below 🙂



INSERT INTO randomTest

(Name, Surname, Age)

values

('John'+ DATENAME(millisecond,getdate()),'Smith' + DATENAME(millisecond,getdate()), DATENAME(millisecond,getdate())),

('Paul'+ DATENAME(millisecond,getdate()) ,'Foster' + DATENAME(millisecond,getdate()), DATENAME(millisecond,getdate())),

('Lee' + DATENAME(millisecond,getdate()),'Carey' + DATENAME(millisecond,getdate()), DATENAME(millisecond,getdate()))

GO 9999999999

add about 10 people and then each row should be unique 😀

The Fastest Methods aren't always the Quickest Methods

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 10

Is 29,999,999,997 rows representative of the data you wish to model? Or, with 10 names to start, 99,999,999,990 rows? Is the data itself representative?

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]