The "Numbers" or "Tally" Table: What it is and how it replaces a loop.

  • Sure,

    I'd like the output to be like this.

    Name | Oct 2010 | Nov 2010 | Dec 2010 | 2010 Total | Jan 2011 | Feb 2011 | Mar 2011 | ... | 2011 Total |

  • Tim-24860 (3/28/2011)


    Sure,

    I'd like the output to be like this.

    Name | Oct 2010 | Nov 2010 | Dec 2010 | 2010 Total | Jan 2011 | Feb 2011 | Mar 2011 | ... | 2011 Total |

    Perfect. I've got just the article for that...

    http://qa.sqlservercentral.com/articles/Crosstab/65048/

    For further understanding, especially when it comes to performance on the subject of Cross Tabs by using "pre-aggregation" as Peter Larsson calls it, see part one of that series...

    http://qa.sqlservercentral.com/articles/T-SQL/63681/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Great article Jeff!

    One situation I would like to see if it can be solved, is I wrote a script that will search my database char columns for 'funny' characters. Of course it does this rbar and takes a good 20 minutes to run (but does save me some head aches because our source system does NOT have anything in place to prevent these characters).

    I wanted to try and use this tally table method to see if I can improve performance but I have run into a problem. My loop iteratively builds the character field so for instance

    'test data' in the test varchar(100) column would be searched one character at a time and the output would be compared to the orginal (I have a lot of other code in between that makes this useful but the loop is the crux of the performance).

    Anyways when I use the tally method this spits the result out as rows instead of a singular column like I have in my loop so it would appear

    test

    1 t

    2 e

    3 s

    4 t

    5

    6 d

    7 a

    8 t

    9 a

    Is there a way using the tally method to roll this back up, or return it as just 'test data' from this point?

    Link to my blog http://notyelf.com/

  • Thanks for the feedback, Shannon.

    When you say...

    One situation I would like to see if it can be solved, is I wrote a script that will search my database char columns for 'funny' characters.

    I'm thinking that you may not need a loop or a Tally Table. Could you give an example of something with "funny" characters in it, the original, and explain what you want done one you find a "funny" character? Perhaps maybe even post your loop code comparsion?

    Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Not a problem. I know I didn't do too good of a job explaining myself. Basically we transfer some of our data to a 3rd party source for reporting that needs to be in ascii. Because our source is in unicode and not constrained to certain character times, we get a lot of diacritical marks and sometimes just some really strange characters (some I have no idea what they are!). My code at the moment doesn't remove them yet, just simply spits out a list of which row and column in which tables are offenders and I clean them up afterwards.

    Here is the function I use to do this

    CREATE FUNCTION dbo.RemoveSpecialChars (@s varchar(256)) returns varchar(256)

    with schemabinding

    BEGIN

    DECLARE @s2 varchar(256) = '' ,@p int = 1,@l int = len(@s),@c int

    IF @S is null

    RETURN null

    IF len(@s) = 0 AND @S is not null

    RETURN ''

    WHILE @p <= @l

    BEGIN

    SET @C = ascii(substring(@s, @p, 1))

    IF @C between 32 and 126 Or @C IN (220,146,252,167,214,246,196,228,145,147,148,150,180,188,189,190, 201,233,166, 195, 227, 193, 225, 199, 231, 209, 241)

    SET @s2 = @s2 + CHAR(@c)

    SET @p = @p + 1

    END

    RETURN @s2

    END

    GO

    From there I have a stored proc that simply loops through every column in every table and builds a sql statement that is executed via sp_executesql @SQL. So technically its 3 loops, which I have been trying to stray away from for a long time :-D.

    Anyways, any help on this is appreciated, I have been looking for an alternate solution in my spare time but as of yet to find anything. I have successfully added the tally table into some loops within my production code in the past though so thank you again!

    Link to my blog http://notyelf.com/

  • Ideas:

    1) Are you dumping your results into a text file of some sort? If so, it might be easier to this in a post-processing script on the text file.

    1b) dump it to a text file, process to strip the bad characters, import it back in.

    2) Use SSIS and use a script to do the processing.

    3) What I fundamentally want to do is something like:

    REPLACE(...(Replace(Column,CHAR(1),' '), Char(2), ' ')..., Char(255), ' ')

    Which is much easier (for me anyway) in a script, which I would build like this:

    // PseudoCode

    Dim BadChars as String, temp as string

    BadChars = Char(1) & Char(2) & ... // (All of the Characters that you do not want...)

    temp = inputstring

    For i = 1 to len(BadChars)

    temp = replace(temp,mid(BadChars,i,1),' ')

    next

    return temp

    Hopefully someone else has a better answer for you.

    --

    JimFive

    Edited to fix psuedocode error

  • shannonjk (5/4/2011)


    Not a problem. I know I didn't do too good of a job explaining myself. Basically we transfer some of our data to a 3rd party source for reporting that needs to be in ascii. Because our source is in unicode and not constrained to certain character times, we get a lot of diacritical marks and sometimes just some really strange characters (some I have no idea what they are!). My code at the moment doesn't remove them yet, just simply spits out a list of which row and column in which tables are offenders and I clean them up afterwards.

    Here is the function I use to do this

    (.... function removed so won't kick up a storm at work...)

    From there I have a stored proc that simply loops through every column in every table and builds a sql statement that is executed via sp_executesql @SQL. So technically its 3 loops, which I have been trying to stray away from for a long time :-D.

    Anyways, any help on this is appreciated, I have been looking for an alternate solution in my spare time but as of yet to find anything. I have successfully added the tally table into some loops within my production code in the past though so thank you again!

    That function does actually remove all but the "sanctioned" list of characters from the return variable. You may not be using it that way, but that's what it actually does. Anyway, we should be able to get rid of some of the loops no matter which way you need to have it. I'll see if I can demo for you after work.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Yes you are correct sorry it has been a while since I designed and/or even looked at this code. I use it as a comparative like

    select dbo.removespecialcharacters(column) from table where column != dbo.removespecialcharacters(column)

    The output shows only 'bad' data and I take the output from there to manipulate it.

    I have a master table I save to show me how to get to unique records on all my tables that is referenced in the stored proc loop so when it spits out the bad data its along with the unique row identifiers as well.

    Link to my blog http://notyelf.com/

  • Sorry... I didn't forget you... I just got busy. I'll be back.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • No worries take your time. I have been looking for a solution as well, I will post something if I figure it out πŸ˜€

    Link to my blog http://notyelf.com/

  • shannonjk (5/5/2011)


    No worries take your time. I have been looking for a solution as well, I will post something if I figure it out πŸ˜€

    Shannon, to confirm, in the end, you wanted something like this:

    IF OBJECT_ID( 'tempdb..#tmp') is not null

    drop table #tmp

    CREATE table #tmp ( RowID INT IDENTITY( 1,1) NOT NULL, testfield nVARCHAR(256))

    INSERT INTO #tmp (testfield) VALUES

    ( N'abcdefghijklmnopqrstuvwxyz'),

    (N'abcdef' + nCHAR(300) + N'ghi'),

    (N'abcd' + nchar(301) + N'efg' + nchar(302) + N'hij' + nCHAR(303))

    SELECT * FROM #tmp

    --===== "Inline" CTE Driven "Tally Table” produces values up to

    -- 10,000... enough to cover VARCHAR(8000)

    ;WITH

    E1(N) AS ( --=== Create Ten 1's

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 --10

    ),

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000

    cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT N)) FROM E4)

    SELECT

    t.RowID,

    t.testfield AS BadField

    FROM

    #tmp AS t

    JOIN

    (SELECT DISTINCT

    t.RowID

    FROM

    cteTally AS ct,

    #tmp AS t

    WHERE

    ct.N <= LEN( t.testfield)

    AND UNICODE(substring( t.testfield, n, 1)) > 256

    ) AS drv

    ON t.RowID = drv.RowID


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig Farrell (5/5/2011)


    Shannon, to confirm, in the end, you wanted something like this:

    What I was thinking is that no Tally Table is necessary for this. I believe a LIKE NOT (using [^]) would do it faster than a Tally Table implementation. I just haven't had the time to tweek it so it escapes embedded wildcard and reserved characters of _, %, and [ although I do believe I just had an epiphany in that area. πŸ™‚

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (5/7/2011)


    Craig Farrell (5/5/2011)


    Shannon, to confirm, in the end, you wanted something like this:

    What I was thinking is that no Tally Table is necessary for this. I believe a LIKE NOT (using [^]) would do it faster than a Tally Table implementation. I just haven't had the time to tweek it so it escapes embedded wildcard and reserved characters of _, %, and [ although I do believe I just had an epiphany in that area. πŸ™‚

    Most likely, and I'm sure you're right. This would allow for the stripping of the characters or whatnot as a baseline for a number of different tasks, but for simple location it's probably a bit overkill and slower. If I get a chance I'll see if I can determine if an accented A exists in [a-Z]. πŸ™‚


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • At this point, I thought Shannon didn't really want to strip characters... only find those "cells" that had unwanted characters. At the very least, LIKE NOT would likely make a decent filter for those "cells" that do need cleaning.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Ok, Shannon... here's part 1. No need for a Tally Table just to figure out if a "cell" has "weird" characters in it. The following code makes 100,000 rows with a column that may have occasional "wierd" characters in it (about 7,800 rows... I say "about" because they're randomly generated).

    --===== Create a test table where the last character

    -- may or may not be an illegal character

    SELECT TOP 100000

    RowNum = IDENTITY(INT,1,1),

    TestString =

    CAST(NEWID() AS VARCHAR(36))

    + CHAR(ABS(CHECKSUM(NEWID()))%90+25)

    INTO #TestTable

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    To find them is easy. We'll build your string of "valid" characters like you did before but we'll add some special "escape" characters using a handful of nested replaces. These "escape" characters are important to "escape" characters normally used as WildCards and character range blocks so they'll be treated as normal characters. Then, it's just a matter of doing a SELECT with the right kind of "LIKE" in the WHERE clause with an ESCAPE clause to use the special "escape" characters we inserted before.

    --===== Build the "valid" character string.

    DECLARE @LegalCharacters VARCHAR(256)

    SELECT @LegalCharacters = ISNULL(@LegalCharacters,'')+CHAR(t.N)

    FROM dbo.Tally t

    WHERE t.N between 32 and 126

    OR t.N IN (220,146,252,167,214,246,196,228,145,147,148,150,180,188,189,190,201,233,166,195,227,193,225,199,231,209,241)

    ;

    --===== Add the "escape" characters to the characters normally used as WildCards and character range blocks.

    -- In this case, I've used an underscore as the "escape" character.

    SELECT @LegalCharacters = REPLACE(REPLACE(REPLACE(REPLACE(@LegalCharacters,'_','__'),'%','_%'),'[','_['),']','_]')

    ;

    --===== And now we find ALL of the "cells" in the column that have a "wierd" character.

    -- The "AsciiValue" column shows the decimal ASCII value of the "weird" character found.

    -- I didn't use every possible wierd character but it will work for all "weird" characters.

    SELECT RowNum, TestString, AsciiValue = ASCII(RIGHT(TestString,1))

    FROM #TestTable

    WHERE TestString LIKE '%[^'+@LegalCharacters+']%' ESCAPE '_'

    ;

    The SELECT to find the "wierd" characters finds all ~7,800 rows out of 100,000 rows in about 9 seconds on my 9 year old single 1.8GHz CPU. It should do a whole lot better on a server.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 15 posts - 316 through 330 (of 497 total)

You must be logged in to reply to this topic. Login to reply