Want to create random alphanumeric characters for primary key values

  • having a random string, like a guid, will result in massive amount of page splits every time a new user is added.

    Well thats unexpected. I would think it would be more like say a page for each level of the b-tree, with the leaves splitting every n inserts, the parent bucket every n*n inserts, etc... Where does the massive number of page splits for each new key come from?

  • Hi carrieathomer,

    Be assured that there's absolutely nothing wrong with your requirement. Alphanumeric identifiers are extremely common, partly because they are so much easier for humans to use than numeric-only identifiers. Alphanumerics often make shorter keys than numerics and people usually find them easier to recognise, copy and type. That's why alphanumerics are often used for order numbers, serial numbers, document numbers and especially in ecommerce among many other applications - anywhere you need a more "user-friendly" key. Most of us have dealings with such keys almost daily, whether it's on utility bills, flight reservation numbers, vehicle licence plates or online shopping.

    So why are you being told otherwise? I think the critics here are assuming you will also use your primary key as a clustered index. You probably shouldn't do that. An arbitrary string isn't a good idea for a clustered index because it can cause page splits and fragmentation. If the strings are always allocated in sequence then it may be an option to use as a cluster key but cluster keys are best kept as small as possible. Your primary key does not have to be clustered and in this case it most likely shouldn't be.

    That being understood, here's an example function that will convert an arbitrary number to an alpha-numeric string. You can modify it to your preference:

    CREATE FUNCTION dbo.EncodeInt(@i INTEGER)

    RETURNS CHAR(8)

    AS

    BEGIN;

    RETURN

    SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZ',FLOOR(@i/9/9/9/9/26/26/26)%26+1 ,1)+

    SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZ',FLOOR(@i/9/9/9/9/26/26)%26+1 ,1)+

    SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZ',FLOOR(@i/9/9/9/9/26)%26+1 ,1)+

    SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZ',FLOOR(@i/9/9/9/9)%26+1 ,1)+

    SUBSTRING('123456789',FLOOR(@i/9/9/9)%9+1 ,1)+

    SUBSTRING('123456789',FLOOR(@i/9/9)%9+1 ,1)+

    SUBSTRING('123456789',FLOOR(@i/9)%9+1 ,1)+

    SUBSTRING('123456789',@i%9+1 ,1)

    END

    GO

    SELECT dbo.EncodeInt(0);

  • I have only scanned the thread so apologies if this has already been pointed out.

    As stated, your PK should probably be auto-increment integer; however presenting an alternate Alpha Numeric key to the users has a number of benefits:

    The main ones being that the keys are likely to be shorter as you are using base 22 (assuming you don't user IOQZ) or base 32 if numerics are also allowed, and it makes it (almost) impossible to 'guess' a correct alternate key so it is safer for record identification in a URL and the keys are generally easier to remember because there will be patterns to them.

  • CELKO (2/18/2015)


    I need to create random alphanumeric characters as primary key values when inserting a record [sic]. .. I don't want to use GUID or auto increment integer or sequence

    Look up additive congruence generators. They are like a RNG, but they do not repeat.

    That's just mean. :w00t:

    There's only small group that really understand something as esoteric as discontinuous piecewise linear equations (and I'm not in included).

    Don Simpson



    I'm not sure about Heisenberg.

  • CELKO (2/18/2015)


    I need to create random alphanumeric characters as primary key values when inserting a record [sic]. .. I don't want to use GUID or auto increment integer or sequence

    Look up additive congruence generators. They are like a RNG, but they do not repeat.

    I'd be interested to hear how you propose to implement an additive (or linear) congruential generator using a set-based algorithm that doesn't involve recursion. Unless there's some magical power series decomposition of the basic generator function available that I'm not aware of.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (2/19/2015)


    CELKO (2/18/2015)


    I need to create random alphanumeric characters as primary key values when inserting a record [sic]. .. I don't want to use GUID or auto increment integer or sequence

    Look up additive congruence generators. They are like a RNG, but they do not repeat.

    I'd be interested to hear how you propose to implement an additive (or linear) congruential generator using a set-based algorithm that doesn't involve recursion. Unless there's some magical power series decomposition of the basic generator function available that I'm not aware of.

    I'm guessing he'll push that functionality out to the app server!

  • patrickmcginnis59 10839 (2/19/2015)


    dwain.c (2/19/2015)


    CELKO (2/18/2015)


    I need to create random alphanumeric characters as primary key values when inserting a record [sic]. .. I don't want to use GUID or auto increment integer or sequence

    Look up additive congruence generators. They are like a RNG, but they do not repeat.

    I'd be interested to hear how you propose to implement an additive (or linear) congruential generator using a set-based algorithm that doesn't involve recursion. Unless there's some magical power series decomposition of the basic generator function available that I'm not aware of.

    I'm guessing he'll push that functionality out to the app server!

    There's certainly a good argument that that is where the logic should be.

    Don Simpson



    I'm not sure about Heisenberg.

  • CELKO (2/20/2015)


    I'm guessing he'll push that functionality out to the app server!

    There's certainly a good argument that that is where the logic should be.

    Here is the formula for a 31-bit additive congruential generator. It does not repeat until it has completed a full cycle.

    UPDATE generator

    SET keyval = keyval/2 + ((keyval %2) + (keyval/8 % 2) % 2) * 2^30;

    You can put this into a procedure or function in SQL

    Here is the same algorithm implemented in C.

    int asequence()

    {static int n = 1;

    n = n>>1 | (( n^n>>3 ) & 1) << 30;

    return n;}

    There are other formulas for different length integers.

    I am not sure about whether to put it in the DB or in an input layer myself. My thought is that there ought to be some way to use CREATE SEQUENCE, but I don't know what it is.

    Which data types are necessary? I ask because, as written, the SQL version (once the spreadsheet formula of (2^30 is replaced) creates many dupes.

    CREATE TABLE #MyHead (RowNum INT IDENTITY(1,1), KeyValue BIGINT)

    ;

    DECLARE @keyval BIGINT = 100

    ,@Counter INT = 1000000

    ;

    SET NOCOUNT ON;

    BEGIN TRANSACTION

    WHILE @Counter >= 0

    BEGIN

    SELECT @keyval = @keyval/2 + ((@keyval %2) + (@keyval/8 % 2) % 2) * 1073741824

    ,@Counter = @Counter-1

    ;

    INSERT INTO #MyHead (KeyValue)

    SELECT @keyval

    ;

    END

    ;

    COMMIT

    ;

    SELECT KeyValue

    ,DupeCount = COUNT(*)

    FROM #MyHead

    GROUP BY KeyValue

    HAVING COUNT(*) > 1

    ;

    DROP TABLE #MyHead

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (2/21/2015)


    CELKO (2/20/2015)


    I'm guessing he'll push that functionality out to the app server!

    There's certainly a good argument that that is where the logic should be.

    Here is the formula for a 31-bit additive congruential generator. It does not repeat until it has completed a full cycle.

    UPDATE generator

    SET keyval = keyval/2 + ((keyval %2) + (keyval/8 % 2) % 2) * 2^30;

    You can put this into a procedure or function in SQL

    Here is the same algorithm implemented in C.

    int asequence()

    {static int n = 1;

    n = n>>1 | (( n^n>>3 ) & 1) << 30;

    return n;}

    There are other formulas for different length integers.

    I am not sure about whether to put it in the DB or in an input layer myself. My thought is that there ought to be some way to use CREATE SEQUENCE, but I don't know what it is.

    Which data types are necessary? I ask because, as written, the SQL version (once the spreadsheet formula of (2^30 is replaced) creates many dupes.

    CREATE TABLE #MyHead (RowNum INT IDENTITY(1,1), KeyValue BIGINT)

    ;

    DECLARE @keyval BIGINT = 100

    ,@Counter INT = 1000000

    ;

    SET NOCOUNT ON;

    BEGIN TRANSACTION

    WHILE @Counter >= 0

    BEGIN

    SELECT @keyval = @keyval/2 + ((@keyval %2) + (@keyval/8 % 2) % 2) * 1073741824

    ,@Counter = @Counter-1

    ;

    INSERT INTO #MyHead (KeyValue)

    SELECT @keyval

    ;

    END

    ;

    COMMIT

    ;

    SELECT KeyValue

    ,DupeCount = COUNT(*)

    FROM #MyHead

    GROUP BY KeyValue

    HAVING COUNT(*) > 1

    ;

    DROP TABLE #MyHead

    ;

    Looks like the SQL formula is wrong, it's not the same as the C code.

    This works better

    SELECT @keyval = (@keyval/2 + ((@keyval %2) ^ (@keyval/8 % 2) % 2) * 1073741824) % 2147483648

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply