Having fun with PARSENAME (SQL Spackle)

  • Jeff Moden (1/18/2016)


    Eirikur Eiriksson (1/18/2016)


    Network endian-ness is predominantly big-endian and the method NetworkToHostOrder will swap the byte order to small-endian. This tells me that you are storing the IPs as big-endian bigint with the first four bytes set to zero which makes sense as otherwise they could not be sorted.

    😎

    I guess I'm missing something here, Eirikur. If you remove the dots from a properly formed IP address (each octet containing 3 display digits) and convert that string to an Integer value, you still end up with a 12 digit Integer value. A 4 digit Integer is only capable of a 10 digit value and, of course, the leading digit will not exceed "1". What conversion are you using where the first 4 bytes of a big endian integer would all be zeroed out for a 12 digit IP address that's been converted to an Integer value?

    Slight correction on the math here, unsigned each byte stores values from 0x00 to 0xFF or 0 to 256 255 decimal, that decimal is 3 digits and we have four of those bytes in an integer hence 12 digits.

    😎

    Simplest mean of conversion in T-SQL is simply to substring and prefix with the padding, here is a pseudo snip

    0x00000000 + SUBSTRING(CONVERT(BINARY(4),IA.FAKE_IP,0),4,1) +

    SUBSTRING(CONVERT(BINARY(4),IA.FAKE_IP,0),3,1) +

    SUBSTRING(CONVERT(BINARY(4),IA.FAKE_IP,0),2,1) +

    SUBSTRING(CONVERT(BINARY(4),IA.FAKE_IP,0),1,1) AS IP_BINTREV

    Edit: Typo

  • Jeff Moden (1/18/2016)


    thierry.vandurme (1/18/2016)


    I think these helper functions are now deprecated and you need to add a directive to the code to ignore the warnings about that.

    Any idea what they want you to use instead of what sounds like a wonderful helper function?

    Hi Jeff,

    no not really and I haven't really explored any alternatives either. I think if we ever need to rewrite the application, management will appoint a "proper" developer to do so (I wrote this small 'IP Database' application when I was part of the infrastructure team - not much later I moved to the database team)

    Cheers,

    Thierry

  • Eirikur Eiriksson (1/18/2016)


    Jeff Moden (1/18/2016)


    Eirikur Eiriksson (1/18/2016)


    Network endian-ness is predominantly big-endian and the method NetworkToHostOrder will swap the byte order to small-endian. This tells me that you are storing the IPs as big-endian bigint with the first four bytes set to zero which makes sense as otherwise they could not be sorted.

    😎

    I guess I'm missing something here, Eirikur. If you remove the dots from a properly formed IP address (each octet containing 3 display digits) and convert that string to an Integer value, you still end up with a 12 digit Integer value. A 4 digit Integer is only capable of a 10 digit value and, of course, the leading digit will not exceed "1". What conversion are you using where the first 4 bytes of a big endian integer would all be zeroed out for a 12 digit IP address that's been converted to an Integer value?

    Slight correction on the math here, unsigned each byte stores values from 0x00 to 0xFF or 0 to 256 decimal, that decimal is 3 digits and we have four of those bytes in an integer hence 12 digits.

    😎

    Simplest mean of conversion in T-SQL is simply to substring and prefix with the padding, here is a pseudo snip

    0x00000000 + SUBSTRING(CONVERT(BINARY(4),IA.FAKE_IP,0),4,1) +

    SUBSTRING(CONVERT(BINARY(4),IA.FAKE_IP,0),3,1) +

    SUBSTRING(CONVERT(BINARY(4),IA.FAKE_IP,0),2,1) +

    SUBSTRING(CONVERT(BINARY(4),IA.FAKE_IP,0),1,1) AS IP_BINTREV

    Ah... got it. Because of your references to Little Endian, I thought you were talking about converting 255.255.255.255 to 255255255255 as an Integer, which requires 5 bytes to support the equivalent value of 3B6E6618D716. Instead, you're using the 4 byte octet conversion we were talking of previously. And, to be sure, 0xFF is only 255 (or -1 if you observe the sign bit) although it does contain 256 unique values (0-255 at the byte level).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (1/18/2016)


    Ah... got it. Because of your references to Little Endian, I thought you were talking about converting 255.255.255.255 to 255255255255 as an Integer, which requires 5 bytes to support the equivalent value of 3B6E6618D716. Instead, you're using the 4 byte octet conversion we were talking of previously. And, to be sure, 0xFF is only 255 (or -1 if you observe the sign bit) although it does contain 256 unique values (0-255 at the byte level).

    He he, the 256 was a typo, thanks for correcting.

    😎

    Using 4 Byte int for storing IPv4 is very efficient in terms of storage but slightly cumbersome on SQL Server as it does not understand unsigned anything (similar to the Royal Mail) apart from the tinyint. Storing the decimal representation of the address in 5 Bytes (binary(5)) or larger would certainly beat the purpose because of all the multiple conversions needed in order to use the data and the wasted space. There is also the common misconception that one has to use bigint for IPv4 but that is simply wrong, 4 bytes will always fit 4 byte data type.

    I would be very interested in seeing for instance any other method of storing IPv4 that performs as well when doing the common matching and searching operations one does on such a data.

  • The spackle articles would do well on a wiki.

    412-977-3526 call/text

  • robert.sterbal 56890 (1/19/2016)


    The spackle articles would do well on a wiki.

    You know, that's a great idea. Steve Jones was talking about "exercises" for folks and that would fit the bill.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Good article, thanks.

Viewing 7 posts - 31 through 36 (of 36 total)

You must be logged in to reply to this topic. Login to reply