Simple LIKE with wildcard

  • malleswarareddy_m (5/25/2010)


    ...

    select string from #t where string COLLATE CZECH_CI_AS like '_'

    the above you have given returns gives

    CH

    Ch

    ch

    ¢h

    four rows.

    ...

    I was wondering why my SQL Query Analyzer output apparently listed "ch" twice -- thanks for listing the accurate output (in which the fourth row is not "ch"). Now try SELECT DISTINCTing the column 😉 -- the #rows should tell us precisely how many LIKE '_' characters actually exist between 'A' and 'Z' :-).

  • Michael Poppers (5/25/2010)


    malleswarareddy_m (5/25/2010)


    ...

    select string from #t where string COLLATE CZECH_CI_AS like '_'

    the above you have given returns gives

    CH

    Ch

    ch

    ¢h

    four rows.

    ...

    I was wondering why my SQL Query Analyzer output apparently listed "ch" twice -- thanks for listing the accurate output (in which the fourth row is not "ch"). Now try SELECT DISTINCTing the column 😉 -- the #rows should tell us precisely how many LIKE '_' characters actually exist between 'A' and 'Z' :-).

    Your idea with DISTINCT is great. It makes the things little bit more funny with case-insensitive collations.



    See, understand, learn, try, use efficient
    © Dr.Plch

  • Nice one, and you got me with that question. I absolutely forgot about "ch" and went with the popular "0" answer :crazy:

    So-called "letter" ch can create some confusion because of the special alphabetical order it has. Anyone expects words beginning with "ch..." under the letter C in a dictionary/encyclopedia - but if it is a Czech one, you'll find them under a separate heading "CH", not even close to "C" (it is G, H, Ch, I...). IMHO this should be addressed in some future release of Czech orthography and we should declare, that "CH" is in fact two letters and standard ordering rules would apply from that time (in fact, it is long overdue... but I'm afraid no one will bother with it and we will be stuck with this "letter" forever).

  • Vladan (5/25/2010)


    Nice one, and you got me with that question. I absolutely forgot about "ch" and went with the popular "0" answer :crazy:

    So-called "letter" ch can create some confusion because of the special alphabetical order it has. Anyone expects words beginning with "ch..." under the letter C in a dictionary/encyclopedia - but if it is a Czech one, you'll find them under a separate heading "CH", not even close to "C" (it is G, H, Ch, I...). IMHO this should be addressed in some future release of Czech orthography and we should declare, that "CH" is in fact two letters and standard ordering rules would apply from that time (in fact, it is long overdue... but I'm afraid no one will bother with it and we will be stuck with this "letter" forever).

    I don't know our linguists (ÚPJC) but this will be very hard to do. Orthographic rules are very rigid and cannot be changed so easy. I remember last great changes in czech orthography or several years ago changes Germans made. It was always painful.



    See, understand, learn, try, use efficient
    © Dr.Plch

  • Thanks for the question. I bombed this one, I should of looked at the category or went with my gut, everything always "depends".

    ---------------------------------------------------------------------
    Use Full Links:
    KB Article from Microsoft on how to ask a question on a Forum

  • Great question

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • I think of instead of 'It Depends' the last choice should have been collation dependent. By itself 'It depends' is not much of an answer.

    -- Mark D Powell --

  • honza.mf (5/25/2010)


    Michael Poppers (5/25/2010)


    malleswarareddy_m (5/25/2010)


    ...

    select string from #t where string COLLATE CZECH_CI_AS like '_'

    the above you have given returns gives

    CH

    Ch

    ch

    ¢h

    four rows.

    ...

    I was wondering why my SQL Query Analyzer output apparently listed "ch" twice -- thanks for listing the accurate output (in which the fourth row is not "ch"). Now try SELECT DISTINCTing the column 😉 -- the #rows should tell us precisely how many LIKE '_' characters actually exist between 'A' and 'Z' :-).

    Your idea with DISTINCT is great. It makes the things little bit more funny with case-insensitive collations.

    If all characters were inserted into DB as upper case how come query is returning lower case?

  • jlennartz (5/25/2010)


    If all characters were inserted into DB as upper case how come query is returning lower case?

    It's another aspect of this question not mentioned yet: All characters inserted into are not guaranted to be uppercase!

    You feel they are uppercase due to "while @a2 <= 'Z'" condition. You must evaluate this condition in appropriate collation too. Most of us use case-insensitive collations and in such collations lowercase letters also fit this condition. And it si true for the special characters between 'Z' and 'a' in ASCII table like '[', '\', ']' - it's not obvious but for me it's clear. There can be even more characters that fit, it's collation dependant.

    If someone uses case sensitive collation, he will have all characters in uppercase.



    See, understand, learn, try, use efficient
    © Dr.Plch

  • honza.mf (5/25/2010)


    If someone uses case sensitive collation, he will have all characters in uppercase.

    Sorry, ignore this reply and see my next reply :-). Thanks.

  • honza.mf (5/25/2010)


    If someone uses case sensitive collation, he will have all characters in uppercase.

    Sounds reasonable, but use COLLATE Czech_CS_AS (instead of the case-insensitive collation) -- SELECT DISTINCT will return three rows (instead of one), and one of the result rows appears (to my non-Czech eyes) to not be what in the Roman alphabet would be called "uppercase"!

  • honza.mf (5/25/2010)


    jlennartz (5/25/2010)


    If all characters were inserted into DB as upper case how come query is returning lower case?

    It's another aspect of this question not mentioned yet: All characters inserted into are not guaranted to be uppercase!

    You feel they are uppercase due to "while @a2 <= 'Z'" condition. You must evaluate this condition in appropriate collation too. Most of us use case-insensitive collations and in such collations lowercase letters also fit this condition. And it si true for the special characters between 'Z' and 'a' in ASCII table like '[', '\', ']' - it's not obvious but for me it's clear. There can be even more characters that fit, it's collation dependant.

    If someone uses case sensitive collation, he will have all characters in uppercase.

    Thank you for the answer it make sense, now. You won't get A-Z but whatever the Check equivqlent of their ascii number is.

    Jerry

  • honza.mf (5/25/2010)


    ...

    I don't know our linguists (ÚPJC) but this will be very hard to do. Orthographic rules are very rigid and cannot be changed so easy. I remember last great changes in czech orthography or several years ago changes Germans made. It was always painful.

    First of all: very nice question!! I just "lost" one point, but learned something new. I was aware of the collation issue but I thought 'A' to 'Z' didn't make room for it.

    Regarding our orthography changes (I'm German): at least it allows me to write the way I'd like to: most probably there was/is/will_be a time and place where the spelling I use was/is/will_be valid 🙂

    But it definitely makes it really hard to help your kids through (school) homework... Ooops, getting off topic. Sorry. Again: Nice QOTD!



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Michael Poppers (5/25/2010)


    honza.mf (5/25/2010)


    If someone uses case sensitive collation, he will have all characters in uppercase.

    Sounds reasonable, but use COLLATE Czech_CS_AS (instead of the case-insensitive collation) -- SELECT DISTINCT will return three rows (instead of one), and one of the result rows appears (to my non-Czech eyes) to not be what in the Roman alphabet would be called "uppercase"!

    Sorry, it was my mistake as I haven't checked it. Case sensitivity of a collation is used probably in equality comparisons only. Loops in example are controled by less or equal operator, and it says 'a' <= 'Z'



    See, understand, learn, try, use efficient
    © Dr.Plch

  • honza.mf (5/25/2010)


    jlennartz (5/25/2010)


    If all characters were inserted into DB as upper case how come query is returning lower case?

    It's another aspect of this question not mentioned yet: All characters inserted into are not guaranted to be uppercase!

    You feel they are uppercase due to "while @a2 <= 'Z'" condition. You must evaluate this condition in appropriate collation too. Most of us use case-insensitive collations and in such collations lowercase letters also fit this condition. And it si true for the special characters between 'Z' and 'a' in ASCII table like '[', '\', ']' - it's not obvious but for me it's clear. There can be even more characters that fit, it's collation dependant.

    If someone uses case sensitive collation, he will have all characters in uppercase.

    Quite correct. The actual number of rows in #t with a default installation is 20449. To get the 26 by 26 result requireswhile ASCII(@a1) <= ASCII('Z') begin, etc.

    This QotD turned out to be much more interesting than I initially thought it would.

    Thanks for the lesson!

    Tom Garth
    Vertical Solutions[/url]

    "There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers

Viewing 15 posts - 31 through 45 (of 55 total)

You must be logged in to reply to this topic. Login to reply