Create a user-defined function

  • Anyone knows how to create a mathematical function for the following purpose:

    Problem: A table contains rows of a single field. We wish to determine the number of combinations of these chosen 2 at a time.

    examples:

    if the table contained rows a, b and c the answer would be three rows: ab, ac, bc

    if the table contained rows a, b, c and d the answer would be six rows: ab, ac, ad, bc, bd, cd

    if the table contains n rows, the answer has (n!)/(2*(n-2)!)

    if I have a table of 26 letters, the result set should contain 325 rows.

    How to create a user-defined function to generate the combinations of any arbitrary populated table?

    Thanks!

    Frank

  • In general the formula for a combination of M things taken N at a time is:

    C(M,N)=M!/(N!)(M-N)!

     

    For example:

    In numbers, C(5,3) = (5*4*3*2*1)/(3*2*1)(2*1) = 60/6 = 10. These numbers mean that there are 60 ways to order any three of five objects, and for each particular group of three there are six ways to order them; so the number of distinct groups of three out of five objects is 60 divided by 6.  

    Francis

  • Thanks. You are right about the formula. Any ideas how to write it in sql?

  • You will find it difficult to create a UDF for this as there are some limitations in SQL data types. For example, 21! is creater than the largest number a biginit in sql can hold ; 20! = 2432902008176640000.  Also if you use recusion, you are limited to 32 calls so theoretically even if you could store a number bigger than 21! and you used recusion in the function you couldn't calculate anything bigger than 32!  I can probably come up with a function with a bit of thought..  stay tuned. 

    Francis

  • You don't need a function to do this, just a simple self-join.

    SELECT a.Col + b.Col

    FROM Table a JOIN Table b ON a.Col < b.Col



    --Jonathan

  • Thank you very much Jonathan!

  • I missread the question.  I thought you wanted the number, not the actual rows.  That is I thought you wanted to supply the numbers 5, 2 and get the answer 10 and opposed to supply a table with5 rows and get a result set with 10 rows.  This was like one of the QOD that you just don't get even after reading it a couple of times.    Of course now its obvious.

    Francis

  • Thanks fhanlon.

     

    Johnathan, There is still a problem with simple self-join. If the table has duplicates, the total number of return will not match the binomial coefficient sequence by self-join. In my case, aa is legitimate for two rows a and a. Any ideas to count dups and combine them?

     

    Thanks.

    Frank

  • The "problem," then is not with the code but instead with your examples.  How about:

    SELECT a.Col + b.Col

    FROM Table a JOIN Table b ON a.Col <= b.Col



    --Jonathan

  • You will miscount unique rows if you use this script. It can't distinguish unique rows from dups. For example, you have five rows with values a, b, c, d and d. The return should be ten rows as:

     

    ------

    ab

    ac

    bc

    ad

    bd

    cd

    ad

    bd

    cd

    dd

     

     

     

    Using SELECT a.Col + b.Col

    FROM Table a JOIN Table b ON a.Col <= b.Col, the return is

     

    ------

    aa

    ab

    bb

    ac

    bc

    cc

    ad

    bd

    cd

    dd

    dd

    ad

    bd

    cd

    dd

    dd

     

    (16 row(s) affected)

     

    Using SELECT a.Col + b.Col

    FROM Table a JOIN Table b ON a.Col < b.Col, the return is

     

    ------

    ab

    ac

    bc

    ad

    bd

    cd

    ad

    bd

    cd

     

    (9 row(s) affected)

  •   Okay, now I finally know what you're trying to do here.

    In SQL, it's a good idea to have a primary key... 

    create table tab(

    col char)

    insert tab select 'a' union all select 'b' union all select 'c' union all select 'd' union all select 'd'

    SELECT IDENTITY(int) Id, Col

    INTO #Tab

    FROM Tab

    SELECT a.Col + b.Col

    FROM #Tab a JOIN #Tab b ON a.Col <= b.Col and a.Id < b.Id



    --Jonathan

  • Thanks Johnathan.

    In general, the issue is related to a binomial coefficient theory. The return rows should be nCm, read as n choose m. Here is the list of values:

     

    m         n            nCm

    ------- --------- -------

    2          2            1

    3          2            3

    4          2            6

    5          2            10

    6          2            15

    7          2            21

     

    Using you script can’t get correct returns. For example if a table has 6 rows and two pairs of dups, then return rows should be 15 as

     

    col 

    ----

    a

    b

    c

    d

    d

    a

     

    (6 row(s) affected)

     

        

    ----

    ab

    ac

    bc

    ad

    bd

    cd

    ad

    bd

    cd

    dd

    aa

    ab

    ac

    ad

    ad

     

     

    (15 row(s) affected)

    However, your script only returns 11 rows as follows

        

    ----

    ab

    ac

    bc

    ad

    bd

    cd

    ad

    bd

    cd

    dd

    aa

     

    (11 row(s) affected)

     

  • Okay, how about something like this?

    SELECT a.Col + b.Col

    FROM #Tab a JOIN #Tab b ON a.Id <> b.Id AND a.Col <= b.Col AND NOT (a.Col = b.Col AND a.Id > b.Id)



    --Jonathan

  • Thank you very much! You got all right. Appreciate ...

  • SELECT a.Col + b.Col

    FROM Table a JOIN Table b ON a.Col < b.Col

    UNION

    SELECT a.Col + b.Col

    FROM Table a JOIN Table b ON a.Col = b.Col

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply