Query taking 6 hrs and counting (HELP!)- how to concatenate rows into single string

  • Here was the second part:

    Sergiy (6/8/2016)


    Declare @basedata ....

    ......

    select email ,

    STUFF((Select ','+project_name

    from @basedata as T1

    where T1.email=T2.email

    FOR XML PATH('')),1,1,'') from @basedata as T2

    GROUP BY email;

    _____________
    Code for TallyGenerator

  • And then was an instruction, right below the second part:

    Then run it altogether.

    Please note - I used GROUP BY instead of DISTINCT.

    You should do it too.

    "run it altogether" means place those 2 parts into 1 batch, one after another, and run it in 1 go.

    _____________
    Code for TallyGenerator

  • Luis Cazares (6/8/2016)


    There's something going on with your server. Taking 16 hours to complete a single query is insane.

    We're assuming this is a table, but if it's a view, that could be another problem. I just run the query on my laptop creating almost 8 million rows and the query completed in one minute.

    We need your help to get more information on where the problem might be. Phil asked for the execution plan and Kevin asked you to run sp_whoisactive. Those are good starts to find the root cause of your problem.

    There is nothing wrong with his server.

    His query completed in 46 seconds too.

    The problem is with DISTINCT.

    It's applied to the set returned by SELECT.

    It means - he's building an XML for all how many millions non-unique email addresses recorded in that "log" table, and then ask the server to return distinct strings.

    With GROUP BY XML's are built only once per each unique email.

    Much less work to do for the server.

    _____________
    Code for TallyGenerator

  • I mentioned that because he stated the following:

    Thanks, that works abit faster but its still very slow. 16hrs and no where near done

    Now, even with the DISTINCT, it shouldn't perform that bad. My bet is that the main problem is not in the code, but something we can't see.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (6/8/2016)


    I mentioned that because he stated the following:

    Thanks, that works abit faster but its still very slow. 16hrs and no where near done

    Now, even with the DISTINCT, it shouldn't perform that bad. My bet is that the main problem is not in the code, but something we can't see.

    Executing 2.6 millions correlated subqueries to build 2.6 millions XML strings?

    It can easily put the server out of memory. And after that everything becomes soooo slooooow.

    And assuming indexing is "not perfect" for this kind of queries, we're are looking on a table scan for each correlated subquery.

    2.6 millions scans against 2.6 millions records each.

    6 hours don't look so bad, really.

    _____________
    Code for TallyGenerator

  • Sergiy (6/8/2016)


    Luis Cazares (6/8/2016)


    I mentioned that because he stated the following:

    Thanks, that works abit faster but its still very slow. 16hrs and no where near done

    Now, even with the DISTINCT, it shouldn't perform that bad. My bet is that the main problem is not in the code, but something we can't see.

    Executing 2.6 millions correlated subqueries to build 2.6 millions XML strings?

    It can easily put the server out of memory. And after that everything becomes soooo slooooow.

    And assuming indexing is "not perfect" for this kind of queries, we're are looking on a table scan for each correlated subquery.

    2.6 millions scans against 2.6 millions records each.

    6 hours don't look so bad, really.

    Look at the original query:

    select distinct email ,

    STUFF((Select ','+project_name

    from dbo.[UMG sent 2016] as T1

    where T1.email=T2.email

    FOR XML PATH('')),1,1,'') from dbo.[UMG sent 2016] as T2;

    There is no distinction of any kind for project_name in the subquery.

    Same name will be added to the string as many times as how many times that project was reported to that email recipient over the year.

    Can you imagine the length of those strings?

    _____________
    Code for TallyGenerator

  • Sergiy (6/8/2016)


    Luis Cazares (6/8/2016)


    I mentioned that because he stated the following:

    Thanks, that works abit faster but its still very slow. 16hrs and no where near done

    Now, even with the DISTINCT, it shouldn't perform that bad. My bet is that the main problem is not in the code, but something we can't see.

    Executing 2.6 millions correlated subqueries to build 2.6 millions XML strings?

    It can easily put the server out of memory. And after that everything becomes soooo slooooow.

    And assuming indexing is "not perfect" for this kind of queries, we're are looking on a table scan for each correlated subquery.

    2.6 millions scans against 2.6 millions records each.

    6 hours don't look so bad, really.

    Except when my 4GB Intel Core i5 laptop would complete the 8 million rows query in 35 minutes. I would agree that sample data and workload aren't representative of the server, but neither should be the capabilities.

    In this same laptop, the GROUP BY option takes 4 minutes (the previous test was on my work laptop) just to give some perspective.

    I'm afraid that the table might be a complex view that would slow things down horribly. Another option is that the table is too wide and there's no proper index, so that causes slow table scans. Or something is taking all the memory from the server and the query won't finish. Or many other options that are just guesses right now.

    I understand your point, but this problem doesn't seem to be just on the code (at least not the code we have seen).

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (6/8/2016)


    but this problem doesn't seem to be just on the code (at least not the code we have seen).

    May be, may be not.

    Hard to tell from here.

    That's why I made up that query populating a table variable.

    It excludes all the text manipulations on denormalised data, and prepares the clean data set for further string concatenations.

    If it would be still slow - yes that would clearly indicate the issues with the server.

    But it took 46 seconds out of badly indexed 2.6 millions rows - well, could be better I must admit, but still acceptable.

    I'm pretty sure that the further string concatenations out of the table variable will take not more than several seconds. Dropping the execution time from 6 hours down to 1 minute without doing anything with the environment settings must be a good hint that the biggest part of the problem was actually in the code.

    _____________
    Code for TallyGenerator

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply