Query help - summing between dates ranges and calculations...

  • Hi all,

    Looking for assistance in a SQL query to use in a report.

    Finance people need to show “price variance” to our group owners and are currently spending a lot of time in Excel. I’m thinking/hoping I can get the data via a single SQL query.

    I can get the data out of only three tables -Customers, Invoices and Invoiceitems - but it’s the grouping and summing I’m not sure how to do.

    The output would be:

    Customer Number. From “Customers”

    Customer Name. From “Customers” and will join to “Invoices”

    Part Number. From “Invoiceditems” - which can join to "Invoices" - as is all of the data below…

    “Last year’s Price.” The sum of the qty * price within a date range

    “This year’s price” The sum of the qty * price within a different date range

    Qty shipped last year. Total qty of the part shipped grouped by customer within a date range

    Qty shipped this year. Total qty of the part shipped grouped by customer within a date range

    “A value for last year.” Last year’s price / by total qty within a date range

    “A Value for this year.” This year’s price / by total qty within a date range

    Example Output.

    Last year customer “A” bought part “ABC” 100 times on ten different orders and paid two different prices.

    This year customer “A” bought part “ABC” 75 times on 20 different orders and paid four different prices.

    I need it for ALL customers though. A nine column output for maybe a thousand customers and around 500 separate part numbers.

    Now, as I think about this and I have already discussed with colleagues, I see that the questions(s) are, er, not the ones necessarily I would want to ask. However, he who pays the piper, etc.

    Apologies in advance if I've put this in the wrong place and/or gone into too much detail.

  • This should work.

    Note that the code is untested since you did not provide actual table definitions and test data, but the principle should work.

    ;with

    thisYear as (

    -- Get summarized values for this year

    select

    c.CustomerID,

    c.CustomerName,

    ii.PartNumber,

    sum(ii.qty*ii.price) as SumPrice,

    sum(ii.qty) as SumQty,

    sum(ii.qty*ii.price) / sum(ii.qty) as SumValue

    from Customers c

    join Invoices i on i.CustomerID = c.CustomerID

    join InvoicedItems ii on ii.InvoiceID = i.InvoiceID

    where i.InvoiceDate between @StartDate and @EndDate

    group by CustomerID, CustomerName, PartNumber

    )

    , lastYear as (

    -- Get summarized values for last year

    select

    c.CustomerID,

    c.CustomerName,

    ii.PartNumber,

    sum(ii.qty*ii.price) as SumPrice,

    sum(ii.qty) as SumQty,

    sum(ii.qty*ii.price) / sum(ii.qty) as SumValue

    from Customers c

    join Invoices i on i.CustomerID = c.CustomerID

    join InvoicedItems ii on ii.InvoiceID = i.InvoiceID

    where i.InvoiceDate between dateadd(year, -1, @StartDate) and dateadd(year, -1, @EndDate)

    group by CustomerID, CustomerName, PartNumber

    )

    -- Join this year and last year to present corresponding values on the same line

    select

    coalesce(ty.CustomerID, ly.CustomerID) as CustomerID,

    coalesce(ty.CustomerName, ly.CustomerName) as CustomerName,

    coalesce(ty.PartNumber, ly.PartNumber) as PartNumber,

    ly.SumPrice as LastYearPrice,

    ty.SumPrice as ThisYearPrice,

    ly.SumQty as LastYearQty,

    ty.SumQty as ThisYearQty,

    ly.SumValue as LastYearValue,

    ty.SumValue as ThisYearValue

    from thisYear ty

    full join lastYear ly

    on ly.CustomerID = ty.CustomerID and ly.PartNumber = ty.PartNumber

    order by 1,2,3

  • Wow, Stefan, that was quick!

    Huge thanks for taking the time.

    I'll be looking at it over the next few days and will definitely report back.

    Thanks again. 🙂

  • Here's my version of it. I delayed the lookup of CustomerName to avoid having to GROUP on it -- GROUP BY is an especially expensive operation on varchar columns.

    I also get both years in a single SELECT to (try to) avoid double-reading of the tables.

    Finally, I added some code to get the distinct counts you mentioned in your original q.

    SELECT

    derived.customerId AS Customer_Id,

    c.name AS Customer_Name,

    derived.Last_Yr_Price,

    derived.Last_Yr_Qty_Shipped,

    derived.Last_Yr_Price / derived.Last_Yr_Qty_Shipped AS Last_Yr_Value,

    derived.Last_Yr_Times_Bought,

    derived.Last_Yr_Diff_Prices_Paid,

    derived.This_Yr_Price,

    derived.This_Yr_Qty_Shipped,

    derived.This_Yr_Price / derived.This_Yr_Qty_Shipped AS This_Yr_Value,

    derived.This_Yr_Times_Bought,

    derived.This_Yr_Diff_Prices_Paid

    FROM (

    SELECT

    i.customerId,

    ii.partNumber,

    SUM(CASE WHEN i.invoiceDate < @StartDate THEN ii.qty * ii.price ELSE 0 END) AS Last_Yr_Price,

    SUM(CASE WHEN i.invoiceDate < @StartDate THEN ii.qty ELSE 0 END) AS Last_Yr_Qty_Shipped,

    COUNT(DISTINCT CASE WHEN i.invoiceDate < @StartDate THEN i.invoiceId END) AS Last_Yr_Times_Bought,

    COUNT(DISTINCT CASE WHEN i.invoiceDate < @StartDate THEN ii.price END) AS Last_Yr_Diff_Prices_Paid,

    SUM(CASE WHEN i.invoiceDate >= @StartDate THEN ii.qty * ii.price ELSE 0 END) AS This_Yr_Price,

    SUM(CASE WHEN i.invoiceDate >= @StartDate THEN ii.qty ELSE 0 END) AS This_Yr_Qty_Shipped,

    COUNT(DISTINCT CASE WHEN i.invoiceDate >= @StartDate THEN i.invoiceId END) AS This_Yr_Times_Bought,

    COUNT(DISTINCT CASE WHEN i.invoiceDate >= @StartDate THEN ii.price END) AS This_Yr_Diff_Prices_Paid

    FROM dbo.Invoices i

    INNER JOIN dbo.InvoicedItems ii ON

    ii.invoiceId = i.invoiceId

    WHERE

    i.invoiceDate >= DATEADD(YEAR, -1, @StartDate) AND

    i.invoiceDate < DATEADD(DAY, 1, @EndDate)

    GROUP BY

    i.customerId,

    ii.partNumber

    ) AS derived

    INNER JOIN dbo.Customers c ON

    c.customerId = derived.customerId

    ORDER BY

    derived.customerId

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • I delayed the lookup of CustomerName to avoid having to GROUP on it -- GROUP BY is an especially expensive operation on varchar columns.

    OK, I suppose this could be a valid point. I dont think it makes much difference in this case though.

    I also get both years in a single SELECT to (try to) avoid double-reading of the tables.

    Unfortunately your code will read all data for all invoices from 1 year before @startdate to @enddate. This could be a big problem if you for example are looking at a single month.

    Assuming appropriate indexes my code will only look at invoices for the interesting time period this year and previous year.

    This could be a big performance win.

    Your code will also not work correctly when calculating values for the previous year.

    It is not enough to say WHEN i.invoiceDate < @StartDate.

    You should instead say WHEN i.invoiceDate < dateadd(year, -1, @EndDate)

    Your code will also produce a lot of lines with 0 values for products that where bought in the interval between the period this year, and the period previous year.

    In summary, I think my version will work a lot better in practice 🙂

  • Read the tables only once. Shamelessly nicked from both Scott's and Stefan's code:

    WITH Preagg AS (

    SELECT

    x.Period,

    c.CustomerID,

    c.CustomerName,

    ii.PartNumber,

    SumPrice = SUM(ii.qty*ii.price),

    SumQty = SUM(ii.qty),

    SumValue = SUM(ii.qty*ii.price) / SUM(ii.qty)

    FROM Customers c

    INNER JOIN Invoices i ON i.CustomerID = c.CustomerID

    INNER JOIN InvoicedItems ii ON ii.InvoiceID = i.InvoiceID

    CROSS APPLY (

    SELECT Period = CASE

    WHEN i.InvoiceDate BETWEEN @StartDate AND @EndDate THEN 'ThisYear'

    WHEN i.InvoiceDate BETWEEN

    dateadd(year, -1, @StartDate) AND dateadd(year, -1, @EndDate) THEN 'LastYear'

    ELSE NULL END

    ) x

    WHERE x.Period IS NOT NULL

    GROUP BY x.Period, CustomerID, CustomerName, PartNumber

    )

    SELECT CustomerID, CustomerName, PartNumber,

    LastYearPrice = SUM(CASE WHEN Period = 'LastYear' THEN SumPrice ELSE 0 END),

    ThisYearPrice = SUM(CASE WHEN Period = 'ThisYear' THEN SumPrice ELSE 0 END),

    LastYearQty = SUM(CASE WHEN Period = 'LastYear' THEN SumQty ELSE 0 END),

    ThisYearQty = SUM(CASE WHEN Period = 'ThisYear' THEN SumQty ELSE 0 END),

    LastYearValue = SUM(CASE WHEN Period = 'LastYear' THEN SumValue ELSE 0 END),

    ThisYearValue = SUM(CASE WHEN Period = 'ThisYear' THEN SumValue ELSE 0 END)

    FROM Preagg

    GROUP BY CustomerID, CustomerName, PartNumber

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Guys,

    Thanks for the continuing input. Really interesting.

    Stefan,

    Still at the testing stage, but using your script I now have a working solution.

    The "odd" column names are determined by our group owners.

    And my specific dates are for the testing phase. Once I move it into a report writer I'll add date parameters.

    -----------------------------------------

    ;with

    thisYear as (

    -- Get summarized values for this year

    select

    customers.id,

    customers.name,

    invoiceitems.partid,

    sum(invoiceitems.quantity * invoiceitems.currencyunitprice) as SumPrice,

    sum(invoiceitems.quantity) as SumQty,

    sum(invoiceitems.quantity * invoiceitems.currencyunitprice) / sum(invoiceitems.quantity) as SumValue

    from customers

    LEFT OUTER JOIN invoices ON

    customers.id = invoices.traderid

    LEFT OUTER JOIN invoiceitems ON

    invoices.id = invoiceitems.invoiceid

    WHERE invoices.tradertype = 'C'

    AND invoiceitems.partid NOT LIKE 'C%'

    AND invoices.id LIKE 'SI%'

    AND invoices.createddate BETWEEN '01 August 2011' AND '31 July 2012'

    Group By

    customers.id, customers.name, invoiceitems.partid

    )

    , lastYear as (

    -- Get summarized values for last year

    select

    customers.id,

    customers.name,

    invoiceitems.partid,

    sum(invoiceitems.quantity * invoiceitems.currencyunitprice) as SumPrice,

    sum(invoiceitems.quantity) as SumQty,

    sum(invoiceitems.quantity * invoiceitems.currencyunitprice) / sum(invoiceitems.quantity) as SumValue

    from customers

    LEFT OUTER JOIN invoices ON

    customers.id = invoices.traderid

    LEFT OUTER JOIN invoiceitems ON

    invoices.id = invoiceitems.invoiceid

    WHERE invoices.tradertype = 'C'

    AND invoiceitems.partid NOT LIKE 'C%'

    AND invoices.id LIKE 'SI%'

    AND invoices.createddate BETWEEN '01 August 2012' AND '31 July 2013'

    Group By

    customers.id, customers.name, invoiceitems.partid

    )

    -- Join this year and last year to present corresponding values on the same line

    select

    coalesce(ty.id, ly.id) as CustomerID,

    coalesce(ty.name, ly.name) as CustomerName,

    coalesce(ty.partid, ly.partid) as PartNumber,

    ly.SumPrice as PYTD_Ext_Price,

    ty.SumPrice as CYTD_Ext_Price,

    ly.SumQty as PYTD_Qty,

    ty.SumQty as CYTD_Qty,

    ly.SumValue as ASP_PYTD,

    ty.SumValue as ASP_CYTD

    from thisYear ty

    full join lastYear ly

    on ly.id = ty.id and ly.partid = ty.partid

    order by 1,2,3

  • Stefan_G (8/9/2013)

    I also get both years in a single SELECT to (try to) avoid double-reading of the tables.

    Unfortunately your code will read all data for all invoices from 1 year before @startdate to @enddate. This could be a big problem if you for example are looking at a single month.

    Assuming appropriate indexes my code will only look at invoices for the interesting time period this year and previous year.

    It is not enough to say WHEN i.invoiceDate < @StartDate.

    Easy enough to change the WHERE on the date to an OR with the two specific date ranges.

    If the @StartDate is the start of the current period -- which seemed to be what your code assumed -- why isn't that check enough to determine if the date is for the prior year or the current year??

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • Easy enough to change the WHERE on the date to an OR with the two specific date ranges.

    If the @StartDate is the start of the current period -- which seemed to be what your code assumed -- why isn't that check enough to determine if the date is for the prior year or the current year??

    Well, your check WOULD have been enough if you had another WHERE condition so you only aggregated data from the interesting periods - not data between the period last year and this year. But the way your code was written, any data before the starting period this year would have been aggregated in the value for last year.

    After thinking about this a little more, I now actually think that your solution with a modified WHERE condition would be very good.

    (i.InvoiceDate >= @StartDate and i.InvoiceDate < @EndDate)

    or

    (i.InvoiceDate >= dateadd(year, -1, @StartDate) and i.InvoiceDate < dateadd(year, -1, @EndDate))

    Assuming there is an index on InvoiceDate, SQL server will translate a WHERE condition like this to two index seeks which is exactly what I was looking for with my solution.

  • Stefan_G (8/9/2013)


    Easy enough to change the WHERE on the date to an OR with the two specific date ranges.

    If the @StartDate is the start of the current period -- which seemed to be what your code assumed -- why isn't that check enough to determine if the date is for the prior year or the current year??

    Well, your check WOULD have been enough if you had another WHERE condition so you only aggregated data from the interesting periods - not data between the period last year and this year. But the way your code was written, any data before the starting period this year would have been aggregated in the value for last year.

    After thinking about this a little more, I now actually think that your solution with a modified WHERE condition would be very good.

    (i.InvoiceDate >= @StartDate and i.InvoiceDate < @EndDate)

    or

    (i.InvoiceDate >= dateadd(year, -1, @StartDate) and i.InvoiceDate < dateadd(year, -1, @EndDate))

    Assuming there is an index on InvoiceDate, SQL server will translate a WHERE condition like this to two index seeks which is exactly what I was looking for with my solution.

    I wouldn't expect the Invoices table to be clustered on InvoiceDate; if it's not, SQL may well can the table anyway unless there's a covering index.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply