How to combine two rows as one row for transactional Data

  • Hi,

    Because of the application issue, we are seeing partial duplicate row, how to combine duplicate row to original row.

    Ex
    Transactional Sales Table:

    Create table Sales
    (
    Id int identity (1,1),

    OrderID int,

    ProductName varchar(50),

    Date datetime default getdate(),

    SalesAmount int
    )

    Sample Data:

    Insert into dbo.Sales (OrderId,ProductName,SalesAmount)

    select 1001,'Computer',50 union all

    select 1002, 'Computer',0.1 union all    ----- It should have been 50.1 for item 1001 

    select 1003 ,'KeyBoard', 15 union all

    Select 1004 , 'Mouse', 8 union all

    Select 1005,'Computer',80 union all

    Select 1006,'computer',0.5 ----- It should have been 80.5 for item 1005 

    select * from Sales

    -- Cross check 

    select ProductName,Count(ProductName) as NumberofItemsSold,Sum(SalesAmount) TotalSales FROM sales
    GROUP BY ProductName

    From the above query the company had sold only two computers, but there was issue with front end systems,  the fraaction value of salesamount for computer product consider as two sales instead of one sale item, so Ideally we are looking to cleanse the data as follows 

    1001,'Computer',50.1 
    1003 ,'KeyBoard', 15 
     1004 , 'Mouse', 8 
     1005,'Computer',80.5 

    How do I combine rows to achieve the above output, It would be great if someone had dealt with this kind of issues.

    Many Thanks

  • How do you tell that the value is duplicated, is it only is the following ID has a decimal value between >0 and <1?

    If so, you declare you column, SalesAmount, as an INT and then insert a decimal value. This will result in the value 0. I this correct? If so, where is the actual decimal value stored? If not, please update your statement and check it works as you expect.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Assuming that all rows >0 and <1 are offending rows, then this should work. Note, I do not update the OrderID, as generally reseeding your IDs is a bad idea. It also assumes that the Sales amount is a decimal, not an int:
    WITH NextRow AS (
      SELECT *,
       LEAD(SalesAmount) OVER (PARTITION BY ProductName ORDER BY OrderID) AS NextSalesAmount
      FROM Sales)
    UPDATE Sales
    SET SalesAmount = NR.SalesAmount + NR.NextSalesAmount
    FROM NextRow NR
    WHERE NR.NextSalesAmount > 0 AND NR.NextSalesAmount < 1
    AND NR.OrderID = Sales.OrderID;
    GO
    SELECT *
    FROM Sales;
    GO
    DELETE FROM Sales
    WHERE SalesAmount > 0 AND SalesAmount < 1;
    GO
    SELECT *
    FROM Sales;

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • sorry the table creation script is bit wrong, it should have been float instead of int.
    Create table Sales
    (
    Id int identity (1,1),

    OrderID int,

    ProductName varchar(50),

    Date datetime default getdate(),

    SalesAmount float
    )

    We identify its is  duplicate when there the value is between 0 and 1 and also using lag function whether the previous row product name is same as current as current row. 

    With following query I could able to identify which rows are duplicate or not

    Query to Identify Duplicates;
    select case when SCAS_DW.dbo.Sales.SalesAmount between 0 and 1 and lag(ProductName,1) over (order by ORDERid) = ProductName then 'Duplicate' else 'Not Duplicate' End as Validation
    ,*
    from Sales

    But I am struggling how to join two rows as one row sum sales amount and ignore the row of duplicate.

  • Thom A - Monday, July 17, 2017 4:00 AM

    Assuming that all rows >0 and <1 are offending rows, then this should work. Note, I do not update the OrderID, as generally reseeding your IDs is a bad idea. It also assumes that the Sales amount is a decimal, not an int:
    WITH NextRow AS (
      SELECT *,
       LEAD(SalesAmount) OVER (PARTITION BY ProductName ORDER BY OrderID) AS NextSalesAmount
      FROM Sales)
    UPDATE Sales
    SET SalesAmount = NR.SalesAmount + NR.NextSalesAmount
    FROM NextRow NR
    WHERE NR.NextSalesAmount > 0 AND NR.NextSalesAmount < 1
    AND NR.OrderID = Sales.OrderID;
    GO
    SELECT *
    FROM Sales;
    GO
    DELETE FROM Sales
    WHERE SalesAmount > 0 AND SalesAmount < 1;
    GO
    SELECT *
    FROM Sales;

    you are star

  • Thom A - Monday, July 17, 2017 4:00 AM

    Assuming that all rows >0 and <1 are offending rows, then this should work. Note, I do not update the OrderID, as generally reseeding your IDs is a bad idea. It also assumes that the Sales amount is a decimal, not an int:
    WITH NextRow AS (
      SELECT *,
       LEAD(SalesAmount) OVER (PARTITION BY ProductName ORDER BY OrderID) AS NextSalesAmount
      FROM Sales)
    UPDATE Sales
    SET SalesAmount = NR.SalesAmount + NR.NextSalesAmount
    FROM NextRow NR
    WHERE NR.NextSalesAmount > 0 AND NR.NextSalesAmount < 1
    AND NR.OrderID = Sales.OrderID;
    GO
    SELECT *
    FROM Sales;
    GO
    DELETE FROM Sales
    WHERE SalesAmount > 0 AND SalesAmount < 1;
    GO
    SELECT *
    FROM Sales;

    Hi Thom,

    I am glad for your post, If we want the same output using above table as view instead of updating and deleting actual table, could you refine you query and provide the answer, Before I post it now, I have tried using CTE's and subqueries but I am not able to achieve it. Could you please help me here. 

    Many Thanks

  • Sangeeth878787 - Monday, July 17, 2017 6:46 AM

    Hi Thom,

    I am glad for your post, If we want the same output using above table as view instead of updating and deleting actual table, could you refine you query and provide the answer, Before I post it now, I have tried using CTE's and subqueries but I am not able to achieve it. Could you please help me here. 

    Many Thanks

    One method (excuse my paste from VS Code):

    WITH NextRow AS (
      SELECT *,
       LEAD(SalesAmount) OVER (PARTITION BY ProductName ORDER BY OrderID) AS NextSalesAmount
      FROM Sales)
    --UPDATE Sales
    --SET SalesAmount = NR.SalesAmount + NR.NextSalesAmount
    SELECT NR.ID, NR.OrderID,
       NR.ProductName, NR.[Date],
       NR.SalesAmount + CASE WHEN NR.NextSalesAmount > 0 AND NR.NextSalesAmount < 1 THEN NR.NextSalesAmount ELSE 0.00 END AS SalesAmount
    FROM NextRow NR
    WHERE NOT (NR.SalesAmount > 0 AND NR.SalesAmount < 1)
    ORDER BY OrderID ASC;

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply