Problems with Group BY Clause

  • Hi,

    I'm querying a database for flights to a specific destination, but I want only 1 record for each departure airport to be displayed. As opposed to multiple fares from a departure airport.

    For e.g.

    This is what I want to return:

    Glasgow - Barcelona £...

    Manchester - Barcelona £...

    Birmingham - Barcelona £...

    As opposed to :

    Glasgow - Barcelona £...

    Glasgow - Barcelona £...

    Manchester - Barcelona £...

    Manchester - Barcelona £...

    The group by clause doesn't allow me to be distinct with the departure airports. I have posted my SQL query below, if anyone has any tips, pointers or can point me in the right direction it would be greatly appreciated.

    Many thanks

  • One or more of the columns in your group by has different values for one departure airport. Can't say which one without seeing a sample output from your query.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi Gail,

    Many Thanks

  • Ok, so if we look at your Gatwick - Amsterdam lines. They differ in the DEP_DATE and the RtnDEP_DATE. Those two columns are included in the group by. Hence the group by returns 2 rows, because some of the data you're grouping by is different.

    If you only want 1 line per airport, which dates do you want to see? The highest, lowest, one corresonding to the cheapest price, any one at random?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • It would probably be the one corresponding to the cheapest price, but it would be handy to know how to display any one at random also.

    Thanks again

  • I'll have a go at something for you a bit later. Can you post the schema of the tables please?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi Gail,

    Many Thanks

  • Could you possibly post it as a create table please? I want to have a test table to try the query out on, and it's going to take too long to turn that list into a table def.

    Thanks.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • No problem Gail,

    Thanks

  • How does this look?

    SELECT depAir.Name AS departureAirportName,

    dest.Name AS destination,

    TotalFlights,

    Offers.ARR_AIRPORT,

    Offers.DEP_AIRPORT,

    LowestPrice,

    LowestPrice/NO_ADULTS AS Price,

    Offers.DEP_DATE,

    Offers.RtnDEP_DATE,

    Offers.NO_ADULTS,

    DATEDIFF(d, Offers.DEP_DATE, Offers.RtnDEP_DATE) AS duration

    FROM cached_offers_flights Offers INNER JOIN

    (SELECT DEP_AIRPORT, ARR_AIRPORT, COUNT(*) AS TotalFlights,

    MIN(CAST(STANDARD_PRICE AS NUMERIC(7,2))) AS LowestPrice

    FROM cached_offers_flights GROUP BY DEP_AIRPORT, ARR_AIRPORT) CheapestOffer

    ON Offers.DEP_AIRPORT = CheapestOffer.DEP_AIRPORT AND Offers.Arr_AIRPORT = CheapestOffer.Arr_AIRPORT

    AND Offers.STANDARD_PRICE = CheapestOffer.LowestPrice

    INNER JOIN Airport AS depAir ON Offers.DEP_AIRPORT = depAir.Code

    INNER JOIN Airport AS destAir ON Offers.ARR_AIRPORT = destAir.Code

    INNER JOIN City AS dest ON destAir.fkCity = dest.ID

    WHERE (dest.Name LIKE 'a%')

    AND (Offers.DATETIME > '2008/05/08 12:00:00')

    AND (CAST(Offers.DEP_DATE AS datetime) >= '2008/05/16')

    AND (Offers.Website = 'www.test.com')

    ORDER BY Price

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • The two Stansteds are because they have different destinations. You want only 1 row per departure airport, regardless of where the flight is going?

    I think the 2 rows for Gatwick are because there are 2 entries with same, lowest, price. If that case does happen, do you want to show the earliest one?

    Edit: To confirm, you are using SQL server 2005?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • The two Stansteds are because they have different destinations. You want only 1 row per departure airport, regardless of where the flight is going?

    Yes that would be great.

    I think the 2 rows for Gatwick are because there are 2 entries with same, lowest, price. If that case does happen, do you want to show the earliest one?

    Yes please 🙂

    Thanks again Gail

  • Ok, maybe this one...

    SELECT depAir.Name AS departureAirportName,

    dest.Name AS destination,

    TotalFlights,

    Offers.ARR_AIRPORT,

    Offers.DEP_AIRPORT,

    LowestPrice,

    LowestPrice/NO_ADULTS AS Price,

    Offers.DEP_DATE,

    Offers.RtnDEP_DATE,

    Offers.NO_ADULTS,

    DATEDIFF(d, Offers.DEP_DATE, Offers.RtnDEP_DATE) AS duration

    FROM (SELECT ARR_AIRPORT, cached_offers_flights.DEP_AIRPORT, NO_ADULTS, DEP_DATE,

    RtnDEP_DATE, [DATETIME], Website, TotalFlights, LowestPrice,

    ROW_NUMBER() OVER (PARTITION BY cached_offers_flights.DEP_AIRPORT ORDER BY DEP_DATE) RowNo

    FROM cached_offers_flights INNER JOIN

    (SELECT DEP_AIRPORT, COUNT(*) AS TotalFlights,

    MIN(CAST(STANDARD_PRICE AS NUMERIC(7,2))) AS LowestPrice

    FROM cached_offers_flights GROUP BY DEP_AIRPORT) CheapestOffer

    ON cached_offers_flights.DEP_AIRPORT = CheapestOffer.DEP_AIRPORT AND cached_offers_flights.STANDARD_PRICE = CheapestOffer.LowestPrice) Offers

    INNER JOIN Airport AS depAir ON Offers.DEP_AIRPORT = depAir.Code

    INNER JOIN Airport AS destAir ON Offers.ARR_AIRPORT = destAir.Code

    INNER JOIN City AS dest ON destAir.fkCity = dest.ID

    WHERE (dest.Name LIKE 'a%')

    AND (Offers.DATETIME > '2008/05/08 12:00:00')

    AND (CAST(Offers.DEP_DATE AS datetime) >= '2008/05/16')

    AND (Offers.Website = 'www.test.com')

    AND RowNo = 1

    ORDER BY Price

    If this doesn't do what you want, please give me some sample data for the cached_offers_flights table (in the form of insert statements)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for posting that, gives me the results that I want.

    I've never used the sql snippet below before:

    ROW_NUMBER() OVER PARTITION BY

    so it's gave me something to look into as well...

    Thanks again for your help Gail

    Cheers Mark 😀

  • Hi Gail,

    Unfortunatley the query I thought was working the way I wanted it seems to not pull out all the flights available to the various destinations for each departure airport.

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply