Counting distinct row numbers

  • At first this seemed simple but I have not been able to figure this out. I need to count the distinct city rows for all the cities offices currently in each state. Every month there are changes as offices close or new offices open. The sale data is aggregated into a flatened table with many columns including state and city. When I:

    SELECT [State]

    ,[City]

    ,COUNT(DISTINCT[City]) AS City_Cnt

    FROM [dbo].[MySalesDataTableAggregatedMonthly]

    WHERE [State] IS NOT NULL AND [City] IS NOT NULL

    GROUP BY [State], [City]

    ORDER BY [State], [City]

    I get as expected:

    State City City_Cnt

    AK anchorage 1

    AK fairbanks 1

    AK juneau 1

    AL anniston 1

    AL birmingham 1

    AL florence 1

    AL gadsen 1

    AL huntsville 1

    AL jasper 1

    AL tuscaloosa 1

    AZ flagstaff 1

    AZ phoenix 1

    AZ prescott 1

    AZ tucson 1

    AZ yuma 1

    .....

    What I need and can't figure out is a table returned that counts the distinct city rows

    State City City_Cnt

    AK anchorage 1

    AK fairbanks 2

    AK juneau 3

    AL anniston 1

    AL birmingham 2

    AL florence 3

    AL gadsen 4

    AL huntsville 5

    AL jasper 6

    AL tuscaloosa 7

    AZ flagstaff 1

    AZ phoenix 2

    AZ prescott 3

    AZ tucson 4

    AZ yuma 5

    .......

  • Looks like you want the number of rows in which (e.g.) Fairbanks appears? Then remove the DISTINCT.

    Roland Alexander 
    The Monday Morning DBA 
    There are two means of refuge from the miseries of life: music and cats. ~ Albert Schweitzer

  • No I need to count each city in each state. Removing distinct just gives the total count of cities. Nice guess.

  • Its not really counting distinct, so what I though you're looking to do is something like

    Select

    State

    ,City

    ROW_NUMBER() OVER(Order by City PARTITION BY State)

    From [dbo].[MySalesDataTableAggregatedMonthly]

    Where State is NOT NULL and City is NOT NULL

    Order by State,City

    I might have got the syntax wrong.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • dataman777 (10/10/2012)


    No I need to count each city in each state. Removing distinct just gives the total count of cities. Nice guess.

    create table dbo.state_city (us_state char(2), city varchar(20));

    insert into dbo.state_city

    values

    ('AK', 'Fairbanks'),

    ('AK', 'Fairbanks'),

    ('AK', 'Fairbanks'),

    ('CA', 'San Francisco'),

    ('CA', 'San Francisco'),

    ('MO', 'St. Louis');

    select

    us_state,

    city,

    count(city)

    from

    dbo.state_city

    group by

    us_state,

    city;

    Results:

    AK Fairbanks 6

    CA San Francisco 4

    MO St. Louis 2

    This is not what you're looking for?

    Roland Alexander 
    The Monday Morning DBA 
    There are two means of refuge from the miseries of life: music and cats. ~ Albert Schweitzer

  • Oops - double inserted those rows 🙂

    But the idea's the same.

    Roland Alexander 
    The Monday Morning DBA 
    There are two means of refuge from the miseries of life: music and cats. ~ Albert Schweitzer

  • Can we have some sample data, as we dont know what it looks like.

    As the requested output looks like a numerical increment based on the position in the set.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • To get the count of cities for each state you need to remove City from the select list.

    SELECT [State]

    ,COUNT(DISTINCT[City]) AS City_Cnt

    FROM [dbo].[MySalesDataTableAggregatedMonthly]

    WHERE [State] IS NOT NULL AND [City] IS NOT NULL

    GROUP BY [State]

    ORDER BY [State]

    This will give you each state and the count of cities in that state.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Jason's got it right. Look at his first post, that's the code you need.

    Roland Alexander 
    The Monday Morning DBA 
    There are two means of refuge from the miseries of life: music and cats. ~ Albert Schweitzer

  • Actually the ROW_NUMBER() was the key. This worked correctly:

    SELECT [State],

    [City],

    ROW_NUMBER() OVER(PARTITION BY [State] ORDER BY [City]) AS [City_Cnt]

    FROM [dbo].[MySalesDataTableAggregatedMonthly]

    WHERE [State] IS NOT NULL AND [City] IS NOT NULL

    GROUP BY [State], [City]

    ORDER BY [State], [City_Cnt]

    Producing the needed result:

    State City City_Cnt

    AK anchorage 1

    AK fairbanks 2

    AK juneau 3

    AL anniston 1

    AL birmingham 2

    AL florence 3

    AL gadsen 4

    AL huntsville 5

    AL jasper 6

    AL tuscaloosa 7

    AZ flagstaff 1

    AZ phoenix 2

    AZ prescott 3

    AZ tucson 4

    AZ yuma 5

    .......

    Thanks for pointing me in the right direction and helping me figure this out!

    http://msdn.microsoft.com/en-us/library/ms186734.aspx

    C. Using ROW_NUMBER() with PARTITION

    The following example uses the PARTITION BY argument to partition the query result set by the column TerritoryName. The ORDER BY clause specified in the OVER clause orders the rows in each partition by the column SalesYTD. The ORDER BY clause in the SELECT statement orders the entire query result set by TerritoryName.

    USE AdventureWorks2012;

    GO

    SELECT FirstName, LastName, TerritoryName, ROUND(SalesYTD,2,1),

    ROW_NUMBER() OVER(PARTITION BY TerritoryName ORDER BY SalesYTD DESC) AS Row

    FROM Sales.vSalesPerson

    WHERE TerritoryName IS NOT NULL AND SalesYTD <> 0

    ORDER BY TerritoryName;

    Here is the result set.

    FirstName LastName TerritoryName SalesYTD Row

    --------- -------------------- ------------------ ------------ ---

    Lynn Tsoflias Australia 1421810.92 1

    José Saraiva Canada 2604540.71 1

    Garrett Vargas Canada 1453719.46 2

    Jillian Carson Central 3189418.36 1

    Ranjit Varkey Chudukatil France 3121616.32 1

    Rachel Valdez Germany 1827066.71 1

    Michael Blythe Northeast 3763178.17 1

    Tete Mensa-Annan Northwest 1576562.19 1

    David Campbell Northwest 1573012.93 2

    Pamela Ansman-Wolfe Northwest 1352577.13 3

    Tsvi Reiter Southeast 2315185.61 1

    Linda Mitchell Southwest 4251368.54 1

    Shu Ito Southwest 2458535.61 2

    Jae Pak United Kingdom 4116871.22 1

  • I always get the Partition and Order by the wrong way round, so even when I get them the right way round I second guess myself and reverse them.

    This isnt a count, its the sequence number of the City within a state based on the order of the city.

    Id like to say its called a Partitioned Sequence but I could be making that up.

    anywho, glad to help.

    PS: In future if you could post sample DDL for the raw data it would help us a understand what you want a little better.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply