Distinct Count for Data Mart

  • note: I posted this on the DW forum, but it seems like that forums is not that popular.. Im posting here trying to get an advice from the Experts.. Thank you

    I have a process in SQL 2005 that runs every night. This process populates my dimensional database (4 fact tables and 9 dimensional tables). Also, this process does all the aggregates that I will need for reporting. I decided to do it on SQL because I’m faster coding this way that in SSIS (for now).

    I found a “bug” in my process where I’m not sure how to fix it. One of my Calculations during the process is to get the distinct count of a product during the entire period of time where the product was present. Other words, if my HD Box was present (or running) during the period between April 1st to April 30th I should have a distinct count of 1 time, even if the HD Box run every day. But, the dimensional database has a dimensional date.

    My fact Table will have a total of distinct count by day. If my HD Box runs every day it will count a distinct by day. When I run the report to get the sum of the distinct count by the period of time given; I will get count duplicates because my HD Box was count by day.

    Is there anyone can advice me about this kind of matter?

    Thank you in advance

  • I think I am following you but to be sure would the count ever be more than 1? if so can you give an example of when it woul dnot be 1. It seems to me that if you are only checking for a count over a period of time and counting if it showed up during that time then the count would always be 1. so I think I am missing something in the example.

    Dan

    If only I could snap my figures and have all the correct indexes apear and the buffer clean and.... Start day dream here.

  • You are right. I'm missing one part of the problem. Each HD Box can be in different category. so the daily process "group by" de boxes in different categories., so a box can repeat in different group.. But in a daily basis the number is no the same if I do the same for a period of time.. am I clear ?

  • so would it be fair to say you are looking for the number of groups a box is in during a time period?

    Dan

    If only I could snap my figures and have all the correct indexes apear and the buffer clean and.... Start day dream here.

  • Exactly.. how can I do it in daily process to populate my fact table..

  • one last question are you updating the fact table from a fact table or are you pulling the data from a SQL database and then inserting into the data wharehouse. If it is the latter then I can provide hopefully a SQL script that would do what you need.

    Dan

    If only I could snap my figures and have all the correct indexes apear and the buffer clean and.... Start day dream here.

  • Im pulling my data from SQL database inserting into the data warehouse. I update if the total change if i process the same day.

    Thank you 🙂

  • Assuming I am not over simplifing what you are looking for and assuming I have mocked up data that is a representation of what you are looking for then this should work for you.

    declare @data table

    (boxID varchar(2),GroupID varchar(1), date datetime)

    insert into @data (boxID,GroupID,date) values('A1',1,'06/01/10')

    insert into @data (boxID,GroupID,date) values('A2',1,'06/01/10')

    insert into @data (boxID,GroupID,date) values('A1',2,'06/01/10')

    insert into @data (boxID,GroupID,date) values('A2',2,'06/01/10')

    insert into @data (boxID,GroupID,date) values('A1',3,'06/01/10')

    insert into @data (boxID,GroupID,date) values('A2',3,'06/01/10')

    insert into @data (boxID,GroupID,date) values('A1',1,'06/02/10')

    insert into @data (boxID,GroupID,date) values('A2',1,'06/02/10')

    insert into @data (boxID,GroupID,date) values('A1',2,'06/02/10')

    insert into @data (boxID,GroupID,date) values('A2',2,'06/02/10')

    insert into @data (boxID,GroupID,date) values('A1',3,'06/02/10')

    insert into @data (boxID,GroupID,date) values('A2',3,'06/02/10')

    select t.boxID, count(t.boxID) from

    (Select boxID,groupID from @data

    group by boxid,groupID) t

    group by t.boxID

    Dan

    If only I could snap my figures and have all the correct indexes apear and the buffer clean and.... Start day dream here.

  • My problem is that in my FactTable I dont have the BoxId, because there are million of boxes present for category. In my Fact Table I have my categories or Keys like:

    DateKey

    ClientKey

    ZoneKey

    ChannelKey

    Total_DistinctCount

    so my daily process populate the Total_DistinctCount by the Keys the Fact Table has.

    :crying:

  • Sorry I'm confused. What would you be storing in the fact table? If you do not have a boxID then how are you storing the fact? do you just need a total count of all boxes? that coul be produced with a simple modification to the query in my example but I am not sure what you need.

    Dan

    If only I could snap my figures and have all the correct indexes apear and the buffer clean and.... Start day dream here.

  • Yes, in my Fact Table I only storage totals by category (or Keys).. My Report is coming from the FactTable. am I clear?

  • So you have a fact table with a set of totals by day. you want to get a count over time based on that data. So using my example you would have a count of 6 for each day. If I understnad you correctly you want to get a count across a time period from that information but since server a could be in each total that would not be accurrate across a time range.

    If this is correct then I would have to say you can't. Each total could store any number servers and combinations of ID's and you would not be able to determine an accurrate information without storing the box id as part of the fact.

    Dan

    If only I could snap my figures and have all the correct indexes apear and the buffer clean and.... Start day dream here.

  • that what I thought ! :crying:

    I was trying to avoid to storage the BoxId becasue I will have millions of rows..

    thank you tho 🙂

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply