Large Chunks of Data

  • The reason I remembered these incidents is because we used to deal with huge stacks of paper as thick as the one shown in the image of the line printer on the editorial. Those printers sure made a lot of noise, but were lightning fast!

    I happen to sit next to one of these. Maybe I'm picky but after awhile with it buzzing it feels like Chinese water torture :w00t: I remember a colleague once printing hundreds of pages for a report. When I asked him about it he apologized and said he mistakenly printed a 35 page report 30 times :hehe:

    Ken

  • AndrewJacksonZA (9/17/2010)


    tim.kay (9/17/2010)A nice graphic helps - though not too busy.

    Must... resist... must... fight it... must not post...

    You mean like this:

    C'mon, it's Friday! 🙂

    Business Intelligence WIN!

  • The only practical way to report off of 5 million rows is to pull it into an OLAP cube, preferably on a different server or their local PC, and then use something like a pivot table. Most moderately powered servers can handle a request like this occasionally, but you definately don't want the reporting tool to re-query the same 5 million rows again every time the user wants to sort, filter, or pivot the data.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Our reporting tool regularly produces reports over 1,000 pages. The largest we have seen at client sites are 2,400 pages. However, these are not printed reports - that is, most clients use them through our report viewer which allows a user to quickly get to any granular detail they need - and indeed, most need that ability to hunt for one or two lines of detail.

    As well, we separate reports into two blocks; User and Executive. These reports are pretty much identical, but the Executive reports skip all the detail and just show totals and percentages.

    This all said, one thing I have learned about most reports is that as soon as you print them (if you do) they are outdated when you have applications where the source data for the report is changing all the time (as it is with our apps).

    When you are working with massive datasets, the question of "How long can a report be?" has to be contingent on whether or not you are going to print or just view report data. If its just being viewed, something like a subset of much larger data that a user can then query - the answer is infinite I guess.

    I think overall, the days of presuming that the term "report" means "printed report" are long gone. With today's often massive datasets, viewing reports as a well-presented summary of data is often the case. Not to mention one can feel good about saving a few trees in the bargain...

    There's no such thing as dumb questions, only poorly thought-out answers...
  • I think it's important to distinguish between reports which are meant to be looked at as a unit and possibly printed, as opposed to a dump of data that finance is going to use for some auditing purpose. I also have finance customers who demand reports that have a ridiculous number of rows and columns.These get exported to Excel and then they have at them. I spend no time in formatting or making anything look pretty or readable (for all they care the font can be 2 points). For these wide reports I use Crystal with a plotter printer that can essentially be infinitely wide.

  • 5million rows is not a report - that is a hard copy of the table(s).

    Now, if there are calculations on those rows and a summary of those 5 million rows is presented - that sounds like a report.

    Reports should be summaries - not novels.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Old style reports such as many of us are used to in crystal or reporting services etc. generally have no business dealing with 5mm rows.

    HOWEVER, there are many newer bi tools (data visualization tools), one of which we are using, where we regularly deal with sets larger than 5mm rows. The questions people are exploring are generally not sales etc. questions, though. The tools permit users to visually explore datasets in realtime, and are for us a very definite step forward in technology. I am finding that as regards these tools, 5mm rows is sometimes very much on the small side.

  • dld (9/17/2010)


    I think it's important to distinguish between reports which are meant to be looked at as a unit and possibly printed, as opposed to a dump of data that finance is going to use for some auditing purpose. I also have finance customers who demand reports that have a ridiculous number of rows and columns.These get exported to Excel and then they have at them. I spend no time in formatting or making anything look pretty or readable (for all they care the font can be 2 points). For these wide reports I use Crystal with a plotter printer that can essentially be infinitely wide.

    You have a lot of good points, but what do they need hard copies for? I used to work for a company that had huge clients and, while I won't mention the company's name, their report was a few hundred pages long every month and for years we had to send them paper reports. They finally changed to a file, but some companies just don't trust computer files yet.

    Also, if the files are this big and a hard copy isn't needed, I don't believe that Crystal or SSRS should be involved. In a data dump situation, it would be best to create a SP that just spits out the data into a file.

    The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge. - Stephen Hawking

  • Reports can have different purposes, but one thing I haven't seen in the replies is the use of visualization to detect patterns and outliers. This is at the heart of much preliminary statistical analysis. Our eyes can quickly detect complex patterns in multivariate data if it presented in a visual way. Of course, the row- and column- summaries will always have a place. Two good modern references are

    Cleveland, W.S. 1993. Visualizing Data. Hobart Press.

    Tufte, E.R. 1983. The Visual Display of Quantitative Information. Graphics Press.

  • There probably is a better tool than Crystal, but we use that simply because it interfaces nicely with our ERP software and we know how to use it. I must admit, I don't know what an SP is.

  • dld (9/17/2010)


    There probably is a better tool than Crystal, but we use that simply because it interfaces nicely with our ERP software and we know how to use it. I must admit, I don't know what an SP is.

    Sorry, SP - abbreviation for SQL Server Stored Procedure.

    The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge. - Stephen Hawking

  • Also, remember that there are legal requirements for may reports that have to be printed and kept in hard copy for years. Ask any pharmacist or payroll accountant...

  • @AndrewJacksonZA - I don't know what is more disturbing, just seeing the name "Rick Astley" :sick:, or that you know the words? 🙂

    Anyway, this is an old problem. When faced with these requests I always ask, "what will 50,000+ row, 50+ column report tell you?" That is just too much data to be useful, not to mention it is difficult to display on a screen or print, if not impossible.

    There is a difference between a report and an export. Sometimes, users just want to bring the 50,000+ rows into Excel and use it there. That is great. Excel is a powerful BI tool. I believe in pushing the data out to the people who need it, but this is an export and does not need to be an on demand query to that drags down the db and the network to deliver a "report" that is meaningless.

    Reports should aggregate many data points to fewer meaningful data points. In other words, reports need to tell you something with as little data as possible. Aggregate reports should also not contain little detail data, if any at all. As Steve points out, "that's why we have drill-down." Often I find users wanting the "Report that tells you everything."

    As a DBA, if find it it useful to ask more questions about what the user is trying to accomplish with the report. Sometimes giving them what they need, rather than what they want, produces a better report.

  • We have the same situtation with a large reporting database and we have a large subscription service that typically sends out large data files to user. This happens for two reasons: one the system is not giving the users what they need. Two the users do not always know what they need but want the data readily available in a format that allows them to easily manuipulate it - usually Excel. I believe both of them can be solved by IT folks making the time to sit with the users and try to uncover what it is they want to standardize some of their reporting needs and build it into their systems. However, since I have a finance background I can say there are two problems with that. One is that reason people don't know what they need is because the nature of the business world is always changing and what is in focus on any given Monday morning may change and be entirely different the following week. That means a system has to be flexible enough to give the users what they need in the format they need it and that usually involves some type of report generator that the users can build their own reports, SSRS or some other vendors product. There are issues with that too as someone pointed out with regards to inexperienced users creating inefficient reports. But if that is a problem then educate them or do it for them. Doing it for them means that if a manager comes to a financial analyst on Monday morning and wants information about something that the reporting system or database cannot provide easily IT has to be responsive as most senior managers don't want to wait and don't want to hear excuses aboutt why it is taking so long to get the information. So you end up with users wanting large amounts of data at their fingertips for those types of situations. The problem is not the users but how businesses are run and the expectations of managers. Many folks in IT have to manage the system to keep them running efficiently and it is not an easy task but expecting users to manage down their expectations or adjust their needs to suit IT is wrong. Many people have the word analyst in their job title or job description. IT folks need to spend more time understanding the needs of the business and how it works and working with the users to provide what they need. It isn't easy but the users don't exist to give purpose to the lives and work of IT folks. Perhaps it is time to sit down and figure out what their real information needs are and find a way to be responsive when those needs change.

    Steve

  • Since we seem to have a consensus, let me try to take this a different direction (sorry Steve)

    All of you who are using online tools instead of printed reports, which tool(s) are your favorite?

    My holy grail would combine pivot tables with drill down in a Web front end to our SQL data. I'd be able to create a flat view on data, open it up to users, let them pivot, query and click as they wished. It would support row-level security, perhaps at the view level, so users would only see their "world" in the report. It would let users pull exports of subsets of data offline as they wish. It would take me small amounts of effort to create the views for users. And it would cost in the $10K range... not $100K.

    I haven't found it yet. I've played with SharePoint and Report Server, and Excel front end to OLAP, but they don't quite do the job. Tableau seems awfully close but I haven't had the chance to install and try to make it work. What else should I be looking at?

    PS: User configurable Dashboards would be nice too. Maybe I need two products?

Viewing 15 posts - 16 through 30 (of 49 total)

You must be logged in to reply to this topic. Login to reply