Checking data from long query

  • Hi,

    I'm pretty newbies for SQL Querying, would like to know, when we have a very long SQL query with multiple table, and then we have some data missing which is expected to shown but not, how to check this ? I mean how to break this very long query but all inter-connected each other.

    I have a very long and complex query, by using SQL Formatter, I would have almost 900 lines (if I copied the query to Excel, just to see it more clear). It is contain many tables so I'm wondering how to check all this when I have some problem with the data. Like currently I have missing data, which in the main table (the primary table) it is exists, but with this very long query with join to many tables, the data is not coming out.

    And what is the technique or the activity of looking at this problem ? cause I want to take a look whether there is a specific courses about this only. If I search courses online, about SQL  or data analyst, typically I get the full course, from beginning like how to create table, how to join, sub querying, filtering, aggregates, stuff like that, which I know it a bit much. What I don't know is how to handle an issue when there is no data come out or the result is wrong from a very complex query. How to get the exact place within the long query, is the main problem that my data is not come out or wrong.

    Any advice is much appreciated. Also about what and where to look for such courses maybe in Udemy or any such online training.

    Thanks in advance.

     

     

     

     

     

     

     

  • Be methodical. Understand the relationships and data.

    Look for obvious contradictions in the where clauses -- where StartDate > '2021-08-19' AND StartDate < '2021-08-01' will filter out everything.

    Then look for typos -- mis-assignment of variables or parameters, misspelling of string constants or variables, use of > when it should have been <, etc. Look for use of wrong table. Look for use of wrong columns, particularly if there are many similarly-named columns. Invoice.OrderDate > Invoice.OrderDate will filter out everything -- Was it supposed to be Invoice.OrderDate > Product.AvailabilityDate?

    If there aren't obvious bugs, I recommend walking through the query one join and one where clause at a time. Inner join, cross join, and cross apply act like where clauses -- if the foreign key isn't in the joined table, the corresponding row in the other table won't be returned. Where clauses, obviously, can filter out data.

    I frequently start by commenting out pieces of the query.

    • Comment out the select list and select all (or preferably, if table is very big, select top N where value you're looking for exists)
    • Comment out joins
    • Comment out where clauses (maybe specifically leaving or adding where clause for the value you're looking for in the "main" table.
    • Familiarize yourself with the "main" table.
    • Run the query. Does it return the missing row?  If not, data isn't there, period.
    • Uncomment a join. Familiarize yourself with the  joined table. Does it return the missing row?  If not, that's the problem. If so, uncomment the next and repeat.
    • Uncomment first where clause. Does it return the missing row?  If not, that's the problem.  If so, uncomment the next and repeat.
  • Hmm.. yes for the typo error, I'm guessing no, because the query is came from my ERP, so it is being made by its UI then in the background I will get the underlying query. But I will take a look on that.

    About commenting is the part where I found difficulty, since it is connecting each other, with some of them actually same tables but with different aliases.

    There are also some sub query that has its own where clause, and as mentioned the where condition is linked or refer to other part of the query. That is why I wonder where to begin to commenting, break apart this giant query.

    Run the query just fine, in terms of there are some data (row) come out, but there are some were missing as well.

    By looking at your method, I supposed there is no other technique or some helper tools, then.

    Thanks

     

     

     

     

     

     

  • I use very similar methodology to what ratbak described. 9 times out of 10 (probably more) the reason the data is missing in my line of work is because it's being filtered out by an INNER JOIN, and the reason that JOIN is returning no rows is because someone forgot to fill something is that is "mandatory". I normally start by doing something along the lines of:

    1. Comment out the entire SELECT and put in a small one returning maybe 1 or 2 columns, such as the "main" Primary Key.
    2. Comment out everything in the FROM apart from the first table, and perhaps a couple others (if i know that they're definitely not going to affect the result)
    3. Comment out the entirety of the WHERE and add a clause specifically for the row I am looking for.
    4. Run the query, made sure the row returns (if not you've likely not commented out enough, or you have found your problem).
    5. Start un-commenting, one by one, clauses in the FROM and running the query again. Specifically you want to pay attention to INNER JOINs and CROSS APPLYs as these filter the data. If the row "disappears" after you un-comment a JOIN/CROSS APPLY then inspect the ON/WHERE for that one, as it's filtering out the row. Perhaps, like me, mandatory data is missing and needs to be completed.
    6. If you end up un-commenting all the clauses in the FROM, start with the clauses in the WHERE. Eventually the row will disappear, and you have your candidate.

    This can be quite lengthy, but it's not awful. All you're really doing is moving the block comment start point and rerunning the query over and over; after a few times of doing it you get quick fast. You also, if it's something you end up doing "often" get used to the usual candidates and check those first.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • A 900-line query? Generated from a system? I would say that more than one person on this list, would simply give up. I would certainly be inclined. ratbak and Thom A gave some good advice on methodology, but a 900-line query you did not write yourself?

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • Yes 🙂

    My ERP's development has UI for building query, so we can just add table by drag and drop, then add its child, provided it is same level or hierarchy with all their relation and range condition. Then in the background, the result is a view in SQL DB.

    900 lines is after I put the view's query in some SQL formatter website, basically it will make the fields vertically align one by one, some with not only a single field, but a sub query or switch formula that also will vertical align. Stuff like that.

    Fortunately however I manage to find the cause, it's because I knew the system operates and table's name already, so I can guess. Just wondering if I'm want to be further as Data Analyst with any kind of ERP, because by then, I may not be familiar with the table's name and how it operates, so I cannot guess. I thought there is some kind of helper tools or method to identify the problem easily (more easy at least, in my case of 900 lines.. haha). But I guess what ratbak and Thom A mentioned is the only approach.

    Thank you for the response and the methods guidance, btw.

     

     

  • Just wondering if I'm want to be further as Data Analyst with any kind of ERP, because by then, I may not be familiar with the table's name and how it operates, so I cannot guess.

    I have think you have the most important tool right there. Without knowledge of the tables and the business domains, these problems are always a lot more challenging.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply