Performance improves after reboot

  • RVO - Thursday, February 23, 2017 1:47 PM

    Jeff,
    "...SQL will probably be exclusively hitting that one - RVO could check by looking at the virtual file stats DMV to see where all the IO is...."

    Can you please give me a query to find virtual file stats for tempdb files?

    At a basic level, on that DMV you can just do:
    select * from sys.dm_io_virtual_file_stats(2,NULL) ;

    The first argument is the database ID; 2 is tempdb. If you want to capture for all DBs, just set it to NULL.
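
    For example, you can pass NULL to capture everything, or use DB_ID (the standard built-in) to avoid hard-coding the id:

    -- All databases, all files:
    SELECT * FROM sys.dm_io_virtual_file_stats(NULL, NULL);

    -- Resolve the database id by name instead of hard-coding 2:
    SELECT * FROM sys.dm_io_virtual_file_stats(DB_ID(N'tempdb'), NULL);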

    Usually you want to get a baseline and then capture again later. You can just insert into a table (I'd use a static table for now, just to stay out of tempdb) for the baseline, and then insert into the same table again whenever. If you want to see activity over a half hour, run the baseline insert and then insert again 30 minutes later... that kind of thing.
    I just did this one for you, which should work... find some database to create the table in, and then just execute the INSERT INTO. Play around with it, see if that works for you, and let us know if it doesn't. You can just truncate the table after playing around, before you start doing someone's job for them... oh sorry, I mean before you want to capture for analysis. No doubt you'll get the gist - it's just grabbing tempdb, which you can change:

    CREATE TABLE TempDB_IO
    (DBName sysname,
    file_id smallint,
    num_of_reads bigint,
    num_of_bytes_read bigint,
    io_stall_read_ms bigint,
    num_of_writes bigint,
    num_of_bytes_written bigint,
    io_stall_write_ms bigint,
    io_stall bigint,
    size_on_disk_bytes bigint,
    CaptureDateTime datetime)

    INSERT INTO TempDB_IO
    SELECT db_name(database_id) as DBName,
    file_id,
    num_of_reads,
    num_of_bytes_read,
    io_stall_read_ms,
    num_of_writes,
    num_of_bytes_written,
    io_stall_write_ms,
    io_stall,
    size_on_disk_bytes,
    GETDATE() AS CaptureDateTime
    FROM sys.dm_io_virtual_file_stats(2,NULL) ;
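
    And here's a sketch of one way to diff two captures once you have a few in the table - this just compares the earliest and latest rows per file, so adjust to taste:

    SELECT b.DBName,
    b.file_id,
    c.num_of_reads - b.num_of_reads AS reads_delta,
    c.num_of_bytes_read - b.num_of_bytes_read AS bytes_read_delta,
    c.io_stall_read_ms - b.io_stall_read_ms AS read_stall_ms_delta,
    c.num_of_writes - b.num_of_writes AS writes_delta,
    c.num_of_bytes_written - b.num_of_bytes_written AS bytes_written_delta,
    c.io_stall_write_ms - b.io_stall_write_ms AS write_stall_ms_delta
    FROM TempDB_IO b
    JOIN TempDB_IO c
    ON c.DBName = b.DBName
    AND c.file_id = b.file_id
    WHERE b.CaptureDateTime = (SELECT MIN(CaptureDateTime) FROM TempDB_IO)
    AND c.CaptureDateTime = (SELECT MAX(CaptureDateTime) FROM TempDB_IO);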

    Sue

  • cphite - Thursday, February 23, 2017 1:48 PM

    RVO - Wednesday, February 22, 2017 8:01 AM

    Performance has not gotten substantially worse since the last reboot on January 29th.
    My guess is that one of these things could have helped:
    ----We ran UPDATE STATISTICS 2-3 times since the last reboot
    ----We moved the transaction log file of one of the busy databases to a proper drive (before, it was sitting together with the data files)

    Maybe I was wrong when I said performance gradually gets worse.
    I'm afraid it might be pretty sudden.
    Like 2-3 days of slower than usual, and then a big slowdown.
    We don't see big disk latency now, like we saw last time before the reboot.
    Overall it stays at the same level it was right after the reboot.

    Noticed the thing about performance getting worse suddenly rather than gradually...

    We had an issue a few months back with one of our servers, where the VM host was writing a journal file - it basically tracks all of the changes that have occurred on the disk since the last snapshot.  Anyway, once the file would get to a certain size, things would very abruptly slow to a crawl on the SQL instance.  Our hardware team explained that it was because on the storage it was having to scan through this huge file for basically every transaction.  Sorry if that's vague but I'm not a hardware storage guy.  In any event, they disabled the journal and we haven't had the problem since.

    That's a good point, as well.  I've seen some mighty strange things along that line.  We had a tried and true job on an AS/400 that normally took only an hour to run.  One morning, it changed to 6 hours.  No code changes were made.  Nothing was rebooted.  No large amounts of data were imported.  It was crazy... after a week, they gave up and called IBM for some help.  To make a longer story shorter, the battery on the cache for the box had finally given up the ghost.  They replaced it without even taking the machine down, and performance instantly returned to normal.

    So, yeah... I agree.  It could be something like you say.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Thanks for the script, guys.
    Here are the results.
    Should I focus on columns like num_of_bytes_read?
    What exactly do these columns represent?

  • Your eye is on the wrong ball here. Not sure how long this server has been up, but queries have waited almost ONE FULL DAY PER TEMPDB DATA FILE!!!! You simply cannot be successful when your CPUs (which do BILLIONS of things per second) are waiting DAYS to process data.
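
    For reference, here's a rough sketch of turning those cumulative counters into average latency per file (everything in that DMV is cumulative since instance startup, so divide the stalls by the operation counts):

    SELECT DB_NAME(database_id) AS DBName,
    file_id,
    io_stall_read_ms / NULLIF(num_of_reads, 0) AS avg_read_latency_ms,
    io_stall_write_ms / NULLIF(num_of_writes, 0) AS avg_write_latency_ms,
    io_stall
    FROM sys.dm_io_virtual_file_stats(2, NULL);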

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru - Friday, February 24, 2017 9:19 AM

    Your eye is on the wrong ball here. Not sure how long this server has been up, but queries have waited almost ONE FULL DAY PER TEMPDB DATA FILE!!!! You simply cannot be successful when your CPUs (which do BILLIONS of things per second) are waiting DAYS to process data.

    @TheSQLGuru - I'm not 100% familiar with how that would all work, but is it transactions that are waiting for days, or queries, or both?
    I am just wondering if maybe that server has open transactions that never get closed, and that could show up as running for days?

    @RVO - any chance you can archive some of that data?

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Yes, bmg002.
    I am storing the results in a "tempdb_usage" table in a utility database.

    Kevin / bmg002,
    It very well might be the case that we have a lot of open transactions.
    I use this Paul Randal query to find open transactions.
    I don't see any transaction where [Begin Time] is more than 1 minute ago.


    /***** http://www.sqlskills.com/blogs/paul/script-open-transactions-with-text-and-plans/  ****/
    use tempdb

    SELECT
      [s_tst].[session_id],
      [s_es].[login_name] AS [Login Name],
      DB_NAME (s_tdt.database_id) AS [Database],
      [s_tdt].[database_transaction_begin_time] AS [Begin Time],
      [s_tdt].[database_transaction_log_bytes_used] AS [Log Bytes],
      [s_tdt].[database_transaction_log_bytes_reserved] AS [Log Rsvd],
      [s_est].text AS [Last T-SQL Text],
      [s_eqp].[query_plan] AS [Last Plan]
    FROM
      sys.dm_tran_database_transactions [s_tdt]
    JOIN
      sys.dm_tran_session_transactions [s_tst]
    ON
      [s_tst].[transaction_id] = [s_tdt].[transaction_id]
    JOIN
      sys.[dm_exec_sessions] [s_es]
    ON
      [s_es].[session_id] = [s_tst].[session_id]
    JOIN
      sys.dm_exec_connections [s_ec]
    ON
      [s_ec].[session_id] = [s_tst].[session_id]
    LEFT OUTER JOIN
      sys.dm_exec_requests [s_er]
    ON
      [s_er].[session_id] = [s_tst].[session_id]
    CROSS APPLY
      sys.dm_exec_sql_text ([s_ec].[most_recent_sql_handle]) AS [s_est]
    OUTER APPLY
      sys.dm_exec_query_plan ([s_er].[plan_handle]) AS [s_eqp]
    ORDER BY
      [Begin Time] ASC;
    GO



    Typically I get 1-2 records.
    Sometimes 10-15 (every 1 minute) like below:

  • What does DBCC OPENTRAN tell you?  anything open?
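
    Something along these lines - it reports the oldest active transaction, if there is one:

    -- Oldest active transaction in the current database:
    DBCC OPENTRAN;

    -- Or name a database explicitly:
    DBCC OPENTRAN ('tempdb');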

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • I don't have permissions for most DBCC commands, like this one.
    Remember, I am an ETL developer with slightly elevated access permissions.
    Is there any other way to catch open transactions?
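
    Would something like this work, for example? Just a sketch using the same transaction DMVs as the Paul Randal script above (they only need VIEW SERVER STATE, no DBCC permissions):

    SELECT tst.session_id,
    tat.transaction_id,
    tat.transaction_begin_time,
    DATEDIFF(SECOND, tat.transaction_begin_time, GETDATE()) AS open_seconds
    FROM sys.dm_tran_active_transactions tat
    JOIN sys.dm_tran_session_transactions tst
    ON tst.transaction_id = tat.transaction_id
    ORDER BY tat.transaction_begin_time;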

  • I tried Adam Machanic's sp_WhoIsActive.
    It returns a column open_tran_count.
    The total count is 23.
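
    In case it helps, the call was roughly this (@get_transaction_info is a documented sp_WhoIsActive parameter that adds more transaction detail):

    EXEC sp_WhoIsActive @get_transaction_info = 1;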

  • The Paul Randal script sometimes returns transactions with [Begin Time] = NULL.
    What does that mean?

  • Guys,
    Just wanted to mention that our concern has always been about disk latency.
    We have the SOLARWINDS tool.
    Here is a screenshot from today's SOLARWINDS "Files" performance.
    As you can see, sometimes we spend 2000-2500 seconds on file access.

    But honestly, I don't understand the first diagram, because it does not correlate with the Disk Read/Write Latency that SOLARWINDS shows on the same screen for the same time period (below).

  • 1) Any time I see nvarchar(4000) parameters I think ORM, mismatched data types, and a butt-ton of CONVERT_IMPLICITs with their correspondingly horrible performance and concurrency issues.

    2) You clearly have repeated complete buffer flushes. That's brutal.

    3) Solarwinds is aggregating all IO stalls into 15 minute buckets. Your graphs just show latency, not how many things were running concurrently and collecting file IO stalls. Just yesterday I captured a 180 SECOND run of IO stalls on a client and they totaled a mind-blowing 2.5 MILLION MILLISECONDS in duration!!! That is 28.94 DAYS of IO stalls in 180 clock seconds!! Guess what, you can't run almost 2TB of BUSY and REALLY BADLY DESIGNED AND CODED OLTP databases with 2 synchronous AGs for each database on four 6-disk RAID10 sets. :hehe:
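
    To illustrate point 1 with a hypothetical sketch (the table and parameter names are made up): comparing an nvarchar parameter to a varchar column converts the column side, because nvarchar has higher type precedence, and that can turn an index seek into a scan.

    -- Hypothetical: dbo.Accounts.AccountNumber is varchar(20) with an index on it.
    DECLARE @acct nvarchar(4000) = N'12345';

    -- CONVERT_IMPLICIT is applied to the column, so the index can't be seeked:
    SELECT * FROM dbo.Accounts WHERE AccountNumber = @acct;

    -- Matching the parameter type to the column avoids the conversion:
    DECLARE @acct2 varchar(20) = '12345';
    SELECT * FROM dbo.Accounts WHERE AccountNumber = @acct2;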

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru - Friday, February 24, 2017 3:18 PM

    1) Any time I see nvarchar(4000) parameters I think ORM, mismatched data types, and a butt-ton of CONVERT_IMPLICITs with their correspondingly horrible performance and concurrency issues.

    2) You clearly have repeated complete buffer flushes. That's brutal.

    3) Solarwinds is aggregating all IO stalls into 15 minute buckets. Your graphs just show latency, not how many things were running concurrently and collecting file IO stalls. Just yesterday I captured a 180 SECOND run of IO stalls on a client and they totaled a mind-blowing 2.5 MILLION MILLISECONDS in duration!!! That is 28.94 DAYS of IO stalls in 180 clock seconds!! Guess what, you can't run almost 2TB of BUSY and REALLY BADLY DESIGNED AND CODED OLTP databases with 2 synchronous AGs for each database on four 6-disk RAID10 sets. :hehe:

    Hey Kevin,

    Just wondering where I can get my hands on the hardware you used to stretch 2.5 million milliseconds (exactly 2,500 seconds, or 41 and 2/3 minutes) into almost 29 years?   That kind of improvement in life expectancy is worth $BILLION$, if not $TRILLION$ .... 😉 😉 😉

  • sgmunson - Friday, February 24, 2017 3:30 PM

    Hey Kevin,

    Just wondering where I can get my hands on the hardware you used to stretch 2.5 million milliseconds (exactly 2,500 seconds, or 41 and 2/3 minutes) into almost 29 years?   That kind of improvement in life expectancy is worth $BILLION$, if not $TRILLION$ .... 😉 😉 😉

    HAH! Clearly the recent 2-week trip to Australia, and its drastic effect on my sleep and mental state, is still having effects! Not only did I not divide by 1000, I took the 28.9 DAYS and said YEARS. Oy vey!  :blink: I need to go check if I told the client it was 29 years...

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru - Friday, February 24, 2017 4:56 PM

    sgmunson - Friday, February 24, 2017 3:30 PM

    Hey Kevin,

    Just wondering where I can get my hands on the hardware you used to stretch 2.5 million milliseconds (exactly 2,500 seconds, or 41 and 2/3 minutes) into almost 29 years?   That kind of improvement in life expectancy is worth $BILLION$, if not $TRILLION$ .... 😉 😉 😉

    HAH! Clearly the recent 2-week trip to Australia, and its drastic effect on my sleep and mental state, is still having effects! Not only did I not divide by 1000, I took the 28.9 DAYS and said YEARS. Oy vey!  :blink: I need to go check if I told the client it was 29 years...

    Apparently, your trip affected me as well...  I somehow saw YEARS where the word DAYS appeared.   Geez Louise!   This gettin' old sh_ _ is gettin' really old...
