I THINK my job is failing....?

  • I have a transaction log backup job that runs hourly. From time to time, this job starts failing consistently, numerous times in a row, and I don't know why. This is what I see in the Log File Viewer (cleaned up for easier viewing):

    Executing query "DECLARE @GUID UNIQUEIDENTIFIER
    EXECUTE msdb..sp".: 100% complete
    End Progress Progress: 2010-07-30 08:10:03.98
    Source: Back Up Database Task

    Executing query "EXECUTE master.dbo.xp_create_subdir N'\\10.10.80.1".: 100% complete
    End Progress Progress: 2010-07-30 08:10:04.01
    Source: Back Up Database Task

    Executing query "EXECUTE master.dbo.xp_create_subdir N'\\10.10.80.1".: 100% complete
    End Progress Progress: 2010-07-30 08:10:09.91
    Source: Back Up Database Task

    Executing query "BACKUP LOG [bug] TO DISK = N'\\10.10.80.15\volume".: 100% complete
    End Progress Progress: 2010-07-30 08:10:13.91
    Source: Back Up Database Task

    Executing query "declare @backupSetId as ... The package execution fa... The step failed.

    However, I also see this in the Event Log (also cleaned up):

    Log was backed up.
    Database: bug,
    creation date(time): 2008/05/07(10:57:15),
    first LSN: 5689:7926:1,
    last LSN: 5695:2454:1,
    number of dump devices: 1,
    device information: (FILE=1, TYPE=DISK: {'\\10.10.80.15\volume01\backups\SQL\transactionlog\bug\bug_backup_201007300810.trn'}).
    This is an informational message only. No user action is required.

    AND

    Log was backed up.
    Database: ShareRW,
    creation date(time): 2008/05/07(10:55:30),
    first LSN: 1114565:255337:1,
    last LSN: 1114571:35624:1,
    number of dump devices: 1,
    device information: (FILE=1, TYPE=DISK: {'\\10.10.80.15\volume01\backups\SQL\transactionlog\ShareRW\ShareRW_backup_201007300810.trn'}).
    This is an informational message only. No user action is required.

    The size of the backups does not appear to be much different from the norm, nor does the length of time to execute, so I have no idea what is going on. Other than the Log File Viewer telling me the job failed, everything seems totally normal to me. How can I find more information on what caused the failure, assuming the job really did fail?

  • Add an output file under the job step's Advanced page. With this, the job execution will be fully logged to a text file.

    Pradeep Adiga
    Blog: sqldbadiaries.com
    Twitter: @pradeepadiga

  • OK, that failed with an error, but I am getting a log file created for this job. I can see from that log that the problem is my step to shrink the transaction log. When my transaction log backup job runs at the same time as the database backup job, the shrink fails, because obviously I can't shrink the log while the backup is running. But the backup of the transaction log itself is working fine, so it is kind of a false positive, as I suspected.

    So, the step to shrink the transaction log is T-SQL. Is there a way within T-SQL that I can test for the execution of the database backup? Then I can write the shrink statement to run only if the database backup is not running. And that would actually make my life a little easier, because then I could let the transaction log backup run every hour all day long and not try to juggle its schedule to not step on the backup job's schedule.

  • There are lots of posts on this forum as to why shrinking the log file is not advised. What is your reason for wanting to shrink it?

    Cindy

  • Because when I do not shrink the log file, it continues to request space until it reaches its limit, and then my database completely hangs. I've had this occur on several occasions, and last week it happened twice in three days.

  • IMHO, your bigger problem is why your log file keeps growing to an unmanageable size and not the shrink running concurrently with the backup.

    The log backup is going to "empty" the log, and that space can be reused. By shrinking the log, the space is being deallocated needlessly, then immediately reallocated. Try running this command before the log backup occurs and after the log backup occurs (without a shrink). It will tell you how much space is being used. If your log is always going to grow to xx MB, that's its size.

    DBCC SQLPERF (LOGSPACE);

    If you don't shrink the database, the space in the log will be re-used - it doesn't continue to grow. You might also want to consider running the log backup more frequently than every hour.
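    If it helps to trend this over time rather than eyeball it, the DBCC output can be captured into a table. This is just a sketch; the table and column names here are illustrative, not from the thread:

    ```sql
    -- Hypothetical history table for DBCC SQLPERF (LOGSPACE) snapshots.
    CREATE TABLE dbo.LogSpaceHistory (
        CaptureTime     DATETIME      NOT NULL DEFAULT GETDATE(),
        DatabaseName    SYSNAME       NOT NULL,
        LogSizeMB       DECIMAL(18,2) NOT NULL,
        LogSpaceUsedPct DECIMAL(5,2)  NOT NULL,
        [Status]        INT           NOT NULL
    );

    -- Capture a snapshot (e.g. as a job step just before and just after the log backup).
    INSERT INTO dbo.LogSpaceHistory (DatabaseName, LogSizeMB, LogSpaceUsedPct, [Status])
    EXEC ('DBCC SQLPERF (LOGSPACE)');
    ```

    Comparing the before/after rows for a database shows how much log space each backup actually frees.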

    Maybe someone can help you write code to prevent the shrink and backup from running concurrently, but I still believe the effort is better spent in preventing the log from filling, or your problems are just going to keep recurring - shrink or not.

    Cindy

  • You can check in sys.sysprocesses to see if any backup is running against a database. Put this logic in the Shrink TLog job step.

    Pradeep Adiga
    Blog: sqldbadiaries.com
    Twitter: @pradeepadiga

  • If your log continues to grow - even after transaction log backups are run - then you have a long-running open transaction that is preventing the space in the log file from being reused.

    You can run DBCC OPENTRAN against the database that you are having problems with to see what the oldest open transaction is. Once you have that identified, you'll be able to figure out what the required size of the log is going to be - or you can fix that process so it doesn't hold an open transaction.
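    A minimal sketch of that check (the SPID value in the second command is just a placeholder for whatever DBCC OPENTRAN reports):

    ```sql
    USE bug;           -- repeat per database (e.g. ShareRW)
    DBCC OPENTRAN;     -- reports the oldest active transaction: SPID, start time, name

    -- Once you have the SPID, see the last batch that session submitted.
    -- 52 here is purely an example SPID:
    DBCC INPUTBUFFER (52);
    ```

    If nothing is open, DBCC OPENTRAN simply reports that there are no active open transactions.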

    As to your original failure - instead of viewing the job history, you need to view the maintenance plan history. This will identify which step in the maintenance plan is failing and show you the actual error.

    One thing I noticed is it appears that you are running a maintenance cleanup task in the same plan as your transaction log backups. This is not recommended - at least I don't recommend it. For a restore, you need to have a continuous unbroken chain of log backups. By removing them every hour you end up with a window of transaction log backups - and could end up missing the ones you would need to restore the system.

    If you are copying those backup files to tape - which you should be doing - when does that occur? You really need to make sure the files have been copied off before you remove them. This is usually the biggest problem with removing transaction log backups in the same step as backing up the transaction log. Those backups don't end up going to tape - and you now have no way to recover the system.

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Latest update: I haven't gotten a chance to really dive into the root cause of this problem as some of you have suggested, so bear with me. I am running my transaction log "backup then shrink" job every hour between 7:00 AM and 11:00 PM and my full backup at 1:20 AM. The full backup usually completes before 7:00 AM. Last night, for reasons I have been unable to determine, the full backup generated approximately 120G of transaction log data, filling the drive to 93% capacity before I caught it. If it had reached 100%, my website would have crashed again. Currently, my transaction log has allocated space of 129G with 4.6G free (3%).

    I have absolutely no idea why this happens, but like I said earlier, it is happening much more frequently in the last couple of months. This schedule that I have described above has been generating a transaction log backup file of roughly 6G the first time it runs after the completion of the nightly full backup (the 7:00 AM one). This morning, the 7:00 AM transaction log backup file is >30G and growing, and the last time this happened, the 7:00 AM transaction log backup file was >75G. Again, I have no idea what transpired on 7/12 or last night to produce such huge amounts of transactions.

  • Adiga (7/30/2010)


    You can check in sys.sysprocesses to see if any backup is running against a database. Put this logic in the Shrink TLog job step.

    I am now using this logic in my job:

    IF (SELECT COUNT(*)
        FROM sys.dm_tran_active_transactions WITH (NOLOCK)
        WHERE name = 'Backup/Restore') = 0
    BEGIN
        USE ShareRW;
        DBCC SHRINKFILE (ShareLog);
        USE bug;
        DBCC SHRINKFILE (bugLog);
    END
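    For what it's worth, a sketch of an alternative guard checks sys.dm_exec_requests (available from SQL Server 2005 on), which exposes the command each session is currently executing, rather than inferring a backup from transaction names:

    ```sql
    -- Only shrink if no backup of any kind is currently executing.
    IF NOT EXISTS (SELECT 1
                   FROM sys.dm_exec_requests
                   WHERE command IN ('BACKUP DATABASE', 'BACKUP LOG'))
    BEGIN
        USE ShareRW;
        DBCC SHRINKFILE (ShareLog);
        USE bug;
        DBCC SHRINKFILE (bugLog);
    END
    ```

    This catches full and differential backups as well as log backups, whichever session started them.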

  • Full backups don't generate log volume. If you're seeing the log grow, then something else is happening during the full backup. I don't think you're going to fix this without digging into long-running transactions.

  • If you have something on your system that's running away and using up more than 120 GB of transaction log space, you need to find out what's causing that as your number one priority - unless you like things crashing on you!

  • I'm curious. What recovery mode is your database in? FULL or Bulk-Logged?

    If it's in FULL and you have a bunch of Bulk Inserts for your data, that could be the problem with your log file. You might consider a temporary switch over to BL mode while doing the inserts, then flipping back to FULL when they're done.
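    A sketch of both halves of that suggestion - checking the current model, and the temporary switch (database names taken from the thread; the exact placement of the switch around your load is up to you):

    ```sql
    -- Check the current recovery model of the databases in question.
    SELECT name, recovery_model_desc
    FROM sys.databases
    WHERE name IN ('bug', 'ShareRW');

    -- Temporarily switch for a bulk load window, then switch back.
    ALTER DATABASE bug SET RECOVERY BULK_LOGGED;
    -- ... run the bulk inserts here ...
    ALTER DATABASE bug SET RECOVERY FULL;
    ```

    One caveat worth knowing: take a log backup after switching back to FULL, since point-in-time restore is not possible into a log backup that contains bulk-logged operations.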

    Brandie Tarvin, MCITP Database Administrator
    LiveJournal Blog: http://brandietarvin.livejournal.com/
    On LinkedIn!, Google+, and Twitter.
    Freelance Writer: Shadowrun
    Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie Tarvin (8/6/2010)


    I'm curious. What recovery mode is your database in? FULL or Bulk-Logged?

    I have no idea, and I've never heard of bulk-logged. How can I determine what mode I am in?

  • Latest update: I moved the execution time of the full backup from 1:20 AM to 12:00 PM, which is just before the lowest-usage period of my site. I thought that would reduce the time the backup took to run, because the least usage should mean the least database access. That didn't happen, but I am going to keep the new schedule anyway.

    However, I also discovered something else interesting. I added the logic to my transaction log backup to only attempt the log file shrink when the full backup is not running. That is working fine. What I found is that the size of the transaction log backup at 4:10 AM is roughly 200-300X the size of the backups for the other 23 hours, or at least it has been that way for four consecutive days. Only four other hours' backups were even 25% the size of the smallest 4:10 AM backup. Further investigation revealed that there is a full index rebuild/defrag occurring nightly at 3:10 AM. Why anyone would schedule a full index rebuild only two hours after a full backup is beyond me, but I can only think that the schedule is such because when they first set up this environment, the full backup took less than 2 hours to complete, so there was never a conflict. Now that the backup takes 6+ hours, it is painful. We had to endure all the problems of these two processes bumping into each other on a daily basis until the non-DBA (me) stumbled onto the cause.

    For now, I am leaving things the way they are, because things seem to be working, and logically my changes make sense. If anyone objects or thinks otherwise, I'm open to recommendations.
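    Incidentally, the per-hour backup sizes described above don't have to be collected by hand - msdb keeps a backup history. A sketch of the sort of query that surfaces the 4:10 AM spike:

    ```sql
    -- Transaction log backup sizes for the last week, from msdb history.
    SELECT database_name,
           backup_start_date,
           backup_finish_date,
           backup_size / 1048576.0 AS backup_size_mb
    FROM msdb.dbo.backupset
    WHERE type = 'L'                              -- 'L' = transaction log backup
      AND backup_start_date >= DATEADD(DAY, -7, GETDATE())
    ORDER BY backup_start_date;
    ```

    Sorting or grouping by DATEPART(HOUR, backup_start_date) makes an outlier hour like 4:10 AM stand out immediately.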
