Exchange 2003 server bluescreen/rebooting

I'm having a problem with one of our Exchange 2003 servers blue screening and I'm not sure what's causing it.

We have a Windows 2003 A/D environment with three Exchange 2003 SP2 servers. Everything has been running fairly smoothly since we set it up 2-1/2 years ago. Our environment is a single forest/single domain/single site with three domain controllers, two of them GCs. The Exchange servers and DCs are in the same subnet, all connected with 1Gb Ethernet.

About 3 months ago I set up a new server. This was new hardware, an HP ML570 server, four 2.8GHz dual-core CPUs, 4GB of RAM, two-drive RAID1 for the O/S, 12-72G 15K drives in a RAID1 for the info store, and another RAID1 set for the logs. It is running Windows 2003 R2 SP2 Enterprise edition, with Exchange 2003 Enterprise SP2. It has McAfee GroupShield 6.02 for antivirus. It's current with patches, security updates, etc., as of about 1 month ago (the antivirus is updated every 2 hours). It's not exposed directly to the internet; external SMTP mail goes through a different server. We brought this into our Exchange org and migrated about 20 users to it. It had been running with no problems since then.

About a week ago we started migrating public folders and mailboxes from one of our other servers. I did it in stages over several nights. We've moved about 1,200 mailboxes and about 100 public folders as of Wednesday night. The private info store is now 70GB, the public store is about 11GB.

Wednesday morning I ran the Exchange "Best Practices Analyzer" (apparently I forgot to run it when I set up the system). It found three critical items that I had missed. The "/3GB" and "/Userva=3030" switches weren't set (I can't believe I missed those!). Also, "SystemPages" and "HeapDecommitFreeBlockThreshold" were not set properly (they were at their default values; they should be 0 and 262144). There was also an informational note recommending setting the A/D attribute "msExchESEParamLogBuffers" to 9000.
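
For anyone checking the same Best Practices Analyzer items by hand, here is a minimal Python sketch (not from the original poster) that reports which of the two boot.ini switches named above are missing from a boot entry. The sample boot entry is hypothetical, and the registry key paths in the table are an assumption based on Microsoft's Exchange 2003 memory-tuning guidance; only the value names and target values (0 and 262144) come from the post.

```python
import re

# Switches the post says the Best Practices Analyzer flagged as missing.
REQUIRED_SWITCHES = {"/3GB": None, "/USERVA": "3030"}

# Target values are from the post. The key paths are an ASSUMPTION taken from
# Microsoft's Exchange 2003 tuning guidance; verify them before changing anything.
REGISTRY_TARGETS = {
    r"HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\SystemPages": 0,
    r"HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\HeapDeCommitFreeBlockThreshold": 262144,
}


def missing_boot_switches(boot_entry):
    """Return the required switches that are absent or wrong in one boot.ini entry."""
    found = {}
    # Collect every "/Name" or "/Name=value" token; switch names are case-insensitive.
    for name, value in re.findall(r"(/[A-Za-z0-9]+)(?:=(\S+))?", boot_entry):
        found[name.upper()] = value or None
    missing = []
    for name, expected in REQUIRED_SWITCHES.items():
        if name not in found or (expected is not None and found[name] != expected):
            missing.append(name if expected is None else f"{name}={expected}")
    return missing


# Hypothetical boot.ini entry, for illustration only.
sample = (r'multi(0)disk(0)rdisk(0)partition(1)\WINDOWS='
          r'"Windows Server 2003, Enterprise" /fastdetect /NoExecute=OptOut')
print(missing_boot_switches(sample))  # prints ['/3GB', '/USERVA=3030']
```

Note that boot.ini switches and these registry values are read at startup, so a check like this only tells you what is configured, not what the running system is actually using.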
I made those changes but did not reboot, since we were planning on rebooting this weekend as part of our maintenance window. I'm also planning an offline defragment due to all the mailbox moves.

Thursday morning, around 11:15am, the server did a bugcheck/blue screen and rebooted. Fortunately it was only down for about 2 minutes. There was no indication of problems in the app or system logs, just the system dump error (BugCheck, STOP: 0x000000D1 (0x0000000C, 0xD0000002, 0x00000000, 0xB9C91819)) followed by the reboot and the standard "The last shutdown was unexpected."

Everything seemed to be OK, then around 2:45 it did it again. Again, it came back up in a couple of minutes, and checking the log showed no errors or warnings, just the bugcheck error (STOP: 0x0000000A (0x00000034, 0xD0000002, 0x00000001, 0xE0A7E4A9)) and reboot. As of 9:00pm it hasn't done it again.

Since I had the memory parameters not set properly, I can understand the first bugcheck/reboot, but why would it do it a second time three hours later? I would think the settings would have been applied after the first reboot.

This weekend I'm scheduling an offline defrag, and will be doing a Windows update for all "critical" updates, as well as updating the various HP hardware drivers (this was scheduled before we started having problems). I think I'm also going to reseat the memory modules, just in case there's a bad memory connection.

I think I'm also going to start preparing another Exchange server (unfortunately it would not be with new hardware) just in case.

Sorry about the length of this post. Any suggestions on other stuff I can check, I would appreciate it.

Mike O.

Both bug codes are an IRQL not less or equal, which leads me in the direction
of a kernel-mode device driver. Have you updated any drivers lately? On HP I think there was an issue with the HpCISSs2 driver a while back. Have you tried:

http://h18007.www1.hp.com/support/files/server/us/download/27349.html

or

http://h18007.www1.hp.com/support/files/server/us/download/27348.html

"Mike O" <put_the_spam@the.can> wrote in message news:%23aZcPzm9HHA.4712@TK2MSFTNGP04.phx.gbl...
> I'm having a problem with one of our Exchange 2003 servers blue screening
> and I'm not sure what's causing it.
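
The two STOP lines from the original post can be decoded mechanically. The sketch below is illustrative only: it maps the two bugcheck codes to their documented names (0xD1 is DRIVER_IRQL_NOT_LESS_OR_EQUAL, 0x0A is IRQL_NOT_LESS_OR_EQUAL) and labels the four parameters as Microsoft documents them for these codes. The function name and dictionary keys are made up for this example, and the IRQL parameter is passed through as a raw value without interpretation.

```python
import re

# Documented names for the two bugcheck codes seen in this thread.
BUGCHECK_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}

# Matches "STOP: 0x000000D1 (p1, p2, p3, p4)" as it appears in the event log text.
STOP_RE = re.compile(r"STOP:\s*(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)")


def parse_stop(line):
    """Split a STOP line into its code and four labeled parameters."""
    match = STOP_RE.search(line)
    if match is None:
        raise ValueError("no STOP code found")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return {
        "code": code,
        "name": BUGCHECK_NAMES.get(code, "UNKNOWN"),
        "referenced_address": params[0],   # memory address that was touched
        "irql_raw": params[1],             # IRQL parameter, left uninterpreted
        "access": "write" if params[2] & 1 else "read",
        "faulting_instruction": params[3], # address of the code doing the access
    }


first = parse_stop("STOP: 0x000000D1 (0x0000000C, 0xD0000002, 0x00000000, 0xB9C91819)")
second = parse_stop("STOP: 0x0000000A (0x00000034, 0xD0000002, 0x00000001, 0xE0A7E4A9)")
print(first["name"], first["access"], hex(first["faulting_instruction"]))
print(second["name"], second["access"], hex(second["faulting_instruction"]))
```

The fourth parameter is the useful one for finding the culprit: it is an address inside whatever module made the bad access, which is what a debugger resolves to a driver name when it has the dump file.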

The drivers haven't been changed since around mid-late June. That's what
seems so odd about this: why did it take so long for the errors to pop up?

I looked into the HPCISS driver issue. It was released early 2006, but when I set this server up last May I applied all the driver and BIOS updates at that time, so I don't think that's an issue.

We haven't had any reboots since the 2nd one on Thursday. We have our monthly maintenance window tonight and tomorrow. I'm currently running an offline defrag on the server due to all the mailbox moves, and tomorrow morning I'm going to update all the HP drivers and do a "Windows Update" for anything critical.

"John Fullbright" <fjohn@donotspamenetappdotcom> wrote in message news:%23YMfPk09HHA.5404@TK2MSFTNGP02.phx.gbl...
> both bug codes are an IRQL not less or equal which leads me in the
> direction of a kernel mode device driver. Have you updated any drivers
> lately? On HP I think there was an issue with the HpCISSs2 driver a while
> back. Have you tried:
>
> http://h18007.www1.hp.com/support/files/server/us/download/27349.html
>
> or
>
> http://h18007.www1.hp.com/support/files/server/us/download/27348.html

If you have the dump file (memory.dmp), download the debugging tools. Run
WinDbg, set the symbol path to the symbol server, and load the dump file. Do an !analyze, and it'll tell you the most likely source of the issue (the driver that caused the problem).

"Mike O" <put_the_spam@the.can> wrote in message news:OCDPj509HHA.484@TK2MSFTNGP06.phx.gbl...
> The drivers haven't been changed since around mid-late June. That's what
> seems so odd about this, why did it take so long for the errors to pop up?
>
> I looked into the HPCISS driver issue. It was released early 2006, but
> when I set this server up last May I applied all the driver and BIOS
> updates at that time, so I don't think that's an issue.
>
> We haven't had any reboots since the 2nd one on Thursday. We have our
> monthly maintenance window tonight and tomorrow. I'm currently running an
> offline defrag on the server due to all the mailbox moves and tomorrow
> morning I'm going to update all the HP drivers & do a "Windows Update" for
> anything critical.
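
The dump-analysis steps described in this reply can also be run non-interactively with cdb, the console debugger that ships in the same Debugging Tools for Windows package as WinDbg. The sketch below only builds the command line; it assumes cdb is on the PATH, and the dump location and local symbol cache directory are hypothetical placeholders. The `-z` (dump file), `-y` (symbol path), and `-c` (initial command) options and the `!analyze -v` command are standard debugger features.

```python
def build_cdb_command(dump_path, symbol_cache=r"C:\symbols"):
    """Build a cdb command line that opens a crash dump and runs !analyze -v."""
    # "srv*<cache>*<server>" tells the debugger to pull symbols from the
    # Microsoft public symbol server and cache them locally.
    symbol_path = f"srv*{symbol_cache}*https://msdl.microsoft.com/download/symbols"
    return [
        "cdb",
        "-z", dump_path,          # open the crash dump instead of a live target
        "-y", symbol_path,        # symbol search path
        "-c", "!analyze -v; q",   # analyze the crash, then quit
    ]


# Hypothetical dump location, for illustration only.
command = build_cdb_command(r"C:\WINDOWS\MEMORY.DMP")
print(" ".join(command))
```

To actually run it on a machine with the debugging tools installed, the list can be passed to `subprocess.run(command, capture_output=True, text=True)`; the module named in the analysis output is the debugger's best guess at the faulting driver, not a guaranteed root cause.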

Thank you everyone for all the information.
I did the HP driver update, Windows update, and Exchange BPA tuning, and there haven't been any problems since the reboot on 9/13 (prior to my first posting). We've successfully migrated the remaining mailboxes and public folders from one of our production Exchange servers. I'll go through the process to verify everything is off the system, then will be decommissioning the old server next week.

I will check into the comments regarding the memory dump, but hopefully it was just a unique combination of misconfigured memory & drivers.

My next phase of this project (mentioned in another post) is to do the same kind of migration with our second old production Exchange server. Unfortunately, I don't have new hardware to migrate this to, so I'll have to settle for slightly upgraded hardware with a clean install & new configuration.

Thanks again.

Mike O.