Exchange 2003 server bluescreen/rebooting

Author
14 Sep 2007 1:15 AM
Mike O
I'm having a problem with one of our Exchange 2003 servers blue screening
and I'm not sure what's causing it.
We have a Windows 2003 A/D environment with three Exchange 2003 sp2 servers.
Everything has been running fairly smoothly since we set it up 2-1/2 years
ago.  Our environment is a single forest/single domain/single site with
three domain controllers, two of them GC's.  The Exchange and DC's are in
the same subnet, all connected with 1Gb ethernet.

About 3 months ago I set up a new server.  This was new hardware, an HP
ML570 server, four 2.8GHz dual core CPU's, 4G of RAM, two drive RAID1 for
the O/S, 12-72G 15K drives in a RAID1 for the info store, and another RAID1
set for the logs.   It is running on Windows 2003 R2 sp2 Enterprise edition,
with Exchange 2003 Enterprise sp2.   It has McAfee GroupShield 6.02 for
antivirus.  It's current with patches, security updates, etc., as of about 1
month ago (the antivirus is updated every 2 hours).  It's not exposed
directly to the internet; external SMTP mail is going through a different
server.  We brought this into our Exchange org and migrated about 20 users
to it.   It had been running with no problems since then.

About a week ago we started migrating public folders and mailboxes from one
of our other servers.  I did it in stages over several nights.   We've moved
about 1,200 mailboxes and about 100 public folders as of Wednesday night.
The private info store is now 70GB, the public store is about 11GB.

Wednesday morning I ran the Exchange "Best Practices Analyzer" (apparently I
forgot to run it when I set up the system).    It found three critical
items that I had missed.   The "/3GB" and "/USERVA=3030" boot.ini switches
weren't set (I can't believe I missed those!).  Also the "SystemPages" and
"HeapDecommitFreeBlockThreshold" registry values were not set properly (they
were at their default values; they should be 0 and 262144 respectively).
Also there was an informational note recommending to set the A/D attribute
"msExchESEParamLogBuffers" to 9000.  I made those changes but did not
reboot, since we were planning on rebooting this weekend as part of our
maintenance window.  I'm also planning an offline defragment due to all the
mailbox moves.
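
For anyone checking the same two boot switches on their own box, here is a rough Python sketch that looks at a single boot.ini entry and reports which of the switches named above are missing. The sample entry text and the function name are made up for illustration; only "/3GB" and "/USERVA=3030" come from the BPA report described above.

```python
# Switches the Best Practices Analyzer flagged in the post above.
# A value of None means the switch only has to be present.
REQUIRED_SWITCHES = {"/3GB": None, "/USERVA": "3030"}


def missing_switches(boot_entry):
    """Return the required switches that are absent from one boot.ini entry."""
    # Switches come after the quoted description, separated by spaces.
    options = boot_entry.rsplit('"', 1)[-1].split()
    found = {}
    for token in options:
        if token.startswith("/"):
            name, _, value = token.partition("=")
            found[name.upper()] = value
    missing = []
    for name, wanted in REQUIRED_SWITCHES.items():
        if name not in found:
            missing.append(name if wanted is None else name + "=" + wanted)
        elif wanted is not None and found[name] != wanted:
            missing.append(name + "=" + wanted)
    return missing


# Hypothetical entry, not copied from the server in this thread.
sample = (r'multi(0)disk(0)rdisk(0)partition(1)\WINDOWS='
          r'"Windows Server 2003, Enterprise" /fastdetect')
print(missing_switches(sample))
```

Keep in mind that boot.ini is only read at startup, so a check like this says what the next boot will use, not what the running kernel is using.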

Thursday morning, around 11:15am, the server did a bugcheck/blue screen and
rebooted.  Fortunately it was only down for about 2 minutes.  There was no
indication of problems in the app or system logs, just the system dump error
(BugCheck, STOP: 0x000000D1 (0x0000000C, 0xD0000002, 0x00000000,
0xB9C91819)) followed by the reboot and the standard "The last shutdown was
unexpected.".

Everything seemed to be OK, then around 2:45 it did it again.  Again, it
came back up in a couple of minutes, and checking the log showed no errors
or warnings, just the bugcheck error (STOP: 0x0000000A (0x00000034,
0xD0000002, 0x00000001, 0xE0A7E4A9)) and reboot.  As of 9:00pm it hasn't done
it again.

Since I had the memory parameters not set properly, I can understand the
first bugcheck/reboot, but why would it do it a second time three hours
later?  I would think the settings would have been applied after the first
reboot.

This weekend I'm scheduling an offline defrag, and will be doing a Windows
update for all "critical" updates, as well as updating the various HP
hardware drivers (this was scheduled before we started having problems).
I think I'm also going to reseat the memory modules, just in case there's a
bad memory connection.

I think I'm also going to start preparing another Exchange server
(unfortunately it would not be with new hardware) just in case.

Sorry about the length of this post.  I would appreciate any suggestions on
other things I can check.

Mike O.

Author
15 Sep 2007 3:32 AM
John Fullbright
Both bug codes are IRQL_NOT_LESS_OR_EQUAL variants, which points me in the
direction of a kernel mode device driver.  Have you updated any drivers
lately?  On HP I think there was an issue with the HpCISSs2 driver a while
back.  Have you tried:

http://h18007.www1.hp.com/support/files/server/us/download/27349.html

or

http://h18007.www1.hp.com/support/files/server/us/download/27348.html
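
To make the reading of the two STOP lines concrete, here is a small Python sketch that splits a bugcheck line into its code and four parameters. The parameter meanings follow the documented layout for bug checks 0xA and 0xD1 (address referenced, IRQL, read or write, address of the referencing instruction); the function and field names are made up for illustration, and the IRQL field is left as the raw value from the log rather than interpreted.

```python
import re

# Only the two bug check codes seen in this thread.
BUGCHECK_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}


def parse_stop(line):
    """Split 'STOP: 0x... (0x..., 0x..., 0x..., 0x...)' into named fields."""
    numbers = [int(h, 16) for h in re.findall(r"0x[0-9A-Fa-f]+", line)]
    code, args = numbers[0], numbers[1:5]
    return {
        "code": code,
        "name": BUGCHECK_NAMES.get(code, "unknown"),
        "address_referenced": args[0],
        "irql_raw": args[1],
        # Parameter 3: 0 means a read, 1 means a write.
        "access": "write" if args[2] & 1 else "read",
        "referencing_instruction": args[3],
    }


first = parse_stop("STOP: 0x000000D1 (0x0000000C, 0xD0000002, 0x00000000, 0xB9C91819)")
second = parse_stop("STOP: 0x0000000A (0x00000034, 0xD0000002, 0x00000001, 0xE0A7E4A9)")
print(first["name"], first["access"])
print(second["name"], second["access"])
```

The fourth parameter is the address of the code that touched the bad memory, so mapping it to a loaded module is what identifies the driver.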


Author
15 Sep 2007 4:10 AM
Mike O
The drivers haven't been changed since around mid-to-late June.  That's what
seems so odd about this: why did it take so long for the errors to pop up?

I looked into the HPCISS driver issue.  It was released early 2006, but when
I set this server up last May I applied all the driver and BIOS updates at
that time, so I don't think that's an issue.

We haven't had any reboots since the 2nd one on Thursday.   We have our
monthly maintenance window tonight and tomorrow.  I'm currently running an
offline defrag on the server due to all the mailbox moves and tomorrow
morning I'm going to update all the HP drivers & do a "Windows Update" for
anything critical.


Author
17 Sep 2007 10:05 PM
John Fullbright
If you have the dump file (memory.dmp), download the Debugging Tools for
Windows.  Run WinDbg, set the symbol path to the Microsoft symbol server, and
load the dump file.  Do an !analyze -v, and it'll tell you the most likely
source of the issue (the driver that caused the problem).
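
If you would rather script those steps than click through the GUI, here is a rough Python sketch that builds the equivalent command line. It swaps in cdb, the console debugger from the same Debugging Tools package, in place of WinDbg. The -z (dump file), -y (symbol path) and -c (initial command) switches are standard debugger options; the install path, dump location and local symbol cache directory are assumptions and will differ per machine.

```python
import subprocess

# Assumed locations; adjust for the machine doing the analysis.
CDB_PATH = r"C:\Program Files\Debugging Tools for Windows\cdb.exe"
DUMP_PATH = r"C:\WINDOWS\MEMORY.DMP"
SYMBOL_PATH = r"srv*C:\symbols*https://msdl.microsoft.com/download/symbols"


def build_analyze_command(dump=DUMP_PATH, symbols=SYMBOL_PATH, cdb=CDB_PATH):
    """Return the argument list that opens a dump, runs !analyze -v, and quits."""
    return [cdb, "-z", dump, "-y", symbols, "-c", "!analyze -v; q"]


def run_analysis(**kwargs):
    """Run the debugger and return its text output (Windows only)."""
    result = subprocess.run(build_analyze_command(**kwargs),
                            capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    print(" ".join(build_analyze_command()))
```

In the output, the lines to look for are the ones naming the probable cause (IMAGE_NAME and MODULE_NAME).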


Author
15 Sep 2007 4:10 AM
Mike O
The drivers haven't been changed since around mid-late June.  That's what
seems so odd about this, why did it take so long for the errors to pop up?

I looked into the HPCISS driver issue.  It was released early 2006, but when
I set this server up last May I applied all the driver and BIOS updates at
that time, so I don't think that's an issue.

We haven't had any reboots since the 2nd one on Thursday.   We have our
monthly maintenance window tonight and tomorrow.  I'm currently running an
offline defrag on the server due to all the mailbox moves and tomorrow
morning I'm going to update all the HP drivers & do a "Windows Update" for
anything critical.


Show quote
"John Fullbright" <fjohn@donotspamenetappdotcom> wrote in message
news:%23YMfPk09HHA.5404@TK2MSFTNGP02.phx.gbl...
> both bug codes are an IRQL not less or equal which leads me in the
> direction of a kernal mode device driver.  Have you updated any drivers
> lately?  On HP I think there was an issue with the HpCISSs2 driver a while
> back.  Have you tried:
>
> http://h18007.www1.hp.com/support/files/server/us/download/27349.html
>
> or
>
> http://h18007.www1.hp.com/support/files/server/us/download/27348.html
>
>
> "Mike O" <put_the_spam@the.can> wrote in message
> news:%23aZcPzm9HHA.4712@TK2MSFTNGP04.phx.gbl...
>> I'm having a problem with one of our Exchange 2003 servers blue screening
>> and I'm not sure what's causing it.
>> We have a Windows 2003 A/D environment with three Exchange 2003 sp2
>> servers.
>> Everything has been running fairly smooth since we set it up 2-1/2 years
>> ago.  Our environment is a single forest/single domain/single site with
>> three domain controllers, two of them GC's.  The Exchange and DC's are in
>> the same subnet, all connected with 1Gb ethernet.
>>
>> About 3 months ago I set up a new server.  This was new hardware, an HP
>> ML570 server, four 2.8Ghz dual core CPU's, 4G of RAM, two drive RAID1 for
>> the O/S, 12-72G 15K drives in a RAID1 for the info store, and another
>> RAID1
>> set for the logs.   It is running on Windows 2003 R2 sp2 Enterprise
>> edition,
>> with Exchange 2003 enterprise Sp2.   It has McAfee GroupShield 6.02 for
>> antivirus.  It's current with patches, security updates, etc., as about 1
>> month ago (the Antivirus is updated every 2 hours).  It's not exposed
>> directly to the internet; external SMTP mail is going through a different
>> server.  We brought this into our exchange org and migrated about 20
>> users
>> to it.   It had been running running with no problems since then.
>>
>> About a week ago we started migrating public folders and mailboxes from
>> one
>> of our other servers.  I did it in stages over several nights.   We've
>> moved
>> about 1,200 mailboxes and about 100 public folders as of Wednesday night.
>> The private info store is now 70GB, the public store is about 11GB.
>>
>> Wednesday morning I ran the Exchange "Best Practices Analyzer"
>> (apparently I
>> forgot to run it when I set up the system..).    It found three critical
>> items that I had missed.   The "/3GB" and /Userva=3030 weren't set (I
>> can't
>> believe I missed those!).  Also the "SystemPages" and
>> "HeapDecommitFreeBlockThreshold" were not set properly (they were at
>> their
>> default values, they should be 0 and 262144).  Also there was an
>> informational note recommending to set the A/D "msExchESEParamLogBuffers"
>> to
>> 9000.  I made those changes but did not reboot, since we were planning on
>> rebooting this weekend as part of our maintenance window.  I'm also
>> planning
>> an offline defragment due to all the mailbox moves.
>>
>> Thursday morning, around 11:15am, the server did a bugcheck/blue screen
>> and
>> rebooted.  Fortunately it was only down for about 2 minutes.  There was
>> no
>> indication of problems in the app or system logs, just the system dump
>> error
>> (BugCheck, STOP: 0x000000D1 (0x0000000C, 0xD0000002, 0x00000000,
>> 0xB9C91819)) followed by the reboot and the standard "The last shutdown
>> was
>> unexpected.".
>>
>> Everything seemed to be OK, then around 2:45 it did it again.  Again, it
>> came back up in a couple of minutes, and checking the log showed no
>> errors
>> or warnings, just the bugcheck error (STOP: 0x0000000A (0x00000034,
>> 0xD0000002, 0x00000001, 0xE0A7E4A9) and reboot.  As of 9:00pm it hasn't
>> done
>> it again..
>>
>> Since I had the memory parameters not set properly, I can understand the
>> first bugcheck/reboot, but why would it do it a second time three hours
>> later?  I would think the settings would have been applied after the
>> first
>> reboot.
>>
>> This weekend I'm scheduling an offline defrag, and will be doing a
>> Windows
>> update for all "critical" updates, as well as updating the various HP
>> hardware drivers (this was scheduled before we started having problems).
>> I think I'm also going to reset the memory modules, just in case there's
>> a
>> bad memory connection.
>>
>> I think I'm also going to start preparing another Exchange server
>> (unfortunately it would not be with new hardware) just in case..
>>
>> Sorry about the length of this post.  Any suggestions on other stuff I
>> can
>> check, I would appreciate it.
>>
>> Mike O.
>>
>
>
Author
17 Sep 2007 10:05 PM
John Fullbright
If you have the dump file (memory.dmp) download the debugging tools.  run
windbg and set the symbol path to the symsvr, and load the dump file.  Do a
!Analyze, and it'll tell you the most likely source of the issue (driver
that caused the problem).


Show quote
"Mike O" <put_the_spam@the.can> wrote in message
news:OCDPj509HHA.484@TK2MSFTNGP06.phx.gbl...
> The drivers haven't been changed since around mid-late June.  That's what
> seems so odd about this, why did it take so long for the errors to pop up?
>
> I looked into the HPCISS driver issue.  It was released early 2006, but
> when I set this server up last May I applied all the driver and BIOS
> updates at that time, so I don't think that's an issue.
>
> We haven't had any reboots since the 2nd one on Thursday.   We have our
> monthly maintenance window tonight and tomorrow.  I'm currently running an
> offline defrag on the server due to all the mailbox moves and tomorrow
> morning I'm going to update all the HP drivers & do a "Windows Update" for
> anything critical.
>
>
> "John Fullbright" <fjohn@donotspamenetappdotcom> wrote in message
> news:%23YMfPk09HHA.5404@TK2MSFTNGP02.phx.gbl...
>> both bug codes are an IRQL not less or equal which leads me in the
>> direction of a kernal mode device driver.  Have you updated any drivers
>> lately?  On HP I think there was an issue with the HpCISSs2 driver a
>> while back.  Have you tried:
>>
>> http://h18007.www1.hp.com/support/files/server/us/download/27349.html
>>
>> or
>>
>> http://h18007.www1.hp.com/support/files/server/us/download/27348.html
Author
21 Sep 2007 1:01 AM
Mike O
Thank you everyone for all the information.

I did the HP driver update, Windows update, and Exchange BPA tuning, and
there haven't been any problems since the reboot on 9/13 (prior to my first
posting).  We've successfully migrated the remaining mailboxes and public
folders from one of our production Exchange servers.  I'll go through the
process to verify everything is off the system, then decommission the old
server next week.  I will check into the comments regarding the memory dump,
but hopefully it was just a unique combination of misconfigured memory
settings and drivers.
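
For anyone reading this in the archive: the two STOP lines from the original post can be broken down with a few lines of Python.  This is only a rough sketch, not a substitute for opening the memory dump in a debugger.  The function name decode_stop is made up for illustration, and treating the low byte of the second parameter as the IRQL is an assumption on my part (the logged value is 0xD0000002; the sketch does not interpret the high bits).

```python
# Names for the two bug check codes seen in this thread.
BUGCHECK_NAMES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x000000D1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}

# For both codes, parameter 3 is 0 for a read and 1 for a write.
ACCESS_TYPES = {0: "read", 1: "write"}


def decode_stop(code, params):
    """Label the four parameters of a 0x0A or 0xD1 bug check (hypothetical helper)."""
    if code not in BUGCHECK_NAMES:
        raise ValueError("only 0x0A and 0xD1 are handled in this sketch")
    address, irql, access, instruction = params
    return {
        "name": BUGCHECK_NAMES[code],
        "memory_referenced": "0x%08X" % address,
        "irql_raw": "0x%08X" % irql,
        # Assumption: the low byte carries the IRQL (2 would be DISPATCH_LEVEL).
        "irql_low_byte": irql & 0xFF,
        "access": ACCESS_TYPES.get(access, "other"),
        "instruction_address": "0x%08X" % instruction,
    }


# The two STOP lines from the event log, as posted at the top of the thread.
first = decode_stop(0x000000D1, (0x0000000C, 0xD0000002, 0x00000000, 0xB9C91819))
second = decode_stop(0x0000000A, (0x00000034, 0xD0000002, 0x00000001, 0xE0A7E4A9))

for result in (first, second):
    print(result)
```

Both crashes touch a very low address (0x0C and 0x34), which would be consistent with a driver dereferencing a near-null pointer, though that is a guess from the numbers alone.  Finding out which module owns the instruction addresses still needs the dump itself, for example with "!analyze -v" in WinDbg.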

The next phase of this project (mentioned in another post) is to do the same
kind of migration with our second old production Exchange server.
Unfortunately, I don't have new hardware to migrate this to, so I'll have to
settle for slightly upgraded hardware with a clean install & new configuration.

Thanks again.

Mike O.