I still have to set up the sensors to read the CPU temperature,
but I got some additional output that seems to be linked to the
problem:
Sep 4 16:29:32 camera1 kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Sep 4 16:29:32 camera1 kernel: Tx Queue <0>
Sep 4 16:29:32 camera1 kernel: TDH <6>
Sep 4 16:29:32 camera1 kernel: TDT <49>
Sep 4 16:29:32 camera1 kernel: next_to_use <49>
Sep 4 16:29:32 camera1 kernel: next_to_clean <6>
Sep 4 16:29:32 camera1 kernel: buffer_info[next_to_clean]
Sep 4 16:29:32 camera1 kernel: time_stamp <66014>
Sep 4 16:29:32 camera1 kernel: next_to_watch <b>
Sep 4 16:29:32 camera1 kernel: jiffies <667eb>
Sep 4 16:29:32 camera1 kernel: next_to_watch.status <0>
Sep 4 16:29:34 camera1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Sep 4 16:29:36 camera1 kernel: e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex
I have no idea if it's bad or very bad.
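To put the numbers in the dump above into perspective, here is a minimal sketch in Python. It assumes the driver prints these fields in hexadecimal (values such as "b" and "667eb" suggest so) and that the kernel tick rate is HZ=1000; neither is confirmed anywhere in this thread, and the helper names are made up for illustration.

```python
# Values copied from the "Detected Tx Unit Hang" dump above.
# ASSUMPTION: the driver prints them in hexadecimal.
dump = {
    "TDH": "6",             # descriptor head as reported
    "TDT": "49",            # descriptor tail as reported
    "time_stamp": "66014",  # when the stuck buffer was queued
    "jiffies": "667eb",     # kernel tick counter at the time of the report
}

# ASSUMPTION: kernel tick rate; not stated in the thread.
ASSUMED_HZ = 1000


def parse_hex(value):
    """Interpret a dump field as a hexadecimal integer."""
    return int(value, 16)


def pending_descriptors(d):
    """Descriptors queued but not yet processed (no ring wrap-around handled)."""
    return parse_hex(d["TDT"]) - parse_hex(d["TDH"])


def stalled_jiffies(d):
    """Ticks elapsed since the stuck buffer was queued."""
    return parse_hex(d["jiffies"]) - parse_hex(d["time_stamp"])


if __name__ == "__main__":
    ticks = stalled_jiffies(dump)
    print("pending descriptors:", pending_descriptors(dump))
    print("stalled for", ticks, "jiffies, about",
          ticks / ASSUMED_HZ, "s at the assumed HZ")
```

With the values above this gives 67 pending descriptors and 2007 jiffies, i.e. roughly two seconds under the assumed HZ, which would fit the watchdog timeout logged two seconds later.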
--
Mitja
Tony Nelson wrote:
At 4:24 PM +0200 9/4/06, Mitja Mihelic wrote:
We're running FC4 with the 2.6.16-1.2115_FC4 kernel on an Intel-based
server with a 3 GHz P4 HT CPU.
It runs OK for a while, then starts to output messages at a
rate of about 65 per second, and then freezes.
Here's a sample output:
Sep 4 10:39:26 camserver kernel: Do you have a strange power saving mode enabled?
Sep 4 10:39:26 camserver kernel: Uhhuh. NMI received for unknown reason 29 on CPU 0.
Sep 4 10:39:26 camserver kernel: Dazed and confused, but trying to continue
Sep 4 10:39:26 camserver kernel: Do you have a strange power saving mode enabled?
Sep 4 10:39:26 camserver kernel: Uhhuh. NMI received for unknown reason 39 on CPU 0.
Sep 4 10:39:26 camserver kernel: Dazed and confused, but trying to continue
Sep 4 10:39:26 camserver kernel: Do you have a strange power saving mode enabled?
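The message rate mentioned above can be checked by counting kernel log lines per timestamp. This is only a rough sketch in Python, assuming the usual syslog layout shown in the sample ("Mon D HH:MM:SS host kernel: ..."); the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the start of a syslog line written by the kernel, e.g.
# "Sep 4 10:39:26 camserver kernel: ..."; the timestamp is captured.
STAMP = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:")


def kernel_lines_per_second(lines):
    """Count kernel log lines for each one-second timestamp."""
    counts = Counter()
    for line in lines:
        match = STAMP.match(line)
        if match:
            # Collapse runs of spaces so "Sep  4" and "Sep 4" count together.
            counts[" ".join(match.group(1).split())] += 1
    return counts
```

It could be fed the lines of the system log (on this setup probably /var/log/messages, which is an assumption) and the busiest seconds listed with `most_common()`.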
The obvious question is: what is this thing anyway, and how do we
make it go away?
WAG: is the CPU too hot?