To make a long story short: it was a dead disk in the RAID array. After 20 hours it's finally replaced, and I hope it will take only 6-8 more hours to synchronize.
Then we can finally get back to the original problem – periodic disk overload.
See also the "Server error, address not available" forum thread.
Sleepless nights, wasted hours… Yeah, back to the demotivators.
The main idea of RAID is that you can swap a disk without losing any data or server uptime.
And the main idea of admins and monitoring is to notice in time that the RAID is in a degraded state.
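For what it's worth, here is a minimal sketch of such a check. It assumes Linux software RAID (md) and the usual /proc/mdstat layout, which nothing in this thread confirms is what the server runs; a hardware controller would need its vendor tool instead. The function name degraded_arrays and the sample text are made up for illustration.

```python
import re

# Hypothetical sample in the usual /proc/mdstat layout (not taken from the real server).
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sdb2[1](F) sda2[0]
      488279488 blocks [2/1] [U_]

unused devices: <none>
"""

def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose member status shows a missing disk.

    In mdstat, a status such as [UU] means all members are up,
    while an underscore, as in [U_], marks a failed or missing member.
    """
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S*)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[([U_]+)\]", line)
        if current and status:
            if "_" in status.group(1):
                degraded.append(current)
            current = None  # one status line per array
    return degraded

if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the sample on systems without md
    bad = degraded_arrays(text)
    print("DEGRADED: " + ", ".join(bad) if bad else "all arrays healthy")
```

Run from cron and wired to mail or a pager, something like this would at least flag a degraded array within minutes instead of hours.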
Well, the RAID at my work does phone home by itself…
Some days ago a Siemens technician arrived and told us that he had to swap a disk… Our EMC did phone home; there had been a message on a console, but no one noticed it 😀
“krinlyc” wrote:
My point. In a 24x7x365 datacenter everything must be properly monitored.