Outage

2009.04.06

At roughly midnight on Apr. 4, melon.org stopped responding to pings and other network traffic. Physical access to the server was delayed until late on the night of Apr. 6. After a few false starts bringing it back up, the problem was traced to a corrupt journal on the root filesystem, which was crashing the system (ironically, filesystem journals are intended to speed up recovery after a crash). The corrupt journal has been disabled and will stay disabled until it is clear whether there is an underlying hardware problem and, if there is, until that problem has been fixed.

All services resumed at around 3:40am on Apr. 6.

If the cause is a failing disk in the root filesystem’s array, there will be some scheduled downtime in the near future to replace that disk.