Title: [网普科技] Explanation and Apology Regarding the 2011-04-19 Seattle Data Center Network Failure

Author: 网普科技     Time: 2011-4-19 09:46 PM    Title: [网普科技] Explanation and Apology Regarding the 2011-04-19 Seattle Data Center Network Failure

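Summary

Time window: 2011-04-19 12:00-16:00 Beijing time
Network outage: approximately 1.5 to 2 hours
Affected servers: servers in the Seattle data center and the sites hosted on them
Unaffected services: all Dallas servers and the sites hosted on them
Cause: failure of network equipment inside the data center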


Details

At about 12:25 Beijing time we received a report that hosts on the Seattle servers were unreachable.
Our own tests confirmed the problem.

We contacted the data center, which reported a failure of its internal network equipment:
CODE:
Hello,

We have identified a forwarding issue for customers behind fcr01.sea01 that is causing intermittent connectivity problems. Network Engineering is applying a manual fix in attempt to resolve.

The server will be accessible soon.

The data center also posted an announcement about the issue:
CODE:
SoftLayer Engineers are aware of the sporadic packetloss and/or connectivity to FCR01.SEA01. Currently engineers are working on resolving this issue; however, there is a chance a reboot to the router will be required. In the event this needs to happen, a notice will be posted here along with any other information gathered.

The data center's engineers then worked on the problem:
CODE:
-- Update --
Service has been restored to customers behind FCR01.SEA01. During the process of working on the router, the issue manifested itself into 100% CPU resulting in upstream and downstream links going down along with routing protocols. Engineers were able to stabilize the router without a reload, and are currently monitoring it to determine if the fix is permanent.

Shortly afterwards, however, they went on to upgrade the router's IOS version:
CODE:
-- UPDATE --
Engineers have determined this is not a permanent fix for the FCR01.SEA01 issue. During the course of troubleshooting this issue with Cisco, it has been determined that the best course of action is to upgrade the router to the latest IOS version. This will be happening at approximately 01:30 CDT.

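During the operations above, access to the hosts was interrupted in two major episodes, with a brief recovery in between.
Based on our logs and traffic graphs, we estimate that the servers were unreachable for roughly 1.5 to 2 hours in total.

After the operations above, access to the hosts returned to normal.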
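
As a rough illustration of how a total like this can be derived from monitoring logs, the sketch below sums the intervals during which a periodic reachability probe failed. The `total_downtime` helper and every timestamp in `samples` are hypothetical, invented purely for illustration; they are not the actual log data behind the estimate above.

```python
from datetime import datetime, timedelta

def total_downtime(samples):
    """Sum the time spent unreachable across consecutive probe samples.

    samples: list of (timestamp, is_up) tuples sorted by time. The interval
    between two samples is attributed to the state of the earlier sample.
    """
    down = timedelta(0)
    for (t0, up0), (t1, _) in zip(samples, samples[1:]):
        if not up0:
            down += t1 - t0
    return down

# HYPOTHETICAL probe results (Beijing time), not real log data:
# two outages separated by a brief recovery.
fmt = "%Y-%m-%d %H:%M"
samples = [
    (datetime.strptime("2011-04-19 12:00", fmt), True),
    (datetime.strptime("2011-04-19 12:25", fmt), False),
    (datetime.strptime("2011-04-19 13:10", fmt), True),
    (datetime.strptime("2011-04-19 14:30", fmt), False),
    (datetime.strptime("2011-04-19 15:30", fmt), True),
    (datetime.strptime("2011-04-19 16:00", fmt), True),
]

print(total_downtime(samples))  # 1:45:00
```

With coarse probe intervals the result is only as precise as the sampling period, which is one reason an estimate of this kind is usually quoted as a range.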

Latest update from the data center:
CODE:
--UPDATE--
Engineers are continuing to work with Cisco TAC on this issue. At this point, the router has been restored to service at approximately 03:20 CDT. Some customers may continue to experience intermittent packetloss behind FCR01.SEA01.
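
However, our own tests from multiple locations show that our Seattle hosts are back to normal.
(The packet loss seen from some regions in China is caused by problems on networks within China today; users outside China are not affected.)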
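
The data center's notices give times in CDT, while the time window in this post is in Beijing time. A minimal sketch of the conversion follows, using fixed offsets (CDT is UTC-5, Beijing time is UTC+8). The `cdt_to_beijing` helper is hypothetical, and the calendar date 2011-04-19 is an assumption, since the notices state only the time of day.

```python
from datetime import datetime, timedelta, timezone

CDT = timezone(timedelta(hours=-5), "CDT")        # US Central Daylight Time
BEIJING = timezone(timedelta(hours=8), "UTC+8")   # Beijing time

def cdt_to_beijing(hour, minute, year=2011, month=4, day=19):
    # The date defaults are an assumption; the notices give only HH:MM CDT.
    local = datetime(year, month, day, hour, minute, tzinfo=CDT)
    return local.astimezone(BEIJING)

# Planned IOS upgrade and reported restoration time from the notices above.
print(cdt_to_beijing(1, 30).strftime("%Y-%m-%d %H:%M"))  # 2011-04-19 14:30
print(cdt_to_beijing(3, 20).strftime("%Y-%m-%d %H:%M"))  # 2011-04-19 16:20
```

Under that date assumption, the upgrade at about 01:30 CDT corresponds to about 14:30 Beijing time, and the restoration at about 03:20 CDT to about 16:20 Beijing time.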



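We sincerely apologize for the inconvenience this has caused our customers.
We will continue to monitor this issue and post updates promptly.
Users with questions can contact us at any time; for urgent matters, please call us.

All Dallas servers and the sites hosted on them were unaffected, so users need not worry.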
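Author: 网普科技     Time: 2011-4-19 10:24 PM
Users accessing our hosts may still experience packet loss, slow speeds, or failed connections at the moment.
However, this is not caused by the Seattle data center failure.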

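Today, access from within China to many overseas sites is showing the same packet loss, slow speeds, and failed connections.
Tests from outside China are all normal, and customers outside China are not affected.
Please rest assured.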
Author: 网普科技     Time: 2011-4-21 09:24 AM
After two days of observation, the Seattle DC network has been stable and fast,
with no further anomalies.




Welcome to 网普技术论坛 (http://bbs.netpu.net/) Powered by Discuz! 2.5