The city of Calgary is recovering from the impact of a data center fire that crippled city services and delayed hundreds of surgeries at local hospitals. The explosion and fire last Wednesday in a Shaw Communications facility knocked out both the primary and backup systems that supported key public services for local government and medical institutions.
The incident serves as a wake-up call for government agencies to ensure that the data centers that manage emergency services have recovery and failover systems that can survive a series of adversities - the "perfect storm of impossible events" that combine to defeat disaster management plans.
The outage took out the city's 311 emergency services and Alberta's property and vehicle information databases, which were maintained in an IBM data center at the Shaw building. The outage also knocked out a medical computer network for Alberta Health Services, forcing the postponement of hundreds of elective surgical procedures. Service was restored late Friday, but the health service said it would take some time to work through the backlog of procedures.
The problems began when a transformer exploded at Shaw's downtown Calgary headquarters on Wednesday afternoon. The fire set off the building's sprinkler system, which took out the backup systems, which were housed on site. On Saturday morning, Shaw officials said service had been restored to all customers.
IBM Canada, which provides many services for the province of Alberta, reportedly had to fly backup tapes holding vehicle and property registration data to a backup facility in Markham, Ontario.
Calgary family doctor Dr. John Fernandes was among those whose practice was affected by the 36-hour outage. "It's daunting because, in my practice, I have a lot of frail, sick, elderly folks. I have organ transplant recipients and people on dialysis," Fernandes told the Calgary Herald. The physician said his office maintains a server that mirrors the off-site server, and he wonders why the province's health superboard didn't take similar precautions against a system failure of this magnitude.
But it's not the only recent example of an incident taking out key government resources.
IT systems in Dallas County were offline for more than three days last week after a water main break flooded the basement of the Dallas County Records Building, which houses the UPS systems and other electrical equipment supporting the data center on the fifth floor of the building. The county did not have a backup data center, despite warnings that it faced the risk of service disruption without one.
By Matt McClure, Calgary Herald July 18, 2012
CALGARY - As the province's land and vehicle registries finally came back online Tuesday after six days in the dark, the telecommunications giant behind the data disaster apologized for the massive disruption.
Shaw Communications Inc. still doesn't know what caused the fire and resulting power outage at its downtown Calgary headquarters, but president Peter Bissonnette vowed the company would learn from the incident and do better.
"To the extent we've had a negative impact on many Albertans and our customers, we apologize for that," Bissonnette said in an interview.
"We're looking at ways we can beef up the redundancies that existed beforehand and make them even more synonymous with reliable service."
It began with an explosion in a 13th-floor electrical room supplied by the 26,000-volt lines that powered Shaw's hub and three floors of servers that tenant IBM Canada Ltd. uses to run data systems for Alberta Health Services, ATB Financial Ltd. and Service Alberta.
Meanwhile, the top-storey fire had triggered the building's water sprinkler system, which ran for more than two hours, soaking furniture, walls and sensitive electronic equipment on the floors below as it cascaded through the building.
While other data centres depend on chemical and gas systems to put out fires, the company said the water sprinklers it counted on were what the building code required.
"There are hybrid systems that can start to suppress the fire with non-water based solutions," said Jay Mehr, Shaw's vice-president of operations.
"As you get into the magnitude of what happened, we'll find out if there was a better way to put out a fire of that magnitude."
Competitor Q9 Networks Inc., which operates three data centres in the city, has early detection systems that can detect smoke and use gases that don't harm people to prevent fires from starting.
"Water needs to be there by law," said Q9 chief executive Osama Arafat, "but you want to try and deal with a fire using less destructive means first."
As a result of the blaze, about 900 Shaw employees will now be temporarily relocated to seven other company locations around the city for three months while the damage is repaired.
IBM was already a tenant when Shaw purchased the building 16 years ago, but Bissonnette said the company is now reconsidering whether the office tower is a suitable location for a data centre.
"We're looking at our own operations and should we even have a data centre here," Bissonnette said.
For now, the building's computer systems are running on backup power. Services that provided information on hospital patients to AHS were restored by IBM late last week, while the land titles and motor vehicle registries started running again early Tuesday.
Service Alberta Minister Manmeet Bhullar has promised his department will conduct a full review of the province's data services, but the Wildrose opposition says the investigation should be done by an independent, outside firm.
© The Calgary Herald
Travis Wright | Apr 01, 2010 |
A Google data center in Seattle catches fire, causing a domino failure effect.
A fire at a Seattle data center nicknamed "The Googtopia" has affected over 42,000 servers, bringing down search engine results pages across the globe. The blaze started after an electrical short on a CO2 handler caught fire, taking out three walls surrounding the centre's electrical equipment room. So far the team has established that the servers themselves are undamaged and it looks likely that no data has been lost. "We have just been allowed into the building to physically inspect the damage," said the Googtopia team in a company blog.
According to their statement, Google returned 404 errors (page not found) for over an hour in many locations around the world after a fire in one of their data centers, but not because of the fire itself. According to Google, the fire started as only a small blaze caused by systems designed to reduce CO2 emissions. Ironically, the worst damage was caused by the fire suppression sprinklers and not the fire.
"Early indications are that the short was in a high-volume wire conduit. We were not allowed to activate our back-up generator plan based on instructions from the fire department."
Google is now working round the clock to fix the problem and is using parts from the company's seventeen other data centers to re-establish the affected sites, which include popular British online forum B3ta.com. The team had hoped to get the sites online yesterday but continuing fire service action and network issues have extended the problem.
Nearly 50,000 Google servers were affected across its data centers.
While data loss was minimized by Google's swift reaction with a replacement router, Mark Aaron Murnahan of A Web Guy reported that many advertisers were falsely charged for AdWords and that Google expects it will take some time to determine which advertisers were affected and the value of the lost ads.
Where are Google's data centers located?
Google has disclosed the sites of four new facilities announced in 2007, but many of its older data center locations remain under wraps. Much of Google's data center equipment is housed in the company's own facilities, but it also continues to lease space in a number of third-party facilities. Much of its third-party data center space is focused around peering centers in major connectivity hubs. Here's our best information about where Google is operating data centers, building new ones, or maintaining equipment for network peering. The data centers in bold were affected by the fire, which caused a domino effect of failures in other data centers.
By: Rich Miller
October 10th, 2011
The long main hallway of the NYSE Euronext data center provides a sense of the immense scale of the 400,000 square foot facility in New Jersey. (Photo: Rich Miller)
The New York Stock Exchange says it expects a normal trading day after a small fire Sunday at its data center in Mahwah, New Jersey, which supports computer systems that are critical to the U.S. financial markets. The NYSE says the incident briefly interrupted connectivity for some customers, but should have no impact on its trading operations for Monday.
"On Sunday, there was an isolated electrical fire that was quickly extinguished within a single computer cabinet at our Mahwah data center," the NYSE said in a statement. "The incident, which resulted in no injuries, affected communications connectivity to 58 customers who have been notified that we are testing all systems and expect completely normal operations for Monday's market open."
The 400,000 square foot data center in Mahwah serves as the nerve center for the NYSE's electronic trading operations. The facility is staffed around the clock by employees trained to respond to electrical fires and other emergencies. Like most major data centers, the NYSE facility has systems that monitor temperature in data halls and provide alerts in the event of sudden changes. Data centers also are equipped with sensitive fire detection systems triggered by the presence of smoke or heat, which tie into fire suppression systems.
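The alerting behavior described above can be illustrated with a minimal sketch. This is purely hypothetical code, not NYSE's monitoring system: it flags any sudden jump between consecutive temperature samples, which is the kind of "sudden change" alert the article mentions. The function name and threshold are invented for illustration.

```python
# Illustrative sketch only: flag sudden temperature jumps in a data hall.
# Names and thresholds are hypothetical, not from any real facility.

def check_readings(readings, max_delta=5.0):
    """Return indices where the temperature rose more than max_delta
    degrees between consecutive samples."""
    alerts = []
    for i in range(1, len(readings)):
        if readings[i] - readings[i - 1] > max_delta:
            alerts.append(i)
    return alerts

# A slow drift raises nothing; the sudden spike at sample 3 does.
samples = [22.1, 22.4, 22.6, 31.0, 31.2]
print(check_readings(samples))  # [3]
```

In practice such monitoring would also watch absolute limits and smoke-detector inputs, but the rate-of-change check captures the "sudden changes" trigger the article describes.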
The NYSE didn't provide details on how the fire was extinguished, but it appears that the event was quickly contained.
"We are grateful for the quick and thorough response by the Mahwah Fire and Police Departments as well as our Mahwah data center staff," the exchange said.
Fires within data centers can cause major downtime for customers, as seen in 2009 outages at Fisher Plaza and 151 Front Street. That's not always the case, however: in a 2010 incident at Terremark's NAP of the Capital Region in Virginia, a fire broke out in one of the data center's electrical rooms, but the facility remained online throughout the entire event, with no downtime for customers.
Fortunately, the event at the NYSE was small. Another factor protecting modern data center facilities is the use of "pod" designs that segment the power and cooling systems to limit the impact of electrical events. The Mahwah facility includes three data center pods. Each is approximately 20,000 square feet, and has dedicated power and cooling systems so that a failure in one pod won't affect operations of the other data halls.
On March 19, 2008, a Wisconsin data center was wiped out in a fire, leaving many business web sites offline. At the data center in Green Bay, servers, routers, and switches were destroyed at Camera Corner, a business that offered web hosting and other IT services. It took 10 days to get customer web sites back online.
This picture was taken from ITRM's website.
A total of 75 servers were destroyed, and CEO Rick Chernick indicated that the company had no live backup plan. More details are on the DataCenterKnowledge website.
9,000 servers temporarily affected but should be up and running soon
by KathrynV Sunday 01-Jun-2008
The Planet is a dedicated server hosting provider that operates six large data centers in Texas. One of its Houston-based data centers suffered a fire last night that has left all servers there temporarily down, which means that approximately 7,500 customers can't access their websites. One of those customers is Entrecard, a popular social networking and advertising service for bloggers. The fire apparently didn't do any actual damage to the servers, but it caused the power to go out, which took down the service. As of this morning, they don't have an estimated time of repair but are providing hourly updates to customers here.
by Daniel Tsang | July 3, 2009 |
This morning, the Authorize.net payment gateway went offline preventing thousands of vendors from accepting credit card payments including many FreshBooks customers. With their website and phone system inaccessible, Authorize.net has been fielding questions with their Twitter account this morning.
For FreshBooks customers using the Authorize.net gateways to automatically bill their customers, Authorize.net transactions that failed this morning will automatically be retried tomorrow morning. You will know this has happened as the status of your Authorize.net invoices will have changed to 'retry'.
However, if Authorize.net is not back up by tomorrow morning, your invoices will change to the 'failed' status and the status will become a red active link. The 'failed' status link allows you to retry the payment at any time in the future, giving you the option to retry the transaction when Authorize.net becomes available.
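The retry flow described above amounts to a small status transition. The sketch below is a hypothetical illustration, not FreshBooks code: the status names 'retry' and 'failed' come from the description above, while the function name, the "paid" status, and the gateway flag are invented for this example.

```python
# Illustrative sketch of the invoice retry flow described above.
# Only the 'retry' and 'failed' statuses are from the article; the
# rest (names, the "paid" outcome, gateway_up flag) is hypothetical.

def next_status(current_status, gateway_up):
    """Advance an invoice's status after one overnight retry cycle."""
    if current_status == "retry":
        # The retry succeeds if the gateway is reachable again;
        # otherwise the invoice drops to 'failed', which remains
        # manually retryable via the red active link.
        return "paid" if gateway_up else "failed"
    return current_status

# Gateway still down the next morning: the invoice lands in 'failed'.
print(next_status("retry", gateway_up=False))  # failed
# Gateway recovered overnight: the automatic retry goes through.
print(next_status("retry", gateway_up=True))   # paid
```

The key design point is that 'failed' is not terminal: it stays an actionable link, so a merchant can re-run the charge whenever the gateway recovers.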
Please note that although Authorize.net remains offline, all of our other payment gateways, such as PayPal, Google Checkout, and Linkpoint, continue to function normally.
Puget Sound Business Journal, by Cook and Bishop
Date: Friday, July 3, 2009, 7:07am PDT
We first got word of the fire early this morning from the online real estate service Redfin, which suffered an outage last night. Redfin's Web site was back up this morning, but we've noticed other sites that are experiencing problems. KOMO TV and radio broadcasts also were impacted, and we've noticed that Seattle's AllRecipes.com was offline too. Reports are circulating that Verizon's FiOS service also is down, though it was unclear whether that was tied to the Fisher Plaza fire.
A blown transformer appears to be the culprit, according to a message posted on the Web site of AdHost.
"Beginning at approximately 11:18 PM on July 2nd and continuing through the present time Fisher Plaza experienced a significant power event that required all power systems including street power, UPS, and Generator power to be completely shut down in Plaza East.
The event is ongoing and, at this point, we do not have an ETA for service restoration. Please note that Adhost's Plaza West facility is not affected by this event. The Adhost phone system is not operational as a result of this event, but we are fully staffed and responding to emails sent to email@example.com.
Until further notice this will be the only conduit for communicating with Adhost. Please be patient with any requests sent into the Support email. We will process all requests as quickly as we can and we promise to answer every query.
If you have specific needs for assistance for powering up your devices, please send email to Support with specific instructions. We will do our best to accommodate your requirements. We apologize for the nature and extent of this event and are doing everything possible to restore service as quickly as possible."
Short circuit in Mumbai knocks out mobile network
6 January 2012 by Yevgeniy Sverdlik - DatacenterDynamics
A fire broke out at a Mumbai-area data center of the Indian telecommunications company Airtel in late December creating a network outage and disrupting mobile services for many of the company's customers for several hours, Indian newspaper The Economic Times reported.
The fire in the company's data center in Malad, a Mumbai suburb, led to the shutdown of servers used to route calls in the region. It was reportedly caused by a short circuit.
The data center is a central point of presence for the company's Western region, which includes Mumbai, Maharashtra, Goa, Madhya Pradesh, Chhattisgarh and Gujarat.
On the day of the outage, Airtel released a statement, saying, "There was a network outage in the Western region this morning as a result of a fire in our central POP (point of presence) location in Malad, Mumbai. Our teams have been working since morning to normalize all affected services."
According to the report, the company's data services - and especially data services to many corporate customers in the region - were hit hardest.
About 17% of Airtel's entire customer base is located in the affected region. Chhattisgarh and Gujarat were the only two states in the Western region that were not affected by the outage.
Airtel has faced a similar incident in the past. The Economic Times reports that in 2008, a short circuit at another one of the company's Mumbai-area data centers led to a large fire, causing an outage that lasted more than 12 hours.
By Winston W. Parley, 1 August 2012
Barely a week after a bloody student riot at the University of Liberia marred by stone-throwing supporters of rival campus-based political parties, a raging fire Monday engulfed the UL Electronic Data Processing (EDP) center, destroying several valuables.
The extent of the damage has not been disclosed as authorities at the University say an investigation has been launched into the incident, which seems to have generated serious concern here.
It has not been established whether or not computers were damaged, as the fire left marks on the windows through which students usually submit their registration documents to EDP staff for storage into the database. The roof and parts of a nearby building [Varsity Christian Fellowship Center] facing the EDP center were also affected though not as badly as the EDP building.