Delivering data center power and understanding the issues around it can be challenging. We’ve put together this overview of regulation, power systems, and various state-of-the-art ways of addressing power delivery, power reliability, and power costs (“e” costs) associated with data centers and the data center industry.
Topics addressed include:
- The key elements in the critical power delivery system.
- Redundancy and challenges in the critical power delivery system.
- Power substations, Independent System Operators, and the U.S. transmission “grid”.
- Distinctions in the power delivery systems between TIA-942 Tier IV and Tier III data centers.
- Proper testing and maintenance protocols for power delivery systems and the overall data center.
- Shortcomings in the critical power delivery system and why and how every data center should be protected from them with specific state-of-the-art systems-level monitoring and management capabilities.
There is no more important aspect of a data center than its power and power quality. A block diagram of an actual AiNET TIA-942 Tier IV datacenter is shown below. In the diagram, power flow is shown from main utility power to delivery to Customer racks/servers. For clarity, it is separated into four sections: Main Power Input, Backup Power, Uninterruptible Power Supply (UPS), and Delivery.
The related power delivery system management diagram is not shown, but is represented by the sensors/controls and discussed below. It is important to note that the “red squares” and “blue dots” are absolutely necessary to develop a continuous and overall sense of the power system’s health and performance — as well as to predict impending maintenance and failure conditions.
Main Power Input Section
For the highest reliability, all modern, high-performance data centers supporting mission-critical data and applications have the utility Main Power delivered by at least two diverse feeds originating from separate utility substations. This represents the first step in power reliability and protection. When multiple feeders from diverse sources are present, the fault-tolerance demands on the data center, in practice, tend to reduce significantly. However, some data centers get dual feeds from a single substation, or only a single feed; this dramatically reduces the reliability of utility power. It is important to match the capabilities of a data center with your reliability requirements.
As illustrated, the substation is part of the utility’s electrical distribution system. A substation takes multiple transmission (13 kV – 765 kV, 3-phase) power feeds and combines/transforms them to medium-voltage (13-80 kV, 3-phase) power for distribution. Under normal conditions, both feeds should always be available at the data center; whenever both are missing or unusable, the data center must generate its own power.
To eliminate some of the risks associated with multiple independent power feeders, they are delivered synchronized (by regulation), so that if one feed is lost the switchover is nearly invisible to machinery (like air conditioning system chillers, air handlers and other large mechanical systems).
The Tie Switch is part of AiNET’s on-site substation, combining multiple independent feeders from larger substations. The Tie Switch directs either feed to either output or to both outputs. As suggested in the diagram, the Tie Switch is actually a paired (redundant) device so it is concurrently maintainable. The Tie Switch contains advanced electronics that monitor the quality of each feeder (phase, synchronization, voltage, etc.) and independently decide which feeder (or both) should be used to maximize uptime.
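The feeder-selection decision described above can be sketched in a few lines of Python. This is only an illustration, not AiNET’s Tie Switch logic: the FeederReading fields, the voltage window (0.95 to 1.05 of nominal) and the frequency tolerance are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class FeederReading:
    voltage_pu: float     # measured voltage as a fraction of nominal (1.0 = nominal)
    phases_present: int   # number of healthy phases detected (0 to 3)
    frequency_hz: float   # measured line frequency


def feeder_usable(r: FeederReading,
                  min_pu: float = 0.95,        # hypothetical lower voltage limit
                  max_pu: float = 1.05,        # hypothetical upper voltage limit
                  nominal_hz: float = 60.0,
                  max_hz_error: float = 0.5) -> bool:  # hypothetical tolerance
    """A feeder is usable only if all phases are present and within limits."""
    return (r.phases_present == 3
            and min_pu <= r.voltage_pu <= max_pu
            and abs(r.frequency_hz - nominal_hz) <= max_hz_error)


def select_feeders(a: FeederReading, b: FeederReading) -> str:
    """Return which feeder(s) to use: 'both', 'A', 'B', or 'none'."""
    a_ok, b_ok = feeder_usable(a), feeder_usable(b)
    if a_ok and b_ok:
        return "both"
    if a_ok:
        return "A"
    if b_ok:
        return "B"
    return "none"  # neither feed is usable, so backup power is needed
```

For instance, a healthy feeder A paired with a feeder B sagging to 80% of nominal voltage yields "A". A real device would also compare the two feeders against each other for synchronization before tying them together.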
Concurrently maintainable means that one part of the system can be serviced (or replaced, or fail) without affecting the normal operation of the remaining part. Maintainability is a key factor in proper data center operations and critical for high reliability.
The first Transformer takes the 3-phase, medium-voltage input power and transforms it to a lower voltage, typically 3-phase 480 volts (distribution power), where each phase measures 277 volts to neutral and the phases are separated by 120 degrees. 480 V power is considered standard for most commercial/industrial applications. Medium voltage and above are used only for transmission among power companies and between power companies and large customers.
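The 480-volt and 277-volt figures are two views of the same supply. In a balanced 3-phase system, the phase-to-neutral voltage is the phase-to-phase voltage divided by the square root of three:

```latex
V_{\text{phase-to-neutral}} = \frac{V_{\text{phase-to-phase}}}{\sqrt{3}}
  = \frac{480\ \text{V}}{\sqrt{3}} \approx 277\ \text{V}
```

The same relationship explains the pairing of 208 V and 120 V further down the power chain, since 208 V divided by the square root of three is approximately 120 V.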
Typically the Tie Switch, On-site Substation and Transformers in the Main Power section are owned and operated by the utility company. By contrast, AiNET owns and maintains these elements, which enables more frequent checks and maintenance than would otherwise be performed. For example, AiNET conducts infrared scans and power readings, and ensures Main Power synchronization, at higher levels than utility standards. AiNET’s ownership and control of these items allows AiNET to specify higher quality components and better physical security for them. In a traditional arrangement, the utility has a common set of keys and locks available to hundreds of utility personnel at any one time, an often undiscussed security risk.
All Main Power elements are in highly secure, vandalism-resistant enclosures protected by high-security fencing and under continual surveillance by infrared/low-light cameras. These areas are regularly patrolled to confirm proper operation and to verify that security controls are in place.
On the power grid: a common source of confusion is the use of the word “grid” when “substations” is meant. The U.S. transmission system (the grid) is operated by a number of interconnected Independent System Operators (ISOs). ISOs pool and control energy over large sections of the national grid, and manage energy markets within their regions. Examples of ISOs include the California ISO (CAISO), the New England ISO (ISO-NE) and PJM, which serves several mid-Atlantic states and extends into portions of the Great Lakes states. PJM and its component transmission distribution companies are shown in the figure below.
Note that transmission distribution, which is commonly regulated at the state level, is largely separated from electrical generation, which is largely deregulated. Many will see this separation even on household electric bills.
Backup Power Section
When both main power feeders are interrupted, a data center turns to its backup power. The backup power section is comprised of the backup power source, typically the Generators and their fuel, and a Transfer Switch (often also called Automatic Transfer Switch or ATS).
The Transfer Switch monitors Main Power and initiates an internal hold-time countdown upon detection of loss of Main Power. Only if the Main Power is unavailable longer than the hold-time will the Transfer Switch signal the Generator to engage. The Transfer Switch regards Main Power in a binary manner, as either totally available or totally unavailable. After the Generators are up to speed and stabilized (<30 seconds) they are switched into the power path. From the loss of Main Power until the Generator(s) are fully engaged, the UPS Section powers the data center.
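The hold-time behavior described above can be sketched as follows. This is a simplified model, assuming Main Power is sampled at a fixed interval as a plain available/unavailable flag; the function name, the sampling approach and the example values are hypothetical and not taken from any particular Transfer Switch.

```python
from typing import Optional, Sequence


def generator_start_time(main_power_ok: Sequence[bool],
                         hold_time_s: float,
                         sample_period_s: float) -> Optional[float]:
    """Return the elapsed time (s) at which the Generator is signaled, or None.

    The switch treats Main Power as binary. An outage must last longer than
    the hold time before the Generator is signaled; any return of Main Power
    resets the countdown.
    """
    outage_s = 0.0
    for i, ok in enumerate(main_power_ok):
        if ok:
            outage_s = 0.0                      # power is back: reset the countdown
        else:
            outage_s += sample_period_s
            if outage_s > hold_time_s:          # outage exceeded the hold time
                return (i + 1) * sample_period_s
    return None                                 # the Generator was never signaled


# A 2-second blip is ignored; a later 4-second outage exceeds the 3-second hold.
samples = [True, False, False, True, False, False, False, False]
print(generator_start_time(samples, hold_time_s=3.0, sample_period_s=1.0))  # 8.0
```

Note that a sag or a lost phase would still register as "available" in this model, so the countdown never starts for those conditions.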
The second Tie Switch shown in the diagram allows for any component on either side of the Main or Backup Power section to be replaced or maintained without service interruption or vulnerability to critical data center functions.
A problem in data center power is that a full outage is not the only shortcoming in Main Power; many other shortcomings, e.g. a stream of transients, loss or diminution of voltage, or loss or diminution of a phase, are not detected by the Transfer Switch. This problem is part of what the patented AiNET Critical Power Protection Supervisor resolves (discussed below).
The Generators as shown in the diagram are a true 2N redundancy solution for backup power, but other Generator configurations can be employed. The 2N approach depicted has the added benefit of being free of the problems attributable to starting, synchronizing, or throttle controls associated with teamed Generators.
For the highest level of data center operations, all Generators should be tested weekly, and at least monthly under full load. The best practice of high-quality data centers, such as AiNET, is to test with the full load of the data center itself, which exercises all systems (including transfer systems), rather than with an external, unrealistic load bank.
Tier IV tip: It’s not enough to simply test your generators and keep them fueled. It is important to keep them heated to at least 130°F at all times. This ensures a prompt “warm” start-up when needed, even in the middle of winter. This smooth start-up improves the quality of the electricity generated and reduces stress on your power system.
Generator Fuel is a critical element to generator operations, so the amount of fuel stored on hand is an important consideration. If the storage time is long for the fuel supply on hand, then the operator must take measures to ensure the fuel’s stability. AiNET, which maintains at least 7 days of runtime in underground storage tanks, ensures fuel stability by adding stabilizers, consuming fuel in thorough Generator tests, and recycling fuel. Most data center owner-operators, including AiNET, maintain fuel supply contracts, including SLAs, with multiple providers — these SLAs can require fuel and other supplies in as little as an hour.
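The relationship between stored fuel and runtime is simple arithmetic, sketched below. The burn rate used in the example (100 gallons per hour at full load) is a made-up figure for illustration only; actual consumption depends on the Generators and the load.

```python
def runtime_days(usable_fuel_gal: float, burn_rate_gal_per_hr: float) -> float:
    """Days of Generator runtime available from the fuel on hand."""
    if burn_rate_gal_per_hr <= 0:
        raise ValueError("burn rate must be positive")
    return usable_fuel_gal / burn_rate_gal_per_hr / 24.0


def fuel_needed_gal(target_days: float, burn_rate_gal_per_hr: float) -> float:
    """Fuel required to sustain the given burn rate for target_days."""
    return target_days * 24.0 * burn_rate_gal_per_hr


# Hypothetical example: 7 days of runtime at an assumed 100 gal/hr.
print(fuel_needed_gal(7, 100.0))      # 16800.0
print(runtime_days(16800.0, 100.0))   # 7.0
```

A calculation like this is most useful when rerun against the measured burn rate from full-load tests rather than a nameplate figure.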
Uninterruptible Power Supply Section
The UPS Section performs three key functions:
- Drawing energy from the Batteries (or other energy storage), the UPS Section provides power to the data center during Main Power interruption until the Generator(s) are fully engaged.
- Conditions/smooths power output to the Delivery Section.
- Using an internal bypass, distinct from the external maintenance bypass, withdraws itself from the critical power chain when it detects its own failure and during internal testing.
The UPS itself is a sophisticated, dual-conversion device. It accepts nominally 3-phase 480 V AC power, internally converts it to DC, supplements input power shortcomings with energy from the Batteries, and presents clean power to the Delivery Section. Its typical output is 3-phase 480 V. If there are shortcomings in input power, the UPS will draw energy from the Batteries until they are exhausted, at which point a portion of the data center, or all of it, could suffer an outage.
A byproduct of the dual conversion process of the UPS is a power loss of approximately 5-7%, even in a top-of-the-line UPS. (Caveat emptor – in light of this and other distribution losses, be wary of incredibly low PUE claims.)
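To see why UPS losses alone put a floor under PUE, consider a rough calculation. It assumes the quoted loss is a fraction of UPS input power and that all IT load is fed through the UPS; cooling, lighting and every other overhead are ignored, so real PUE would be higher. The function name is hypothetical.

```python
def pue_floor(ups_loss_fraction: float) -> float:
    """Lowest possible PUE when UPS conversion loss is the only overhead.

    PUE is total facility power divided by IT power. If the UPS passes
    (1 - loss) of its input to the IT load, then input / IT = 1 / (1 - loss).
    """
    if not 0.0 <= ups_loss_fraction < 1.0:
        raise ValueError("loss fraction must be in [0, 1)")
    return 1.0 / (1.0 - ups_loss_fraction)


for loss in (0.05, 0.07):
    print(f"UPS loss {loss:.0%}: PUE floor {pue_floor(loss):.3f}")
```

Under these assumptions the floor works out to roughly 1.053 at a 5% loss and 1.075 at a 7% loss, before any cooling or distribution overhead is counted.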
Typically, the UPS also provides the recharge for the Batteries. Battery recharge occurs at a very slow rate compared to discharge. This is why data center operations should be very sensitive to the level of energy stored in the Batteries.
All supplemental power delivered by the UPS Section is derived from energy stored in the Batteries. Batteries are electrochemical devices, and they age. Aging mostly manifests as reduced energy storage capacity and occasionally, sudden catastrophic failure. Therefore, it is very important that data centers monitor Battery health and have a preemptive Battery replacement policy. AiNET continually monitors Battery energy, inspects and tests every Battery monthly, and replaces every Battery every three (3) years despite a nominal 7-10 year rated warranty period.
The Delivery Section
In the Delivery Section, power is transformed from 3-phase 480 V to the working voltages in the data center and delivered there. Common working voltages delivered include 3-phase 208 V, single-phase 208 V, single-phase 120 V, and power rectified to -48 VDC.
A hallmark of AiNET is its avoidance of Power Distribution Units (PDUs) in the Delivery Section. Besides increasing PUE through additional power losses, PDUs introduce a single point of failure across multiple power delivery paths. The ANSI/TIA-942-2 Standard (Telecommunications Infrastructure Standard for Data Centers Addendum 2) acknowledges this by explicitly excluding PDUs from the requirement to support concurrent maintenance in the electrical system in Tier III data centers. Tier IV data centers are designed for complete system redundancy and maintainability, which typically precludes employing PDUs.
AiNET continually monitors delivered power for phase alignment, phase voltage, and power draw (amp draw).
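As a sketch of what such checks might look like (not AiNET’s monitoring system), the following computes a common measure of voltage unbalance, the largest deviation from the average phase voltage divided by that average, along with phase spacing error and amp draw. All thresholds and names are hypothetical.

```python
def voltage_unbalance_pct(va: float, vb: float, vc: float) -> float:
    """Percent unbalance: largest deviation from the average, over the average."""
    avg = (va + vb + vc) / 3.0
    return 100.0 * max(abs(v - avg) for v in (va, vb, vc)) / avg


def phase_angle_error_deg(angle_a: float, angle_b: float, angle_c: float) -> float:
    """Largest departure (degrees) of the phase spacing from the ideal 120."""
    a = sorted(x % 360.0 for x in (angle_a, angle_b, angle_c))
    spacings = [a[1] - a[0], a[2] - a[1], 360.0 - a[2] + a[0]]
    return max(abs(s - 120.0) for s in spacings)


def delivery_alarms(volts, angles_deg, amps, breaker_amps,
                    max_unbalance_pct=2.0,     # hypothetical threshold
                    max_angle_err_deg=5.0,     # hypothetical threshold
                    max_load_fraction=0.8):    # hypothetical threshold
    """Return a list of alarm names for one 3-phase circuit."""
    alarms = []
    if voltage_unbalance_pct(*volts) > max_unbalance_pct:
        alarms.append("voltage unbalance")
    if phase_angle_error_deg(*angles_deg) > max_angle_err_deg:
        alarms.append("phase alignment")
    if max(amps) > max_load_fraction * breaker_amps:
        alarms.append("amp draw")
    return alarms
```

For example, a circuit reading 208, 208 and 190 volts with one phase drawing 29 A on a 30 A breaker would raise both the voltage unbalance and amp draw alarms under these assumed limits.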
Where Things Go Wrong
As owner-operator of data centers for over 15 years, and as a data center tenant prior, AiNET sees three main areas responsible for power outages at data centers:
- Insufficient/inconsistent Operational Testing and Monitoring.
- Lack of Redundancy in Design and Implementation.
- Lack of System-Level Management of the Critical Power Chain.
As discussed, examples of insufficient operational testing and monitoring include poor/infrequent testing of Generators (and all connected systems) under full data center load, inattention to viability of Generator Fuel, and an inadequate Battery-replacement cycle.
It’s true that redundancy adds cost, but it’s not as costly as downtime. AiNET suggests your organization always demand to see a thorough block diagram of the critical power chain of any data center under consideration, and then ask whether there is any single point of failure, such as a PDU. A TIA-942 Tier III data center is permitted to have a single point of failure in the distribution systems serving electrical equipment or mechanical systems (whereas a TIA-942 Tier IV data center, such as AiNET, cannot have any).
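Checking a block diagram for single points of failure can be done by hand, or mechanically by treating the diagram as a directed graph: remove one element at a time and see whether power can still reach the rack. The sketch below does exactly that. The example topology and element names are invented for illustration and do not represent any real facility.

```python
from collections import deque


def reachable(graph, sources, target, removed=None):
    """True if target can be reached from any source with one node removed."""
    queue = deque(s for s in sources if s != removed)
    seen = set(queue)
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, ()):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(graph, sources, target):
    """Elements whose loss alone cuts every power path to the target."""
    nodes = set(graph) | {n for outs in graph.values() for n in outs}
    return sorted(n for n in nodes
                  if n != target
                  and not reachable(graph, sources, target, removed=n))


# Hypothetical chain: two feeds and two UPS paths that converge on one PDU.
chain = {
    "feed_A": ["ups_A"], "feed_B": ["ups_B"],
    "ups_A": ["pdu"], "ups_B": ["pdu"],
    "pdu": ["rack"],
}
print(single_points_of_failure(chain, ["feed_A", "feed_B"], "rack"))  # ['pdu']
```

If both UPS paths instead fed the rack directly, the same check would return an empty list.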
The third item, the lack of system-level management in the critical power chain, is a universal problem for all data centers and critical power environments in the world – except those that have installed new technology, discussed below, specifically developed to address this problem.
System-Level Critical Power Chain Management
Even in well-designed systems, the burden of monitoring and ensuring the delivery of power is placed on individual elements. Mostly this burden falls on the Transfer Switch to detect and act upon (engage the Generator) obvious power failures, or on the UPS to detect and address other power shortcomings. The problem is that no element has a systems-level view of critical power management. For example, what happens when continual, subtle power transients are ignored by the Transfer Switch but overdraw the energy stored in the Batteries in the UPS Section? This describes a class of problems that routinely afflict critical power installations.
AiNET noticed these problems, and after failing to find an available solution, developed one. The newly patented AiNET Critical Power Protection Supervisor provides a new level of protection for critical power systems including data centers, hospitals and military applications. AiNET believes that in the next 10 years, nearly all critical power installations in the world will have adopted it.
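The transient scenario above can be illustrated with a toy simulation. It is not a model of the AiNET Critical Power Protection Supervisor; it only shows how repeated disturbances, each too short to trip the Transfer Switch, can drain Batteries that recharge slowly. Every number below is hypothetical.

```python
def battery_after_transients(capacity_kwh: float, load_kw: float,
                             recharge_kw: float, disturbance_s: float,
                             gap_s: float, count: int) -> float:
    """Battery energy (kWh) left after `count` short disturbances.

    Each disturbance is assumed to be shorter than the Transfer Switch hold
    time, so the Generator never starts and the Batteries carry the load.
    """
    energy = capacity_kwh
    for _ in range(count):
        energy -= load_kw * disturbance_s / 3600.0       # discharge during the transient
        if energy <= 0.0:
            return 0.0                                   # Batteries exhausted
        # Slow recharge during the quiet gap, capped at full capacity.
        energy = min(capacity_kwh, energy + recharge_kw * gap_s / 3600.0)
    return energy


# Hypothetical: 1000 kW load, 250 kWh of Batteries, 25 kW recharge,
# 5-second transients separated by 25-second gaps.
for n in (100, 300):
    left = battery_after_transients(250.0, 1000.0, 25.0, 5.0, 25.0, n)
    print(f"after {n} transients: {left:.1f} kWh remaining")
```

With these assumed values, about half the stored energy is gone after 100 transients and the Batteries are empty well before 300, even though no single event ever looked like an outage to the Transfer Switch.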
* * * * *
This article is on critical power delivery with a focus on data centers. It has segregated the critical power chain into functional sections, discussing each of them from the perspective of a data center operator dealing with maintaining power delivery under all circumstances.
It has highlighted the areas most commonly at the root of power failures, introduced a newly patented solution that addresses some of the more subtle, problematic failure modes, and discussed AiNET’s specific processes and solutions for the problems plaguing an industry with less-than-ideal power/uptime histories.
Readers may also be interested in thorough block diagrams with commentary of the power delivery and cooling systems of AiNET’s TIA-942 Tier IV data center in the Washington, DC area.