Data centers generate a tremendous amount of heat – the computer and storage systems housed within them exhaust as much as 90% of their input power as heat. This heat has to be mechanically cooled or rejected from the data center space to keep the machines operating properly. Compounding this challenge, the air quality and security of the data center have to be precisely maintained, leading to additional costs and complexity in the design and operation of data center cooling. Data center cooling is a highly specialized branch of the Heating, Ventilation and Air Conditioning (HVAC) industry. Best practices and designs, worldwide, are generally vetted through the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE).
Tremendous operational and capital expenses (OPEX and CAPEX) have been spent on data center cooling systems, with cooling energy costs running 100%-200% above those of the computer load itself. HVAC has therefore presented the single largest area for cost savings and increased efficiency in data centers.
Since 2004, few areas of data center design or operation have seen as much improvement as cooling. Heat loads, first from blade computing and now from all efficient computing operations, have steadily increased maximum heat density in data centers, sometimes to over 35 kW per rack. This represents a more than 300% increase in heat densities over the last decade, while at the same time, state-of-the-art cooling “overhead” is well below 50% of energy costs – irrespective of season and outside temperature – depending on the level of redundancy and fault tolerance.
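The overhead figures above can be restated as PUE (power usage effectiveness: total facility energy divided by IT energy). The following is a minimal sketch, assuming the quoted percentages are expressed relative to the computer (IT) load and that cooling is the only overhead counted. The function name and the `other_overhead` parameter are hypothetical and used only for illustration.

```python
def pue_from_overhead(cooling_overhead: float, other_overhead: float = 0.0) -> float:
    """Return PUE given overheads expressed as fractions of the IT load.

    Assumption: PUE = (IT + cooling + other) / IT, with IT normalized to 1.
    """
    if cooling_overhead < 0 or other_overhead < 0:
        raise ValueError("overheads must be non-negative")
    return 1.0 + cooling_overhead + other_overhead


if __name__ == "__main__":
    # Legacy range quoted in the text: cooling at 100%-200% of the IT load.
    for overhead in (1.0, 2.0):
        print(f"cooling overhead {overhead:.0%} -> PUE {pue_from_overhead(overhead):.2f}")
    # State-of-the-art bound quoted in the text: overhead below 50%.
    print(f"cooling overhead 50% -> PUE {pue_from_overhead(0.5):.2f}")
```

Under that assumption, the legacy range corresponds to a PUE of 2.0 to 3.0, while an overhead below 50% corresponds to a PUE below 1.5.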
Best in Class Design
AiNET’s TIA-942 Tier IV cooling system at WDC-8 (diagrammed below) shows a 3N system with fault tolerant, concurrently-maintainable systems. With completely redundant cooling loops, completely redundant heat rejection loops, and three 100% capacity chillers operated with efficient partial loading, the system is designed to survive multiple critical simultaneous failures without impact to the critical load.
Raised Floors and CRAC Units
Proceeding right-to-left in the above diagram, the CRACs (Precision Computer Room Air Conditioners) are the loud, typically mono-colored elements in the data center room that produce the strong circulation of cold air. They usually also manage the humidity.
CRACs sit on a Raised Floor – the white tiles of anti-static material that are part of the signature data center look. The area where the cool air flows underneath the floor is the raised floor plenum (processed air distribution system) and it is an important part of the air flow system. Some data centers forgo the capital expense of creating a raised floor plenum. The result is higher operating expenses, since the inability to control the mixing of hot and cold air reduces heat density, and high heat density is critical for efficient cooling.
A proper raised floor plenum has few blockages. Anything under the floor (e.g. cabling) must be limited and encased in conduit to reduce “drag” on the air flow. Its purpose is to provide the clearest path to delivering cold air flow anywhere in the data center. The tiles are replaceable and interchangeable allowing for the redistribution of air as needed. Poor operations block the raised floor.
The most efficient CRAC units have no compressors or moving parts other than fans and valves and act principally as heat exchangers. Certain designs place compressors in the CRAC units. Smaller compressors tend to be less efficient and, since compressors generate heat, they have to work harder to remove their own heat from the data center.
The highest efficiency CRAC units use chilled water cooling loops to provide cold air. Simply stated, air is passed through a cooled heat exchanger coil (think of the radiator in your car) and it cools as it passes over the coil. Cooling loop temperatures are very important – too warm and they won’t provide enough cooling, too low and they can cause condensation. In a traditional design, best practices put the cooling loop temperature at between 45F and 52F – depending upon the environment and season. (To produce air at 65F, a cooling loop needs to be at ~40F. If it were 100F outside, that is an expensive 60 degree difference).
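Why a large temperature difference is expensive can be illustrated with the ideal (Carnot) limit on refrigeration efficiency. The sketch below is a supplementary illustration rather than part of the design described here: it computes the theoretical maximum coefficient of performance (COP) for moving heat from a cold loop to a hotter outside temperature. Real chillers reach only a fraction of this bound, and the function names are hypothetical.

```python
def fahrenheit_to_kelvin(temp_f: float) -> float:
    """Convert degrees Fahrenheit to kelvin."""
    return (temp_f - 32.0) * 5.0 / 9.0 + 273.15


def carnot_cooling_cop(cold_f: float, hot_f: float) -> float:
    """Ideal (upper-bound) cooling COP: T_cold / (T_hot - T_cold), in kelvin."""
    cold_k = fahrenheit_to_kelvin(cold_f)
    hot_k = fahrenheit_to_kelvin(hot_f)
    if hot_k <= cold_k:
        raise ValueError("hot side must be warmer than cold side")
    return cold_k / (hot_k - cold_k)


if __name__ == "__main__":
    outside_f = 100.0  # outside temperature from the example in the text
    for loop_f in (40.0, 52.0):  # loop temperatures mentioned in the text
        cop = carnot_cooling_cop(loop_f, outside_f)
        print(f"loop {loop_f:.0f}F, outside {outside_f:.0f}F -> ideal COP {cop:.2f}")
```

With 100F outside, the ideal COP is about 8.3 for a 40F loop and about 10.7 for a 52F loop, which is why running the loop colder than necessary costs energy.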
The cooling loop heats slightly as it flows from one CRAC unit to the next. Therefore the cooling loop has to be sized to carry enough water to ensure that the water for the last CRACs in the loop remains adequately cooled. Highly efficient water pumps keep the flow rate of each loop consistent and slow enough to ensure the most efficient operation (if the water moves too fast, it can’t cool the heat exchanger properly).
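Loop sizing of this kind is commonly approximated with the standard water-side heat balance used in HVAC practice, Q (BTU/hr) ≈ 500 × GPM × ΔT (°F). The following is a rough sketch under stated assumptions: plain water, a hypothetical 350 kW heat load, and a hypothetical 10°F allowable temperature rise across the loop. None of these numbers come from the WDC-8 design, and the function name is illustrative.

```python
BTU_PER_HR_PER_KW = 3412.14  # 1 kW of heat expressed in BTU/hr
# ~8.33 lb/gal * 60 min/hr * 1 BTU/(lb*F), the usual rule-of-thumb factor for water
WATER_FACTOR = 500.0


def required_flow_gpm(heat_load_kw: float, delta_t_f: float) -> float:
    """Water flow (US gallons/minute) needed to carry heat_load_kw
    with a temperature rise of delta_t_f degrees Fahrenheit."""
    if heat_load_kw < 0:
        raise ValueError("heat load must be non-negative")
    if delta_t_f <= 0:
        raise ValueError("temperature rise must be positive")
    return heat_load_kw * BTU_PER_HR_PER_KW / (WATER_FACTOR * delta_t_f)


if __name__ == "__main__":
    load_kw = 350.0  # hypothetical: ten racks at 35 kW each
    rise_f = 10.0    # hypothetical allowable loop temperature rise
    print(f"{load_kw:.0f} kW at {rise_f:.0f}F rise -> "
          f"{required_flow_gpm(load_kw, rise_f):.1f} GPM")
```

Halving the allowable temperature rise doubles the required flow, which is the trade-off behind sizing the loop so the last CRACs still receive adequately cool water.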
Redundant Cooling Loops
Why redundant cooling loops? – glad you asked. Just like all data center systems, risk of failure and maintenance have to be considered and managed closely. At regular intervals, CRAC units need to be serviced or replaced. On occasion, valves on the cooling loops or the pipes themselves may need servicing or replacement. Without dual redundant cooling loops, a data center has no acceptable way of servicing its critical cooling plant without some reduction in capacity or service interruption. Some data centers try to work around this by servicing systems late at night or in the coldest months, but proper design saves customers from being exposed to these limitations and risks.
One of the significant design costs in a Tier IV data center is its redundant cooling loops. If a data center isn’t Tier IV, it likely doesn’t have them. Check their block diagrams thoroughly.
Tier IV tip: Significant risks to your cooling plant from blockage, poor water-quality maintenance and other factors drive down your efficiency and can lead to cooling system failure. Only certified data centers have proven methodologies and practices for design and maintenance of all critical facilities.
With data center heat having been passed to the water in the CRACs, the water from the cooling loop passes into the chiller complex, depicted in the middle of the diagram. Chillers use a highly-efficient compressor to transfer heat from the cooling loop on the right hand side of the diagram to the heat rejection loop on the left side. (Heat rejection loops are also called “outside” loops.) The most efficient chillers use a centrifugal style compressor; less efficient ones use scroll or other designs. Chillers are the largest energy user in the cooling plant, and represent one of the largest opportunities for improvements in efficiency through smart operations.
As shown in the diagram of AiNET’s WDC-8 facility, up to 3 chillers are connected between the cooling loop and the outside loop. Besides the obvious fault tolerance and redundancy benefits, this represents an important aspect of energy efficiency. In general, it is more efficient to run two chillers at 50% capacity than a single one at 100% capacity. Industry and vendor measurements have shown that chillers are often more efficient at partial loads (less than 100%) than at full capacity. So it is often advantageous to run multiple chillers at lower levels than a smaller number at full capacity.
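The partial-load argument can be sketched numerically. The example below is illustrative only: the part-load curve is HYPOTHETICAL (it is not vendor or AiNET data), it ignores pump and cooling tower energy, and the function names are made up for this sketch. It compares one chiller at 100% load against two identical chillers at 50% each.

```python
from bisect import bisect_left

# HYPOTHETICAL part-load curve: (load fraction, kW of electricity per ton of cooling).
# Shaped so that efficiency is best near half load, as the text describes.
PART_LOAD_CURVE = [(0.25, 0.60), (0.50, 0.45), (0.75, 0.48), (1.00, 0.58)]


def kw_per_ton(load_fraction: float) -> float:
    """Linearly interpolate the hypothetical curve at the given load fraction."""
    fractions = [f for f, _ in PART_LOAD_CURVE]
    if not fractions[0] <= load_fraction <= fractions[-1]:
        raise ValueError("load fraction outside the range covered by the curve")
    i = bisect_left(fractions, load_fraction)
    if fractions[i] == load_fraction:
        return PART_LOAD_CURVE[i][1]
    (f0, e0), (f1, e1) = PART_LOAD_CURVE[i - 1], PART_LOAD_CURVE[i]
    return e0 + (e1 - e0) * (load_fraction - f0) / (f1 - f0)


def plant_power_kw(cooling_load_tons: float, chiller_capacity_tons: float,
                   chillers_running: int) -> float:
    """Electrical power drawn when the load is split evenly across the chillers."""
    load_fraction = cooling_load_tons / (chillers_running * chiller_capacity_tons)
    return cooling_load_tons * kw_per_ton(load_fraction)


if __name__ == "__main__":
    load, capacity = 500.0, 500.0  # hypothetical tons of cooling
    for n in (1, 2):
        print(f"{n} chiller(s): {plant_power_kw(load, capacity, n):.0f} kW")
```

With this assumed curve, splitting the same load across two chillers draws less power than running one at full capacity. The actual savings depend entirely on the measured part-load curve of the installed equipment.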
Many data center operators do not have an option in this regard. With only two chillers per deployment or “pod”, it is an all-or-nothing proposition… or worse – the redundant unit is being used as a primary unit! By having several chillers, AiNET is able to operate highly efficiently (i.e. reduced PUE) while maintaining completely redundant spare 100% capacity.
Redundant Cooling Towers
Cooling towers are the highly efficient heat rejection units that remove the heat from the “outside” cooling loop (to the outside air). Since they are exposed to the elements, careful maintenance and water quality are a must.
The most effective cooling towers, “open” cooling towers, use evaporation to cool the water entering them to the “wet bulb” temperature. Less efficient designs, “closed” cooling towers, sacrifice efficiency for somewhat reduced maintenance – they are limited to approaching the much higher “dry bulb” temperature. [Primer: Wet bulb temperature is the lowest temperature that can be reached solely by the evaporation of water. Dry bulb temperature is that measured by a regular thermometer, so it does not consider humidity.]
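To make the wet bulb versus dry bulb distinction concrete, the sketch below estimates wet bulb temperature from dry bulb temperature and relative humidity using Stull's empirical approximation (2011). This is a supplementary illustration: the formula is an approximation intended for roughly sea-level pressure, relative humidity between about 5% and 99%, and air temperatures between about -20C and 50C. The function names and the sample conditions are hypothetical.

```python
import math


def wet_bulb_c(dry_bulb_c: float, rh_percent: float) -> float:
    """Approximate wet bulb temperature (C) via Stull's empirical fit."""
    if not 5.0 <= rh_percent <= 99.0:
        raise ValueError("relative humidity outside the fit's intended range")
    t, rh = dry_bulb_c, rh_percent
    return (t * math.atan(0.151977 * math.sqrt(rh + 8.313659))
            + math.atan(t + rh)
            - math.atan(rh - 1.676331)
            + 0.00391838 * rh ** 1.5 * math.atan(0.023101 * rh)
            - 4.686035)


def wet_bulb_f(dry_bulb_f: float, rh_percent: float) -> float:
    """Same estimate, with Fahrenheit in and out."""
    dry_c = (dry_bulb_f - 32.0) * 5.0 / 9.0
    return wet_bulb_c(dry_c, rh_percent) * 9.0 / 5.0 + 32.0


if __name__ == "__main__":
    dry_f, rh = 95.0, 40.0  # hypothetical summer afternoon
    print(f"dry bulb {dry_f:.0f}F at {rh:.0f}% RH -> "
          f"wet bulb approx. {wet_bulb_f(dry_f, rh):.1f}F")
```

At 20C and 50% relative humidity the estimate is about 13.7C, several degrees below the dry bulb reading. That gap is the extra cooling an evaporative tower can reach on a dry day.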
In warmer seasons, cooling tower temperatures are higher. The higher the temperature of the cooling tower, the more the chiller complex has to work to remove heat.
As with chillers, running multiple cooling towers at partial loads – i.e., less than 100% speed – improves efficiency by increasing the surface area from which heat leaves the system. When advantageous, AiNET operates multiple cooling towers at its WDC-8 facility to improve efficiency without risk to its backup 100% capacity cooling towers.
Every component of the cooling plant is installed with at least one maintenance bypass. The cooling system is a series of loops, and these allow troubled or serviced components to be removed from the system without interrupting operations.
Tier IV tip: Even maintenance bypasses need maintenance! The purpose of redundant cooling loops is to allow even the maintenance bypasses to be serviced without affecting operation.
The cooling plant of an efficient data center does use a lot of water. Like the redundant power systems a data center uses to protect its operations, water is also a critical input. Similar to the refueling contracts for its diesel generation systems, AiNET maintains water “refueling” contracts to ensure its water supply is not interrupted should utility water supply be disrupted – but that’s not all.
To satisfy TIA-942 Tier IV requirements, a data center must have significant onsite water reserves. For AiNET’s WDC-8 facility, this takes multiple forms. First, an onsite well (with redundant pumps – not shown in the diagram) produces enough water to supply the data center. In the highly unlikely event that a severe drought prevents the optimal performance of the well, a gravity fed water tank stores thousands of gallons of water for significant run-time.
What is a gravity fed water tank? Simply put, it is a water tank set high above the cooling towers. This allows the water to use gravity (and the resulting hydrostatic pressure) to fill the cooling towers and provide water for evaporation (“make up water”). Since there is no pump, there is no risk of a pump failure and no energy cost in its use. Gravity always works and this design is a hallmark of high efficiency. In cold weather this tank is electrically heated to stay above freezing so that even on the coldest day of the year, emergency water is available and ready to go at a moment’s notice.
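The pressure a gravity fed tank can deliver comes purely from the height of the water column above the outlet (P = ρ·g·h). The following is a minimal sketch of that relationship; the tank heights are hypothetical and are not taken from the WDC-8 design, and the function name is illustrative.

```python
WATER_DENSITY_KG_M3 = 1000.0   # fresh water, approximate
GRAVITY_M_S2 = 9.80665         # standard gravity
PASCALS_PER_PSI = 6894.757


def static_head_psi(height_m: float) -> float:
    """Pressure (psi) at the bottom of a water column height_m metres tall."""
    if height_m < 0:
        raise ValueError("height must be non-negative")
    return WATER_DENSITY_KG_M3 * GRAVITY_M_S2 * height_m / PASCALS_PER_PSI


if __name__ == "__main__":
    for height in (5.0, 10.0, 20.0):  # hypothetical elevations above the towers
        print(f"{height:.0f} m of elevation -> {static_head_psi(height):.1f} psi")
```

Every 10 m of elevation provides roughly 14 psi at the outlet, with no pump and no energy input.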
These are examples, though not an exhaustive list, of the level of consideration and design necessary for the most reliable data center operations in the world, as evidenced by a certified TIA-942 Tier IV data center.
Part of ASHRAE’s recent efforts to improve efficiencies has been in the area of data center temperatures. Previously, data centers were expected to be kept as cold as 65F (18C) and 50% Relative Humidity (RH) to ensure that computer systems had adequate cooling. Many of these standards had not been substantially updated since the 1950s (think paper punch cards). For most environments, this was not only expensive to maintain, but it also did not properly address the modern challenge, which is removing heat, not simply providing cold air. “Hot spots” often plagued data center operators as equipment of different sizes was placed haphazardly or opportunistically throughout the facility with little coordination with, or consideration of, the cooling system design. To address these “hot spots”, cooling systems were often overspecified (made many times larger than required) to brute force the hot spots away – basically by increasing the air flow at a high equipment and operational cost.
As a result, traditional data center designs allowed for a lot of mixing of the hot exhaust and the cold cooling system air, with the loss of heat density an unintended consequence and contributor to huge inefficiencies.
Hot Aisle Segregation (Hot Row/Cold Row)
It turns out that temperature really isn’t the problem; keeping the air moving is. By taking a two-step approach, first separating the hot exhaust of the machines from the cool air of the cooling system, and then separating the exhaust of each row of machines from the intake of each row of machines, significant efficiencies can be achieved.
Essentially, the data center cooling system is most effective when large amounts of heat are provided to it – with as little cold air as possible. The most risk to computer systems is when hot air is exposed to their intakes. By preventing mixing and increasing the amount of warm air returning to the cooling system, the HVAC system operates much more efficiently and reduces the risk to computer equipment at the same time.
Traditionally, a 750-ton cooling plant might have been able to cool 2-3 MW of computer load. However, with modern efficiency and best practices, it can successfully cool 40-60 MW, roughly a 20-fold improvement. Coupled with enhancements in economizer and other technologies (not shown) this load can be increased further – all at little increased energy cost.
A recent trend in data center design – of increasing the raised floor heights and widening cold rows – has started with several vendors. The theory is to allow more cold air to enter the row and reduce the air pressure under the raised floor. This is not without cost.
The widening of the cold row serves to reduce data center density (and heat density) significantly – often more than offsetting the increase in density the air flow was targeting. These designs are frequently used to mask underlying design and density shortcomings. Without very careful consideration and computational modeling, this design often leads to lower efficiencies (and higher costs) than properly applied best practices and sensible use of raised floors and cold rows.
There are theoretical limits to air cooling, but in practice, every design has an economical air cooling solution (proven at least up to 37 kW).
The dynamic nature of cooling a data center, increasing computing capabilities, and corresponding increases in rack and room power densities present challenges. Today’s IT equipment can push data centers to 750 watts per square foot and the cooling challenges and costs are much more obvious. By 2014, power densities above 35 kW (air cooled) per cabinet are expected to be common.
It is important to choose the right partner with infrastructure and expertise to navigate these challenges. The wrong decision certainly will lead to higher operational costs and worse, increased risk of downtime.
To make the best decision, be sure to examine the detailed system block diagrams of an AiNET, certified TIA-942 Tier IV data center and make plans to visit one.