
【DKV】改变中的数据中心前景 (第一章)

 yi321yi 2019-08-12

Part 1: Information Technology Equipment

第1部分 信息技术设备

BY DONALD L. BEATY, P.E., FELLOW ASHRAE; DAVID QUIRK, P.E., MEMBER ASHRAE. 

作者:DONALD L.BEATY 专业工程师、ASHRAE会员;DAVID QUIRK,专业工程师、ASHRAE会员。

译者:何海

Information technology equipment (ITE) continues to evolve to keep pace with the rate of change of software and the feature-rich services it provides. The HVAC systems and their interactions with the ITE will also need to evolve. While the “occupant” of the data center is the software, at its core, the ITE design dictates how that occupant load manifests itself within the data center. By understanding the ITE, its changes, and how it interacts with the data center HVAC, data center designers will be equipped to handle this changing landscape.

信息技术设备(ITE)持续不断发展,以适应软件及其提供的丰富服务的演变速度。HVAC(暖通空调系统)和ITE的相互作用同样需要转变。当数据中心的“居住负载”是软件时,ITE设计规定的核心是如何将居住负载体现在数据中心内部。通过了解ITE的变化及其与数据中心暖通空调的相互作用,数据中心设计师将可以更好地应对这一变化的前景。

A tremendous variety of ITE populates today’s data centers, owing to the exponential growth of data generation, data storage, and data consumption that permeates our lives more each day. Traditional servers now cohabitate in the data center space with huge storage arrays and powerful supercomputing clusters, all with widely different workloads that stream our videos, maintain our homes, route our communications, predict our weather, and generally accomplish things previously thought not possible!

数据生成、数据存储和数据消费已经渗透到我们生活中的每一天,并呈指数级增长,因而如今的数据中心遍布各类不同的ITE。现在,传统的服务器在数据中心空间中拥有巨大的存储阵列和强大的超算集群,所有这些服务器都扮演着各种不同的工作角色,如:视频传输、家庭运行、通信路由、天气预报,并且通常可以完成从前认为不可能完成的任务。

And yet, all of these things are accomplished by equipment in data centers, using similar resources and encountering similar challenges. Data may be just a series of 1’s and 0’s, but manipulating trillions of them consumes electricity, and lots of it. That electricity is dissipated in the form of heat, which itself must be dissipated, or the bulk of equipment would almost certainly incinerate.

然而,所有这些都是通过在数据中心使用类似的资源和面临相似挑战的设备来实现的。数据可能只是一系列的“1”和“0”,但操纵数万亿的数据会消耗大量电能。电是以热能的形式耗散的,散热是必不可少的,否则大部分设备几乎肯定会被烧毁。

This is Part 1 in a series that will explore the current state of ITE in the data center, covering the different equipment types, their roles, and their general arrangement within the data center. This will not be comprehensive by any means, but rather serve to create a solid foundation from which to explore in more detail the changing landscape of data centers.1 Future parts in this series will cover ITE thermal design and the interactions between IT systems and data center cooling systems.

本系列文章的第1部分将探讨数据中心中ITE的现状,包括不同的设备类型、它们的角色及其在数据中心中的常规布置。然而无论如何,这部分都不是全部内容,而是为深入探索数据中心不断变化的前景打下了坚实的基础1。本系列接下来的部分将介绍ITE热设计以及IT系统和数据中心冷却系统之间的相互影响。

Data Center Design 

数据中心设计

Data center infrastructure designers are tasked with wrapping a shell, power, and cooling around ITE. To do this effectively, flexibly, efficiently, and in a scalable manner, we need to better understand how the ITE is designed, how it’s changing, and why it’s changing.

数据中心基础设施设计师的任务是为ITE封装外壳、提供电源和冷却。为了以有效、灵活、高效、可扩展的方式实现这一点,我们需要更好地理解如何设计ITE、以及ITE是如何变化的、为什么会变化。

Historical data center design philosophy often considered the ITE as a homogeneous load separated into aisles of equipment and arranged in hot/cold aisles for proper thermal management.2 The goal of the HVAC was to provide the proper entering air conditions to the ITE.3

历史上,数据中心的设计理念通常是将ITE作为均匀的负荷分散到设备通道中,并安排在热/冷通道中进行适当的热管理2。HVAC的目的是为ITE提供适当的进风条件3

Today’s ITE consists of several very distinct types of hardware (servers, networking, and storage), each with its own unique designs and functions for supporting different types of data and services. Each of these types of ITE will correspondingly interact with the data center infrastructure in different ways.

当今的ITE由几种完全不同的硬件(服务器、网络和存储)组成,每种硬件都有自己独特的设计和功能,用于支持不同类型的数据和服务。每种类型的ITE都将以不同的方式与数据中心基础设施相互影响。

In generic terms the ITE is divided into these basic functions:

·   Servers: data computing and processing;

·   Networking: data transmission; and

·   Storage: data storing.

一般情况下,ITE可以分为以下功能类型:

1. 服务器:数据计算和处理;

2. 网络:数据传输;

3. 存储:数据储存。

In all cases, the use of data consumes energy and releases heat within the data center. The most efficient and effective management of that heat is dependent upon a more detailed understanding of the ITE design and its interactions with the data center infrastructure.

在所有情况下,数据的使用都会消耗能量,并在数据中心内释放热量。最高效和有效的热量管理依赖于更深入地理解ITE设计及其与数据中心基础设施的相互影响。

Servers 

服务器

In IT terms, “server” broadly refers to equipment that provides utility to a “client.” Servers not only provide the webpages and files that we are most familiar with, but more abstractly provide computational resources across networks to be consumed by services.

在IT术语中,“服务器”泛指为“客户端”提供实用程序的设备。服务器不仅提供我们最熟悉的网页和文件,还更抽象地提供跨网络计算资源供服务使用。

The role of the server has grown immensely through the years to meet our demands, and, through specialization, servers may achieve higher degrees of throughput, speed, and energy efficiency. Therefore, the server design of yesterday is not the server design of today, and will not be the server design of tomorrow.

多年来,为满足我们的需求,服务器的性能已大大增强,通过专门设计,服务器可以实现更高的吞吐量,速度和能效。因此,昨天的服务器设计不同于今天的服务器设计,也不同于明天的服务器设计。

The central processing unit (CPU) has traditionally been the heart of the server, supported by memory, storage, and other peripherals. A CPU consists of millions or billions of individual switches that each produce heat as they operate billions of times per second. As such, CPUs are among the highest heat-producing components within the server and a primary concern for cooling, accomplished by either air or liquid.

中央处理器(CPU)传统上是服务器的核心,由内存、存储器和其他外围设备支持。一个CPU由数百万或数亿个独立的开关组成,每一个开关每秒运行数亿次时都会产生热量。因此,它们是服务器中产生热量最高的器件之一,也是风冷和液冷的主要关注点。

The CPU is usually attached via a socket to a motherboard, which provides power to and interconnections between the components of the server located directly on the motherboard as well as other peripherals.

CPU通常通过一个插座连接到主板上,主板为主板上和其他外围设备上的服务器组件提供电源和相互连接。

For CPUs to perform their function, they require memory to hold the inputs and outputs, and that memory must be incredibly fast to keep up with the demands of the CPU. As a result, the memory closest to the CPU is volatile memory, or memory that does not retain its data without power, as this yields the highest performance. Static random-access memory (SRAM) is the fastest and is located on the same die as the CPU, whereas dynamic random-access memory (DRAM) is somewhat slower and typically located on memory chips near the CPU. Later, the important resulting data is written to non-volatile storage media, but at slower speeds.

为了让CPU执行它们的功能,它们需要内存来保存输入和输出数据,而且必须足够快才能满足CPU的需求。从技术上讲,在得到最高的性能同时,最接近CPU的内存是易失性存储器,它在失去电源的情况下无法保存数据。静态随机存取内存(SRAM)是速度最快的内存,集成在CPU内部;动态随机存取内存(DRAM)稍微慢一些,通常位于CPU附近的内存芯片上。稍后,重要的结果数据将以较慢的速度写入非易失性存储介质。

Traditional types of non-volatile storage found in the data center include hard disk drives (HDD) and tapes, although solid state drives (SSD) are increasingly prevalent due to their comparatively low latency and decreasing cost. SSDs use flash memory, a non-volatile form of memory that involves no moving components, unlike HDDs and tapes.4

数据中心中传统的非易失性存储类型包括硬盘驱动器(HDD)和磁带,然而固态硬盘(SSD)由于其相对较低的时延和成本而越来越受欢迎。SSD使用闪存,与HDD和磁带不同,闪存是一种无运动器件的非易失性内存形式4

To increase server performance for some types of workloads, a current trend is using SSDs closer to the CPU via peripheral component interconnect express (PCIe) as their costs decrease and performance and capacity increase to where they offer attractive benefits compared to adding more DRAM.

为提高某些类型工作负载的服务器性能,当前的趋势是使用SSD通过标准总线接口(PCIe)与CPU连接。与增加DRAM相比,SSD的成本更低,性能和容量更高,因而更具吸引力。

However, HDDs and tapes still play an important role in the data center. HDDs use magnetic storage on rotating disks and offer low cost and fast access, whereas tapes are significantly slower but can store data with unparalleled economy and are the only format to offer an archival-appropriate lifespan (e.g., 30 years).

然而,硬盘和磁带在数据中心里依然扮演着重要角色。HDD使用机械磁盘上的磁存储器,提供低成本和快速访问,而磁带速度非常慢,但可以非常经济地存储数据,是唯一满足档案寿命(如30年)的存储格式。

The most common form factor for servers is the 19 in. (483 mm) wide rack-mount server. The design is a timeless standard defined by the EIA-310 specification5 that has filled the majority of the world’s data centers for decades. It is based on 1.75 in. (44.5 mm) tall rack units (U), and equipment is specified as 1U, 2U, 4U, etc., referring to its height. While most equipment makes full use of the width, half-width equipment is an increasingly popular alternative that offers potentially higher density per U.

服务器最常见的规格是19 英寸(483毫米)宽度的机架式服务器。该设计是一个由EIA-310规范5定义的永久标准,几十年来世界上大多数数据中心都采用了该规范。它基于1.75英寸(44.5毫米)高度的机架单元(U),设备高度具体分为1U、2U、4U等。虽然大多数设备都充分采用了这个宽度,但半宽度设备越来越受欢迎,因为它可以提供更高密度每U的选择。
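The rack-unit arithmetic above can be sketched in a few lines of Python. This is only an illustration: the 1.75 in. unit height comes from the text, while the 42U rack height and the helper names `equipment_height` and `servers_per_rack` are assumptions made for the example. Note that 1.75 in. is exactly 44.45 mm, which the text rounds to 44.5 mm.

```python
RACK_UNIT_IN = 1.75   # height of 1U in inches, as stated in the text
MM_PER_INCH = 25.4    # exact inch-to-millimetre conversion


def equipment_height(units: int) -> tuple[float, float]:
    """Return (inches, millimetres) for equipment that is `units` U tall."""
    inches = units * RACK_UNIT_IN
    return inches, inches * MM_PER_INCH


def servers_per_rack(rack_units: int, server_units: int,
                     half_width: bool = False) -> int:
    """Count servers that fit in a rack, ignoring space for switches or PDUs.

    Half-width equipment places two servers side by side in each slot.
    """
    per_slot = 2 if half_width else 1
    return (rack_units // server_units) * per_slot


for u in (1, 2, 4):
    inches, mm = equipment_height(u)
    print(f"{u}U = {inches:.2f} in = {mm:.2f} mm")

# Hypothetical 42U rack (rack height is an assumption, not from the text).
print(servers_per_rack(42, 2))                   # full-width 2U servers
print(servers_per_rack(42, 2, half_width=True))  # half-width 2U servers
```

Under these assumptions, half-width 2U servers reach the same count per rack as full-width 1U servers, which is one way to read "higher density per U".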

To further increase density beyond the standard layout, blade servers offer increased flexibility for manufacturers. Blade servers typically consist of a standard-width chassis that occupies between 3U and 10U of a traditional rack. However, the servers are individual “blades” of various dimensions, which plug into the chassis. Additionally, shared resources such as cooling fans and power supplies may be located on the chassis, further broadening the limits of what is possible with blade design (Figure 1).

为进一步增加超出标准布局的密度,刀片式服务器为制造商提供了更大的灵活性。刀片服务器通常由标准宽度的机箱组成,它占传统机架的3U到10U高度。只不过服务器是不同尺寸独立插入到机箱中的“刀片”。此外,共用资源如冷却风扇和电源可能也位于机箱上,这进一步扩大了刀片设计的极限(图1)。

Networking 

网络连接

Networking is the way servers are connected to other servers, other resources, and to the world outside the data center. Network topology, the way that networking equipment is connected, is a rapidly evolving topic that varies greatly based on the size and function of a workload. Traditional networking can easily be a source of bottlenecking if not optimized for a workload, but the current trend toward software-defined networking (SDN) meets the dynamic demands on the network through abstraction and separation of the control from the hardware.

网络是服务器连接到其他服务器、资源和外界数据中心方式。网络拓扑,即网络设备的连接方式,是一个快速发展的主题,它随着工作负载的大小和功能的变化而变化。如果不针对工作负载进行优化,传统的网络很容易成为瓶颈,然而目前的趋势是采用软件定义网络(SDN),通过控制功能与硬件的分离,以满足网络的动态需求。

FIGURE 1 Typical compute server rack and packaging.

图1.典型计算服务器机架和封装

At the basic level, switches provide communication between devices by receiving data and forwarding it to its destination via the correct port, in contrast to a hub, which receives data and forwards it to all other ports. Since it would be impractical for every server in a data center of any considerable size to be connected to a single switch, often multiple switches are connected to form a hierarchy with data flowing up and down the hierarchy as required. Such a hierarchy can be functionally characterized into core, distribution, and edge switches.6

在通常情况下,交换机接收数据并通过正确的端口将其转发到目的地,从而在设备之间提供通信;而集线器接收数据并将其转发到其他所有端口。由于将任意规模的数据中心中的每台服务器连接到单个交换机都是不切实际的,因此常需要连接多个交换机,以形成一个层级结构,并根据需要在层级结构中上下传送数据。这种层级结构可以按功能划分为核心、汇聚和边缘交换机。
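The difference between a hub and a switch can be illustrated with a small Python sketch. The `Hub` and `Switch` classes are hypothetical names for this example. The address-learning step, and flooding when the destination is still unknown, follow the usual behavior of a learning Ethernet switch; the article itself does not describe how the switch finds the correct port, so treat that part as an assumption.

```python
class Hub:
    """Repeats every incoming frame on all ports except the one it came in on."""

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports

    def forward(self, in_port: int, src: str, dst: str) -> list[int]:
        return [p for p in range(self.num_ports) if p != in_port]


class Switch:
    """Forwards a frame only to the port where the destination was last seen."""

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        self.table: dict[str, int] = {}  # address -> port (assumed learning table)

    def forward(self, in_port: int, src: str, dst: str) -> list[int]:
        self.table[src] = in_port        # learn where the sender is attached
        if dst in self.table:
            return [self.table[dst]]     # known destination: one port only
        # Unknown destination: fall back to hub-like flooding.
        return [p for p in range(self.num_ports) if p != in_port]


hub, switch = Hub(4), Switch(4)
print(hub.forward(0, "A", "B"))     # hub always floods
print(switch.forward(0, "A", "B"))  # B not yet learned, so the switch floods
print(switch.forward(1, "B", "A"))  # A was learned on port 0
print(switch.forward(0, "A", "B"))  # B was learned on port 1
```

Once both addresses are learned, traffic between A and B no longer reaches the other ports, which is why switches scale better than hubs.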

Based on the size and design of the data center, network implementations may take many forms. At the simplest level, equipment could be connected to the same (core) switch. As the number of servers increases, it may be optimal to have the servers connected to edge switches, which are in turn each connected to the core switch. If the needs of the network exceed even that configuration, edge switches may be located within the server racks, each communicating with row-level distribution switches, each of which joins multiple edge switches to the core switch (Figure 2).

根据数据中心的规模和设计,可以采用多种形式实现网络连接。在最简单的级别上,设备可以连接到同一个(核心)交换机。随着服务器数量的增加,最好将服务器先连接到边缘交换机,而边缘交换机又依次连接到核心交换机。如果网络需求甚至超过了该配置,则位于服务器机架中的边缘交换机会通过行级汇聚交换机通信,再将多个边缘交换机连接到核心交换机.

Current switch design, like nearly everything in the data center, is largely driven by performance. At its core, data is received through a port and reaches the physical layer chips. These communicate with an application-specific integrated circuit (ASIC) chip, which is a highly specialized chip designed to route data as quickly and intelligently as possible. The data then flows through the physical layer to the proper outgoing port. Switches may also include a CPU that handles additional data processing such as encryption, but processing data adds significant latency to the otherwise low-latency switching.

与数据中心中的几乎所有设备一样,当前交换机的设计在很大程度上是由性能决定的。其核心是,数据通过端口接收并到达物理层芯片。它们与特定用途集成电路(ASIC)芯片通信,ASIC芯片是一种高度专用的芯片,旨在尽可能快速和智能地实现路由和数据转发。数据通过物理层流向正确的输出端口。交换机还可以包括一个CPU来进行额外的数据处理,比如加密,但是数据处理会给低时延的交换机增加明显的延迟。

FIGURE 2 Possible switching hierarchy based on network size.

图2.基于网络规模的可能的交换体系

In addition to just forwarding data to the right destination, the ASIC chip may also enforce priority flow control (PFC) rules that prioritize some data and deprioritize other data to provide the desired result, such as prioritizing streaming video to be delivered without interruption.

除了将数据转发到正确的目的地之外,ASIC芯片还可以执行优先级流控制(PFC)规则,对某些数据进行优先级排序,以提供所需的结果,比如对连续传输的视频流进行优先级排序。

Networking equipment may also be classified based on an abstraction of its function within the data center, using the Open Systems Interconnection (OSI) seven-layer model. The majority of networking equipment resides at Layers 1 to 3, which are the physical layer, the data link layer, and the network layer, respectively. A description of each layer follows:

· Layer 1: Physical layer equipment such as hubs and repeaters only deal with the raw bit stream.

· Layer 2: Data link layer equipment such as bridges and switches deal with forwarding entire frames from Point A to Point B.

· Layer 3: Network layer equipment such as a router deals with higher order functions such as addressing, path determination, and subnets.

· Layers 4 to 7: These serve even higher-order functions, ranging from communication between networks to, ultimately, communication with the application layer and the human-machine interface.

网络设备也可以根据数据中心内部功能,使用开放系统互连(OS)七层模型进行抽象分层。大多数网络设备位于第1层到第3层,分别是物理层、数据链路层和网络层。以下是对每一层的描述:

第1层: 物理层设备,如集线器和中继器,只处理原始比特流。

第2层数据链路层设备如网桥和对数据帧进行转发的交换机。

第3层: 网络层设备,如具有更高级功能(寻址、路由选择和划分子网)的路由器。

第4到7层: 提供更高级的功能,从网络之间的通信到最终与应用层和人机界面的通信。

Storage Arrays 

存储阵列

A third major component of ITE within data centers is for the purpose of storing data, which is typically stored for one of four reasons. Data may be stored online, meaning it is current and able to be accessed rapidly. Data may be for backup to prevent the loss of information. Data may be for archival purposes, which should be accessible in the long term but is rarely needed. Or it may be for disaster recovery, which is similar to archival except that it is stored in a different physical location to guard against natural disasters.

数据中心内的ITE的第三个主要组件用于存储数据,通常数据存储有以下四个目的之一:数据可以在线存储,这意味着它是实时的,可以快速访问;数据可以作为备份,以防止信息丢失;数据可用于存档目的,可以长期访问,尽管很少需要;或者也可以用于灾难恢复,它类似于存档,只是存储在不同的物理位置以防止自然灾害。

A typical consumer’s computer may include one or two hard drives that are individually managed by that machine. However, when servers need to store data, especially where that data needs to be accessible by tens to thousands of servers, it simply is not feasible for each server to manage individual hard drives. Storage arrays are the solution to this problem.

一个典型的消费级电脑可能包括一个或两个硬盘驱动器,由该机器单独管理。然而,当服务器需要存储数据时,特别是那些数据需要由无数台服务器访问的地方,每个服务器管理单个硬盘驱动器是不可行的。存储阵列正是这个问题的解决方案。

Storage arrays abstract individual storage devices (HDDs or SSDs) and aggregate them into larger arrays that appear as a single logical unit. The presentation of devices may be as just a bunch of disks (JBOD), but commonly the storage devices are grouped together into a redundant array of independent disks (RAID) to improve performance or enhance redundancy.

存储阵列将单个存储设备(HDD或SSD)组合成更大的阵列,并表现为单个逻辑单元。设备的表现形式可以只是简单磁盘捆绑(JBOD),但通常是将存储设备组合成为独立磁盘冗余阵列(RAID),以提高性能或增强冗余。

RAID works by “striping” data across multiple devices, which can reduce the data written to each disk, increasing how quickly the data can be written (RAID 0). The RAID array may also use additional devices to write data redundantly across multiple disks (RAID 1).

RAID通过跨多个设备“分段”数据,可以减少每个磁盘写入的数据,提高数据的写入速度(RAID 0)。RAID阵列还可以使用额外的设备跨多个磁盘实现冗余数据写入(RAID 1)。
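As a rough illustration of the two ideas above, the Python sketch below stripes a byte string across devices in round-robin chunks (RAID 0 style) and copies it whole to several devices (RAID 1 style). The function names and the 4-byte chunk size are assumptions chosen for readability; real controllers use much larger stripe units and operate on disk blocks rather than in-memory byte strings.

```python
def stripe(data: bytes, num_devices: int, chunk_size: int = 4) -> list[bytes]:
    """RAID 0 style: deal fixed-size chunks out to the devices in turn."""
    devices = [bytearray() for _ in range(num_devices)]
    for start in range(0, len(data), chunk_size):
        target = (start // chunk_size) % num_devices
        devices[target] += data[start:start + chunk_size]
    return [bytes(d) for d in devices]


def unstripe(devices: list[bytes], chunk_size: int = 4) -> bytes:
    """Reassemble striped data by reading chunks back in the same order."""
    out = bytearray()
    offsets = [0] * len(devices)
    dev = 0
    while offsets[dev] < len(devices[dev]):
        out += devices[dev][offsets[dev]:offsets[dev] + chunk_size]
        offsets[dev] += chunk_size
        dev = (dev + 1) % len(devices)
    return bytes(out)


def mirror(data: bytes, num_devices: int = 2) -> list[bytes]:
    """RAID 1 style: every device holds a full copy of the data."""
    return [bytes(data) for _ in range(num_devices)]


payload = b"ABCDEFGHIJ"
striped = stripe(payload, num_devices=2)
print(striped)                       # each device holds only part of the data
print(unstripe(striped) == payload)  # striping alone adds no redundancy
print(mirror(payload))               # each device holds all of the data
```

With striping, each device writes only a fraction of the payload, which is where the speed-up comes from; with mirroring, any single surviving device can return the full payload.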

More complex RAID configurations can provide both performance and redundancy. For example, RAID 4 and 5 do this by striping data across multiple devices, and then calculating the parity of each bit in each corresponding stripe of data (essentially whether the sum of bits is even or odd) and writing the parity bit to a redundant device. If any stripe is lost, the parity of the remaining stripes can be used to determine the lost bit. RAID 4 uses a dedicated parity device, whereas RAID 5 distributes the parity bits across all of the devices.

更复杂的RAID配置可以提供性能和冗余。例如,RAID 4和RAID 5通过跨多个设备分段数据来实现这一点,然后计算每个对应数据段中的每位的奇偶校验(本质上无论位的和是偶数还是奇数),并将校验位写入冗余设备。如果任意数据段丢失,则可以使用其余段的校验位来确定丢失的位。RAID 4使用专用奇偶校验设备,而RAID 5将奇偶校验位分布在所有设备上。
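The parity scheme described above amounts to a bitwise XOR across the corresponding stripes, since the XOR of a set of bits is 1 exactly when their sum is odd. The Python sketch below computes a parity stripe and rebuilds a lost data stripe from the survivors. The stripe contents are made-up sample bytes, and `parity_device_for` shows just one simple way to rotate parity across devices; actual RAID 5 layouts vary by implementation.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_device_for(stripe_index: int, num_devices: int) -> int:
    """Illustrative RAID 5 rotation: parity moves to the next device each stripe.

    RAID 4 would instead always return the same dedicated device.
    """
    return stripe_index % num_devices


# Three data stripes (sample values) and their parity stripe.
data_stripes = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
parity = xor_blocks(data_stripes)

# Simulate losing the device that held stripe 1, then rebuild it
# from the remaining data stripes plus the parity stripe.
lost = 1
survivors = [s for i, s in enumerate(data_stripes) if i != lost]
rebuilt = xor_blocks(survivors + [parity])

print(rebuilt == data_stripes[lost])
print([parity_device_for(i, 4) for i in range(6)])
```

Because XOR is its own inverse, the same function serves for both computing parity and recovering a missing stripe, as long as only one stripe is lost.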

Beyond RAID itself, storage arrays typically have redundant power supplies, RAID controllers, and ports to provide resiliency and reduce points of failure. Each device in the storage array is accessible by both controllers, which may use either power supply, and which may use either set of ports to communicate.

除了RAID本身,存储阵列通常还具有冗余电源、RAID控制器和端口,以提供弹性和减少故障点。存储阵列中的每个设备都可由两个控制器访问,控制器可以使用任何一个电源,也可以使用任何一组端口进行通信。

As mentioned before, tape storage still plays a role in the data center, particularly for archive and backup, in spite of newer technologies. Tape formats have been updated throughout the years with higher capacities and faster speeds, now nearing 10 terabytes each. Although individual tape drives are available, tape drives in the data center are most often found within tape libraries that manage a large number of tapes at once.

如上所述,尽管有了更新的技术,磁带存储仍然在数据中心中发挥作用,特别是用于存档和备份。多年来,磁带格式一直在以更高的容量和更快的速度更新,现在每种格式的容量都接近10TB。虽然可以单独使用磁带驱动器,但是数据中心中的磁带驱动器通常位于同时管理大量磁带的磁带库中。

Summary 

小结

Today’s ITE consists of several very distinct types of hardware (servers, networking, and storage), each with its own unique designs and functions for supporting different types of data and services. Each of these types of ITE will correspondingly interact with the data center infrastructure in different ways.

今天的ITE由几种完全不同的硬件组成(服务器、网络和存储),每种硬件都有自己独特的设计和功能来支持不同类型的数据和服务。每种类型的ITE都将以不同的方式与数据中心基础设施发生相应的交互。

It is reasonable to believe that the IT industry is currently undergoing more rapid change than almost anything it has experienced. Not only is data within existing segments being created, stored, and processed more quickly than ever before, but entirely new IT segments are emerging, such as virtual reality (VR) and self-driving vehicles. These changes are placing increased demands on data centers and the ITE in them.

有理由相信,IT行业目前正经历着比以往任何时候都更快的变革。不仅现有段中的数据产生、存储和处理速度比以往任何时候都要快,而且全新的IT应用正在涌现,比如虚拟现实(VR)和无人驾驶汽车。这些变化对数据中心和其中的ITE提出了更高的需求。

By understanding the fundamental roles of ITE (servers, networking, and storage), its changes, and how it interacts with the data center HVAC, data center designers will be better equipped to handle this changing landscape.

通过了解ITE的基本类型(服务器、网络和存储)、变化、及其如何与数据中心暖通空调相互影响,数据中心设计师将能更好地应对这种变化的前景。

References 

参考文献

1. ASHRAE. 2016. IT Equipment Design Impact on Data Center Solutions. Atlanta: ASHRAE.

2. ASHRAE. 2009. Design Considerations For Datacom Equipment Centers, 2nd Edition. Atlanta: ASHRAE.

3. ASHRAE. 2015. Thermal Guidelines For Data Processing Environments, 4th Edition. Atlanta: ASHRAE.

4. ASHRAE. 2015. “Data Center Storage Equipment Thermal Guidelines, Issues, and Best Practices.”

5. EIA/ECA-310-D-1992, Cabinets, Racks, Panels, and Associated Equipment.

6. ASHRAE. 2014. “Data Center Networking Equipment—Issues and Best Practices.”

1. ASHRAE. 2016. IT设备设计对数据中心解决方案的影响。亚特兰大:ASHRAE

2. ASHRAE. 2009. 数据通信设备设计研究(第二版)。亚特兰大:ASHRAE  

3. ASHRAE. 2015. 数据处理环境热指南(第四版). 亚特兰大: ASHRAE

4. ASHRAE. 2015. 数据中心存储设备热指南、问题与最佳实践

5. EIA/ECA-310-D-1992,机柜、机架、面板和辅助设备

6. ASHRAE. 2014. 数据中心网络设备——问题与最佳实践

翻译:

何海

中国空气动力研究与发展中心,工程师

DKV(Deep Knowledge Volunteer)计划精英成员

编辑:

李擎

北京欣盛云路科技有限公司 高级运营经理
