SGI 10-Gigabit Ethernet Network Adapter User Manual

Designing Cloud and Grid Computing Systems
with InfiniBand and High-Speed Ethernet
Dhabaleswar K. (DK) Panda
The Ohio State University
http://www.cse.ohio-state.edu/~panda
A Tutorial at CCGrid ’11
by
Sayantan Sur
The Ohio State University
E-mail: surs@cse.ohio-state.edu
http://www.cse.ohio-state.edu/~surs

Content Summary

Page 1 - A Tutorial at CCGrid ’11

Designing Cloud and Grid Computing Systems with InfiniBand and High-Speed Ethernet
Dhabaleswar K. (DK) Panda
The Ohio State University
E-mail: panda@cse.…

Page 2

Hadoop Architecture
• Underlying Hadoop Distributed File System (HDFS)
• Fault-tolerance by replicating data blocks
• NameNode: stores information on dat…

Page 3 - Computing Systems

CCGrid ’11: OpenFabrics Stack with Unified Verbs Interface
Verbs Interface (libibverbs), Mellanox (libmthca), QLogic (libipathverbs), IBM (libehca), Chelsio (li…

Page 4 - Cluster Computing Environment

• For IBoE and RoCE, the upper-level stacks remain completely unchanged
• Within the hardware:
– Transport and network layers remain completely unchange…

Page 5 - (http://www.top500.org)

CCGrid ’11: OpenFabrics Software Stack
SA: Subnet Administrator; MAD: Management Datagram; SMA: Subnet Manager Agent; PMA: Performance Manager Agent; IPoIB: IP o…

Page 6 - Grid Computing Environment

CCGrid ’11, slide 103: InfiniBand in the Top500
Percentage share of InfiniBand is steadily increasing

Page 7

[Pie chart: Number of Systems by interconnect family: Gigabit Ethernet 45%, InfiniBand 43%, Proprietary 6%, with the remaining share split among Myrinet, Quadrics, Mixed, NUMAlink, SP Switch, Cray Interconnect, Fat Tree and Cu…]

Page 8 - Compute cluster

[Plot: InfiniBand System Efficiency in the Top500 List (CCGrid ’11, slide 105): Efficiency (%), 0 to 100, against Top 500 s… rank, 0 to 500]

Page 9 - Cloud Computing Environments

• 214 IB Clusters (42.8%) in the Nov ’10 Top500 list (http://www.top500.org)
• Installations in the Top 30 (13 systems)
CCGrid ’11: Large-scale Infi…

Page 10 - Hadoop Architecture

• HSE compute systems with ranking in the Nov 2010 Top500 list
– 8,856-core installation in Purdue with ConnectX-EN 10GigE (#126)
– 7,944-core installat…

Page 11 - Memcached Architecture

• HSE has most of its popularity in enterprise computing and other non-scientific markets, including wide-area networking
• Example Enterprise Computing…

Page 12

• Introduction
• Why InfiniBand and High-speed Ethernet?
• Overview of IB, HSE, their Convergence and Features
• IB and HSE HW/SW Products and Installati…

Page 13 - Software components

Memcached Architecture
• Distributed Caching Layer
– Allows aggregation of spare memory from multiple nodes
– General purpose
• Typically used to cache data…
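
Where the page above introduces memcached as a general-purpose distributed caching layer, a short client sketch makes the idea concrete. The following is a minimal illustration using the common libmemcached C client, assuming a server at localhost:11211; it is not code from the tutorial.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <libmemcached/memcached.h>

    int main(void) {
        /* Connect to one node of the caching layer (assumed location). */
        memcached_st *memc = memcached_create(NULL);
        memcached_server_add(memc, "localhost", 11211);

        /* Cache a value under a key, as a database front-end would. */
        const char *key = "user:42", *val = "cached-row";
        memcached_set(memc, key, strlen(key), val, strlen(val),
                      (time_t)0 /* no expiry */, (uint32_t)0 /* flags */);

        /* Any client of the layer can later fetch it back by key. */
        size_t len; uint32_t flags; memcached_return_t rc;
        char *out = memcached_get(memc, key, strlen(key), &len, &flags, &rc);
        if (rc == MEMCACHED_SUCCESS)
            printf("%.*s\n", (int)len, out);

        free(out);
        memcached_free(memc);
        return 0;
    }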

Page 14 - Ex: TCP/IP, UDP/IP

[Diagram: Modern Interconnects and Protocols (CCGrid ’11, slide 110): applications over the Verbs and Sockets interfaces; kernel-space TCP/IP versus hardware-offloaded TCP/IP over the Ethernet driver; protocol implementa…]

Page 15 - Not scalable:

• Low-level Network Performance
• Clusters with Message Passing Interface (MPI)
• Datacenters with Sockets Direct Protocol (SDP) and TCP/IP (IPoIB)
• Inf…

Page 16 - Myrinet (1993 -) 1 Gbit/sec

[Plot: Low-level Latency Measurements (CCGrid ’11, slide 112): Latency (us), 0 to 30 for small messages, against Message Size (bytes) for VPI-IB, Native IB, VPI-Eth and RoCE…]

Page 17

[Plot: Low-level Uni-directional Bandwidth Measurements (CCGrid ’11, slide 113): uni-directional bandwidth, 0 to 1600, against message size for VPI-IB, Native IB, VPI-Eth and RoCE; Band…]

Page 18

• Low-level Network Performance
• Clusters with Message Passing Interface (MPI)
• Datacenters with Sockets Direct Protocol (SDP) and TCP/IP (IPoIB)
• Inf…

Page 19 - IB Trade Association

• High Performance MPI Library for IB and HSE
– MVAPICH (MPI-1) and MVAPICH2 (MPI-2.2)
– Used by more than 1,550 organizations in 60 countries
– More tha…
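
The latency and bandwidth pages that follow report numbers from ping-pong style MPI micro-benchmarks. As a reference point, here is a minimal sketch of such a test against the standard MPI C API; it illustrates the methodology and is not the benchmark code used in the tutorial. Compile with an MPI wrapper (e.g., mpicc) and run with two ranks.

    #include <mpi.h>
    #include <stdio.h>

    /* Two-rank ping-pong: rank 0 times round trips to rank 1. */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char buf[4] = {0};          /* small message, as in the plots */
        const int iters = 1000;
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        if (rank == 0)   /* one-way latency is half the round-trip time */
            printf("one-way latency: %.2f us\n",
                   (MPI_Wtime() - t0) / iters / 2 * 1e6);
        MPI_Finalize();
        return 0;
    }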

Page 20

[Plot: One-way Latency: MPI over IB (CCGrid ’11, slide 116): small-message Latency (us) against Message Size (bytes); reported small-message latencies of 1.96, 1.54, 1.60 and 2.17 us for the MVAP… configurations]

Page 21 - I/O interface bottlenecks

[Plot: Bandwidth: MPI over IB (CCGrid ’11, slide 117): Unidirectional Bandwidth in MillionBytes/sec against Message Size (bytes), with peaks of 2665.6, 3023.7, 1901.1 and 1553…]

Page 22

[Plot: One-way Latency: MPI over iWARP (CCGrid ’11, slide 118): latency, 0 to 90, for Chelsio (TCP/IP), Chelsio (iWARP), Intel-NetEffect (TCP/IP) and Intel-NetEffect (iWARP) against Mess…]

Page 23

[Plot: Bandwidth: MPI over iWARP (CCGrid ’11, slide 119): Unidirectional Bandwidth in MillionBytes/sec against Message Size (bytes), with peaks of 839.8, 1169.7, 373.3 and 1245.0…]

Page 24 - (not shown)

• Good System Area Networks with excellent performance (low latency, high bandwidth and low CPU utilization) for inter-processor communication (IPC) a…

Page 25

[Plot: Convergent Technologies: MPI Latency (CCGrid ’11, slide 120): Latency (us), 0 to 60 for small messages, against Message Size (bytes), with a large-message panel scaled to 16000…]

Page 26

[Plot: Convergent Technologies: MPI Uni- and Bi-directional Bandwidth (CCGrid ’11, slide 121): 0 to 1600 for Native IB, VPI-IB, VPI-Eth and RoCE; Uni-directional…]

Page 27 - Myricom GM

• Low-level Network Performance
• Clusters with Message Passing Interface (MPI)
• Datacenters with Sockets Direct Protocol (SDP) and TCP/IP (IPoIB)
• Inf…

Page 28 - IB Hardware Acceleration

[Diagram: IPoIB vs. SDP Architectural Models (CCGrid ’11, slide 123): Traditional Model and Possible SDP Model; Sockets Application over the Sockets API, with kernel TCP/I… underneath]

Page 29 - Hardware Checksum Engines

[Plot: SDP vs. IPoIB (IB QDR) (CCGrid ’11, slide 124): Bandwidth (MBps), 0 to 2000, for message sizes 2 bytes to 32 KB, and Latency for 2 bytes to 2 KB, comparing IPoIB-RC, IPoIB-UD and SDP…]

Page 30 - TOE and iWARP Accelerators

• Low-level Network Performance
• Clusters with Message Passing Interface (MPI)
• Datacenters with Sockets Direct Protocol (SDP) and TCP/IP (IPoIB)
• Inf…

Page 31

• Option 1: Layer-1 optical networks
– IB standard specifies link, network and transport layers
– Can use any layer-1 (though the standard says copper a…

Page 32

Features
• End-to-end guaranteed bandwidth channels
• Dynamic, in-advance reservation and provisioning of fractional/full lambdas
• Secure control-plane…

Page 33

• Supports SONET OC-192 or 10GE LAN-PHY/WAN-PHY
• Idea is to make remote storage “appear” local
• IB-WAN switch does frame conversion
– IB standard allow…

Page 34 - 2003 (Gen1), 2007 (Gen2)

[Diagram: InfiniBand over SONET: Obsidian Longbows RDMA throughput measurements over USN (CCGrid ’11, slide 129): Linux host at ORNL, 700 miles, Linux host, Chicago CDCI, Seattle CDCI, Su…]

Page 35

• Hardware components
– Processing cores and memory subsystem
– I/O bus or links
– Network adapters/switches
• Software components
– Communication stack
• B…

Page 36 - IB, HSE and their Convergence

[Diagram: IB over 10GE LAN-PHY and WAN-PHY (CCGrid ’11, slide 130): Linux hosts at ORNL and Seattle connected through CDCIs and Longbow IB/S units over 700-, 3300- and 4300-mile segments; ORNL lo…]

Page 37 - Traditional Ethernet

MPI over IB-WAN: Obsidian Routers

  Delay (us)   Distance (km)
  10           2
  100          20
  1000         200
  10000        2000

Cluster A and Cluster B connected over a WAN link between two Obsidian WAN routers…

Page 38 - IB Overview

Communication Options in Grid
• Multiple options exist to perform data transfer on Grid
• Globus-XIO framework currently does not support IB natively
• W…

Page 39 - Components: Channel Adapters

Globus-XIO Framework with ADTS Driver
[Diagram: Globus XIO Driver #n with Data Connection Management, Persistent Session Management, Buffer & File Management and Data Transport I…]

Page 40 - Switches: intra-subnet

Performance of Memory-Based Data Transfer (slide 134)
• Performance numbers obtained while transferring 128 GB of aggregate data in chunks of 256 MB files
• ADTS…

Page 41 - Not directly addressable

Performance of Disk-Based Data Transfer (slide 135)
• Performance numbers obtained while transferring 128 GB of aggregate data in chunks of 256 MB files
• Predic…

Page 42

[Plot: Application Level Performance (slide 136): Bandwidth (MBps), 0 to 300, for the CCSM and Ultra-Viz target applications, comparing ADTS and IPoIB]
• Application performance for FTP get opera…

Page 43 - IB Communication Model

• Low-level Network Performance
• Clusters with Message Passing Interface (MPI)
• Datacenters with Sockets Direct Protocol (SDP) and TCP/IP (IPoIB)
• Inf…

Page 44 - Queue Pair Model

A New Approach towards OFA in Cloud
[Diagram: Current Approach versus Towards OFA in Cloud: Application over Accelerated Sockets on 10 GigE or InfiniBand with Verbs / Hardware Offload; Curr…]

Page 45 - Memory Registration

Memcached Design Using Verbs
• Server and client perform a negotiation protocol
– Master thread assigns clients to appropriate worker thread
• Once a cli…

Page 46 - Memory Protection

• Ex: TCP/IP, UDP/IP
• Generic architecture for all networks
• Host processor handles almost all aspects of communication
– Data buffering (copies on sen…

Page 47 - (Send/Receive Model)

Memcached Get Latency
• Memcached Get latency
– 4 bytes: DDR 6 us; QDR 5 us
– 4K bytes: DDR 20 us; QDR 12 us
• Almost a factor-of-four improvement ove…

Page 48 - Hardware ACK

Memcached Get TPS
• Memcached Get transactions per second for 4 bytes
– On IB DDR, about 600K/s for 16 clients
– On IB QDR, 1.9M/s for 16 clients
• Almost…

Page 49

Hadoop: Java Communication Benchmark
• Sockets-level ping-pong bandwidth test
• Java performance depends on usage of NIO (allocateDirect)
• C and Java ve…

Page 50

Hadoop: DFS IO Write Performance
• DFS IO, included in Hadoop, measures sequential access throughput
• We have two map tasks, each writing to a file of in…

Page 51 - Hardware Protocol Offload

Hadoop: RandomWriter Performance
• Each map generates 1 GB of random binary data and writes to HDFS
• SSD improves execution time by 50% with 1GigE for t…

Page 52 - Switching and Multicast

Hadoop Sort Benchmark
• Sort: baseline benchmark for Hadoop
• Sort phase: I/O bound; Reduce phase: communication bound
• SSD improves performance by 28%…

Page 53 - Buffering and Flow Control

• Introduction
• Why InfiniBand and High-speed Ethernet?
• Overview of IB, HSE, their Convergence and Features
• IB and HSE HW/SW Products and Installati…

Page 54 - Virtual Lanes

• Presented network architectures & trends for Clusters, Grid, Multi-tier Datacenters and Cloud Computing Systems
• Presented background and detail…

Page 55 - Service Levels and QoS

Funding Acknowledgments (CCGrid ’11, slide 148)
[Logos: Funding Support by …; Equipment Support by …]

Page 56 - Traffic Segregation Benefits

Personnel Acknowledgments (CCGrid ’11)
Current Students:
– N. Dandapanthula (M.S.)
– R. Darbha (M.S.)
– V. Dhanraj (M.S.)
– J. Huang (Ph.D.)
– J. Jose (P…

Page 57 - Identifiers)

• Traditionally relied on bus-based technologies (last-mile bottleneck)
– E.g., PCI, PCI-X
– One bit per wire
– Performance increase through:
• Increasing…

Page 58 - Switch Complex

Web Pointers (CCGrid ’11)
http://www.cse.ohio-state.edu/~panda
http://www.cse.ohio-state.edu/~surs
http://nowlab.cse.ohio-state.edu
MVAPICH Web Page: http…

Page 59 - 3D Torus (Sandia Red Sky)

• Network speeds saturated at around 1 Gbps
– Features provided were limited
– Commodity networks were not considered scalable enough for very large-scal…

Page 60 - More on Multipathing

• Industry Networking Standards
• InfiniBand and High-speed Ethernet were introduced into the market to address these bottlenecks
• InfiniBand aimed at…

Page 61 - IB Multicast Example

• Introduction
• Why InfiniBand and High-speed Ethernet?
• Overview of IB, HSE, their Convergence and Features
• IB and HSE HW/SW Products and Installati…

Page 62

• IB Trade Association was formed with seven industry leaders (Compaq, Dell, HP, IBM, Intel, Microsoft, and Sun)
• Goal: To design a scalable and high performanc…

Page 63 - IB Transport Services

• Introduction
• Why InfiniBand and High-speed Ethernet?
• Overview of IB, HSE, their Convergence and Features
• IB and HSE HW/SW Products and Installati…

Page 64 - Reliability

• 10GE Alliance formed by several industry leaders to take the Ethernet family to the next speed step
• Goal: To achieve a scalable and high performanc…

Page 65 - Transport Layer Capabilities

• Network speed bottlenecks
• Protocol processing bottlenecks
• I/O interface bottlenecks
CCGrid ’11, slide 21: Tackling Communication Bottlenecks with IB and…

Page 66 - Data Segmentation

• Bit-serial differential signaling
– Independent pairs of wires to transmit independent data (called a lane)
– Scalable to any number of lanes
– Easy to…

Page 67 - Transaction Ordering

CCGrid ’11: Network Speed Acceleration with IB and HSE
Ethernet (1979 -): 10 Mbit/sec
Fast Ethernet (1993 -): 100 Mbit/sec
Gigabit Ethernet (1995 -): 10…

Page 68 - Message-level Flow-Control

[Chart: InfiniBand bandwidth roadmap, 2005 through 2011, bandwidth per direction (Gbps): 32G-IB-DDR, 48G-IB-DDR, 96G-IB-QDR, 48G-IB-QDR, 200G-IB-EDR, 112G-IB-FDR, 300G-IB-EDR, 1…]

Page 69

• Network speed bottlenecks
• Protocol processing bottlenecks
• I/O interface bottlenecks
CCGrid ’11, slide 25: Tackling Communication Bottlenecks with IB and…

Page 70

• Intelligent Network Interface Cards
• Support entire protocol processing completely in hardware (hardware protocol offload engines)
• Provide a rich c…

Page 71 - Concepts in IB Management

• Fast Messages (FM)
– Developed by UIUC
• Myricom GM
– Proprietary protocol stack from Myricom
• These network stacks set the trend for high-performance…

Page 72 - Subnet Manager

• Some IB models have multiple hardware accelerators
– E.g., Mellanox IB adapters
• Protocol Offload Engines
– Completely implement ISO/OSI layers 2-4 (l…

Page 73

• Interrupt Coalescing
– Improves throughput, but degrades latency
• Jumbo Frames
– No latency impact; incompatible with existing switches
• Hardware Chec…

Page 74 - HSE Overview

Current and Next Generation Applications and Computing Systems (CCGrid ’11, slide 3)
• Diverse range of applications
– Processing and dataset characteristics…

Page 75 - Differences

• TCP Offload Engines (TOE)
– Hardware acceleration for the entire TCP/IP stack
– Initially patented by Tehuti Networks
– Actually refers to the IC on th…

Page 76 - Multi Stream Semantics

• Also known as “Datacenter Ethernet” or “Lossless Ethernet”
– Combines a number of optional Ethernet standards into one umbrella as mandatory requirem…

Page 77

• Network speed bottlenecks
• Protocol processing bottlenecks
• I/O interface bottlenecks
CCGrid ’11, slide 32: Tackling Communication Bottlenecks with IB and…

Page 78

• InfiniBand initially intended to replace I/O bus technologies with networking-like technology
– That is, bit-serial differential signaling
– With enha…

Page 79

• Recent trends in I/O interfaces show that they are nearly matching head-to-head with network speeds (though they still lag a little bit)
CCGrid ’…

Page 80

• Introduction
• Why InfiniBand and High-speed Ethernet?
• Overview of IB, HSE, their Convergence and Features
• IB and HSE HW/SW Products and Installati…

Page 81

• InfiniBand
– Architecture and Basic Hardware Components
– Communication Model and Semantics
– Novel Features
– Subnet Management and Services
• High-spee…

Page 82

Comparing InfiniBand with Traditional Networking Stack (CCGrid ’11, slide 37)
[Diagram: Application Layer (MPI, PGAS, File Systems), Transport Layer (OpenFabrics Verbs), RC (rel…]

Page 83 - Offloaded TCP

• InfiniBand
– Architecture and Basic Hardware Components
– Communication Model and Semantics
• Communication Model
• Memory registration and protection
•…

Page 84

• Used by processing and I/O units to connect to fabric
• Consume & generate IB packets
• Programmable DMA engines with protection features
• May hav…

Page 85 - Myrinet Express (MX)

[Diagram: Cluster Computing Environment (CCGrid ’11): compute cluster on a LAN with a frontend, a meta-data manager, I/O server nodes holding meta-data and data, and compute nodes…]

Page 86 - Datagram Bypass Layer (DBL)

• Relay packets from a link to another
• Switches: intra-subnet
• Routers: inter-subnet
• May support multicast
CCGrid ’11: Components: Switches and Ro…

Page 87 - Solarflare approach:

• Network Links
– Copper, Optical, Printed Circuit wiring on Back Plane
– Not directly addressable
• Traditional adapters built for copper cabling
– Restr…

Page 88

• InfiniBand
– Architecture and Basic Hardware Components
– Communication Model and Semantics
• Communication Model
• Memory registration and protection
•…

Page 89

[Diagram: IB Communication Model (CCGrid ’11, slide 43): basic InfiniBand communication semantics]

Page 90 - Hardware

• Each QP has two queues
– Send Queue (SQ)
– Receive Queue (RQ)
– Work requests are queued to the QP (WQEs: “Wookies”)
• QP to be linked to a Complete Que…
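
To make the queue-pair vocabulary concrete, here is a minimal sketch of how these objects map onto the libibverbs C API; the queue depths and helper name are illustrative assumptions, not code from the tutorial.

    #include <infiniband/verbs.h>

    /* Assumes an opened device context and protection domain. */
    struct ibv_qp *make_rc_qp(struct ibv_context *ctx, struct ibv_pd *pd)
    {
        /* Completion Queue: completed WQEs from both queues land here. */
        struct ibv_cq *cq = ibv_create_cq(ctx, 64 /* depth */, NULL, NULL, 0);

        struct ibv_qp_init_attr attr = {
            .send_cq = cq,             /* link the SQ to the CQ */
            .recv_cq = cq,             /* link the RQ to the CQ */
            .cap     = { .max_send_wr = 32, .max_recv_wr = 32,
                         .max_send_sge = 1, .max_recv_sge = 1 },
            .qp_type = IBV_QPT_RC,     /* Reliable Connection transport */
        };
        return ibv_create_qp(pd, &attr);   /* QP = {SQ, RQ} */
    }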

Page 91 - IB Transport

1. Registration Request
• Send virtual address and length
2. Kernel handles virtual-to-physical mapping and pins region into physical memory
• Process…
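
In libibverbs terms, the registration sequence sketched above is a single call. A minimal illustration, assuming an existing protection domain and a freshly allocated buffer:

    #include <stdlib.h>
    #include <infiniband/verbs.h>

    /* Register a buffer so the HCA may DMA into and out of it; the kernel
       pins the pages and records the virtual-to-physical mapping. */
    struct ibv_mr *register_buffer(struct ibv_pd *pd, size_t len)
    {
        void *buf = malloc(len);
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                       IBV_ACCESS_LOCAL_WRITE |
                                       IBV_ACCESS_REMOTE_READ |
                                       IBV_ACCESS_REMOTE_WRITE);
        /* mr->lkey authorizes local HCA access; mr->rkey is handed to
           peers that will issue RDMA operations against this region. */
        return mr;
    }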

Page 92 - IB iWARP/HSE RoE RoCE

• To send or receive data, the l_key must be provided to the HCA
• HCA verifies access to local memory
• For RDMA, initiator must have the r_key for the r…
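
As an illustration of how the two keys are used, here is a hedged sketch of posting an RDMA write with libibverbs, assuming the peer has already communicated its buffer address and r_key out of band:

    #include <stdint.h>
    #include <infiniband/verbs.h>

    /* One-sided RDMA write: the local lkey authorizes the source buffer,
       the peer's rkey authorizes the remote destination region. */
    int rdma_write(struct ibv_qp *qp, struct ibv_mr *mr, void *buf,
                   uint32_t len, uint64_t remote_addr, uint32_t rkey)
    {
        struct ibv_sge sge = {
            .addr = (uintptr_t)buf, .length = len, .lkey = mr->lkey,
        };
        struct ibv_send_wr wr = {
            .opcode     = IBV_WR_RDMA_WRITE,
            .sg_list    = &sge,
            .num_sge    = 1,
            .send_flags = IBV_SEND_SIGNALED,   /* request a completion */
            .wr.rdma    = { .remote_addr = remote_addr, .rkey = rkey },
        };
        struct ibv_send_wr *bad = NULL;
        return ibv_post_send(qp, &wr, &bad);   /* remote CPU not involved */
    }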

Page 93

Communication in the Channel Semantics (Send/Receive Model) (CCGrid ’11)
[Diagram: InfiniBand devices and memories on both sides; CQ and QP with Send and Recv queues; memory segment; Send…]
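
A matching sketch of the channel semantics with libibverbs: the receiver pre-posts a receive WQE and the sender posts a send WQE, with completions reported through the CQ. Names are again illustrative assumptions.

    #include <stdint.h>
    #include <infiniband/verbs.h>

    /* Receiver side: incoming sends consume pre-posted receive WQEs. */
    void post_receive(struct ibv_qp *qp, struct ibv_mr *mr,
                      void *buf, uint32_t len)
    {
        struct ibv_sge sge = { .addr = (uintptr_t)buf, .length = len,
                               .lkey = mr->lkey };
        struct ibv_recv_wr wr = { .sg_list = &sge, .num_sge = 1 }, *bad;
        ibv_post_recv(qp, &wr, &bad);
    }

    /* Sender side: post a send WQE, then busy-poll the CQ for completion. */
    void send_and_wait(struct ibv_qp *qp, struct ibv_cq *cq,
                       struct ibv_mr *mr, void *buf, uint32_t len)
    {
        struct ibv_sge sge = { .addr = (uintptr_t)buf, .length = len,
                               .lkey = mr->lkey };
        struct ibv_send_wr wr = { .opcode = IBV_WR_SEND, .sg_list = &sge,
                                  .num_sge = 1,
                                  .send_flags = IBV_SEND_SIGNALED }, *bad;
        ibv_post_send(qp, &wr, &bad);

        struct ibv_wc wc;
        while (ibv_poll_cq(cq, 1, &wc) == 0)
            ;   /* spin until the send WQE completes */
    }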

Page 94 - IB Hardware Products

Communication in the Memory Semantics (RDMA Model) (CCGrid ’11)
[Diagram: InfiniBand devices and memories on both sides; CQ and QP with Send and Recv queues; memory segment; Send WQE cont…]

Page 95 - Tyan Thunder S2935 Board

Communication in the Memory Semantics (Atomics) (CCGrid ’11)
[Diagram: InfiniBand devices and memories on both sides; CQ and QP with Send and Recv queues; memory segment; Send WQE contain…]

Page 96 - IB Hardware Products (contd.)

Trends for Computing Clusters in the Top 500 List (http://www.top500.org) (CCGrid ’11)
Nov. 1996: 0/500 (0%)
Nov. 2001: 43/500 (8.6%)
Nov. 2006: 361…

Page 97 - Nortel Networks

• InfiniBand
– Architecture and Basic Hardware Components
– Communication Model and Semantics
• Communication Model
• Memory registration and protection
•…

Page 98 - Support for VPI and RoCE

Hardware Protocol Offload (CCGrid ’11, slide 51)
Complete hardware implementations exist

Page 99 - OFED 1.6 is underway

• Buffering and Flow Control
• Virtual Lanes, Service Levels and QoS
• Switching and Multicast
Link/Network Layer Capabilities (CCGrid ’11, slide 52)

Page 100 - (libibverbs)

• IB provides three levels of communication throttling/control mechanisms
– Link-level flow control (link-layer feature)
– Message-level flow control (t…

Page 101 - Within the hardware:

• Multiple virtual links within same physical link
– Between 2 and 16
• Separate buffers and flow control
– Avoids Head-of-Line Blocking
• VL15: reserved…

Page 102 - OpenFabrics Software Stack

• Service Level (SL):
– Packets may operate at one of 16 different SLs
– Meaning not defined by IB
• SL to VL mapping:
– SL determines which VL on the nex…

Page 103 - InfiniBand in the Top500

• InfiniBand Virtual Lanes allow the multiplexing of multiple independent logical traffic flows on the same physical link
• Providing the benefits of i…

Page 104 - SP Switch

• Each port has one or more associated LIDs (Local Identifiers)
– Switches look up which port to forward a packet to based on its destination LID (DLID…
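
Switches implement this lookup with a forwarding table indexed by DLID (the linear forwarding table, populated by the Subnet Manager). A toy model, with the array name and bound assumed for illustration:

    #include <stdint.h>

    /* Toy linear forwarding table: the packet's DLID indexes directly
       into a DLID -> output-port array written by the Subnet Manager. */
    #define LID_SPACE 49152                /* unicast LID range */

    uint8_t lft[LID_SPACE];                /* DLID -> output port */

    static inline uint8_t route(uint16_t dlid)
    {
        return lft[dlid];  /* every packet to this DLID exits the same port */
    }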

Page 105

• Basic unit of switching is a crossbar
– Current InfiniBand products use either 24-port (DDR) or 36-port (QDR) crossbars
• Switches available in the ma…

Page 106 - CCGrid ’11

• Someone has to set up the forwarding tables and give every port an LID
– “Subnet Manager” does this work
• Different routing algorithms give different…

Page 107 - Integrated Systems

[Diagram: Grid Computing Environment (CCGrid ’11, slide 6): compute clusters on a LAN with frontends, meta-data managers, I/O server nodes holding meta-data and data, and compute nodes…]

Page 108 - Other HSE Installations

• Similar to basic switching, except…
– …sender can utilize multiple LIDs associated to the same destination port
• Packets sent to one DLID take a fix…

Page 109 - Presentation Overview

IB Multicast Example (CCGrid ’11, slide 61)

Page 110 - InfiniBand

Hardware Protocol Offload (CCGrid ’11, slide 62)
Complete hardware implementations exist

Page 111 - Case Studies

• Each transport service can have zero or more QPs associated with it
– E.g., you can have four QPs based on RC and one QP based on UD
CCGrid ’11: IB…

Page 112 - Message Size (bytes)

Trade-offs in Different Transport Types (CCGrid ’11, slide 64)
[Table: Attribute | Reliable Connection | Reliable Datagram | eXtended Reliable Connection | Unreliable Connection | Unrel…]

Page 113 - Bandwidth (MBps)

• Data Segmentation
• Transaction Ordering
• Message-level Flow Control
• Static Rate Control and Auto-negotiation
CCGrid ’11: Transport Layer Capabili…

Page 114

• IB transport layer provides a message-level communication granularity, not byte-level (unlike TCP)
• Application can hand over a large message
– Netwo…

Page 115 - MVAPICH/MVAPICH2 Software

• IB follows a strong transaction ordering for RC
• Sender network adapter transmits messages in the order in which WQEs were posted
• Each QP utilizes…

Page 116 - One-way Latency: MPI over IB

• Also called End-to-end Flow-control
– Does not depend on the number of network hops
• Separate from Link-level Flow-Control
– Link-level flow-contro…

Page 117 - Bandwidth: MPI over IB

• IB allows link rates to be statically changed
– On a 4X link, we can set data to be sent at 1X
– For heterogeneous links, rate can be set to the lowes…

Page 118

[Diagram: Multi-Tier Datacenters and Enterprise Computing (CCGrid ’11, slide 7): enterprise multi-tier datacenter with Tier 1 and Tier 3, routers/servers, a switch, database servers and appli…]

Page 119 - Bandwidth: MPI over iWARP

• InfiniBand
– Architecture and Basic Hardware Components
– Communication Model and Semantics
• Communication Model
• Memory registration and protection
•…

Page 120

• Agents
– Processes or hardware units running on each adapter, switch, router (everything on the network)
– Provide capability to query and set paramet…

Page 121 - Convergent Technologies:

[Diagram: Subnet Manager operation (CCGrid ’11): compute nodes and a switch with active and inactive links; the Subnet Manager handles Multicast Join and Multicast Setup messages; Multica…]

Page 122

• InfiniBand
– Architecture and Basic Hardware Components
– Communication Model and Semantics
– Novel Features
– Subnet Management and Services
• High-spee…

Page 123 - InfiniBand CA

• High-speed Ethernet Family
– Internet Wide-Area RDMA Protocol (iWARP)
• Architecture and Components
• Features
– Out-of-order data placement
– Dynamic an…

Page 124 - SDP vs. IPoIB (IB QDR)

IB and HSE RDMA Models: Commonalities and Differences (CCGrid ’11)
[Table: IB | iWARP/HSE; Hardware Acceleration: Supported | Supported; RDMA: Supported | Supported; Atomi…]

Page 125

• RDMA Protocol (RDMAP)
– Feature-rich interface
– Security Management
• Remote Direct Data Placement (RDDP)
– Data Placement and Delivery
– Multi Stream S…

Page 126 - IB on the WAN

• High-speed Ethernet Family
– Internet Wide-Area RDMA Protocol (iWARP)
• Architecture and Components
• Features
– Out-of-order data placement
– Dynamic an…

Page 127 - Features

• Place data as it arrives, whether in or out-of-order
• If data is out-of-order, place it at the appropriate offset
• Issues from the application’s per…

Page 128 - “appear” local

• Part of the Ethernet standard, not iWARP
– Network vendors use a separate interface to support it
• Dynamic bandwidth allocation to flows based on int…

Page 129 - Sunnyvale

[Diagram: Integrated High-End Computing Environments (CCGrid ’11): compute cluster with a meta-data manager, I/O server nodes holding meta-data and data, and compute nodes…]

Page 130 - 3300 miles 4300 miles

• Can allow for simple prioritization:
– E.g., connection 1 performs better than connection 2
– 8 classes provided (a connection can be in any class)
• S…

Page 131 - Cluster B

• High-speed Ethernet Family
– Internet Wide-Area RDMA Protocol (iWARP)
• Architecture and Components
• Features
– Out-of-order data placement
– Dynamic an…

Page 132 - Communication Options in Grid

• Regular Ethernet adapters and TOEs are fully compatible
• Compatibility with iWARP required
• Software iWARP emulates the functionality of iWARP on th…

Page 133 - Modern WAN

[Diagram: Different iWARP Implementations (CCGrid ’11): regular Ethernet adapters with the application over high-performance sockets or plain sockets, TCP/IP and the device driver in software, versus offl…]

Page 134 - Data Transfer

• High-speed Ethernet Family
– Internet Wide-Area RDMA Protocol (iWARP)
• Architecture and Components
• Features
– Out-of-order data placement
– Dynamic an…

Page 135 - IPoIB-64MB

• Proprietary communication layer developed by Myricom for their Myrinet adapters
– Third-generation communication layer (after FM and GM)
– Supports My…

Page 136 - Application Level Performance

• Another proprietary communication layer developed by Myricom
– Compatible with regular UDP sockets (embraces and extends)
– Idea is to bypass the kern…

Page 137

Solarflare Communications: OpenOnload Stack (CCGrid ’11, slide 87)
[Diagram: typical HPC networking stack versus typical commodity networking stack]
• HPC Networking Stack provi…

Page 138

• InfiniBand
– Architecture and Basic Hardware Components
– Communication Model and Semantics
– Novel Features
– Subnet Management and Services
• High-spee…

Page 139 - Memcached Design Using Verbs

• Single network firmware to support both IB and Ethernet
• Autosensing of layer-2 protocol
– Can be configured to automatically work with either IB or…

Page 140 - Memcached Get Latency

[Diagram: Cloud Computing Environments (CCGrid ’11, slide 9): physical machines hosting VMs on a LAN, with a virtual FS, meta-data and I/O servers holding data…]

Page 141 - Memcached Get TPS

• Native convergence of IB network and transport layers with Ethernet link layer
• IB packets encapsulated in Ethernet frames
• IB network layer already…

Page 142 - Bandwidth with C version

• Very similar to IB over Ethernet
– Often used interchangeably with IBoE
– Can be used to explicitly specify link layer is Converged (Enhanced) Etherne…

Page 143

IB and HSE: Feature Comparison (CCGrid ’11)
[Table: IB | iWARP/HSE | RoE | RoCE; Hardware Acceleration: Yes | Yes | Yes | Yes; RDMA: Yes | Yes | Yes | Yes; Congestion Control: Yes | Opti…]

Page 144

• Introduction
• Why InfiniBand and High-speed Ethernet?
• Overview of IB, HSE, their Convergence and Features
• IB and HSE HW/SW Products and Installati…

Page 145 - Hadoop Sort Benchmark

• Many IB vendors: Mellanox+Voltaire and QLogic
– Aligned with many server vendors: Intel, IBM, SUN, Dell
– And many integrators: Appro, Advanced Cluste…

Page 146

Tyan Thunder S2935 Board (CCGrid ’11, slide 95; courtesy Tyan)
Similar boards from Supermicro with LOM features are also available

Page 147 - Concluding Remarks

• Customized adapters to work with IB switches
– Cray XD1 (formerly by Octigabay), Cray CX1
• Switches:
– 4X SDR and DDR (8-288 ports); 12X SDR (small si…

Page 148 - Funding Acknowledgments

• 10GE adapters: Intel, Myricom, Mellanox (ConnectX)
• 10GE/iWARP adapters: Chelsio, NetEffect (now owned by Intel)
• 40GE adapters: Mellanox ConnectX2-…

Page 149 - Personnel Acknowledgments

• Mellanox ConnectX Adapter
• Supports IB and HSE convergence
• Ports can be configured to support IB or HSE
• Support for VPI and RoCE
– 8 Gbps (SDR), 16…

Page 150 - Web Pointers

• Open-source organization (formerly OpenIB)
– www.openfabrics.org
• Incorporates both IB and iWARP in a unified manner
– Support for Linux and Windows
–…
