Original Link: http://www.anandtech.com/show/1113

Many recent technologies signal a shift in the way data is moved within a desktop computer, with the aim of increasing speed and efficiency. Universal Serial Bus (USB), Serial ATA and RDRAM are all examples of a move away from parallel architectures to high-speed serial formats, designed to maximise bandwidth and provide future scalability.

The PCI (Peripheral Component Interconnect) bus has been widely used as a general-purpose I/O interconnect standard for the last ten years, but it is now approaching the limits of its capabilities. Extensions to the PCI standard, such as 64-bit slots and clock speeds of 66MHz or 100MHz, are too costly and simply cannot keep up with the rapidly increasing bandwidth demands expected in PCs over the next few years.

3rd Generation I/O, or 3GIO, has recently been renamed PCI Express, and looks set to replace the ubiquitous PCI bus, the most successful peripheral interconnect used in PCs. With support coming in Intel's Grantsdale chipset and in Microsoft's next version of Windows, codenamed Longhorn, let's take a look at the technology designed to serve the computer industry for the next ten years.

Intel proposed the original PCI 1.0 specification back in 1991. The PCI Special Interest Group, which took over development of PCI, produced revision 2.0 in May 1993.

Its rival at the time was the VESA Local Bus (VL-bus or VLB). Introduced by the Video Electronics Standards Association, VL-bus was a 32-bit bus that added a third and fourth connector to the end of a regular ISA slot. It ran at a nominal 33MHz and offered significantly higher performance than ISA.

One of the features that gave VL-bus its performance was, ironically, also one of the main causes of its downfall. It was essentially a direct extension of the 486 processor/memory bus, running at the same speed as the processor, hence the name "local bus". This direct connection meant that attaching too many devices risked interfering with the processor itself, particularly if the signals passed through a slot. VESA recommended that only two slots be used at clock frequencies up to 33MHz, or three if they were electrically buffered from the bus. At higher frequencies, no more than two devices should be connected, and at 50MHz or above, both should be built into the motherboard.

Because the VL-bus ran synchronously with the processor, rising processor frequencies caused real problems for VL-bus peripherals. The faster a peripheral is required to run, the more expensive it is, because high-speed components are harder to manufacture. Very few VL-bus components were built to handle speeds above 40MHz.

PCI had some compelling advantages over VL-bus. It was designed as a mezzanine bus: a separate bus isolated from the CPU that still had access to main memory.

It could run asynchronously from the processor, at nominal speeds of 25MHz, 30MHz and 33MHz. As processor speeds increased, the PCI bus speed could remain constant, since it ran at an adjustable fraction of the front side bus. PCI allowed five or more slots and/or peripherals, double what the VL-bus could handle, without restrictions tied to bus speed, buffering or other electrical considerations.
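To illustrate how the PCI clock could stay put while the processor bus sped up, here is a minimal Python sketch. It assumes a simple integer divider applied to the front side bus and picks the smallest divider that keeps PCI at or below its nominal 33.33MHz; the function names are hypothetical, and real chipsets select their dividers in their own ways rather than computing them like this.

```python
from fractions import Fraction

# Nominal PCI ceiling of 33.33MHz, kept as an exact fraction (100/3).
PCI_MAX_MHZ = Fraction(100, 3)

def pick_divider(fsb_mhz: int) -> int:
    """Smallest integer divider that keeps the PCI clock at or below 33.33MHz."""
    divider = 1
    while Fraction(fsb_mhz, divider) > PCI_MAX_MHZ:
        divider += 1
    return divider

def pci_clock_mhz(fsb_mhz: int) -> float:
    """PCI clock that results from dividing the front side bus clock."""
    return float(Fraction(fsb_mhz, pick_divider(fsb_mhz)))

if __name__ == "__main__":
    for fsb in (50, 60, 66, 100, 133):
        print(f"FSB {fsb}MHz -> divider {pick_divider(fsb)}, PCI {pci_clock_mhz(fsb):.2f}MHz")
```

Under these assumptions the classic 50MHz, 60MHz and 66MHz Pentium buses give 25MHz, 30MHz and 33MHz with a divider of 2, while a 100MHz bus needs a divider of 3 and a 133MHz bus a divider of 4.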

Other "smart" features made PCI easier to use. Plug and Play allowed peripherals to be configured automatically, without setting jumpers for IRQs, DMA channels and I/O addresses. PCI also allowed IRQs to be shared and had its own interrupt lines (hidden away as INTA#, INTB#, INTC# and INTD#).
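Sharing works because PCI interrupt lines are level-triggered: several devices can hold the same line active, and each driver attached to that line checks whether its own device raised the interrupt. The sketch below models that pattern; the `Device` and `SharedIrqLine` classes are hypothetical stand-ins for real hardware status registers and operating system interrupt plumbing, and in this simplified version every registered handler is called each time the line fires.

```python
from typing import Callable, List

class Device:
    """Hypothetical device with an 'interrupt pending' status bit."""
    def __init__(self, name: str):
        self.name = name
        self.pending = False
        self.serviced = 0

    def handle_irq(self) -> bool:
        # On a shared line the driver must first check that its own device asked for service.
        if not self.pending:
            return False
        self.pending = False  # acknowledging the device releases its hold on the line
        self.serviced += 1
        return True

class SharedIrqLine:
    """One interrupt line with several drivers attached to it."""
    def __init__(self):
        self.handlers: List[Callable[[], bool]] = []

    def register(self, handler: Callable[[], bool]) -> None:
        self.handlers.append(handler)

    def fire(self) -> int:
        # Call every handler and count how many claimed the interrupt.
        return sum(1 for handler in self.handlers if handler())

if __name__ == "__main__":
    nic, sound = Device("nic"), Device("sound")
    line = SharedIrqLine()
    line.register(nic.handle_irq)
    line.register(sound.handle_irq)
    nic.pending = True
    print(line.fire(), nic.serviced, sound.serviced)
```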

Finally, PCI bus mastering allows devices on the PCI bus to take control of the bus and perform transfers directly, without CPU intervention. This lowers latency and reduces processor usage.

Its introduction alongside the Pentium processor, together with its clear benefits over rival buses of the time, helped PCI emerge from the bus wars as the dominant standard in 1994. Since then, just about every type of peripheral, from hard disk controllers and sound cards to NICs and video cards, has been PCI based.

With the advent of RAID arrays, Gigabit Ethernet and other high-bandwidth devices in consumer-class systems, PCI's 133MB/s of bandwidth, shared among every device on the bus, is clearly insufficient to handle these demands.
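The 133MB/s figure follows directly from the bus width and clock: 32 bits (4 bytes) transferred per cycle at 33.33MHz. The short Python calculation below works through that arithmetic and sets it against Gigabit Ethernet, assuming peak theoretical rates with one transfer per clock and no protocol overhead; the 64-bit/66MHz line corresponds to the costly PCI extensions mentioned earlier.

```python
def peak_mb_s(width_bits: int, clock_hz: float) -> float:
    """Peak bus bandwidth in MB/s (10^6 bytes), assuming one transfer per clock."""
    return width_bits / 8 * clock_hz / 1e6

PCI_33 = peak_mb_s(32, 100e6 / 3)     # standard 32-bit, 33.33MHz PCI
PCI_64_66 = peak_mb_s(64, 200e6 / 3)  # 64-bit, 66.67MHz extension

# Gigabit Ethernet: 1000Mb/s in each direction when running full duplex.
GIGE_ONE_WAY = 1000e6 / 8 / 1e6
GIGE_FULL_DUPLEX = 2 * GIGE_ONE_WAY

if __name__ == "__main__":
    print(f"32-bit/33MHz PCI peak:    {PCI_33:.1f} MB/s (shared by every device on the bus)")
    print(f"64-bit/66MHz PCI peak:    {PCI_64_66:.1f} MB/s")
    print(f"Gigabit Ethernet, duplex: {GIGE_FULL_DUPLEX:.1f} MB/s")
    print("One full-duplex GbE port alone exceeds 32-bit PCI:", GIGE_FULL_DUPLEX > PCI_33)
```

A single Gigabit Ethernet port running flat out in both directions could, on paper, consume more than a standard PCI bus can deliver, before any other device gets a look in.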

Chipset makers anticipated this limitation and have made various changes to motherboard chipsets to take some of the load off the PCI bus.

Up until 1997, graphics data was probably the single largest cause of traffic on the PCI bus. The Accelerated Graphics Port (AGP), introduced by Intel's 440LX chipset, had two main purposes: to increase graphics performance and to pull the graphics data off the PCI bus. With graphics data transfers taking place on another "bus" (technically, AGP is not a bus, since it only supports one device), the previously saturated PCI bus was freed up for use with other devices.

Yet AGP was just one step in reducing the load on the PCI bus. The next was to redesign the link between the North Bridge and South Bridge of motherboard chipsets. Older chipsets, such as the Intel 440 series, used a single PCI bus to connect the North Bridge to the South Bridge. The PCI bus not only had to cope with inter-bridge traffic, but also had to carry regular PCI traffic, IDE, Super I/O (serial, parallel, PS/2) and USB. To alleviate the situation, Intel, VIA and SiS replaced the PCI bus between the North and South Bridges with a high-speed interconnect, and then moved IDE, Super I/O and USB onto their own dedicated links to the South Bridge.

Now, with Intel's Communications Streaming Architecture (CSA) bus built into the Memory Controller Hub of the i875/i865 chipsets, even Gigabit Ethernet has moved off the PCI bus.

Numerous dedicated interconnects for various devices in the i875 chipset: not really a cost-effective solution

While AGP, CSA, Intel's Accelerated Hub Architecture Hub Link, VIA's V-Link and SiS' MuTIOL have been relatively successful in reducing the PCI bus load, they are only stop-gap solutions.

PCI Express, previously known as 3rd Generation I/O (3GIO), is all set to replace PCI and take general IO connectivity into the next decade.

PCI Express seeks to fulfil a number of requirements.

It is designed to support multiple market segments and emerging applications, as a unifying I/O architecture for desktop, mobile, server, communications, workstation and embedded devices. Unlike the original PCI specification, it is not aimed at the desktop alone.

With regard to cost, at both high and low volumes, the target is to come in at or below PCI's cost structure at the system level. A serial bus requires fewer PCB traces, easing board design and freeing up space for other components.

It keeps a PCI-compatible software model, so existing operating systems should be able to boot without any changes. In addition, configuration and device drivers for PCI Express are to be compatible with existing PCI.

Performance scales by increasing frequency and by adding "lanes" to the link. It is designed for high bandwidth per pin, with low overheads and low latency. Multiple virtual channels per physical link are supported.

As a point-to-point connection, it allows each device to have a dedicated connection without bus sharing.

Other advanced features include:
- the ability to comprehend different data structures
- low power consumption and power management features
- quality of service policies
- hot swappability and hot pluggability for devices
- data integrity and error handling, end-to-end and at the link level
- isochronous data transfer support
- host-based transfers through host bridge chips and peer-to-peer transfers through switches
- a packetized and layered protocol architecture

At a high level, a PCI Express system consists of a root complex (placed in either the chipset's North Bridge or South Bridge), switches, and end-point devices. The new element in the PCI Express topology is the switch. It replaces the multi-drop bus and provides fan-out for the I/O bus. The switch provides peer-to-peer communication between end-point devices, so traffic that does not involve cache-coherent memory transfers does not need to be forwarded to the host bridge.
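The switch's routing decision can be sketched as follows. Each downstream port is assumed to own a memory address window (a base and limit, in the style of PCI-to-PCI bridge registers); a request whose address falls inside a port's window is sent straight to that port, peer to peer, and anything else goes upstream towards the root complex. The classes and addresses are hypothetical, and the sketch ignores ordering, coherency and every other detail of a real switch.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DownstreamPort:
    name: str
    base: int   # first address claimed by this port (inclusive)
    limit: int  # last address claimed by this port (inclusive)

    def claims(self, address: int) -> bool:
        return self.base <= address <= self.limit

class Switch:
    def __init__(self, ports: List[DownstreamPort]):
        self.ports = ports

    def route(self, address: int) -> str:
        # Peer-to-peer: deliver directly if a downstream port owns the address.
        for port in self.ports:
            if port.claims(address):
                return port.name
        # Otherwise forward towards the root complex (e.g. main memory).
        return "upstream"

if __name__ == "__main__":
    switch = Switch([
        DownstreamPort("endpoint_a", 0xE000_0000, 0xE00F_FFFF),
        DownstreamPort("endpoint_b", 0xE010_0000, 0xE01F_FFFF),
    ])
    print(switch.route(0xE010_2000))  # endpoint_b: the transfer stays within the switch
    print(switch.route(0x0010_0000))  # upstream: a system memory address
```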

System Topology using the new Switch

The following diagrams show possible PCI Express implementations across an entire range of platforms: Desktops and Mobiles, Servers and Workstations, and Networking Communications Systems.

PCI Express based desktop and mobile system

Server and Workstation system

Networking Communications system

The PCI Express Architecture is specified in layers, which helps ease cross-platform design.

At the very bottom is the physical layer. The most basic PCI Express link consists of two low-voltage differential signal pairs: one to transmit and one to receive. The data clock is embedded in the signal using the 8b/10b encoding scheme, which makes very high data rates possible. The initial signalling rate is 2.5Gb/s in each direction, and it is expected to rise with advances in silicon technology, possibly to around 10Gb/s in each direction.
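The effect of 8b/10b encoding on usable bandwidth is easy to work out: every 8 data bits travel as a 10-bit symbol, so only 80% of the raw signalling rate carries data. A minimal Python calculation, assuming the same encoding were kept at the possible 10Gb/s rate:

```python
def lane_payload_mb_s(signalling_gbps: float) -> float:
    """Usable MB/s per direction for one lane using 8b/10b encoding."""
    raw_bits_per_s = signalling_gbps * 1e9
    data_bits_per_s = raw_bits_per_s * 8 / 10  # 2 of every 10 bits are encoding overhead
    return data_bits_per_s / 8 / 1e6           # bits -> bytes -> MB

if __name__ == "__main__":
    for rate in (2.5, 10.0):
        print(f"{rate} Gb/s per lane -> {lane_payload_mb_s(rate):.0f} MB/s each way")
```

At the initial 2.5Gb/s this works out to 250MB/s of data in each direction per lane.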

Transmit and receive signal pairs

One of the most exciting features for all the speed freaks out there is PCI Express's ability to scale bandwidth by aggregating multiple lanes into a single link. The physical layer supports X1, X2, X4, X8, X12, X16 and X32 lane widths. Transmission over multiple lanes is transparent to the other layers.
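As a simplified model of how data is spread over a multi-lane link, the Python sketch below deals bytes out round-robin (byte 0 to lane 0, byte 1 to lane 1 and so on) and reassembles them at the receiver. It ignores framing, padding and lane-to-lane skew, and `stripe`/`unstripe` are hypothetical names; the point is only that the upper layers see the same byte stream whatever the link width.

```python
from typing import List

def stripe(data: bytes, lanes: int) -> List[bytes]:
    """Deal bytes round-robin across lanes: byte i travels on lane i % lanes."""
    return [data[lane::lanes] for lane in range(lanes)]

def unstripe(per_lane: List[bytes]) -> bytes:
    """Interleave the lane streams back into the original byte order."""
    lanes = len(per_lane)
    out = bytearray(sum(len(chunk) for chunk in per_lane))
    for lane, chunk in enumerate(per_lane):
        out[lane::lanes] = chunk
    return bytes(out)

if __name__ == "__main__":
    packet = b"PCI Express!"
    for width in (1, 2, 4, 8):
        lanes = stripe(packet, width)
        assert unstripe(lanes) == packet  # upper layers see the same bytes
        print(f"X{width}: {lanes}")
```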

The data link layer ensures reliability and data integrity for every packet sent across a PCI Express link. Each packet carries a sequence number and a CRC, and a credit-based flow control protocol guarantees that a packet is transmitted only when a buffer is available to receive it at the other end. This eliminates retries caused by full receive buffers, making more efficient use of link bandwidth, while any packet corrupted in transit is still retried automatically.
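Credit-based flow control can be modelled in a few lines of Python. In this simplified sketch, assumed for illustration, one credit equals one packet buffer, the receiver's buffer count is known up front, and the transmitter stops when its credits run out instead of sending a packet that would have to be rejected. Real PCI Express tracks separate header and data credits for different transaction types and returns them in dedicated link-layer packets; the class names here are hypothetical.

```python
from collections import deque
from typing import Deque, List

class Receiver:
    def __init__(self, buffer_slots: int):
        self.capacity = buffer_slots
        self.buffer: Deque[str] = deque()

    def accept(self, packet: str) -> None:
        if len(self.buffer) >= self.capacity:
            raise RuntimeError("buffer overflow: transmitter ignored its credits")
        self.buffer.append(packet)

    def consume(self, count: int) -> int:
        """Process up to `count` packets and return the credits freed."""
        freed = 0
        while self.buffer and freed < count:
            self.buffer.popleft()
            freed += 1
        return freed

class Transmitter:
    def __init__(self, receiver: Receiver):
        self.receiver = receiver
        self.credits = receiver.capacity  # known when the link comes up
        self.queue: Deque[str] = deque()

    def send_pending(self) -> List[str]:
        """Send queued packets only while credits remain; never overrun the receiver."""
        sent = []
        while self.queue and self.credits > 0:
            packet = self.queue.popleft()
            self.receiver.accept(packet)
            self.credits -= 1
            sent.append(packet)
        return sent

    def return_credits(self, count: int) -> None:
        self.credits += count

if __name__ == "__main__":
    rx = Receiver(buffer_slots=2)
    tx = Transmitter(rx)
    tx.queue.extend(["p1", "p2", "p3", "p4"])
    print(tx.send_pending())          # credits exhausted after two packets
    print(tx.send_pending())          # nothing sent: the transmitter waits rather than retries
    tx.return_credits(rx.consume(2))  # receiver frees its buffers and hands credits back
    print(tx.send_pending())
```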

The transaction layer turns requests from the software layer into packets for the data link layer, using split transactions. Each packet is uniquely identified, and both 32-bit memory addressing and extended 64-bit addressing are supported. Additional attributes, including "no-snoop", "relaxed ordering" and priority, are used for routing and quality of service.
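With split transactions, a read request and the data that answers it travel as separate packets, so the link is free for other traffic in between and completions for different requests can come back in a different order. The Python sketch below models this with a tag per outstanding request (in PCI Express the requester's ID plus its tag identifies the transaction) and also chooses the address form by address, since requests below 4GB use the shorter 32-bit format. The `Requester` class and the default limit of 32 tags are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

FOUR_GB = 1 << 32

def address_format(address: int) -> str:
    """32-bit addressing below 4GB, extended 64-bit addressing at or above it."""
    return "32-bit" if address < FOUR_GB else "64-bit"

class Requester:
    def __init__(self, max_tags: int = 32):  # assumed limit on outstanding requests
        self.free_tags: List[int] = list(range(max_tags))
        self.outstanding: Dict[int, int] = {}  # tag -> requested address

    def read(self, address: int) -> int:
        """Issue a read request and return its tag; no data comes back yet."""
        if not self.free_tags:
            raise RuntimeError("no free tags: too many outstanding requests")
        tag = self.free_tags.pop(0)
        self.outstanding[tag] = address
        return tag

    def complete(self, tag: int, data: bytes) -> Tuple[int, bytes]:
        """Match a completion to its request by tag, in whatever order it arrives."""
        address = self.outstanding.pop(tag)
        self.free_tags.append(tag)
        return address, data

if __name__ == "__main__":
    req = Requester()
    low = req.read(0x0010_0000)
    high = req.read(0x1_2000_0000)
    print(address_format(0x0010_0000), address_format(0x1_2000_0000))
    # The second request completes first; its tag still finds the right address.
    print(hex(req.complete(high, b"\x02")[0]))
    print(hex(req.complete(low, b"\x01")[0]))
```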

Furthermore, the transaction layer supports four address spaces: memory, I/O and configuration (all three already part of the PCI specification), plus the new message space. This fourth address space replaces the side-band signals of the PCI 2.2 specification and does away with all of the old "special cycles". These include interrupts, power management requests and resets.
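The idea behind the message space is that an event which used to be signalled on a dedicated wire becomes a packet instead. The Python sketch below maps the four legacy PCI interrupt wires and the PME# power management wire onto message names; the Assert_INTx/Deassert_INTx and PM_PME names follow the PCI Express convention, while the mapping table and `encode_sideband_event` function are hypothetical, and resets are left out.

```python
from enum import Enum
from typing import Dict, Tuple

class AddressSpace(Enum):
    MEMORY = "memory"
    IO = "I/O"
    CONFIGURATION = "configuration"
    MESSAGE = "message"  # the new fourth address space in PCI Express

# Legacy side-band wire transitions and the in-band message that replaces each one.
SIDEBAND_TO_MESSAGE: Dict[Tuple[str, str], str] = {}
for pin in "ABCD":
    SIDEBAND_TO_MESSAGE[(f"INT{pin}#", "assert")] = f"Assert_INT{pin}"
    SIDEBAND_TO_MESSAGE[(f"INT{pin}#", "deassert")] = f"Deassert_INT{pin}"
SIDEBAND_TO_MESSAGE[("PME#", "assert")] = "PM_PME"

def encode_sideband_event(wire: str, transition: str) -> Tuple[AddressSpace, str]:
    """Turn what used to be a wire transition into a packet in the message space."""
    return AddressSpace.MESSAGE, SIDEBAND_TO_MESSAGE[(wire, transition)]

if __name__ == "__main__":
    space, name = encode_sideband_event("INTA#", "assert")
    print(space.value, name)
```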

Finally, the software layer is touted as the key to maintaining software compatibility. Initialisation and runtime behaviour are unchanged from PCI, so that operating systems can boot with PCI Express without modification. Devices are enumerated so that the operating system can discover them and allocate resources as necessary, while the runtime reuses the PCI load-store, shared-memory model. Whether modification is really required remains to be seen, since "PCI Express support" is listed as one of the features of Microsoft's next operating system, codenamed Longhorn: a tacit implication that previous operating systems may not support PCI Express.
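Enumeration carries over from PCI: software probes each bus/device/function address in configuration space and treats a vendor ID of 0xFFFF, which is what an empty slot returns, as "nothing here". The sketch below simulates configuration space with a dictionary, scans a single bus and skips a device entirely if its function 0 is absent. The function names and sample vendor IDs are illustrative, and a real scan would also check the multi-function bit, recurse through bridges and read far more than the vendor ID.

```python
from typing import Dict, List, Tuple

Bdf = Tuple[int, int, int]  # (bus, device, function)
ABSENT = 0xFFFF             # vendor ID read back from an empty slot

def read_vendor_id(config: Dict[Bdf, int], bdf: Bdf) -> int:
    """Simulated configuration read: missing entries behave like empty slots."""
    return config.get(bdf, ABSENT)

def enumerate_bus(config: Dict[Bdf, int], bus: int = 0) -> List[Bdf]:
    """Probe every device and function on one bus, as PCI software already does."""
    found: List[Bdf] = []
    for device in range(32):
        if read_vendor_id(config, (bus, device, 0)) == ABSENT:
            continue  # no function 0 means no device at this number
        for function in range(8):
            if read_vendor_id(config, (bus, device, function)) != ABSENT:
                found.append((bus, device, function))
    return found

if __name__ == "__main__":
    # Hypothetical configuration space: Intel, nVidia (two functions) and a stray ATi entry.
    sample = {(0, 0, 0): 0x8086, (0, 2, 0): 0x10DE, (0, 2, 1): 0x10DE, (0, 5, 3): 0x1002}
    for bus, device, function in enumerate_bus(sample):
        print(f"{bus:02x}:{device:02x}.{function} vendor {sample[(bus, device, function)]:#06x}")
```

Because the same discovery procedure works unchanged, an operating system that knows how to walk PCI configuration space can, in principle, find PCI Express devices too.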

Initial implementations are designed to coexist with legacy PCI connectors. As you can see from the diagram below, a 1X connector sits neatly behind the PCI slot at the back of the motherboard, allowing either a regular PCI card or a PCI Express card to be used.

Other innovations include separating the main "box" from the human interface, and "device bay" units that allow hot-swapping of cards and other PCI Express peripherals.

PCI Express slot on left, hot swappable PCI Express device bays on the right

Even mobile users won't be left out, with the new PCMCIA standard codenamed NEWCARD. The NEWCARD form factor fits two NEWCARDs side by side in the space of a single CardBus card. Unfortunately, it is not designed to handle graphics, so video upgrades on a laptop remain virtually impossible. On the bright side, future expansion options range from wireless communications, ultra-wideband TV tuners and security card readers to optical compression/encryption and smart clocking.

Single-wide and Double-wide NEWCARDs: a Double-wide is the same width as the old PCMCIA standard

With 250MB/s in each direction for an X1 lane (2.5Gb/s less the 8b/10b encoding overhead), PCI Express claims to be a very cost-effective solution in terms of bandwidth per pin.

Intel's Grantsdale chipset provides an X16 link for graphics: some 4GB/s in each direction (8GB/s of concurrent bandwidth) dedicated to graphics. That is nearly double AGP 8X's 2.1GB/s in a single direction, and almost four times as much counting both directions. Hopefully, this additional capacity will accommodate graphics demands for the next couple of years.
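These graphics figures can be checked with a little arithmetic: an X16 link is sixteen 250MB/s lanes in each direction, while AGP moves 32 bits per 66.67MHz clock, eight times per clock in 8X mode. The Python sketch below assumes peak theoretical rates with no protocol overhead; the function names are illustrative.

```python
def pcie_mb_s(lanes: int, gbps_per_lane: float = 2.5) -> float:
    """Peak MB/s per direction for a PCI Express link using 8b/10b encoding."""
    return lanes * gbps_per_lane * 1e9 * 8 / 10 / 8 / 1e6

def agp_mb_s(multiplier: int) -> float:
    """Peak MB/s for AGP: 32-bit transfers, 66.67MHz clock, `multiplier` transfers per clock."""
    return 32 / 8 * (200e6 / 3) * multiplier / 1e6

if __name__ == "__main__":
    x1, x16, agp8x = pcie_mb_s(1), pcie_mb_s(16), agp_mb_s(8)
    print(f"X1: {x1:.0f} MB/s each way, X16: {x16:.0f} MB/s each way ({2 * x16:.0f} MB/s total)")
    print(f"AGP 8X: {agp8x:.0f} MB/s")
    print(f"X16 vs AGP 8X: {x16 / agp8x:.2f}x one way, {2 * x16 / agp8x:.2f}x both ways")
```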

X16 and X1 PCI Express Slots

PCI Express Slots on the BigWater reference form factor shown at IDF 2002

Will PCI Express start a new bus war with other solutions such as PCI-X and HyperTransport?

The PCI Express Working Group, Arapahoe, claims that these buses target different solutions: RapidIO and HyperTransport were developed for specific applications, while PCI Express is designed for general use.

It is also unlikely that PCI Express will replace HyperTransport as a processor-to-processor interconnect. PCI Express lacks cache coherency protocols, and its latency is higher than that of parallel interconnects with source-synchronous clocks, which makes it unsuitable for that kind of use. Certainly, AMD and nVidia have nothing to fear. Intel probably would not use it to replace the P4 bus either, since an open PCI Express standard would mean that Intel could no longer charge third-party chipset vendors for P4 bus licensing.

PCI Express has a great deal of potential. Its positioning as a general-purpose interconnect gives it clear advantages in terms of flexibility and ensures that it is capable enough to be used in a wide variety of solutions.

As with many major changes, the transition from PCI to PCI Express won't happen overnight. ISA slots lingered for nearly 10 years before finally disappearing, so don't assume that your PCI peripherals are obsolete just yet.

The PCI Express Base 1.0a Specification and Card Electromechanical 1.0a Specification have already been released, although we won't see any PCI Express products until 2004. The first are likely to be video cards from nVidia and ATi, along with motherboards based on Intel's Grantsdale chipset. At the server end of the market, Intel is looking to introduce PCI Express with the Lindenhurst and Twin Castle chipsets. With new form factors and the promise of great performance, the future looks good for PCI Express.
