A 0.4-μm CMOS 10-Gb/s 4-PAM Pre-Emphasis Serial Link Transmitter
Ramin Farjad-Rad, Chih-Kong Ken Yang, Mark Horowitz and Thomas Lee
Center for Integrated Systems, Stanford University
Stanford, CA 94305

Abstract
A 10-Gb/s serial link transmitter fabricated in the LSI 0.4-μm CMOS process uses multilevel signaling (4-PAM) and a 3-tap pre-emphasis filter to reduce intersymbol interference (ISI) caused by channel low-pass effects. Due to the maximum on-chip frequency set by process limitations, a 5:1 output multiplexer is used to reduce the required clock frequency to 1/5 the symbol rate. With a 3.3-V supply, the chip shows an eye opening of >200mV after a 10-m coaxial cable in simulations*.

Introduction
As the demand for higher data rate communication increases, low-cost, high-speed serial links using copper cables become more attractive [1],[2]. Due to the skin effect loss in conductors, copper cables show a low-pass frequency response which imposes the main bottleneck in multi-gigabit/s data transmission. The 10-m coaxial cable (RG400U) used in this work has a -3dB bandwidth of 1GHz. Furthermore, the intrinsic process speed limits the on-chip frequency. In this 0.4-μm CMOS process, the maximum on-chip operating frequency of digital logic is roughly 1GHz. To achieve the 10-Gb/s data rate, we employ a pre-emphasis technique using a 3-tap FIR filter, a 5:1 multiplexing scheme, and a 4-level pulse amplitude modulation (4-PAM). This modulation scheme was chosen as a compromise between improved bit-rate and reduced noise-margins.

System Architecture
In multi-gigabit/s applications, optimal detection methods can consume large area and are complex to implement [3]. Instead, square pulses, which can be generated and detected with modest complexity, are used in this system as the basis communication symbols [4]. Transmitting a sequence of these square symbols results in a data eye. Larger eye openings result in better noise immunity of the system. Symbol rates well above the channel bandwidth result in severe ISI which reduces the eye opening. For a given data rate, the 4-PAM scheme reduces the symbol rate by a factor of two compared to a conventional 2-PAM system. This symbol rate reduction lowers not only the signal ISI in the channel, but also the maximum required on-chip clock frequency.
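The trade-off between ISI, number of levels, and eye opening can be illustrated with a worst-case (peak-distortion) estimate. The sketch below is illustrative only: the function name `worst_case_eye`, the fixed transmit `swing`, and the sample values are assumptions, not data from the measured cable. It assumes equally spaced levels and a pulse response sampled once per symbol period.

```python
def worst_case_eye(pulse, main_index, levels, swing=1.0):
    """Worst-case vertical eye between adjacent levels (peak distortion).

    pulse      -- pulse response sampled once per symbol period
    main_index -- index of the sample used as the main cursor
    levels     -- number of PAM levels (2 for 2-PAM, 4 for 4-PAM)
    swing      -- full-scale transmit amplitude shared by all levels

    Symbols are assumed to lie between 0 and `swing`, so each ISI sample
    can shift the received value by up to swing * |sample| peak to peak.
    """
    step = swing / (levels - 1)              # spacing between adjacent levels
    main = pulse[main_index]
    isi = sum(abs(s) for i, s in enumerate(pulse) if i != main_index)
    return step * main - swing * isi


# Hypothetical symbol-spaced samples of a low-pass channel (not measured data).
example_pulse = [0.0, 0.6, 0.15, 0.04]

eye_2pam = worst_case_eye(example_pulse, 1, 2)
eye_4pam = worst_case_eye(example_pulse, 1, 4)
```

Both calls above use the same samples, which isolates the noise-margin cost of 4-PAM. At a fixed bit rate the 4-PAM pulse is twice as long, so its symbol-spaced ISI samples would be smaller than those of the 2-PAM case.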

To characterize the channel, we measured the impulse response of the coaxial line. A 3-tap FIR pre-emphasis filter is used to invert the low-pass effects of the channel. The results of this filter, for a 0.2-ns pulse (5Gsym/s), at the near and far end of the channel are shown in Fig. 1. The unfiltered pulse response (Fig. 1) shows a large value 0.2ns after its peak, when the next symbol is sampled, whereas the filtered response has almost zero amplitude at that point. The tap weights are programmable to allow flexibility in optimizing the eye opening for different channels.
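One simple way to pick tap weights from a measured pulse response is zero forcing of the first few post-cursor samples. This is a sketch of that swapped-in technique, not necessarily the procedure used for this chip, whose taps are programmable and tuned for eye opening. The function names and the sample values are hypothetical; the main tap is fixed at 1 and three following taps are solved by forward substitution.

```python
def zero_forcing_taps(pulse, num_taps=3):
    """Post-cursor tap weights that null the first `num_taps` ISI samples.

    pulse[0] is the main-cursor sample; pulse[n] is the sample taken
    n symbol periods later. The main tap weight is fixed at 1.0.
    """
    weights = [1.0]
    for n in range(1, num_taps + 1):
        # Combined response at delay n before tap n is added.
        residual = sum(weights[k] * pulse[n - k]
                       for k in range(n) if n - k < len(pulse))
        weights.append(-residual / pulse[0])
    return weights


def filtered_response(pulse, weights):
    """Discrete convolution of the tap weights with the pulse response."""
    out = [0.0] * (len(pulse) + len(weights) - 1)
    for k, w in enumerate(weights):
        for n, p in enumerate(pulse):
            out[k + n] += w * p
    return out


# Hypothetical symbol-spaced pulse response (not the measured RG400U data).
example_pulse = [0.5, 0.3, 0.15, 0.08, 0.04]
taps = zero_forcing_taps(example_pulse)
equalized = filtered_response(example_pulse, taps)
```

With these weights the combined response is zero at the three sampling instants after the main cursor; any ISI beyond the filter span is left uncorrected.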

The on-chip frequency requirement is further reduced to 1/5 the symbol rate (1/10 the bit rate) by multiplexing 5:1 directly onto the 50-Ω line, allowing 5 symbols to be transmitted every clock cycle. The 5 symbols correspond to 10 bits: 4 data symbols and 1 symbol for line coding. In this design, line coding is performed on chip to provide enough transitions for clock recovery.
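The rate budget implied by these choices can be checked with a few lines of arithmetic. The constant names are illustrative, and the payload figure assumes the coding overhead is exactly one symbol in every five, as stated above.

```python
BIT_RATE = 10e9          # line bit rate in b/s
BITS_PER_SYMBOL = 2      # 4-PAM carries 2 bits per symbol
MUX_RATIO = 5            # 5:1 output multiplexer
DATA_SYMBOLS = 4         # data symbols per 5-symbol group (1 is line coding)

symbol_rate = BIT_RATE / BITS_PER_SYMBOL              # symbols per second
symbol_period = 1.0 / symbol_rate                     # seconds per symbol
clock_frequency = symbol_rate / MUX_RATIO             # on-chip clock in Hz
payload_rate = BIT_RATE * DATA_SYMBOLS / MUX_RATIO    # bit rate left for data

print(f"symbol rate  : {symbol_rate / 1e9:.1f} Gsym/s")
print(f"symbol period: {symbol_period * 1e12:.0f} ps")
print(f"clock        : {clock_frequency / 1e9:.1f} GHz")
print(f"payload rate : {payload_rate / 1e9:.1f} Gb/s")
```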

Circuit Implementation
The architecture to achieve the 10-Gb/s transmission rate is shown in Fig. 2. The multiplexing transmitter, comprising 5 identical drivers, uses 10 different clock phases from a 5-stage differential ring oscillator (Tx-PLL) to generate the output stream. Figure 3 shows how the transmitted symbols depend on the clock phases. The resynchronizer retimes the 10-bit parallel data into five 2-bit groups. Each group has a different phase to prevent set-up and hold time violation for the input data to each driver. Because the data eye can be reduced by phase errors, the oscillator elements are designed with low jitter [5] and the buffering paths for clock phases are precisely matched.
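The timing relationships can be sketched behaviourally. Ten evenly spaced phases of a 1-GHz clock are 100 ps apart, so two phases that are two steps apart bound one 200-ps symbol slot. The specific phase-to-driver mapping and the bit ordering below are assumptions made for illustration; Fig. 3 defines the actual assignment.

```python
CLOCK_PERIOD_PS = 1000   # 1-GHz transmit clock
NUM_PHASES = 10          # 5 differential stages give 10 phases
NUM_DRIVERS = 5

phase_step_ps = CLOCK_PERIOD_PS // NUM_PHASES   # spacing of adjacent phases


def driver_window(driver):
    """(start, end) of a driver's symbol slot in ps within one clock cycle.

    Assumed mapping: driver d opens on phase 2*d and closes on phase
    2*d + 2, i.e. on two clocks that are 200 ps apart.
    """
    start = (2 * driver) * phase_step_ps
    end = start + 2 * phase_step_ps
    return start, end


def split_word(word):
    """Split a 10-bit word into five 2-bit groups, one per driver.

    Assumed ordering: bits [1:0] go to driver 0, bits [3:2] to driver 1,
    and so on up to bits [9:8] for driver 4.
    """
    return [(word >> (2 * d)) & 0b11 for d in range(NUM_DRIVERS)]


windows = [driver_window(d) for d in range(NUM_DRIVERS)]
groups = split_word(0b1110010011)
```

Under this mapping the five windows tile the clock period without gaps or overlap, which is why phase errors between clocks translate directly into eye-width loss.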

Each of the 5 drivers is composed of four 2-bit DAC modules (Fig. 4). The main module drives the coax line with a current proportional to one of the 4 symbol levels. The 3 other modules, using the same data but different clock inputs, turn on consecutively in the next 3 symbol periods, implementing the FIR filter that cancels the tail of the main module's pulse. Instead of a complex resynchronizer for the three filter modules, three 0.2-ns delay stages in each driver guarantee enough setup and hold time for the driver input data while passing from one module to the next. The currents in the filter taps (tap weights) are determined by three controllable current sources at the bottom of each module. To protect the tap currents from on-chip noise, each current source is a mirror whose input current is supplied from a clean off-chip source. Because the corresponding modules in each of the 5 drivers are turned on sequentially, only one of the modules is pulling current at each symbol time. Thus, each current source is shared among the 5 drivers (Fig. 4). Once the 3 tap weights are determined by static channel characteristics, no logic is needed to compute the pre-emphasized signal since each driver cancels the tail of its own transmitted symbol.
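A behavioural sketch shows why this arrangement needs no arithmetic logic: summing the module currents on the line is equivalent to convolving the symbol stream with the tap weights, and no two drivers ever use the same module type in the same slot. The function name, the signed weights, and the example values are hypothetical modelling conveniences.

```python
NUM_DRIVERS = 5


def transmit(symbols, tap_weights):
    """Sum the module currents on the line, one value per symbol slot.

    symbols     -- 4-PAM levels (0..3), one per symbol slot
    tap_weights -- [main, tap1, tap2, tap3]; module m of a driver turns on
                   m slots after that driver's main module
    Returns the line current and, for each (slot, module) pair, the list
    of drivers drawing from that module's shared current source.
    """
    out = [0.0] * (len(symbols) + len(tap_weights) - 1)
    users = {}
    for slot, level in enumerate(symbols):
        driver = slot % NUM_DRIVERS              # drivers take turns
        for module, weight in enumerate(tap_weights):
            out[slot + module] += weight * level
            users.setdefault((slot + module, module), []).append(driver)
    return out, users


# Hypothetical tap weights and symbol stream, for illustration only.
weights = [1.0, -0.6, 0.06, -0.016]
stream = [3, 0, 1, 2, 3, 1, 0, 2]
line, users = transmit(stream, weights)
```

Every entry of `users` holds exactly one driver, which is the property that allows each current source to be shared among the 5 drivers.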

The 2-bit DAC module contains two differential driving legs (Fig. 5). This circuit uses $D_{0,1}$, $\overline{D_{0,1}}$ and two clocks which are 200ps out-of-phase to generate a precise 200-ps current pulse as explained in Fig. 3. The 2 driving legs are binary weighted to generate 4 selectable levels. The “tail” nodes (Fig. 5b) of the main modules' legs are grounded to minimize the device size for a given output current. Smaller device sizes prevent parasitic diffusion capacitances at the output from limiting the overall bandwidth. The AND and NOT gates (Fig. 5b) are designed to be identical to match the delays in the buffering paths. The differential outputs are connected to 50-Ω on-chip PMOS resistors to eliminate line reflections. To achieve good linearity and almost constant 50-Ω termination for varying output voltages, the output devices must remain in saturation. Due to short channel velocity saturation, these devices can have a maximum output swing of 1.1V while maintaining 2% linearity and 500-Ω (>50Ω) output impedance.

To facilitate eye-diagram generation and BER measurements, a 2^1 PRBS encoder is built on-chip. The 4/5sym encoder performs line coding for the PRBS sequence. A 1.2-kb memory and a 20-b data register are also implemented on-chip, which enables us to load and transmit data patterns of different sizes (Fig. 2).
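A PRBS source of this kind is typically a linear-feedback shift register. The sequence length used on the chip is not recoverable from the text, so the sketch below uses the common PRBS7 polynomial $x^7 + x^6 + 1$ purely as an assumed example; the function names and the bit-to-level packing are likewise hypothetical.

```python
def prbs7(seed=0x7F, count=127):
    """Yield `count` bits of a PRBS7 sequence (polynomial x^7 + x^6 + 1)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    for _ in range(count):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1   # XOR of the two taps
        state = ((state << 1) | new_bit) & 0x7F       # shift in the new bit
        yield new_bit


def to_symbols(bits):
    """Pack consecutive bit pairs into 4-PAM levels 0..3 (first bit is MSB)."""
    bits = list(bits)
    return [2 * bits[i] + bits[i + 1] for i in range(0, len(bits) - 1, 2)]


two_periods = list(prbs7(count=254))
levels = to_symbols(two_periods)
```

Because the 127-bit period is odd, packing two periods into 2-bit symbols exercises every bit pairing before the symbol pattern repeats.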

Simulation Results*
The chip described was simulated over all process corners with 0.4-μm CMOS models. To implement the channel, the impulse response of the 10 meter cable assembly (including connectors and board traces) was convolved with transmitter output in time domain. Figure 6 shows the eye-diagrams after the cable without and with transmit shaping respectively. Notice that the filter obtains a large eye opening (>200mV). According to [6], a hard detector receiver in this technology (0.4-μm LSI) achieves a BER<10^-14 for a 70-mV eye opening. Measurements from [6] together with jitter calculations based on [5] result in a maximum of 45ps (p-p) phase error for this 1 GHz PLL and the following buffers. This phase error can reduce the eye width by less than 40% and the eye height by 30%.

*The chip is built and measurement results will be ready and presented at the conference.

The authors would like to thank B. Ellersick, K. Yu, B. Amrutur, S. Krishnan, L. Sampson, A. Hajimiri, LSI Logic and MCC for their assistance.

References


