# A 26-Gb/s CMOS optical receiver with a reference-less CDR in 65-nm CMOS

# Quan Pan<sup>1, †</sup>, Xiongshi Luo<sup>1</sup>, Zhenghao Li<sup>1</sup>, Zhengzhe Jia<sup>1</sup>, Fuzhan Chen<sup>1, 2</sup>, Xuewei Ding<sup>3</sup>, and C. Patrick Yue<sup>2</sup>

<sup>1</sup>School of Microelectronics, Engineering Research Center of Integrated Circuits for Next-Generation Communications, Ministry of Education, Southern University of Science and Technology, Shenzhen 518055, China

<sup>2</sup>Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong SAR, China

<sup>3</sup>ZTE Microelectronics, Shenzhen 518055, China

**Abstract:** This paper presents a 26-Gb/s CMOS optical receiver fabricated in 65-nm technology. It consists of a triple-inductive transimpedance amplifier (TIA), direct current (DC) offset cancellation circuits, 3-stage gm-TIA variable-gain amplifiers (VGA), and a reference-less clock and data recovery (CDR) circuit with a built-in equalization technique. The TIA/VGA front-end measurement results demonstrate 72-dB$\Omega$ transimpedance gain, 20.4-GHz –3-dB bandwidth, and 12-dB DC gain tuning range. Measurements of the VGA's resistive networks also demonstrate that they can compensate for voltage and temperature variations. The CDR adopts a full-rate topology with a 12-dB embedded equalization tuning range. Optical measurements of this chipset achieve a $10^{-12}$ BER at 26 Gb/s for a $2^{15}$–1 PRBS input with a –7.3-dBm input sensitivity. The measurement results with a 10-dB @ 13 GHz attenuator also demonstrate the effectiveness of the gain tuning capability and the built-in equalization. The entire system consumes 140 mW from a 1/1.2-V supply.

Key words: clock and data recovery; equalizer; optical receiver; transimpedance amplifier; variable-gain amplifier

**Citation:** Q Pan, X S Luo, Z H Li, Z Z Jia, F Z Chen, X W Ding, and C P Yue, A 26-Gb/s CMOS optical receiver with a reference-less CDR in 65-nm CMOS[J]. J. Semicond., 2022, 43(7), 072401. https://doi.org/10.1088/1674-4926/43/7/072401

# 1. Introduction

Short-range optical communications for high-speed, high-density interconnect applications have drawn significant research effort in recent years because conventional electrical links have become much less competitive in terms of weight, energy efficiency, channel bandwidth (BW), crosstalk, and electromagnetic interference (EMI). Meanwhile, deep-submicron complementary metal-oxide-semiconductor (CMOS) optoelectronic integrated circuits (OEICs) have become extremely attractive because they can be used extensively in high-speed communications with much lower fabrication cost and higher integration than III-V compound technologies, such as GaAs- and InP-based alternatives. Therefore, optical interconnections in CMOS technology are gaining research interest as a promising candidate for next-generation cloud computing and big data applications<sup>[1]</sup>.

The 100-Gb/s Ethernet (100GbE) for short-reach interconnections, 100G BASE-SR4, is still the mainstream product with low-cost technologies, such as 65- and 40-nm CMOS<sup>[2–7]</sup>. Meanwhile, 100G BASE-SR4 offers layout advantages to the host board implementer and a substantial reduction in cable plant fiber count. Overall, a low-power 100GbE solution with cost-efficient technologies is very competitive and desirable in many aspects<sup>[8–10]</sup>.

Correspondence to: Q Pan, panq@sustech.edu.cn

Received 28 JANUARY 2022; Revised 15 MARCH 2022.

©2022 Chinese Institute of Electronics

In this work, a 26-Gb/s CMOS receiver chipset including both an optical front-end and a reference-less full-rate clock and data recovery (CDR) with 12-dB built-in equalization capability is demonstrated, targeting applications for 100GbE optical communications. All circuits are designed and fabricated in 65-nm CMOS technology.

It should be noted that the TIA stage has been previously published in its single-ended topology<sup>[11, 12]</sup>. In this work, an optimized pseudo-differential TIA with a TIA buffer is presented to alleviate the imbalance between its differential outputs. The full-rate CDR has been presented in Refs. [10, 13]. Based on these previous designs, this work focuses on the optical front-end, including the TIA buffer, the DC offset circuit, and the digitally tuned gm-TIA variable-gain amplifiers (VGAs). The analysis and measurement of the optical front-end (TIA+VGA) itself and of TIA+VGA+CDR are also provided in detail.

# 2. System architecture

Fig. 1 shows the system architecture of the implemented optical receiver chipset, which consists of an optical front-end transimpedance amplifier (TIA), a TIA buffer, 3-stage cascaded VGAs, and a reference-less CDR circuit. The small photocurrent generated in the photodetector (PD) is converted to a voltage signal by the TIA. After the DC offset caused by the single-ended input is cancelled, the voltage signal passes through the cascaded VGAs to obtain sufficient output swing and then drives the CDR. A reference-less full-rate CDR is adopted to recover the data and clock. To achieve this, a frequency detection loop is implemented alongside the phase detection loop, which removes the need for an external reference clock.

Fig. 3. (a) Schematic of the inverter-based TIA, (b) schematic of the TIA buffer, and (c) post-layout frequency response of TIA and buffer.

# 3. Circuit implementation

#### 3.1. Optical front-end

As the first critical circuit block at the optical-electrical interface, the optical front-end determines the overall system performance in terms of input optical sensitivity and BW. Fig. 2 depicts the block diagram of the optical front-end. The TIA converts the incoming single-ended photocurrent generated by a commercial PD into a pseudo-differential voltage signal with 38-dBΩ transimpedance gain and 23.5-GHz –3-dB BW. A differential TIA buffer is then used to alleviate the pseudo-differential mismatch by adopting shunt inductive peaking and a cross-coupled pair. The incoming DC offset is cancelled by the DC offset cancellation (DOC) buffer (20-dB DC offset cancellation capability and 25-GHz BW) before further amplification, so that the following receiver circuits are not saturated. The 3-stage VGA provides 30-dB DC gain and 23-GHz BW in total. The VGAs have a 12-dB DC gain tuning range to accommodate the different gain requirements under various conditions. To remove the accumulated offset voltages from the three VGA stages, another DC offset cancellation feedback loop is incorporated to provide 31-dB DC offset cancellation with a corner frequency of 80 kHz. Overall, the optical front-end provides 72-dB$\Omega$ transimpedance gain, 20.4-GHz –3-dB bandwidth, 12-dB DC gain tuning range, and 51-dB DC offset cancellation capability. Finally, the output driver (OD) drives the following CDR circuit.
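As a quick sanity check on the figures above, the stage gains add in decibels. The sketch below sums the quoted stage values (including the 4-dB TIA buffer gain given in this section) and converts the total to ohms. It assumes that the DOC buffer and the output driver contribute roughly 0 dB of mid-band gain, which the text does not state explicitly, and all variable names are illustrative.

```python
# Gain budget of the optical front-end, using the stage values quoted in the text.
# Assumption (not stated in the paper): the DOC buffer and the output driver
# add roughly 0 dB of mid-band gain.
tia_gain_dbohm = 38.0   # TIA transimpedance gain (dBΩ)
buffer_gain_db = 4.0    # TIA buffer DC gain (dB)
vga_gain_db = 30.0      # 3-stage VGA DC gain (dB)

total_gain_dbohm = tia_gain_dbohm + buffer_gain_db + vga_gain_db
total_gain_ohm = 10 ** (total_gain_dbohm / 20)  # dBΩ is 20*log10(ohms)

# DC offset cancellation: DOC buffer plus the VGA feedback loop.
doc_cancel_db = 20.0
vga_loop_cancel_db = 31.0
total_offset_cancel_db = doc_cancel_db + vga_loop_cancel_db

print(f"Total transimpedance gain: {total_gain_dbohm:.0f} dBΩ (~{total_gain_ohm:.0f} Ω)")
print(f"Total DC offset cancellation: {total_offset_cancel_db:.0f} dB")
```

Under these assumptions the sums reproduce the 72-dB$\Omega$ and 51-dB totals quoted for the overall front-end.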

Fig. 3(a) shows the circuit diagram and design details of the presented pseudo-differential triple-inductive inverter-based TIA. Both the shunt-shunt inductive feedback ($L_{\rm f}$) and input series peaking ($L_{\rm s}$) techniques are adopted to extend the circuit BW<sup>[11, 12, 14]</sup>. This pseudo-differential TIA is optimized based on the single-ended topology presented in Ref. [11]. Because the photocurrent input is single-ended while a differential output is required, a pseudo-differential structure is preferred<sup>[15]</sup>. Therefore, a dummy TIA is added, as shown in Fig. 3(a). In the dummy TIA, the two on-chip inductors are removed to save area, and a 320-fF on-chip metal capacitor ($C_{\rm in}+C_{\rm PD}$) is adopted.

In Fig. 3(b), the current mode logic (CML) TIA buffer (with 4-dB DC gain and 25-GHz BW) with shunt inductive peaking and a cross-coupled pair has two main functions: 1) it acts as a DC level shifter, raising the previous 0.5-V DC bias voltage to ~0.65 V, so that the following stages can operate in the strong saturation region with better linearity; and 2) it further alleviates the mismatch between its differential outputs, which enhances the offset cancellation capability of the DOC stage. The preceding pseudo-differential TIA delivers the amplified signal and the reference voltage to the TIA buffer. The active-balun-based TIA buffer realizes the single-to-differential conversion, with some mismatch due to the parasitics of M<sub>tail2</sub>. The auxiliary cross-coupled pair senses the better-side signal to compensate the worse-side signal. When a DC offset appears at the TIA buffer outputs, the cross-coupled pair becomes unbalanced, and the resulting unbalanced current counteracts the DC offset, which lightens the load on the DOC stage. Compared with the inverter-based cascode TIA in Ref. [11], this triple-inductive inverter-based TIA plus TIA buffer topology achieves moderate transimpedance gain, higher bandwidth, and less DC offset mismatch<sup>[16]</sup>. The cross-coupled pair is also adopted to compensate for the large loading capacitance from the DOC stage<sup>[17-20]</sup>, which effectively improves the bandwidth.

Fig. 4. (Color online) (a) Schematic of the DOC and (b) its post-layout frequency response under different process corners.

Fig. 5. (Color online) (a) Schematic of the high-BW gm-TIA VGA and (b) its post-layout frequency response of one nominal VGA under different process corners.

The bonding wire $L_{BW}$ has ±20% variation tolerance in the post-layout simulation, with a typical value of 0.8 nH. The inverter-based TIA topology is adopted because of its better gain-power efficiency compared to conventional differential pair topologies. Because both the NMOS and the PMOS contribute transconductance, it achieves better noise performance, with an input-referred noise of 16.3 pA/$\sqrt{\rm Hz}$. To provide a stable and low-noise supply for the TIA and TIA buffer stages, an on-chip low-dropout regulator (LDO) is adopted to provide a –12-dB suppression of noise from the TIA power supply and of interference from other circuit blocks up to 20 GHz. Fig. 3(c) shows the post-layout frequency response of the presented TIA and buffer.
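To put the quoted noise density in context, the sketch below turns it into an idealized sensitivity estimate using the textbook NRZ relation $P_{\rm avg} = Q \cdot i_{n,\rm rms} / R$. This is a rough bound under stated assumptions, not the authors' analysis: the noise is treated as white over a brick-wall bandwidth equal to the 20.4-GHz front-end bandwidth, the extinction ratio is taken as infinite, and the 0.4-A/W responsivity is the value quoted for the PD used in the measurements.

```python
import math

def q_factor_for_ber(target_ber: float) -> float:
    """Solve BER = 0.5 * erfc(Q / sqrt(2)) for Q by bisection."""
    lo, hi = 0.0, 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        ber = 0.5 * math.erfc(mid / math.sqrt(2))
        if ber > target_ber:
            lo = mid  # BER still too high, so a larger Q is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)

noise_density = 16.3e-12  # A/sqrt(Hz), quoted input-referred noise density
noise_bw_hz = 20.4e9      # assumption: brick-wall noise BW equal to the -3-dB BW
responsivity = 0.4        # A/W, PD responsivity quoted in the measurements

i_n_rms = noise_density * math.sqrt(noise_bw_hz)  # integrated RMS noise current (A)
q = q_factor_for_ber(1e-12)
p_avg_w = q * i_n_rms / responsivity              # ideal NRZ, infinite extinction ratio
p_avg_dbm = 10 * math.log10(p_avg_w / 1e-3)

print(f"Integrated noise: {i_n_rms * 1e6:.2f} uA rms, Q = {q:.2f}")
print(f"Idealized sensitivity estimate: {p_avg_dbm:.1f} dBm")
```

The measured sensitivity reported later (–7.3 dBm) is several dB above this idealized figure, which is unsurprising given that the estimate ignores finite extinction ratio, non-white noise, noise from later stages, and coupling losses.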

Fig. 4(a) shows a schematic of the DOC. It consists of one differential inductor, two differential pairs, and two sets of RC low-pass filters. The drains of M2 and M3 are cross-connected, giving a negative gain to cancel out the DC offset from the PD and TIA. Fig. 4(b) shows the post-layout simulated response of the DOC. The low-frequency gain is designed to be –20 dB, while the mid-band gain remains positive, and the bandwidth is boosted to 25 GHz by inductive peaking. The low cut-off frequency is set to 15 kHz.

Fig. 5(a) presents the schematic of a single-stage VGA. Each VGA consists of a gm stage and a TIA stage. The gm stage converts the input differential voltage to an output differential current for better linearity, and the TIA then amplifies the signal to obtain adequate swing. In contrast to the previous inverter-based TIA topology, this TIA has a less stringent noise requirement. Therefore, a common-source NMOS-type TIA with purely resistive feedback is adopted. Fig. 5(b) shows the post-layout frequency response of one single-stage VGA with the 00101 control word under different process corners. Compared to the low-voltage Gilbert-based VGA<sup>[21]</sup>, the gm-TIA based VGA provides a larger gain-bandwidth product. Under a typical corner, each VGA can provide 10-dB DC gain and 29-GHz BW, and together the 3-stage VGA can provide 30-dB DC gain and 23.5-GHz BW. However, the DC gain variations under different corners are too large to be ignored.

Fig. 6. (Color online) (a) Normalized gain and bandwidth variations of the 3-stage VGA under different process corners and temperatures, and (b) simulated gain and bandwidth tuning range for different control words.

In the VGA stage, the loading resistance directly affects the gain and the bandwidth of the whole optical front-end<sup>[17]</sup>. Fig. 6(a) shows simulated gain and bandwidth variations of the 3-stage VGA under different process corners and temperatures. For example, under a typical-typical (tt) process corner, when the temperature changes from –10 to 80 °C, the simulation shows that gain and bandwidth vary by 10% and 24%, respectively. For other process corners, the variation is much larger.

To alleviate the process, voltage, and temperature (PVT) sensitivity, two resistor networks are used instead of fixed-value resistors, as depicted in Fig. 5(a). The gain and loading resistor networks of the VGA are controlled by a 5-bit digital word from 00000 to 11111. These control signals turn the switching transistors in the resistor networks on or off, connecting resistors in parallel or in series. Fig. 6(b) presents the simulated gain and bandwidth tuning ranges of the digitally controlled 3-stage VGA. The 3-stage VGA has a simulated DC gain tuning range of 15 dB in total with a maximum gain step of 0.8 dB, which is adequate to compensate for the gain variation of the whole front-end.
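The relation between the 5-bit control word, the tuning range, and the step size can be illustrated with a few lines of arithmetic. The sketch below assumes that all 32 control words give distinct gain settings spanning the full simulated range; the paper does not list the per-code gains, so only the average step is computed and compared with the quoted maximum step.

```python
# Control-word bookkeeping for the digitally tuned 3-stage VGA.
# Assumption: all 32 codes map to distinct gain settings that span the
# full simulated tuning range (per-code gains are not given in the paper).
n_bits = 5
n_codes = 2 ** n_bits   # 32 control words, 00000 to 11111
n_steps = n_codes - 1   # 31 transitions between adjacent settings

sim_range_db = 15.0     # simulated DC gain tuning range
sim_max_step_db = 0.8   # simulated maximum gain step

mean_step_db = sim_range_db / n_steps
codes = [format(i, "05b") for i in range(n_codes)]

print(f"{n_codes} codes from {codes[0]} to {codes[-1]}")
print(f"Average step: {mean_step_db:.2f} dB (quoted maximum step: {sim_max_step_db} dB)")
```

An average step below the quoted maximum is consistent with non-uniform steps across the code range.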

#### 3.2. Clock and data recovery circuit

This reference-less CDR with embedded equalization capability is a dual-loop topology with full-rate operation. Fig. 7 shows the circuit diagram. The 4-stage linear delay chain not only has the advantage of high operating frequency (up to 40 GHz) compared with its traditional alternatives, but also can be modified to be equalizer stages without extra power consumption penalty. The outputs of the delay chain then drive both the phase detector and the frequency detector simultaneously. The phase detector is a linear type as explained in Ref. [13]. The voltage-to-current (V2I) circuits are charge pumps providing output currents to the third-order loop filter, generating the control voltages for the LC-type voltage-controlled oscillator (VCO).

Fig. 7. Reference-less 26-Gb/s CDR with built-in equalization.

Due to the reference-less dual-loop characteristic, special attention should be paid to the interactions between the phase and frequency loops. The transition from frequency acquisition to phase capture normally generates many glitches, which could result in a wrong decision, especially at the beginning of the acquisition stage. Inter-stage buffers are inserted to filter out these glitches. The frequency detector (FD) helps to settle the loop control voltage during the initial operating time before it enters the idle state. Once the FD loop achieves the locking condition, an internal signal turns off the FD and its output current I<sub>FD</sub> becomes zero; the phase detection loop then keeps working to track the phase variation.

Fig. 8(a) depicts the LC-VCO with 2-bit digital control. The post-layout simulation results show that this NMOS-type LC-VCO has a tuning range of 15% centered at 27 GHz. Finally, the recovered clock is also used to retime the received data from the delay chain to obtain the recovered data. Given that the LC-VCO and the CDR data input are the two major noise sources, the system loop BW determines the trade-off between the noise contributions of these two sources. In this work, a loop bandwidth of 20 MHz is chosen for the best jitter performance.

It is worth mentioning that the VCO needs to drive three different blocks located at different layout positions. Therefore, clock distribution is indispensable, and it is implemented with CML buffers using custom-designed stacked inductors. By using the frequency detection loop, this CDR becomes a reference-less topology, which simplifies the system significantly.

Fig. 8. (a) Schematic of the LC-VCO and (b) schematic of a single-stage delay cell with equalization.

Fig. 8(b) shows a single-stage delay cell with the equalization technique. By tuning the resistive and capacitive degeneration, the equivalent zeros and poles can be adjusted to achieve gain boosting at higher frequencies. To ensure enough BW to support the 26-Gb/s data rate, on-chip inductors are utilized to further boost the BW. To save chip area, stacked spiral inductors are adopted in the CDR design<sup>[13, 22]</sup>. In future research, this technique can also be used in the optical front-end design to further reduce the die area. Each delay cell provides a 3-dB equalization tuning range with a maximum peaking frequency of 14 GHz, without affecting its original phase delay function. Therefore, the 4-stage cascaded delay cells provide 12-dB equalization capability in total without extra power consumption.
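The way per-stage boost accumulates over the delay chain can be sketched with a generic one-zero, one-pole model of a degenerated stage, $H(f) = (1 + jf/f_z)/(1 + jf/f_p)$, whose high-frequency boost equals $f_p/f_z$. This is a simplified illustration rather than the authors' circuit model: the 8-GHz zero frequency is a hypothetical value, and the model omits the additional poles and inductive peaking that set the 14-GHz peaking frequency of the real cell.

```python
import math

def stage_gain_db(f_hz: float, f_zero_hz: float, f_pole_hz: float) -> float:
    """Magnitude (dB) of a one-zero, one-pole stage normalized to 0 dB at DC."""
    num = abs(complex(1.0, f_hz / f_zero_hz))
    den = abs(complex(1.0, f_hz / f_pole_hz))
    return 20 * math.log10(num / den)

boost_per_stage_db = 3.0  # per-cell equalization tuning range quoted in the text
n_stages = 4              # cells in the delay chain
f_zero_hz = 8e9           # hypothetical zero frequency, for illustration only
f_pole_hz = f_zero_hz * 10 ** (boost_per_stage_db / 20)  # pole/zero ratio sets the boost

# Identical cascaded stages add in dB.
asymptotic_boost_db = n_stages * stage_gain_db(1e13, f_zero_hz, f_pole_hz)
boost_at_13ghz_db = n_stages * stage_gain_db(13e9, f_zero_hz, f_pole_hz)

print(f"Asymptotic boost of {n_stages} stages: {asymptotic_boost_db:.2f} dB")
print(f"Boost at 13 GHz in this simplified model: {boost_at_13ghz_db:.2f} dB")
```

In this model the full 12 dB is only reached well above the pole frequency, so the boost available at the 13-GHz Nyquist frequency depends on where the zero and pole are placed.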

# 4. Experimental results

The two chips were both fabricated in TSMC 65-nm CMOS technology. The optical front-end consumes 36 mW from a 1-V supply (the TIA and its buffer use a 1.2-V supply, since an on-chip LDO is used to provide –12-dB power supply noise suppression), and the reference-less CDR dissipates 104 mW from a 1-V supply. The measured power breakdown is shown in Fig. 9. The active areas are 0.95 × 0.5 mm<sup>2</sup> and 0.58 × 0.58 mm<sup>2</sup>, respectively. The chips are tested in a chip-on-board (CoB) assembly. The modulated light is top-illuminated onto the off-chip PD through a 9-µm single-mode fiber (SMF). Fig. 10 shows the detailed chipset microphotograph.
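The power figures above can be combined into a total and an energy-per-bit value. The short calculation below uses only the quoted numbers; the energy-per-bit metric is a derived figure that the paper does not quote directly.

```python
# Total power and energy per bit from the quoted power numbers.
front_end_mw = 36.0    # optical front-end power
cdr_mw = 104.0         # reference-less CDR power
data_rate_gbps = 26.0  # operating data rate

total_mw = front_end_mw + cdr_mw
energy_pj_per_bit = total_mw / data_rate_gbps  # mW / (Gb/s) equals pJ/bit

print(f"Total power: {total_mw:.0f} mW")
print(f"Energy efficiency: {energy_pj_per_bit:.2f} pJ/bit at {data_rate_gbps:.0f} Gb/s")
```

The 140-mW total matches the system power stated in the abstract.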

#### 4.1. Electrical measurement results at the front-end output

For the electrical measurement, the frequency response is obtained by measuring the front-end's *S*-parameters with a 50-GHz network analyzer through direct on-chip probing. The measured *S*-parameters are then converted mathematically to the transimpedance gain. Fig. 11 shows the resulting electrical frequency response of the front-end at the maximum gain setting, where a transimpedance gain of 72 dB$\Omega$ and a –3-dB bandwidth of 20.4 GHz are measured. Because the loading capacitance of the off-chip PD is absent, there is a false peaking at 18 GHz with 4.5-dB equalization.
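The paper does not spell out the conversion equations. One commonly used single-ended relation for a two-port measured in a $Z_0$ system is $Z_T = Z_0 S_{21}/(1 - S_{11})$, which follows from taking the output voltage into a matched load over the input current. The sketch below implements that relation as an assumption; it is not necessarily the exact procedure used by the authors, and a differential output would need a mixed-mode treatment.

```python
import math

def transimpedance_from_s(s11: complex, s21: complex, z0: float = 50.0) -> complex:
    """Z_T = V_out / I_in for a two-port with a matched load at port 2.

    Assumes the single-ended relation Z_T = Z0 * S21 / (1 - S11).
    """
    return z0 * s21 / (1 - s11)

def to_dbohm(z: complex) -> float:
    """Convert a transimpedance in ohms to dBΩ."""
    return 20 * math.log10(abs(z))

# Illustrative values only, not measured data from the paper.
zt_matched = transimpedance_from_s(s11=0 + 0j, s21=1 + 0j)
zt_example = transimpedance_from_s(s11=0.5 + 0.1j, s21=20 - 5j)

print(f"Matched through line: {to_dbohm(zt_matched):.2f} dBΩ")
print(f"Hypothetical amplifier: {to_dbohm(zt_example):.2f} dBΩ")
```

The first call is a sanity check: a matched, lossless through connection gives $Z_T = Z_0$.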

At room temperature (~27 °C), the VGA has a total measured DC gain tuning range of 11 dB, with a maximum gain step of 0.6 dB. Compared to the 15-dB simulated DC gain tuning range, this range is limited by parasitic resistance, but it is still sufficient to meet the DC gain tuning requirement of the overall optical front-end.



Fig. 9. (Color online) The CDR power breakdown.



Fig. 10. (Color online) Chipset microphotograph.



Fig. 11. Measured electrical frequency response of the optical frontend without 250-fF  $C_{PD}$ .

#### 4.2. Electrical measurement results of CDR

Fig. 12. (Color online) Measured single-ended PRBS-15 eye diagram at optical front-end output for (a) 26 Gb/s and (b) 30 Gb/s.

| Parameter          | Default | Voltage variation |       | Temperature variation |       |
|--------------------|---------|-------------------|-------|-----------------------|-------|
| Control bit        | 11100   | 11100             | 11111 | 11100                 | 11000 |
| Voltage supply (V) | 1       | 0.9               | 0.9   | 1                     | 1     |
| Temperature (°C)   | 27      | 27                | 27    | 80                    | 80    |
| SNR (dB)           | 10.3    | 6.24              | 8.95  | 7.91                  | 9.50  |
| RMS jitter (ps)    | 2.37    | 3.50              | 3.10  | 2.69                  | 2.29  |
| Output (mV)        | 298     | 286               | 305   | 352                   | 341   |

Table 1. Control bits vs voltage and temperature variations at the output of the optical front-end.

With the full-rate reference-less architecture, this work achieves an NRZ data rate of 26 to 28 Gb/s, 0.955-ps (RMS) clock jitter, and 2.59-ps (RMS) recovered data jitter under 104-mW power consumption. This CDR has a reference-less dual-loop architecture with built-in equalization capability. Compared to Refs. [1, 2, 5, 23], this work achieves better jitter performance at similar power consumption. It is worth mentioning that an interesting single-loop full-rate bang-bang PAM-4 CDR has been presented in 28-nm CMOS<sup>[24, 25]</sup>. A novel trimodal (NRZ/PAM-4/PAM-8) half-rate bang-bang CDR with excellent energy efficiency has also been reported in 28-nm CMOS<sup>[26]</sup>. Moreover, a phase interpolator (PI) is used in Refs. [27–29] to tune the clock phase and optimize the BER. These methods could help to improve the energy efficiency and BER performance in our future research.
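For a full-rate CDR the recovered clock frequency in GHz equals the data rate in Gb/s, so the 26 to 28 Gb/s range can be checked against the VCO tuning range given in Section 3.2. The sketch below assumes the 15% tuning range is defined as $(f_{\max} - f_{\min})/f_{\rm center}$ and is symmetric about 27 GHz; the paper does not state the definition explicitly.

```python
# Check that the simulated VCO tuning range covers the full-rate clock frequencies.
# Assumption: the tuning range is (f_max - f_min) / f_center, symmetric about the center.
center_ghz = 27.0
tuning_range = 0.15

f_min_ghz = center_ghz * (1 - tuning_range / 2)
f_max_ghz = center_ghz * (1 + tuning_range / 2)

# Full-rate operation: clock frequency (GHz) equals data rate (Gb/s).
data_rates_gbps = (26.0, 28.0)
covered = all(f_min_ghz <= rate <= f_max_ghz for rate in data_rates_gbps)

print(f"VCO range: {f_min_ghz:.3f} to {f_max_ghz:.3f} GHz")
print(f"Covers 26 to 28 GHz clocks: {covered}")
```

Under this definition the simulated range leaves roughly 1 GHz of margin on each side of the 26 to 28 GHz span.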

#### 4.3. Optical measurement results at the front-end output

For the optical measurements, a commercial Vertically Integrated Systems (VISs) 850-nm PD with a responsivity of 0.4 A/W and an operating data rate of 28 Gb/s (Product No.: D20-850C) is utilized to convert the incoming modulated light to photocurrent. Fig. 12 depicts the PRBS-15 optical measurement result at the output of the optical front-end. For this setup, the input light power is -5 dBm. With 26-Gb/s optical data rate, the RMS jitter is 2.4 ps; with 30-Gb/s optical data rate, the RMS jitter is 2.55 ps. Due to limitations of measurement equipment, only data rates up to 30 Gb/s are measured.

Fig. 13. (Color online) Measured single-ended PRBS-15 eye diagram for 26-Gb/s at chipset output.

To verify the effectiveness of the two resistor networks of the modified gm-TIA VGA, 26-Gb/s optical eye diagrams under different voltages and temperatures are measured and summarized in Table 1. By tuning the control words in the VGA, the total transimpedance gain and eye diagram can be optimized. The highest temperature that can be set precisely (i.e., 80 °C) is limited by the heater in the laboratory. In the first setup, the supply voltage decreases from 1 to 0.9 V. As shown, the measured SNR and jitter deteriorate significantly, from 10.3 to 6.24 dB and from 2.37 to 3.5 ps, respectively. By tuning the control bits from 11100 to 11111, the measured SNR and jitter are improved to 8.95 dB and 3.10 ps, respectively. In the second setup, the operating temperature increases from 27 to 80 °C. The measured SNR and jitter deteriorate from 10.3 to 7.91 dB and from 2.37 to 2.69 ps, respectively. By changing the tuning bits from 11100 to 11000, the measured SNR and jitter are improved to 9.50 dB and 2.29 ps, respectively. In summary, the optimized setting at 27 °C and 1-V supply is 11100, the optimized setting at 27 °C and 0.9-V supply is 11111, and the optimized setting at 80 °C and 1-V supply is 11000. This demonstrates that the presented 3-stage gm-TIA VGA has an efficient digital tuning capability that compensates for supply voltage and temperature variations.
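The selection described above amounts to picking, for each operating condition, the control word with the best measured eye. The sketch below encodes the Table 1 values and selects by SNR; the data structure and the choice of SNR as the selection metric are illustrative assumptions, since the paper does not describe an automated procedure.

```python
# Table 1 measurements: condition -> list of (control_word, snr_db, rms_jitter_ps).
# Selecting by SNR is an illustrative choice; the paper tunes the word manually.
measurements = {
    ("1.0 V", "27 C"): [("11100", 10.3, 2.37)],
    ("0.9 V", "27 C"): [("11100", 6.24, 3.50), ("11111", 8.95, 3.10)],
    ("1.0 V", "80 C"): [("11100", 7.91, 2.69), ("11000", 9.50, 2.29)],
}

best_word = {
    condition: max(rows, key=lambda row: row[1])[0]
    for condition, rows in measurements.items()
}

for (supply, temperature), word in best_word.items():
    print(f"{supply}, {temperature}: control word {word}")
```

In the measured data the highest-SNR word also gives the lowest RMS jitter in each condition, so either metric leads to the same setting.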

#### 4.4. Optical measurement results at the chipset output

Fig. 14. (Color online) Measured differential eye diagram for 26-Gb/s PRBS-15: (a) without channel at chipset input, (b) without channel at chipset output, (c) with channel at chipset input, and (d) with channel at chipset output.

In this optical front-end and CDR chipset testbench, a PRBS-15 pattern is utilized. Fig. 13 shows the PRBS-15 26-Gb/s optical measurement result at the output of the whole chipset, consisting of the optical front-end and the CDR. The overshoot in the eye diagram stems from the equalization tuning of the 4-stage delay chain. The measured RMS jitter is 2.1 ps and the peak-to-peak jitter is 9.8 ps. The overall measured optical sensitivity for a BER of $10^{-12}$ is -7.3 dBm at 26 Gb/s. This RMS jitter is 12.5% better than the value at the output of the optical front-end because the equalization compensates for the high-frequency loss, which improves the '0' and '1' transitions, with further help from the CDR.
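The 12.5% figure follows directly from the two RMS jitter values, taking the 2.4-ps front-end result at 26 Gb/s from Section 4.3 as the reference. The sketch below reproduces the arithmetic and also expresses the chipset jitter as a fraction of the unit interval; the UI normalization is an added illustration, not a number quoted in the paper.

```python
# Relative RMS jitter improvement from front-end output to chipset output.
front_end_rms_ps = 2.4  # RMS jitter at the front-end output, 26 Gb/s
chipset_rms_ps = 2.1    # RMS jitter at the chipset (CDR) output, 26 Gb/s
data_rate_gbps = 26.0

improvement_pct = (front_end_rms_ps - chipset_rms_ps) / front_end_rms_ps * 100
ui_ps = 1e3 / data_rate_gbps  # unit interval in ps
chipset_jitter_pct_ui = chipset_rms_ps / ui_ps * 100

print(f"Improvement: {improvement_pct:.1f}%")
print(f"UI: {ui_ps:.2f} ps, chipset RMS jitter: {chipset_jitter_pct_ui:.2f}% of UI")
```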

#### 4.5. Equalization capability

To demonstrate the capability of the built-in equalization, an attenuator is inserted after the Picosecond 12070 pattern generator, to create the required additional attenuation. This attenuator has 10-dB attenuation at 13 GHz.

Without the attenuator, Figs. 14(a) and 14(b) show the measured eye diagrams at the input of the optical front-end and at the output of the CDR, respectively. In this setup, the 26-Gb/s CDR output has a measured RMS jitter of 1.9 ps and a peak-to-peak jitter of 10.5 ps.

With the attenuator, as shown in Fig. 14(c), the eye at the input of the optical front-end is totally closed. However, when the built-in equalizers are enabled, the 26-Gb/s data eye is recovered successfully with clear quality. As shown in Fig. 14(d), the measured RMS jitter is 2.0 ps and the peak-to-peak jitter is 13.8 ps. This demonstrates that the built-in equalization technique can compensate for the 10-dB high-frequency loss with only 0.1-ps RMS jitter degradation. When the equalization of the delay cell in the CDR is tuned, the characteristic of the CTLE reduces the low-frequency gain relative to the high-frequency gain. Therefore, to achieve a constant CDR output swing, the low-frequency gain of the system can be compensated by tuning the DC gain of the 3-stage VGA.

Fig. 15. (Color online) The clock jitter of the recovered clock of the CDR output.

Fig. 15 shows that the measured RMS jitter of the recovered clock is only 688 fs, indicating that the CDR achieves state-of-the-art performance. Fig. 16 shows the measured phase noise of the chipset's recovered clocks with and without the 10-dB attenuator at 26 Gb/s. The recovered clock with the attenuator has a phase noise of –114.5 dBc/Hz at 1-MHz offset, which is only 2.6 dB worse than the result without the attenuator.
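The phase noise and jitter numbers above can be cross-checked with simple arithmetic. The sketch below assumes that the –117.1 dBc/Hz value listed for this work in Table 2 is the result without the attenuator, and that the recovered clock runs at 26 GHz for full-rate operation at 26 Gb/s; both are inferences from the text rather than explicit statements.

```python
# Cross-check of the recovered-clock phase noise and jitter figures.
# Assumptions: Table 2's -117.1 dBc/Hz is the no-attenuator result, and the
# recovered clock is at 26 GHz (full-rate CDR at 26 Gb/s).
pn_with_attenuator_dbc = -114.5     # dBc/Hz at 1-MHz offset
pn_without_attenuator_dbc = -117.1  # dBc/Hz at 1-MHz offset (Table 2)

degradation_db = pn_with_attenuator_dbc - pn_without_attenuator_dbc

clock_ghz = 26.0
clock_period_ps = 1e3 / clock_ghz
clock_rms_jitter_ps = 0.688         # 688 fs from Fig. 15
jitter_pct_of_period = clock_rms_jitter_ps / clock_period_ps * 100

print(f"Phase noise degradation with attenuator: {degradation_db:.1f} dB")
print(f"Clock RMS jitter: {jitter_pct_of_period:.2f}% of the {clock_period_ps:.2f}-ps period")
```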

Fig. 17 depicts the chipset's bathtub curves with and without the attenuator. At the $10^{-10}$ BER criterion, the measured eye openings are 18.7% and 14% of the UI, respectively. Both the phase noise and bathtub curve comparisons demonstrate that the system can overcome undesired channel attenuation and recover the data and clock with little performance degradation and no extra power consumption.

Fig. 16. (Color online) Measured phase noise performance of the recovered CDR clock: (a) without attenuator and (b) with attenuator.

Fig. 17. (Color online) Bathtub curves with and without the attenuator.

Fig. 18. (Color online) Measured optical input sensitivity.

| Block     | Parameter                                   | Ref. [1] | Ref. [2]   | Ref. [5]      | Ref. [23]  | This work |
|-----------|---------------------------------------------|----------|------------|---------------|------------|-----------|
| System    | CMOS technology                             | 65-nm    | 65-nm      | 130-nm BiCMOS | 45-nm      | 65-nm     |
|           | Type                                        | Chipset  | Monolithic | Chipset       | Monolithic | Chipset   |
|           | Supply voltage (V)                          | 1.2      | 1/1.2      | 1.2           | 3.3/1.3    | 1/1.2     |
|           | Data rate (Gb/s)                            | 25       | 26.5       | 25.78         | 25.78      | 26        |
|           | Sensitivity at 10<sup>-12</sup> BER (dBm)   | -6.8     | /          | -8            | /          | -7.3      |
| Front-end | PD responsivity (A/W)                       | 0.47     | /          | /             | 0.5        | 0.4       |
|           | PD cap. (fF)                                | 150      | 40^        | /             | 150        | 180*      |
|           | Gain (dBΩ)                                  | 72.5     | 71         | /             | /          | 72        |
|           | BW (GHz)                                    | 21##     | 20.4#      | /             | /          | 20.4##    |
|           | Power (mW)                                  | 68       | 35.8       | <100          | 70         | 36        |
| CDR       | Rate type                                   | Half     | Half       | Full          | Half       | Full      |
|           | Recovered clock PN at 1-MHz offset (dBc/Hz) | -98.3    | /          | /             | /          | -117.1    |
|           | Reference-less                              | Yes      | No         | /             | Yes        | Yes       |
|           | Clock jitter (ps)                           | 1.01**   | 1.28**     | /             | /          | 0.955**   |
|           | Power (mW)                                  | 120      | 218        | 100           | 100        | 104       |

Table 2. Comparison with published optical receivers/chipsets.

<sup>^</sup>Calculated value. \*80 fF is the PD capacitance itself; 100 fF is the on-chip pad capacitance. \*\*RMS jitter. #Simulated result with 40 fF. ##Measured result without C<sub>PD</sub>.

Fig. 18 shows the measured optical input sensitivity of the chipset. With the 0.4-A/W 850-nm PD, the chipset achieves a 26-Gb/s optical data rate with -7.3-dBm sensitivity.
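To relate the sensitivity in dBm to circuit-level quantities, it can be converted to an average optical power and an average photocurrent. The sketch below assumes that -7.3 dBm is the average optical power and that all of it reaches the 0.4-A/W PD, ignoring coupling loss; these are simplifying assumptions, not measured conditions from the paper.

```python
# Convert the measured sensitivity to average optical power and photocurrent.
# Assumptions: -7.3 dBm is average optical power, fully coupled into the PD.
sensitivity_dbm = -7.3
responsivity_a_per_w = 0.4

power_uw = 10 ** (sensitivity_dbm / 10) * 1e3        # dBm -> mW -> µW
avg_photocurrent_ua = power_uw * responsivity_a_per_w  # µW * A/W = µA

print(f"Average optical power: {power_uw:.1f} uW")
print(f"Average photocurrent: {avg_photocurrent_ua:.1f} uA")
```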

Table 2 summarizes the key receiver performance and compares it with previously reported results. Compared to other standalone optical front-end chips, this work achieves competitive overall performance with a front-end power of only 36 mW. Compared to other works with both an optical front-end and a CDR, this work improves the total power consumption and the recovered clock jitter at similar technology, data rate, gain, and bandwidth. This indicates that the presented system is competitive with other works.

# 5. Conclusion

A 26-Gb/s CMOS optical receiver chipset including an optical front-end and a reference-less CDR with embedded equalization has been demonstrated in TSMC 65-nm technology. Optical measurements demonstrate that the overall system achieves a $10^{-12}$ BER at 26 Gb/s for a $2^{15}$–1 PRBS input with a -7.3-dBm input sensitivity. Moreover, the built-in equalization inside the CDR can compensate for a 10-dB @ 13 GHz attenuator and recover the desired clock and data output.

# Acknowledgements

This work was supported in part by Research and Development Program in Key Areas of Guangdong Province under Grant 2019B010116002, in part by the National Natural Science Foundation of China under Grant 62074074, and in part by the Science and Technology Plan of Shenzhen under Grants JCYJ20190809142017428 and JCYJ20200109141225025.

# References

- [1] Chiang P C, Jiang J Y, Hung H W, et al. 4 × 25 Gb/s transceiver with optical front-end for 100 GbE system in 65 nm CMOS technology. IEEE J Solid State Circuits, 2015, 50, 573
- [2] Chu S H, Bae W, Jeong G S, et al. A 22 to 26.5 Gb/s optical receiver with all-digital clock and data recovery in a 65 nm CMOS process. IEEE J Solid State Circuits, 2015, 50, 2603
- [3] Rahman W, Yoo D, Liang J, et al. A 22.5-to-32-Gb/s 3.2-pJ/b referenceless baud-rate digital CDR with DFE and CTLE in 28-nm CMOS. IEEE J Solid State Circuits, 2017, 52, 3517
- [4] Ozkaya I, Cevrero A, Francese P A, et al. A 60-Gb/s 1.9-pJ/bit NRZ optical receiver with low-latency digital CDR in 14-nm CMOS Fin-FET. IEEE J Solid State Circuits, 2018, 53, 1227
- [5] Tsunoda Y, Shibasaki T, Oku H, et al. 25.78-Gb/s VCSEL-based optical transceiver with retimer-embedded driver and receiver ICs. Optical Fiber Communication Conference, 2015
- [6] Lee Y S, Ho W H, Chen W Z. A 25-Gb/s, 2.1-pJ/bit, fully integrated optical receiver with a baud-rate clock and data recovery. IEEE J Solid State Circuits, 2019, 54, 2243
- [7] Han J, Choi B, Seo M, et al. A 20-Gb/s transformer-based current-mode optical receiver in 0.13-μm CMOS. IEEE Trans Circuits Syst II, 2010, 57, 348
- [8] Komatsu Y, Shinmyo A, Kato S, et al. A 0.25–27-Gb/s PAM4/NRZ transceiver with adaptive power CDR and jitter analysis. IEEE J Solid State Circuits, 2019, 54, 2802
- [9] Moayedi Pour Fard M, Liboiron-Ladouceur O, Cowan G E R. 1.23-pJ/bit 25-Gb/s inductor-less optical receiver with low-voltage silicon photodetector. IEEE J Solid State Circuits, 2018, 53, 1793
- [10] Wu K C, Jri L. A 2 × 25Gb/s receiver with 2:5 DMUX for 100Gb/s Ethernet. IEEE J Solid-State Circuits, 2010, 45, 2421
- [11] Wang Y P, Lu Y, Pan Q, et al. A 3-mW 25-Gb/s CMOS transimpedance amplifier with fully integrated low-dropout regulator for 100GbE systems. 2014 IEEE Radio Freq Integr Circuits Symp, 2014, 275
- [12] Pan Q, Wang Y P, Lu Y, et al. An 18-Gb/s fully integrated optical receiver with adaptive cascaded equalizer. IEEE J Sel Top Quantum Electron, 2016, 22, 361
- [13] Sun L, Pan Q, Wang K C, et al. A 26–28-Gb/s full-rate clock and data recovery circuit with embedded equalizer in 65-nm CMOS. IEEE Trans Circuits Syst I, 2014, 61, 2139

- [14] Liu L X, Zou J, En Y F, et al. A high gain wide dynamic range transimpedance amplifier for optical receivers. J Semicond, 2014, 35, 015001
- [15] Dong Y, Martin K W. A 4-Gbps POF receiver using linear equalizer with multi-shunt-shunt feedbacks in 65-nm CMOS. IEEE Trans Circuits Syst II, 2013, 60, 617
- [16] Pan Q, Luo X S. A 58-dBΩ 20-Gb/s inverter-based cascode transimpedance amplifier for optical communications. J Semicond, 2022, 43, 012401
- [17] Chen Y, Mak P I, Boon C C, et al. A 36-Gb/s 1.3-mW/Gb/s duobinary-signal transmitter exploiting power-efficient cross-quadrature clocking multiplexers with maximized timing margin. IEEE Trans Circuits Syst I, 2018, 65, 3014
- [18] Chen Y, Mak P I, Boon C C, et al. A 27-Gb/s time-interleaved duobinary transmitter achieving 1.44-mW/Gb/s FOM in 65-nm CMOS. IEEE Microwave Wirel Compon Lett, 2017, 27, 839
- [19] Zhao X T, Chen Y, Mak P I, et al. A 0.0018-mm<sup>2</sup> 153% locking-range CML-based divider-by-2 with tunable self-resonant frequency using an auxiliary negative-g<sub>m</sub> cell. IEEE Trans Circuits Syst I, 2019, 66, 3330
- [20] He J, Zhang Y G, Liu H, et al. A 56-Gb/s reconfigurable silicon-photonics transmitter using high-swing distributed driver and 2-tap in-segment feed-forward equalizer in 65-nm CMOS. IEEE Trans Circuits Syst I, 2022, 69, 1159
- [21] Kong L S, Chen Y, Boon C C, et al. A wideband inductorless dB-linear automatic gain control amplifier using a single-branch negative exponential generator for wireline applications. IEEE Trans Circuits Syst I, 2018, 65, 3196
- [22] Chen Y, Mak P, Yu H, et al. An area-efficient and tunable bandwidth-extension technique for a wideband CMOS amplifier handling 50+ Gb/s signaling. IEEE Trans Microwave Theory Tech, 2017, 65, 4960
- [23] Wang J, Pan Q, Qin Y, et al. A fully integrated 25 Gb/s low-noise TIA+CDR optical receiver designed in 40-nm-CMOS. IEEE Trans Circuits Syst II, 2019, 66, 1698
- [24] Zhao X T, Chen Y, Mak P I, et al. A 0.0285mm<sup>2</sup> 0.68pJ/bit single-loop full-rate bang-bang CDR without reference and separate frequency detector achieving an 8.2(Gb/s)/μs acquisition speed of PAM-4 data in 28nm CMOS. 2020 IEEE Custom Integrated Circuits Conference, 2020, 1
- [25] Zhao X T, Chen Y, Wang L, et al. A sub-0.25pJ/bit 47.6-to-58.8Gb/s reference-less FD-less single-loop PAM-4 Bang-Bang CDR with a deliberately-current-mismatch frequency acquisition technique in 28nm CMOS. 2021 IEEE Radio Frequency Integrated Circuits Symposium, 2021, 131
- [26] Zhao X T, Chen Y, Mak P I, et al. A 0.14-to-0.29-pj/bit 14-GBaud/s trimodal (NRZ/PAM-4/PAM-8) half-rate bang-bang clock and data recovery (BBCDR) circuit in 28-nm CMOS. IEEE Trans Circuits Syst I, 2021, 68, 89
- [27] Balachandran A, Chen Y, Boon C C. A 32-Gb/s 3.53-mW/Gb/s adaptive receiver AFE employing a hybrid CTLE, edge-DFE and merged data-DFE/CDR in 65-nm CMOS. 2019 IEEE Asia Pacific Conference on Circuits and Systems, 2019, 221
- [28] Liao Q W, Zhang Y G, Ma S Y, et al. A 50-Gb/s PAM-4 silicon-photonic transmitter incorporating lumped-segment MZM, distributed CMOS driver, and integrated CDR. IEEE J Solid State Circuits, 2022, 57, 767
- [29] Zhong M, Wang Q W, Chen Y, et al. A 4 × 25-Gb/s serializer with integrated CDR and 3-tap FFE driver for NIC optical interconnects. 2021 IEEE International Conference on Integrated Circuits, Technologies and Applications, 2021, 255




**Quan Pan** (S'08–M'14) received his B.S. degree in Electrical Engineering (EE) from the University of Science and Technology of China (USTC) in 2005, and his Ph.D. degree in Electronic and Computer Engineering (ECE) from the Hong Kong University of Science and Technology (HKUST) in 2014. He has been an Assistant Professor at the School of Microelectronics, Southern University of Science and Technology, since 2018. His research interests include high-speed optical transceiver, wireless, and wireline circuit design.



**Xiongshi Luo** received a B.S. degree in microelectronics science and engineering from Southern University of Science and Technology, Shenzhen, China, in 2019. He is currently pursuing an M.S. degree at the same university. His research interests include high-speed serial links and optical interconnects.





**Fuzhan Chen** received an M.S. degree in microelectronics engineering from the University of the Chinese Academy of Sciences, Beijing, China, in 2020. He is currently pursuing a Ph.D. degree in Electronic and Computer Engineering at the Hong Kong University of Science and Technology under the joint Ph.D. training program in partnership with Southern University of Science and Technology. His research interests include high-speed optical communication and wireline IC design.

**Xuewei Ding** received an M.S. degree in microelectronics engineering from Harbin Institute of Technology, Harbin, China, in 2007. He is currently working at ZTE. His research interests include mixed-signal design, high-speed SerDes (both analog-based and ADC-based), and high-performance, high-frequency PLLs.



**Zhenghao Li** received a B.S. degree in microelectronics science and engineering from Southern University of Science and Technology, Shenzhen, China, in 2020. He is currently working toward a Ph.D. degree at the same university. His research interests include high-speed optical receiver front-end circuits and wireline circuit design.



**C. Patrick Yue** (SM'05–F'15) received a B.S. degree (Hons.) from the University of Texas at Austin, TX, USA, in 1992, and M.S. and Ph.D. degrees in electrical engineering from Stanford University, CA, USA, in 1994 and 1998, respectively. His research interests include optical wireless physical layer circuits and systems, high-speed wireline communication SoC, millimeter-wave communication and sensing circuits, indoor positioning and image processing technologies for robotic applications, and edge computing accelerator design for IoT applications.



**Zhengzhe Jia** is an undergraduate student currently working toward a B.S. degree at the School of Microelectronics, Southern University of Science and Technology, Shenzhen, China. His research interests include high-speed optical receiver front-end circuits and CMOS analog integrated circuits, including the CDR circuit.