

#### BCM FPGA Firmware v4 Code/Design Review

Aleš Svetek, J. Stefan Institute, Ljubljana



CERN, 2011-05-05



- BCM FPGA Main Tasks
- Upgrade v3 → v4
- BCM FPGA Data Flow
- BCM FPGA Firmware v4 Design
- FPGA Resource Utilization
- Design Status





- DAQ of sensor data at 2.56 GHz (64 samples at 390 ps for each BC)
- Beam Monitor → Controls Interlocks Beam User (CIBU), Detector Safety System (DSS) and Post-Mortem Buffer
- Luminosity Monitor
- TDAQ ROD functionality
- CTP triggers
- Detector Control System (DCS)





- On-board (system) MGT synchronization support
- Adapt to channel remapping
  - (8 LG channels → Beam\_Abort\_ROD)
  - (8 HG channels → Lumi\_ROD)
- Redesign Basic Beam Abort Algorithm
- Redesign CTP trigger outputs
- Integrate Test Vector "play back"
- Gb Ethernet (TCP, UDP): faster Post-Mortem buffer download
- Prepare only 1 FPGA firmware; final operation defined by SW









- Support GbE UDP and TCP/IP communication
- Execute the MGT calibration algorithms
- Generate and load MGT test vectors
- Startup Built-in Self Test (BIST), DDR and DDR2 RAM
- Provide additional debug information

#### **Processor System Architecture**






# Clock Domain Crossing and Sync.

- Edge detector
- 2-stage synchronizer
- Pulse\_sync\_1\_way
- Pulse\_sync\_2\_way

- (FIFOs)









- Reset generator (reset sequencing of PPC405, PLB Bus, Peripherals)
- Clock generator (1x 300 MHz, 2x 100 MHz, 2x 200 MHz, 1x 50 MHz)
- JTAG controller
- Interrupt controller (intc)
- UART/RS-232
- Watchdog timer





- Well-known socket communication APIs available
- Post-Mortem buffer dump
- Startup parameter configuration from OKS
- DCS slow control
- Syslog daemon channel
- **Command-line/Telnet interface to PPC** (used to read/write any register, parameter reconfiguration, diagnostics)

# Gb Ethernet throughput (LWIP)



- Benchmark: iperf application for measuring maximum TCP and UDP bandwidth performance
- Using MTU 1500 (Maximum Transmission Unit)
- Using open-source LWIP (Lightweight IP) stack: sustained throughput (BCM FPGA → PC)

#### 11 MB/s via TCP

[Figure: Bandwidth Monitor screenshot, Intel(R) PRO/1000 GT Desktop Adapter (192.168.1.1), 10:55 to 10:57. Max: 10.8 MB/s, Down: 10.1 MB/s, Up: 189.7 KB/s.]

#### 25 MB/s via UDP





- Commercial TCP/IP stack solution
- Using the same FPGA hardware:
  - MTU 1500 → 27 MB/s (213 Mbps) via TCP
  - MTU 9000 → 115 MB/s (922 Mbps) via TCP
- Price: 20,000 €

Source: Xilinx Application Note XAPP1043: Measuring Treck TCP/IP Performance Using the XPS LocalLink TEMAC in an Embedded Processor System









#### MGT Interface










#### MGT RX Operating Mode





#### MGT Receive Path





(Fine Delay implemented as a "RX slide" MGT feature.)








- Separate 256 MB of DDR2 RAM in 2x128 MB buffers
- Simultaneous read/write → MPMC
- Maximum write speed on 1 MPMC port: 1600 MB/s
- Actual data: 2560 MB/s → reduce the amount of recorded data, reduce resolution from 390 to 780 ps
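The bandwidth figures above can be cross-checked with a few lines of arithmetic. The sketch below is only illustrative: the channel count of 8 per FPGA is an assumption inferred from the channel remapping (8 channels per ROD), and the names `samples_per_bc` and `data_rate_mb_s` are hypothetical helpers, not part of the firmware or its software.

```python
SAMPLE_RATE_HZ = 2_560_000_000   # per-channel sampling rate (from the slides)
BC_RATE_HZ = 40_000_000          # bunch-crossing clock (from the slides)
MPMC_PORT_LIMIT_MB_S = 1600      # max write speed on one MPMC port (from the slides)
CHANNELS = 8                     # ASSUMPTION: 8 channels recorded per FPGA


def samples_per_bc():
    """Number of 1-bit samples taken per channel in one bunch crossing."""
    return SAMPLE_RATE_HZ // BC_RATE_HZ


def data_rate_mb_s(decimation=1):
    """Raw data rate in MB/s when keeping every `decimation`-th sample."""
    bits_per_second = SAMPLE_RATE_HZ // decimation * CHANNELS
    return bits_per_second / 8 / 1e6


if __name__ == "__main__":
    print("samples per BC:", samples_per_bc())
    for decimation in (1, 2):
        rate = data_rate_mb_s(decimation)
        fits = "fits" if rate <= MPMC_PORT_LIMIT_MB_S else "exceeds"
        print(f"decimation {decimation}: {rate:.0f} MB/s ({fits} one MPMC port)")
```

Under these assumptions the full-resolution stream (2560 MB/s) exceeds the single-port limit, while halving the resolution to 780 ps gives 1280 MB/s, which fits.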



#### NPI Data Path





# NPI Signaling (FIFO empty)



[Figure: ChipScope ILA waveform (XC4VFX60), samples 355 to 410. Signals: NPI FIFO Push, NPI Addr Ack, NPI Addr Req, NPI Addr Inc, User FIFO NPI Empty, User FIFO NPI Rd En, User FIFO NPI Underflow, User FIFO NPI Wr, User FIFO NPI Full, User FIFO NPI Overflow, Burst Cnt (cycling 00 to 1F), Burst Cnt En, Burst Cnt Rst, FIFO NPI Wr Count (00), User FIFO NPI Wr Count MAX (32), Irq\_Status Reg (0).]



# NPI Signaling (FIFO not empty)



[Figure: ChipScope ILA waveform (XC4VFX60), samples 1020 to 1080. Signals: NPI FIFO Push, NPI Addr Ack, NPI Addr Req, NPI Addr Inc, User FIFO NPI Empty, User FIFO NPI Rd En, User FIFO NPI Underflow, User FIFO NPI Wr, User FIFO NPI Overflow, Burst Cnt (cycling 00 to 1F), Burst Cnt En, Burst Cnt Rst, FIFO NPI Wr Count (0E), User FIFO NPI Wr Count MAX (32), Irq\_Status Reg (0).]







- Based on pulse reconstruction on 64-bit data
- Reconstruct max. 2 pulses in one BC sample
- Count number of pulses (hits)
- Each pulse encoded in:
  - 6-bit rising edge position
  - 5-bit pulse width
- Calculate collisions, background events and lumi conditions by applying time-windows
- Provide 176-bit data stream to TDAQ
  - $8 \text{ ch} \times 2 \text{ pulses} \times (6 \text{-bit} + 5 \text{-bit}) = 176 \text{ bits}$



# Pulse Reconstruction 1/2



- Calculate rising (RE) and falling (FE) edges in a sample
- Search for the first bit set ("1") from the forward (FWD) and reverse (REV) direction on RE and FE →
  - Pulse 1: position = FWD\_RE, width = FWD\_FE - FWD\_RE
  - Pulse 2: position = REV\_RE, width = REV\_FE - REV\_RE
- Examples follow
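The edge search described above can be modelled in software. The following is a minimal sketch, not the firmware's VHDL: it assumes bit 0 is the earliest sample, that edges are only detected inside one 64-bit sample (no look-back into the previous BC), that the width saturates at the 5-bit maximum of 31, and that a pulse with no falling edge is taken to run to the end of the sample. It also pairs each rising edge with the next falling edge after it, which is a slight simplification of the plain FWD/REV search on FE. All function names are hypothetical.

```python
NBITS = 64       # samples per bunch crossing (from the slides)
MAX_WIDTH = 31   # largest value a 5-bit width field can hold


def lowest_set(x):
    """Index of the first set bit searching forward (from bit 0), or None."""
    return (x & -x).bit_length() - 1 if x else None


def highest_set(x):
    """Index of the first set bit searching in reverse (from bit 63), or None."""
    return x.bit_length() - 1 if x else None


def edges(raw):
    """Return (re, fe) masks: bit i marks a 0->1 or 1->0 step between samples i-1 and i."""
    mask = (1 << NBITS) - 1
    raw &= mask
    prev = (raw << 1) & mask      # bit i holds sample i-1
    inside = mask & ~1            # ASSUMPTION: bit 0 never carries an edge
    re = raw & ~prev & inside     # rising edges
    fe = ~raw & prev & inside     # falling edges (index of first 0 after a pulse)
    return re, fe


def pulse_width(fe, pos):
    """Distance from a rising edge to the next falling edge, saturated to 5 bits."""
    later = fe >> (pos + 1)
    end = pos + 1 + lowest_set(later) if later else NBITS
    return min(end - pos, MAX_WIDTH)


def reconstruct(raw):
    """Return up to two (position, width) pairs for one 64-bit BC sample."""
    re, fe = edges(raw)
    if not re:
        return []
    fwd_re, rev_re = lowest_set(re), highest_set(re)
    pulses = [(fwd_re, pulse_width(fe, fwd_re))]
    if rev_re != fwd_re:          # a second, distinct pulse exists
        pulses.append((rev_re, pulse_width(fe, rev_re)))
    return pulses


if __name__ == "__main__":
    for raw in (0x07FC00000001FF80, 0xFFFFFFFFFFFFFFFF, 0x7FFFFFFFFFFFFFFE):
        print(f"{raw:016x} -> {reconstruct(raw)}")
```

With inputs modelled on the simulation slides (zero-extended to 64 bits, which is an assumption), the sketch yields positions 7 and 50 with widths 10 and 9 for the two-pulse pattern, no pulse for the all-ones pattern, and a single pulse at position 1 with saturated width 31 for the long pulse.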









#### Pulse Reconstruction Simulation










Simulation waveform, 360 to 390 ns (two pulses in one sample; values as displayed):

| Name               | Value           |
|--------------------|-----------------|
| raw_data_i[63:0]   | 7fc00000001ff80 |
| re_pos_pulse1[5:0] | 7               |
| width_pulse1[4:0]  | 10              |
| re_pos_pulse2[5:0] | 50              |
| width_pulse2[4:0]  | 9               |



Simulation waveform, 650 to 680 ns (input all ones, no pulse reconstructed; values as displayed):

| Name               | Value          |
|--------------------|----------------|
| raw_data_i[63:0]   | FFFFFFFFFFFFFF |
| re_pos_pulse1[5:0] | 0              |
| width_pulse1[4:0]  | 0              |
| valid_pulse1       | 0              |
| re_pos_pulse2[5:0] | 0              |
| width_pulse2[4:0]  | 0              |
| valid_pulse2       | 0              |



Simulation waveform, 800 to 850 ns (one long pulse, width saturated; values as displayed):

| Name               | Value          |
|--------------------|----------------|
| raw_data_i[63:0]   | 7ffffffffffffe |
| re_pos_pulse1[5:0] | 1              |
| width_pulse1[4:0]  | 31             |
| valid_pulse1       | 1              |
| re_pos_pulse2[5:0] | 0              |
| width_pulse2[4:0]  | 0              |
| valid_pulse2       | 0              |







## BCM SLINK/ROD Data Format



$P\{1,2\}\{x,w\}[n]$ refers to pulse 1/2 position/width for channel $n$. The data words carry a 12-bit BCID + 176 bits of data + a 4-bit error code per BC.

| ROD Section | 32-bit Word Counter | Word Description                                                                                           |
|-------------|---------------------|------------------------------------------------------------------------------------------------------------|
| HEADER      | 1                   | Start of ROD header                                                                                        |
| HEADER      | 2                   | Header size                                                                                                |
| HEADER      | 3                   | ROD version                                                                                                |
| HEADER      | 4                   | ROD source ID (see BcmMapping)                                                                             |
| HEADER      | 5                   | 0 + 31-bit run number                                                                                      |
| HEADER      | 6                   | Extended L1ID (24-bit L1ID + 8-bit ECRC)                                                                   |
| HEADER      | 7                   | 0x00000 + 12-bit BCID                                                                                      |
| HEADER      | 8                   | 0x000000 + 8-bit Level-1 trigger type                                                                      |
| HEADER      | 9                   | Detector event type                                                                                        |
| DATA        | 1                   | 12-bit BCID + 6-bit P1x[0] + 5-bit P1w[0] + 6-bit P2x[0] + 3-bit P2w[0]                                    |
| DATA        | 2                   | 2-bit P2w[0] + 6-bit P1x[1] + 5-bit P1w[1] + 6-bit P2x[1] + 5-bit P2w[1] + 6-bit P1x[2] + 2-bit P1w[2]     |
| DATA        | 3                   | 3-bit P1w[2] + 6-bit P2x[2] + 5-bit P2w[2] + 6-bit P1x[3] + 5-bit P1w[3] + 6-bit P2x[3] + 1-bit P2w[3]     |
| DATA        | 4                   | 4-bit P2w[3] + 6-bit P1x[4] + 5-bit P1w[4] + 6-bit P2x[4] + 5-bit P2w[4] + 6-bit P1x[5]                    |
| DATA        | 5                   | 5-bit P1w[5] + 6-bit P2x[5] + 5-bit P2w[5] + 6-bit P1x[6] + 5-bit P1w[6] + 5-bit P2x[6]                    |
| DATA        | 6                   | 1-bit P2x[6] + 5-bit P2w[6] + 6-bit P1x[7] + 5-bit P1w[7] + 6-bit P2x[7] + 5-bit P2w[7] + 4-bit Error code |
| TRAILER     | 1                   | Status word 1 - bit errors                                                                                 |
| TRAILER     | 2                   | Status word 2 - count of words with errors                                                                 |
| TRAILER     | 3                   | Number of status words                                                                                     |
| TRAILER     | 4                   | Number of data words                                                                                       |
| TRAILER     | 5                   | Status block position (0 = before, 1 = after data words)                                                   |

https://twiki.cern.ch/twiki/bin/view/Atlas/BcmRod
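The data-word layout in the table amounts to concatenating a 12-bit BCID, 8 channels × (P1x, P1w, P2x, P2w) and a 4-bit error code into 192 bits, then slicing them into six 32-bit words. The sketch below illustrates that bookkeeping. It assumes fields are concatenated most-significant-first in the order the table lists them; the actual bit ordering inside each word is not stated on the slide, and `pack_bc` is a hypothetical helper name.

```python
FIELD_WIDTHS = (6, 5, 6, 5)   # P1x, P1w, P2x, P2w in bits (from the table)
N_CHANNELS = 8
N_WORDS = 6                   # (12 + 8 * 22 + 4) bits / 32 bits per word


def pack_bc(bcid, pulses, error):
    """Pack one bunch crossing into six 32-bit data words.

    pulses: sequence of 8 tuples (p1x, p1w, p2x, p2w), one per channel.
    ASSUMPTION: fields are concatenated MSB-first in table order.
    """
    if len(pulses) != N_CHANNELS:
        raise ValueError("expected one pulse tuple per channel")
    value = bcid & 0xFFF                              # 12-bit BCID
    for channel in pulses:
        for field, nbits in zip(channel, FIELD_WIDTHS):
            value = (value << nbits) | (field & ((1 << nbits) - 1))
    value = (value << 4) | (error & 0xF)              # 4-bit error code
    # Slice the 192-bit value into words, most significant word first.
    return [(value >> (32 * i)) & 0xFFFFFFFF for i in reversed(range(N_WORDS))]


if __name__ == "__main__":
    empty = (0, 0, 0, 0)
    words = pack_bc(0, [(7, 10, 50, 9)] + [empty] * 7, 0)
    print([f"{w:08x}" for w in words])
```

Note how the 5-bit P2w[0] straddles the boundary between data words 1 and 2 (3 bits + 2 bits), exactly as the table shows.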
















#### Basic Abort:






- **Basic Beam Abort** (described on previous slide)
- **X-of-Y**: takes into account the last Y Basic Abort results and requires at least X of them to fire before it issues an abort condition.
- **Forgetting Factor** (leaky bucket): extension of the Basic Abort algorithm. It provides more dynamic behaviour by "forgetting" past results as they get older.
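The two extensions can be sketched as small state machines fed with one Basic Abort result per evaluation step. This is an illustrative model only: the slides do not give the firmware's actual parameters or update rule, so the class names and the `gain`, `leak` and `threshold` parameters of the leaky bucket are assumptions.

```python
from collections import deque


class XofY:
    """Fire when at least x of the last y Basic Abort results were true."""

    def __init__(self, x, y):
        self.x = x
        self.window = deque(maxlen=y)   # keeps only the last y results

    def update(self, basic_abort):
        self.window.append(bool(basic_abort))
        return sum(self.window) >= self.x


class LeakyBucket:
    """Forgetting-factor variant: old results drain away over time.

    ASSUMPTION: the level rises by `gain` on each fired result and
    falls by `leak` otherwise; abort is raised at `threshold`.
    """

    def __init__(self, gain, leak, threshold):
        self.gain = gain
        self.leak = leak
        self.threshold = threshold
        self.level = 0

    def update(self, basic_abort):
        if basic_abort:
            self.level += self.gain
        else:
            self.level = max(0, self.level - self.leak)
        return self.level >= self.threshold


if __name__ == "__main__":
    voter = XofY(x=3, y=5)
    print([voter.update(r) for r in (1, 0, 1, 0, 1, 0)])
    bucket = LeakyBucket(gain=2, leak=1, threshold=4)
    print([bucket.update(r) for r in (1, 1, 0, 0, 1)])
```

The X-of-Y voter weights all results in its window equally, whereas the leaky bucket gives recent results more influence, which is the "more dynamic behaviour" the slide refers to.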















- L1ID bookkeeping with ECR load support
- BCID bookkeeping
- Post-Mortem delay
- Regenerate 40 MHz (BC) and 80 MHz from 320 MHz (or use 40 MHz available on the new Personality Modules)
- LTP interface, proper latching of LTP signals (*L1A*, *ECR*, *Orbit*, *Trigger Type*)
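The L1ID and BCID bookkeeping listed above can be pictured with a small software model. This is a hedged sketch, not the firmware logic: it assumes the 8-bit ECR counter occupies the upper byte of the extended L1ID, that the first L1A after an ECR gets L1ID 0, and that the BCID counter is cleared by the Orbit signal. The "ECR load" feature is not modelled, and the class names are hypothetical.

```python
class L1IDCounter:
    """Extended L1ID = 8-bit ECR counter + 24-bit L1ID (field sizes from the ROD header)."""

    def __init__(self):
        self.ecrc = 0
        self.l1id = 0xFFFFFF    # ASSUMPTION: wraps to 0 on the first L1A

    def on_ecr(self):
        """Event Counter Reset: bump the ECR counter and restart the L1ID."""
        self.ecrc = (self.ecrc + 1) & 0xFF
        self.l1id = 0xFFFFFF

    def on_l1a(self):
        """Level-1 Accept: advance the L1ID and return the extended L1ID."""
        self.l1id = (self.l1id + 1) & 0xFFFFFF
        return (self.ecrc << 24) | self.l1id   # ASSUMPTION: ECRC in the top byte


class BCIDCounter:
    """12-bit bunch-crossing counter, cleared by the Orbit signal."""

    def __init__(self):
        self.bcid = 0

    def on_bc(self, orbit=False):
        """Advance by one bunch crossing; return the BCID of this crossing."""
        if orbit:
            self.bcid = 0
        current = self.bcid
        self.bcid = (self.bcid + 1) & 0xFFF
        return current


if __name__ == "__main__":
    l1 = L1IDCounter()
    print(hex(l1.on_l1a()), hex(l1.on_l1a()))
    l1.on_ecr()
    print(hex(l1.on_l1a()))
```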

# 40/80 MHz BC Clock Scheme








Figure 3-6: REL Waveform Example









# **Device Utilization Summary**



| Logic Utilization                | Used   | Available | Utilization |
|----------------------------------|--------|-----------|-------------|
| Number of Slice Flip Flops       | 20,797 | 50,560    | 41%         |
| Number of 4 input LUTs           | 28,465 | 50,560    | 56%         |
| Number of occupied Slices        | 21,611 | 25,280    | 85%         |
| Number of bonded IPADs           | 24     | 80        | 30%         |
| Number of bonded OPADs           | 18     | 32        | 56%         |
| Number of bonded IOBs            | 222    | 576       | 38%         |
| Number of BUFG/BUFGCTRLs         | 14     | 32        | 43%         |
| Number of FIFO16/RAMB16s         | 137    | 232       | 59%         |
| Number of DCM_ADVs               | 4      | 12        | 33%         |
| Number of PMCDs                  | 1      | 8         | 12%         |
| Number of PPC405_ADVs            | 2      | 2         | 100%        |
| Number of EMACs                  | 1      | 2         | 50%         |
| Number of BUFRs                  | 1      | 32        | 3%          |
| Number of JTAGPPCs               | 1      | 1         | 100%        |
| Number of IDELAYCTRLs            | 10     | 20        | 50%         |
| Number of GT11s                  | 10     | 16        | 62%         |
| Number of GT11CLKs               | 2      | 8         | 25%         |
| Number of RPM macros             | 72     |           |             |
| Average Fanout of Non-Clock Nets | 3.23   |           |             |





# Module Resource Utilization Breakdown



| XPS Synthesis Summary Report * | Flip Flops Used | LUTs Used |
|--------------------------------|-----------------|-----------|
| proc_system                    | 21715           | 30401     |
| ddr_sdram_wrapper              | 5620            | 6400      |
| ddr2_sdram_wrapper             | 3857            | 3104      |
| trimode_mac_mii_wrapper        | 3712            | 3206      |
| mgt_ctrl_0_wrapper             | 3405            | 3073      |
| abort_ctrl_0_wrapper           | 1392            | 2626      |
| data_proc_ctrl_0_wrapper       | 899             | 5976      |
| npi_ctrl_0_wrapper             | 711             | 1177      |
| slink_rod_ctrl_0_wrapper       | 700             | 1073      |
| ltp_ctrl_0_wrapper             | 655             | 603       |
| xps_central_dma_0_wrapper      | 566             | 1005      |
| ppc405_0_wrapper               | 381             | 409       |
| xps_intc_0_wrapper             | 283             | 274       |
| xps_bram_if_cntlr_1_wrapper    | 229             | 184       |
| plb_wrapper                    | 180             | 1034      |
| xps_timebase_wdt_0_wrapper     | 169             | 224       |
| rs232_uart_1_wrapper           | 148             | 143       |
| leds_8bit_wrapper              | 128             | 97        |
| proc_sys_reset_0_wrapper       | 69              | 54        |

\* The XPS Synthesis Summary is only an approximate report, but it is still useful for judging the relative size of the modules.




- Optimization will be applied if necessary
- Trade Ethernet speed for FPGA resources
- Processor System Architecture redesign
- More than 20% of resources can be saved by:
  - reducing the DDR 64 MB MPMC to one port
  - excluding the DMA controller
  - excluding Ethernet checksum HW offloading
- A matter of 10 minutes

#### Resource optimization: From this...





#### Resource optimization: ...to this.




#### Completed

- MGT DAQ
- MGT Test Vectors
- Gb Ethernet
- PPC development application

#### To-do

- Finalize SLINK/ROD controller
- LTP interface and BC clock
- Finalize PPC application
- Slight modification of pulse reconstruction





#### Thank you!



