From rc@hep.ucl.ac.uk Wed Jun 12 14:50:52 1996
Date: Thu, 6 Jun 1996 17:12:57 +0100 (BST)
From: Robert Cranfield <rc@hep.ucl.ac.uk>
To: Gordon CRONE <gjc@hep.ucl.ac.uk>, Owen BOYLE <boyle@na48-1.cern.ch>,
    Robert MCLAREN <mclaren@cn.msm.cern.ch>
Subject: Further ROB-IN discussion at UCL

Points from further discussion of ROB-IN at UCL
===============================================
(RC: 06-Jun-1996)

Discussion between Bob Cranfield, Gordon Crone, John Lane.


1) Usefulness of TTC input to ROB:
----------------------------------

Basic point from Nick Ellis' note is the following:

For each channel, the event-ID and BCID are provided by separate counters in
the front-end which are incremented by quite different signals (i.e.
LVL1-accepts and beam-crossing signals). It is quite possible, therefore, for
these to get out of sync just for the channel concerned, which could result in
the wrong data-fragment being sent for an event. For example, an extra
spurious LVL1-accept would result in the data for the current BCID being
incorrectly sent as a LVL1 triggered event, whilst a missed LVL1-accept could
cause event-5's data to be sent as event-4.  These mismatches would continue
until corrected, for example, by a periodic counter reset. The TTC info,
however, sends a single event-ID/BCID pairing everywhere simultaneously. The
ROB could therefore use this information to detect mismatches on individual
ROLs and indeed to correct them.
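
As a minimal sketch of the check described above (Python, illustration only):
the ROB compares the event-ID/BCID pair carried by a fragment with the pair
broadcast by the TTC. The names Fragment, matches_ttc, corrected_event_id and
ttc_pairs are hypothetical, and keeping a table of recent TTC pairings keyed
by event-ID is an assumption, not an agreed design.

```python
from typing import Dict, NamedTuple, Optional


class Fragment(NamedTuple):
    event_id: int  # front-end counter incremented by LVL1-accepts
    bcid: int      # front-end counter incremented by beam-crossing signals


def matches_ttc(frag: Fragment, ttc_pairs: Dict[int, int]) -> bool:
    """True if the fragment's pairing agrees with the TTC pairing."""
    return ttc_pairs[frag.event_id] == frag.bcid


def corrected_event_id(frag: Fragment, ttc_pairs: Dict[int, int]) -> Optional[int]:
    """Event-ID the TTC associates with the fragment's BCID, if unambiguous.

    Assumes the table only spans a short window, so a BCID is not repeated.
    Returns None when no TTC event has this BCID (e.g. a spurious accept).
    """
    hits = [eid for eid, bcid in ttc_pairs.items() if bcid == frag.bcid]
    return hits[0] if len(hits) == 1 else None


# TTC broadcast, event-ID -> BCID (made-up values)
ttc_pairs = {3: 1200, 4: 1507, 5: 2011}

# A channel that missed one LVL1-accept sends event-5's data as event-4.
shifted = Fragment(event_id=4, bcid=2011)
if not matches_ttc(shifted, ttc_pairs):
    print("mismatch; data belongs to event", corrected_event_id(shifted, ttc_pairs))
```

Whether correction is worthwhile, rather than simply flagging the fragment,
would depend on how long such a table can sensibly be kept.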

However, the ROB is not the only place this could be done: it could also
be done at the ROD. If the RODs need TTC connections anyway, this would
be cheaper than arranging for TTC input to the ROBs. Moreover it seems
a better place logically, since a ROD may receive several front-end inputs,
each with an event-ID/BCID pairing. If any of these are mismatched, only the
ROD can determine which one and take appropriate action.

The drawback to the ROD doing this event-ID/BCID match is that the RODs
may not be built to a common design and it may be harder to ensure that
all ROD designers incorporate the correct checking.

There is an alternative to all this, which is for the event-ID to be
passed directly to the front-ends. John, who is involved in the TTC problem
for the Si tracker, was not clear that this should be ruled out.

There is another, less important, potential check that would be provided
by TTC input to the ROB. This is to ensure that data is still flowing
properly on the ROL. Since the TTC input would contain information about
the latest event-IDs it would be possible to know which events were
expected to have arrived from the ROD and thus to detect a dataflow
problem on the ROL. However, there are other ways to obtain this knowledge...
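
A rough sketch of this dataflow check, assuming fragments arrive on the ROL in
event-ID order and ignoring counter wrap-around. The function name and the
allowed_lag parameter (how many events the ROD may legitimately be behind the
TTC) are hypothetical.

```python
from typing import List


def overdue_events(latest_ttc_event_id: int,
                   last_received_event_id: int,
                   allowed_lag: int) -> List[int]:
    """Event-IDs announced by the TTC that should have arrived by now.

    An event is considered due once the TTC is more than allowed_lag
    events ahead of it. An empty list means no dataflow problem is
    suspected on this ROL.
    """
    newest_due = latest_ttc_event_id - allowed_lag
    return list(range(last_received_event_id + 1, newest_due + 1))


# TTC has announced event 100, the ROL last delivered event 90, and the
# ROD is allowed to be up to 3 events behind: events 91..97 are overdue.
print(overdue_events(100, 90, 3))
```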


2) RoIR arrival time:
---------------------

If RoIRs always arrive AFTER ROD-data (as stated at the RHUL ROB-IN meeting)
then they could be used to detect possible dataflow problems, indicated by
an RoIR for an event that has not been indexed by the ROB. Similarly, a
decision-record that eventually arrives for an event that has not been
indexed would provide a more delayed, indirect test of dataflow.

However, in discussion it seemed that the system CANNOT actually guarantee
that RoIRs arrive AFTER the relevant ROD-data (the system is too
asynchronous).  It is possible to insist that RoIRs arrive AFTER ROD-data and
to handle any that arrive earlier as errors, but this may result in too much
data-loss in the LVL2 system (though the data could still be available for
LVL3). On the other hand, if
RoIRs are artificially delayed to reduce such data-loss, the average latency
will be increased. Presumably, these possibilities have to be investigated and
experimented with.

Meanwhile it seems we should re-instate the possibility of RoIRs arriving
BEFORE ROD-data. This is actually not necessarily too much of a problem
(the current buffer-manager software allows for it, for example).
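
One way of allowing for early RoIRs might look like the sketch below. This is
NOT the current buffer-manager software, only an illustration; the class and
method names are hypothetical.

```python
from typing import Dict, Optional, Set


class RobIndex:
    """Toy ROB index that tolerates RoIRs arriving before ROD-data."""

    def __init__(self) -> None:
        self.indexed: Dict[int, bytes] = {}  # event-ID -> fragment data
        self.pending_roirs: Set[int] = set() # RoIRs still waiting for data

    def on_roir(self, event_id: int) -> Optional[bytes]:
        """Return the data if already indexed, else remember the request."""
        if event_id in self.indexed:
            return self.indexed[event_id]
        self.pending_roirs.add(event_id)
        return None

    def on_fragment(self, event_id: int, data: bytes) -> Optional[bytes]:
        """Index the fragment; return it if an RoIR was waiting for it."""
        self.indexed[event_id] = data
        if event_id in self.pending_roirs:
            self.pending_roirs.discard(event_id)
            return data
        return None


rob = RobIndex()
rob.on_roir(7)                   # RoIR first: nothing to send yet
print(rob.on_fragment(7, b"x"))  # data arrives: request can now be served
```

An RoIR that stays in pending_roirs for too long would then be the natural
trigger for testing the ROL.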

Maybe what we really want to do when we receive an RoIR BEFORE the
relevant ROD-data is to TEST the ROL, i.e. check that dataflow has not
been interrupted. This might be done, even with cheap
"more-or-less-unidirectional" links, according to John Lane's suggestion
as follows:


3) Checking the ROL dataflow:
-----------------------------

If we assume that the only affordable ROB->ROD communication is an XOFF/XON
signal, we could adopt the following protocol for testing ROL dataflow:

Whenever the ROD receives an XON from the ROB and it has no data to send
it sends an acknowledge packet instead. (Equivalently: the ROD always sends
data on receipt of an XON, even if this has to be a dummy "event".)

The ROB could use this protocol to test the viability of the ROL whenever
it suspected that dataflow had been interrupted. It could only suspect this
while XOFF was not being asserted. In that case the ROB could toggle an
XOFF/XON combination. It would then expect a reply from the ROD
-- either an event-fragment or an acknowledge packet -- so if it didn't
receive anything it would know there was a problem with the line.

Additionally, with this protocol the ROB would always receive a packet when
XOFF is released after reboot.
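
The protocol can be sketched as a toy model (Python, illustration only). The
Rod class stands for the ROD and the ROL together, link_ok=False models a
broken line, and ACK is a hypothetical acknowledge packet. A real ROB would
need a timeout to decide that no reply has come; here "no reply" is simply
None.

```python
from collections import deque
from typing import Optional

ACK = "ACK"  # hypothetical acknowledge packet (or dummy "event")


class Rod:
    """ROD plus ROL as seen from the ROB."""

    def __init__(self, link_ok: bool = True) -> None:
        self.queue: deque = deque()  # event-fragments waiting to be sent
        self.link_ok = link_ok
        self.xoff = False

    def receive_xoff(self) -> None:
        self.xoff = True

    def receive_xon(self) -> Optional[str]:
        """On XON always send something: data if any, otherwise an ACK."""
        self.xoff = False
        if not self.link_ok:
            return None  # nothing reaches the ROB
        return self.queue.popleft() if self.queue else ACK


def probe_rol(rod: Rod, xoff_asserted: bool) -> Optional[str]:
    """ROB side: toggle XOFF/XON and return the reply (None = line problem)."""
    if xoff_asserted:
        # Silence proves nothing while the ROB itself is holding off the ROD.
        raise ValueError("cannot probe while XOFF is asserted")
    rod.receive_xoff()
    return rod.receive_xon()


print(probe_rol(Rod(), xoff_asserted=False))               # idle ROD answers ACK
print(probe_rol(Rod(link_ok=False), xoff_asserted=False))  # broken line: None
```

The same receive_xon behaviour gives the packet expected when XOFF is released
after reboot.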

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Bob Cranfield

/----------------------------------------------------------------------------\
|     telephone:  +44-(0)171-380-7223 |  High Energy Particle Physics Group, |
|           FAX:  +44-(0)171-380-7145 |  Department of Physics & Astronomy,  |
|  email(TCPIP):  rc@hep.ucl.ac.uk    |  University College London,          |
| email(DECnet):  UCLVA::RC           |  Gower Street, London, WC1E 6BT      |
\----------------------------------------------------------------------------/