From rc@hep.ucl.ac.uk Wed Jun 12 14:50:52 1996
Date: Thu, 6 Jun 1996 17:12:57 +0100 (BST)
From: Robert Cranfield
To: Gordon CRONE, Owen BOYLE, Robert MCLAREN
Subject: Further ROB-IN discussion at UCL

Points from further discussion of ROB-IN at UCL
===============================================
(RC: 06-Jun-1996)

Discussion between Bob Cranfield, Gordon Crone, John Lane.

1) Usefulness of TTC input to ROB:
----------------------------------

Basic point from Nick Ellis' note is the following:

For each channel, the event-ID and BCID are provided by separate counters
in the front-end which are incremented by quite different signals (i.e.
LVL1-accepts and beam-crossing signals). It is quite possible, therefore,
for these to get out of sync just for the channel concerned, which could
result in the wrong data-fragment being sent for an event. For example, an
extra spurious LVL1-accept would result in the data for the current BCID
being incorrectly sent as a LVL1 triggered event, whilst a missed
LVL1-accept could cause event-5's data to be sent as event-4. These
mismatches would continue until corrected, for example, by a periodic
counter reset.

The TTC info, however, sends a single event-ID/BCID pairing everywhere
simultaneously. The ROB could therefore use this information to detect
mismatches on individual ROLs and indeed to correct them.

However, the ROB is not the only place this could be done: it could also
be done at the ROD. If the RODs need TTC connections anyway, this would be
cheaper than arranging for TTC input to the ROBs. Moreover it seems a
better place logically, since a ROD may receive several front-end inputs,
each with an event-ID/BCID pairing. If any of these is mismatched, only
the ROD can determine which one and take appropriate action.

The drawback to the ROD doing this event-ID/BCID match is that the RODs
may not be built to a common design and it may be harder to ensure that
all ROD designers incorporate the correct checking.
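
As an illustration only, the event-ID/BCID check described above might be
sketched as follows. The class and method names are hypothetical, and the
sketch assumes that the checker (ROB or ROD) keeps a short window of TTC
pairings in which each BCID occurs at most once; neither assumption comes
from the discussion itself.

```python
from typing import Dict, Optional


class PairingChecker:
    """Hypothetical checker comparing link data against TTC pairings."""

    def __init__(self) -> None:
        # event-ID -> BCID, as broadcast by the TTC (assumed short window)
        self.reference: Dict[int, int] = {}

    def record_ttc(self, event_id: int, bcid: int) -> None:
        """Store one event-ID/BCID pairing received from the TTC."""
        self.reference[event_id] = bcid

    def check(self, event_id: int, bcid: int) -> str:
        """Classify a fragment's pairing against the TTC reference."""
        expected = self.reference.get(event_id)
        if expected is None:
            return "unknown-event"
        return "ok" if expected == bcid else "mismatch"

    def corrected_event_id(self, bcid: int) -> Optional[int]:
        """Return the event-ID the TTC paired with this BCID, if any."""
        for event_id, ref_bcid in self.reference.items():
            if ref_bcid == bcid:
                return event_id
        return None
```

With TTC pairings (4, 100) and (5, 250) recorded, a fragment labelled
event-4 but carrying BCID 250 is reported as a mismatch, and the lookup by
BCID suggests it really belongs to event-5, which is the missed
LVL1-accept case from the example above.
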

There is an alternative to all this, which is for the event-ID to be
passed directly to the front-ends. John, who is involved in the TTC
problem for the Si tracker, was not clear that this should be ruled out.

There is another, less important, potential check that would be provided
by TTC input to the ROB. This is to ensure that data is still flowing
properly on the ROL. Since the TTC input would contain information about
the latest event-IDs it would be possible to know which events were
expected to have arrived from the ROD and thus to detect a dataflow
problem on the ROL. However, there are other ways to obtain this
knowledge...

2) RoIR arrival time:
---------------------

If RoIRs always arrive AFTER ROD-data (as stated at the RHUL ROB-IN
meeting) then they could be used to detect possible dataflow problems,
indicated by an RoIR for an event that has not been indexed by the ROB.
Similarly, a decision-record that eventually arrives for an event that has
not been indexed would provide a somewhat more delayed, indirect test of
dataflow.

However, in discussion it seemed that the system CANNOT actually guarantee
that RoIRs arrive AFTER the relevant ROD-data (the system is too
asynchronous). It is possible to insist that RoIRs arriving BEFORE
ROD-data are handled as errors, but this may result in too much data-loss
in the LVL2 system (though the data could still be available for LVL3).
On the other hand, if RoIRs are artificially delayed to reduce such
data-loss, the average latency will be increased. Presumably, these
possibilities have to be investigated and experimented with.

Meanwhile it seems we should re-instate the possibility of RoIRs arriving
BEFORE ROD-data. This is actually not necessarily too much of a problem
(the current buffer-manager software allows for it, for example).

Maybe what we really want to do when we receive an RoIR BEFORE the
relevant ROD-data is to TEST the ROL, i.e. check that dataflow has not
been interrupted.
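
A minimal sketch of that idea, for illustration only: an RoIR for an
event that has not yet been indexed is held as pending and triggers a
link test, rather than being treated as an error. All names here are
hypothetical, and the link test itself is left as an abstract callback.

```python
from typing import Callable, Set


class RobIndex:
    """Hypothetical ROB bookkeeping for RoIRs that precede ROD-data."""

    def __init__(self, test_rol: Callable[[], bool]) -> None:
        self.indexed: Set[int] = set()        # events whose ROD-data arrived
        self.pending_roirs: Set[int] = set()  # RoIRs waiting for ROD-data
        self.test_rol = test_rol              # assumed: True if the ROL is alive

    def on_rod_data(self, event_id: int) -> str:
        """Index new ROD-data and release any RoIR waiting for it."""
        self.indexed.add(event_id)
        if event_id in self.pending_roirs:
            self.pending_roirs.discard(event_id)
            return "serve-pending-roir"
        return "indexed"

    def on_roir(self, event_id: int) -> str:
        """Serve the RoIR if possible; otherwise hold it and test the ROL."""
        if event_id in self.indexed:
            return "serve"
        self.pending_roirs.add(event_id)
        return "wait" if self.test_rol() else "rol-fault"
```

An early RoIR therefore costs one link test: if the ROL answers, the ROB
simply waits for the ROD-data; if not, a dataflow problem is flagged.
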
Such a ROL test might be done, even with cheap
"more-or-less-unidirectional" links, according to John Lane's suggestion,
as follows:

3) Checking the ROL dataflow:
-----------------------------

If we assume that the only affordable ROB->ROD communication is an
XOFF/XON signal, we could adopt the following protocol for testing ROL
dataflow:

    Whenever the ROD receives an XON from the ROB and it has no data to
    send it sends an acknowledge packet instead. (Equivalently: the ROD
    always sends data on receipt of an XON, even if this has to be a
    dummy "event".)

The ROB could use this protocol to test the viability of the ROL whenever
it suspected that dataflow had been interrupted. First, it would only be
able to suspect this if XOFF was not being asserted. If so, the ROB could
then toggle an XOFF/XON combination. It would then expect a reply from the
ROD -- either an event-fragment or an acknowledge packet -- so if it
didn't receive anything it would know there was a problem with the line.

Additionally, with this protocol the ROB would always receive a packet
when XOFF is released after reboot.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Bob Cranfield
/----------------------------------------------------------------------------\
| telephone: +44-(0)171-380-7223 | High Energy Particle Physics Group,      |
| FAX:       +44-(0)171-380-7145 | Department of Physics & Astronomy,       |
| email(TCPIP): rc@hep.ucl.ac.uk | University College London,               |
| email(DECnet): UCLVA::RC       | Gower Street, London, WC1E 6BT           |
\----------------------------------------------------------------------------/
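
For illustration, the XOFF/XON test protocol of section 3 might be
simulated roughly as below. This is a sketch under stated assumptions:
the class names, the `link_up` flag standing in for a broken line, and
the immediate reply in place of a real timeout are all hypothetical and
are not part of the discussion.

```python
from collections import deque
from typing import Optional, Tuple

Packet = Tuple[str, Optional[int]]


class Rod:
    """Hypothetical ROD side: always answers an XON with some packet."""

    def __init__(self) -> None:
        self.queue: deque = deque()  # event-fragments waiting to be sent
        self.link_up = True          # simulated state of the ROL

    def on_xon(self) -> Optional[Packet]:
        if not self.link_up:
            return None              # broken line: nothing reaches the ROB
        if self.queue:
            return ("fragment", self.queue.popleft())
        return ("ack", None)         # no data: send an acknowledge packet


class Rob:
    """Hypothetical ROB side: tests the ROL by toggling XOFF/XON."""

    def __init__(self, rod: Rod) -> None:
        self.rod = rod
        self.xoff_asserted = False

    def test_rol(self) -> Optional[bool]:
        if self.xoff_asserted:
            return None              # silence is expected, so no test is made
        self.xoff_asserted = True    # toggle XOFF ...
        self.xoff_asserted = False   # ... then XON
        reply = self.rod.on_xon()    # a real ROB would wait with a timeout
        return reply is not None     # any packet proves the line is alive
```

In this simulation `test_rol()` returns True whenever the ROD answers with
a fragment or an acknowledge packet, False when nothing comes back, and
None when XOFF is asserted and no conclusion can be drawn.
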