Day 2: 16 April 2003
====================

Pete Clarke, Paul Mealor, Eric Boyd, Matt Zekauskas, Warren Matthews, Mark Leese, Yee-Ting Li

Edit 2/7/2003: Eric's notes here: http://people.internet2.edu/~eboyd/ucl_workshop.html

What piPES want (Eric)
======================

List of things that need to get done (including the piPES internal stuff that they're working on).
----> indicates a point to do.

They have two PMPs and want to run a test between them, focussing on OWAMP, Iperf and Traceroute. There is a "master" script for performing the test. The tests all turn out to be special cases.

They have a "Collecter" which *writes* to the database. The database is laid out for writing, not particularly for reading. There is aggregation at the master too. They have a web interface with CGI scripts to read the database.

They have been having somewhat poor performance on the database. It's MySQL, and it is set up specifically for writing.
----> The database needs to be optimised.

They also looked at the NMWG design document, but their design may not be that close to it.
----> That therefore needs to be improved. The reading design/interface needs to closely match the NMWG document.

They made trade-offs between following NMWG and sensible design. Maybe there is a better way.

They want to put a web service between the database and any clients (clients like the web interface). This is probably already covered by Warren.

Pete: perhaps we should look at OGSA-DAI. IBM have written an implementation of it.
Warren: The current trend in web interfaces, though, is Python and Perl. It is worth looking at though.
Pete: But OGSA *will* be the standard.

Getting further out now
----> Beef up the web interface.

Pete: From the applications viewpoint it would be interesting to be able to do sensible stuff.
Eric: The database they're particularly thinking about is the AMI Observatory* interface - the interface onto the AMI stuff they leave permanently at the observatory.
Obviously this has to be exactly what Rick wants (which will have lots of cool stuff, anyway).
Pete: Clearly it would be difficult for us to do the AMI thing (as we are so distant). So it would be good to be able to write stuff that we can show the real applications. The AMI people would probably then just nick our stuff and use it :)

----> Build a web service to control the "master"
----> Plug more stuff into MonALISA (i.e. getting stuff from the web interface->database)

The Master works on a case-by-case basis for scheduling, that is:
  OWAMPMaster checks that the far side is up - an out-of-band checker to make sure that measurements are valid or whatnot.
  IperfMaster has proper scheduling - checking that the far side has no other measurements running, and suchlike.

OK, so they have a mesh system where all the measurements they want to make (in a full mesh) are just shown. But they don't have any interesting web interface where they say what measurements to make across the meshes. They also have no particular way to do on-demand tests. At the moment there is a master node, to which you log in and modify the mesh list (to do an on-demand test, say). Then rsync is used to spread the config to all the other systems.

----> Do not want any GPL in the code. They want to be able to give the code to *anybody* without any problems. They can't do good technology transfer to companies with the GPL.

Other (boring) stuff to do:
----> Build a decent graphing program (that's not as shit as GnuPlot)

AAA stuff to do:
  Infrastructure
  Designing Roles
  Demo with CITI (Andy Adamson) (uses some GSI stuff and the like)
  Demo with Gridsite
  Demo with Shibboleth (this is being done internally within Internet2)

* Observatory: A place where there are lots of monitoring boxes, plus the ability to plug in other people's monitoring boxes.

Web interface stuff:
--------------------

Pete: so we write a simple user-driven way of checking to see if there are problems, i.e.
you just show the measurements and let the user work it out. But maybe this can advance to more automated ways of doing this. It could be that a more complete analysis engine comes out of it.

AAA stuff
---------

Shibboleth is a plugin to Apache. No idea how it works. There must be a way to plug in any sort of AAA system. There is a former consultant at Internet2, Sushi, who can be hired back. There is also one Sintel.

Some other random stuff
-----------------------

This is mostly next generation stuff:

Multiple Databases - database + admin domain
Testing/Analysis engine
Culprit database
Netflow and SNMP - bringing these in sooner would be good. There are people interested in doing them.
PMP/PMC discovery
  +- this will hit us later
  +- PMC registration and deregistration too
  `- We have two endpoints and do a traceroute. We want to be able to discover the PMPs that are on that route (and an end user wants to be able to find the PMCs on a route they are interested in). Nicolas was interested in this.

Of course, grid services are basically solutions to the discovery problem.

Physical locations of meetings and the like
-------------------------------------------

Three levels of tech discussion:
1) Presentation and critique level
2) Present informally to very interested users - the TAK (TAC? TAQ?)
3) Very detailed interaction (like discussions of code). All either in person, on IRC, in email, or possibly a phone call.

The right forum for us to interact is an email list, plus a monthly call.

Also, there are three upcoming conferences: GGF, Joint Techs (Kansas)

So we can plan on getting something started, then actually go and visit and get coding done.

Warren Matthews
===============

His interpretation of the NMWG doc: He has a series of separate sets of types, all of which can be combined into YTL's set.
The idea is that a resource broker would be asked for information, and would itself go away to find the *small* fragments of info which would make up the result. YTL's idea is to have the full hierarchy to make it work.

Afternoon discussion

Questions to answer
===================

Peter should be back at 2ish.

So, four questions to debate:
1) Should the PMC be broken into two components: one which is domain-level, and one which has a 1-to-1 mapping with a PMP (which could be >1 machine)?
2) Should the output of the database go back to the PMC interface, or should it go back to the client by some other route?
3) How do we get the sink and the source to both agree to do a test? Also, multicast is a problem.
4) Distributed or centralised scheduling?

4)
==

Assumption: the only limiting factor is the NIC of the network monitors.

Design possibility: no centralised scheduler. If a source wants to make a measurement to the sink, then it talks to the domain controller, who says "no", or "yes, here is an IP". Then the source PMP asks if the sink can handle a particular tool, and it says "no" or "yes". Then the source asks if the sink can do the measurement now, and it says "no, try again in n" or "yes, and I'm ready for you".

So we have two interfaces:
  the domain admin - measurement request interface
  the domain admin - can I have a PMC ip name

1) PMP v PMP/PMC
================

So we can factorise out: the PMC has all the stuff on, including scheduling. The PMP has the tools, plus a wrapper to extract and store results, and the ability to respond to "Do it" commands from its PMC and only its PMC.

2)
==

So the domain controller, when asked for a *result*, says one of:
  * No, sod off
  * Yep, it's done
  * Be patient, I'm working on it (and this repeats for a while until a "yes" or "no" appears)

So then the database interface is separate, and should be queried separately.
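
A rough sketch of the three-way answer described under 2), and of a client repeating the request until it gets a definite "yes" or "no". Everything named here (ResultStatus, ResultReply, wait_for_result, the "result-id-42" pointer) is made up for illustration; nothing like it exists in piPES yet, and a real client would sleep between polls.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class ResultStatus(Enum):
    REJECTED = "rejected"   # "No, sod off"
    READY = "ready"         # "Yep, it's done"
    PENDING = "pending"     # "Be patient, I'm working on it"


@dataclass
class ResultReply:
    status: ResultStatus
    detail: Optional[str] = None   # reason for a rejection, or a pointer to the result


def wait_for_result(ask: Callable[[], ResultReply], max_polls: int = 10) -> ResultReply:
    """Repeat the request until the domain controller gives a definite answer."""
    for _ in range(max_polls):
        reply = ask()
        if reply.status is not ResultStatus.PENDING:
            return reply
    # Assumption: a client that runs out of patience treats it as a "no".
    return ResultReply(ResultStatus.REJECTED, "gave up waiting")


# Hypothetical stand-in for a domain controller that says "be patient" twice.
answers = iter([
    ResultReply(ResultStatus.PENDING),
    ResultReply(ResultStatus.PENDING),
    ResultReply(ResultStatus.READY, "result-id-42"),
])
final = wait_for_result(lambda: next(answers))
print(final.status.name, final.detail)
```

The reply only carries a pointer; fetching the actual numbers would be a separate query to the database interface, as noted above.
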
3)
==

We pass the certificates of all the people who made the request, or who agreed to make the request on their behalf, to the remote domain controller &c.

Final diagram, extracted from the above
=======================================

Text for the Visio diagram: Revised Protocol Diagram. These are the bits of text corresponding to the labels in the diagram.

A) Here is who I am, I would like this result to exist (includes the IP addresses of the routers, the characteristic, and some sort of newness)
B) One of
   - Rejected (here's why)
   - Yes, the result exists, with a pointer to the results
   - Be patient, I plan to get back to you later (so I will die, or I will respond with one of the above)
C) I would like to contact the PMC, here is who I am, this is who I am asking on behalf of, and this is the tool I want to run.
D) One of
   - No (with info)
   - Yes, here's the IP of the sink PMC, and a permission token
E) Here is who I am, call this sink PMP, with this token.
F) Here is my capability (the token). Can we start tool X with certain parameters now?
G) Here's who I (the sink PMC) am, start receptor from this source IP.
H) One of
   - OK
   - Rejected
I) One of
   - No (rejected capability)
   - No (failed to initiate receptor)
   - Willing and able, here is the IP of the sink PMP
   - Ask again in +x time
J) Here's who I (the source PMC) am, here is the sink PMP IP address, do this test with these arguments
K) The test!!!!!
L) Here's who I (the source PMP) am, I have these results from this tool (with some sort of implicit success or failure)
M) Result report (the ID of the source PMP and either result ready or result failed (test failed or write failed))
N) Result report (the ID of the source PMC and either result ready or result failed (test failed or write failed))
O) Give me these results, and here is who I am
P) One of
   - Rejected
   - Here are the results

These are the other jobs we need to do
======================================

1) Database modifications including optimising it, adding a reading set of tables.
   (the database is optimised for L and not for O) - UCL
2) Database performance is bad. - Internet2
3) There is no culprit database or interface. - TBD
4) There is no testing and analysis. - Internet2
5) Making sure netflow fits in this scheme. - Internet2
6) Making sure SNMP fits in this scheme. - Internet2
7) Human-based testing & analysis engine. - Internet2
8) Backend of database gatekeeper (multiple physical databases) - Internet2
9) Domain interface discovery process - UCL

Pete: how does this new stuff fit with stuff which is already done?

The sink/source/PMP/PMC and DB gatekeeper all exist, except not as well split as this. That is, they all work, but need work.
Nothing about the domain interface exists.
The database layout needs to be reengineered.
Testing & analysis doesn't exist, and nor does a human-based testing engine.
We don't know if this all works for SNMP or netflow, so that has to be checked.
The DB gatekeeper needs to be expanded to handle multiple physical databases.
Also needs a domain interface discovery process: how do we find out which domain interfaces can answer these questions once we have the router names?

Action:
=======

Paul is responsible for seeing if there is a tool to do O and P. If so, start using it; if not, do something else.

So, A-D, O, P go to Yee, Warren and me. E-N goes to the US piPES lot.

OK, plus all the jobs just above.
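
Footnote to the protocol text above: a rough sketch of how a sink PMC might answer message F with one of the four I) replies. This is only an illustration under assumptions. SinkPMC, SinkReply, handle_f, the 30-second retry and the rule that an unsupported tool counts as "failed to initiate receptor" are all invented here, and the G/H exchange with the sink PMP is skipped.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class Answer(Enum):
    REJECTED_CAPABILITY = "No (rejected capability)"
    NO_RECEPTOR = "No (failed to initiate receptor)"
    READY = "Willing and able"
    ASK_AGAIN = "Ask again in +x time"


@dataclass
class SinkReply:
    answer: Answer
    sink_pmp_ip: Optional[str] = None   # only set for READY
    retry_in: Optional[int] = None      # seconds, only set for ASK_AGAIN


@dataclass
class SinkPMC:
    sink_pmp_ip: str
    valid_tokens: Set[str]   # tokens handed out by the domain controller in D
    tools: Set[str]          # tools the sink PMP can run
    busy: bool = False       # True while another measurement holds the NIC

    def handle_f(self, token: str, tool: str) -> SinkReply:
        """Answer 'here is my token, can we start tool X now?'."""
        if token not in self.valid_tokens:
            return SinkReply(Answer.REJECTED_CAPABILITY)
        if tool not in self.tools:
            # Assumption: no such tool means the receptor cannot be started.
            return SinkReply(Answer.NO_RECEPTOR)
        if self.busy:
            return SinkReply(Answer.ASK_AGAIN, retry_in=30)  # made-up delay
        self.busy = True
        return SinkReply(Answer.READY, sink_pmp_ip=self.sink_pmp_ip)


sink = SinkPMC("192.0.2.10", valid_tokens={"tok-1"}, tools={"iperf"})
first = sink.handle_f("tok-1", "iperf")    # sink is free
second = sink.handle_f("tok-1", "iperf")   # sink is now busy with the first test
forged = sink.handle_f("nope", "iperf")    # token the sink has never seen
print(first.answer.name, second.answer.name, forged.answer.name)
```

Keeping the busy flag on the sink side is what lets this work without a centralised scheduler: the source just asks again after the suggested delay.
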