EDGSim: Simulating the European Data Grid


EDGSim is a simulation of the flow of computational jobs, physics data and information around a computational grid based on the software being developed by the European Data Grid project. It was created with the Ptolemy II software, using a "discrete event" framework. This essentially means that any change in state in the simulated system (e.g. a job being generated, a file transfer beginning, an information update) is placed as a time-stamped event in a queue, and the events are then dealt with in chronological order.
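The discrete-event idea can be illustrated with a minimal sketch in Python. This is not EDGSim or Ptolemy II code: the `EventQueue` class and the event descriptions are hypothetical, chosen only to show time-stamped events being handled in chronological order regardless of the order in which they were scheduled.

```python
import heapq
import itertools


class EventQueue:
    """Hypothetical time-ordered queue of (timestamp, description) events."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so events with equal timestamps keep insertion order.
        self._counter = itertools.count()

    def schedule(self, time, description):
        heapq.heappush(self._heap, (time, next(self._counter), description))

    def run(self):
        """Handle events in chronological order; return them as a list."""
        processed = []
        while self._heap:
            time, _, description = heapq.heappop(self._heap)
            processed.append((time, description))
        return processed


queue = EventQueue()
queue.schedule(5.0, "file transfer begins")
queue.schedule(1.0, "job generated")
queue.schedule(3.0, "information update")
order = queue.run()
```

In a real simulator, handling one event would typically schedule further events (for example, a transfer-complete event some time after a transfer-begin event); the sketch omits this for brevity.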

Ptolemy II (PTII) is object-oriented. A PTII application consists of objects known as Actors, governed by a Director object that controls the system parameters and the event queue. The Actors communicate by exchanging Data Tokens, which are wrapper objects that can contain anything from an integer to a complex object in its own right.

In EDGSim each entity in the virtual grid is represented by one of these Actor objects, and any kind of communication, whether it is a message passed by the Information Services or a job being submitted, is represented by a Data Token. The Actors are assembled in an XML-based GUI, which displays their interconnections:

[Figure: Layout of EDGSim in the PTII GUI]

Two of the simplifications in the EDGSim structure can be seen here. The UserInterface object currently generates all of the jobs in a run of the simulation, rather than UI objects being present at the member nodes in the grid. The Backbone is effectively the network, a single object connecting the member nodes of the grid. At present there is no concept of network topology, although large data files do take time to transfer based on file size and bandwidth, and there is a basic representation of network load.
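As a rough illustration of a transfer time that depends on file size, bandwidth and load, the sketch below divides the link bandwidth equally among concurrent transfers. The equal-share rule, the function name `transfer_time` and its parameters are assumptions made for this example; the text does not say how EDGSim actually models network load.

```python
def transfer_time(file_size_mb, bandwidth_mb_per_s, active_transfers=1):
    """Seconds needed to move one file (hypothetical model).

    Assumes the bandwidth is shared equally among all active transfers,
    which is an illustrative choice and not a documented EDGSim rule.
    """
    if bandwidth_mb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    if active_transfers < 1:
        raise ValueError("at least one transfer must be active")
    effective_bandwidth = bandwidth_mb_per_s / active_transfers
    return file_size_mb / effective_bandwidth


idle_link = transfer_time(1000, 10)       # one transfer on the link
busy_link = transfer_time(1000, 10, 4)    # four transfers sharing the link
```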

The objects labelled Site0 to Site9 are in fact composite Actors, representing the nodes in the grid. They contain a representation of the components at that location, such as the example below.

[Figure: A node of the grid in EDGSim]

As can be seen here, this site has a Storage Element, and a Compute Element that manages ten worker machines. The Resource Broker and Replica Catalog are also located here. (This is largely for convenience, as they could be located at separate sites.) These are all connected to a NIC Actor, which manages the connection of these other entities to the outside world, and keeps track of the transfer of data files in and out of the local Storage Element.

In a run of EDGSim, each Storage Element (SE) can be given a quota of data files that it is responsible for hosting permanently. These files are registered with the Replica Catalog, and their logical file names are also passed to the User Interface object. The available resources are all registered with the Resource Broker.

After this initialisation phase, the User Interface begins generating jobs. These will require a certain number of CPU cycles and will occupy a given amount of memory while doing so. They will also require a set of data files, chosen randomly from those available in the SEs across the grid. These jobs are generated for a set period of time, according to a given distribution, and the simulation runs until the last job has been processed.
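The job-generation step might look something like the following sketch. Everything specific here is an assumption for illustration: the `Job` fields, the exponential inter-arrival distribution, and the ranges used for CPU cycles and memory are not taken from EDGSim, which only states that jobs follow "a given distribution".

```python
import random
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record; field names are illustrative."""
    submit_time: float
    cpu_cycles: int
    memory_mb: int
    required_lfns: list


def generate_jobs(lfns, duration, mean_interval, files_per_job, seed=0):
    """Generate jobs until `duration`, with exponential inter-arrival times."""
    rng = random.Random(seed)  # seeded so a run can be repeated
    jobs = []
    now = rng.expovariate(1.0 / mean_interval)
    while now < duration:
        jobs.append(Job(
            submit_time=now,
            cpu_cycles=rng.randint(10**9, 10**10),   # assumed range
            memory_mb=rng.choice([128, 256, 512]),    # assumed sizes
            # Data files are chosen at random, without repetition.
            required_lfns=rng.sample(lfns, files_per_job),
        ))
        now += rng.expovariate(1.0 / mean_interval)
    return jobs


available = [f"lfn{i}" for i in range(20)]
jobs = generate_jobs(available, duration=100.0, mean_interval=5.0,
                     files_per_job=3)
```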

The job is first sent to the Resource Broker (RB), which passes the list of logical file names (LFNs) of the required data files to the Replica Catalog. Each LFN is resolved into its list of corresponding physical file names (PFNs), and this information is sent back to the RB. Two further simplifications are made at this point. Upon receipt of the PFN details, the RB simply chooses the original copies of the data files as the sources for new replicas, rather than any replicas that have already been made. It also does not request any information updates from the Compute Elements (CEs), but instead relies upon the regular push of information from them.
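The LFN-to-PFN lookup and the "always use the original copy" simplification can be sketched as below. The `ReplicaCatalog` class, the `choose_sources` helper and the `site:/path` PFN format are hypothetical; the sketch assumes that the original copy is the first one registered for each LFN.

```python
class ReplicaCatalog:
    """Hypothetical catalog mapping each LFN to its PFNs, original first."""

    def __init__(self):
        self._replicas = {}

    def register(self, lfn, pfn):
        self._replicas.setdefault(lfn, []).append(pfn)

    def unregister(self, lfn, pfn):
        self._replicas[lfn].remove(pfn)

    def resolve(self, lfns):
        """Return a copy of the PFN list for every requested LFN."""
        return {lfn: list(self._replicas.get(lfn, [])) for lfn in lfns}


def choose_sources(resolved):
    """Pick the first-registered (assumed original) PFN for each LFN."""
    return {lfn: pfns[0] for lfn, pfns in resolved.items() if pfns}


catalog = ReplicaCatalog()
catalog.register("run1.dat", "site0:/data/run1.dat")   # original copy
catalog.register("run1.dat", "site3:/cache/run1.dat")  # later replica
catalog.register("run2.dat", "site5:/data/run2.dat")
sources = choose_sources(catalog.resolve(["run1.dat", "run2.dat"]))
```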

As yet there is no representation of the option for the user to specify their preferences for the resource which will run their job. Instead the Resource Broker Actor can be set up with one of a variety of scheduling algorithms, which will be used to assign all of the jobs in that run. When this decision has been made, the job is sent to the chosen CE. At the same time, requests are sent to the SEs hosting the chosen data files to begin transfer to the destination SE.
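The text does not name the scheduling algorithms available to the Resource Broker, so the two strategies below (random choice and shortest queue) are purely illustrative assumptions. They show how a broker could be configured with one interchangeable policy that is then applied to every job in a run, using only the queue lengths most recently pushed by the CEs.

```python
import random


def schedule_random(queue_lengths, rng):
    """Pick any CE uniformly at random (hypothetical policy)."""
    return rng.choice(sorted(queue_lengths))


def schedule_shortest_queue(queue_lengths, rng=None):
    """Pick the CE with the fewest queued jobs (hypothetical policy).

    Ties are broken by CE name so the result is deterministic.
    """
    return min(sorted(queue_lengths), key=lambda name: queue_lengths[name])


# Last pushed information: CE name -> number of queued jobs (made-up values).
pushed_info = {"Site0": 4, "Site1": 1, "Site2": 7}

policy = schedule_shortest_queue          # chosen once per simulation run
chosen = policy(pushed_info, random.Random(0))
```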

The job will be assigned to a free machine at the CE, or queued until one becomes available. As long as at least one of the required data files is present at the local SE, the job can begin to run. Any data files that are not present will be requested by the worker machine, and the SE will attempt to acquire them. Once the job has processed all of its required files, it is complete and is removed from the system. The results of the run, including information about the performance of individual jobs as well as of the system as a whole, are output to a plain text file.
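The assign-or-queue behaviour at a Compute Element can be sketched as follows. The `ComputeElement` class is hypothetical, and the first-in, first-out queue order is an assumption; the text only says that a job waits until a machine becomes available.

```python
from collections import deque


class ComputeElement:
    """Hypothetical CE managing a fixed pool of worker machines."""

    def __init__(self, n_workers):
        self.free_workers = n_workers
        self.queue = deque()   # waiting jobs, oldest first (assumed FIFO)
        self.running = []

    def submit(self, job):
        """Start the job on a free worker, or queue it if none is free."""
        if self.free_workers > 0:
            self.free_workers -= 1
            self.running.append(job)
            return "running"
        self.queue.append(job)
        return "queued"

    def finish(self, job):
        """Remove a completed job and hand its worker to the next in line."""
        self.running.remove(job)
        if self.queue:
            self.running.append(self.queue.popleft())
        else:
            self.free_workers += 1


ce = ComputeElement(n_workers=2)
states = [ce.submit(name) for name in ("job1", "job2", "job3")]
ce.finish("job1")
```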

It should be pointed out that so far no Replica Manager has been implemented to transfer data files independently of specific requests for jobs. If a file is required at a site and is not already present on the local SE, it will be copied over. If the cache does not have the necessary space for the file, the least recently accessed file will be removed to make way for the new replica. The Replica Catalog merely registers the creation and deletion of replicas, and resolves logical file names into physical file names upon request by the Resource Broker.
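The least-recently-accessed replacement policy described above can be sketched with an ordered dictionary. The `StorageCache` class, its method names and the sizes used are hypothetical, and the sketch covers only the replica cache, not the permanently hosted files.

```python
from collections import OrderedDict


class StorageCache:
    """Hypothetical replica cache with least-recently-accessed eviction."""

    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self._files = OrderedDict()   # lfn -> size; least recent first

    def used_mb(self):
        return sum(self._files.values())

    def access(self, lfn):
        """Mark a file as just used; return False if it is not cached."""
        if lfn not in self._files:
            return False
        self._files.move_to_end(lfn)
        return True

    def add(self, lfn, size_mb):
        """Store a new replica, evicting old files if space is short.

        Returns the list of evicted LFNs, oldest first.
        """
        if size_mb > self.capacity_mb:
            raise ValueError("file is larger than the whole cache")
        if self.access(lfn):
            return []                 # already present, nothing to evict
        evicted = []
        while self.used_mb() + size_mb > self.capacity_mb:
            old_lfn, _ = self._files.popitem(last=False)
            evicted.append(old_lfn)
        self._files[lfn] = size_mb
        return evicted


cache = StorageCache(capacity_mb=100)
cache.add("a.dat", 40)
cache.add("b.dat", 40)
cache.access("a.dat")                 # a.dat is now the most recent
evicted = cache.add("c.dat", 40)      # b.dat is the least recent
```

In a fuller model, each eviction and each new replica would also be reported to the Replica Catalog, which the text says registers both creation and deletion.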

The description above is EDGSim as it currently stands. Many of the simplifications in place are intended as place-holders, and will be expanded upon later. The object-based nature of the simulation should make it easier to carry out these upgrades, and also to introduce any changes in the structure of the EDG.



Last modified: Tue Nov 5 14:14:59 GMT 2002