Note:
This document provides a very brief overview of XML and details a proposed XML format for Aida 1D and 2D histograms, which when properly implemented would allow Aida histogram data to be easily transferred between applications.
This proposal was put forth by Tony Johnson and Paul Spence of the Stanford Linear Accelerator Center. Please send any comments to project-AIDA-dev@cern.ch.
The Extensible Markup Language (XML) is a document processing standard published by the World Wide Web Consortium (W3C) that allows you to define your own document markup. An XML-compliant application that validates XML content needs to process two files: the XML document and the Document Type Definition (DTD).
An XML document file contains the document data, which is tagged with meaningful XML elements, some of which may contain attributes. An element with one attribute is marked with the form <ElementNameTag attribute1 = "attributevalue" > Element Body </ElementNameTag>.
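As a minimal illustration of this element form, the following Python sketch parses a one-element document with the standard library's xml.etree.ElementTree module. The element and attribute names are the placeholders from the sentence above; they are not part of the proposed Aida format.

```python
import xml.etree.ElementTree as ET

# Placeholder names copied from the sentence above; not part of the Aida DTD.
doc = '<ElementNameTag attribute1="attributevalue"> Element Body </ElementNameTag>'

root = ET.fromstring(doc)
tag = root.tag                   # the element name
value = root.get("attribute1")   # the attribute value
body = root.text.strip()         # the text between the opening and closing tags
```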
The DTD file specifies the rules for how the XML document's elements, attributes, and other data are defined and logically related in an XML-compliant document. The DTD declares each element, its parent/child relationships to other elements, and the element's attributes. Comments in a DTD are written as <!-- This is a DTD comment -->.
The text boxes below contain example 1D and 2D histogram XML documents that meet the proposed DTD specification, together with the proposed DTD itself. It is recommended that you look through these files to see the proposed XML data structure; the following list details some noteworthy points about the structure:
a single Aida XML document can store any combination of 1 or more 1D and 2D histograms (i.e. the aida element can have 1 or more histogram1d and/or histogram2d child elements).
the structure optionally supports the storing of any number of statistic elements for each histogram in the document. Each statistic element has the statistic's name and value as attributes.
every histogram must have a child element named bincontents. In between the bincontents opening and closing tags should be a comma separated list of strings that describe each number that is supplied for the bin data. The acceptable string values are bin, binx, biny, binz, height, error, pluserror, minuserror, entries, ignore, x, y, z. The string values x, y, z, bin, binx, biny, binz are used to specify which bin the data belongs to. All other strings relate to the actual bin data. The parsing of the data between the opening and closing tags is left to the file reader.
Each bin's data is not stored as a separate XML element, since this would waste too many resources. Instead, we propose adopting a bin data format and letting the XML parsers handle the trivial task of parsing the data. The data format is constrained by a bincontents element and the actual data is listed in a data element. The format rules are outlined below.
Format Rules:
1) Each bin's data is ALWAYS on its own line.
2) The bin data is a row of numbers with comma delimiters. It is left up to the user to make sure that the order of the numbers matches the order specified by the bincontents element. For example, if the bincontents element lists bin, height, pluserror, minuserror, entries between its opening and closing tags, then each row of bin data should look like ####, ####, ####, ####, ####.
3) The data SHALL NOT be tabbed in to make it look nicely formatted. This, although easy to do, would be a very inefficient use of storage. So it is NOT the responsibility of implementations to strip leading whitespace.
4) The bin data for 1d histograms should be ordered with the axis' lowest edge bin data entered first and the axis' upper edge bin entered last.
5) The bin data for 2d histograms should be ordered (as shown in the diagram below) by starting from the x axis' lowest edge bin and proceeding from the y axis' lower edge bin to the y axis' upper edge bin, then moving to the next lowest bin on the x axis and proceeding back up the y axis.
  |
  |  ^  ^  ^  ^
  |  ^  ^  ^  ^
y |  ^  ^  ^  ^
  |  ^  ^  ^  ^
  |  ^  ^  ^  ^
  |__^__^__^__^__
start:      x
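The format rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, values are read as floating point numbers, columns labelled ignore are dropped, blank lines are skipped, and the sample numbers in the usage lines are invented for the example.

```python
# Column names accepted between the bincontents tags (from the list above).
ALLOWED = {"bin", "binx", "biny", "binz", "height", "error", "pluserror",
           "minuserror", "entries", "ignore", "x", "y", "z"}

def parse_bincontents(text):
    """Split the comma separated bincontents text into column names."""
    names = [name.strip() for name in text.split(",")]
    unknown = [name for name in names if name not in ALLOWED]
    if unknown:
        raise ValueError("unknown bincontents entries: %s" % unknown)
    return names

def parse_bin_rows(names, data_text):
    """Read one bin per line, matching each number to its column name."""
    rows = []
    for line in data_text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines around the data are skipped
        values = [float(v) for v in line.split(",")]
        if len(values) != len(names):
            raise ValueError("row does not match bincontents: %r" % line)
        # Assumption: columns labelled "ignore" are not kept.
        rows.append({n: v for n, v in zip(names, values) if n != "ignore"})
    return rows

def order_2d(nx, ny):
    """Bin order for rule 5: walk up the y axis for each x bin in turn."""
    return [(ix, iy) for ix in range(nx) for iy in range(ny)]

# Usage with invented sample values.
names = parse_bincontents("bin, height, pluserror, minuserror, entries")
rows = parse_bin_rows(names, "0, 12.0, 3.5, 3.4, 12\n1, 7.0, 2.6, 2.6, 7")
order = order_2d(2, 3)
```

Here order_2d(2, 3) lists both x bins in turn, each with its three y bins from lowest to highest, which is the column-by-column walk drawn in the diagram.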
the structure supports both fixed bin widths and variable bin widths (i.e. the axis element may or may not have a variableWidthBins element as a child). If variable bin widths are applied then a format similar to the bin data is used to save resources. The format has each bin edge value on a separate row between the opening and closing tags of the variableWidthBins element.
the structure optionally maintains the out of range bin data (i.e. the number of overflow/underflow entries per out of range bin) for both 1D and 2D histograms. For 1D histograms this is done with an outOfRangeData1d element that has two attributes, underflow and overflow. 2D histograms use outOfRangeData2d elements. In between the outOfRangeData2d element's opening and closing tags should be a list of numbers, one per line, which specify the number of entries in the out of range bins for 2D histograms. The 2d out of range bin data should be entered in a counterclockwise direction starting from the bottom left corner, as shown in the example diagram below:
          <<<<<
  |     xxxxxx   ^      0 = in range bin data
  |     x0000x   ^      x = out of range bin data
  e     x0000x   ^
  n     x0000x   ^
  d     xxxxxx   ^
start:  >>>>>
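The last two list items can be sketched in Python as well. Both function names are hypothetical. The treatment of the corner bins (each corner belongs to the side that reaches it first) is an assumption read off the diagram, not something the proposal states, and the bin edge values in the usage line are invented.

```python
def parse_bin_edges(text):
    """Read variableWidthBins content: one bin edge value per line."""
    return [float(line) for line in text.splitlines() if line.strip()]

def out_of_range_order(nx, ny):
    """Counterclockwise walk around the frame of out of range bins.

    Frame coordinates run from 0 to nx + 1 in x and 0 to ny + 1 in y,
    so the in range bins occupy 1..nx and 1..ny.
    """
    right, top = nx + 1, ny + 1
    cells = [(x, 0) for x in range(0, right + 1)]           # bottom, left to right
    cells += [(right, y) for y in range(1, top + 1)]        # right side, upward
    cells += [(x, top) for x in range(right - 1, -1, -1)]   # top, right to left
    cells += [(0, y) for y in range(top - 1, 0, -1)]        # left side, downward
    return cells

# Usage with invented values.
edges = parse_bin_edges("0.0\n1.5\n4.0\n")
frame = out_of_range_order(4, 3)
```

For a histogram with nx by ny in range bins this walk visits 2*nx + 2*ny + 4 out of range bins, each exactly once.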
You may get a copy of the sample documents and DTD by following these links and viewing the source: 1D Histogram XML Document , 2D Histogram XML Document , Proposed DTD.
This is a small example of an Aida 1D histogram, with fixed bin widths, stored as an XML document.
<?xml version="1.0" encoding="ISO-8859-1" ?>
This is a small example of an Aida 2D histogram stored as an XML document, with both axes set as variable width bins, and the bincontents listed as bin, height, error, entries.
<?xml version="1.0" encoding="ISO-8859-1" ?>
This is the DTD file which specifies the rules used to make the preceding XML documents. The DTD details the precise proposed data structure for Aida 1D and 2D histograms.
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!-- The bincontents element is used to describe the bin data supplied for the histogram. It has no children and a single attribute, 'order', which can have values "xy" "yz". This attribute is only necessary for 2d histograms as it specifies the order that the bin data is listed in. In between the element's opening and closing tags should be a comma separated list of strings that describe each number that is supplied for the bin data. The acceptable string values are bin, binx, biny, binz, height, error, pluserror, minuserror, entries, ignore, x, y, z. The string values x, y, z, bin, binx, biny, binz are used to specify which bin the data belongs to. All other strings relate to the actual bin data. The parsing of the data between the opening and closing tags is left to the file reader. -->
<!ELEMENT bincontents (#PCDATA)>
1. It would waste a ton of space. Although
XML is not terse, this would be excessive even by XML standards. Visualize
this repeated 2000 or