- Index of parts:
- What is a DTD?
- What is a schema?
- Why use a DTD or Schema?
- How do they go together?
- Where is there further information?
Conventions:
Some text is highlighted with colour; this is what the colours mean:
- foo
- This is the name of an element or a data type that has been chosen
by me. It's an example.
- wiz
- This indicates that "wiz" is an XML-Schema defined
name for a "thing" (an element, in XML terms)
- bang
- This indicates that "bang" is an attribute to some
XML-schema element
- boff
- This is a piece of example XML-schema code.
The purpose of a Document Type Definition (DTD) is to define the legal
building blocks of any SGML-based (SGML = Standard
Generalized Markup Language) document. It defines the
document structure with a list of legal elements.
DTDs have been used since the 1970s.
A schema (plural: schemata) is "a
diagrammatic representation; an outline or model."
Something that formally describes the abstract structure of a set
of data can therefore be called a schema.
An XML-schema is a document that describes the valid format of an
XML data-set. This definition includes what elements are (and are
not) allowed at any point; what the attributes for any element
may be; the number of occurrences of elements; etc.
Note: XML-Schema documents are not known for their brevity. An XML-Schema
document for a reasonably-sized XML instance-document
will be fairly large. Disk space is cheap and
bandwidth is not a huge bottleneck, so there is no
need to worry about the size. It does mean that you will
do a lot of typing, though.
The majority of XML documents are "well formed" rather than
"valid". The former means that there is exactly one
root element, that every sub-element
(recursively) has delimiting start- and
end-tags, and that the elements are properly nested within each
other. A valid document, on the other hand, is
well-formed and also conforms to a specified set of
production rules.
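To make the well-formedness rules concrete, here is a small sketch. The element names are borrowed from the examples later in this document; the broken variant is shown only inside a comment so that the block itself stays well-formed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Well-formed: exactly one root element, every start-tag has an
     end-tag, and the elements nest without overlapping. -->
<vehicles>
  <vehicle>
    <nickname>Count Zero</nickname>
  </vehicle>
</vehicles>
<!-- NOT well-formed (the tags overlap instead of nesting):
     <vehicle><nickname>Count Zero</vehicle></nickname>
-->
```

Nothing in this check says whether "vehicle" or "nickname" are allowed names; that is the job of validation.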
To validate an XML document, some set of validation rules needs to
be provided. This can be done with a Document Type
Declaration, which ties the document to a DTD.
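As an illustration, here is a minimal sketch of a Document Type Declaration that carries its DTD rules inline (an internal subset). The two element rules are my assumption of what a DTD for a simple list of vehicle nicknames might look like; they are not taken from any published DTD.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The DOCTYPE names the root element and holds the rules. -->
<!DOCTYPE vehicles [
  <!-- vehicles must contain one or more nickname elements -->
  <!ELEMENT vehicles (nickname+)>
  <!-- nickname contains character data only -->
  <!ELEMENT nickname (#PCDATA)>
]>
<vehicles>
  <nickname>Bog Hopper</nickname>
  <nickname>Wee Beastie</nickname>
</vehicles>
```

Note that the DTD rules are not themselves written in XML syntax, whereas a schema is an XML document.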
An XML-Schema sounds very much like a DTD; however, there are
some critical differences, the most notable being that
XML-Schema can deal with namespaces, and DTDs can't (see
the sidebar at
http://www-106.ibm.com/developerworks/xml/library/xml-schema/#sidebar1
for some of the limitations of a DTD).
As the main reason for using a schema instead of a DTD is the
ability to mix namespaces, it must be mentioned that XML-schema
are very dependent on namespaces - so we need to go over them
first.
- Question: What is a namespace?
- Answer: From the W3C web site
- We envision applications of Extensible Markup
Language (XML) where a single XML document may contain
elements and attributes (here referred to as a "markup
vocabulary") that are defined for and used by multiple
software modules. One motivation for this is modularity; if
such a markup vocabulary exists which is well-understood and
for which there is useful software available, it is better to
re-use this markup rather than re-invent it.
- Such documents, containing multiple markup
vocabularies, pose problems of recognition and
collision. Software modules need to be able to recognize the
tags and attributes which they are designed to process, even
in the face of "collisions" occurring when markup intended for
some other software package uses the same element type or
attribute name.
- These considerations require that document
constructs should have universal names, whose scope extends
beyond their containing document. This specification describes
a mechanism, XML namespaces, which accomplishes this.
- [Definition:] An XML namespace is a collection of
names, identified by a URI reference [RFC2396], which are used
in XML documents as element types and attribute names. XML
namespaces differ from the "namespaces" conventionally used in
computing disciplines in that the XML version has internal
structure and is not, mathematically speaking, a set. These
issues are discussed in "A. The Internal Structure of XML
Namespaces".
What this means, basically, is that the validating rules for some
elements are defined in one place, and some others in another.
- For example, HTML (and XHTML) is defined in one single place [by the
W3C people]. This can be defined with a DTD.
- The RDF (Resource Description Framework), on the
other hand, is specifically designed to be a framework for various
parties to share data using a common set of XML elements. In the
bibliographic world, there is another framework (called the Dublin
Core) which is often used in conjunction with RDF. This is far
more complex, with multiple markup vocabularies, so it
requires namespaces, which in turn require a schema.
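A small sketch of what mixing two vocabularies looks like in practice. The two namespace URIs are the standard ones for RDF and Dublin Core; the resource URL and the text values are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Two vocabularies in one document, kept apart by their prefixes:
     rdf: is bound to the RDF namespace, dc: to the Dublin Core one. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- example.org and the values below are hypothetical -->
  <rdf:Description rdf:about="http://example.org/landrover-notes">
    <dc:title>Notes on a Series I Land Rover</dc:title>
    <dc:creator>A. N. Author</dc:creator>
  </rdf:Description>
</rdf:RDF>
```

Software that only understands Dublin Core can pick out the dc: elements by their namespace and ignore the rest.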
As this is such a fundamental part of schema, I will first cover
defining what schema an XML document should use, and the various
options that can be specified.
First off, here is an XML document that makes no reference to a
schema. It is well-formed, but not valid.
A basic XML document. File:
basic.xml
<?xml version = "1.0" encoding = "UTF-8"?>
<vehicles>
  <nickname>Bog Hopper</nickname>
  <nickname>Wee Beastie</nickname>
  <nickname>Count Zero</nickname>
</vehicles>
|
To provide validation, we need two things:
- A schema.
- A reference in the document to the schema-definition file.
A simple XML document, with a
schema. File:simple.xml
<?xml version = "1.0" encoding = "UTF-8"?>
<vehicles
  xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation = "http://lucas.ucs.ed.ac.uk/xml-schema/xmlns/simple.xsd"
>
  <nickname>Bog Hopper</nickname>
  <nickname>Wee Beastie</nickname>
  <nickname>Count Zero</nickname>
</vehicles>
|
The schema. File:
simple.xsd
<?xml version = "1.0" encoding = "UTF-8"?>
<xsd:schema
  xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
>
  <xsd:element name = "vehicles">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name = "nickname"
                     type = "xsd:string"
                     maxOccurs = "unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
|
There are some important things to explain at this point:
- Namespace declaration in the XML file (an Instance Document):
- The line xmlns:xsi =
"http://www.w3.org/2001/XMLSchema-instance"
indicates that we want to use names defined in the
http://www.w3.org/2001/XMLSchema-instance namespace (here, the
xsi:noNamespaceSchemaLocation attribute).
Schema processors have this namespace built in, so it is always
picked up.
- The line
xsi:noNamespaceSchemaLocation =
"http://lucas.ucs.ed.ac.uk/xml-schema/xmlns/simple.xsd"
indicates that we are using the schema defined at the location
http://lucas.ucs.ed.ac.uk/xml-schema/xmlns/simple.xsd,
but we do not want to associate any namespace with the
definitions. Without it, the document has no validating
schema.
- Schema file definitions:
- The line
xmlns:xsd =
"http://www.w3.org/2001/XMLSchema" indicates that all
XML-Schema elements are to be prefixed with an xsd: tag, hence
the opening schema element is
<xsd:schema.... Again,
this is a namespace that is hard-wired, and will always be
picked up.
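The prefix itself is only a local label; what matters is the namespace URI it is bound to. As a sketch, simple.xsd could equally be written with a different prefix (the choice of xs here is arbitrary, and is not used in the original files):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Same schema as simple.xsd, but the XML-Schema namespace is bound
     to the prefix xs instead of xsd. The meaning is unchanged. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="vehicles">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="nickname"
                    type="xs:string"
                    maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The prefix also has to be used inside attribute values that name built-in types, which is why type becomes xs:string.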
Ah, the meat of the document!
An XML-schema document is, itself, an XML document, which takes care
of the well-formedness of the element structure.
To review how the schema defines what is valid (and what is not),
let's work backwards from an XML instance document:
a sample instance document: file
landrover.xml
<?xml version = "1.0" encoding = "UTF-8"?>
<vehicles
  xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation = "http://lucas.ucs.ed.ac.uk/xml-schema/xmlns/landrover.xsd">
  <vehicle>
    <nickname>Count Zero</nickname>
    <model>Series I, 80"</model>
    <construction>
      <start>
        <dom>21</dom>
        <month>July</month>
        <year>1949</year>
      </start>
      <end>
        <dom>9</dom>
        <month>August</month>
        <year>1949</year>
      </end>
    </construction>
    <mods>
      <mod>Change Engine</mod>
      <mod>Change pedals</mod>
      <mod>Change gearbox</mod>
      <mod>Fit Rollcage</mod>
    </mods>
  </vehicle>
</vehicles>
|
This is a relatively simple document, and a map of how it goes
together will be something like this:

In this map, a (+) in front of an element indicates that
one-or-more instances of the element may occur. The square
bracketing of the sub-elements indicates that all the ones between
the top and bottom element should also be present.
Having got a plan of what the schema should be, here it is:
The Landrover schema: file
landrover.xsd
<?xml version = "1.0" encoding = "UTF-8"?>
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema">
  <xsd:element name = "vehicles">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref = "vehicle" maxOccurs = "unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name = "vehicle">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name = "nickname" type = "xsd:string" maxOccurs = "unbounded"/>
        <xsd:element name = "model" type = "xsd:string"/>
        <xsd:element name = "construction">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element ref = "start"/>
              <xsd:element ref = "end"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name = "mods">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name = "mod" type = "xsd:string" maxOccurs = "unbounded"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name = "start">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref = "dom"/>
        <xsd:element ref = "month"/>
        <xsd:element ref = "year"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name = "end">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref = "dom"/>
        <xsd:element ref = "month"/>
        <xsd:element ref = "year"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name = "dom" type = "xsd:string"/>
  <xsd:element name = "month" type = "xsd:string"/>
  <xsd:element name = "year" type = "xsd:string"/>
</xsd:schema>
|
So, what are the important points raised in this example?
- Elements must have a name and a type.
- Elements can contain simple, predefined data-types:

- Elements can be defined to occur more than once:

- Elements can reference some other element definition rather than
contain their own name and type

Note: The element referred to must be "visible"
to the referring element, i.e. it must be declared globally (as a direct
child of the schema element); it cannot be declared "down"
another branch of the XML tree.
- Elements can have complex types (defined directly
within the element definition)
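The points above can be seen side by side in one cut-down schema. This is only a sketch that reuses names from landrover.xsd; it is not a replacement for that file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- A name and a simple, predefined type. Declared at the top
       level (globally), so other elements can refer to it. -->
  <xsd:element name="model" type="xsd:string"/>

  <!-- A complex type defined directly within the element. -->
  <xsd:element name="vehicle">
    <xsd:complexType>
      <xsd:sequence>
        <!-- May occur more than once. -->
        <xsd:element name="nickname" type="xsd:string"
                     maxOccurs="unbounded"/>
        <!-- A reference to the global declaration above, instead
             of a name and type of its own. -->
        <xsd:element ref="model"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```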

In addition to having reference elements and locally defined
complexTypes,
a complexType can be defined as an entity in its
own right. (This becomes more important later on, when we look at
making a new type based on some other pre-existing type.)
Here is the same schema, but using a global complexType:
An alternative Landrover schema: file
landrover2.xsd
<?xml version = "1.0" encoding = "UTF-8"?>
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema">
  <xsd:element name = "vehicles">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref = "vehicle" maxOccurs = "unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name = "vehicle">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name = "nickname" type = "xsd:string" maxOccurs = "unbounded"/>
        <xsd:element name = "model" type = "xsd:string"/>
        <xsd:element name = "construction">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element ref = "start"/>
              <xsd:element ref = "end"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name = "mods">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name = "mod" type = "xsd:string" maxOccurs = "unbounded"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name = "start" type = "myBuildDate"/>
  <xsd:element name = "end" type = "myBuildDate"/>
  <xsd:element name = "dom" type = "xsd:string"/>
  <xsd:element name = "month" type = "xsd:string"/>
  <xsd:element name = "year" type = "xsd:string"/>
  <xsd:complexType name = "myBuildDate">
    <xsd:sequence>
      <xsd:element ref = "dom"/>
      <xsd:element ref = "month"/>
      <xsd:element ref = "year"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
|
We've seen that you can define your own types, as in
the line <xsd:element
name = "start" type =
"myBuildDate"/>. There are, in fact, two
kinds of user-definable type:
ComplexType and
SimpleType.
- SimpleType
- Simple types are elements that contain data.
- They may not contain attributes or sub-elements
- New simple types are defined by deriving them from existing simple
types (built-ins and derived).
- simpleType definitions are used when a new data type needs to be
defined, where this new type is a modification of
some other existing simple type.
- See the fuller explanation below for
further details.
- ComplexType
- Complex types are elements that allow
sub-elements and/or
attributes.
- Complex types are defined by listing the elements and/or
attributes nested within them.
- See the fuller explanation below for
further details.
simpleType is used to create a new
datatype, one which is based on an existing simple-type. For example, we
could be more definitive in what we mean by
dom (DayOftheMonth):
An integer-only DayOftheMonth
element
<xsd:element name="dom" type="xsd:int" />
|
DayOftheMonth, as an Integer derivative
<xsd:element name="dom" type="mySimpleDayOfMonth" />
<xsd:simpleType name="mySimpleDayOfMonth" >
  <xsd:restriction base="xsd:positiveInteger" >
    <!-- positiveInteger defines the minimum to be 1 -->
    <xsd:maxInclusive value="31" />
  </xsd:restriction >
</xsd:simpleType >
|
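A range is not the only kind of restriction. As a further sketch in the same spirit, month could be limited to a fixed list of names with the standard xsd:enumeration facet. The type name mySimpleMonth is made up for this example, and full English month names are an assumption based on the values in landrover.xml.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="month" type="mySimpleMonth"/>
  <!-- mySimpleMonth is a hypothetical name: a string that may only
       take one of the listed values. -->
  <xsd:simpleType name="mySimpleMonth">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="January"/>
      <xsd:enumeration value="February"/>
      <xsd:enumeration value="March"/>
      <xsd:enumeration value="April"/>
      <xsd:enumeration value="May"/>
      <xsd:enumeration value="June"/>
      <xsd:enumeration value="July"/>
      <xsd:enumeration value="August"/>
      <xsd:enumeration value="September"/>
      <xsd:enumeration value="October"/>
      <xsd:enumeration value="November"/>
      <xsd:enumeration value="December"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

With this in place, <month>July</month> would be accepted and <month>Jul</month> would not.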
complexType is used to define a
complex type. A globally defined complexType requires an attribute called
name, which is used to refer to
the complexType definition; a complexType defined directly inside an
element declaration is anonymous and has no name. The element then
contains the list of sub-elements.
There are several examples of
complexType definition in the
main example, so I won't repeat them.
- This is, however, the time to mention what the content of
a complexType is:
- There may be an
annotation
- This must be followed by one of the following:
- simpleContent
- complexContent
- In sequence, the following:
- zero or one from the following grouping terms:
- group
- all
- choice
- sequence
- followed by any number of either
- attribute
- attributeGroup
- then zero or one anyAttribute
The simple explanations:
- simpleContent is analogous to the
simpleType element - it's when you want
to modify some other "simple" data type, restricting
or extending it in some particular way.
- complexContent is the equivalent for
complex types - it's when you want to derive a new complex type
by restricting or extending an existing complexType.
- The collections:
- group
- a collection of
elements. A group is usually used to declare a common
group of elements that are referenced from more than one
place in the schema. (The
myBuildDate
complexType could have been
done this way).
- sequence
- all the named elements must appear in the sequence listed.
- choice
- one, and only one, of the elements listed must appear.
- all
- all the named elements must appear, however they may be in any order.
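Here is a sketch of the collections in use. The group shows how the build date could have been done as a group rather than as the myBuildDate complexType; the names myBuildDateGroup, power, petrol, diesel, dimensions, length and width are all invented for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="dom" type="xsd:string"/>
  <xsd:element name="month" type="xsd:string"/>
  <xsd:element name="year" type="xsd:string"/>

  <!-- group: a named collection, declared once and referenced
       from wherever it is needed. -->
  <xsd:group name="myBuildDateGroup">
    <!-- sequence: dom, month, year, in exactly this order. -->
    <xsd:sequence>
      <xsd:element ref="dom"/>
      <xsd:element ref="month"/>
      <xsd:element ref="year"/>
    </xsd:sequence>
  </xsd:group>

  <xsd:element name="start">
    <xsd:complexType>
      <xsd:group ref="myBuildDateGroup"/>
    </xsd:complexType>
  </xsd:element>

  <!-- choice: either petrol or diesel, never both. -->
  <xsd:element name="power">
    <xsd:complexType>
      <xsd:choice>
        <xsd:element name="petrol" type="xsd:string"/>
        <xsd:element name="diesel" type="xsd:string"/>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>

  <!-- all: both length and width, in either order. -->
  <xsd:element name="dimensions">
    <xsd:complexType>
      <xsd:all>
        <xsd:element name="length" type="xsd:string"/>
        <xsd:element name="width" type="xsd:string"/>
      </xsd:all>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```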
Here is an example of a
complexType using
simpleContent. We will
modify the model
element to include the attribute
aka:
A (fairly simple) complex element
<xsd:element name = "model">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base = "xsd:string">
        <xsd:attribute name = "aka" use = "required" type = "xsd:string"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
|
the XML element for the modified model
definition
<model aka="80" >Series I, 80"</model>
|
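For comparison, here is a sketch of complexContent, which derives a new complex type from an existing one. It extends the myBuildDate type from landrover2.xsd with one extra sub-element; the names myTimedBuildDate and hour are invented for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="dom" type="xsd:string"/>
  <xsd:element name="month" type="xsd:string"/>
  <xsd:element name="year" type="xsd:string"/>

  <!-- The existing type, as in landrover2.xsd. -->
  <xsd:complexType name="myBuildDate">
    <xsd:sequence>
      <xsd:element ref="dom"/>
      <xsd:element ref="month"/>
      <xsd:element ref="year"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Hypothetical derived type: everything myBuildDate has,
       followed by an hour element. -->
  <xsd:complexType name="myTimedBuildDate">
    <xsd:complexContent>
      <xsd:extension base="myBuildDate">
        <xsd:sequence>
          <xsd:element name="hour" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="start" type="myTimedBuildDate"/>
</xsd:schema>
```

In an extension the new sub-elements are appended after those of the base type, so a start element here would contain dom, month, year and then hour.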
For the keen, here is a version of the schema of the zblsa service:
A version of the zblsa schema. File
zblsa.xsd (and there is a
matching XML data
file)
<?xml version = "1.0" encoding = "UTF-8"?>
<schema xmlns = "http://www.w3.org/2001/XMLSchema"
targetNamespace = "http://lucas.ucs.ed.ac.uk/test/"
xmlns:zblsa = "http://lucas.ucs.ed.ac.uk/test/"
xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
version = "0.4"
elementFormDefault = "qualified">
<element name = "ZBLSA">
<annotation>
<appinfo>
<xsd:documentation>The root element</xsd:documentation>
</appinfo>
</annotation>
<complexType>
<sequence>
<element ref = "zblsa:source" maxOccurs = "unbounded"/>
</sequence>
</complexType>
</element>
<element name = "search">
<annotation>
<appinfo>
<xsd:documentation>The data about a search on the data
provider's data-set</xsd:documentation>
<xsd:documentation>The following Dublin Core elements are
used: dc:Description; dc:Type; dc:Format;
dc:Rights</xsd:documentation>
<xsd:documentation>The URI attribute contains the URI
request that was used to query the data provider's
system</xsd:documentation>
<xsd:documentation>attempted means that the request was
attempted; available means that some form of reply was received;
result means that we got some results; and verified means that we are
sure that there is something there (but no mention is made of how
useful :)</xsd:documentation>
</appinfo>
</annotation>
<complexType>
<sequence>
<any namespace = "http://purl.org/dc/elements/1.1/"
processContents = "skip" minOccurs = "0"
maxOccurs = "unbounded"/>
<element name = "genre" type = "string" minOccurs = "0">
<annotation>
<appinfo>
<xsd:documentation>This is the genre for the
search</xsd:documentation>
</appinfo>
</annotation>
</element>
<element name = "field" minOccurs = "0"
maxOccurs = "unbounded">
<annotation>
<appinfo>
<xsd:documentation>This is the data that was used to
search the data provider's data-set.</xsd:documentation>
<xsd:documentation>This is usually the same as the field
requested.</xsd:documentation>
<xsd:documentation>The attribute "name" states
the name of the field that was searched in the data provider's
data-set</xsd:documentation>
</appinfo>
</annotation>
<complexType>
<simpleContent>
<extension base = "string">
<attribute name = "name" use = "optional" type = "string"/>
</extension>
</simpleContent>
</complexType>
</element>
<element name = "datalist" minOccurs = "0">
<annotation>
<appinfo>
<xsd:documentation>If present, this indicates that there
is some physical data.</xsd:documentation>
</appinfo>
</annotation>
<complexType>
<sequence>
<element name = "data" maxOccurs = "unbounded">
<annotation>
<appinfo>
<xsd:documentation>This is a single unit of data,
which may appear in a number of versions</xsd:documentation>
</appinfo>
</annotation>
<complexType>
<sequence>
<element name = "version" maxOccurs = "unbounded">
<annotation>
<appinfo>
<xsd:documentation>This is a version of the
data</xsd:documentation>
<xsd:documentation>The type attribute indicates the
mime-type of the data</xsd:documentation>
</appinfo>
</annotation>
<complexType mixed = "true">
<sequence>
<any namespace = "http://www.w3.org/2001/XMLSchema"
processContents = "skip"
minOccurs = "0" maxOccurs = "unbounded"/>
</sequence>
<attribute name = "type" use = "required" type = "string"/>
</complexType>
</element>
</sequence>
</complexType>
</element>
</sequence>
</complexType>
</element>
</sequence>
<attribute name = "URI" use = "required" type = "anyURI"/>
<attribute name = "attempted" use = "required" type = "string"/>
<attribute name = "available" use = "required" type = "string"/>
<attribute name = "result" use = "required" type = "string"/>
<attribute name = "verified" use = "required" type = "string"/>
</complexType>
</element>
<element name = "source">
<annotation>
<appinfo>
<xsd:documentation>information about the data
provider</xsd:documentation>
<xsd:documentation>The URI refers to a page of information
about the data provider</xsd:documentation>
<xsd:documentation>The following Dublin Core tags are used:
dc:Title; dc:Description; dc:Rights</xsd:documentation>
</appinfo>
</annotation>
<complexType>
<sequence>
<any namespace = "http://purl.org/dc/elements/1.1/"
processContents = "skip" maxOccurs = "unbounded"/>
<element ref = "zblsa:infoURL"/>
<element ref = "zblsa:logoURL"/>
<element ref = "zblsa:search"/>
</sequence>
<attribute name = "URI" use = "optional" type = "anyURI"/>
</complexType>
</element>
<element name = "infoURL" type = "anyURI">
<annotation>
<appinfo>
<xsd:documentation>The URI that refers to information about
the data provider</xsd:documentation>
</appinfo>
</annotation>
</element>
<element name = "logoURL" type = "anyURI">
<annotation>
<appinfo>
<xsd:documentation>A URI that refers to a logo for the data
provider</xsd:documentation>
</appinfo>
</annotation>
</element>
</schema>
|
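As a rough sketch of the kind of instance document this schema accepts: every value below, the example.org URLs, the yes/no attribute values and the relative zblsa.xsd location are invented for illustration. Because the schema has a targetNamespace, the instance uses xsi:schemaLocation (a namespace name followed by a schema location) rather than xsi:noNamespaceSchemaLocation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<zblsa:ZBLSA xmlns:zblsa="http://lucas.ucs.ed.ac.uk/test/"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://lucas.ucs.ed.ac.uk/test/ zblsa.xsd">
  <zblsa:source URI="http://example.org/provider">
    <!-- At least one Dublin Core element; its content is skipped
         by the validator (processContents = "skip"). -->
    <dc:Title>Example Provider</dc:Title>
    <zblsa:infoURL>http://example.org/provider/info</zblsa:infoURL>
    <zblsa:logoURL>http://example.org/provider/logo.png</zblsa:logoURL>
    <!-- All five attributes on search are required. -->
    <zblsa:search URI="http://example.org/provider/search?q=landrover"
                  attempted="yes" available="yes"
                  result="no" verified="no">
      <!-- Local elements carry the prefix too, because the schema
           sets elementFormDefault to "qualified". -->
      <zblsa:genre>article</zblsa:genre>
      <zblsa:field name="title">landrover</zblsa:field>
    </zblsa:search>
  </zblsa:source>
</zblsa:ZBLSA>
```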
Definitive references
The main list of references is at
http://www.w3.org/XML/Schema.
I did my work against the 20010205 version, defined at:
Material written and maintained by Eric van der Vlist
Validators
I also verified my work with the following validators:
XML "portals"
- Main XML portal places
- www.xml.com
- www.xml.org
- www.xml.net (in the future)
- Portals that have information on XML
- www.w3c.org
- www.oasis-open.org
- xml.apache.org
Various intros written by me