XML Schema Toolkit: An Explanation

Author: Alex Selkirk
Date: 9th October 2019

Toolkit contents

The Schema Toolkit consists of a main application, SchemaCoder.exe, and three components: XMLParser.dll (which parses schema-valid XML), XMLSchema.dll (containing the schema structure code) and XMLDataTypes.dll (containing the schema types). Functionality common to all XML components is contained in a static library, XMLLibrary.lib. There is also a sample application, TestDisplay.exe.

A Short Description

The schema toolkit has two parts. The first is an application that converts XML schemas into C++ code. Each element (and attribute) defined in the schema is either given a class of its own or converted to a basic variable type, such as string or integer. These classes will compile without any further changes (using Microsoft's Visual C++) to form a dynamic link library (DLL). However, to provide any useful functionality, the classes have to be extended by a programmer. How this is done is demonstrated by a fully functional example application that displays text and diagrams from a number of simple XML display vocabularies.
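As a rough illustration of the split between generated skeleton and programmer extension, the sketch below shows what such a class might look like for a made-up element, <circle radius="5" label="hub"/>. The class names, member names and the render() method are assumptions chosen for this example; they are not the actual output of SchemaCoder.exe.

```cpp
#include <sstream>
#include <string>

// Hypothetical skeleton of the kind a schema-to-C++ generator might emit
// for an element such as <circle radius="5" label="hub"/>.
// Attributes with simple schema types become plain member variables.
class CircleElement {
public:
    virtual ~CircleElement() = default;

    void setRadius(int r) { radius_ = r; }               // integer attribute -> int
    void setLabel(const std::string& l) { label_ = l; }  // string attribute -> std::string
    int radius() const { return radius_; }
    const std::string& label() const { return label_; }

    // The skeleton compiles as it stands but does nothing useful yet.
    virtual std::string render() const { return ""; }

private:
    int radius_ = 0;
    std::string label_;
};

// Programmer-written extension that supplies the real behaviour.
class DrawableCircle : public CircleElement {
public:
    std::string render() const override {
        std::ostringstream out;
        out << "circle '" << label() << "' r=" << radius();
        return out.str();
    }
};
```

In this sketch the generated class carries only the data described by the schema, and everything that makes the element useful lives in the derived class, so the skeleton could be regenerated without touching hand-written code.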

The second part of the schema toolkit provides the runtime functionality needed to use the generated code once it has been compiled. The following diagram shows how the constituent parts fit together for a generic XML application. The application calls functionality within XMLParser.dll. This DLL uses the Microsoft XML Parser (MSXML3.dll) to parse the XML document, but it builds its own object hierarchy from the SAX events, calling the generated schema dynamic link libraries when the document needs them. At the end of the process, the application is given a pointer to the object representing the base element.

Toolkit Architecture
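The idea of building an object hierarchy from SAX events can be sketched with a stack of open elements and a table of factories. This is a minimal sketch only: Node, Paragraph, TreeBuilder and the registerElement call are hypothetical names, the factory table stands in for the schema DLLs, and the callbacks are simplified stand-ins for the MSXML SAX interfaces rather than their real signatures.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Base class for every object in the tree (hypothetical name).
struct Node {
    std::string name;
    std::vector<std::unique_ptr<Node>> children;
    explicit Node(std::string n) : name(std::move(n)) {}
    virtual ~Node() = default;
};

// Stand-in for a class supplied by a generated schema library.
struct Paragraph : Node {
    Paragraph() : Node("p") {}
};

class TreeBuilder {
public:
    using Factory = std::function<std::unique_ptr<Node>()>;

    // A schema library would register one factory per element it defines.
    void registerElement(const std::string& name, Factory f) {
        factories_[name] = std::move(f);
    }

    // SAX-style callback: an element has opened.
    void startElement(const std::string& name) {
        auto it = factories_.find(name);
        std::unique_ptr<Node> node = (it != factories_.end())
            ? it->second()
            : std::unique_ptr<Node>(new Node(name));  // fallback: generic node
        Node* raw = node.get();
        if (stack_.empty()) {
            root_ = std::move(node);                  // first element is the base element
        } else {
            stack_.back()->children.push_back(std::move(node));
        }
        stack_.push_back(raw);
    }

    // SAX-style callback: the current element has closed.
    void endElement() {
        if (!stack_.empty()) stack_.pop_back();
    }

    // Hand the finished tree to the application.
    std::unique_ptr<Node> takeRoot() { return std::move(root_); }

private:
    std::map<std::string, Factory> factories_;
    std::vector<Node*> stack_;       // currently open elements, innermost last
    std::unique_ptr<Node> root_;
};
```

A caller would register factories, feed the builder the start and end events for a document, and then call takeRoot() to receive the object for the base element, much as the application receives a pointer at the end of parsing.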

An example of the framework in action is the generation of the C++ code itself. The application is SchemaCoder.exe, and the schema DLLs used are XMLSchema.dll and XMLDataTypes.dll. XMLDataTypes.dll contains the basic data types defined by XML Schema, such as string, date, long, and boolean. XMLSchema.dll implements the structural information of the schema, such as the choice, sequence, element, complexType and group elements. Together, SchemaCoder.exe, XMLSchema.dll and XMLDataTypes.dll read the schema (which is itself an XML document) and present the user with a configurable set of classes to validate and represent an XML instance of the schema.

SchemaCoder Architecture
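To give a flavour of what a basic data type has to do, the sketch below converts the text of two XML Schema simple types, boolean and long, into C++ values. The function names are hypothetical and this is not the code in XMLDataTypes.dll. It assumes surrounding whitespace has already been collapsed, and it relies on the fact that XML Schema's boolean type accepts exactly the four lexical forms true, false, 1 and 0.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// xs:boolean: the only valid lexical forms are "true", "false", "1" and "0".
inline bool parseXsBoolean(const std::string& text) {
    if (text == "true" || text == "1") return true;
    if (text == "false" || text == "0") return false;
    throw std::invalid_argument("not an xs:boolean: " + text);
}

// xs:long: a 64-bit signed integer. std::stoll throws std::invalid_argument
// for non-numeric text and std::out_of_range for values that do not fit.
inline long long parseXsLong(const std::string& text) {
    std::size_t used = 0;
    long long value = std::stoll(text, &used);
    if (used != text.size()) {
        // Reject trailing characters such as "12abc".
        throw std::invalid_argument("not an xs:long: " + text);
    }
    return value;
}
```

Throwing on invalid text is one possible design choice; a validating parser could equally record the error and continue.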

A hypothetical future web browser that implements XML display vocabularies such as XHTML, SVG and SMIL could be built using the framework. It would have the following architecture:

Hypothetical Browser Architecture

The skeleton code for the vocabularies XHTML.dll, SVG.dll and SMIL.dll would be generated by the schema toolkit from their respective schemas. A lot of work would of course still be needed to implement the functionality for displaying text and images (XHTML), drawing vector graphics (SVG) or creating animation (SMIL), but the information contained in their schemas would already have been incorporated and an appropriate class structure created.

Motivation

Microsoft's web browser API shows that complete implementations of markup languages create tree-like structures of specific objects representing each element. It also shows that a number of additional classes and functionality must be created for a working implementation. In fact, inspection of any API that implements an XML vocabulary suggests that its object structure resembles its tree of elements. This in turn suggests that all schema-based XML APIs could be created from one framework. Such a shared framework would be more powerful than separate implementations, because each API could be written to recognize and work with the others.

Alternative Approaches

Conventional XML Parsing and Validation

The conventional process for handling schema-valid XML is to obtain the schema document and pass it through a validator, which works out the structure that the schema allows. The XML document is then parsed with reference to the schema. The result is typically a DOM-like tree of element and attribute objects that carry additional information about the type of each element or attribute. Below is a comparison of a conventional DOM structure with the type of structure that the Schema Toolkit produces (in this case applied to an XHTML document).

Conventional DOM Structure

New Structure, applied to XHTML
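The difference between the two structures can also be seen in code. The sketch below contrasts a generic DOM-like node, where every element is the same class and every value is a string, with a schema-specific class for the XHTML img element. DomElement, Image, area() and domImageArea() are hypothetical names for this illustration and do not come from the toolkit or from any DOM library.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Conventional DOM-like node: one generic class for every element,
// with names and attribute values held as strings.
struct DomElement {
    std::string tagName;
    std::map<std::string, std::string> attributes;
    std::vector<std::unique_ptr<DomElement>> children;
};

// Schema-specific alternative: one class per element, with typed members.
struct Image {  // hypothetical class for the XHTML <img> element
    std::string src;
    int width = 0;
    int height = 0;
    int area() const { return width * height; }
};

// With the generic node, the caller must look up and convert each value.
// at() throws std::out_of_range if an attribute is missing.
inline int domImageArea(const DomElement& e) {
    return std::stoi(e.attributes.at("width")) *
           std::stoi(e.attributes.at("height"));
}
```

With the typed class, the conversion from text happens once when the object is built, and element-specific behaviour such as area() has a natural home.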

Web Services

Web services comprise a number of initiatives, but at their heart is the concept of carrying out remote procedure calls over XML. The idea is that an existing API, such as a COM type library, can be used to create an XML schema automatically. The COM objects in the type library can then communicate remotely over XML. The advantage of this approach is that existing APIs (using DCOM, CORBA or Java) can be leveraged and can now, to some extent, interoperate. The drawback is that it is difficult to integrate pure XML specifications and standards. In other words, it leaves integration to the big companies. The schema toolkit, by contrast, allows anyone to implement a complex XML specification or standard.

Links to Further Information