Deprecated - Java-like DOM



Earlier versions of Xerces-C++ provided a set of C++ DOM interfaces that is very similar in design and use to the Java DOM API bindings. It uses a smart-pointer interface and manages memory through reference counting.

This interface is now deprecated and is provided only as an alternative for users who prefer the Java-like smart-pointer design. Please note that it may not be kept up to date with the latest W3C DOM specification.

Users are encouraged to migrate to the Apache Recommended DOM C++ binding.


	Using the deprecated API


	Accessing API from application code


    // C++
    #include <xercesc/dom/deprecated/DOM.hpp>


    // Compared to Java
    import org.w3c.dom.*;

The header file <xercesc/dom/deprecated/DOM.hpp> includes all the individual headers for this set of deprecated DOM API classes.


	Class Names

The C++ class names are prefixed with "DOM_". The intent is to prevent conflicts between DOM class names and other names that may already be in use by an application or other libraries that a DOM based application must link with.

The use of C++ namespaces would also have solved this conflict problem, but for the fact that many compilers do not yet support them.


    DOM_Document   myDocument;   // C++
    DOM_Node       aNode;
    DOM_Text       someText;


    Document       myDocument;   // Compared to Java
    Node           aNode;
    Text           someText;

If you wish to use the Java class names in C++, then you need to typedef them in C++. This is not advisable for the general case - conflicts really do occur - but can be very useful when converting a body of existing Java code to C++.


    typedef DOM_Document  Document;
    typedef DOM_Node      Node;

    Document       myDocument;   // Now C++ usage is
                                 // indistinguishable from Java
    Node           aNode;


	Objects and Memory Management

This deprecated C++ DOM implementation uses automatic memory management, implemented using reference counting. As a result, the C++ code for most DOM operations is very similar to the equivalent Java code, right down to the use of factory methods in the DOM document class for nearly all object creation, and the lack of any explicit object deletion.

Consider the following code snippets


    // This is C++
    DOM_Node aNode;
    aNode = someDocument.createElement("ElementName");
    DOM_Node docRootNode = someDocument.getDocumentElement();
    docRootNode.appendChild(aNode);


    // This is Java
    Node aNode;
    aNode = someDocument.createElement("ElementName");
    Node docRootNode = someDocument.getDocumentElement();
    docRootNode.appendChild(aNode);

The Java and C++ code are identical on the surface, except for the class names, and this similarity remains true for most DOM code.

However, Java and C++ handle objects in somewhat different ways, making it important to understand a little bit of what is going on beneath the surface.

In Java, the variable aNode is an object reference, essentially a pointer. It is initially null, and references an object only after the assignment statement in the second line of the code.

In C++ the variable aNode is, from the C++ language's perspective, an actual live object. It is constructed when the first line of the code executes, and DOM_Node::operator = () executes at the second line. The C++ class DOM_Node is essentially a form of smart pointer; it implements much of the behavior of a Java object reference variable, and delegates the DOM behaviors to an implementation class that lives behind the scenes.

Key points to remember when using the C++ DOM classes:

Create them as local variables, or as member variables of some other class. Never "new" a DOM object into the heap or make an ordinary C pointer variable to one, as this will greatly confuse the automatic memory management.
The "real" DOM objects (nodes, attributes, CData sections, and so on) do live on the heap, and are created with the create... methods on class DOM_Document. DOM_Node and the other DOM classes serve as reference variables to the underlying heap objects.
The visible DOM classes may be freely copied (assigned), passed as parameters to functions, or returned by value from functions.
Memory management of the underlying DOM heap objects is automatic, implemented by means of reference counting. So long as some part of a document can be reached, directly or indirectly, via reference variables that are still alive in the application program, the corresponding document data will stay alive in the heap. When all possible paths of access have been closed off (all of the application's DOM objects have gone out of scope) the heap data itself will be automatically deleted.
There are restrictions on the ability to subclass the DOM classes.
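
The key points above can be illustrated with a minimal, self-contained sketch of a reference-counted handle. This is not the Xerces-C++ implementation: the names NodeImpl, NodeHandle, liveNodes and makeNode are hypothetical, and the sketch assumes only the behavior described in the list (handles are copied freely, and the heap data is deleted when the last handle goes out of scope).

```cpp
#include <cassert>
#include <string>

// Hypothetical heap object standing in for the "real" node data.
struct NodeImpl {
    std::string name;
    int refCount;
    explicit NodeImpl(const std::string& n) : name(n), refCount(0) { ++liveNodes(); }
    ~NodeImpl() { --liveNodes(); }
    // Number of heap objects currently alive; for demonstration only.
    static int& liveNodes() { static int count = 0; return count; }
};

// Hypothetical handle playing the role of a DOM_Node-style reference variable.
class NodeHandle {
public:
    NodeHandle() : fImpl(nullptr) {}                 // a "null" reference
    explicit NodeHandle(const std::string& name) : fImpl(new NodeImpl(name)) { addRef(); }
    NodeHandle(const NodeHandle& other) : fImpl(other.fImpl) { addRef(); }
    ~NodeHandle() { removeRef(); }

    NodeHandle& operator=(const NodeHandle& other) {
        if (fImpl != other.fImpl) {
            removeRef();
            fImpl = other.fImpl;
            addRef();
        }
        return *this;
    }

    bool isNull() const { return fImpl == nullptr; }
    // True when both handles refer to the same heap object.
    bool operator==(const NodeHandle& other) const { return fImpl == other.fImpl; }

private:
    void addRef() { if (fImpl) ++fImpl->refCount; }
    void removeRef() {
        if (fImpl) {
            assert(fImpl->refCount > 0);
            if (--fImpl->refCount == 0) delete fImpl;   // last reference gone
        }
        fImpl = nullptr;
    }
    NodeImpl* fImpl;
};

// Handles are passed and returned by value; no explicit delete is needed.
NodeHandle makeNode(const std::string& name) { return NodeHandle(name); }
```

In this sketch, copying a handle only bumps a counter, which is why the visible classes can be passed and returned by value. Creating a handle with new would leave its destructor uncalled, so the count would never reach zero.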


	String Type - DOMString

Class DOMString provides the mechanism for passing string data to and from the DOM API. DOMString is not intended to be a completely general string class, but rather to meet the specific needs of the DOM API.

The design derives from two primary sources: the DOM's CharacterData interface and class java.lang.String.

Main features are:

It stores Unicode text.
Automatic memory management, using reference counting.
DOMStrings are mutable - characters can be inserted, deleted or appended.

When a string is passed into a method of the DOM (when setting the value of a Node, for example), the string is cloned so that any subsequent alteration or reuse of the string by the application will not alter the document contents. Similarly, when strings from the document are returned to an application via the DOM API, the string is cloned so that the document cannot be inadvertently altered by subsequent edits to the string.

The ICU classes are a more general solution to Unicode character handling for C++ applications. ICU is an open source Unicode library, available at the IBM DeveloperWorks website.


	Equality Testing

The DOMString equality operators (and all of the rest of the DOM class conventions) are modeled after the Java equivalents. The equals() method compares the content of the string, while the == operator checks whether the string reference variables (the application program variables) refer to the same underlying string in memory. This is also true of DOM_Node, DOM_Element, etc., in that operator == tells whether the variables in the application are referring to the same actual node or not. It's all very Java-like.

bool operator == () is true if the DOMString variables refer to the same underlying storage.
bool equals() is true if the strings contain the same characters.

Here is an example of how the equality operators work:

    DOMString a = "Hello";
    DOMString b = a;
    DOMString c = a.clone();
    if (b == a)           //  This is true
    if (a == c)           //  This is false
    if (a.equals(c))       //  This is true
    b = b + " World";
    if (b == a)           // Still true, and the string's
                          //    value is "Hello World"
    if (a.equals(c))      // false.  a is "Hello World";
                          //    c is still "Hello".
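
The distinction between identity (operator ==) and content (equals()) can also be sketched without the library. The class below is a hypothetical stand-in built on std::shared_ptr and is not the DOMString implementation; the names StrRef, append and str are assumptions made for illustration. It models only the two comparison rules stated above, plus an in-place append so that a change made through one reference is visible through the other.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical string reference type; not the DOMString implementation.
class StrRef {
public:
    StrRef(const char* text) : fData() {
        assert(text != nullptr);
        fData = std::make_shared<std::string>(text);
    }

    // Identity: do both variables refer to the same underlying storage?
    bool operator==(const StrRef& other) const { return fData == other.fData; }

    // Content: do both strings hold the same characters?
    bool equals(const StrRef& other) const { return *fData == *other.fData; }

    // Returns a reference to a separate copy of the characters.
    StrRef clone() const { return StrRef(fData->c_str()); }

    // Mutates the shared storage in place, so every reference sees the change.
    void append(const char* text) { fData->append(text); }

    const std::string& str() const { return *fData; }

private:
    std::shared_ptr<std::string> fData;
};
```

With this sketch, a copy of a StrRef compares equal under both tests, while a clone() compares equal only under equals(), and stops doing so once the original is appended to.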


	Downcasting

Application code must sometimes cast an object reference from DOM_Node to one of the classes deriving from DOM_Node, such as DOM_Element. The syntax for doing this in C++ is different from that in Java.


    // This is C++
    DOM_Node       aNode = someFunctionReturningNode();
    DOM_Element    el = (DOM_Element &) aNode;


    // This is Java
    Node       aNode = someFunctionReturningNode();
    Element    el = (Element) aNode;

The C++ cast is not type-safe; the Java cast is checked for compatible types at runtime. If necessary, a type-check can be made in C++ using the node type information:

    // This is C++

    DOM_Node       aNode = someFunctionReturningNode();
    DOM_Element    el;    // by default, el will == null.

    if (aNode.getNodeType() == DOM_Node::ELEMENT_NODE)
        el = (DOM_Element &) aNode;
    else
    {
        // aNode does not refer to an element.
        // Do something to recover here.
    }
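
One way to avoid repeating the type check at every cast is to wrap it in a small helper. The sketch below shows the pattern on a hypothetical miniature of the handle design; Impl, Node, Element and toElement are illustrative names rather than the Xerces-C++ classes, and only the node type codes (1 for an element, 3 for text) are taken from the W3C DOM.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical miniature of the handle design; not the Xerces-C++ classes.
enum NodeType { ELEMENT_NODE = 1, TEXT_NODE = 3 };

struct Impl {
    NodeType type;
    std::string name;
};

class Node {
public:
    Node() {}                                        // a "null" reference
    Node(NodeType type, const std::string& name)
        : fImpl(std::make_shared<Impl>(Impl{type, name})) {}
    bool isNull() const { return !fImpl; }
    NodeType getNodeType() const { assert(fImpl); return fImpl->type; }
protected:
    std::shared_ptr<Impl> fImpl;
};

class Element : public Node {
public:
    Element() {}
    // Behavior that only makes sense for elements.
    const std::string& getTagName() const { assert(fImpl); return fImpl->name; }
private:
    friend Element toElement(const Node& node);
    explicit Element(const Node& node) : Node(node) {}
};

// Checked conversion: returns a null Element unless the node really is one.
Element toElement(const Node& node) {
    if (!node.isNull() && node.getNodeType() == ELEMENT_NODE)
        return Element(node);
    return Element();
}
```

A caller would then test the result with isNull() instead of relying on an unchecked cast, which keeps the recovery path in one place.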


	Subclassing

The C++ DOM classes, DOM_Node, DOM_Attr, DOM_Document, etc., are not designed to be subclassed by an application program.

As an alternative, the DOM_Node class provides a User Data field for use by applications as a hook for extending nodes by referencing additional data or objects. See the API description for DOM_Node for details.


	DOMParser


	Constructing a DOMParser

In order to use Xerces-C++ to parse XML files using the deprecated DOM, you will need to create an instance of the DOMParser class. The example below shows the code you need in order to create an instance of the DOMParser.

    #include <xercesc/dom/deprecated/DOMParser.hpp>
    #include <xercesc/dom/deprecated/DOM.hpp>
    #include <xercesc/sax/HandlerBase.hpp>
    #include <xercesc/util/PlatformUtils.hpp>
    #include <xercesc/util/XMLString.hpp>

    #if defined(XERCES_NEW_IOSTREAMS)
    #include <iostream>
    using namespace std;    // so that cout can be used unqualified below
    #else
    #include <iostream.h>
    #endif

    XERCES_CPP_NAMESPACE_USE

    int main (int argc, char* args[]) {

        try {
            XMLPlatformUtils::Initialize();
        }
        catch (const XMLException& toCatch) {
            char* message = XMLString::transcode(toCatch.getMessage());
            cout << "Error during initialization! :\n"
                 << message << "\n";
            XMLString::release(&message);
            return 1;
        }

        const char* xmlFile = "x1.xml";
        DOMParser* parser = new DOMParser();
        parser->setValidationScheme(DOMParser::Val_Always);    // optional.
        parser->setDoNamespaces(true);    // optional

        ErrorHandler* errHandler = (ErrorHandler*) new HandlerBase();
        parser->setErrorHandler(errHandler);

        try {
            parser->parse(xmlFile);
        }
        catch (const XMLException& toCatch) {
            char* message = XMLString::transcode(toCatch.getMessage());
            cout << "Exception message is: \n"
                 << message << "\n";
            XMLString::release(&message);
            return -1;
        }
        catch (const DOM_DOMException& toCatch) {
            cout << "Exception message is: \n"
                 << toCatch.code << "\n";
            return -1;
        }
        catch (...) {
            cout << "Unexpected Exception \n" ;
            return -1;
        }

        delete parser;
        delete errHandler;
        return 0;
    }


	DOMParser Supported Features

The behavior of the DOMParser is dependent on the values of the following features. All of the features below are set using the "setter" methods (e.g. setDoNamespaces), and are queried using the corresponding "getter" methods (e.g. getDoNamespaces). The following gives only a quick summary of the supported features. Please refer to the API Documentation for complete details.

None of these features can be modified in the middle of a parse, or an exception will be thrown.

void setCreateEntityReferenceNodes(const bool)
true:	Create EntityReference nodes in the DOM tree. The EntityReference nodes and their child nodes will be read-only.
false:	Do not create EntityReference nodes in the DOM tree. No EntityReference nodes will be created; only the nodes corresponding to their fully expanded substitution text will be created.
default:	true
note:	This feature only affects the appearance of EntityReference nodes in the DOM tree. The document will always contain the entity reference child nodes.

void setExpandEntityReferences(const bool) (deprecated) please use setCreateEntityReferenceNodes
true:	Do not create EntityReference nodes in the DOM tree. No EntityReference nodes will be created; only the nodes corresponding to their fully expanded substitution text will be created.
false:	Create EntityReference nodes in the DOM tree. The EntityReference nodes and their child nodes will be read-only.
default:	false
see:	setCreateEntityReferenceNodes

void setIncludeIgnorableWhitespace(const bool)
true:	Include text nodes that can be considered "ignorable whitespace" in the DOM tree.
false:	Do not include ignorable whitespace in the DOM tree.
default:	true
note:	The only way that the parser can determine if text is ignorable is by reading the associated grammar and having a content model for the document. When ignorable whitespace text nodes are included in the DOM tree, they will be flagged as ignorable; and the method DOMText::isIgnorableWhitespace() will return true for those text nodes.

void setDoNamespaces(const bool)
true:	Perform Namespace processing
false:	Do not perform Namespace processing
default:	false
note:	If the validation scheme is set to Val_Always or Val_Auto, then the document must contain a grammar that supports the use of namespaces
see:	setValidationScheme

void setDoValidation(const bool) (deprecated) please use setValidationScheme
true:	Report all validation errors.
false:	Do not report validation errors.
default:	see the default of setValidationScheme
see:	setValidationScheme

void setValidationScheme(const ValSchemes)
Val_Auto:	The parser will report validation errors only if a grammar is specified.
Val_Always:	The parser will always report validation errors.
Val_Never:	Do not report validation errors.
default:	Val_Auto
note:	If set to Val_Always, the document must specify a grammar. If this feature is set to Val_Never and the document specifies a grammar, that grammar might be parsed but no validation of the document contents will be performed.
see:	setLoadExternalDTD

void setDoSchema(const bool)
true:	Enable the parser's schema support.
false:	Disable the parser's schema support.
default:	false
note:	If set to true, namespace processing must also be turned on.
see:	setDoNamespaces

void setValidationSchemaFullChecking(const bool)
true:	Enable full schema constraint checking, including checking which may be time-consuming or memory intensive. Currently, particle unique attribution constraint checking and particle derivation restriction checking are controlled by this option.
false:	Disable full schema constraint checking.
default:	false
note:	This feature checks the Schema grammar itself for additional errors that are time-consuming or memory intensive. It does not affect the level of checking performed on document instances that use Schema grammars.
see:	setDoSchema

void setLoadExternalDTD(const bool)
true:	Load the external DTD.
false:	Ignore the external DTD completely.
default:	true
note:	This feature is ignored and the DTD is always loaded if the validation scheme is set to Val_Always or Val_Auto.
see:	setValidationScheme

void setExitOnFirstFatalError(const bool)
true:	Stops parse on first fatal error.
false:	Attempt to continue parsing after a fatal error.
default:	true
note:	The behavior of the parser when this feature is set to false is undetermined! Therefore use this feature with extreme caution because the parser may get stuck in an infinite loop or worse.

void setValidationConstraintFatal(const bool)
true:	The parser will treat validation errors as fatal and will exit depending on the state of setExitOnFirstFatalError.
false:	The parser will report the error and continue processing.
default:	false
note:	Setting this true does not mean the validation error will be printed with the word "Fatal Error". It is still printed as "Error", but the parser will exit if setExitOnFirstFatalError is set to true.
see:	setExitOnFirstFatalError

void useCachedGrammarInParse(const bool)
true:	Use cached grammar if it exists in the pool.
false:	Parse the schema grammar.
default:	false
note:	The getter function for this method is called isUsingCachedGrammarInParse
note:	If the grammar caching option is enabled, this option is set to true automatically. Any setting to this option by the users is a no-op.
see:	cacheGrammarFromParse

void cacheGrammarFromParse(const bool)
true:	cache the grammar in the pool for re-use in subsequent parses.
false:	Do not cache the grammar in the pool
default:	false
note:	The getter function for this method is called isCachingGrammarFromParse
note:	If set to true, the useCachedGrammarInParse is also set to true automatically.
see:	useCachedGrammarInParse

void setStandardUriConformant(const bool)
true:	Force standard URI conformance.
false:	Do not force standard URI conformance.
default:	false
note:	If set to true, a malformed URI will be rejected and a fatal error will be issued.

void setCalculateSrcOfs(const bool)
true:	Enable src offset calculation.
false:	Disable src offset calculation.
default:	false
note:	If set to true, the user can inquire about the current src offset within the input source. Setting it to false (default) improves the performance.

void setExternalSchemaLocation(const XMLCh* const)
Description	The XML Schema Recommendation explicitly states that the inclusion of schemaLocation/noNamespaceSchemaLocation attributes in the instance document is only a hint; it does not mandate that these attributes must be used to locate schemas. A similar situation applies to the <import> element in schema documents. This property allows the user to specify a list of schemas to use. If the targetNamespace of a schema specified using this method matches the targetNamespace of a schema occurring in the instance document in a schemaLocation attribute, or if the targetNamespace matches the namespace attribute of an <import> element, the schema specified by the user using this property will be used (i.e., the schemaLocation attribute in the instance document or on the <import> element will be effectively ignored).
Value	The syntax is the same as for schemaLocation attributes in instance documents, e.g., "http://www.example.com file_name.xsd". The user can specify more than one XML Schema in the list.
Value Type	XMLCh*
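
The value syntax described above (whitespace-separated pairs of namespace and schema file) can be assembled with a small helper. This is only a sketch: SchemaList and buildSchemaLocation are hypothetical names that are not part of Xerces-C++, and the resulting string would still have to be converted to XMLCh* (for example with XMLString::transcode) before being passed to the parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical list of (target namespace, schema location) pairs.
typedef std::vector<std::pair<std::string, std::string> > SchemaList;

// Hypothetical helper: joins the pairs into the space-separated form
// used by schemaLocation attributes.
std::string buildSchemaLocation(const SchemaList& pairs)
{
    std::string result;
    for (std::size_t i = 0; i < pairs.size(); ++i) {
        // Neither part may be empty, or the pairing would become ambiguous.
        assert(!pairs[i].first.empty() && !pairs[i].second.empty());
        if (!result.empty())
            result += ' ';
        result += pairs[i].first;
        result += ' ';
        result += pairs[i].second;
    }
    return result;
}
```

Keeping the namespace and location together in a pair makes it harder to produce a list with an odd number of tokens, which a plain string would not prevent.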

void setExternalNoNamespaceSchemaLocation(const XMLCh* const)
Description	The XML Schema Recommendation explicitly states that the inclusion of schemaLocation/noNamespaceSchemaLocation attributes in the instance document is only a hint; it does not mandate that these attributes must be used to locate schemas. This property allows the user to specify the no target namespace XML Schema Location externally. If specified, the instance document's noNamespaceSchemaLocation attribute will be effectively ignored.
Value	The syntax is the same as for the noNamespaceSchemaLocation attribute that may occur in an instance document, e.g., "file_name.xsd".
Value Type	XMLCh*

void useScanner(const XMLCh* const)
Description	This property allows the user to specify the name of the XMLScanner to use for scanning XML documents. If not specified, the default scanner "IGXMLScanner" is used.
Value	The recognized scanner names are: 1. "WFXMLScanner" - scanner that performs well-formedness checking only. 2. "DGXMLScanner" - scanner that handles XML documents with DTD grammar information. 3. "SGXMLScanner" - scanner that handles XML documents with XML schema grammar information. 4. "IGXMLScanner" - scanner that handles XML documents with DTD and/or XML schema grammar information. Users can use the predefined constants defined in XMLUni directly (fgWFXMLScanner, fgDGXMLScanner, fgSGXMLScanner, or fgIGXMLScanner) or a string that matches the value of one of those constants.
Value Type	XMLCh*
note:	See Use Specific Scanner for more programming details.
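
As a rough illustration of how an application might pick among the four scanner names listed above, the following sketch maps the kinds of grammar an application expects onto a name. The function chooseScannerName and its two parameters are hypothetical; only the returned strings come from the list above, and the result would still need to be supplied to useScanner as an XMLCh* string.

```cpp
#include <string>

// Hypothetical helper: picks a scanner name based on which kinds of
// grammar the application expects its documents to use.
std::string chooseScannerName(bool usesDTD, bool usesSchema)
{
    if (usesDTD && usesSchema) return "IGXMLScanner";  // DTD and/or XML Schema
    if (usesDTD)               return "DGXMLScanner";  // DTD only
    if (usesSchema)            return "SGXMLScanner";  // XML Schema only
    return "WFXMLScanner";                             // well-formedness only
}
```

When the kind of grammar is not known in advance, leaving the scanner unspecified keeps the default "IGXMLScanner", which handles both.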