DOM Programming Guide


	Design Objectives

The C++ DOM implementation is based on the Apache Recommended DOM C++ binding.

The design aims to meet the following requirements:

Reduced memory footprint.
Fast - especially for use in server style and multi-threaded applications.
Good scalability on multiprocessor systems.
More C++ like and less Java like.


	DOM Level 3 Support in Xerces-C++

Xerces-C++ 2.6.0 contains a partial implementation of the W3C Document Object Model Level 3. This implementation is experimental. See the document DOM Level 3 Support for details.


	Using DOM API


	Accessing API from application code


	#include <xercesc/dom/DOM.hpp>

The header file <xercesc/dom/DOM.hpp> includes all the individual headers for the DOM API classes.


	Class Names

The DOM class names are prefixed with "DOM" (if not already), e.g. "DOMNode". The intent is to prevent conflicts between DOM class names and other names that may already be in use by an application or other libraries that a DOM based application must link with.


    DOMDocument* myDocument;
    DOMNode*     aNode;
    DOMText*     someText;


	Objects Management

Applications use normal C++ pointers to access the implementation objects for nodes in the C++ DOM directly.

Consider the following code snippet:


    DOMNode* aNode;
    DOMNode* docRootNode;

    aNode = someDocument->createElement(anElementName);
    docRootNode = someDocument->getDocumentElement();
    docRootNode->appendChild(aNode);


	Memory Management

The C++ DOM implementation provides a release() method for releasing any "orphaned" resources that were created through a createXXXX factory method. Memory for any returned object is owned by the implementation. Please see Apache Recommended DOM C++ binding for details.


	Objects created by DOMImplementation::createXXXX

Users must call the release() function when finished using any objects that were created by the DOMImplementation::createXXXX (e.g. DOMBuilder, DOMWriter, DOMDocument, DOMDocumentType).

Access to a released object will lead to unexpected behaviour.

When a DOMDocument is released, all its associated children AND any objects it owned (e.g. DOMRange, DOMTreeWalker, DOMNodeIterator or any orphaned nodes) will also be released.

When a DOMDocument is cloned, the cloned document is independent of the original document and needs to be released explicitly.

When a DOMDocumentType has been inserted into a DOMDocument and thus has an owner, it will be released automatically when its owner document is released. DOMException::INVALID_ACCESS_ERR will be raised on an attempt to release such an owned node.


	Objects created by DOMDocument::createXXXX

Users can call the release() function to release any orphaned nodes. When an orphaned node is released, its associated children are also released. Access to a released node will lead to unexpected behaviour. Orphaned nodes that have not already been released are eventually released when their owner document is released.

DOMException::INVALID_ACCESS_ERR will be raised on an attempt to release a node that has a parent (that is, has an owner).
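
The release rule for orphaned nodes can be modelled in a few lines of standard C++. The following is only a sketch of the documented behaviour, not Xerces-C++ code: ToyTreeNode and all of its members are hypothetical, nothing is actually freed, and a std::logic_error stands in for DOMException::INVALID_ACCESS_ERR.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical model of the release() rule; NOT the Xerces-C++ implementation.
class ToyTreeNode {
public:
    ToyTreeNode() : fParent(0), fReleased(false) {}

    // Attaches the child, which gives it a parent (an owner in the tree).
    void appendChild(ToyTreeNode* child) {
        child->fParent = this;
        fChildren.push_back(child);
    }

    // Detaches the child and returns it as an orphaned node.
    ToyTreeNode* removeChild(ToyTreeNode* child) {
        for (std::size_t i = 0; i < fChildren.size(); ++i) {
            if (fChildren[i] == child) {
                fChildren.erase(fChildren.begin() + static_cast<std::ptrdiff_t>(i));
                child->fParent = 0;
                return child;
            }
        }
        throw std::invalid_argument("not a child of this node");
    }

    // Only orphaned nodes may be released; releasing a node also
    // releases all of its children.
    void release() {
        if (fParent != 0)
            throw std::logic_error("INVALID_ACCESS_ERR: node still has a parent");
        markReleased();
    }

    bool isReleased() const { return fReleased; }

private:
    // Marks this node and, recursively, its children as released.
    void markReleased() {
        fReleased = true;
        for (std::size_t i = 0; i < fChildren.size(); ++i)
            fChildren[i]->markReleased();
    }

    ToyTreeNode*              fParent;
    bool                      fReleased;
    std::vector<ToyTreeNode*> fChildren;
};
```

In this model, calling release() on a node that is still attached throws, while root.removeChild(&child)->release() succeeds and marks the child's whole subtree as released.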


	Objects created by DOMDocumentRange::createRange or DOMDocumentTraversal::createXXXX

Users can call the release() function when finished using a DOMRange, DOMNodeIterator or DOMTreeWalker. Access to a released object will lead to unexpected behaviour. Objects that have not already been released are eventually released when their owner document is released.

Here is an example:

    //
    //  Create a small document tree
    //

    {
        XMLCh tempStr[100];

        XMLString::transcode("Range", tempStr, 99);
        DOMImplementation* impl = DOMImplementationRegistry::getDOMImplementation(tempStr);

        XMLString::transcode("root", tempStr, 99);
        DOMDocument*   doc = impl->createDocument(0, tempStr, 0);
        DOMElement*   root = doc->getDocumentElement();

        XMLString::transcode("FirstElement", tempStr, 99);
        DOMElement*   e1 = doc->createElement(tempStr);
        root->appendChild(e1);

        XMLString::transcode("SecondElement", tempStr, 99);
        DOMElement*   e2 = doc->createElement(tempStr);
        root->appendChild(e2);

        XMLString::transcode("aTextNode", tempStr, 99);
        DOMText*       textNode = doc->createTextNode(tempStr);
        e1->appendChild(textNode);

        // optionally, call release() to release the resource associated with the range after done
        DOMRange* range = doc->createRange();
        range->release();

        // removedElement is an orphaned node, optionally call release() to release associated resource
        DOMNode* removedElement = root->removeChild(e2);
        removedElement->release();

        // no need to release this returned object which is owned by implementation
        XMLString::transcode("*", tempStr, 99);
        DOMNodeList*    nodeList = doc->getElementsByTagName(tempStr);

        // done with the document, must call release() to release the entire document resources
        doc->release();
    }


	String Type

The C++ DOM uses plain, null-terminated UTF-16 strings (XMLCh*) as its String type. This string type has low overhead.


    // C++ DOM
    const XMLCh* nodeValue = aNode->getNodeValue();

All string data remains in memory until the document object is released, but the implementation may recycle that memory when necessary. Users should make their own copy of any returned string that they need to reference safely later.

For example, after a DOMNode has been released, the memory allocated for its node value will be recycled by the implementation.

   XMLCh xfoo[] = {chLatin_f, chLatin_o, chLatin_o, chNull};

   // pAttr has node value = "foo"
   // fNodeValue has "foo"
   pAttr->setNodeValue(xfoo);
   const XMLCh* fNodeValue = pAttr->getNodeValue();

   // fNodeValue has "foo"
   // make a copy of the string for future reference
   XMLCh* oldNodeValue = XMLString::replicate(fNodeValue);

   // release the node pAttr
   pAttr->release();

   // other operations
   :
   :

   // implementation may have recycled the memory of the pAttr already
   // so it's not safe to expect fNodeValue still have "foo"
   if (XMLString::compareString(xfoo, fNodeValue))
       printf("fNodeValue has some other content\n");

   // should use your own safe copy
   if (!XMLString::compareString(xfoo, oldNodeValue))
       printf("Use your own copy of the oldNodeValue if want to reference the string later\n");

   // delete your own replicated string when done
   XMLString::release(&oldNodeValue);

Similarly, if DOMNode::setNodeValue() is called to set a new node value, the implementation simply overwrites the memory area that holds the node value, so any pointer obtained earlier now refers to the new value. Users should make their own copy of any previously returned string that they still need. For example:

   XMLCh xfoo[] = {chLatin_f, chLatin_o, chLatin_o, chNull};
   XMLCh xfee[] = {chLatin_f, chLatin_e, chLatin_e, chNull};

   // pAttr has node value = "foo"
   pAttr->setNodeValue(xfoo);
   const XMLCh* fNodeValue = pAttr->getNodeValue();

   // fNodeValue has "foo"
   // make a copy of the string for future reference
   XMLCh* oldNodeValue = XMLString::replicate(fNodeValue);

   // now set pAttr with a new node value "fee"
   pAttr->setNodeValue(xfee);

   // should not rely on fNodeValue for the old node value, it may not compare
   if (XMLString::compareString(xfoo, fNodeValue))
       printf("Should not rely on fNodeValue for the old node value\n");

   // should use your own safe copy
   if (!XMLString::compareString(xfoo, oldNodeValue))
       printf("Use your own copy of the oldNodeValue if want to reference the string later\n");

   // delete your own replicated string when done
   XMLString::release(&oldNodeValue);

This prevents memory growth when DOMNode::setNodeValue() is called hundreds of times. The design lets users actively select which returned strings should stay in memory by manually copying them to the application's own heap.
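
The aliasing effect of overwriting a value in place can be reproduced without Xerces-C++ at all. The following is a minimal sketch and not the library's implementation: ToyValueNode, kToyCapacity and copySurvivesOverwrite are hypothetical names, and char16_t stands in for XMLCh.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a DOM node; NOT the Xerces-C++ implementation.
const std::size_t kToyCapacity = 32;

class ToyValueNode {
public:
    ToyValueNode() { fValue[0] = u'\0'; }

    // Overwrites the existing buffer instead of allocating a new one,
    // so memory does not grow however often the value is set.
    void setValue(const std::u16string& value) {
        assert(value.size() < kToyCapacity);  // sketch only: no growth logic
        std::size_t i = 0;
        for (; i < value.size(); ++i)
            fValue[i] = value[i];
        fValue[i] = u'\0';
    }

    // Returns a pointer into the node's own storage; no copy is made.
    const char16_t* getValue() const { return fValue; }

private:
    char16_t fValue[kToyCapacity];
};

// Returns true when the application's own copy still holds the old value
// while the raw pointer obtained earlier now shows the new value.
bool copySurvivesOverwrite() {
    ToyValueNode node;
    node.setValue(u"foo");
    const char16_t* alias = node.getValue();  // points into the node
    std::u16string safeCopy(alias);           // application-owned copy

    node.setValue(u"fee");                    // buffer is overwritten in place

    return safeCopy == u"foo" && std::u16string(alias) == u"fee";
}
```

The trade-off is the one described above: the node never allocates more memory for repeated value changes, at the cost of callers having to copy any string they want to keep.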


	XercesDOMParser


	Constructing a XercesDOMParser

In order to use Xerces-C++ to parse XML files using DOM, you can create an instance of the XercesDOMParser class. The example below shows the code you need in order to create an instance of the XercesDOMParser.

    #include <xercesc/parsers/XercesDOMParser.hpp>
    #include <xercesc/dom/DOM.hpp>
    #include <xercesc/sax/HandlerBase.hpp>
    #include <xercesc/util/XMLString.hpp>
    #include <xercesc/util/PlatformUtils.hpp>

    #if defined(XERCES_NEW_IOSTREAMS)
    #include <iostream>
    #else
    #include <iostream.h>
    #endif

    XERCES_CPP_NAMESPACE_USE

    int main (int argc, char* args[]) {

        try {
            XMLPlatformUtils::Initialize();
        }
        catch (const XMLException& toCatch) {
            char* message = XMLString::transcode(toCatch.getMessage());
            cout << "Error during initialization! :\n"
                 << message << "\n";
            XMLString::release(&message);
            return 1;
        }

        XercesDOMParser* parser = new XercesDOMParser();
        parser->setValidationScheme(XercesDOMParser::Val_Always);    // optional.
        parser->setDoNamespaces(true);    // optional

        ErrorHandler* errHandler = (ErrorHandler*) new HandlerBase();
        parser->setErrorHandler(errHandler);

        const char* xmlFile = "x1.xml";

        try {
            parser->parse(xmlFile);
        }
        catch (const XMLException& toCatch) {
            char* message = XMLString::transcode(toCatch.getMessage());
            cout << "Exception message is: \n"
                 << message << "\n";
            XMLString::release(&message);
            return -1;
        }
        catch (const DOMException& toCatch) {
            char* message = XMLString::transcode(toCatch.msg);
            cout << "Exception message is: \n"
                 << message << "\n";
            XMLString::release(&message);
            return -1;
        }
        catch (...) {
            cout << "Unexpected Exception \n" ;
            return -1;
        }

        delete parser;
        delete errHandler;
        return 0;
    }


	XercesDOMParser Supported Features

The behavior of the XercesDOMParser is dependent on the values of the following features. All of the features below are set using the "setter" methods (e.g. setDoNamespaces), and are queried using the corresponding "getter" methods (e.g. getDoNamespaces). The following only gives a quick summary of the supported features. Please refer to the API Documentation for complete details.

void setCreateEntityReferenceNodes(const bool)
true:	Create EntityReference nodes in the DOM tree. The EntityReference nodes and their child nodes will be read-only.
false:	Do not create EntityReference nodes in the DOM tree. No EntityReference nodes will be created, only the nodes corresponding to their fully expanded substitution text will be created.
default:	true
note:	This feature only affects the appearance of EntityReference nodes in the DOM tree. The document will always contain the entity reference child nodes.

void setExpandEntityReferences(const bool) (deprecated) please use setCreateEntityReferenceNodes
true:	Do not create EntityReference nodes in the DOM tree. No EntityReference nodes will be created, only the nodes corresponding to their fully expanded substitution text will be created.
false:	Create EntityReference nodes in the DOM tree. The EntityReference nodes and their child nodes will be read-only.
default:	false
see:	setCreateEntityReferenceNodes

void setIncludeIgnorableWhitespace(const bool)
true:	Include text nodes that can be considered "ignorable whitespace" in the DOM tree.
false:	Do not include ignorable whitespace in the DOM tree.
default:	true
note:	The only way that the parser can determine if text is ignorable is by reading the associated grammar and having a content model for the document. When ignorable whitespace text nodes are included in the DOM tree, they will be flagged as ignorable; and the method DOMText::isIgnorableWhitespace() will return true for those text nodes.

void setDoNamespaces(const bool)
true:	Perform Namespace processing.
false:	Do not perform Namespace processing.
default:	false
note:	If the validation scheme is set to Val_Always or Val_Auto, then the document must contain a grammar that supports the use of namespaces.
see:	setValidationScheme

void setDoValidation(const bool) (deprecated) please use setValidationScheme
true:	Report all validation errors.
false:	Do not report validation errors.
default:	see the default of setValidationScheme
see:	setValidationScheme

void setValidationScheme(const ValSchemes)
Val_Auto:	The parser will report validation errors only if a grammar is specified.
Val_Always:	The parser will always report validation errors.
Val_Never:	Do not report validation errors.
default:	Val_Auto
note:	If set to Val_Always, the document must specify a grammar. If this feature is set to Val_Never and document specifies a grammar, that grammar might be parsed but no validation of the document contents will be performed.
see:	setLoadExternalDTD

void setDoSchema(const bool)
true:	Enable the parser's schema support.
false:	Disable the parser's schema support.
default:	false
note:	If set to true, namespace processing must also be turned on.
see:	setDoNamespaces

void setValidationSchemaFullChecking(const bool)
true:	Enable full schema constraint checking, including checking which may be time-consuming or memory intensive. Currently, particle unique attribution constraint checking and particle derivation restriction checking are controlled by this option.
false:	Disable full schema constraint checking.
default:	false
note:	This feature checks the Schema grammar itself for additional errors that are time-consuming or memory intensive. It does not affect the level of checking performed on document instances that use Schema grammars.
see:	setDoSchema

void setLoadExternalDTD(const bool)
true:	Load the external DTD.
false:	Ignore the external DTD completely.
default:	true
note:	This feature is ignored and the DTD is always loaded if the validation scheme is set to Val_Always or Val_Auto.
see:	setValidationScheme

void setExitOnFirstFatalError(const bool)
true:	Stops parse on first fatal error.
false:	Attempt to continue parsing after a fatal error.
default:	true
note:	The behavior of the parser when this feature is set to false is undetermined! Therefore use this feature with extreme caution because the parser may get stuck in an infinite loop or worse.

void setValidationConstraintFatal(const bool)
true:	The parser will treat validation errors as fatal and will exit depending on the state of setExitOnFirstFatalError.
false:	The parser will report the error and continue processing.
default:	false
note:	Setting this true does not mean the validation error will be printed with the word "Fatal Error". It is still printed as "Error", but the parser will exit if setExitOnFirstFatalError is set to true.
see:	setExitOnFirstFatalError

void useCachedGrammarInParse(const bool)
true:	Use cached grammar if it exists in the pool.
false:	Parse the schema grammar.
default:	false
note:	The getter function for this method is called isUsingCachedGrammarInParse.
note:	If the grammar caching option is enabled, this option is set to true automatically. Any setting to this option by the users is a no-op.
see:	cacheGrammarFromParse

void cacheGrammarFromParse(const bool)
true:	Cache the grammar in the pool for re-use in subsequent parses.
false:	Do not cache the grammar in the pool
default:	false
note:	The getter function for this method is called isCachingGrammarFromParse
note:	If set to true, the useCachedGrammarInParse is also set to true automatically.
see:	useCachedGrammarInParse

void setStandardUriConformant(const bool)
true:	Force standard uri conformance.
false:	Do not force standard uri conformance.
default:	false
note:	If set to true, malformed uri will be rejected and fatal error will be issued.

void setCalculateSrcOfs(const bool)
true:	Enable src offset calculation.
false:	Disable src offset calculation.
default:	false
note:	If set to true, the user can inquire about the current src offset within the input source. Setting it to false (default) improves the performance.

void setIdentityConstraintChecking(const bool)
true:	Enable identity constraint checking.
false:	Disable identity constraint checking.
default:	true

void setGenerateSyntheticAnnotations(const bool)
true:	Enable generation of synthetic annotations. A synthetic annotation will be generated when a schema component has non-schema attributes but no child annotation.
false:	Disable generation of synthetic annotations.
default:	false

setValidateAnnotation
true:	Enable validation of annotations.
false:	Disable validation of annotations.
default:	false

setCreateSchemaInfo
true:	Enable storing of PSVI information in element and attribute nodes.
false:	Disable storing of PSVI information in element and attribute nodes.
default:	false

setCreateCommentNodes
true:	Enable the parser to create comment nodes in the DOM tree being produced.
false:	Disable comment nodes being produced.
default:	true
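
Some of the feature interactions summarized above (setValidationScheme, setValidationConstraintFatal with setExitOnFirstFatalError, and cacheGrammarFromParse with useCachedGrammarInParse) can be written down as plain C++ decision logic. This is a sketch of the documented rules only; every name starting with Toy, and the functions reportsValidationErrors and parseStops, are hypothetical and are not part of Xerces-C++.

```cpp
// Hypothetical model of the documented rules; NOT Xerces-C++ source code.

// Stand-in for the parser's validation scheme enumeration.
enum ToyValScheme { ToyVal_Never, ToyVal_Always, ToyVal_Auto };

// setValidationScheme: are validation errors reported?
bool reportsValidationErrors(ToyValScheme scheme, bool grammarSpecified) {
    switch (scheme) {
        case ToyVal_Always: return true;
        case ToyVal_Auto:   return grammarSpecified;
        case ToyVal_Never:  return false;
    }
    return false;  // not reached for valid enumerators
}

enum ToyErrorKind { ToyValidationError, ToyFatalError };

// setValidationConstraintFatal + setExitOnFirstFatalError:
// does the parse stop after an error of the given kind?
bool parseStops(ToyErrorKind kind,
                bool validationConstraintFatal,
                bool exitOnFirstFatalError) {
    const bool treatedAsFatal =
        (kind == ToyFatalError) || validationConstraintFatal;
    return treatedAsFatal && exitOnFirstFatalError;
}

// cacheGrammarFromParse / useCachedGrammarInParse: enabling caching also
// enables use of the cache, and user changes to the latter are then ignored.
class ToyGrammarOptions {
public:
    ToyGrammarOptions() : fCache(false), fUseCached(false) {}

    void cacheGrammarFromParse(bool value) {
        fCache = value;
        if (value)
            fUseCached = true;
    }

    void useCachedGrammarInParse(bool value) {
        if (!fCache)          // no-op while grammar caching is enabled
            fUseCached = value;
    }

    bool isCachingGrammarFromParse() const   { return fCache; }
    bool isUsingCachedGrammarInParse() const { return fUseCached; }

private:
    bool fCache;
    bool fUseCached;
};
```

For instance, in this model a validation error stops the parse only when both validationConstraintFatal and exitOnFirstFatalError are true, and calling useCachedGrammarInParse(false) after cacheGrammarFromParse(true) leaves the cached grammar in use.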


	XercesDOMParser Supported Properties

The behavior of the XercesDOMParser is dependent on the values of the following properties. All of the properties below are set using the "setter" methods (e.g. setExternalSchemaLocation), and are queried using the corresponding "getter" methods (e.g. getExternalSchemaLocation). The following only gives a quick summary of the supported properties. Please refer to the API Documentation for complete details.

void setExternalSchemaLocation(const XMLCh*)
Description	The XML Schema Recommendation explicitly states that the inclusion of schemaLocation/noNamespaceSchemaLocation attributes in the instance document is only a hint; it does not mandate that these attributes must be used to locate schemas. The same applies to the <import> element in schema documents. This property allows the user to specify a list of schemas to use. If the targetNamespace of a schema specified using this method matches the targetNamespace of a schema occurring in the instance document in a schemaLocation attribute, or if the targetNamespace matches the namespace attribute of an <import> element, the schema specified by the user using this property will be used (i.e., the schemaLocation attribute in the instance document or on the <import> element will be effectively ignored).
Value	The syntax is the same as for schemaLocation attributes in instance documents: e.g., "http://www.example.com file_name.xsd". The user can specify more than one XML Schema in the list.
Value Type	XMLCh*

void setExternalNoNamespaceSchemaLocation(const XMLCh* const)
Description	The XML Schema Recommendation explicitly states that the inclusion of schemaLocation/noNamespaceSchemaLocation attributes in the instance document is only a hint; it does not mandate that these attributes must be used to locate schemas. This property allows the user to specify the no-target-namespace XML Schema location externally. If specified, the instance document's noNamespaceSchemaLocation attribute will be effectively ignored.
Value	The syntax is the same as for the noNamespaceSchemaLocation attribute that may occur in an instance document: e.g., "file_name.xsd".
Value Type	XMLCh*

void useScanner(const XMLCh* const)
Description	This property allows the user to specify the name of the XMLScanner to use for scanning XML documents. If not specified, the default scanner "IGXMLScanner" is used.
Value	The recognized scanner names are: 1. "WFXMLScanner" - scanner that performs well-formedness checking only. 2. "DGXMLScanner" - scanner that handles XML documents with DTD grammar information. 3. "SGXMLScanner" - scanner that handles XML documents with XML Schema grammar information. 4. "IGXMLScanner" - scanner that handles XML documents with DTD and/or XML Schema grammar information. Users can use the predefined constants defined in XMLUni directly (fgWFXMLScanner, fgDGXMLScanner, fgSGXMLScanner, or fgIGXMLScanner) or a string that matches the value of one of those constants.
Value Type	XMLCh*
note:	See Use Specific Scanner for more programming details.

void useImplementation(const XMLCh* const)
Description	This property allows the user to specify a set of features which the parser will then use to acquire an implementation from which it will create the DOMDocument to use when reading in an XML file.
Value Type	XMLCh*

setSecurityManager(SecurityManager* const)
Description	Certain valid XML and XML Schema constructs can force a processor to consume more system resources than an application may wish. In fact, certain features could be exploited by malicious document writers to produce a denial-of-service attack. This property allows applications to impose limits on the amount of resources the processor will consume while processing these constructs.
Value	An instance of the SecurityManager class (see `xercesc/util/SecurityManager`). This class's documentation describes the particular limits that may be set. Note that, when instantiated, default values for limits that should be appropriate in most settings are provided. The default implementation is not thread-safe; if thread-safety is required, the application should extend this class, overriding methods appropriately. The parser will not adopt the SecurityManager instance; the application is responsible for deleting it when it is finished with it. If no SecurityManager instance has been provided to the parser (the default) then processing strictly conforming to the relevant specifications will be performed.
Value Type	SecurityManager*
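
To illustrate the value syntax accepted by setExternalSchemaLocation, the sketch below splits a schemaLocation-style string into (namespace, location) pairs. It is plain standard C++ working on std::string rather than XMLCh*, and parseSchemaLocationPairs is a hypothetical helper that is not part of Xerces-C++.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: splits "ns1 loc1 ns2 loc2 ..." on whitespace into
// (namespace URI, schema location) pairs. A trailing token without a
// partner is ignored.
std::vector<std::pair<std::string, std::string> >
parseSchemaLocationPairs(const std::string& value) {
    std::vector<std::pair<std::string, std::string> > pairs;
    std::istringstream in(value);
    std::string ns;
    std::string loc;
    while (in >> ns >> loc) {
        pairs.push_back(std::make_pair(ns, loc));
    }
    return pairs;
}
```

Applied to "http://www.example.com file_name.xsd", the helper yields a single pair whose first member is the namespace and whose second member is the schema file. Longer lists simply alternate namespace and location.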


	DOMBuilder


	Constructing a DOMBuilder

DOMBuilder is a new interface introduced by the W3C DOM Level 3.0 Abstract Schemas and Load and Save Specification. DOMBuilder provides the "Load" interface for parsing XML documents and building the corresponding DOM document tree from various input sources.

A DOMBuilder instance is obtained from the DOMImplementationLS interface by invoking its createDOMBuilder method. For example:

    #include <xercesc/dom/DOM.hpp>
    #include <xercesc/util/XMLString.hpp>
    #include <xercesc/util/PlatformUtils.hpp>

    #if defined(XERCES_NEW_IOSTREAMS)
    #include <iostream>
    #else
    #include <iostream.h>
    #endif

    XERCES_CPP_NAMESPACE_USE
    
    int main (int argc, char* args[]) {

        try {
            XMLPlatformUtils::Initialize();
        }
        catch (const XMLException& toCatch) {
            char* message = XMLString::transcode(toCatch.getMessage());
            cout << "Error during initialization! :\n"
                 << message << "\n";
            XMLString::release(&message);
            return 1;
        }


        XMLCh tempStr[100];
        XMLString::transcode("LS", tempStr, 99);
        DOMImplementation *impl = DOMImplementationRegistry::getDOMImplementation(tempStr);
        DOMBuilder* parser = ((DOMImplementationLS*)impl)->createDOMBuilder(DOMImplementationLS::MODE_SYNCHRONOUS, 0);

        // optionally you can set some features on this builder
        if (parser->canSetFeature(XMLUni::fgDOMValidation, true))
            parser->setFeature(XMLUni::fgDOMValidation, true);
        if (parser->canSetFeature(XMLUni::fgDOMNamespaces, true))
            parser->setFeature(XMLUni::fgDOMNamespaces, true);
        if (parser->canSetFeature(XMLUni::fgDOMDatatypeNormalization, true))
            parser->setFeature(XMLUni::fgDOMDatatypeNormalization, true);


        // optionally you can implement your DOMErrorHandler (e.g. MyDOMErrorHandler)
        // and set it to the builder
        MyDOMErrorHandler* errHandler = new MyDOMErrorHandler();
        parser->setErrorHandler(errHandler);

        const char* xmlFile = "x1.xml";
        DOMDocument *doc = 0;

        try {
            doc = parser->parseURI(xmlFile);
        }
        catch (const XMLException& toCatch) {
            char* message = XMLString::transcode(toCatch.getMessage());
            cout << "Exception message is: \n"
                 << message << "\n";
            XMLString::release(&message);
            return -1;
        }
        catch (const DOMException& toCatch) {
            char* message = XMLString::transcode(toCatch.msg);
            cout << "Exception message is: \n"
                 << message << "\n";
            XMLString::release(&message);
            return -1;
        }
        catch (...) {
            cout << "Unexpected Exception \n" ;
            return -1;
        }

        parser->release();
        delete errHandler;
        return 0;
    }

Please refer to the API Documentation and the sample DOMCount for more detail.


	How to interchange DOMInputSource and SAX InputSource?

DOM L3 has introduced a DOMInputSource, which is similar to the SAX InputSource. The Xerces-C++ internals (XMLScanner, Reader, etc.) use the SAX InputSource to process the XML data. In order to support DOM L3, we need to provide a mechanism that allows the Xerces-C++ internals to talk to a DOMInputSource object. Similarly, Xerces-C++ provides some framework classes for specialized types of input source (e.g. LocalFileInputSource) that are derived from the SAX InputSource. To allow users who implement their own DOMEntityResolver, which returns a DOMInputSource, to use these framework classes, we need to provide a mechanism to map a SAX InputSource to a DOMInputSource. Two wrapper classes are provided to interchange DOMInputSource and SAX InputSource.


	Wrapper4DOMInputSource

Wraps a DOMInputSource object so that it can be used as a SAX InputSource.

    #include <xercesc/dom/DOMInputSource.hpp>
    #include <xercesc/framework/Wrapper4DOMInputSource.hpp>

    class DBInputSource: public DOMInputSource
    {
    ...
    };

    ...
    DOMInputSource *domIS = new DBInputSource;
    Wrapper4DOMInputSource domISWrapper(domIS);
    XercesDOMParser parser;

    parser.parse(domISWrapper);


	Wrapper4InputSource

Wraps a SAX InputSource object so that it can be used as a DOMInputSource.

    #include <xercesc/framework/Wrapper4InputSource.hpp>
    #include <xercesc/framework/LocalFileInputSource.hpp>

    DOMInputSource* MyEntityResolver::resolveEntity(const XMLCh* const publicId,
                                                    const XMLCh* const systemId,
                                                    const XMLCh* const baseURI)
    {
        return new Wrapper4InputSource(new LocalFileInputSource(baseURI, systemId));
    }

Please refer to the API Documentation for more detail.


	DOMBuilder Supported Features

The behavior of the DOMBuilder is dependent on the values of the following features. All of the features below can be set using the function DOMBuilder::setFeature(const XMLCh* const, const bool) and queried using the function bool DOMBuilder::getFeature(const XMLCh* const). Users can also call DOMBuilder::canSetFeature(const XMLCh* const, const bool) to query whether setting a feature to a specific value is supported.


	DOM Features

cdata-sections
true:	Keep CDATASection nodes in the document.
false:	Not Supported.
default:	true
XMLUni Predefined Constant:	fgDOMCDATASection
note:	Setting this feature to false is not supported.
see:	DOM Level 3.0 Abstract Schemas and Load and Save Specification

comments
true:	Keep Comment nodes in the document.
false:	Discard Comment nodes in the document.
default:	true
XMLUni Predefined Constant:	fgDOMComments
see:	DOM Level 3.0 Abstract Schemas and Load and Save Specification

charset-overrides-xml-encoding
true:	If a higher level protocol such as HTTP [IETF RFC 2616] provides an indication of the character encoding of the input stream being processed, that will override any encoding specified in the XML declaration or the Text declaration (see also [XML 1.0] 4.3.3 "Character Encoding in Entities"). Explicitly setting an encoding in the DOMInputSource overrides encodings from the protocol.
false:	Any character set encoding information from higher level protocols is ignored by the parser.
default:	true
XMLUni Predefined Constant:	fgDOMCharsetOverridesXMLEncoding
see:	DOM Level 3.0 Abstract Schemas and Load and Save Specification

datatype-normalization
true:	Let the validation process do its datatype normalization that is defined in the used schema language.
false:	Disable datatype normalization. The XML 1.0 attribute value normalization always occurs though.
default:	false
XMLUni Predefined Constant:	fgDOMDatatypeNormalization
note:	Note that setting this feature to true does not affect the DTD normalization operation which always takes place, in accordance to XML 1.0 (Second Edition).
see:	DOM Level 3.0 Abstract Schemas and Load and Save Specification
see:	XML 1.0 (Second Edition).

entities
true:	Create EntityReference nodes in the DOM tree. The EntityReference nodes and their child nodes will be read-only.
false:	Do not create EntityReference nodes in the DOM tree. No EntityReference nodes will be created, only the nodes corresponding to their fully expanded substitution text will be created.
default:	true
XMLUni Predefined Constant:	fgDOMEntities
note:	This feature only affects the appearance of EntityReference nodes in the DOM tree. The document will always contain the entity reference child nodes.
see:	DOM Level 3.0 Abstract Schemas and Load and Save Specification

canonical-form
true:	Not Supported.
false:	Do not canonicalize the document.
default:	false
XMLUni Predefined Constant:	fgDOMCanonicalForm
note:	Setting this feature to true is not supported.
see:	DOM Level 3.0 Abstract Schemas and Load and Save Specification

infoset
true:	Not Supported.
false:	No effect.
default:	false
XMLUni Predefined Constant:	fgDOMInfoset
note:	Setting this feature to true is not supported.
see:	DOM Level 3.0 Abstract Schemas and Load and Save Specification

namespaces
true:	Perform Namespace processing
false:	Do not perform Namespace processing
default:	false
XMLUni Predefined Constant:	fgDOMNamespaces
note:	If the validation is on, then the document must contain a grammar that supports the use of namespaces
see:	validation
see:	DOM Level 3.0 Abstract Schemas and Load and Save Specification

namespace-declarations
true:	Include namespace declaration attributes, specified or defaulted from the schema or the DTD, in the document.
false:	Not Supported.
default:	true
XMLUni Predefined Constant:	fgDOMNamespaceDeclarations
note:	Setting this feature to false is not supported.
see:	namespaces
see:	DOM Level 3.0 Abstract Schemas and Load and Save Specification

supported-mediatypes-only
true:	Not Supported.
false:	Don't check the media type, accept any type of data.
default:	false
XMLUni Predefined Constant:	fgDOMSupportedMediatypesOnly
note:	Setting this feature to true is not supported.
see:	DOM Level 3.0 Abstract Schemas and Load and Save Specification

validate-if-schema
true:	When validation is true, the parser will validate the document only if a grammar is specified.
false:	Validation is determined by the state of the validation feature.
default:	false
XMLUni Predefined Constant:	fgDOMValidateIfSchema
see:	validation
see:	DOM Level 3.0 Abstract Schemas and Load and Save Specification

validation
true:	Report all validation errors.
false:	Do not report validation errors.
default:	false
XMLUni Predefined Constant:	fgDOMValidation
note:	If this feature is set to true, the document must specify a grammar. If this feature is set to false and document specifies a grammar, that grammar might be parsed but no validation of the document contents will be performed.
see:	validate-if-schema
see:	http://apache.org/xml/features/nonvalidating/load-external-dtd
see:	DOM Level 3.0 Abstract Schemas and Load and Save Specification

whitespace-in-element-content
true:	Include text nodes that can be considered "ignorable whitespace" in the DOM tree.
false:	Do not include ignorable whitespace in the DOM tree.
default:	true
XMLUni Predefined Constant:	fgDOMWhitespaceInElementContent
note:	The only way that the parser can determine if text is ignorable is by reading the associated grammar and having a content model for the document. When ignorable whitespace text nodes are included in the DOM tree, they will be flagged as ignorable; and the method DOMText::isIgnorableWhitespace() will return true for those text nodes.
see:	DOM Level 3.0 Abstract Schemas and Load and Save Specification


	Xerces Features

http://apache.org/xml/features/validation/schema
true:	Enable the parser's schema support.
false:	Disable the parser's schema support.
default:	false
XMLUni Predefined Constant:	fgXercesSchema
note:	If set to true, namespace processing must also be turned on.
see:	namespaces

http://apache.org/xml/features/validation/schema-full-checking
true:	Enable full schema constraint checking, including checking which may be time-consuming or memory intensive. Currently, particle unique attribution constraint checking and particle derivation restriction checking are controlled by this option.
false:	Disable full schema constraint checking.
default:	false
XMLUni Predefined Constant:	fgXercesSchemaFullChecking
note:	This feature checks the Schema grammar itself for additional errors that are time-consuming or memory intensive. It does not affect the level of checking performed on document instances that use Schema grammars.
see:	http://apache.org/xml/features/validation/schema

http://apache.org/xml/features/nonvalidating/load-external-dtd
true:	Load the External DTD.
false:	Ignore the external DTD completely.
default:	true
XMLUni Predefined Constant:	fgXercesLoadExternalDTD
note:	This feature is ignored and the DTD is always loaded when validation is on.
see:	validation

http://apache.org/xml/features/continue-after-fatal-error
true:	Attempt to continue parsing after a fatal error.
false:	Stop parsing on the first fatal error.
default:	false
XMLUni Predefined Constant:	fgXercesContinueAfterFatalError
note:	The behavior of the parser when this feature is set to true is undetermined! Therefore use this feature with extreme caution because the parser may get stuck in an infinite loop or worse.

http://apache.org/xml/features/validation-error-as-fatal
true:	The parser will treat validation errors as fatal, and will exit depending on the state of http://apache.org/xml/features/continue-after-fatal-error.
false:	The parser will report the error and continue processing.
default:	false
XMLUni Predefined Constant:	fgXercesValidationErrorAsFatal
note:	Setting this to true does not mean the validation error will be printed with the words "Fatal Error". It is still printed as "Error", but the parser will exit if http://apache.org/xml/features/continue-after-fatal-error is set to false.
see:	http://apache.org/xml/features/continue-after-fatal-error

http://apache.org/xml/features/validation/use-cachedGrammarInParse
true:	Use cached grammar if it exists in the pool.
false:	Parse the schema grammar.
default:	false
XMLUni Predefined Constant:	fgXercesUseCachedGrammarInParse
note:	If http://apache.org/xml/features/validation/cache-grammarFromParse is enabled, this feature is set to true automatically. Any attempt by the user to set this feature is then a no-op.
see:	http://apache.org/xml/features/validation/cache-grammarFromParse

http://apache.org/xml/features/validation/cache-grammarFromParse
true:	Cache the grammar in the pool for re-use in subsequent parses.
false:	Do not cache the grammar in the pool.
default:	false
XMLUni Predefined Constant:	fgXercesCacheGrammarFromParse
note:	If set to true, the http://apache.org/xml/features/validation/use-cachedGrammarInParse is also set to true automatically.
see:	http://apache.org/xml/features/validation/use-cachedGrammarInParse

http://apache.org/xml/features/standard-uri-conformant
true:	Force standard URI conformance.
false:	Do not force standard URI conformance.
default:	false
XMLUni Predefined Constant:	fgXercesStandardUriConformant
note:	If set to true, a malformed URI will be rejected and a fatal error will be issued.

http://apache.org/xml/features/calculate-src-ofs
true:	Enable source offset calculation.
false:	Disable source offset calculation.
default:	false
XMLUni Predefined Constant:	fgXercesCalculateSrcOfs
note:	If set to true, the user can inquire about the current source offset within the input source. Setting it to false (the default) improves performance.

http://apache.org/xml/features/validation/identity-constraint-checking
true:	Enable identity constraint checking.
false:	Disable identity constraint checking.
default:	true
XMLUni Predefined Constant:	fgXercesIdentityConstraintChecking

http://apache.org/xml/features/generate-synthetic-annotations
true:	Enable generation of synthetic annotations. A synthetic annotation will be generated when a schema component has non-schema attributes but no child annotation.
false:	Disable generation of synthetic annotations.
default:	false
XMLUni Predefined Constant:	fgXercesGenerateSyntheticAnnotations

http://apache.org/xml/features/validate-annotations
true:	Enable validation of annotations.
false:	Disable validation of annotations.
default:	false
XMLUni Predefined Constant:	fgXercesValidateAnnotations

http://apache.org/xml/features/dom-has-psvi-info
true:	Enable storing of PSVI information in element and attribute nodes.
false:	Disable storing of PSVI information in element and attribute nodes.
default:	false
XMLUni Predefined Constant:	fgXercesDOMHasPSVIInfo

http://apache.org/xml/features/dom/user-adopts-DOMDocument
true:	The caller will adopt the DOMDocument that is returned from the parse method and thus is responsible to call DOMDocument::release() to release the associated memory. The parser will not release it. The ownership is transferred from the parser to the caller.
false:	The returned DOMDocument from the parse method is owned by the parser and thus will be deleted when the parser is released.
default:	false
XMLUni Predefined Constant:	fgXercesUserAdoptsDOMDocument
see:	DOMBuilder API Documentation, (DOMBuilder::parse and DOMBuilder::resetDocumentPool)
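
The interaction between http://apache.org/xml/features/validation-error-as-fatal and http://apache.org/xml/features/continue-after-fatal-error described above can be summarized as a small truth table. The following is only an illustrative model of the documented behavior, written in standalone C++; the names Outcome and afterValidationError are hypothetical and are not part of the Xerces-C++ API.

```cpp
// Illustrative model of the documented behavior, not Xerces-C++ code.
// Outcome and afterValidationError are hypothetical names.
enum class Outcome { ContinueParsing, StopParsing };

// Decide what happens after a validation error has been reported, given
// the state of the two features.
Outcome afterValidationError(bool validationErrorAsFatal,
                             bool continueAfterFatalError) {
    // A validation error only ends the parse when it is treated as fatal
    // and the parser has not been told to continue after fatal errors.
    if (validationErrorAsFatal && !continueAfterFatalError)
        return Outcome::StopParsing;
    return Outcome::ContinueParsing;
}
```

With both features left at their default of false, the model yields ContinueParsing, which matches the "report the error and continue processing" description.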


	DOMBuilder Supported Properties

The behavior of the DOMBuilder is dependent on the values of the following properties. All of the properties below can be set using the function DOMBuilder::setProperty(const XMLCh* const, void*), which takes a void pointer as the property value. The application is required to initialize this void pointer to the correct type. Check the "Value Type" entry below to learn exactly what type of value each property expects. Passing a void pointer that was initialized with the wrong type will lead to unexpected results. If the same property is set more than once, the last value takes effect.

Property values can be queried using the function void* DOMBuilder::getProperty(const XMLCh* const). The parser owns the returned pointer, and the memory allocated for it will be destroyed when the parser is released. To keep the returned information accessible after the parser is released, callers need to copy it and store it elsewhere. Since the returned pointer is a generic void pointer, check the "Value Type" entry below to learn exactly what type of object each property returns before copying it.


	Xerces Properties

http://apache.org/xml/properties/schema/external-schemaLocation
Description	The XML Schema Recommendation explicitly states that the inclusion of schemaLocation/ noNamespaceSchemaLocation attributes in the instance document is only a hint; it does not mandate that these attributes must be used to locate schemas. Similar situation happens to <import> element in schema documents. This property allows the user to specify a list of schemas to use. If the targetNamespace of a schema specified using this method matches the targetNamespace of a schema occurring in the instance document in schemaLocation attribute, or if the targetNamespace matches the namespace attribute of <import> element, the schema specified by the user using this property will be used (i.e., the schemaLocation attribute in the instance document or on the <import> element will be effectively ignored).
Value	The syntax is the same as for schemaLocation attributes in instance documents, e.g. "http://www.example.com file_name.xsd". The user can specify more than one XML Schema in the list.
Value Type	XMLCh*
XMLUni Predefined Constant:	fgXercesSchemaExternalSchemaLocation

http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation
Description	The XML Schema Recommendation explicitly states that the inclusion of schemaLocation/ noNamespaceSchemaLocation attributes in the instance document is only a hint; it does not mandate that these attributes must be used to locate schemas. This property allows the user to specify the no target namespace XML Schema Location externally. If specified, the instance document's noNamespaceSchemaLocation attribute will be effectively ignored.
Value	The syntax is the same as for the noNamespaceSchemaLocation attribute that may occur in an instance document, e.g. "file_name.xsd".
Value Type	XMLCh*
XMLUni Predefined Constant:	fgXercesSchemaExternalNoNamespaceSchemaLocation

http://apache.org/xml/properties/scannerName
Description	This property allows the user to specify the name of the XMLScanner to use for scanning XML documents. If not specified, the default scanner "IGXMLScanner" is used.
Value	The recognized scanner names are: 1. "WFXMLScanner" - scanner that performs well-formedness checking only. 2. "DGXMLScanner" - scanner that handles XML documents with DTD grammar information. 3. "SGXMLScanner" - scanner that handles XML documents with XML schema grammar information. 4. "IGXMLScanner" - scanner that handles XML documents with DTD and/or XML schema grammar information. Users can use the predefined constants defined in XMLUni directly (fgWFXMLScanner, fgDGXMLScanner, fgSGXMLScanner, or fgIGXMLScanner) or a string that matches the value of one of those constants.
Value Type	XMLCh*
XMLUni Predefined Constant:	fgXercesScannerName
note:	See Use Specific Scanner for more programming details.

http://apache.org/xml/properties/parser-use-DOMDocument-from-Implementation
Description	This property allows the user to specify a set of features which the parser will then use to acquire an implementation from which it will create the DOMDocument to use when reading in an XML file.
Value Type	XMLCh*
XMLUni Predefined Constant:	fgXercesParserUseDocumentFromImplementation

http://apache.org/xml/properties/security-manager
Description	Certain valid XML and XML Schema constructs can force a processor to consume more system resources than an application may wish. In fact, certain features could be exploited by malicious document writers to produce a denial-of-service attack. This property allows applications to impose limits on the amount of resources the processor will consume while processing these constructs.
Value	An instance of the SecurityManager class (see `xercesc/util/SecurityManager`). This class's documentation describes the particular limits that may be set. Note that, when instantiated, default values for limits that should be appropriate in most settings are provided. The default implementation is not thread-safe; if thread-safety is required, the application should extend this class, overriding methods appropriately. The parser will not adopt the SecurityManager instance; the application is responsible for deleting it when it is finished with it. If no SecurityManager instance has been provided to the parser (the default) then processing strictly conforming to the relevant specifications will be performed.
Value Type	SecurityManager*
XMLUni Predefined Constant:	fgXercesSecurityManager
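
The value expected by http://apache.org/xml/properties/schema/external-schemaLocation is a single string of whitespace-separated namespace/location pairs. As a minimal sketch, the helper below assembles such a string from a list of pairs. The name buildSchemaLocation is hypothetical and not part of Xerces-C++; the result would still have to be transcoded to XMLCh* (for example with XMLString::transcode) before being passed to DOMBuilder::setProperty.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Xerces-C++): join
// (namespace URI, schema location) pairs into the whitespace-separated
// form used by schemaLocation attributes.
std::string buildSchemaLocation(
        const std::vector<std::pair<std::string, std::string> >& pairs) {
    std::string result;
    for (const auto& p : pairs) {
        if (!result.empty())
            result += ' ';       // separate consecutive pairs
        result += p.first;       // target namespace
        result += ' ';
        result += p.second;      // location of the schema document
    }
    return result;
}
```

For a single pair this produces the same form as the example in the property description, "http://www.example.com file_name.xsd".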


	DOMWriter


	Constructing a DOMWriter

DOMWriter is a new interface introduced by the W3C DOM Level 3.0 Abstract Schemas and Load and Save Specification. DOMWriter provides the "Save" interface for serializing (writing) a DOM document into XML data. The XML data can be written to various types of output streams.

A DOMWriter instance is obtained from the DOMImplementationLS interface by invoking its createDOMWriter method. For example:

    #include <xercesc/dom/DOM.hpp>
    #include <xercesc/framework/StdOutFormatTarget.hpp>
    #include <xercesc/util/XMLString.hpp>
    #include <xercesc/util/PlatformUtils.hpp>
    
    #if defined(XERCES_NEW_IOSTREAMS)
    #include <iostream>
    #else
    #include <iostream.h>
    #endif

    XERCES_CPP_NAMESPACE_USE
    int serializeDOM(DOMNode* node) {

        XMLCh tempStr[100];
        XMLString::transcode("LS", tempStr, 99);
        DOMImplementation *impl = DOMImplementationRegistry::getDOMImplementation(tempStr);
        DOMWriter* theSerializer = ((DOMImplementationLS*)impl)->createDOMWriter();

        // optionally you can set some features on this serializer
        if (theSerializer->canSetFeature(XMLUni::fgDOMWRTDiscardDefaultContent, true))
            theSerializer->setFeature(XMLUni::fgDOMWRTDiscardDefaultContent, true);

        if (theSerializer->canSetFeature(XMLUni::fgDOMWRTFormatPrettyPrint, true))
             theSerializer->setFeature(XMLUni::fgDOMWRTFormatPrettyPrint, true);

        // optionally you can implement your DOMWriterFilter (e.g. MyDOMWriterFilter)
        // and set it to the serializer
        DOMWriterFilter* myFilter = new MyDOMWriterFilter();
        theSerializer->setFilter(myFilter);

        // optionally you can implement your DOMErrorHandler (e.g. MyDOMErrorHandler)
        // and set it to the serializer
        DOMErrorHandler* myErrorHandler = new MyDOMErrorHandler();
        theSerializer->setErrorHandler(myErrorHandler);

        // StdOutFormatTarget prints the resultant XML stream
        // to stdout once it receives any thing from the serializer.
        XMLFormatTarget *myFormTarget = new StdOutFormatTarget();

        int retval = 0;
        try {
            // do the serialization through DOMWriter::writeNode();
            theSerializer->writeNode(myFormTarget, *node);
        }
        catch (const XMLException& toCatch) {
            char* message = XMLString::transcode(toCatch.getMessage());
            XERCES_STD_QUALIFIER cout << "Exception message is: \n"
                 << message << "\n";
            XMLString::release(&message);
            retval = -1;
        }
        catch (const DOMException& toCatch) {
            char* message = XMLString::transcode(toCatch.msg);
            XERCES_STD_QUALIFIER cout << "Exception message is: \n"
                 << message << "\n";
            XMLString::release(&message);
            retval = -1;
        }
        catch (...) {
            XERCES_STD_QUALIFIER cout << "Unexpected Exception \n" ;
            retval = -1;
        }

        // release the serializer and the helper objects on every path,
        // including after an exception has been caught
        theSerializer->release();
        delete myErrorHandler;
        delete myFilter;
        delete myFormTarget;
        return retval;
    }

Please refer to the API Documentation and the sample DOMPrint for more detail.


	How does DOMWriter handle built-in entity Reference in node value?

Say, for example, you parse the following XML document using XercesDOMParser or DOMBuilder:


    <root>
    <Test attr=" > ' &lt; &gt; &amp; &quot; &apos; "></Test>
    <Test attr=' > " &lt; &gt; &amp; &quot; &apos; '></Test>
    <Test> > " ' &lt; &gt; &amp; &quot; &apos; </Test>
    <Test><![CDATA[< > & " ' &lt; &gt; &amp; &quot; &apos; ]]></Test>
    </root>

According to XML 1.0 spec, 4.4 XML Processor Treatment of Entities and References, the parser will expand the entity reference as follows


    <root>
    <Test attr=" > ' < > & " ' "></Test>
    <Test attr=' > " < > & " ' '></Test>
    <Test> > " ' < > & " ' </Test>
    <Test><![CDATA[< > & " ' &lt; &gt; &amp; &quot; &apos; ]]></Test>
    </root>

and pass the resulting DOMNode to the DOMWriter for serialization. From the DOMWriter's perspective, the original string is unknown; all it sees is the DOMNode above from the parser. But since the DOMWriter is supposed to generate output that can be parsed again if sent back to the parser, it cannot print such a string as is. The DOMWriter therefore does just enough "touch up" to make the string parsable.

So, for example, since the appearance of < and & in a text value leads to a well-formedness error, the DOMWriter writes them as &lt; and &amp; respectively, while >, ' and " in a text value are acceptable to the parser, so the DOMWriter leaves them unchanged. Similarly, the DOMWriter escapes some of the characters in attribute values but keeps everything in CDATA sections as is.

So the string that is generated by DOMWriter will look like this


    <root>
    <Test attr=" > ' &lt; > &amp; &quot; ' "/>
    <Test attr=" > &quot; &lt; > &amp; &quot; ' "/>
    <Test> > " ' &lt; > &amp; " ' </Test>
    <Test><![CDATA[< > & " ' &lt; &gt; &amp; &quot; &apos; ]]></Test>
    </root>

To summarize, the following table shows how built-in entity references are handled for the different node types:

Input/Output	<	>	&	"	'	&lt;	&gt;	&amp;	&quot;	&apos;
Attribute	N/A	>	N/A	&quot;	'	&lt;	>	&amp;	&quot;	'
Text	N/A	>	N/A	"	'	&lt;	>	&amp;	"	'
CDATA	<	>	&	"	'	&lt;	&gt;	&amp;	&quot;	&apos;
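
The rules in this section can be expressed as a small escaping routine: in text content, < and & are written as &lt; and &amp;; in attribute values the double quote is additionally written as &quot; because attributes are emitted in double quotes; CDATA content is written unchanged. The sketch below is a standalone illustration of those rules only. It is not the Xerces-C++ implementation, and the names NodeKind and escapeForOutput are hypothetical.

```cpp
#include <string>

// Hypothetical names, for illustration only (not part of Xerces-C++).
enum class NodeKind { Attribute, Text, CData };

// Escape already-expanded character data according to the rules summarized
// in this section, so that the output can be parsed again.
std::string escapeForOutput(const std::string& value, NodeKind kind) {
    if (kind == NodeKind::CData)
        return value;                        // CDATA content is kept as is

    std::string out;
    out.reserve(value.size());
    for (char c : value) {
        if (c == '<')
            out += "&lt;";                   // would start markup
        else if (c == '&')
            out += "&amp;";                  // would start a reference
        else if (c == '"' && kind == NodeKind::Attribute)
            out += "&quot;";                 // would end the attribute value
        else
            out += c;                        // >, ' and the rest are left alone
    }
    return out;
}
```

Because the routine works on expanded data, a literal > and an expanded &gt; are indistinguishable and both come out as >.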


	DOMWriter Supported Features

The behavior of the DOMWriter is dependent on the values of the following features. All of the features below can be set using the function DOMWriter::setFeature(const XMLCh* const, bool) and queried using the function bool DOMWriter::getFeature(const XMLCh* const). Users can also call DOMWriter::canSetFeature(const XMLCh* const, bool) to query whether setting a feature to a specific value is supported.


	DOM Features

discard-default-content
true:	Use whatever information available to the implementation (i.e. XML schema, DTD, the specified flag on Attr nodes, and so on) to decide what attributes and content should be discarded or not.
false:	Keep all attributes and all content.
default:	true
XMLUni Predefined Constant:	fgDOMWRTDiscardDefaultContent
note:	Note that the specified flag on Attr nodes is not always reliable in itself; it is only reliable when it is set to false, since the only case where it can be set to false is if the attribute was created by the implementation. Default content will not be removed if the implementation does not have any information available.
see:	DOM Level 3.0 Abstract Schemas and Load and Save Specification

entities
true:	EntityReference nodes are serialized as an entity reference of the form "&entityName;" in the output.
false:	EntityReference nodes are serialized as expanded substitution text, unless the corresponding entity definition is not found.
default:	true
XMLUni Predefined Constant:	fgDOMWRTEntities
note:	This feature only affects the output XML stream. The DOM tree to be serialized will not be changed.
see:	DOM Level 3.0 Abstract Schemas and Load and Save Specification

canonical-form
true:	Not Supported.
false:	Do not canonicalize the output.
default:	false
XMLUni Predefined Constant:	fgDOMWRTCanonicalForm
note:	Setting this feature to true is not supported.
see:	format-pretty-print
see:	DOM Level 3.0 Abstract Schemas and Load and Save Specification

format-pretty-print
true:	Format the output by adding whitespace to produce a pretty-printed, indented, human-readable form. The exact form of the transformations is not specified by this specification.
false:	Don't pretty-print the result.
default:	false
XMLUni Predefined Constant:	fgDOMWRTFormatPrettyPrint
note:	Setting this feature to true will set the feature canonical-form to false.
see:	canonical-form
see:	DOM Level 3.0 Abstract Schemas and Load and Save Specification

normalize-characters
true:	Not Supported.
false:	Do not perform character normalization.
note:	Setting this feature to true is not supported.
default:	false
XMLUni Predefined Constant:	fgDOMWRTNormalizeCharacters
see:	DOM Level 3.0 Abstract Schemas and Load and Save Specification

split-cdata-sections
true:	Split CDATA sections containing the CDATA section termination marker ']]>', or unrepresentable characters in the output encoding. When a CDATA section is split a warning is issued.
false:	Signal an error if a CDATASection contains CDATA section termination marker ']]>', or an unrepresentable character.
default:	true
XMLUni Predefined Constant:	fgDOMWRTSplitCdataSections
see:	DOM Level 3.0 Abstract Schemas and Load and Save Specification

validation
true:	Not Supported.
false:	Do not report validation errors.
note:	Setting this feature to true is not supported.
default:	false
XMLUni Predefined Constant:	fgDOMWRTValidation
see:	DOM Level 3.0 Abstract Schemas and Load and Save Specification

whitespace-in-element-content
true:	Include text nodes that can be considered "ignorable whitespace" in the DOM tree.
false:	Not Supported.
note:	Setting this feature to false is not supported.
default:	true
XMLUni Predefined Constant:	fgDOMWRTWhitespaceInElementContent
see:	DOM Level 3.0 Abstract Schemas and Load and Save Specification

xml-declaration
true:	Include the XML declaration.
false:	Do not include the XML declaration.
default:	true
XMLUni Predefined Constant:	fgDOMXMLDeclaration
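
To illustrate what splitting means for the split-cdata-sections feature above: a CDATA section whose content contains the termination marker ']]>' can be written as two adjacent sections, breaking the marker between ']]' and '>'. The following standalone sketch shows that general technique only. How DOMWriter performs the split internally is not described here, the name writeSplitCData is hypothetical, and unrepresentable characters are not handled.

```cpp
#include <string>

// Illustrative only (hypothetical helper, not Xerces-C++ code): wrap
// `content` in CDATA markup, splitting wherever the termination marker
// "]]>" occurs so that every emitted section is well-formed.
std::string writeSplitCData(const std::string& content) {
    const std::string open  = "<![CDATA[";
    const std::string close = "]]>";

    std::string out = open;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = content.find(close, start);
        if (pos == std::string::npos)
            break;
        // Emit everything up to and including "]]", close the section,
        // open a new one, and resume at the ">" of the original marker.
        out.append(content, start, pos - start + 2);
        out += close;
        out += open;
        start = pos + 2;
    }
    out.append(content, start, std::string::npos);
    out += close;
    return out;
}
```

Parsing the result and concatenating the adjacent CDATA sections yields the original content again.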


	Xerces Features

byte-order-mark
true:	Enable the writing of the Byte-Order-Mark (BOM), in the resultant XML stream.
false:	Disable the writing of BOM.
note:	The BOM is written at the beginning of the resultant XML stream if and only if a DOMDocument node is rendered for serialization and the output encoding is one of the following (aliases are acceptable): UTF-16, UTF-16LE, UTF-16BE, UCS-4, UCS-4LE, and UCS-4BE. In the case of UTF-16/UCS-4, the system directives ENDIANMODE_LITTLE and ENDIANMODE_BIG (which denote the host machine's endian mode) are consulted to determine the appropriate BOM to write.
default:	false
XMLUni Predefined Constant:	fgDOMWRTBOM
see:	XML 1.0 Appendix F for more information about BOM.
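
For reference, the byte sequences involved can be sketched as a simple lookup. The function below is a standalone illustration, not the Xerces-C++ implementation: the name bomFor is hypothetical, the hostIsLittleEndian parameter stands in for the ENDIANMODE_LITTLE/ENDIANMODE_BIG directives, and encoding aliases are not recognized. The byte values are the standard BOM encodings for UTF-16 and UCS-4.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, for illustration only. Returns the BOM bytes for the
// given encoding name, or an empty vector when no BOM applies.
std::vector<unsigned char> bomFor(const std::string& encoding,
                                  bool hostIsLittleEndian) {
    bool little;
    if (encoding == "UTF-16LE" || encoding == "UCS-4LE")
        little = true;
    else if (encoding == "UTF-16BE" || encoding == "UCS-4BE")
        little = false;
    else if (encoding == "UTF-16" || encoding == "UCS-4")
        little = hostIsLittleEndian;   // byte order follows the host machine
    else
        return std::vector<unsigned char>();   // no BOM for other encodings

    const bool fourBytes = encoding.compare(0, 5, "UCS-4") == 0;
    if (fourBytes)
        return little ? std::vector<unsigned char>{0xFF, 0xFE, 0x00, 0x00}
                      : std::vector<unsigned char>{0x00, 0x00, 0xFE, 0xFF};
    return little ? std::vector<unsigned char>{0xFF, 0xFE}
                  : std::vector<unsigned char>{0xFE, 0xFF};
}
```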


	Deprecated - Java-like DOM

Earlier versions of Xerces-C++ provided a set of C++ DOM interfaces that was very similar in design and use to the Java DOM API bindings. This interface is now deprecated. See this document for its programming details.