Context Navigation

OgreCompiler2Pass.h @ 692

Revision 692, 25.9 KB checked in by mattausch, 19 years ago (diff)
adding ogre 1.2 and dependencies

Rev	Line
[692]	1	/*
	2	-----------------------------------------------------------------------------
	3	This source file is part of OGRE
	4	(Object-oriented Graphics Rendering Engine)
	5	For the latest info, see http://www.stevestreeting.com/ogre/
	6
	7	Copyright (c) 2000-2005 The OGRE Team
	8	Also see acknowledgements in Readme.html
	9
	10	This program is free software; you can redistribute it and/or modify it under
	11	the terms of the GNU General Public License as published by the Free Software
	12	Foundation; either version 2 of the License, or (at your option) any later
	13	version.
	14
	15	This program is distributed in the hope that it will be useful, but WITHOUT
	16	ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
	17	FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
	18
	19	You should have received a copy of the GNU General Public License along with
	20	this program; if not, write to the Free Software Foundation, Inc., 59 Temple
	21	Place - Suite 330, Boston, MA 02111-1307, USA, or go to
	22	http://www.gnu.org/copyleft/gpl.html.
	23	-----------------------------------------------------------------------------
	24	*/
	25
	26
	27	#ifndef __Compiler2Pass_H__
	28	#define __Compiler2Pass_H__
	29
	30	#include <vector>
	31	#include "OgrePrerequisites.h"
	32
	33	namespace Ogre {
	34
	35	/** Compiler2Pass is a generic 2 pass compiler/assembler
	36	@remarks
	37	provides a tokenizer in pass 1 and relies on the subclass to provide the virtual method for pass 2
	38
	39	PASS 1 - tokenize source: this is a simple brute force lexical scanner/analyzer that also parses
	40	the formed token for proper semantics and context in one pass
	41	it uses Look Ahead Left-Right (LALR) ruling based on Backus - Naur Form (BNF) notation for semantic
	42	checking and also performs context checking allowing for language dialects
	43
	44	PASS 2 - generate application specific instructions ie native instructions based on the tokens in the instruction container.
	45
	46	@par
	47	this class must be subclassed with the subclass providing some implementation details for Pass 2. The subclass
	48	is responsible for setting up the token libraries along with defining the language syntax and handling
	49	token actions during the second pass.
	50
	51	@par
	52	The sub class normally supplies a simplified BNF text description in its constructor prior to doing any parsing/tokenizing of source.
	53	The simplified BNF text description defines the language syntax and rule structure.
	54	The meta-symbols used in the BNF text description are:
	55
	56	::= meaning "is defined as". "::=" starts the definition of a rule. The left side of ::= must contain an <identifier>
	57
	58	<> angle brackets are used to surround syntax rule names. A syntax rule name is also called a non-terminal in that
	59	it does not generate a terminal token in the instruction container for pass 2 processing.
	60
	61	\| meaning "or". if the item on the left of the \| fails then the item on the right is tested.
	62	Example: <true_false> ::= 'true' \| 'false';
	63	whitespace is used to imply AND operation between left and right items.
	64	Example: <terrain_shadaws> ::= 'terrain_shadows' <true_false>
	65	the 'terrain_shadows' terminal token must be found and <true_false> rule must pass in order for <terrain_shadows> rule
	66	to pass.
	67
	68
	69	[] optional rule identifier is enclosed in meta symbols [ and ].
	70	Note that only one identifier or terminal token can take [] modifier.
	71	{} repetitive identifier (zero or more times) is enclosed in meta symbols { and }
	72	Note that only one identifier or terminal token can take {} modifier.
	73	'' terminal tokens are surrounded by single quotes. A terminal token is always one or more characters.
	74	For example: 'Colour' defines a character sequence that must be matched in whole. Note that matching is case
	75	sensitive.
	76	@ turn on single character scanning and don't skip white space.
	77	Mainly used for label processing that allow white space.
	78	Example: '@ ' prevents the white space between the quotes from being skipped
	79	-'' no terminal token is generated when a - precedes the first single quote but the text in between the quotes is still
	80	tested against the characters in the source being parsed.
	81	(?! ) negative lookahead (not test) inspired by Perl 5. Scans ahead for a non-terminal or terminal expression
	82	that should fail in order to make the rule production pass.
	83	Does not generate a token or advance the cursur. If the lookahead result fails ie token is found,
	84	then the current rule fails and rollback occurs. Mainly used to solve multiple contexts of a token.
	85	An Example of where not test is used to solve multiple contexts:
	86
	87	<rule> ::= <identifier> "::=" <expression>
	88	<expression> ::= <and_term> { <or_term> }
	89	<or_term> ::= "\|" <and_term>
	90	<and_term> ::= <term> { <term> }
	91	<term> ::= <identifier_right> \| <terminal_symbol> \| <repeat_expression> \| <optional_expression>
	92	<identifier_right> ::= <identifier> (?!"::=")
	93
	94	<identiefier> appears on both sides of the ::= so (?!"::=") test to make sure that ::= is not on the
	95	right which would indicate that a new rule was being formed.
	96
	97	Works on both terminals and non-terminals.
	98	Note: lookahead failure cause the whole rule to fail and rollback to occur
	99
	100	<#name> # indicates that a numerical value is to be parsed to form a terminal token. Name is optional and is just a descriptor
	101	to help with understanding what the value will be used for.
	102	Example: <Colour> ::= <#red> <#green> <#blue>
	103
	104	() parentheses enclose a set of characters that can be used to generate a user identifier. for example:
	105	(0123456789) matches a single character found in that set.
	106	An example of a user identifier:
	107
	108	<Label> ::= <Character> {<Character>}
	109	<Character> ::= (abcdefghijklmnopqrstuvwxyz)
	110
	111	This will generate a rule that accepts one or more lowercase letters to make up the Label. The User identifier
	112	stops collecting the characters into a string when a match cannot be found in the rule.
	113	*/
	114	class _OgreExport Compiler2Pass
	115	{
	116
	117	protected:
	118
	119	// BNF operation types
	120	enum OperationType {otUNKNOWN, otRULE, otAND, otOR, otOPTIONAL, otREPEAT, otDATA, otNOT_TEST, otEND};
	121
	122	/** structure used to build rule paths
	123
	124	*/
	125	struct TokenRule
	126	{
	127	OperationType operation;
	128	size_t tokenID;
	129
	130	TokenRule(void) : operation(otUNKNOWN), tokenID(0) {}
	131	TokenRule(const OperationType ot, const size_t token)
	132	: operation(ot), tokenID(token) {}
	133	};
	134
	135	typedef std::vector<TokenRule> TokenRuleContainer;
	136	typedef TokenRuleContainer::iterator TokenRuleIterator;
	137
	138	static const size_t SystemTokenBase = 1000;
	139	enum SystemRuleToken {
	140	_no_token_ = SystemTokenBase,
	141	_character_,
	142	_value_,
	143	_no_space_skip_
	144	};
	145
	146	enum BNF_ID {BNF_UNKOWN = 0,
	147	BNF_SYNTAX, BNF_RULE, BNF_IDENTIFIER, BNF_IDENTIFIER_RIGHT, BNF_IDENTIFIER_CHARACTERS, BNF_ID_BEGIN, BNF_ID_END,
	148	BNF_CONSTANT_BEGIN, BNF_SET_RULE, BNF_EXPRESSION,
	149	BNF_AND_TERM, BNF_OR_TERM, BNF_TERM, BNF_TERM_ID, BNF_CONSTANT, BNF_OR, BNF_TERMINAL_SYMBOL, BNF_TERMINAL_START,
	150	BNF_REPEAT_EXPRESSION, BNF_REPEAT_BEGIN, BNF_REPEAT_END, BNF_SET, BNF_SET_BEGIN, BNF_SET_END,
	151	BNF_NOT_TEST, BNF_NOT_TEST_BEGIN, BNF_OPTIONAL_EXPRESSION, BNF_NOT_EXPRESSION, BNF_NOT_CHK,
	152	BNF_OPTIONAL_BEGIN, BNF_OPTIONAL_END, BNF_NO_TOKEN_START, BNF_SINGLEQUOTE, BNF_SINGLE_QUOTE_EXC, BNF_SET_END_EXC,
	153	BNF_ANY_CHARACTER, BNF_SPECIAL_CHARACTERS1,
	154	BNF_SPECIAL_CHARACTERS2, BNF_WHITE_SPACE_CHK,
	155
	156	BNF_LETTER, BNF_LETTER_DIGIT, BNF_DIGIT, BNF_WHITE_SPACE,
	157	BNF_ALPHA_SET, BNF_NUMBER_SET, BNF_SPECIAL_CHARACTER_SET1,
	158	BNF_SPECIAL_CHARACTER_SET2, BNF_SPECIAL_CHARACTER_SET3, BNF_NOT_CHARS
	159	};
	160
	161
	162	/** structure used to build lexeme Type library */
	163	struct LexemeTokenDef
	164	{
	165	size_t ID; /// Token ID which is the index into the Lexeme Token Definition Container
	166	bool hasAction; /// has an action associated with it. only applicable to terminal tokens
	167	bool isNonTerminal; /// if true then token is non-terminal
	168	size_t ruleID; /// index into Rule database for non-terminal token rulepath and lexeme
	169	bool isCaseSensitive; /// if true use case sensitivity when comparing lexeme to source
	170	String lexeme; /// text representation of token or valid characters for label parsing
	171
	172	LexemeTokenDef(void) : ID(0), hasAction(false), isNonTerminal(false), ruleID(0), isCaseSensitive(false) {}
	173	LexemeTokenDef( const size_t ID, const String& lexeme, const bool hasAction = false, const bool caseSensitive = false )
	174	: ID(ID)
	175	, hasAction(hasAction)
	176	, isNonTerminal(false)
	177	, ruleID(0)
	178	, isCaseSensitive(caseSensitive)
	179	, lexeme(lexeme)
	180	{
	181	}
	182
	183	};
	184
	185	typedef std::vector<LexemeTokenDef> LexemeTokenDefContainer;
	186	typedef LexemeTokenDefContainer::iterator LexemeTokenDefIterator;
	187
	188	typedef std::map<std::string, size_t> LexemeTokenMap;
	189	typedef LexemeTokenMap::iterator TokenKeyIterator;
	190	/// map used to lookup client token based on previously defined lexeme
	191
	192
	193	/** structure for Token instructions that are constructed during first pass*/
	194	struct TokenInst
	195	{
	196	size_t NTTRuleID; /// Non-Terminal Token Rule ID that generated Token
	197	size_t tokenID; /// expected Token ID. Could be UNKNOWN if valid token was not found.
	198	size_t line; /// line number in source code where Token was found
	199	size_t pos; /// Character position in source where Token was found
	200	bool found; /// is true if expected token was found
	201	};
	202
	203	typedef std::vector<TokenInst> TokenInstContainer;
	204	typedef TokenInstContainer::iterator TokenInstIterator;
	205
	206	// token que, definitions, rules
	207	struct TokenState
	208	{
	209	TokenInstContainer tokenQue;
	210	LexemeTokenDefContainer lexemeTokenDefinitions;
	211	TokenRuleContainer rootRulePath;
	212	LexemeTokenMap lexemeTokenMap;
	213	};
	214
	215	TokenState* mClientTokenState;
	216
	217	/// Active token que, definitions, rules currntly being used by parser
	218	TokenState* mActiveTokenState;
	219	/// the location within the token instruction container where pass 2 is
	220	size_t mPass2TokenQuePosition;
	221	/** the que position of the previous token that had an action.
	222	A token's action is fired on the next token having an action.
	223	*/
	224	size_t mPreviousActionQuePosition;
	225
	226	/// pointer to the source to be compiled
	227	const String* mSource;
	228	/// name of the source to be compiled
	229	String mSourceName;
	230	size_t mEndOfSource;
	231
	232	size_t mCurrentLine; /// current line number in source being tokenized
	233	size_t mCharPos; /// position in current line in source being tokenized
	234
	235	/// storage container for constants defined in source
	236	/// container uses Token index as a key associated with a float constant
	237	std::map<size_t, float> mConstants;
	238	/// storage container for string labels defined in source
	239	/// container uses Token index as a key associated with a label
	240	std::map<size_t, String> mLabels;
	241	/// flag indicates when a label is being parsed.
	242	/// It gets set false when a terminal token not of _character_ is encountered
	243	bool mLabelIsActive;
	244	/// the key of the active label being built during pass 1.
	245	/// a new key is calculated when mLabelIsActive switches from false to true
	246	size_t mActiveLabelKey;
	247	/// flag being true indicates that spaces are not to be skipped
	248	/// automatically gets set to false when mLabelIsActive goes to false
	249	bool mNoSpaceSkip;
	250	/// if flag is true then next terminal token is not added to token que if found
	251	/// but does effect rule path flow
	252	bool mNoTerminalToken;
	253
	254	/// Active Contexts pattern used in pass 1 to determine which tokens are valid for a certain context
	255	uint mActiveContexts;
	256
	257	/** perform pass 1 of compile process
	258	scans source for lexemes that can be tokenized and then
	259	performs general semantic and context verification on each lexeme before it is tokenized.
	260	A tokenized instruction list is built to be used by Pass 2.
	261	A rule path can trigger Pass 2 execution if enough tokens have been generated in Pass 1.
	262	Pass 1 will then pass control to pass 2 temporarily until the current tokens have been consumed.
	263
	264	*/
	265	bool doPass1();
	266
	267	/** performs Pass 2 of compile process which is execution of the tokens
	268	@remark
	269	Pass 2 takes the token instructions generated in Pass 1 and
	270	builds the application specific instructions along with verifying
	271	symantic and context rules that could not be checked in Pass 1.
	272	@par
	273	Pass 2 execution consumes tokens and moves the Pass 2 token instruction position towards the end
	274	of the token container. Token execution can insert new tokens into the token container.
	275	*/
	276	bool doPass2();
	277
	278	/** execute the action associated with the token pointed to by the Pass 2 token instruction position.
	279	Its upto the child class to implement how it will associate a token key with and action.
	280	Actions should be placed at positions withing the BNF grammer (instruction que) that indicate
	281	enough tokens exist for pass 2 processing to take place.
	282	*/
	283	virtual void executeTokenAction(const size_t tokenID) = 0;
	284	/** setup client token definitions. Gets called when BNF grammer is being setup.
	285	*/
	286	virtual void setupTokenDefinitions(void) = 0;
	287	/** Gets the next token from the instruction que. If an unkown token is found then an exception is raised but
	288	the instruction pointer is still moved passed the unknown token. The subclass should catch the exception,
	289	provide an error message, and attempt recovery.
	290
	291	@param expectedTokenID if set then if tokenID does not match then an exception is raised
	292	*/
	293	const TokenInst& getNextToken(const size_t expectedTokenID = 0);
	294	/** Gets the current token from the instruction que.
	295	*/
	296	const TokenInst& getCurrentToken(void);
	297	/** If a next token instruction exist then test if its token ID matches.
	298	This method is usefull for peeking ahead during pass 2 to see if a certain
	299	token exists.
	300	@param expectedTokenID is the ID of the token to match.
	301	*/
	302	bool testNextTokenID(const size_t expectedTokenID);
	303	/**
	304	*/
	305	void replaceToken(void);
	306	/** Gets the next token's associated floating point value in the instruction que that was parsed from the
	307	text source. If an unkown token is found or no associated value was found then an exception is raised but
	308	the instruction pointer is still moved passed the unknown token. The subclass should catch the exception,
	309	provide an error message, and attempt recovery.
	310	*/
	311	float getNextTokenValue(void);
	312	/** Gets the next token's associated text label in the instruction que that was parsed from the
	313	text source. If an unkown token is found or no associated label was found then an exception is raised but
	314	the instruction pointer is still moved passed the unknown token. The subclass should catch the exception,
	315	provide an error message, and attempt recovery.
	316	*/
	317	const String& getNextTokenLabel(void);
	318	/** Gets the number of tokens waiting in the instruction que that need to be processed by an token action in pass 2.
	319	*/
	320	size_t getPass2TokenQueCount(void) const;
	321	/** Get the number of tokens not processed by action token.
	322	Client Actions should use this method to retreive the number of parameters(tokens)
	323	remaining to be processed in the action.
	324	*/
	325	size_t getRemainingTokensForAction(void) const;
	326
	327	/** Add a lexeme token association. The backend compiler uses the associations between lexeme
	328	and token when building the rule base from the BNF script so all associations must be done
	329	prior to compiling a source.
	330	@param lexeme is the name of the token and use when parsing the source to determin a match for a token.
	331	@param token is the ID associated with the lexeme
	332	@param hasAction must be set true if the client wants an action triggered when this token is generated
	333	@param caseSensitive should be set true if lexeme match should use case sensitivity
	334	*/
	335	void addLexemeToken(const String& lexeme, const size_t token, const bool hasAction = false, const bool caseSensitive = false);
	336
	337	/** sets up the parser rules for the client based on the BNF Grammer text passed in.
	338	Raises an exception if the grammer did not compile successfully. This method gets called
	339	when a call to compile occurs and no compiled BNF grammer exists, otherwise nothing will happen since the compiler has no rules to work
	340	with. The grammer only needs to be set once during the lifetime of the compiler unless the
	341	grammer changes.
	342	BNF Grammer rules are cached once the BNF grammer source is compiled.
	343	The client should never have to call this method directly.
	344	*/
	345	void setClientBNFGrammer(void);
	346
	347
	348
	349	/// find the eol charater
	350	void findEOL();
	351
	352	/** check to see if the text at the present position in the source is a numerical constant
	353	@param fvalue is a reference that will receive the float value that is in the source
	354	@param charsize reference to receive number of characters that make of the value in the source
	355	@return
	356	true if characters form a valid float representation
	357	false if a number value could not be extracted
	358	*/
	359	bool isFloatValue(float& fvalue, size_t& charsize) const;
	360
	361	/** Check if source at current position is supposed to be a user defined character label.
	362	A new label is processed when previous operation was not _character_ otherwise the processed
	363	character (if match was found) is added to the current label. This allows _character_ operations
	364	to be chained together to form a crude regular expression to build a label.
	365	@param rulepathIDX index into rule path database of token to validate.
	366	@return
	367	true if token was found for character label.
	368	*/
	369	bool isCharacterLabel(const size_t rulepathIDX);
	370	/** check to see if the text is in the lexeme text library
	371	@param lexeme points to begining of text where a lexem token might exist
	372	@param caseSensitive set to true if match should be case sensitive
	373	@return
	374	true if a matching token could be found in the token type library
	375	false if could not be tokenized
	376	*/
	377	bool isLexemeMatch(const String& lexeme, const bool caseSensitive) const;
	378	/// position to the next possible valid sysmbol
	379	bool positionToNextLexeme();
	380	/** process input source text using rulepath to determine allowed tokens
	381	@remarks
	382	the method is reentrant and recursive
	383	if a non-terminal token is encountered in the current rule path then the method is
	384	called using the new rule path referenced by the non-terminal token
	385	Tokens can have the following operation states which effects the flow path of the rule
	386	RULE: defines a rule path for the non-terminal token
	387	AND: the token is required for the rule to pass
	388	OR: if the previous tokens failed then try these ones
	389	OPTIONAL: the token is optional and does not cause the rule to fail if the token is not found
	390	REPEAT: the token is required but there can be more than one in a sequence
	391	DATA: Used by a previous token ie for character sets
	392	NOTTEST: performs negative lookahead ie make sure the next token is not of a certain type
	393	END: end of the rule path - the method returns the succuss of the rule
	394
	395	@param rulepathIDX index into an array of Token Rules that define a rule path to be processed
	396	@return
	397	true if rule passed - all required tokens found
	398	false if one or more tokens required to complete the rule were not found
	399	*/
	400	bool processRulePath( size_t rulepathIDX);
	401
	402
	403	/** setup ActiveContexts - should be called by subclass to setup initial language contexts
	404	*/
	405	void setActiveContexts(const uint contexts){ mActiveContexts = contexts; }
	406
	407	/// comment specifiers are hard coded
	408	void skipComments();
	409
	410	/// find end of line marker and move past it
	411	void skipEOL();
	412
	413	/// skip all the white space which includes spaces and tabs
	414	void skipWhiteSpace();
	415
	416
	417	/** check if current position in source has the lexeme text equivalent to the TokenID
	418	@param rulepathIDX index into rule path database of token to validate
	419	@param activeRuleID index of non-terminal rule that generated the token
	420	@return
	421	true if token was found
	422	false if token lexeme text does not match the source text
	423	if token is non-terminal then processRulePath is called
	424	*/
	425	bool ValidateToken(const size_t rulepathIDX, const size_t activeRuleID);
	426
	427	/** scan through all the rules and initialize token definition with index to rules for non-terminal tokens.
	428	Gets called when internal grammer is being verified or after client grammer has been parsed.
	429	@param grammerName is the name of the grammer the token rules are for
	430	*/
	431	void verifyTokenRuleLinks(const String& grammerName);
	432	/** Checks the last token instruction and if it has an action then it triggers the action of the previously
	433	found token having an action.
	434	*/
	435	void checkTokenActionTrigger(void);
	436	/** Get the text representation of the rule path. This is a good way to way to visually verify
	437	that the BNF grammer did compile correctly.
	438	@param ruleID is the index into the rule path.
	439	*/
	440	String getBNFGrammerTextFromRulePath(size_t ruleID);
	441
	442
	443	private:
	444	// used for interpreting BNF script
	445	// keep it as static so that only one structure is created
	446	// no matter how many times this class is instantiated.
	447	static TokenState mBNFTokenState;
	448	// maintain a map of BNF grammer
	449	typedef std::map<String, TokenState> TokenStateContainer;
	450	static TokenStateContainer mClientTokenStates;
	451	/// if a previous token action was setup then activate it now
	452	void activatePreviousTokenAction(void);
	453	/// initialize token definitions and rule paths
	454	void initBNFCompiler(void);
	455	/// Convert BNF grammer token que created in pass 1 into a BNF rule path
	456	void buildClientBNFRulePaths(void);
	457	/// modify the last rule in the container. An end operation is added to the rule path.
	458	void modifyLastRule(const OperationType pendingRuleOp, const size_t tokenID);
	459	/** get the token ID for a lexeme in the client state. If the lexeme is not found then
	460	it is added to the map and definition container and a new tokenID created.
	461	@return the ID of the token.
	462	*/
	463	size_t getClientLexemeTokenID(const String& lexeme, const bool isCaseSensitive = false);
	464	/// Extract a Non Terminal identifier from the token que
	465	void extractNonTerminal(const OperationType pendingRuleOp);
	466	/// Extract a Terminal lexeme from the token que and add to current rule expression
	467	void extractTerminal(const OperationType pendingRuleOp, const bool notoken = false);
	468	/// Extract a set from the token que and add to current rule expression
	469	void extractSet(const OperationType pendingRuleOp);
	470	/// Extract a numeric constant from the token que and add it to the current rule expression
	471	void extractNumericConstant(const OperationType pendingRuleOp);
	472	String getLexemeText(size_t& ruleID);
	473
	474	public:
	475
	476	/// constructor
	477	Compiler2Pass();
	478	virtual ~Compiler2Pass() {}
	479
	480	/** compile the source - performs 2 passes.
	481	First pass is to tokinize, check semantics and context.
	482	The second pass is performed by using tokens to look up function implementors and executing
	483	them which convert tokens to application specific instructions.
	484	@remark
	485	Pass 2 only gets executed if Pass 1 has built enough tokens to complete a rule path and found no errors
	486	@param source a pointer to the source text to be compiled
	487	@return
	488	true if Pass 1 and Pass 2 are successfull
	489	false if any errors occur in Pass 1 or Pass 2
	490	*/
	491	bool compile(const String& source, const String& sourceName);
	492	/** gets BNF Grammer. Gets called when BNF grammer has to be compiled for the first time.
	493	*/
	494	virtual const String& getClientBNFGrammer(void) = 0;
	495
	496	/** get the name of the BNF grammer.
	497	*/
	498	virtual const String& getClientGrammerName(void) = 0;
	499
	500	};
	501
	502	}
	503
	504	#endif
	505

Note: See TracBrowser for help on using the repository browser.

Context Navigation

source: OGRE/trunk/ogrenew/OgreMain/include/OgreCompiler2Pass.h @ 692

Download in other formats: