I want to parse the following XML element
<gains>5.640244e+03 5.640322e+03 5.640402e+03 5.640480e+03 5.640560e+03 </gains>
using C++.
Can anyone help me?
No, because you haven't told us what the output of the parser should be for this example.
To design a parser you should give the rules of the grammar (informally is fine) and then one or more examples. The examples aren't the grammar, however.
It's possible that the OP needs help with C++ stream processing of scientific notation. If so, see http://www.cplusplus.com/reference/iostream/manipulators/scientific/ for some helpful hints.
Otherwise, I agree with Ben Voigt: we need more information and context to be of further assistance.
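If stream extraction is indeed what is wanted, a minimal sketch might look like the following. It assumes the desired output is a vector of doubles, that the element has no attributes, and that it appears once in the input; the function name parse_number_list is made up for illustration. For real XML documents a proper XML library is the safer choice.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Returns the numbers found between <tag> and </tag> in `xml`.
// Assumption: a single attribute-free element holding whitespace-separated numbers.
std::vector<double> parse_number_list(const std::string& xml, const std::string& tag) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    std::vector<double> values;

    const std::size_t begin = xml.find(open);
    if (begin == std::string::npos) return values;
    const std::size_t text_start = begin + open.size();
    const std::size_t end = xml.find(close, text_start);
    if (end == std::string::npos) return values;

    // operator>> for double accepts scientific notation such as 5.640244e+03.
    std::istringstream in(xml.substr(text_start, end - text_start));
    double value;
    while (in >> value) values.push_back(value);
    return values;
}
```

Calling parse_number_list(line, "gains") on the line from the question should yield five values; an input without the element yields an empty vector.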
Has anyone come across (or worked on) a tool that gives hints about why a regexp failed to match, based on the provided grammar? That is, imagine that the regexp matched up to some token but a later part failed:
(?P<name>[a-zA-Z])-(?P<number>\d+)_blah
Say we managed to find the name, but there was a letter before "_" (e.g. "foo-123Z_blah"), or "_blah" wasn't matched (e.g. "foo-123_Zblah").
It would be really great if the user could get a hint about what went wrong in a long regexp and maybe make some corrections.
I remember reading that ANTLR was quite good at reversing its parsing procedure in order to provide hints about incorrect language statements according to the provided grammar definition. Is there anything lightweight, preferably in Python, that does something of that kind?
Thanks!
You should check out http://www.regexbuddy.com/ for debugging your regular expressions. It is kind of like a regex IDE and has a library full of common regexes, real-time help on regex composition, and testing/debugging tools.
Unfortunately it's not free, but it's well worth the small amount they charge. The debugging tool is pretty great, and I'm confident it'll help you. It shows which components match a string, when backtracking occurs, at which symbols, and so on.
It's great software, highly recommended!
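For a rough do-it-yourself version of the hint the question asks for, one option is to split the pattern into pieces by hand and test ever-longer prefixes of it. The sketch below does that in C++ with std::regex. It assumes the pieces are supplied already split (each cumulative prefix must itself be a valid pattern), it uses unnamed pieces because std::regex has no (?P<name>...) syntax, and it uses [a-zA-Z]+ for the name so that "foo" matches. MatchHint and explain_match are hypothetical names.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

struct MatchHint {
    std::size_t pieces_matched;  // number of leading pattern pieces that matched
    std::size_t offset;          // input offset where the longest matched prefix ended
};

// Hypothetical helper: tries pieces[0], pieces[0]+pieces[1], ... against the
// start of `input` and reports how far it got before a piece failed.
MatchHint explain_match(const std::vector<std::string>& pieces, const std::string& input) {
    MatchHint hint{0, 0};
    std::string prefix;
    for (const std::string& piece : pieces) {
        prefix += piece;
        std::smatch m;
        // match_continuous anchors the match at the beginning of the input.
        if (!std::regex_search(input, m, std::regex(prefix),
                               std::regex_constants::match_continuous)) {
            break;
        }
        ++hint.pieces_matched;
        hint.offset = static_cast<std::size_t>(m.length(0));
    }
    return hint;
}
```

With the pieces "[a-zA-Z]+", "-", "\d+" and "_blah", both failing inputs from the question stop after three pieces at offset 7, which points at the character where "_blah" was expected. Because of backtracking, the reported offset is only a hint for more complicated patterns.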
I'm having trouble making a dynamic Boost.Spirit lexer that tracks the column number. Is this possible? Can anyone provide a simple example?
Thanks!
Take a look at this post
How to use Boost::Spirit::Lex to lex a file without reading the whole file into memory first?
And
http://www.boost.org/doc/libs/1_48_0/libs/wave/doc/samples.html
http://boost-spirit.com/home/articles/qi-example/tracking-the-input-position-while-parsing/
The code they reference throughout the article is posted at the end.
I want to write a regular expression library in C/C++.
What is a good starting point? Are there any books or articles?
I know there are many libraries available, but I want to write my own version.
A good starting point is to use existing implementations and criticize them.
Pay attention to data structures and design decisions you don't like.
Avoid them when you write your version.
[Edit 16-Jan-2015] I recently came across the book Beautiful Code. I recommend you go through Chapter 1, "A Regular Expression Matcher" by Brian Kernighan.
You can read the classic paper by Ken Thompson, "Regular expression search algorithm" ... http://portal.acm.org/citation.cfm?doid=363347.363387 ... this paper should give you a good understanding of how regular expressions are matched using finite automata.
This is another page giving some detailed information by Russ Cox ... http://swtch.com/~rsc/regexp/
Hope these help you get started.
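To give a feel for how small a starting point can be, here is a sketch in the style of the backtracking matcher that the Beautiful Code chapter discusses. It is reconstructed for illustration rather than copied, and it supports only literal characters, '.', '*', '^' and '$'.

```cpp
// Minimal backtracking matcher: c (literal), . (any char), * (zero or more
// of the previous char), ^ (start of text), $ (end of text).
bool match_here(const char* regexp, const char* text);

// Tries to match "c*" followed by the rest of the pattern at `text`.
bool match_star(char c, const char* regexp, const char* text) {
    do {  // a * matches zero or more instances
        if (match_here(regexp, text)) return true;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return false;
}

// Matches the pattern at the beginning of `text`.
bool match_here(const char* regexp, const char* text) {
    if (regexp[0] == '\0') return true;
    if (regexp[1] == '*') return match_star(regexp[0], regexp + 2, text);
    if (regexp[0] == '$' && regexp[1] == '\0') return *text == '\0';
    if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text)) {
        return match_here(regexp + 1, text + 1);
    }
    return false;
}

// Searches for the pattern anywhere in `text`.
bool match(const char* regexp, const char* text) {
    if (regexp[0] == '^') return match_here(regexp + 1, text);
    do {  // must try even if the text is empty
        if (match_here(regexp, text)) return true;
    } while (*text++ != '\0');
    return false;
}
```

For example, match("^ab*c$", "abbbc") succeeds while match("^b", "abc") does not. A matcher like this can take exponential time on some patterns, which is the problem the automaton-based approach in the Thompson paper and the Russ Cox articles avoids.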
I don't know a book that will help you with the implementation details -- and I'm sure there are tons of details to make it efficient. However, the book Languages and Machines, by Thomas A. Sudkamp, will be of help to understand the ideas behind an implementation.
I think what you'll need to do is compile a regular expression into a finite automaton. If you don't know much about grammars and automata, then part II of that book, "Grammars, Automata, and Languages", will be of great help.
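To make the target of that compilation concrete, here is a hand-built deterministic automaton for the regular expression ab*c. The pattern and the state numbering are chosen purely for illustration; a real library would generate the transition table from the pattern instead of hard-coding it.

```cpp
#include <string>

// Hand-built DFA for ab*c.
// States: 0 = start, 1 = seen 'a' followed by any number of 'b',
//         2 = accept, -1 = dead (no match possible any more).
int step(int state, char c) {
    switch (state) {
        case 0: return c == 'a' ? 1 : -1;
        case 1: return c == 'b' ? 1 : (c == 'c' ? 2 : -1);
        default: return -1;  // nothing leaves the accept or dead state
    }
}

// Runs the automaton over the whole input; accepts only full matches.
bool dfa_accepts(const std::string& input) {
    int state = 0;
    for (char c : input) {
        state = step(state, c);
        if (state < 0) return false;
    }
    return state == 2;
}
```

Each input character is examined exactly once and there is no backtracking, which is the main attraction of the automaton approach.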
The book Compilers: Principles, Techniques, and Tools, by Alfred Aho, Monica Lam, Ravi Sethi and Jeffrey Ullman (also referred to as the dragon book), may also be of help. It's oriented towards writing a compiler for a programming language, not for a regular expression language. However, you'll probably find it helpful, especially the part about parsing, as it is more practical in nature (as opposed to Languages and Machines, which is very theoretical).
Anyway, if I were to write a regular expression library, those would be my starting points. I recommend borrowing both from a library you have access to. Other than that, you should take a look at working implementations. I'm just guessing here, but since Perl regular expressions are so popular and work so well, there is probably good documentation on how they are implemented.
Good luck.
I'm going to create a Javadoc lookalike for the language I mainly use, but I was wondering: is it worth using a parser generator for this? The main reason I considered a parser generator is that I could use templates for the HTML output. I could also use PDF templates if I need them.
Thanks,
William v. Doorn
If all you are going to do is extract the "Javadoc" comments, you don't need a full parser; after all, you only need to recognize the comments and regexps will likely do fine.
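As a sketch of the regexp-only route, the following C++ function pulls out the body of every /** ... */ comment with std::regex. It assumes Javadoc-style delimiters and does not try to handle comment markers that appear inside string literals; extract_doc_comments is a made-up name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the text between /** and */ for every doc comment in `source`.
// Assumption: comment delimiters never occur inside string literals.
std::vector<std::string> extract_doc_comments(const std::string& source) {
    // [\s\S]*? matches any characters, including newlines, as few as possible.
    static const std::regex doc_comment(R"(/\*\*([\s\S]*?)\*/)");
    std::vector<std::string> bodies;
    for (std::sregex_iterator it(source.begin(), source.end(), doc_comment), end;
         it != end; ++it) {
        bodies.push_back((*it)[1].str());
    }
    return bodies;
}
```

Ordinary /* ... */ comments are skipped because the pattern requires the second asterisk. Turning the extracted bodies into HTML is then a separate formatting step.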
If you want to extract information from the code and use it to augment the Javadoc comments, you'll need not only a parser but also name and type resolution.
You can see the results of combining parsing, name/type resolution, and Javadoc comment extraction in the Java Source Code Browser, which produces Javadoc results along with fully hyperlinked source code cross-referenced into the Javadocs.
The machinery which produced this is a generalization of something like ANTLR. But there was little need of using code templates to produce the HTML itself; all the hard work is in parsing and fact collection across the symbol tables.
I am embarking on some learning and I want to write my own syntax highlighting for files in C++.
Can anyone give me ideas on how to go about doing this?
To me it seems that when a file is opened:
1. It would need to be parsed to decide what type of source file it is. Trusting the extension might not be fool-proof.
2. A way to know what keywords/commands apply to what language.
3. A way to decide what color each keyword/command gets.
I want to do this on OS X, using C++ or Objective-C.
Can anyone provide pointers on how I might get started with this?
Syntax highlighters typically don't go beyond lexical analysis, which means you don't have to parse the whole language into statements and declarations and expressions and whatnot. You only have to write a lexer, which is fairly easy with regular expressions. I recommend you start by learning regular expressions, if you haven't already. It'll take all of 30 minutes.
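To illustrate how little is needed, here is a hand-written lexer sketch in C++ that emits HTML. The keyword list is deliberately tiny, the CSS class names are invented, and only // comments, double-quoted strings, numbers and identifiers are recognized.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Escapes the characters that are special in HTML text.
std::string escape_html(const std::string& text) {
    std::string out;
    for (char c : text) {
        if (c == '<') out += "&lt;";
        else if (c == '>') out += "&gt;";
        else if (c == '&') out += "&amp;";
        else out += c;
    }
    return out;
}

// Wraps `text` in a span carrying the given (made-up) CSS class.
std::string span(const std::string& css_class, const std::string& text) {
    return "<span class=\"" + css_class + "\">" + escape_html(text) + "</span>";
}

// Splits `src` into tokens and wraps the interesting ones in spans.
std::string highlight(const std::string& src) {
    static const std::set<std::string> keywords = {"int", "return", "if", "else", "while", "for"};
    auto is_word = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
    };
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        const std::size_t start = i;
        const char c = src[i];
        if (src.compare(i, 2, "//") == 0) {              // line comment
            while (i < src.size() && src[i] != '\n') ++i;
            out += span("comment", src.substr(start, i - start));
        } else if (c == '"') {                           // string literal
            ++i;
            while (i < src.size() && src[i] != '"') {
                if (src[i] == '\\' && i + 1 < src.size()) ++i;  // skip escaped char
                ++i;
            }
            if (i < src.size()) ++i;                     // consume closing quote
            out += span("string", src.substr(start, i - start));
        } else if (std::isdigit(static_cast<unsigned char>(c))) {  // number
            while (i < src.size() && (is_word(src[i]) || src[i] == '.')) ++i;
            out += span("number", src.substr(start, i - start));
        } else if (is_word(c)) {                         // keyword or identifier
            while (i < src.size() && is_word(src[i])) ++i;
            const std::string word = src.substr(start, i - start);
            out += keywords.count(word) ? span("keyword", word) : word;
        } else {                                         // punctuation, whitespace
            out += escape_html(std::string(1, c));
            ++i;
        }
    }
    return out;
}
```

Supporting another language then mostly means swapping the keyword set and the comment and string rules, which is why highlighters tend to keep those in per-language definition files.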
You may want to consider toying with Flex (the lexical analyzer generator; https://github.com/westes/flex) as a learning exercise. It should be quite easy to implement a basic syntax highlighter in Flex that outputs highlighted HTML or something similar.
In short, you would give Flex a set of regular expressions and what to do with matching text, and the generator will greedily match against your expressions. You can make your lexer transition among exclusive states (e.g. in and out of string literals, comments, etc.) as shown in the flex FAQ. Here's a canonical example of a lexer for C written in Flex: http://www.lysator.liu.se/c/ANSI-C-grammar-l.html .
Making an extensible syntax highlighter would be the next part of your journey. Although I am by no means a fan of XML, take a look at how Kate syntax highlighting files are defined, such as this one for C++ . Your task would be to figure out how you want to define syntax highlighters, then make a program that uses those definitions to generate HTML or whatever you please.
You may want to look at how GeSHi implements highlighting. In addition, it has a whole bunch of language packs that contain all the keywords you'll ever want.
Assuming that you are using the Cocoa frameworks, you can use UTIs to determine the file type.
For an overview of the API:
http://developer.apple.com/mac/library/documentation/FileManagement/Conceptual/understanding_utis/understand_utis_intro/understand_utis_intro.html#//apple_ref/doc/uid/TP40001319-CH201-SW1
For a list of known UTIs:
http://developer.apple.com/mac/library/documentation/Miscellaneous/Reference/UTIRef/Articles/System-DeclaredUniformTypeIdentifiers.html#//apple_ref/doc/uid/TP40009259-SW1
The two keys you are probably most interested in are kUTTypeObjectiveCPlusPlusSource and kUTTypeCPlusPlusHeader.
For the highlighting you might find the information on this page helpful as it discusses syntax highlighting with an NSView and temporary attributes:
http://www.cocoadev.com/index.pl?ImplementSyntaxHighlightingUsingTemporaryAttributes
I think (1) isn't possible, since the only way to tell if a file is valid C++ is to run it through a C++ parser and see if it parses... but if you used that as your standard, you couldn't operate on code that doesn't compile because it is a work-in-progress, which you probably want to do. It's probably best just to trust the extension, as I don't think any other method will work better than that.
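Trusting the extension can be as simple as a lookup table. The sketch below is one way to do it in C++. The table contents and the name language_for are illustrative only, and ".h" is ambiguous in practice (C, C++ or Objective-C), so mapping it to C is an arbitrary choice.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Guesses the language from the file extension; returns "unknown" otherwise.
// The table is a small, made-up sample and is case-sensitive.
std::string language_for(const std::string& filename) {
    static const std::map<std::string, std::string> by_extension = {
        {"c", "C"},     {"h", "C"},           {"cpp", "C++"},
        {"cc", "C++"},  {"hpp", "C++"},       {"m", "Objective-C"},
        {"mm", "Objective-C++"}, {"py", "Python"}};

    const std::size_t dot = filename.rfind('.');
    if (dot == std::string::npos || dot + 1 == filename.size()) {
        return "unknown";  // no extension at all, or a trailing dot
    }
    const auto it = by_extension.find(filename.substr(dot + 1));
    if (it == by_extension.end()) return "unknown";
    return it->second;
}
```

For example, language_for("main.cpp") gives "C++" and language_for("README") gives "unknown". A fallback such as letting the user pick the language covers the files the table misses.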
You can get a list of C++ keywords here: http://www.cppreference.com/wiki/keywords/start
The colors are up to you (or if you want, you can make them configurable and leave the choice to the user)