I'm considering writing some simple lexers with Boost's Spirit.Lex, but I can't seem to find any examples of what I'd like to do.
More or less, I'd like to lex an entire text file (this is easy). But, once the entire file has been processed, I would like to be able to "re-lex" an arbitrary line (e.g. if its contents have changed), using the state from the previous line to avoid lexing the entire file again.
I have seen related resources like this question as well as the Spirit.Lex API documentation (of course), but a simple, concise example of what I'm talking about would be very helpful.
Does such an example exist, and/or is this even feasible with Spirit.Lex?
The following page documents the API functions that let you specify the initial lexer state: the Boost Spirit API documentation.
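For what it's worth, the core idea — record the lexer state at the start of each line so that a single changed line can be re-lexed from its recorded entry state — is easy to sketch without Spirit.Lex. Below is a hand-rolled illustration (not Spirit.Lex; all names are invented for the example, and the only state tracked is whether we are inside a `/* ... */` comment):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal stateful line lexer: the only lexer state is "inside a /* */
// comment". This is NOT Spirit.Lex -- just an illustration of carrying
// state across lines so one line can be re-lexed in isolation.
enum class State { Code, Comment };

// Lex one line starting in state `s`; return the state at end of line.
State lexLine(const std::string& line, State s) {
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (s == State::Code && line.compare(i, 2, "/*") == 0) {
            s = State::Comment; ++i;
        } else if (s == State::Comment && line.compare(i, 2, "*/") == 0) {
            s = State::Code; ++i;
        }
    }
    return s;
}

// Lex every line once, remembering the state at the START of each line.
// When line k later changes, re-lex only line k from entry[k].
std::vector<State> lexAll(const std::vector<std::string>& lines) {
    std::vector<State> entry(lines.size());
    State s = State::Code;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        entry[i] = s;
        s = lexLine(lines[i], s);
    }
    return entry;
}
```

If re-lexing a line produces a different exit state than before, the change propagates, and following lines need re-lexing too — the same bookkeeping you would do on top of Spirit.Lex's initial-state parameter.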
For educational purposes, I would like to build an IDE for PHP coding.
I made a form app and added an OpenFileDialog... (my C# knowledge was useful, because it was easy, even without IntelliSense!)
Loading a file and reading lines from it is basically the same in every language (even Perl).
But my goal is to write a homemade IntelliSense. I don't need info on the RichTextBox and the events it generates, end-of-line, EOF, etc.
The problem I have is: how do I handle the data? Line by line? A struct for each line of the text file? Looping over all the structs in a linked list while updating the RichTextBox? Searching for opening and closing brackets, variables, and so on?
I think Microsoft stores an SQL-like database in the app project folders.
But how would you keep track of the variables and simulate them in some sort of form?
I would like to know how to handle this efficiently for dynamic (constantly changing) text.
Having never thought this through before, it sounds like an interesting challenge.
Personally, I think you'll have to implement a lexical scanner, tokenizing the entire source file into a source tree, with each token carrying information that maps it back to a line/character position in the source file.
From there you can see how far you want to go with it - when someone hovers over a token, it can use the context of the code around it to be more intelligent about the "intellisense" you are providing.
Hovering over something would map back to your source tree, which (as you are building it) you would load up with any information that you want to display.
Maybe it's overkill, but it sounds like a fun project.
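To make the scanner part concrete, here is a minimal sketch (token kinds and all names are invented for the example) that tokenizes a string while recording each token's line/column — exactly the mapping a hover lookup would consult:

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Sketch of a scanner whose tokens remember where they came from, so an
// editor can map a hover position back to a token. Token kinds are
// deliberately simplified: "words" and single punctuation characters.
struct Token {
    std::string text;
    int line;   // 1-based line in the source
    int column; // 1-based column of the token's first character
};

std::vector<Token> scan(const std::string& src) {
    std::vector<Token> tokens;
    int line = 1, col = 1;
    std::size_t i = 0;
    while (i < src.size()) {
        char c = src[i];
        if (c == '\n') { ++line; col = 1; ++i; }
        else if (std::isspace(static_cast<unsigned char>(c))) { ++col; ++i; }
        else {
            std::size_t start = i;
            int startCol = col;
            if (std::isalnum(static_cast<unsigned char>(c)) || c == '_') {
                // Word token: letters, digits, underscores.
                while (i < src.size() &&
                       (std::isalnum(static_cast<unsigned char>(src[i])) ||
                        src[i] == '_')) {
                    ++i; ++col;
                }
            } else { ++i; ++col; } // single-char token: brackets, operators...
            tokens.push_back({src.substr(start, i - start), line, startCol});
        }
    }
    return tokens;
}
```

Each `Token` would grow extra fields (kind, scope, symbol-table link) as the "intellisense" gets smarter, but the line/column mapping is the foundation.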
This sounds related to this question:
https://softwareengineering.stackexchange.com/questions/189471/how-do-ide-s-provide-auto-completion-instant-error-checking-and-debugging
The accepted answer of that question recommends this link which I found very interesting:
http://msdn.microsoft.com/en-us/magazine/cc163781.aspx
In a nutshell, most IDEs generate a parse tree from the code, and that is what they store and manage.
I'm writing a small standalone tool for Linux which needs to read a huge XML file.
The XML file has a simple structure, and a progressive or streaming (line-by-line) parser is suitable for it.
I want to use a lightweight class library such as TinyXML, but I don't know whether it supports progressive parsing.
If the answer is "yes", do you have a sample? And if the answer is "no", do you know an alternative that is a small, header-only class library?
Update: How about RapidXML or pugiXML?
Sounds like libxml's XmlReader interface is just what you want. Fast, simple, and streaming. Light-weight and XML don't mix, unfortunately. I prefer XmlReader's pull model to SAX's push model, but they'll both do what you want.
In the pull model, you call a function and get the next node, then check yourself whether it matches. In the push model, you supply callbacks, and the SAX parser calls them as it encounters matching nodes.
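To make the pull model concrete, here is a toy reader over a drastically simplified XML subset (tags and text only — no attributes, comments, or CDATA; all names invented). The caller drives a `read()` loop, which mirrors the shape of libxml2's `xmlTextReaderRead()` loop:

```cpp
#include <cassert>
#include <string>

// Toy pull reader: each call to read() yields the next node, or false at
// end of input. Only a tiny XML subset is handled -- this illustrates the
// call shape of the pull model, not real XML parsing.
struct Node {
    enum Kind { StartTag, EndTag, Text } kind;
    std::string value;
};

class PullReader {
    std::string src_;
    std::size_t pos_ = 0;
public:
    explicit PullReader(std::string src) : src_(std::move(src)) {}

    bool read(Node& out) {
        if (pos_ >= src_.size()) return false;
        if (src_[pos_] == '<') {
            std::size_t close = src_.find('>', pos_);
            if (close == std::string::npos) return false; // malformed: stop
            bool end = src_[pos_ + 1] == '/';
            std::size_t nameStart = pos_ + (end ? 2 : 1);
            out = {end ? Node::EndTag : Node::StartTag,
                   src_.substr(nameStart, close - nameStart)};
            pos_ = close + 1;
        } else {
            std::size_t lt = src_.find('<', pos_);
            if (lt == std::string::npos) lt = src_.size();
            out = {Node::Text, src_.substr(pos_, lt - pos_)};
            pos_ = lt;
        }
        return true;
    }
};
```

The caller writes `Node n; while (r.read(n)) { ... }` and decides at each step whether the node is interesting — the "pull" in pull parsing.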
TinyXML, last I checked, is not standards-compliant -- I would avoid it.
Are there any C/C++ libraries available that can be used in creating, loading and saving files in .odt-format?
Alternatively, where can I find tutorial-like information on implementing .odt(/ODF) specifications?
This site: OASIS OpenDocument Essentials seems to cover the problem, including how-to examples and shortcuts. It's quite nicely done and easy to follow.
Flexibility can be perceived as complexity. If you don't need the flexibility, create a template ODT and just fill in the content as needed. As mentioned, there exist XML parsers to actually handle IO. ODT isn't a plaintext file, so some complexity/difficulty is expected. – Ioan
From the link:
The Virtues of Cheating
As you begin to work with OpenDocument files, you may want to write a program that constructs a document with some feature that isn’t explained in this book—this is, after all, an “essentials” book. Just start OpenOffice.org or KOffice, create a document that has the feature you want, unpack the file, and look for the XML that implements it. To get a better understanding of how things work, change the XML, repack the document, and reload it. Once you know how a feature works, don’t hesitate to copy and paste the XML from the OpenDocument file into your program. In other words, cheat. It worked for me when I was writing this book, and it can work for you too!
I need to associate textual data with the lines in a source code file. Something like "these lines are to create a Myclass object" -> lines from 20 to 32.
The problem is that this kind of line tracking is highly fragile: it only takes someone adding a newline to break the correspondence between the associated text and the lines.
I need an idea to make this link a bit stronger (nothing heavyweight, but at least resistant to a few line shifts); suggestions are very welcome.
An easy solution would be to hash (md5 is pretty easy and accessible) the lines and store the hash along the data.
You can then check the hash against the possibly modified file. If it matches, great, otherwise begin checking previous/next lines for a match.
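A minimal sketch of that idea follows. It uses `std::hash` purely as a stand-in for md5 (`std::hash` is not stable across runs or platforms, so a real tool would use a fixed hash); the function names are invented for the example:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Store hashLine(line) alongside each annotation; after an edit, relocate
// the annotated line by checking the original position first, then
// spiraling outward through nearby lines.
using LineHash = std::size_t;

LineHash hashLine(const std::string& line) {
    return std::hash<std::string>{}(line); // stand-in for md5
}

// Find the line matching `expected` closest to `origLine`, within `radius`
// lines in either direction; return -1 if nothing nearby matches.
int relocate(const std::vector<std::string>& lines,
             int origLine, LineHash expected, int radius = 5) {
    for (int d = 0; d <= radius; ++d) {
        for (int cand : {origLine - d, origLine + d}) {
            if (cand >= 0 && cand < static_cast<int>(lines.size()) &&
                hashLine(lines[cand]) == expected)
                return cand;
        }
    }
    return -1;
}
```

Hashing a small window of surrounding lines instead of a single line makes the match more reliable when the file contains duplicate lines.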
One approach might be to enlist the help of a source control system. For example, using Git, you could associate textual data with a specific version of the source code. If the source code is changed, you can use a "diff" algorithm to discover which line(s) have been added or removed. Using that delta information, you can then update your annotation lines (for example, adding a line at the top of the file would cause your 20-32 annotation to move to 21-33).
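The shifting step that diff-based approach implies can be sketched like this (insertions only; deletions are the mirror image, and the names are made up for the example):

```cpp
#include <cassert>

// An annotation covers an inclusive, 1-based range of lines.
struct Annotation { int first; int last; };

// Someone inserted `count` lines at line `atLine`: ranges entirely below
// the insertion move down, ranges containing it grow, ranges above it are
// untouched. A real tool would drive this from diff hunks.
Annotation shiftForInsert(Annotation a, int atLine, int count) {
    if (atLine <= a.first)      { a.first += count; a.last += count; }
    else if (atLine <= a.last)  { a.last += count; }
    return a;
}
```

This matches the example in the answer: inserting one line at the top of the file moves a 20–32 annotation to 21–33.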
Are you trying to implement some form of automatic documentation system? If so, then basing this around line numbering is indeed fragile. I would suggest using some sort of markup to associate the text with semantic blocks of code that are robust when moved or altered. Perhaps something along the lines of doxygen might be what you are looking for.
Well, a lot of questions have been asked about parsing XML in C++ and so on...
But instead of a generic problem, mine is very specific.
I am asking for a very efficient XML parser for C++. In particular, I have a very, very big XML file to parse.
My application must open this file and retrieve data. It must also insert new nodes and save the final result in the file again.
To do this I initially used rapidxml, but it requires me to open the file and parse all of its content (the library has no way to access the file without loading the entire tree first), then edit the tree, modify it, and store the final tree by overwriting the file. That consumes too many resources.
Is there an XML parser that does not require loading the entire file, but still lets me quickly insert new nodes and retrieve data? Can you suggest solutions for this problem?
You want a streaming XML parser rather than what is called a DOM parser.
There are two types of streaming parsers: pull and push. A pull parser is good for quickly writing XML parsers that load data into program memory. A push parser is good for writing a program to translate one document to another (which is what you are trying to accomplish). I think, therefore, that a push parser would be best for your problem.
In order to use a push parser, you need to write what is essentially an event handler for parsing events. By "parsing event", I mean events like "start tag reached", "end tag reached", "text found", "attribute parsed", etc.
I suggest that as you read in the document, you write out the transformed document to a separate, temporary file. Thus, your XML parsing event handlers will need to be written so that they are stateful and write out the XML of the translated document incrementally.
Three excellent push-parser libraries for C++ are Expat, Xerces-C++, and libxml2.
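To illustrate the handler shape without committing to a particular library, here is a toy push parser over a drastically simplified XML subset (tags and text only, no attributes; all names invented). Registering callbacks like this mirrors how Expat's `XML_SetElementHandler` works:

```cpp
#include <cassert>
#include <functional>
#include <string>

// The caller registers handlers; the parser invokes them as it scans.
// This is the "push" model in miniature, not a real XML parser.
struct Handlers {
    std::function<void(const std::string&)> onStart; // start-tag name
    std::function<void(const std::string&)> onEnd;   // end-tag name
    std::function<void(const std::string&)> onText;  // character data
};

void parse(const std::string& src, const Handlers& h) {
    std::size_t pos = 0;
    while (pos < src.size()) {
        if (src[pos] == '<') {
            std::size_t close = src.find('>', pos);
            if (close == std::string::npos) return; // malformed: stop
            if (src[pos + 1] == '/')
                h.onEnd(src.substr(pos + 2, close - pos - 2));
            else
                h.onStart(src.substr(pos + 1, close - pos - 1));
            pos = close + 1;
        } else {
            std::size_t lt = src.find('<', pos);
            if (lt == std::string::npos) lt = src.size();
            h.onText(src.substr(pos, lt - pos));
            pos = lt;
        }
    }
}
```

For the translation use case above, the handlers would append the (possibly modified) pieces to the temporary output file as they fire, so nothing accumulates in memory.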
Search for "SAX parser". They are mostly tokenizers, i.e. they emit tag by tag without building a tree.
SAX parsers are faster than DOM parsers: a DOM parser reads the entire file and builds an in-memory representation of the XML document before you can touch it, whereas a SAX parser just fires events as it reads through the file and never builds the whole tree.
As you mentioned, Xerces is a good C++ SAX parser.
I would recommend looking into ways of breaking the XML document into smaller XML documents as that seems to be part of your problem.
Okay, here is one off the beaten track. I looked at this but haven't really used it myself; it's called asmxml. These boys claim performance bar none; the downside is that you need x86 assembler.
If you really seek a high-performance XML stream parser, then libhpxml is likely the right thing for you.
I’m convinced that no XML library exists that allows you to modify a file without loading it first. This simply isn’t possible because files don’t work that way: you cannot insert (or remove) bytes in the middle of a file. You can only overwrite a block of identical size or append at the end, but your request would require inserting or removing in the middle of the file.
Reading only parts of an XML file may be possible. But writing … no way.
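The underlying file semantics are easy to demonstrate (the helper names and the throwaway temp-file name below are invented for the example): overwriting an equally sized block in place works fine, but there is no primitive for inserting bytes in the middle, which is exactly why XML libraries end up rewriting the whole file:

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Overwrite bytes in place at a given offset: allowed by the file API,
// as long as the new data is the same size as what it replaces.
void overwriteAt(const std::string& path, std::streamoff off,
                 const std::string& data) {
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
    f.seekp(off, std::ios::beg);
    f.write(data.data(), static_cast<std::streamsize>(data.size()));
}

std::string readAll(const std::string& path) {
    std::ifstream f(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(f),
                       std::istreambuf_iterator<char>());
}
```

Inserting in the middle, by contrast, means copying everything after the insertion point to a new position — there is no `insertAt` in any OS file API.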
Go for template libraries as much as possible, like Boost::property_tree, Boost::XMLParser, or POCO::XML; Folly also has an XML parser in it.
Avoid old C libraries; they reflect old code designs.
Some say the QtXML module performs well on huge XML files.