I am reading a Clojure source code file using Clojure's read-string.
However, single-line comments are naturally ignored by the reader, so when I generate new source code, those comments are no longer there.
Is there a way I can preserve these comments?
rewrite-clj (https://github.com/xsc/rewrite-clj) seems perfect for what you are trying to do: it preserves comments.
I have a large file of source code that I need to parse some specific text out of, and I want to get it done as fast as possible. What would be the fastest way to do this in Swift? These are all the options I could think of:
1. Using a third-party library of string functions. I've tried this, and it works well, but I imagine it is generally much slower than lower-level methods, unless there are some notably fast ones out there specifically for Swift.
2. Using a third-party HTML parser. I've looked into a few, but I'm not sure whether they will suit my needs. Before I proceed with this, I want to know whether these are generally faster, whether there are any notably fast ones out there, and whether I can tweak them to extract exactly what I want from the source code.
3. Using String or NSString. From what I understand, String and NSString should not differ in speed. I am quite comfortable with this approach, and it's lower level than some of the others, so should I expect fairly fast performance?
4. Using regular expressions. I've been told that since these are lower level, they should ideally be the fastest. I've used regular expressions before, but not on iOS. Is it easy to do string parsing with NSRegularExpression, and is it faster?
Thank you!
Came upon this link while researching your question: http://benedictcohen.co.uk/blog/archives/74
The author explains an older approach to what @CodaFi suggested, but there is a relevant update at the end that you should check out:
The easiest way to parse HTML is to treat it as XML and use the NSXMLParser. iOS comes with LibTidy which is capable of fixing a multitude of markup sins. Use LibTidy to create clean XML and pass this XML to NSXMLParser. Only use the approach outlined above if it’s not possible to use NSXMLParser.
So perhaps options 4 or 5 are worth checking out?
I'm going to create a Javadoc look-alike for the language I'm mainly using, but I was wondering: is it worth using a parser generator for this? The main reason to use a parser generator is that I could use templates for the HTML code, which could then be exported. I could also use PDF templates if I need them.
Thanks,
William v. Doorn
If all you are going to do is extract the "Javadoc" comments, you don't need a full parser; after all, you only need to recognize the comments and regexps will likely do fine.
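To illustrate the regex-only route, here is a minimal sketch. It assumes C++ and std::regex, since the target language isn't stated. The function name extractDocComments is hypothetical. It collects every /** ... */ block but knows nothing about string literals, so a "/**" inside a quoted string would be mistaken for a comment.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return every /** ... */ doc comment in `source`.
// Limitation: string literals are not recognized, so "/**" inside quotes
// would be treated as the start of a comment.
std::vector<std::string> extractDocComments(const std::string& source) {
    // "/**" not immediately followed by "/" (so an empty "/**/" is skipped),
    // then the shortest run of any characters, newlines included, up to "*/".
    static const std::regex docComment(R"(/\*\*(?!/)[\s\S]*?\*/)");
    std::vector<std::string> comments;
    for (std::sregex_iterator it(source.begin(), source.end(), docComment), end;
         it != end; ++it) {
        comments.push_back(it->str());
    }
    return comments;
}
```

Each returned string still includes the delimiters and leading asterisks. Stripping those and splitting out tags such as @param would be a second, equally simple pass.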
If you want to extract information from the code and use it to augment the Javadoc comments, you'll need not only a parser but also name and type resolution.
You can see the results of combining parsing, name/type resolution, and Javadoc comment extraction in the Java Source Code Browser, which produces Javadoc results along with fully hyperlinked source code cross-referenced into the Javadocs.
The machinery that produced this is a generalization of something like ANTLR. But there was little need to use code templates to produce the HTML itself; all the hard work is in parsing and in collecting facts across the symbol tables.
I want to parse the following XML tag
<gains>5.640244e+03 5.640322e+03 5.640402e+03 5.640480e+03 5.640560e+03 </gains>
using C++.
Can anyone help me?
No, because you haven't told us what the output of the parser should be for this example.
To design a parser you should give the rules of the grammar (informally is fine) and then one or more examples. The examples aren't the grammar, however.
It's possible that the OP needs help with C++ stream processing of numbers in scientific notation. In that case, see http://www.cplusplus.com/reference/iostream/manipulators/scientific/ for some helpful hints.
Otherwise, I agree with Ben Voigt: more information/context is needed before we can be of further assistance.
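If the goal is simply to get the five numbers out as doubles, stream processing is enough. This is a minimal sketch that assumes the tag occurs once and is not nested; parseGains is a hypothetical name. For arbitrary XML documents, a real XML parser is the safer choice.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: return the whitespace-separated numbers between
// <gains> and </gains>. Assumes a single, non-nested <gains> element.
std::vector<double> parseGains(const std::string& xml) {
    const std::string open = "<gains>";
    const std::string close = "</gains>";
    std::vector<double> values;

    const auto start = xml.find(open);
    if (start == std::string::npos) return values;      // no opening tag
    const auto bodyBegin = start + open.size();
    const auto end = xml.find(close, bodyBegin);
    if (end == std::string::npos) return values;        // no closing tag

    // operator>> for double already understands scientific notation
    // such as 5.640244e+03, so no special manipulator is needed on input.
    std::istringstream body(xml.substr(bodyBegin, end - bodyBegin));
    double v;
    while (body >> v) values.push_back(v);
    return values;
}
```

The std::scientific manipulator linked above matters mainly when writing the values back out in the same notation.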
I have a huge set of log lines and I need to parse each line (so efficiency is very important).
Each log line is of the form
cust_name time_start time_end (IP or URL)*
So: an IP address, a time, a time, and a possibly empty list of IP addresses or URLs separated by semicolons. If there is only one IP or URL in the list, there is no separator; if there is more than one, they are separated by semicolons.
I need a way to parse this line and read it into a data structure. time_start or time_end could be either system time or GMT. cust_name could also consist of multiple strings separated by spaces.
I can do this by reading character by character and essentially writing my own parser.
Is there a better way to do this?
Maybe the Boost.Regex library will help you:
http://www.boost.org/doc/libs/1_38_0/libs/regex/doc/html/index.html
I've had success with Boost Tokenizer for this sort of thing. It helps you break an input stream into tokens with custom separators between the tokens.
Using regular expressions (boost::regex is a nice implementation for C++), you can easily separate the different parts of your string (cust_name, time_start, ...) and find all the URLs/IPs.
The second step is more detailed parsing of those groups, if needed. Dates, for example, can be parsed with the Boost.Date_Time library (writing a custom parser if the string format isn't standard).
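A first-step regex for one line might look like this sketch. It uses std::regex, which was standardized from Boost.Regex and has a very similar interface, so the pattern carries over to boost::regex. It assumes time_start and time_end are all-digit timestamps and that cust_name contains no all-digit words. LogRecord and parseLogLine are hypothetical names.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical record type for one log line.
struct LogRecord {
    std::string custName;   // may contain spaces
    std::string timeStart;
    std::string timeEnd;
    std::string addresses;  // raw "a;b;c" list, possibly empty
};

// Assumptions: both times are all-digit tokens (e.g. epoch seconds), and
// cust_name has no all-digit words. Adapt the (\d+) groups to your real
// time format (system time vs GMT).
std::optional<LogRecord> parseLogLine(const std::string& line) {
    // 1: cust_name (lazy, so it stops before the first "digits digits" pair)
    // 2, 3: the two times
    // 4: optional IP/URL list, assumed to contain no spaces
    static const std::regex pattern(
        R"(^(.+?)\s+(\d+)\s+(\d+)(?:\s+(\S+))?\s*$)");
    std::smatch m;
    if (!std::regex_match(line, m, pattern)) return std::nullopt;
    // An unmatched optional group yields an empty string from str().
    return LogRecord{m[1].str(), m[2].str(), m[3].str(), m[4].str()};
}
```

The second step the answer mentions, such as splitting the address list or converting the times, would then work on the captured strings.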
Why do you want to do this in C++? It sounds like an obvious job for something like Perl.
Consider using a regular expression library...
Custom input demands a custom parser; otherwise you have to hope you live in an ideal world where errors don't exist. That's especially true if you want efficiency. Posting some code may help.
For such a simple grammar you can use split; take a look at http://www.boost.org/doc/libs/1_38_0/doc/html/string_algo/usage.html#id4002194
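For the semicolon-separated IP/URL list, a split is all that's needed. The sketch below is a standard-library stand-in for boost::split, not the Boost call itself; splitAddresses is a hypothetical name. An empty input gives an empty list, which covers the "possibly empty" case in the question.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split "a;b;c" on ';' using only the standard library.
// Empty pieces (e.g. from a stray ";;") are dropped.
std::vector<std::string> splitAddresses(const std::string& list) {
    std::vector<std::string> parts;
    std::istringstream in(list);
    std::string item;
    while (std::getline(in, item, ';')) {
        if (!item.empty()) parts.push_back(item);
    }
    return parts;
}
```

With Boost available, boost::split(parts, list, boost::is_any_of(";")) does the same job, except that it keeps empty pieces unless you filter them.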
I have a huge set of log lines and I need to parse each line (so efficiency is very important).
Just be aware that C++ won't help much in terms of efficiency in this situation. Don't be fooled into thinking that your program will have high performance just because your parsing code in C++ is fast!
The efficiency you really need here is not the performance at the "machine code" level of the parsing code, but at the overall algorithm level.
Think about what you're trying to do.
You have a huge text file, and you want to convert each line to a data structure.
Storing a huge data structure in memory is very inefficient, no matter what language you're using!
What you need to do is "fetch" one line at a time, convert it to a data structure, and deal with it. Only after you're done with that data structure do you fetch the next line, convert it, deal with it, and repeat.
If you do that, you've already solved the major bottleneck.
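The one-line-at-a-time loop can be sketched in C++ as follows. forEachLine and the handleLine callback are hypothetical names. The callback is where you would parse the line, use the result, and let it go before the next line is read.

```cpp
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this on in-memory text
#include <string>

// Read `in` one line at a time and hand each line to `handleLine`, so only
// the current line (not the whole file) is held in memory.
// Returns the number of lines processed.
std::size_t forEachLine(std::istream& in,
                        const std::function<void(const std::string&)>& handleLine) {
    std::string line;        // reused buffer across iterations
    std::size_t count = 0;
    while (std::getline(in, line)) {
        handleLine(line);    // parse, use, and discard the record here
        ++count;
    }
    return count;
}
```

For a real log you would pass a std::ifstream opened on your file. Memory use then stays roughly constant regardless of file size, which is the algorithm-level point of the answer.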
For parsing each line of text, the format of your data seems quite simple; check out a similar question that I asked a while ago: C++ string parsing (python style)
In your case, I suppose you could use a string stream (std::istringstream) and use the >> operator to read the next "thing" in the line.
See this answer for example code.
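Here is one way the >> approach could handle this format. It is a sketch under stated assumptions, not the code from the linked answer. cust_name has a variable number of words, so the sketch reads all tokens first and assigns fields from the right. It assumes the times are all-digit tokens and the IP/URL list contains no spaces. LogFields and parseWithStream are hypothetical names.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct LogFields {                          // hypothetical record type
    std::string custName, timeStart, timeEnd, addresses;
};

// True if s is non-empty and made only of decimal digits.
static bool allDigits(const std::string& s) {
    return !s.empty() && std::all_of(s.begin(), s.end(),
        [](unsigned char c) { return std::isdigit(c) != 0; });
}

// Tokenize with >>, then work from the right-hand end, because only
// cust_name has a variable number of words. Assumes all-digit times.
bool parseWithStream(const std::string& line, LogFields& out) {
    std::istringstream in(line);
    std::vector<std::string> tok;
    for (std::string t; in >> t;) tok.push_back(t);

    std::size_t n = tok.size();
    out.addresses.clear();
    if (n > 0 && !allDigits(tok[n - 1])) {  // trailing IP/URL list present
        out.addresses = tok[n - 1];
        --n;
    }
    // Need at least one name word plus two times.
    if (n < 3 || !allDigits(tok[n - 2]) || !allDigits(tok[n - 1])) return false;
    out.timeStart = tok[n - 2];
    out.timeEnd = tok[n - 1];
    out.custName = tok[0];
    for (std::size_t i = 1; i < n - 2; ++i) out.custName += ' ' + tok[i];
    return true;
}
```

The LogFields object can be reused across calls, which fits the line-at-a-time loop described above.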
Alternatively (I didn't want to delete this part!):
If you can write this in Python, it will be much simpler. I don't know your situation (it seems you're stuck with C++), but still:
Look at this presentation on doing these kinds of tasks efficiently using Python generator expressions: http://www.dabeaz.com/generators/Generators.pdf
It's a worthwhile read.
At slide 31 he deals with what seems to be something very similar to what you're trying to do.
It'll at least give you some inspiration.
It also demonstrates quite strongly that performance is gained not by the particular string-parsing code, but by the overall algorithm.
You could try using a simple lex/yacc (or flex/bison) grammar to parse this kind of input.
The parser you need sounds really simple. Take a look at this. Any compiled language should be able to parse it at very high speed; after that, it's a question of what data structure you build and save.