I'm using <regex> from Visual Studio 2010.
I understand that a regex is compiled when I create the regex object. There is no explicit compile method as in other languages and libraries, but I think that's how it works. Am I right?
I need to store a large number of these compiled regexes in a file, so that later I can just read a block of memory and get my compiled regex back.
I can't figure out how to do this. I found that it is possible in PCRE, but that's a Linux library. There is a Windows version, but it's 3 years old and I would like to use a more high-level approach (there isn't a C++ wrapper in the Windows version).
So is it possible to save a std::regex or boost::regex (they're the same, right?) as a chunk of memory and then simply reuse it later?
Or is there another simple library for Windows that allows this?
EDIT:
Thanks for the great answers. I'll first check whether simply storing each regex as a string is sufficient; if that is still too slow, I'll test and compare it with the old PCRE library.
You can use the regex strings themselves as the 'serialized' regex - just save those to a file, then when you want to reconstitute the regex objects, just pass the saved strings to the regex constructor.
The only drawbacks I can think of:
it might take some more time to 'reconstitute' the regex database, but I really don't know how much (I suspect that the time would be dominated by I/O anyway, so I'm not sure if the difference would be significant - I really don't know how much overhead there is in regex compilation by the boost library's implementation)
if you want the stored regexes obfuscated, you'll have to do that yourself instead of relying on the compiled-binary state to be unreadable
The advantages to this are:
it's 100% supported, so it's not fragile/brittle
it's portable across compiler versions and platforms (ie., not fragile/brittle)
Is the time to compile the regex database (excluding I/O) really significant enough to warrant trying to save the compiled state?
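The string-based approach described above could look roughly like this. It's only a minimal sketch, assuming a C++11 compiler with std::regex and one pattern per line (so patterns must not contain newlines); the helper names save_patterns and load_regexes are made up for illustration.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Write one pattern per line. Assumption: no pattern contains a newline.
void save_patterns(std::ostream& out, const std::vector<std::string>& patterns) {
    for (const std::string& p : patterns) {
        assert(p.find('\n') == std::string::npos);
        out << p << '\n';
    }
}

// Read the patterns back and recompile each one with the regex constructor.
std::vector<std::regex> load_regexes(std::istream& in) {
    std::vector<std::regex> result;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) result.emplace_back(line);
    }
    return result;
}
```

The same two functions work with std::ofstream and std::ifstream, so timing load_regexes on the real pattern file would show whether recompilation is actually a bottleneck.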
I don't think it can be done without modifying the boost library to support it.
I don't know specifically how the boost regex library is implemented, but most regex libraries compile things to a binary blob that's then interpreted later as a series of instructions for a sort of limited virtual machine.
If boost's regex library is implemented in this way, serializing it would be relatively easy. Just get at the binary blob somehow and dump it to disk. The existence of the POSIX regex API for the boost library tells me that this is probably how it's implemented.
OTOH, another way to implement it (and a not so common way) is by generating something like an abstract syntax tree for the regex. This means that the individual pieces of the regex would be represented by their own objects and those objects would be linked together into some larger structure that represented the whole regex.
If boost does it this way then serialization will be very complex.
This is not possible with C++, but what I really wish happened is that boost could compile constant string regular expressions at compile time with template meta-programming. The reason this is not possible is that it isn't possible to iterate over the contents of a string (even a constant string) with a template.
I'm not sure, but did you take a look at boost::serialization, which can serialize a C++ object?
Possible Duplicate:
compile and run c++ code runtime
I want to take as an input an expression from the user as a string and compile it into a callable c++ function. Are there any tools that allow you to do this easily?
Basically, How do I compile an Expression Tree into a callable method, C#? seems similar to what I want to do except that I need to do this in c++ and not c#.
I can certainly make a sort of generic evaluator using lex and yacc but I don't want to have to parse the string every time. Basically this expression will run in a critical inner loop so I'm looking for a way to "compile" it at run-time.
It's not easy... If you want my two cents, I will follow these steps:
Create an interface for the code that you must create at runtime. At first, you create an interface for what you can do. For example your class must inherit from a pure virtual base class that will represent your interface. Take care that your program will use not arbitrary code, but code created in a specific way, because it must know how to use it.
Call the compiler from inside your program. The compiler should create a library from your source code. You can use a predefined project that you store somewhere, and then replace its source file with your own. So it can be easy to obtain a right library.
Put your library in a specified source where you can find it.
Load the library at runtime. If you search, you will see that it's possible to load dynamic libraries at runtime, not only at linking time (in this way, for example, you can create plugins for programs). So your program can load your library and use it. For example you can find some information here.
But, as others have said, it's not a trivial task.
EDIT: Another solution is to look at a parser library like boost::spirit::qi, which, used well, can give extremely helpful results.
You have to parse the expression to an abstract syntax tree and walk it or evaluate it in-place. Something like this should satisfy your needs for a simple mathematical expression.
You can write your own mini-interpreter, with commands the same as C++ (not all of them). Of course, your compiler will optimize it, but I'm not sure how much. I did this for assembly in QBasic (mov, add, sub...), but it was quite slow because it was an interpreter running inside an interpreter :D
Did you think about Evolutionary computation and fitness functions? Worth looking at.
You can create a data structure that represents your parsed expression tree, and the overhead of evaluating that at runtime will be small compared to parsing the string every time.
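A parse-once, evaluate-many expression tree along these lines might look like the following. This is only a sketch under assumptions of my own: a single variable named x, the four basic operators plus parentheses, no unary minus, and the names Node, Parser and eval are invented for the example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// One node of the parsed tree: a number, the variable x, or a binary operator.
struct Node {
    char op;       // 'n' = number, 'x' = variable, otherwise one of + - * /
    double value;  // only meaningful when op == 'n'
    std::unique_ptr<Node> lhs, rhs;
};
typedef std::unique_ptr<Node> NodePtr;

// Walk the tree; this is the only work done inside the hot loop.
double eval(const Node& n, double x) {
    switch (n.op) {
        case 'n': return n.value;
        case 'x': return x;
        case '+': return eval(*n.lhs, x) + eval(*n.rhs, x);
        case '-': return eval(*n.lhs, x) - eval(*n.rhs, x);
        case '*': return eval(*n.lhs, x) * eval(*n.rhs, x);
        default:  assert(n.op == '/'); return eval(*n.lhs, x) / eval(*n.rhs, x);
    }
}

// Recursive-descent parser for + - * / ( ) with numbers and the variable x.
class Parser {
public:
    explicit Parser(const std::string& text) : text_(text), pos_(0) {}

    NodePtr parse() {
        NodePtr e = sum();
        skip_spaces();
        if (pos_ != text_.size()) throw std::runtime_error("unexpected trailing input");
        return e;
    }

private:
    std::string text_;
    std::size_t pos_;

    static NodePtr make(char op, double v, NodePtr l, NodePtr r) {
        return NodePtr(new Node{op, v, std::move(l), std::move(r)});
    }
    void skip_spaces() {
        while (pos_ < text_.size() && std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
    }
    bool accept(char c) {
        skip_spaces();
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    NodePtr sum() {  // sum := product (('+' | '-') product)*
        NodePtr left = product();
        for (;;) {
            char op = accept('+') ? '+' : accept('-') ? '-' : 0;
            if (!op) return left;
            NodePtr right = product();
            left = make(op, 0.0, std::move(left), std::move(right));
        }
    }
    NodePtr product() {  // product := primary (('*' | '/') primary)*
        NodePtr left = primary();
        for (;;) {
            char op = accept('*') ? '*' : accept('/') ? '/' : 0;
            if (!op) return left;
            NodePtr right = primary();
            left = make(op, 0.0, std::move(left), std::move(right));
        }
    }
    NodePtr primary() {  // primary := '(' sum ')' | 'x' | number
        if (accept('(')) {
            NodePtr inner = sum();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return inner;
        }
        if (accept('x')) return make('x', 0.0, nullptr, nullptr);
        const char* start = text_.c_str() + pos_;
        char* end = nullptr;
        double v = std::strtod(start, &end);
        if (end == start) throw std::runtime_error("expected a number");
        pos_ += static_cast<std::size_t>(end - start);
        return make('n', v, nullptr, nullptr);
    }
};
```

You would call Parser(user_input).parse() once outside the loop and then only eval(*tree, x) inside it. It is still an interpreter, not generated machine code, so whether it is fast enough has to be measured.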
Actually getting a callable method in C++ will be quite difficult, in that you would have to generate object code and dynamically load it into your program. This would duplicate a lot of what the whole compiler tool-chain does.
I'm using Google's C++ interface to PCRE to match a single regex multiple times (possibly thousands of times). From reading the PCRE manual, it seems like a good idea to let PCRE 'study' (spend time optimizing) the regex, however, I can't seem to find a way to do that with the C++ wrapper. The pcrecpp.h doesn't mention studying at all.
Is using pcre_study() worthwhile, and if so, how can it be combined with pcrecpp and its RE class?
From a quick scan of the PCRE++ source code, it appears that "studying" is impossible with this API because the compiled RE (pcre*) member of the RE wrapper object is private and there's no way to get it out or reset it.
If you want to know whether the studying optimization is worthwhile with your REs, the easiest option that I see is to copy pcrecpp.{cc,h} into your project and hack it in; the C++ API is just some thin wrapper code. You might even want to submit a patch upstream if, like me, you like to litter open source projects with your name and copyright ;)
For the benefit of people hitting on this question from a web search, I will point out that the ability to "study" an RE has been removed from PCRE2:
Explicit "studying" of compiled patterns has been abolished - it now always
happens automatically. JIT compiling is done by calling a new function,
pcre2_jit_compile() after a successful return from pcre2_compile().
Reference: https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html
I wanted to do some regular expressions in C++ so I looked on the interwebz (yes, I am a beginner/intermediate with C++) and found this SO answer.
I really don't know what to choose between boost::regex and boost::xpressive. What are the pros/cons?
I also read that boost::xpressive opposed to boost::regex is a header-only library. Is it hard to statically compile boost::regex on Linux and Windows (I almost always write cross-platform applications)?
I'm also interested in comparisons of compile time. I have a current implementation using boost::xpressive and I'm not too content with the compile times (but I have no comparisons to boost::regex).
Of course I'm open for other suggestions for regex implementations too. The requirements are free (as in beer) and compatible with http://nclabs.org/license.php.
One fairly important difference is that Boost Regex can support linking to ICU for Unicode support (character classes, etc) Boost Regex ICU Support.
As far as I can tell, Boost Xpressive doesn't have this kind of support built-in.
Well, if you need to create a regular expression at runtime (i.e. letting the user type in a regular expression to search for), you can't use xpressive's static regexes, as they are compile-time only.
On the other hand, since it is a compile-time construct, it should benefit more from your optimizer than regex does.
I do enough stuff with Boost.MPL, StateChart, and Spirit that 220KB of compiler warning and errors don't really bother me much. If that sounds like hell to you, stick with Boost.Regex.
If you do use xpressive, I highly recommend turning on -Wfatal-errors as this will stop compilation (and further errors) after the first 'error:' line.
For compilation time, it's no contest. Boost.Regex will be faster*. The fact that xpressive uses MPL will cause compile times to be dramatically increased.
*This assumes you only build the dll/so once
When using the Boost libraries I tend to lean toward the use of header-only libraries, due to cross-platform compatibility issues. The downside of that is that when your compiler reports an error related to your use of the library, the header-only output tends toward the arcane.
Assuming you're using a reasonably recent compiler, there's a pretty decent chance that it includes a regex package already. Try just doing #include <regex> and see if the compiler finds it.
The only trick to things is that it could be in either (or both) of two different namespaces. Regexes were included in TR1 of the C++ standard, and are also in (the final drafts of) C++11. The TR1 version is in a namespace named tr1, where the standard version is in std, just like the rest of the library.
FWIW, this is essentially the same as Boost regex, not Boost Xpressive.
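For what it's worth, a quick way to try it out is something like the sketch below. It assumes the C++11 version in namespace std (with a TR1-only compiler the names would live in std::tr1 instead), and first_number is just a made-up helper for the example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the first run of digits in `text`, or an empty string if there is none.
std::string first_number(const std::string& text) {
    static const std::regex digits("[0-9]+");  // compiled once, reused on every call
    std::smatch m;
    return std::regex_search(text, m, digits) ? m.str() : std::string();
}
```

If this compiles and links, the compiler ships a usable <regex> and nothing from Boost needs to be built.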
I would like to supplement the other answers by going deeper into the topic of compile-time regular expressions (CTR) vs. run-time (dynamic) regular expressions (RTR) in a more theoretical way (the OP's question implies this topic indirectly, IMHO). Run-time regexes are better known and more popular (most languages' core library implementations), I suppose for historical reasons. Unlike CTR, they work when the regular expression is only determined at run time. Both work on a finite state machine basis.
RTR are "compiled" and interpreted by a kind of universal finite state machine (universal meaning that it is an interpreter whose scheme is supplied at run time: the regex string you pass is "compiled" into some internal data structure, which is then interpreted at run time).
CTR, by contrast, are "compiled" at compile time and are specific to a particular regex, so you can't use them when the regex is supplied at run time (applications like text editors or file/internet search engines).
But they are a priori more efficient (theoretically, at least), since a finite state machine customized at compile time will be more efficient than an interpreter driven by a table-preset scheme of that machine (similar cases are reflection field access vs. compile-time access, or a specialized function optimized for some fixed parameter, as pointed out there). Another advantage is compile-time syntax checking. CTR can be implemented through meta-programming and/or code generation.
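To make the difference concrete, here is a rough sketch of the two styles for the pattern [0-9]+\.[0-9]+. The hand-written state machine only stands in for what a code generator or meta-program might produce; it is not the output of any particular tool, and both function names are invented for the example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// "Compile-time" style: the state machine for [0-9]+\.[0-9]+ is fixed in the code.
bool matches_decimal_fsm(const std::string& s) {
    enum State { Start, IntPart, Dot, FracPart };
    State st = Start;
    for (char c : s) {
        const bool digit = (c >= '0' && c <= '9');
        switch (st) {
            case Start:    if (!digit) return false; st = IntPart; break;
            case IntPart:  if (c == '.') st = Dot; else if (!digit) return false; break;
            case Dot:      if (!digit) return false; st = FracPart; break;
            case FracPart: if (!digit) return false; break;
        }
    }
    return st == FracPart;  // accept only if at least one fractional digit was seen
}

// "Run-time" style: the pattern is a string, compiled into an internal
// structure when the regex object is built and interpreted on each match.
bool matches_decimal_regex(const std::string& s) {
    static const std::regex pattern("[0-9]+\\.[0-9]+");
    return std::regex_match(s, pattern);
}
```

The first version cannot accept a pattern typed in by the user, while the second can; which one is faster in practice depends on the regex implementation and should be measured.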
As for specific implementations, there are many RTR but not so many CTR. For C++, they are the above-mentioned Boost and C++0x/C++11 standard library implementations. You may need them for optimizing regex performance, size of generated code, or memory usage, which is mostly relevant for embedded systems or specific high-performance applications.
SO question about CTR
Finding CTR implementations is harder. Examples I found are the re2c code generator project, a Java CTR implementation, and a C# implementation featuring run-time compilation (into IL code, not an internal data structure) of Regex [there is an SO question about it].
P.S. Sorry, couldn't post some relevant links due to reputation
I have to write a C/C++ program to process a bunch of text files (around 100) and find a pattern (commonly a string). Since the platform I am going to run this on is Unix, I thought, why not make use of the grep system command within my program, as it is very fast and effective? But my friend says using system("grep...") within a program is not advisable. He suggests that I use a string pattern matching algorithm, which I feel will slow down the program.
So, I want some advice over this. Help me out.
Without knowing what your program is going to do, it's hard to say. But running commands via system() will slow your program down considerably, though this may not be important. Whatever you do, don't write your own string-matching code if regular expressions can solve the problem - use one of the many existing regex libraries. And if most of your problem could be solved using grep, consider writing a shell script, or using a scripting language like Python instead of a C++ program.
Your two major alternatives are (a) to use grep, or (b) to use a library, linked to your C or C++ program, which provides regular expressions.
Using grep means you get your program running very soon, because you don't have much to learn. Using a regular expression library means your program runs faster.
How much faster? The major speed increase is because you're not setting up a new process and running a new program for each of those 100 files. How significant is this speed saving?
The answer depends on how large each of those files is. If they're very large, it won't make much speed difference which method you use. If small, it will.
If you decide to go with a regular expression library, my guess is that they're all about the same speed. I chose something I was familiar with, since I know Perl: the Perl compatible regular expression library.
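As an illustration of option (b), here is a small sketch of an in-process, grep-like scan. Note that it uses std::regex from the C++11 standard library rather than the PCRE library mentioned above, purely to keep the example self-contained; count_matching_lines is a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <regex>
#include <sstream>
#include <string>

// Counts the lines of `in` that contain a match for `pattern`.
// The regex is compiled once by the caller and reused for every line and file.
std::size_t count_matching_lines(std::istream& in, const std::regex& pattern) {
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (std::regex_search(line, pattern)) ++count;
    }
    return count;
}
```

For the 100 files you would build the std::regex once, then open each file with std::ifstream and pass it to this function, so no extra process is started per file.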
Fork, then use one of the exec family of functions to run grep, saving its result in a file.
In main, wait for the process to end.
Then, in main, open the file and use the result.
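Those three steps could be sketched as below. This assumes a POSIX system with grep on the PATH; run_grep and the file paths passed to it are hypothetical names chosen for the example, and error handling is kept to a minimum.

```cpp
#include <cassert>
#include <fcntl.h>
#include <fstream>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `grep -e pattern input_path` in a child process, with the child's
// standard output redirected to output_path.
// Returns grep's exit status (0 = match found, 1 = no match), or -1 on failure.
int run_grep(const std::string& pattern,
             const std::string& input_path,
             const std::string& output_path) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        // Child: send stdout to the result file, then replace ourselves with grep.
        int fd = open(output_path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) _exit(127);
        dup2(fd, STDOUT_FILENO);
        close(fd);
        execlp("grep", "grep", "-e", pattern.c_str(), input_path.c_str(),
               static_cast<char*>(nullptr));
        _exit(127);  // only reached if exec failed
    }
    // Parent: wait for the child to finish before touching the result file.
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

After run_grep returns, the caller opens output_path with std::ifstream and reads the matching lines.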
Is there an efficient way to store the compiled regexes (compiled via regcomp(), PCRE) in a binary file, so that later I can just read from the file and call regexec()?
Or is it just a matter of dumping the compiled regex_t structs to the file and reading them back when needed?
Unless you have a super-complex regex, I hardly see an advantage in serializing the compiled regex; the compilation time shouldn't be that big. Unless you are on a super-tight embedded system?
In any case, indeed dumping the structure might be a solution, at least you can try...
[EDIT] I just looked at the source I have (6.7) and as I feared, it is not so simple, the structure starts with a void *... You can't serialize pointers, they have to be recomputed.