Regular Expression in C++

I want to write a C++ library for regular expressions. I know there are many libraries available, but I want to learn the theory behind regular expressions and implement it myself.
Can anybody please guide me on where I should start?

http://swtch.com/~rsc/regexp/regexp1.html has a good explanation of the two major approaches to regular expression matching, their trade-offs, and how to make the faster one (DFAs) usable in many of the cases where most implementations don't use it.

It's also worth looking into the book "Compilers: Principles, Techniques, and Tools", which covers in depth the techniques behind regular expression parsing (and the theory of DFAs and NFAs). It has good pseudocode examples that can help in creating your own implementation.

Since you want to write the library anyway, then in addition to the excellent resources other answers give, you might explore implementing the C++0x specification for regular expressions found in chapter 28 of N3225.

As far as I'm concerned, this is THE book on the subject of regular expressions. It may not get you all the way to figuring out how to code up a C++ library, but the explanation of the theory is excellent and it includes a lot of examples for the practical application of regular expressions in many contexts.
http://oreilly.com/catalog/9780596528126

Microsoft's implementation of TR1 (the technical report that preceded the C++0x standard) includes the <regex> library.
TR1 is available for Visual Studio 2008, and by default in Visual Studio 2010.
But of course that's only interesting if you plan to program on the Windows platform.
I don't know whether g++ includes the <regex> library in its TR1 implementation. I guess it does, but I'm not sure.

I used the book Compiler Design in C (http://www.amazon.com/Compiler-Design-C-Prentice-Hall-software/dp/0131550454)
and implemented a regex library based on it here: http://code.google.com/p/regex/

Related

Where are `Malformed and `Uchar defined?

NB: This question is more about how to navigate through/decipher OCaml source code than about the specifics of the items listed below.[1]
I've come across the expressions
`Malformed
`Uchar
in some OCaml source code, but I can't find definitions for them.
Are they somehow built-in, i.e. part of the standard language?
If not, where does one go when one wants to find out such definitions? (Actually, it's not clear to me whether the leading backtick is part of the name or is a separate operator.)
[1] I'm having a horrible time making sense of OCaml source code. Usually I have no problem picking up new programming languages, but with OCaml I can't find anything where I expect to find it, and when I do find something it comes along with 100x more undefined/differently-defined/outright bizarre names/concepts, so it's a losing battle.
It's a polymorphic variant; you can read about it in the OCaml manual, in the section on polymorphic variants.
For other basic language features you don't understand, you should first read all of Part 1 of the manual.
Here is how the manual is organized:
A quick introduction to the language
The whole language (two parts: first the original language, then the features added over time)
The manual pages for the standard command-line tools (compilers, interpreters, ocamlbuild, etc.)
The standard libraries
I noticed from your previous questions that you use non-standard tools such as opam and several libraries; make sure to check their documentation, and note that some of them can extend the syntax.
I have to admit that the documentation generally assumes its readers already know the concepts presented, but you don't need to use all of them at first to master the language.
Happy hacking !
It is a polymorphic variant, which does not need to be defined before use. It is something between Lisp's symbols and ordinary ADTs; you can think of it as a syntactically lighter version of ordinary ADTs.
I would suggest using utop as a playground for learning OCaml. It is actually a damn easy language; it looks to me like you have approached it from the wrong side, or are working with a program that is obfuscated.

What prior knowledge is required for proper Boost library use?

I'm still in the process of learning C++ concepts, but I'm fairly comfortable with pointers, references, Object Oriented Programming, and other programming basics. But I still need to learn more about templates, iterators, and regular expressions. Are there any other concepts I should have a firm grounding in to get the best use out of Boost libraries?
There is no such thing as "proper" use of Boost. You use that part of Boost that helps you with your problem. For Boost Test, for example, you don't have to know much about anything specific. For Boost Graph or Algorithm, you should have a good grasp of templates.
Hence, there's no good way to answer your question. Look at the documentation of the library you want to use (Boost or otherwise), and if you think you can handle it, use it. Otherwise, come back here and ask a more specific question. ;-)
You should know how templates and inheritance work, and carefully read the documentation of the module you are planning to use. That should be enough for most cases.
It's hard to say, since Boost is really a collection of libraries. You should understand the problem domain before using a library: for example, what threads are and how to deal with them before using Boost.Thread.
As for C++ specific stuff:
You should know what the standard library already provides you.
Have a firm grasp on how to use templates

boost::optional alternative in C++ Standard Library

I'm trying to get my program working without Boost, but I can't find alternatives for some useful patterns. Namely, I can't find a boost::optional-like pattern in the standard library. Is there a standard alternative to boost::optional (in C++11 or elsewhere)?
Short answer: No.
Long answer: Roll your own according to the Boost spec. The documentation is quite exhaustive and the code isn't that complex, but this still requires above-average C++ skills.
To update this answer: C++14 unfortunately did not ship with std::optional. The current proposal (Revision 5) is N3793 and it is expected to be shipped as a separate technical specification or to become part of C++17.
There is currently a proposal for C++14 (or C++17). So the answer is (probably) not yet :).
As pmr explained, it is not possible right now, and will not be until C++17 is out.
However, you should be able to use this single-header library on GitHub as a drop-in replacement for boost::optional or std::optional. It has no dependencies (except a C++11/C++14-capable compiler).

How do I format c++ comments properly?

Could anyone please suggest a proper comment format I should use in a C++ project? I think there is some analog of the javadoc format, but I can't find which one. Is there any de facto standard of that kind? Thanks!
Doxygen is used rather frequently for C++ projects (Doxygen supports other languages as well). You can use Doxygen-style comments as a starting point (here are some examples). Just be consistent.
It's quite controversial as a subject in every language.
I appreciate the Doxygen answers, since Doxygen is effectively a de facto standard, but note that Doxygen itself supports multiple comment formats.
C++ supports 2 styles of comments:
/* */ multi-line comment
// until the end of line comment
I would advise against using the former style. The issue is that it does not nest, which is quite annoying when, for testing purposes, you want to comment out a whole block of code.
My 2 cents, as the saying goes.
Your mention of javadoc suggests you might be thinking of Doxygen. This is a tool that supports structured comments in C++ and other languages, and it can produce output in HTML and LaTeX (from which PDF can be generated), among other formats. It supports a variety of comment styles, including one that looks very much like javadoc.
It works well, and is broadly used.

Official C++ language subsets

I mainly use C++ to do scientific computing, and lately I've been restricting myself to a very C-like subset of C++ features; namely, no classes/inheritance except complex and STL, templates only used for find/replace kinds of substitutions, and a few other things I can't put in words off the top of my head. I am wondering if there are any official or well-documented subsets of the C++ language that I could look at for reference (as well as rationale) when I go about picking and choosing which features to use.
There is Embedded C++. It sounds mostly similar to what you're looking for.
Google publishes its internal C++ style guide, which is often referred to as such a subset: https://google.github.io/styleguide/cppguide.html . Ben Maurer, whose company reCAPTCHA was acquired by Google, describes it as follows in this post on Quora:
You can basically think of Google's C++ subset as C plus a bit of sugar:
The ability to add methods to structs
Basic single inheritance
Collection and string classes
Scope-based resource management
They also publish a lint tool, cpplint.py.
Not long ago I listened to the SE-Radio podcast Episode 152: MISRA with Johan Bezem, which introduces MISRA, a set of coding guidelines for C and C++ aimed at improving quality. Try having a look at it.
The GCC developers are about to allow some C++ features. I'm not aware of any official guidelines yet, but I am pretty sure that they will define some. Have a look at the initial report on the mailing list.
Well, the latest developments in C++ (TR1, C++0x) have made it very generic, allowing you to do imperative, object-oriented or even (limited) functional programming. Libraries like Boost also enable very powerful declarative, template-based metaprogramming.
I think Boost is the first thing to try out in C++. It's a comprehensive library that also includes several modules enabling you to program in a functional style (Boost.Functional) or do compile-time declarative metaprogramming (Boost MPL).
OpenCL has been using C for writing kernels, but they have recently added (or will soon add) C++ bindings and perhaps Java. OpenCL leaves out a number of performance-robbing features of C, such as function pointers and recursion. Smart pointers and polymorphism also create overhead.
Restrictions on C: SIMD programming languages
Slightly off topic: Here is a good discussion comparing OpenCL with CUDA using C.
OpenCL or CUDA Which way to go?
The SEI CERT C++ Coding Standard gives a list of rules for writing safe, reliable, and secure systems in C++14. It is not a subset of C++ per se, but, like the coding standards in the other answers, it is effectively a subset, because it avoids unsafe, undefined, or easily misused features (including some common to C).