C++/STL string: How to mimic regex like function with wildcards?

I would like to compare 4-character strings using wildcards.
For example:
std::string wildcards[]=
{"H? ", "RH? ", "H[0-5] "};
/*in the last one I need to check if string is "H0 ",..., and "H5 " */
Is it possible to implement this using only the STL?
Thanks,
Arman.
EDIT:
Can we do it without boost.regex?
Or should I add yet another library dependences to my project?:)

Use Boost.Regex

No - you need boost::regex

Regular expressions were made for this sort of thing. I can understand your reluctance to add a dependency, but in this case it's probably justified.
You might check your C++ compiler to see if it includes any built-in regular expression library. For example, Microsoft includes CAtlRegExp.
Barring that, your problem doesn't look too difficult to write custom code for.
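A minimal sketch of what such custom code could look like. It assumes the pattern syntax is limited to what the question shows: ? for any single character, [a-b] for one character in an inclusive range, and everything else taken literally. The name wildcard_match is made up for this illustration.

```cpp
#include <cstddef>
#include <string>

// Returns true if `text` matches `pattern` in full.
// Supported syntax (an assumption based on the question's examples):
//   ?      any single character
//   [a-b]  one character in the inclusive range a..b
//   other  the character itself
bool wildcard_match(const std::string& pattern, const std::string& text) {
    std::size_t p = 0, t = 0;
    while (p < pattern.size()) {
        if (t >= text.size()) return false;  // text ended before the pattern
        const char c = pattern[p];
        if (c == '?') {
            p += 1;
        } else if (c == '[' && p + 4 < pattern.size() &&
                   pattern[p + 2] == '-' && pattern[p + 4] == ']') {
            if (text[t] < pattern[p + 1] || text[t] > pattern[p + 3]) return false;
            p += 5;  // skip the whole "[a-b]" group
        } else {
            if (text[t] != c) return false;  // literal character
            p += 1;
        }
        t += 1;
    }
    return t == text.size();  // no leftover text allowed
}
```

With this, wildcard_match("H[0-5] ", "H3 ") is true and wildcard_match("H[0-5] ", "H7 ") is false. Anything beyond this subset (repetition, alternation) is where a real regex library starts to pay off.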

You can do it without introducing a new library dependency, but to do so you'd end up writing a regular expression engine yourself (or at least a subset of one).
Is there some reason you don't want to use a library for this?

Related

Regular Expression for whole word

First of all, I use C# 4.0 to parse the code of a VB6 application.
I have some old VB6 code and about 500+ copies of it. And I use a regular expression to grab all kinds of global variables from the code. The code is described as "Yuck" and some poor victim still has to support this. So I'm hoping to help this poor sucker a bit by generating overviews of specific constants. (And yes, it should be rewritten but it ain't broke, so...)
This is a sample of a code line I need to match, in this case all boolean constants:
Public Const gDemo = False 'Is this a demo version
And this is the regular expression I use at this moment:
Public\s+Const\s+g(?'Name'[a-zA-Z][a-zA-Z0-9]*)\s+=\s+(?'Value'[0-9]*)
And I think it too is yuckie, since the * at the end of the boolean group. But if I don't use it, it will only return 'T' or 'F'. I want the whole word.
Is this the proper RegEx to use as solution or is there an even nicer-looking option?
FYI, I use similar regexes to find all string constants and all numeric constants. Those work just fine. And basically the same .BAS file is used for all 50 copies but with different values for all these variables. By parsing all files, we have a good overview of how every version is configured.
And again, yes, we need to rebuild the whole project from scratch since it becomes harder to maintain these days. But it works and we need the manpower for other tasks. It just needs the occasional tweaks...
You can use: Public\s+Const\s+g(?<Name>[a-zA-Z][a-zA-Z0-9]*)\s+=\s+(?<Value>False|True)
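For readers following this from the C++ side, here is a rough port of the same pattern to std::regex. One assumption to flag: the default ECMAScript grammar of std::regex has no named groups, so the sketch uses numbered groups instead. BoolConst and parse_bool_const are names invented for the example.

```cpp
#include <regex>
#include <string>

struct BoolConst {
    std::string name;   // text captured by group 1 (after the "g" prefix)
    std::string value;  // "False" or "True", captured by group 2
};

// Returns true and fills `out` if `line` declares a boolean g-prefixed constant.
bool parse_bool_const(const std::string& line, BoolConst& out) {
    static const std::regex re(
        R"(Public\s+Const\s+g([a-zA-Z][a-zA-Z0-9]*)\s+=\s+(False|True))");
    std::smatch m;
    if (!std::regex_search(line, m, re)) return false;
    out.name = m[1].str();
    out.value = m[2].str();
    return true;
}
```

Spelling out the alternation False|True is what gives you the whole word without a trailing * on the value group.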

How to parse mathematical formulae from strings in c++

I want to write a program that takes a string like x^2+1 and understands it.
I want to ask the user to enter her/his function and I want to be able to process and understand it. Any Ideas?
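char s[100] = "x*I+2";   // the formula entered by the user
double x = 5;
double I = 2;
double res = calc(s);    // calc() is the function I would like to have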
I think it could be done with some kind of string analysis, but that seems too hard for me.
I have another idea: use tcc from the main program to compile, run, and then delete a separate program (or maybe a function) that contains the string s.
I would create a temp file every time, ask tcc to compile it, and run it with exec or something similar.
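/* tmp.cpp: */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
    if (argc < 3) return 1;                       /* expects x and I as arguments */
    double x = atof(argv[1]), I = atof(argv[2]);
    printf("%g\n", x*I+2);                        /* the user's expression goes here */
    return 0;
}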
The tmp.cpp file would be created dynamically.
Thanks in advance.
I am not sure what you expect. It's too complex to give the code as an answer, but the general idea is not very complex. It's not out of reach to code, even for a normal hobbyist programmer.
You need to define grammar, tokenize string, recognize operators, constants and variables.
Probably put expression into a tree. Make up a method for substituting the variables... and you can evaluate!
You need to have some kind of a parser. The easiest way to make math operations parsable is to have them written in RPN. You can, however, write your own parser using parser libraries, such as Spirit from Boost, or Yacc.
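To show why RPN is the easy case, here is a small sketch of an RPN evaluator. It assumes tokens are separated by spaces and that only the four basic operators are needed; eval_rpn is a name chosen for this example.

```cpp
#include <sstream>
#include <stack>
#include <stdexcept>
#include <string>

// Evaluates a space-separated RPN expression such as "5 2 * 2 +".
double eval_rpn(const std::string& expr) {
    std::istringstream in(expr);
    std::stack<double> st;
    std::string tok;
    while (in >> tok) {
        const bool is_op =
            tok.size() == 1 && std::string("+-*/").find(tok[0]) != std::string::npos;
        if (is_op) {
            if (st.size() < 2) throw std::runtime_error("too few operands");
            const double b = st.top(); st.pop();  // right operand is on top
            const double a = st.top(); st.pop();
            switch (tok[0]) {
                case '+': st.push(a + b); break;
                case '-': st.push(a - b); break;
                case '*': st.push(a * b); break;
                case '/': st.push(a / b); break;
            }
        } else {
            st.push(std::stod(tok));  // throws std::invalid_argument on bad input
        }
    }
    if (st.size() != 1) throw std::runtime_error("malformed expression");
    return st.top();
}
```

The infix formula x*I+2 with x=5 and I=2 would be written "5 2 * 2 +" here. There is no precedence or parenthesis handling at all, which is the whole appeal; the cost is that users have to type RPN.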
I have used Function Parser with success.
From its website, it looks like it also supports std::complex, but I have never used that.
As luck would have it, I recently wrote one!
Look for {,include/}lib/MathExpression/Term. It handles complex numbers but you can easily adapt it for plain old floats.
The licence is GPL 2.
The theory in brief, when you have an expression like
X*(X+2)
Your highest level parser can parse expressions of the form A + B + C... In this case A is the whole expression.
You recurse to parse an operator of higher precedence, A * B * C... In this case A is X and B is (X+2)
Keep recursing until you're parsing either basic tokens such as X or hit an opening parenthesis, in which case push some kind of stack to track where you are and recurse into the parentheses with the top-level low-precedence parser.
I recommend you use RAII and throw exceptions when there are parse errors.
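The theory above can be sketched roughly as follows. This is not the code from the library mentioned earlier, just an illustration under a few assumptions: plain doubles instead of complex numbers, single-letter variables, only + - * / and parentheses, and the call stack serving as the "stack" that tracks nesting. Parser and calc are names picked for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

class Parser {
public:
    Parser(const std::string& text, const std::map<char, double>& vars)
        : text_(text), vars_(vars), pos_(0) {}

    double parse() {
        const double v = sum();
        skip_spaces();
        if (pos_ != text_.size()) throw std::runtime_error("unexpected trailing input");
        return v;
    }

private:
    // Lowest precedence: A + B - C ...
    double sum() {
        double v = product();
        for (;;) {
            skip_spaces();
            if (eat('+')) v += product();
            else if (eat('-')) v -= product();
            else return v;
        }
    }
    // Higher precedence: A * B / C ...
    double product() {
        double v = factor();
        for (;;) {
            skip_spaces();
            if (eat('*')) v *= factor();
            else if (eat('/')) v /= factor();
            else return v;
        }
    }
    // Basic tokens: a number, a variable, or a parenthesised sub-expression.
    double factor() {
        skip_spaces();
        if (eat('(')) {
            const double v = sum();  // recurse with the low-precedence parser
            skip_spaces();
            if (!eat(')')) throw std::runtime_error("expected ')'");
            return v;
        }
        if (pos_ < text_.size() && std::isalpha(static_cast<unsigned char>(text_[pos_]))) {
            const char name = text_[pos_++];
            const std::map<char, double>::const_iterator it = vars_.find(name);
            if (it == vars_.end()) throw std::runtime_error("unknown variable");
            return it->second;
        }
        std::size_t used = 0;
        double v = 0;
        try { v = std::stod(text_.substr(pos_), &used); }
        catch (const std::exception&) { throw std::runtime_error("expected a number"); }
        pos_ += used;
        return v;
    }
    void skip_spaces() { while (pos_ < text_.size() && text_[pos_] == ' ') ++pos_; }
    bool eat(char c) {
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    std::string text_;
    std::map<char, double> vars_;
    std::size_t pos_;
};

// Convenience wrapper matching the calc(s) call from the question.
double calc(const std::string& text, const std::map<char, double>& vars) {
    return Parser(text, vars).parse();
}
```

With x=5 and I=2 in the map, calc("x*I+2", vars) gives 12. Supporting ^ or function calls would mean adding one more precedence level or one more case in factor(), following the same pattern.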
Use a recursive descent parser.
Sample: it's in German, but it is a small and powerful solution.
Look here: this is exactly what you are searching for. Change the function read_varname to detect a variable like 'x' or 'I'.

Parse std::string for a selection of characters

Is there an easy way to parse a std::string in search of a list of certain characters? For example, let's say the user enters this<\is a.>te!st string. I'd like to be able to spot that those non-letter characters are there and do something about it. I'm looking for a general purpose solution that allows me to simply specify a list of chars so I can reuse the function in different situations. I'm guessing regular expressions will play a key role in any solution, and obviously the more compact and efficient, the better.
You could use std::string::find_first_not_of() for this. It'll find the characters except those in the set that you give it. Its counterpart, find_first_of(), will search for characters that are in the set.
Both functions allow you to specify the starting index. This will enable you to continue the search from where you left off.
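A short sketch of that "continue from where you left off" loop. The helper name find_disallowed and the idea of returning indices are choices made for this example; you could just as well act on each hit inside the loop.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the index of every character in `text` that is NOT in `allowed`.
std::vector<std::size_t> find_disallowed(const std::string& text,
                                         const std::string& allowed) {
    std::vector<std::size_t> hits;
    std::size_t pos = text.find_first_not_of(allowed);
    while (pos != std::string::npos) {
        hits.push_back(pos);
        pos = text.find_first_not_of(allowed, pos + 1);  // resume after the hit
    }
    return hits;
}
```

Swapping find_first_not_of for find_first_of turns the allow-list into a deny-list, so the same shape covers both use cases.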
How about using a regex library like boost::regex?
This should do exactly what you are looking for.
If your compiler supports C++11 you can use std::regex.
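For comparison, a minimal std::regex sketch, assuming that "letters" means A-Z and a-z and that spaces should be kept. The function name replace_non_letters is made up here.

```cpp
#include <regex>
#include <string>

// Replaces every run of characters outside A-Z, a-z and space with `with`.
std::string replace_non_letters(const std::string& text, const std::string& with) {
    static const std::regex non_letters("[^A-Za-z ]+");
    return std::regex_replace(text, non_letters, with);
}
```

Called on the question's sample input with an empty replacement, this strips the punctuation and leaves "thisis atest string". Note that a $ in the replacement string has special meaning to std::regex_replace, so keep the replacement plain.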
Regex seems like overkill. You can use std::string's methods: find_first_of() and/or find_last_of(). Here you can find documentation and examples.

C++ - Splitting Filename and File Extension

Ok, first of all I don't want to use Boost, or any external libraries. I just want to use the C++ Standard Library. I can easily split strings with a given delimiter with my split() function:
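#include <sstream>
#include <string>
#include <vector>
void split(const std::string &string, std::vector<std::string> &tokens, char delim) {
    std::string ea;
    std::stringstream stream(string);
    // read one delimiter-separated token at a time
    while (getline(stream, ea, delim))
        tokens.push_back(ea);
}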
I do this on filenames. But there's a problem. There are files that have extensions like: tar.gz, tar.bz2, etc. Also there are some filenames that have extra dots. Some.file.name.tar.gz. I wish to separate Some.file.name and tar.gz. Note: the number of dots in a filename isn't constant.
I also tried PathFindExtension but no luck. Is this possible? If so, please enlighten me. Thank you.
Edit: I'm very sorry about not specifying the OS. It's Windows.
I think you could use std::string::find_last_of to get the index of the last . and substr to cut the string (although the "complex extensions" involving multiple dots will require additional work).
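A sketch of the last-dot approach. Treating a name with no dot, or with only a leading dot, as having no extension is an assumption of this example; split_last_dot is an illustrative name.

```cpp
#include <string>
#include <utility>

// Splits "name.ext" at the last dot into {"name", "ext"}.
// With no dot (or only a leading dot) the extension is empty.
std::pair<std::string, std::string> split_last_dot(const std::string& filename) {
    const std::string::size_type dot = filename.find_last_of('.');
    if (dot == std::string::npos || dot == 0)
        return std::make_pair(filename, std::string());
    return std::make_pair(filename.substr(0, dot), filename.substr(dot + 1));
}
```

For Some.file.name.tar.gz this yields Some.file.name.tar and gz, which shows exactly the "additional work" that multi-dot extensions still need.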
There is no way of doing what you want that does not involve a database of extensions for your purpose. There's nothing magical about extensions, they are just part of a filename (if you gunzip foo.tar.gz you'll likely get a foo.tar, so for this application .gz actually is "the extension"). So, in order to do what you want, build a database of extensions that you want to look for and fall back on "last dot" if you don't find one.
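There was nothing in the C++ standard library for this before C++17 (std::filesystem::path now provides stem() and extension(), though extension() only returns the part from the last dot), but every operating system I know of provides this functionality in a variety of ways.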
In Windows you can use _splitpath(), and in Linux you can use dirname() & basename()
The problem is indeed filenames like *.tar.gz, which can not be split consistently, due to the fact that (at least in Windows) the .tar part isn't part of the extension. You'll either have to keep a list for these special cases and use a one-dot string::rfind for the rest or find some pre-implemented way. Note that the .tar.* extensions aren't infinite, and very much standardized (there's about ten of them I think).
You could create a look-up table of file extensions that you think you might encounter. And also add a command line option to add a new one to the look-up table if you encounter anything new. Then parse the file name to see if any entry in the look-up table is a sub-string of the file name.
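The table-plus-fallback idea from the answers above might look like this. The three table entries are only a hypothetical starter set, the match is done as a case-sensitive suffix check rather than a general sub-string search, and split_extension is a name invented for the sketch.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Splits a filename into {stem, extension}, preferring known compound extensions.
std::pair<std::string, std::string> split_extension(const std::string& filename) {
    // Hypothetical starter table; extend it with whatever your files use.
    static const char* const known[] = { ".tar.gz", ".tar.bz2", ".tar.xz" };
    for (std::size_t i = 0; i < sizeof(known) / sizeof(known[0]); ++i) {
        const std::string ext(known[i]);
        if (filename.size() > ext.size() &&
            filename.compare(filename.size() - ext.size(), ext.size(), ext) == 0) {
            return std::make_pair(filename.substr(0, filename.size() - ext.size()),
                                  ext.substr(1));  // drop the leading dot
        }
    }
    // No table entry matched: fall back to the last dot.
    const std::string::size_type dot = filename.rfind('.');
    if (dot == std::string::npos || dot == 0)
        return std::make_pair(filename, std::string());
    return std::make_pair(filename.substr(0, dot), filename.substr(dot + 1));
}
```

Here Some.file.name.tar.gz splits into Some.file.name and tar.gz, while photo.jpeg still splits at its single dot. On Windows you would probably want to lower-case the name before comparing.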
EDIT: You can also refer to this question: C++/STL string: How to mimic regex like function with wildcards?

Parse URLs using C-Strings in C++

I'm learning C++ for one of my CS classes, and for our first project I need to parse some URLs using c-strings (i.e. I can't use the C++ String class).
The only way I can think of approaching this is just iterating through (since it's a char[]) and using some switch statements. From someone who is more experienced in C++ - is there a better approach? Could you maybe point me to a good online resource? I haven't found one yet.
Weird that you're not allowed to use C++ language features i.e. C++ strings!
There are some C string functions available in the standard C library.
e.g.
strdup - duplicate a string
strtok - breaking a string into tokens. Beware - this modifies the original string.
strcpy - copying string
strstr - find string in string
strncpy - copy up to n bytes of string
etc
There is a good online reference here, with a full list of the available C string functions for searching and finding things:
http://www.cplusplus.com/reference/clibrary/cstring/
You can walk through strings by accessing them like an array if you need to.
e.g.
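// needs <cstring> and <iostream>
const char* url = "http://stackoverflow.com/questions/1370870/c-strings-in-c";
std::size_t len = std::strlen(url);
for (std::size_t i = 0; i < len; ++i) {
    std::cout << url[i];  // access each character by index
}
std::cout << std::endl;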
As for actually how to do the parsing, you'll have to work that out on your own. It is an assignment after all.
There are a number of C standard library functions that can help you.
First, look at the C standard library function strtok. This allows you to retrieve parts of a C string separated by certain delimiters. For example, you could tokenize with the delimiter / to get the protocol, domain, and then the file path. You could tokenize the domain with delimiter . to get the subdomain(s), second level domain, and top level domain. Etc.
It's not nearly as powerful as a regular expression parser, which is what you would really want for parsing URLs, but it works on C strings, is part of the C standard library and is probably OK to use in your assignment.
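To make the strtok behaviour concrete without doing the assignment, here is a small sketch that only splits on /. The helper split_url and its parameters are made up for the example.

```cpp
#include <cstddef>
#include <cstring>

// Splits `url` in place on '/' and stores up to `max_parts` token pointers.
// Returns the number of tokens stored. `url` must be a writable buffer,
// because std::strtok overwrites each delimiter it finds with '\0'.
std::size_t split_url(char* url, const char* parts[], std::size_t max_parts) {
    std::size_t n = 0;
    for (char* tok = std::strtok(url, "/");
         tok != nullptr && n < max_parts;
         tok = std::strtok(nullptr, "/")) {
        parts[n++] = tok;
    }
    return n;
}
```

For a buffer holding http://stackoverflow.com/questions/1370870 the tokens are http:, stackoverflow.com, questions and 1370870. Two things to notice: the double slash produces no empty token, and the protocol keeps its colon, so some clean-up is still needed. Passing a string literal would be undefined behaviour, hence the writable buffer.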
Other C standard library functions that may help:
strstr() Finds a substring within a string, similar to std::string::find()
strspn(), strchr() and strpbrk() Measure a run of characters or find a character or characters in a string, similar to std::string::find_first_not_of(), find() and find_first_of() respectively.
Edit: A reminder that the proper way to use these functions in C++ is to include <cstring> and use them in the std:: namespace, e.g. std::strtok().
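As an illustration of these functions used through the std:: namespace, the sketch below pulls out just the host part of a URL without modifying the input. It assumes the simple shape scheme://host/path; extract_host is a hypothetical helper, not part of any library.

```cpp
#include <cstddef>
#include <cstring>

// Copies the host part of `url` ("scheme://host/path") into `host`.
// Returns false if "://" is missing or the buffer is too small.
bool extract_host(const char* url, char* host, std::size_t host_size) {
    const char* sep = std::strstr(url, "://");  // end of the scheme
    if (sep == nullptr) return false;
    const char* begin = sep + 3;
    const char* end = std::strchr(begin, '/');  // start of the path, if any
    const std::size_t len =
        end ? static_cast<std::size_t>(end - begin) : std::strlen(begin);
    if (len + 1 > host_size) return false;      // need room for the '\0'
    std::memcpy(host, begin, len);
    host[len] = '\0';
    return true;
}
```

Unlike strtok, this leaves the original string untouched, which matters if the URL is a string literal or is needed again later.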
You might want to refer to an open source library that can parse URLs (as a reference for how others have done it -- obviously don't copy and paste it!), such as curl or wget (links are directly to their url parsing files).
I don't know what the requirements are for parsing the URLs, but if this is CS level it would be appropriate to use (very simple) BNF and a (very simple) recursive descent parser. This would make for a more robust solution than direct iteration, e.g. for malformed URLs.
Very few string functions from the standard C library would be needed.
You can use C functions like strtok, strchr, strstr etc.
Many of the runtime library functions that have been mentioned work quite well, either in conjunction with or apart from the approach of iterating through the string that you mentioned (which I think is time honored).