Remove text between two characters (parentheses) in a string - C++

I'm working on a project and I want to remove text between two parentheses in a string.
Example:
std::string str = "I want to remove (this).";
How would I go about doing that?
I've searched Google and Stack Overflow and haven't found anything.

I'd use a regular expression for that; check out the link below. As for the pattern, the following should do:
(\()(?:[^\)\\]*(?:\\.)?)*\)
That one worked for me.
Conditionally replace regex matches in string
Do not confuse regular expressions with common expressions. These are not like the more common expressions :-) or :-O or >:( which, although effective, are a separate kind of expression that few languages understand but that are more commonly used.
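For the C++ case in the question, a minimal sketch using std::regex_replace from <regex> (C++11) might look like the following. It swaps in a simpler pattern, \([^()]*\), which assumes the parentheses are not nested and contain no escaped characters. The function name remove_parenthesized is made up for illustration.

```cpp
#include <regex>
#include <string>

// Removes every non-nested "(...)" group from the input.
// Assumption: parentheses are balanced and never nested.
std::string remove_parenthesized(const std::string& input) {
    // \( literal open paren, [^()]* anything but parens, \) literal close paren
    static const std::regex paren_group(R"re(\([^()]*\))re");
    return std::regex_replace(input, paren_group, "");
}
```

With the example string from the question this yields "I want to remove ." (the space before the opening parenthesis is kept).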

Related

Regular Expression groups ignoring comma inside parenthesis

I know there are plenty of regular expressions around here similar to what I am going to ask, but I couldn't find one that actually helps me.
This one got close, but it uses Java's split method, and I need to capture the values using only regular expressions:
Java: splitting a comma-separated string but ignoring commas in quotes
So, what I need to do is, given the below input:
string,string([a-zA-Z]{0,9}),integer
I would like to capture 3 matches:
string
string([a-zA-Z]{0,9})
integer
Note that inside the parentheses we can have a regular expression, which means almost any character, even a comma.
I can't use split here, because I am not using Java, but an internal declarative programming that uses ICU regular expressions and has an API for capturing groups, but not a regex-based split method.
Any help would be appreciated. I am really sorry if there are other posts that this one duplicates, but I have spent a few hours looking around, and even played with the post I mentioned, without getting to a solution.
Thanks
EDIT
The input I provided is just an example, but other inputs are also possible.
Besides, after #sin's comments, I have reviewed the input, and we can actually assume we'll have quotes inside the parentheses, like this:
string("[\w]{0,9}"),integer,string

How to match Regular Expression with String containing a wildcard character?

Regular expression:
/Hello .*, what's up?/i
String which may contain any number of wildcard characters (%):
"% world, what's up?" (matches)
"Hello world, %?" (matches)
"Hello %, what's up?" (matches)
"Hey world, what's up?" (no match)
"Hello %, blabla." (no match)
I have thought of a solution myself, but I'd like to see what you are able to come up with (considering performance is a high priority). A requirement is the ability to use any regular expression; I only used .* in the example, but any valid regular expression should work.
A little automata theory might help you here. You say
this is a simplified version of matching a regular expression with a regular expression[1]
Actually, that does not seem to be the case. Instead of matching the text of a regular expression, you want to find regular expressions that can match the same string as a given regular expression.
Luckily, this problem is solvable :-) To see whether such a string exists, you would need to compute the intersection of the two regular languages and test whether the result is not the empty language. This might be a non-trivial problem and solving it efficiently [enough] may be hard, but standard algorithms for this do already exist. Basically you would need to translate each expression into an NFA, convert that into a DFA, and then intersect the two DFAs.
[1]: Indeed, the wildcard strings you're using in the question build some kind of regular language, and can be translated to corresponding regular expressions
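As a rough illustration of the last step, the sketch below checks whether two already-built DFAs accept at least one common string by exploring the product automaton (pairs of states reached simultaneously in both). The Dfa struct and the function name are hypothetical, and building the DFAs from the regular expression and the wildcard string is assumed to have happened elsewhere.

```cpp
#include <map>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// Hypothetical minimal DFA: states are indices into the vectors;
// a missing transition means the input is rejected.
struct Dfa {
    int start;
    std::vector<bool> accepting;
    std::vector<std::map<char, int>> next;
};

// Returns true if some string is accepted by both DFAs, i.e. the
// intersection of the two languages is not empty.
bool intersection_nonempty(const Dfa& a, const Dfa& b) {
    std::set<std::pair<int, int>> seen;
    std::queue<std::pair<int, int>> work;
    work.push({a.start, b.start});
    seen.insert({a.start, b.start});
    while (!work.empty()) {
        const std::pair<int, int> cur = work.front();
        work.pop();
        if (a.accepting[cur.first] && b.accepting[cur.second]) return true;
        // Follow only the symbols on which both automata can move.
        for (const auto& edge : a.next[cur.first]) {
            const auto it = b.next[cur.second].find(edge.first);
            if (it == b.next[cur.second].end()) continue;
            const std::pair<int, int> nxt(edge.second, it->second);
            if (seen.insert(nxt).second) work.push(nxt);
        }
    }
    return false;
}
```

The search visits at most one pair per combination of states, so its cost is bounded by the product of the two DFA sizes. The expensive part in practice is usually the NFA-to-DFA conversion that precedes it.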
Not sure that I fully understand your question, but if you're looking for performance, avoid regular expressions. Instead you can split the string on %. Then, take a look at the first and last pieces:
// Anything before % should match at the start of the string
targetString.startsWith(splits[0]);
// Anything after % should match at the end of the string
targetString.endsWith(splits[1]);
If you can use % multiple times within the string, then the first and last splits should follow the above rules. Anything else just needs to be in the string, and .indexOf is how you can check that.
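A C++ sketch of this split-based check, comparing a wildcard string against a concrete target string, could look as follows. The function name wildcard_match is hypothetical. As an assumption beyond the description above, the middle pieces are required to appear in order and between the prefix and the suffix.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Checks `target` against `pattern`, where '%' stands for any run of characters.
bool wildcard_match(const std::string& pattern, const std::string& target) {
    // Split the pattern on '%'.
    std::vector<std::string> pieces;
    std::size_t start = 0;
    for (;;) {
        const std::size_t pos = pattern.find('%', start);
        if (pos == std::string::npos) {
            pieces.push_back(pattern.substr(start));
            break;
        }
        pieces.push_back(pattern.substr(start, pos - start));
        start = pos + 1;
    }
    if (pieces.size() == 1) return pattern == target;  // no wildcard at all

    const std::string& first = pieces.front();
    const std::string& last = pieces.back();
    if (target.size() < first.size() + last.size()) return false;
    // The first piece must be a prefix, the last piece a suffix.
    if (target.compare(0, first.size(), first) != 0) return false;
    if (target.compare(target.size() - last.size(), last.size(), last) != 0) return false;

    // Middle pieces must occur, in order, between prefix and suffix.
    std::size_t cursor = first.size();
    const std::size_t limit = target.size() - last.size();
    for (std::size_t i = 1; i + 1 < pieces.size(); ++i) {
        const std::size_t found = target.find(pieces[i], cursor);
        if (found == std::string::npos || found + pieces[i].size() > limit) return false;
        cursor = found + pieces[i].size();
    }
    return true;
}
```

Taking the leftmost occurrence of each middle piece is enough here, because an earlier match never leaves less room for the pieces that follow.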
I came to realize that this is impossible with a regular language, and therefore the only solution to this problem is to replace the wildcard symbol % with .* and then match two regular expressions with each other. However, this cannot be done with traditional regular expressions; see this SO question and its answers for details.
Or perhaps you should edit the underlying regular expression engine to support wildcard-based strings. Any answer that solves this by extending the default implementation will be accepted ;-)

VB6 and C# regexes

I need to convert a VB6 project (a language I'm not familiar with) to a C# 4.0 one. The project contains some regexes for string validation.
I need to know if the regexes behave the same in both cases. So if I just copy the regex string from the VB6 project to the C# project, will it work the same?
I have a basic knowledge of regexes and I can just about read what one does, but for flavors and such, that's a bit over my head at the moment.
For example, are these 2 lines equivalent?
VB6:
isStringValid = (str Like "*[!0-9A-Z]*")
C#:
isStringValid = Regex.IsMatch(str, "*[!0-9A-Z]*");
Thanks!
The old VB Like operator, despite appearances, is not a regular expression interface. It's more of a glob pattern matcher. See http://msdn.microsoft.com/en-us/library/swf8kaxw.aspx
In your example:
Like "*[!0-9A-Z]*"
This matches any string containing at least one character that is not a digit or an uppercase letter: the leading and trailing * match zero or more arbitrary characters, and [!0-9A-Z] matches a single character outside those ranges. The regular expression for this would be:
/.*[^0-9A-Z].*/
EDIT To answer your question: No, the two can't be used interchangeably. However, it's fairly easy to convert Like's operand into a proper regular expression:
Like         RegEx
==========   ==========
?            .
*            .*
#            \d
[abc0-9]     [abc0-9]
[!abc0-9]    [^abc0-9]
There are a few caveats to this, but that should get you started and cover most cases.
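To make the table concrete, here is a rough C++ sketch that applies those substitutions mechanically and escapes regex metacharacters in the literal parts. The names like_to_regex and like_match are hypothetical, and special cases of Like (an empty class "[]", a literal "]" inside a class, locale-dependent ranges) are deliberately not handled.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Translates a VB6 Like pattern into an ECMAScript regex string,
// following the table above. Illustrative only; see the caveats.
std::string like_to_regex(const std::string& like) {
    static const std::string meta = "\\^$.|+(){}]";
    std::string out;
    for (std::size_t i = 0; i < like.size(); ++i) {
        const char c = like[i];
        if (c == '?') {
            out += '.';
        } else if (c == '*') {
            out += ".*";
        } else if (c == '#') {
            out += "\\d";
        } else if (c == '[') {
            const std::size_t close = like.find(']', i + 1);
            if (close == std::string::npos) {  // unterminated class: literal '['
                out += "\\[";
                continue;
            }
            out += '[';
            std::size_t j = i + 1;
            if (j < close && like[j] == '!') {  // [!...] becomes [^...]
                out += '^';
                ++j;
            }
            for (; j < close; ++j) {
                if (like[j] == '\\' || like[j] == '^' || like[j] == '[') out += '\\';
                out += like[j];
            }
            out += ']';
            i = close;
        } else {
            if (meta.find(c) != std::string::npos) out += '\\';
            out += c;
        }
    }
    return out;
}

// Like always tests the whole string, hence regex_match rather than regex_search.
bool like_match(const std::string& text, const std::string& like) {
    return std::regex_match(text, std::regex(like_to_regex(like)));
}
```

In C#, the same translated string could be passed to Regex.IsMatch with ^ and $ anchors added, since Like always tests the whole string.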
In a word, yes.
These are the same. Some quick googling should give you answers to more complex issues.
http://social.msdn.microsoft.com/Forums/en-US/csharpgeneral/thread/bce145b8-95d4-4be4-8b07-e8adee7286f1/
http://www.regular-expressions.info/dotnet.html

Substring match by regular expression

I am not very familiar with regular expressions. I want to do the following comparison using a regular expression.
Source word is : Hello124
In a list, I have following strings
Hello12
Hello
Hel
Hel123
Her
The output I want is (Hello12, Hello, Hel), i.e. starting from the source string, I remove the last character one at a time and look for the result in the list. Is it possible to use a regular expression to optimize this?
I am using C++ with the std::tr1 library.
You could try this:
^H(?:e(?:l(?:l(?:o(?:1(?:24?)?)?)?)?)?)?$
But in most languages it would be easier just to evaluate query.StartsWith(word) for each word.
Of course, you can solve this problem by using regular expressions, for example using the following: ^(?:H|He|Hel|Hell|Hello|Hello1|Hello12|Hello124)$.
However, this is not very nice and is overkill. As far as I know, every language supporting regex also supports querying for substrings (you may want to look here if you find yours).
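In C++ the prefix check suggested in the answers above needs no regex at all. A minimal sketch, with the hypothetical helper name prefixes_of, using only std::string::compare:

```cpp
#include <string>
#include <vector>

// Returns the candidates that are non-empty prefixes of `source`,
// in their original order.
std::vector<std::string> prefixes_of(const std::string& source,
                                     const std::vector<std::string>& candidates) {
    std::vector<std::string> result;
    for (const std::string& word : candidates) {
        if (!word.empty() && word.size() <= source.size() &&
            source.compare(0, word.size(), word) == 0) {
            result.push_back(word);
        }
    }
    return result;
}
```

For the source word Hello124 and the list from the question, this keeps Hello12, Hello and Hel, and drops Hel123 and Her.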

Meaning of "match" as related to Regular Expressions

I'm writing a term paper on regular expressions and I'm a bit confused regarding the way one uses the word "match" when referring to regexes. Which of the following is the correct wording to use:
"The regular expression matches the string"
or
"The string matches the regular expression"
Or are they both correct? All opinions on this are welcome! I really want to get this right and I think it would help my understanding greatly to get this clarified.
I think both are correct. It depends on what you're focusing on. If your focus is on the regular expression itself, to see whether it works on a given string or set of strings, then you use the first sentence. Conversely, if you are more interested in looking at a set of strings that match certain criteria, the second one is applicable. A match means some kind of equivalence under certain conditions, so both sentences sound equivalent to me.
The string is being matched to the regular expression pattern, therefore I would say the latter is more accurate.
When two things match, it is (from a logical perspective at least) irrelevant in which order you mention them.
So it depends on what you want to put focus on.
The string matches the regular expression: Focus is on the string.
The regular expression matches the string: Focus is on the regex.
The latter sounds better to me. The regex specifies a pattern that the string may match. But there's nothing really wrong with either.
If you said either one to me, I would understand what you're saying. I'm sure people have said both to me, and I never thought either one needed to be corrected.
I agree that the string matches (or not) the regular expression. To make it clear why I'd say: the regular expression defines a grammar, and a given string is either well-formed according to that grammar or not.
"The regular expression matches the string"
True if the RE matches the whole string (e.g. using ^ $ or just happening to match everything). Otherwise, I would write: the regular expression has match(es) in the string.
"The string matches the regular expression"
Again, true if the regex matches everything, otherwise it sounds a bit odd.
But indeed, in the case of a whole match, the two sentences are equivalent.
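In C++ terms, the whole-match versus match-within-the-string distinction drawn above corresponds roughly to std::regex_match versus std::regex_search. A small sketch, with hypothetical wrapper names:

```cpp
#include <regex>
#include <string>

// "The regex matches the string": the entire string must fit the pattern.
bool matches_whole(const std::string& s, const std::regex& re) {
    return std::regex_match(s, re);
}

// "The regex has a match in the string": some substring fits the pattern.
bool has_match_in(const std::string& s, const std::regex& re) {
    return std::regex_search(s, re);
}
```

With the pattern [0-9]+, the string "123" satisfies both functions, while "abc123" satisfies only has_match_in.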
Since you're looking for a regular expression within a string, it's more correct to say that you've found the regular expression since that's a one-way relationship.
But as to which matches which, that's a two-way relationship and it doesn't really matter (in English, anyway; I can't vouch for other languages), so either would be correct.
My preference would be to say that the string matches the regular expression, since the RE is the invariant part and the string changes. But that's a personal preference and is unlikely to have any bearing on reality :-)
"The string matches the regular expression" seems to be shorthand for "the string is in the language defined by and isomorphic to the regular expression."
"The regular expression matches the string" seems to be shorthand for "a parser automaton compiled from the regular expression will parse the string and halt in a final state."
I'd say:
At design time a user/developer creates a regular expression that matches a string.
At run time a regular expression engine finds a string that matches the regular expression.
(Not intended to be a definition, just an example of common usage.)
Since a regular expression represents a possibly infinite set of finite strings, I would say that it is most correct to write that "string s matches regular expression r". You could also say that "string s is a member of the set generated by regular expression r".
Also, you should consider using the words accept and reject, especially if you intend to discuss finite automata in your paper.