Regex, ignore matches that might occur inside a string - regex

Say for example I have the test string:
this is text "This is a quote { containing } some characters" blah blah { inside }
I would like to match every pair of curly brackets and the text in between using the expression
\{[^{]*?\}
but ignore any matches that might occur inside of a string, namely the { containing } portion of the string, or even be able to match only { text } of the following test string
more text "text text { { { } " { text } words

Well this works:
{[^}]*}(?=(?:[^"]*"[^"]*"[^"]*)*$)
But I'm not sure that it's bullet proof. You can view it online:
{[^}]*} get the curly content
(?=(?:[^"]*"[^"]*"[^"]*)*$) ensure that it's followed by an even number of ".
Note: This regex doesn't take account of escaped double quotes.

Related

How can I replace all words in a string except one

So, I would like to change all words in a string except one, that stays in the middle.
#include <boost/algorithm/string/replace.hpp>
int main()
{
string test = "You want to join player group";
string find = "You want to join group";
string replace = "This is a test about group";
boost::replace_all(test, find, replace);
cout << test << endl;
}
The output was expected to be:
This is a test about player group
But it doesn't work, the output is:
You want to join player group
The problem is on finding out the words, since they are a unique string.
There's a function that reads all words, no matter their position and just change what I want?
EDIT2:
This is the best example of what I want to happen:
char* a = "This is MYYYYYYYYY line in the void Translate"; // This is the main line
char* b = "This is line in the void Translate"; // This is what needs to be find in the main line
char* c = "Testing - is line twatawtn thdwae voiwd Transwlate"; // This needs to replace ALL the words in the char* b, perserving the MYYYYYYYYY
// The output is expected to be:
Testing - is MYYYYYYYY is line twatawtn thdwae voiwd Transwlate
You need to invert your thinking here. Instead of matching "All words but one", you need to try to match that one word so you can extract it and insert it elsewhere.
We can do this with Regular Expressions, which became standardized in C++11:
std::string test = "You want to join player group";
static const std::regex find{R"(You want to join (\S+) group)"};
std::smatch search_result;
if (!std::regex_search(test, search_result, find))
{
std::cerr << "Could not match the string\n";
exit(1);
}
else
{
std::string found_group_name = search_result[1];
auto replace = boost::format("This is a test about %1% group") % found_group_name;
std::cout << replace;
}
Live Demo
To match the word "player" I used a pretty simply regular expression (\S+) which means "match one or more non-whitespace characters (greedily) and put that into a group"
"Groups" in regular expressions are enclosed by parentheses. The 0th group is always the entire match, and since we only have one set of parentheses, your word is therefore in group 1, hence the resulting access of the match result at search_result[1].
To create the regular expression, you'll notice I used the perhaps-unfamiliar string literal syntaxR"(...)". This is called a raw string literal and was also standardized in C++11. It was basically made for describing regular expressions without needing to escape backslashes. If you've used Python, it's the same as r'...'. If you've used C#, it's the same as #"..."
I threw in some boost::format to print the result because you were using Boost in the question and I thought you'd like to have some fun with it :-)
In your example, find is not a substring of test, so boost::replace_all(test, find, replace); has no effect.
Removing group from find and replace solves it:
#include <boost/algorithm/string/replace.hpp>
#include <iostream>
int main()
{
std::string test = "You want to join player group";
std::string find = "You want to join";
std::string replace = "This is a test about";
boost::replace_all(test, find, replace);
std::cout << test << std::endl;
}
Output: This is a test about player group.
In this case, there is just one replace of the beginning of the string because the end of the string is already the right one. You could have another call of replace_all to change the end if needed.
Some other options:
one is in the other answer.
split the strings into a vector (or array) of words, then insert the desired word (player) at the right spot of the replace vector, then build your output string from it.

how can extract the name from a line

Assume that I have a line from a file that I want to read:
>NZ_FNBK01000055.1 Halorientalis regularis
So how can extract the name from that line that begins with a greater than sign; everything following the greater-than sign (and excluding the newline at the end of the line) is the name.
The name should be:
NZ_FNBK01000055.1 Halorientalis regularis
Here is my code so far:
bool file::load(istream& file)
{
string line;
while(getline(genomeSource, line)){
if(line.find(">") != string::npos)
{
m_name =
}
}
return true;
}
You could easily handle both conditions using regular expressions. c++ introduced <regex> in c++11. Using this and a regex like:
>.*? (.*?) .*$
> Get the literal character
.*? Non greedy search for anything stopping at a space
(.*?) Non greedy search sor anything stopping at a space but grouping the characters before hand.
.*$ Greedy search until the end of the string.
With this you can easily check if this line meets your criteria and get the name at the same time. Here is a test showing it working. For the code, the c++11 regex lib is very simple:
std::string s = ">NZ_FNBK01000055.1 Halorientalis regularis ";
std::regex rgx(">.*? (.*?) .*$"); // Make the regex
std::smatch matches;
if(std::regex_search(s, matches, rgx)) { // Do a search
if (matches.size() > 1) { // If there are matches, print them.
std::cout << "The name is " << matches[1].str() << "\n";
}
}
Here is a live example.

regex for at least one character

How to check if string contains at least one character? I want to eliminate strings where are only special characters, so I've decided that the easiest way is to check if there is at least one character or digit, so I've created [a-zA-Z0-9]{1,} and [a-zA-Z0-9]+ but none of these work.
boost::regex noSpecialCharacters("[a-zA-Z0-9]+");
boost::regex noSpecialCharacters2("[a-zA-Z0-9]{1,}");
string tab[SIZE] = {"father", "apple is red"};
for (int i = 0; i < SIZE; i++) {
if (!boost::regex_match(tab[i], noSpecialCharacters)) {
puts("This is it!");
} else {
puts("or not");
}
if (!boost::regex_match(tab[i], noSpecialCharacters2)) {
puts("This is it!");
} else {
puts("or not");
}
}
for "apple is red" the answer is correct but for "father" it doesn't work.
apple is red won't match because, as per here (my bold):
Note that the result is true only if the expression matches the whole of the input sequence.
That means the spaces make it invalid. It then goes on to say (again, my bold):
If you want to search for an expression somewhere within the sequence then use regex_search.
If all you're looking for is one valid character somewhere in there, you can just use regex_match() with ".*[a-zA-Z0-9].*" or regex_search() with "[a-zA-Z0-9]".

notepad++ regular expression remove all text between curly brackets

function get_last_word($sentance){
$wordArr = explode(' ', $sentance);
$last_word = trim($wordArr[count($wordArr) - 1]);
runDebug( __FILE__, __FUNCTION__, __LINE__, "Sentance: $sentance. Last word:$last_word",4);
return $last_word;
}
i want to remove all text between {}
result should be:
function get_last_word($sentance){}
i have tried
{+.*}
and its working only when curly brackets are on same line
Newer version of Notepad++ supports multi-line matching (I am now using 6.1.3)
In the Find/Replace dialog, next to the "Regular Expression" radio button, there is a checkbox called ". matches newline" which means multi-line matching.
Then, use \{.*?\} (which is a ungreedy match) to achieve what you want.
Beware that it does not match braces for you. For example
foo {
bar {
blabalbla
}
xxx {
yyy
}
}
will give you
foo {}
xxx {
yyy
}
}
(I believe there are other questions in SO about brace matching in regex, you may have a look, though I wonder if they will work in notepad++)
You should be fine when you just replace \{[^{}]+\} with {}, repeatedly...
Try
(?<=\{)[^}]+(?=\})
this will match anything that falls between { and }

Want to Encode text during Regex.Replace call

I have a regex call that I need help with.
I haven't posted my regex, because it is not relevant here.
What I want to be able to do is, during the Replace, I also want to modify the ${test} portion by doing a Html.Encode on the entire text that is effecting the regex.
Basically, wrap the entire text that is within the range of the regex with the bold tag, but also Html.Encode the text inbetween the bold tag.
RegexOptions regexOptions = RegexOptions.Compiled | RegexOptions.IgnoreCase;
text = Regex.Replace(text, regexBold, #"<b>${text}</b>", regexOptions);
There is an incredibly easy way of doing this (in .net). Its called a MatchEvaluator and it lets you do all sorts of cool find and replace. Essentially you just feed the Regex.Replace method the method name of a method that returns a string and takes in a Match object as its only parameter. Do whatever makes sense for your particular match (html encode) and the string you return will replace the entire text of the match in the input string.
Example: Lets say you wanted to find all the places where there are two numbers being added (in text) and you want to replace the expression with the actual number. You can't do that with a strict regex approach, but you can when you throw in a MatchEvaluator it becomes easy.
public void Stuff()
{
string pattern = #"(?<firstNumber>\d+)\s*(?<operator>[*+-/])\s*(?<secondNumber>\d+)";
string input = "something something 123 + 456 blah blah 100 - 55";
string output = Regex.Replace(input, pattern, MatchMath);
//output will be "something something 579 blah blah 45"
}
private static string MatchMath(Match match)
{
try
{
double first = double.Parse(match.Groups["firstNumber"].Value);
double second = double.Parse(match.Groups["secondNumber"].Value);
switch (match.Groups["operator"].Value)
{
case "*":
return (first * second).ToString();
case "+":
return (first + second).ToString();
case "-":
return (first - second).ToString();
case "/":
return (first / second).ToString();
}
}
catch { }
return "NaN";
}
Find out more at http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.matchevaluator.aspx
Don't use Regex.Replace in this case... use..
foreach(Match in Regex.Matches(...))
{
//do your stuff here
}
Heres an implementation of this I've used to pick out special replace strings from content and localize them.
protected string FindAndTranslateIn(string content)
{
return Regex.Replace(content, #"\{\^(.+?);(.+?)?}", new MatchEvaluator(TranslateHandler), RegexOptions.IgnoreCase);
}
public string TranslateHandler(Match m)
{
if (m.Success)
{
string key = m.Groups[1].Value;
key = FindAndTranslateIn(key);
string def = string.Empty;
if (m.Groups.Count > 2)
{
def = m.Groups[2].Value;
if(def.Length > 1)
{
def = FindAndTranslateIn(def);
}
}
if (group == null)
{
return Translate(key, def);
}
else
{
return Translate(key, group, def);
}
}
return string.Empty;
}
From the match evaluator delegate you return everything you want replaced, so where I have returns you would have bold tags and an encode call, mine also supports recursion, so a little over complicated for your needs, but you can just pare down the example for your needs.
This is equivalent to doing an iteration over the collection of matches and doing parts of the replace methods job. It just saves you some code, and you get to use a fancy shmancy delegate.
If you do a Regex.Match, the resulting match objects group at the 0th index, is the subset of the intput that matched the regex.
you can use this to stitch in the bold tags and encode it there.
Can you fill in the code inside {} to add the bold tag, and encode the text?
I'm confused as to how to apply the changes to the entire text block AND replace the section in the text variable at the end.