StringBuilder Replace() - count how many replacements were made?

I'm using the C# StringBuilder to replace strings with the best performance. The replace is never finished without informing the user how many replacements were made, but the Replace() method only returns the StringBuilder instance, and I can't find any method in StringBuilder that helps count the replacements.
So is there any way to find out how many replacements were made?
Thanks for reading :)

Just count the occurrences in the string before running the replace. You'd have to write your own extension method or LINQ query.
Example extension method:
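public static int OccurencesOf(this string str, string val)
{
    // An empty search string would never advance the index, so bail out early.
    if (string.IsNullOrEmpty(val)) return 0;
    int num_occurrences = 0;
    int num_startingIndex = 0;
    // Ordinal comparison, to match how StringBuilder.Replace compares strings.
    while ((num_startingIndex = str.IndexOf(val, num_startingIndex, StringComparison.Ordinal)) >= 0)
    {
        ++num_occurrences;
        // Skip the whole match: Replace() does not replace overlapping occurrences.
        num_startingIndex += val.Length;
    }
    return num_occurrences;
}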

Related

Extract all allowed characters from a regular expression

I need to extract a list of all allowed characters from a given regular expression.
So for example, if the regex looks like this (some random example):
[A-Z]*\s+(4|5)+
the output should be
ABCDEFGHIJKLMNOPQRSTUVWXYZ45
(omitting the whitespace)
One obvious solution would be to define a complete set of allowed characters, and use a find method, to return the corresponding subsequence for each character. This seems to be a bit of a dull solution though.
Can anyone think of a (possibly simple) algorithm on how to implement this?
One thing you can do is:
1. Split the regex into subgroups.
2. Test every character against each subgroup.
See the following C# example (not perfect yet):
static void Main(String[] args)
{
    Console.WriteLine($"-->{TestRegex(@"[A-Z]*\s+(4|5)+")}<--");
}
public static string TestRegex(string pattern)
{
    string result = "";
    // Split the pattern on the quantifiers * and + to get the subgroups.
    foreach (var subPattern in Regex.Split(pattern, @"[*+]"))
    {
        if (string.IsNullOrWhiteSpace(subPattern))
            continue;
        result += GetAllCharCoveredByRegex(subPattern);
    }
    return result;
}
public static string GetAllCharCoveredByRegex(string pattern)
{
    Console.WriteLine($"Testing {pattern}");
    var regex = new Regex(pattern);
    var matches = new List<char>();
    // Try every character and keep the ones the subgroup matches.
    for (var c = char.MinValue; c < char.MaxValue; c++)
    {
        if (regex.IsMatch(c.ToString()))
        {
            matches.Add(c);
        }
    }
    return string.Join("", matches);
}
Which outputs:
Testing [A-Z]
Testing \s
Testing (4|5)
-->ABCDEFGHIJKLMNOPQRSTUVWXYZ
? ? ???????? 45<--

replace the occurrence of character with a seq number using boost

How can I replace multiple occurrences of a character with a string containing the occurrence number?
For example, if I have the following expression:
insert into emp values(?,?,?)
I want the following converted string:
insert into emp values(_p_1,_p_2,_p_3)
I am trying to do this using Boost regular expressions.
Can anyone tell me how to achieve this using Boost C++ (with no or minimum iteration)?
Currently I am using the following approach:
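std::wstring q = L"insert into emp values(?,?,?)";
auto loc = q.find(L"?");
auto len = wcslen(L"?");
auto c = 1;
while (loc != std::wstring::npos)
{
    q.replace(loc, len, L"_p_" + std::to_wstring(c));
    c++;
    // Resume from the last match instead of rescanning from the start.
    loc = q.find(L"?", loc);
}
// A wide string must go to wcout; cout would print the pointer value.
std::wcout << q;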
Please suggest better and efficient approaches.
I'd just forget about regular expressions, and about trying to do this simple thing with Boost.
It's like asking, "how do I add 1 to a variable using Boost regular expressions?"
The best answer, IMHO, is to just use ++ for the task of adding 1, and to use a loop to replace special characters with strings.
string const query_format = "insert into emp values(?,?,?)";
string const params[] = {"_p_1", "_p_2", "_p_3"};
string query;
string const* p = params;
for( char const c : query_format )
{
    // Each '?' consumes the next parameter name; other characters are copied as-is.
    if( c == '?' ) { query += *p++; } else { query += c; }
}
// Use `query`
One might choose to wrap this up as a replace function.
Disclaimer: code not touched by compiler.
If you control the query_format string, why not instead make the placeholders compatible with Boost.Format?
Regarding the parenthetical requirement
"with no or minimum iteration":
there's iteration involved no matter how you do this. You can hide the iteration behind a function name, but that's all. It's logically impossible to actually avoid the iteration, and it's trivial (completely trivial) to hide it behind a function name.
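As a rough sketch of wrapping the loop up as a function: the helper name number_placeholders and its marker and prefix parameters are made up for illustration, and the code assumes a '?' never appears inside a quoted SQL literal, since it replaces every occurrence blindly.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: copies `format`, replacing each occurrence of
// `marker` with `prefix` followed by a 1-based sequence number.
std::string number_placeholders(const std::string& format,
                                char marker = '?',
                                const std::string& prefix = "_p_")
{
    std::string result;
    result.reserve(format.size());
    int count = 0;
    for (char c : format) {
        if (c == marker) {
            // Assumption: every marker is a placeholder, even inside quotes.
            result += prefix;
            result += std::to_string(++count);
        } else {
            result += c;
        }
    }
    return result;
}
```

Calling number_placeholders("insert into emp values(?,?,?)") gives the string asked for in the question. It is still one pass over the input, with the iteration merely hidden behind the function name.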

SSN masking using the regular expression

I am trying to mask an SSN from the form "123-12-1234" to "XXX-XX-1234". I am able to achieve this using the code below.
string input = " 123-12-1234 123-11-1235 ";
Match m = Regex.Match(input, @"((?:\d{3})-(?:\d{2})-(?<token>\d{4}))");
while (m.Success)
{
    if (m.Groups["token"].Length > 0)
    {
        input = input.Replace(m.Groups[0].Value, "XXX-XX-" + m.Groups["token"].Value);
    }
    m = m.NextMatch();
}
Is there a better way to do it in one line using the Regex.Replace method?
You can try the following:
string input = " 123-12-1234 123-11-1235";
string pattern = @"(?:\d{3})-(?:\d{2})-(\d{4})";
string result = Regex.Replace(input, pattern, "XXX-XX-$1");
Console.WriteLine(result); // XXX-XX-1234 XXX-XX-1235
If you are going to be doing a lot of masking, you should consider whether or not to use a compiled regular expression.
Compiled expressions cause a slight delay when the application is first run, but they run faster subsequently.
The choice between static Regex methods and Regex instances should also be considered.
I found the following to be the most efficient:
public class SSNFormatter
{
    private const string IncomingFormat = @"^(\d{3})-(\d{2})-(\d{4})$";
    // Keep only the last four digits (group 3); mask the rest.
    private const string OutgoingFormat = "xxx-xx-$3";
    readonly Regex regexCompiled = new Regex(IncomingFormat, RegexOptions.Compiled);
    public string SSNMask(string ssnInput)
    {
        var result = regexCompiled.Replace(ssnInput, OutgoingFormat);
        return result;
    }
}
There is a comparison of six methods for regex checking/masking here.

Comparing a string with a regEx wildcard value

I need to check a string (a URL) against a list of regex wildcard values to see if there is a match. I will be intercepting an HTTP request and checking it against a list of pre-configured values, and if there is a match, I will do something to the URL. Examples:
Request URL: http://www.stackoverflow.com
Wildcards: *.stackoverflow.com/
*.stack*.com/
www.stackoverflow.*
Are there any good libraries for C++ for doing this? Any good examples would be great. Pseudo-code that I have looks something like:
std::string requestUrl = "http://www.stackoverflow.com";
std::vector<string> urlWildcards = ...;
BOOST_FOREACH(string wildcard, urlWildcards) {
if (requestUrl matches wildcard) {
// Do something
} else {
// Do nothing
}
}
Thanks a lot.
The following code example uses regular expressions to look for exact substring matches. The search is performed by the static IsMatch method, which takes two strings as input. The first is the string to be searched, and the second is the pattern to be searched for.
#using <System.dll>
using namespace System;
using namespace System::Text::RegularExpressions;
int main()
{
    array<String^>^ sentence =
    {
        "cow over the moon",
        "Betsy the Cow",
        "cowering in the corner",
        "no match here"
    };
    String^ matchStr = "cow";
    for (int i=0; i<sentence->Length; i++)
    {
        Console::Write( "{0,24}", sentence[i] );
        // Case-insensitive search for matchStr anywhere in the sentence.
        if ( Regex::IsMatch( sentence[i], matchStr,
                             RegexOptions::IgnoreCase ) )
            Console::WriteLine(" (match for '{0}' found)", matchStr);
        else
            Console::WriteLine("");
    }
    return 0;
}
Code from MSDN (http://msdn.microsoft.com/en-us/library/zcwwszd7(v=vs.80).aspx).
If you use VS 2010, consider using the regex library introduced by C++ TR1.
Refer to the following page for more details.
http://www.johndcook.com/cpp_regex.html
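A minimal sketch of how the wildcards from the question could be checked with the standard regex library (std::regex from C++11 here, rather than the TR1 namespace). The helper names wildcard_to_regex and matches_wildcard are made up for illustration, and it is an assumption that a wildcard may match anywhere inside the URL, so the "http://" prefix does not need to be listed:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: turns a '*' wildcard into an ECMAScript regex.
// '*' becomes ".*"; every other regex metacharacter is escaped.
std::string wildcard_to_regex(const std::string& wildcard)
{
    static const std::string meta = "\\^$.|?+()[]{}";
    std::string pattern;
    for (char c : wildcard) {
        if (c == '*') {
            pattern += ".*";
        } else {
            if (meta.find(c) != std::string::npos) pattern += '\\';
            pattern += c;
        }
    }
    return pattern;
}

// Assumption: the wildcard may match any substring of the URL.
bool matches_wildcard(const std::string& url, const std::string& wildcard)
{
    const std::regex re(wildcard_to_regex(wildcard));
    return std::regex_search(url, re);
}
```

Note that a wildcard ending in "/" only matches a URL that actually contains that slash, so "http://www.stackoverflow.com" would need a trailing slash to match "*.stackoverflow.com/". If the wildcard list is fixed, building each std::regex once up front would avoid recompiling on every request.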

Regex Rejecting matches because of Instr

What's the easiest way to do an "instring" type function with a regex? For example, how could I reject a whole string because of the presence of a single character such as :? For example:
this - okay
there:is - not okay because of :
More practically, how can I match the following string:
//foo/bar/baz[1]/ns:foo2/@attr/text()
For any node test on the xpath that doesn't include a namespace?
(/)?(/)([^:/]+)
Will match the node tests but includes the namespace prefix which makes it faulty.
I'm still not sure whether you just want to detect if the XPath contains a namespace, or whether you want to remove the references to the namespace. So here's some sample code (in C#) that does both.
class Program
{
    static void Main(string[] args)
    {
        string withNamespace = @"//foo/ns2:bar/baz[1]/ns:foo2/@attr/text()";
        string withoutNamespace = @"//foo/bar/baz[1]/foo2/@attr/text()";
        ShowStuff(withNamespace);
        ShowStuff(withoutNamespace);
    }
    static void ShowStuff(string input)
    {
        Console.WriteLine("'{0}' does {1}contain namespaces", input, ContainsNamespace(input) ? "" : "not ");
        Console.WriteLine("'{0}' without namespaces is '{1}'", input, StripNamespaces(input));
    }
    static bool ContainsNamespace(string input)
    {
        // a namespace must start with a character, but can have characters and numbers
        // from that point on.
        return Regex.IsMatch(input, @"/?\w[\w\d]+:\w[\w\d]+/?");
    }
    static string StripNamespaces(string input)
    {
        return Regex.Replace(input, @"(/?)\w[\w\d]+:(\w[\w\d]+)(/?)", "$1$2$3");
    }
}
Hope that helps! Good luck.
Match on ":"? I think the question isn't clear enough, because the answer is so obvious:
if (Regex.IsMatch(input, ":")) // reject
You might want \w, which is a "word" character. From the javadocs, it is defined as [a-zA-Z_0-9], so if you don't want underscores either, that may not work....
I don't know regex syntax very well, but could you not do:
[any alphanumeric]*:[any alphanumeric]*
I think something like that should work, no?
Yeah, my question was not very clear. Here's a solution but rather than a single pass with a regex, I use a split and perform iteration. It works as well but isn't as elegant:
string xpath = "//foo/bar/baz[1]/ns:foo2/@attr/text()";
string[] nodetests = xpath.Split( new char[] { '/' } );
for (int i = 0; i < nodetests.Length; i++)
{
    if (nodetests[i].Length > 0 && Regex.IsMatch( nodetests[i], @"^(\w|\[|\])+$" ))
    {
        // does not have a ":", we can manipulate it.
    }
}
xpath = String.Join( "/", nodetests );