How to replace all instances of a substring in a string - regex

I'm trying to use RegEx to split a large string into smaller sections, and as part of this I need to replace all instances of a substring in the larger string. I've been using the replace function, but it only replaces the first instance of the substring. How can I replace all instances of the substring within the larger string?
Thanks
Stephen

Add the 'g' (global) flag to the search expression, e.g. /i_want_to_be_replaced/g
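To illustrate the difference the flag makes, here is a minimal sketch in JavaScript, whose regex literals and replace() behave, as far as I know, the same way as ActionScript's in this respect. The sample string and variable names are made up for the example.

```javascript
// Sketch: a pattern without the g flag replaces only the first match.
const text = "a-b-a-b-a";

const firstOnly = text.replace(/a/, "X");   // no g flag: first match only
const everyMatch = text.replace(/a/g, "X"); // g flag: every match

console.log(firstOnly);  // X-b-a-b-a
console.log(everyMatch); // X-b-X-b-X
```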

One fast way is to use split and join:
function quickReplace(source:String, oldString:String, newString:String):String
{
return source.split(oldString).join(newString);
}
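One reason split/join can be preferable is that the search string is treated as plain text, so characters such as . or + need no escaping. A rough JavaScript sketch of both routes follows; the helper names (replaceAllLiteral, escapeRegExp, replaceAllRegex) and the sample string are made up for illustration.

```javascript
// Hypothetical helper names; the sample string is made up.
function replaceAllLiteral(source, oldString, newString) {
  // split/join treats oldString as plain text, never as a pattern
  return source.split(oldString).join(newString);
}

function escapeRegExp(literal) {
  // Backslash-escape every character that is special in a regex
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function replaceAllRegex(source, oldString, newString) {
  // Same result via a global regex built from the escaped search string
  return source.replace(new RegExp(escapeRegExp(oldString), "g"), newString);
}

const price = "1.5 + 1.5 = 3";
console.log(replaceAllLiteral(price, "1.5", "2")); // 2 + 2 = 3
console.log(replaceAllRegex(price, "1.5", "2"));   // 2 + 2 = 3
```

Without the escaping step, the dot in "1.5" would match any character, so the regex route could replace more than intended.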

In addition to @Alex's answer, you might also find this answer handy,
using String's replace() method.
Here's a snippet:
function addLinks(pattern:RegExp, text:String):String {
    // pattern must carry the g flag so that one replace() call handles every match;
    // $& inserts the matched text, and text without a match is returned unchanged
    return text.replace(pattern, "<font color=\"#0000dd\">$&</font>");
}

Related

Use IndexOf and Substring to catch a string AFTER IndexOf

I just came across something and would like to ask for other possible solutions.
I have a string:
string text = "This is a very serious sample text, not a joke!"
Now I would like to find the position of the word "serious" and get the rest of the string AFTER "serious".
One way I would solve this is:
$text="This is a very serious sample text, not a joke!"
$start=($text).IndexOf("serious")
(($text).Substring($start+"serious".Length)).TrimStart()
I am sure there is a regex solution for this as well, but I was wondering if I can use IndexOf() and then Substring to get the rest of the string AFTER "serious".
I was also looking into this post here: Annoying String Substring & IndexOf, but either it is not the solution/question I am looking for or I didn't understand it...
Thanks for your help in advance, Adis
Since one of SubString's overloads takes only the starting index, first find where serious (note the trailing space) is and then pick substring from that point plus length of what was searched for.
By putting the search term into a variable, one can access its length as a property. Changing the search term would be easy too, as it requires just updating the variable value instead of doing search and replace for string values.
Like so,
$searchTerm = "serious "
$start = $text.IndexOf($searchTerm)
$text.Substring($start + $searchTerm.length)
sample text, not a joke!
As for a simple regex, use -replace with the pattern ^.*serious (note the trailing space). That matches the beginning of the string (^), then anything (.*), followed by serious and a space. Replacing the match with an empty string removes everything up to and including the search term. Like so,
"This is a very serious sample text, not a joke!" -replace '^.*serious ', ''
sample text, not a joke!
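Two details are worth checking with either approach: IndexOf returns -1 when the term is missing, and the greedy .* in the regex cuts after the last occurrence while IndexOf finds the first. Here is a small JavaScript sketch of both points; the helper name textAfter and the second sample string are made up.

```javascript
// Sketch with a hypothetical helper name, textAfter.
function textAfter(text, searchTerm) {
  const start = text.indexOf(searchTerm);
  // indexOf returns -1 when the term is missing; report that explicitly
  if (start === -1) return null;
  return text.slice(start + searchTerm.length).trimStart();
}

const text = "This is a very serious sample text, not a joke!";
console.log(textAfter(text, "serious")); // sample text, not a joke!
console.log(textAfter(text, "funny"));   // null

// The greedy regex cuts after the LAST occurrence, indexOf after the FIRST.
const twice = "serious and more serious stuff";
console.log(textAfter(twice, "serious"));      // and more serious stuff
console.log(twice.replace(/^.*serious /, "")); // stuff
```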
There might be cases in which extension methods would be a straightforward solution. They allow adding new methods to existing .NET classes. The usual approach would be inheritance, but since string is sealed, that's not allowed. One use case could be a method, say IndexEndOf, that returns the index where the search term ends.
Adding .NET code (C# in this case) to PowerShell is easy enough with Add-Type. The sample code is adapted from another answer; it wraps the string in a small class with an implicit conversion from string. The IndexEndOf method does the arithmetic and returns the index at which the pattern ends. Like so,
$code=@'
public class ExtendedString {
    public string s_ {get; set;}
    public ExtendedString(string theString){
        s_ = theString;
    }
    public int IndexEndOf(string pattern)
    {
        return s_.IndexOf(pattern) + pattern.Length;
    }
    public static implicit operator ExtendedString(string value){
        return new ExtendedString(value);
    }
}
'@
add-type -TypeDefinition $code
$text = "This is a very serious sample text, not a joke!"
$searchTerm = "serious "
$text.Substring(([ExtendedString]$text).IndexEndOf($searchTerm))
sample text, not a joke!

Match return substring between two substrings using regexp

I have a list of records that are character vectors. Here's an example:
'1mil_0,1_1_1_lb200_ks_drivers_sorted.csv'
'1mil_0_1_lb100_ks_drivers_sorted.csv'
'1mil_1_1_lb2_100_100_ks_drivers_sorted.csv'
'1mil_1_1_lb100_ks_drivers_sorted.csv'
From these names I would like to extract whatever's between the two substrings 1mil_ and _ks_drivers_sorted.csv.
So in this case the output would be:
0,1_1_1_lb200
0_1_lb100
1_1_lb2_100_100
1_1_lb100
I'm using MATLAB so I thought to use regexp to do this, but I can't understand what kind of regular expression would be correct.
Or are there some other ways to do this without using regexp?
Let the data be:
x = {'1mil_0,1_1_1_lb200_ks_drivers_sorted.csv'
'1mil_0_1_lb100_ks_drivers_sorted.csv'
'1mil_1_1_lb2_100_100_ks_drivers_sorted.csv'
'1mil_1_1_lb100_ks_drivers_sorted.csv'};
You can use lookbehind and lookahead to find the two limiting substrings, and match everything in between:
result = cellfun(@(c) regexp(c, '(?<=1mil_).*(?=_ks_drivers_sorted\.csv)', 'match'), x);
Or, since the regular expression only produces one match, the following simpler alternative can be used (thanks @excaza for noticing):
result = regexp(x, '(?<=1mil_).*(?=_ks_drivers_sorted\.csv)', 'match', 'once');
In your example, either of the above gives
result =
4×1 cell array
'0,1_1_1_lb200'
'0_1_lb100'
'1_1_lb2_100_100'
'1_1_lb100'
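For comparison, the same lookbehind/lookahead pattern can be tried outside MATLAB. The JavaScript sketch below reuses the file names from the question; the variable names and the empty-string fallback for non-matching names are my own assumptions.

```javascript
// Sketch: same lookbehind/lookahead pattern, applied to each name.
const names = [
  "1mil_0,1_1_1_lb200_ks_drivers_sorted.csv",
  "1mil_0_1_lb100_ks_drivers_sorted.csv",
  "1mil_1_1_lb2_100_100_ks_drivers_sorted.csv",
  "1mil_1_1_lb100_ks_drivers_sorted.csv",
];

// (?<=...) and (?=...) assert the delimiters without consuming them
const between = /(?<=1mil_).*(?=_ks_drivers_sorted\.csv)/;

// match() returns null when nothing matches, so fall back to "".
const result = names.map((name) => (name.match(between) || [""])[0]);

console.log(result);
```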
For me, the easy way to do this is to replace the parts you don't need with an empty string; what remains is the part you need.
If you have a list, you can use a loop to do this.
Example replacing "1mil_" and "_ks_drivers_sorted.csv" with "":
newChr = strrep(chr,'1mil_','')
newChr = strrep(newChr,'_ks_drivers_sorted.csv','')

How to replace parts of a string in lua "in a single pass"?

I have the following string of anchors (where I want to change the contents of the href) and a Lua table of replacements that maps each word to its replacement:
s1 = '<a href="word7">'
replacementTable = {}
replacementTable["word1"] = "potato1"
replacementTable["word2"] = "potato2"
replacementTable["word3"] = "potato3"
replacementTable["word4"] = "potato4"
replacementTable["word5"] = "potato5"
The expected result should be:
<a href="word7">
I know I could do this by iterating over each element in the replacementTable and processing the string each time, but my gut feeling tells me that if the string is very big and/or the replacement table becomes big, this approach is going to perform poorly.
So I thought it would be best to do the following: apply the regular expression to find all the matches, get an iterator over the matches, and replace each match with its value in the replacementTable.
Something like this would be great (written in JavaScript because I don't yet know how to write lambdas in Lua):
var newString = patternReplacement(s1, '<a[^>]* href="([^"]*)"', function(match) { return replacementTable[match] })
Where the first parameter is the string, the second one the regular expression and the third one a function that is executed for each match to get the replacement. This way I think s1 gets parsed once, being more efficient.
Is there any way to do this in Lua?
In your example, this simple code works:
print((s1:gsub("%w+",replacementTable)))
The point is that gsub already accepts a table of replacements.
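When gsub is given a table, a match whose lookup yields nil is left unchanged, which is why the other words in the string survive. For readers who think in the question's JavaScript terms, a rough single-pass equivalent is sketched below. The input string and the Map are made up for illustration, and \w is not identical to Lua's %w since it also matches underscores.

```javascript
// Sketch: table entries mirror the question; the input string is made up.
const replacementTable = new Map([
  ["word1", "potato1"],
  ["word2", "potato2"],
]);

const s1 = '<a href="word1"> <a href="word7"> <a href="word2">';

// One pass over s1: each run of word characters is looked up once,
// and words missing from the table are returned unchanged.
const replaced = s1.replace(/\w+/g, (word) =>
  replacementTable.has(word) ? replacementTable.get(word) : word
);

console.log(replaced);
// <a href="potato1"> <a href="word7"> <a href="potato2">
```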
In the end, the solution that worked for me was the following one:
local updatedBody = string.gsub(body, '(<a[^>]* href=")(/[^"%?]*)([^"]*")', function(leftSide, url, rightSide)
local replacedUrl = url
if (urlsToReplace[url]) then replacedUrl = urlsToReplace[url] end
return leftSide .. replacedUrl .. rightSide
end)
It kept out any querystring parameter giving me just the URI. I know it's a bad idea to parse HTML bodies with regular expressions but for my case, where I required a lot of performance, this was performing a lot faster and just did the job.

Check If first character is "+"

How can I detect whether a string starts with "+"?
I tried
^\s*?\+.*$
but it didn't help.
P.S.: I always have only one line.
You don't need \s*?, you have to use:
^\+
or...
^[+]
In case you want to check a complete string, you can use:
^\+.*$
Without regex, you can also use the native startsWith() method.
So it would be:
var str1 = '+some text';
var bool = str1.startsWith('+'); //true
^\+.*$ should work for your purposes.
Here's a fiddle with a couple test strings : https://regex101.com/r/nP2eL7/1
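To see how the strict pattern differs from one that tolerates leading whitespace (as the \s*? in the question did), here is a small JavaScript sketch. The sample inputs and variable names are made up.

```javascript
// Sketch: the sample inputs are made up.
const startsWithPlus = /^\+/; // plus must be the very first character
const lenient = /^\s*\+/;     // also accepts leading whitespace

const samples = ["+123", "  +123", "123+", ""];
const report = samples.map((s) => ({
  input: s,
  strict: startsWithPlus.test(s),
  lenient: lenient.test(s),
  native: s.startsWith("+"),
}));

console.log(report);
```

Only the first sample passes all three checks; the second passes just the lenient pattern.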
Here's an alternative for the case where the string may start with either a + or a - and you want to capture the number that follows only if it has no leading zeros:
/(?<=^\+|^-|^)[1-9]\d*/

Struggling with regex logic: how do I remove a param from a url query string?

I'm comparing 2 URL query strings to see if they're equal; however, I want to ignore a specific query parameter (always with a numeric value) if it exists. So, these 2 query strings should be equal:
firstName=bobby&lastName=tables&paramToIgnore=2
firstName=bobby&lastName=tables&paramToIgnore=5
So, I tried to use a regex replace using the REReplaceNoCase function:
REReplaceNoCase(myQueryString, "&paramToIgnore=[0-9]*", "")
This works fine for the above example. I apply the replace to both strings and then compare. The problem is that I can't be sure that the param will be the last one in the string... the following 2 query strings should also be equal:
firstName=bobby&lastName=tables&paramToIgnore=2
paramToIgnore=5&firstName=bobby&lastName=tables
So, I changed the regex to make the preceding ampersand optional... "&?paramToIgnore=[0-9]*". But - these strings will still not be equal as I'll be left with an extra ampersand in one of the strings but not the other:
firstName=bobby&lastName=tables
&firstName=bobby&lastName=tables
Similarly, I can't just remove preceding and following ampersands ("&?paramToIgnore=[0-9]*&?") as if the query param is in the middle of the string I'll strip one ampersand too many in one string and not the other - e.g.
firstName=bobby&lastName=tables&paramToIgnore=2
firstName=bobby&paramToIgnore=5&lastName=tables
will become
firstName=bobby&lastName=tables
firstName=bobbylastName=tables
I can't seem to get my head around the logic of this... Can anyone help me out with a solution?
If you can't be sure of the order in which the parameters appear, I would recommend not comparing the raw strings.
Instead, split each string up like this:
String stringA = "firstName=bobby&lastName=tables&paramToIgnore=2";
String stringB = "firstName=bobby&lastName=tables&paramToIgnore=5";
String[] partsA = stringA.split("&");
String[] partsB = stringB.split("&");
Then go through both arrays and replace the paramToIgnore entry with the same placeholder so the two compare as equal:
for (int i = 0; i < partsA.length; i++)
{
    if (partsA[i].startsWith("paramToIgnore")) {
        partsA[i] = "IgnoreMePlease";
    }
}
for (int j = 0; j < partsB.length; j++)
{
    if (partsB[j].startsWith("paramToIgnore")) {
        partsB[j] = "IgnoreMePlease";
    }
}
Then you can sort and compare the arrays to see if they are equal:
Arrays.sort(partsA);
Arrays.sort(partsB);
boolean b = Arrays.equals(partsA, partsB);
I'm pretty sure it's possible to make this more compact and faster. But if you compare the raw strings as you do now, you always have to care about the order of your parameters.
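A more compact variant of the same split, normalize, sort and compare idea can be sketched in JavaScript with the standard URLSearchParams class. The helper names normalizeQuery and sameQueryIgnoring are hypothetical. Note that keys are compared case-sensitively here, unlike REReplaceNoCase, and that the ignored parameter is dropped rather than replaced with a placeholder.

```javascript
// Sketch: normalizeQuery and sameQueryIgnoring are hypothetical helper names.
function normalizeQuery(query, ignoredParam) {
  const params = new URLSearchParams(query);
  params.delete(ignoredParam); // removes every occurrence of the key
  params.sort();               // sorts by key, so parameter order no longer matters
  return params.toString();
}

function sameQueryIgnoring(a, b, ignoredParam) {
  return normalizeQuery(a, ignoredParam) === normalizeQuery(b, ignoredParam);
}

console.log(sameQueryIgnoring(
  "firstName=bobby&lastName=tables&paramToIgnore=2",
  "paramToIgnore=5&firstName=bobby&lastName=tables",
  "paramToIgnore"
)); // true
```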
You can use the QueryStringDeleteVar UDF on cflib to remove the query string variables you want to ignore from both strings, then compare them.
Do it in two steps:
first, remove your param, as you described in your example
then, with a separate regex, remove any ampersand left at the beginning or the end of the query, and collapse any double/triple/... ampersands in the middle of the query
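The two steps could look roughly like this in JavaScript. stripParam is a made-up helper name, and the sketch assumes the parameter name contains only letters (so it needs no regex escaping) and is not the tail of a longer parameter name.

```javascript
// Sketch: stripParam is a hypothetical helper name.
function stripParam(query, name) {
  // Step 1: remove the parameter and its numeric value (case-insensitive).
  const removed = query.replace(new RegExp(name + "=[0-9]*", "gi"), "");
  // Step 2: collapse repeated ampersands, then trim one left at either end.
  return removed.replace(/&{2,}/g, "&").replace(/^&|&$/g, "");
}

console.log(stripParam("firstName=bobby&lastName=tables&paramToIgnore=2", "paramToIgnore"));
console.log(stripParam("paramToIgnore=5&firstName=bobby&lastName=tables", "paramToIgnore"));
console.log(stripParam("firstName=bobby&paramToIgnore=5&lastName=tables", "paramToIgnore"));
// each line prints: firstName=bobby&lastName=tables
```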
How about having an 'or' in the RegEx to match an ampersand at the start or the end?
&paramToIgnore=[0-9]*|paramToIgnore=[0-9]*&
Seems to do the job when testing in regexpal.com
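A quick way to check the alternation against all three positions is sketched below in JavaScript; the i flag stands in for REReplaceNoCase and the variable names are made up. One edge case to be aware of: a query that consists only of the ignored parameter has no ampersand at all, so neither alternative matches and the string is left unchanged.

```javascript
// Sketch: try the alternation with the parameter at the end, start and middle.
const pattern = /&paramToIgnore=[0-9]*|paramToIgnore=[0-9]*&/gi;

const queries = [
  "firstName=bobby&lastName=tables&paramToIgnore=2",
  "paramToIgnore=5&firstName=bobby&lastName=tables",
  "firstName=bobby&paramToIgnore=5&lastName=tables",
];

const cleaned = queries.map((q) => q.replace(pattern, ""));
console.log(cleaned); // three copies of firstName=bobby&lastName=tables

// Edge case: no ampersand to anchor on, so nothing is removed.
const alone = "paramToIgnore=5".replace(pattern, "");
console.log(alone); // paramToIgnore=5
```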
Try changing it to:
REReplaceNoCase(myQueryString, "&?paramToIgnore=[0-9]+", "")
Using plus instead of star requires one or more digits. The class matches only 0-9, so if another parameter follows, matching stops at the first non-digit.
Alternatively, you could use:
REReplaceNoCase(myQueryString, "&?paramToIgnore=[^&]*", "")
This matches any run of characters other than an ampersand, so it also covers the case where the parameter exists but has no value, which is probably something you'd want to account for.