Struggling with regex logic: how do I remove a param from a URL query string?

I'm comparing 2 URL query strings to see if they're equal; however, I want to ignore a specific query parameter (always with a numeric value) if it exists. So, these 2 query strings should be equal:
firstName=bobby&lastName=tables&paramToIgnore=2
firstName=bobby&lastName=tables&paramToIgnore=5
So, I tried to use a regex replace using the REReplaceNoCase function:
REReplaceNoCase(myQueryString, "&paramToIgnore=[0-9]*", "")
This works fine for the above example. I apply the replace to both strings and then compare. The problem is that I can't be sure that the param will be the last one in the string... the following 2 query strings should also be equal:
firstName=bobby&lastName=tables&paramToIgnore=2
paramToIgnore=5&firstName=bobby&lastName=tables
So, I changed the regex to make the preceding ampersand optional... "&?paramToIgnore=[0-9]*". But - these strings will still not be equal as I'll be left with an extra ampersand in one of the strings but not the other:
firstName=bobby&lastName=tables
&firstName=bobby&lastName=tables
Similarly, I can't just remove preceding and following ampersands ("&?paramToIgnore=[0-9]*&?") as if the query param is in the middle of the string I'll strip one ampersand too many in one string and not the other - e.g.
firstName=bobby&lastName=tables&paramToIgnore=2
firstName=bobby&paramToIgnore=5&lastName=tables
will become
firstName=bobby&lastName=tables
firstName=bobbylastName=tables
I can't seem to get my head around the logic of this... Can anyone help me out with a solution?

If you can't be sure of the order in which the parameters appear, I would recommend not comparing the raw strings.
I recommend splitting the string up like this:
String stringA = "firstName=bobby&lastName=tables&paramToIgnore=2";
String stringB = "firstName=bobby&lastName=tables&paramToIgnore=5";
String[] partsA = stringA.split("&");
String[] partsB = stringB.split("&");
Then go through the arrays and replace each paramToIgnore entry with the same placeholder, so the entries compare as equal:
for (int i = 0; i < partsA.length; i++)
{
    if (partsA[i].startsWith("paramToIgnore")) {
        partsA[i] = "IgnoreMePlease";
    }
}
for (int j = 0; j < partsB.length; j++)
{
    if (partsB[j].startsWith("paramToIgnore")) {
        partsB[j] = "IgnoreMePlease";
    }
}
Then you can sort and compare the arrays to see if they are equal:
Arrays.sort(partsA);
Arrays.sort(partsB);
boolean b = Arrays.equals(partsA, partsB);
I'm pretty sure it's possible to make this more compact and faster. But if you compare the raw strings the way you do now, you always have to care about the order of your parameters.
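The same split, filter, sort and compare idea can be sketched in Python. This is only an illustration under assumptions: the helper name query_strings_equal is made up for this example, and it uses the standard library's urllib.parse.parse_qsl instead of a manual split on "&".

```python
from urllib.parse import parse_qsl


def query_strings_equal(a: str, b: str, ignore: str = "paramToIgnore") -> bool:
    """Compare two query strings, ignoring parameter order and one parameter."""

    def normalize(qs: str):
        # keep_blank_values=True so a parameter with an empty value is not dropped
        pairs = parse_qsl(qs, keep_blank_values=True)
        # Drop the ignored parameter (case-insensitively), then sort the rest
        return sorted((k, v) for k, v in pairs if k.lower() != ignore.lower())

    return normalize(a) == normalize(b)
```

Because the pairs are sorted before comparison, the position of paramToIgnore (start, middle or end) no longer matters.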

You can use the QueryStringDeleteVar UDF on cflib to remove the query string variables you want to ignore from both strings, then compare them.

Do it in two steps:
first, remove your param, as you described in your example;
then, with a separate regex, remove any ampersand left at the beginning or the end of the query, and collapse any double/triple/... ampersands in the middle of the query.
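A minimal Python sketch of these two steps, assuming the parameter value is numeric as in the question. The function name strip_param is hypothetical. Note that step 1 would also hit a longer parameter name that merely ends in paramToIgnore.

```python
import re


def strip_param(qs: str, name: str = "paramToIgnore") -> str:
    # Step 1: remove the parameter itself (numeric value, possibly empty)
    qs = re.sub(rf"{re.escape(name)}=[0-9]*", "", qs, flags=re.IGNORECASE)
    # Step 2: collapse runs of ampersands, then trim them from both ends
    qs = re.sub(r"&{2,}", "&", qs)
    return qs.strip("&")
```

Applying strip_param to both query strings before comparing gives the same result whether the parameter was first, last or in the middle.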

How about having an 'or' in the RegEx to match an ampersand at the start or the end?
&paramToIgnore=[0-9]*|paramToIgnore=[0-9]*&
Seems to do the job when testing in regexpal.com
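For a quick check outside regexpal, the alternation can be tried in Python. This is a sketch only: remove_ignored is a made-up helper name, and re.IGNORECASE stands in for the case-insensitive behaviour of REReplaceNoCase.

```python
import re

# Either "&param=digits" or "param=digits&": exactly one ampersand is consumed
PATTERN = re.compile(r"&paramToIgnore=[0-9]*|paramToIgnore=[0-9]*&", re.IGNORECASE)


def remove_ignored(qs: str) -> str:
    return PATTERN.sub("", qs)
```

One limitation: if paramToIgnore is the only parameter in the string, there is no ampersand at all, so neither alternative matches and the string is left unchanged.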

Try changing it to:
REReplaceNoCase(myQueryString, "&?paramToIgnore=[0-9]+", "")
Plus instead of star matches one or more of the preceding character class. It won't match anything but 0-9, so if another parameter follows, the match stops when it can't match any more digits.
Alternatively, you could use:
REReplaceNoCase(myQueryString, "&?paramToIgnore=[^&]*", "")
This matches any run of characters other than an ampersand, including an empty one. It covers the case where the parameter exists but has no value, which is probably something you'd want to account for.

Related

How to get items into an array from a string of comma-separated values in TypeScript, where any item containing a comma is wrapped in double quotes

I've been struggling to get all items of the string below into an array in TypeScript:
abc,"de,f",hi,"hello","te,st&"
If an item contains a comma (,) or an ampersand (&), it is placed in double quotes.
I tried the split function, but it fails because my items can contain commas as well.
Any help in this regard is highly appreciated.
Thank you.
If you are looking to use a regular expression matching, can you try a different regEx that would match strings inside quotes first, then strings outside quotes, something like (\".+?\")|(^[^\"]+,)|(,[^\"]+,)
I don't know how relevant it would be in case of TypeScript, but I am guessing you'd be able to work something out that takes this Pattern and gives you the matches one by one
First of all, I think you are making things more complicated than they need to be by implementing the following logic:
has comma (,) or ampersand (&) in it,It will be placed in double quotes.
Instead of doing it that way, you should systematically put every element inside double quotes:
abc,"de,f",hi,"hello","te,st&"
→
"abc","de,f","hi","hello","te,st&"
The second form is then the string you have to parse.
A regex like this one will do the job:
(?<=,")([^"]*)(?=",)|(?<=")([^"]*)(?=",)|(?<=")([^"]*)(?="$)
using back references $1$2$3, you can extract your elements.
The regex /(?:^|,)(\"(?:[^\"]*)\"|[^,]+)/ helped me get the required values.
var test = '"abc,123",test,123,456,"def:get"';
test.split(/(\"(?:[^\"]*)\"|[^,]+)/);
It returns the array below:
['', '"abc,123"', ',', 'test', ',', '123', ',', '456', ',', '"def:get"', '']
For values inside double quotes, I trimmed the quotes to get the actual values, and I ignored the empty items of the array.
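For comparison, here is a quote-aware split sketched in Python rather than TypeScript. It swaps the regex for the standard library's csv module, which already treats commas inside double quotes as part of the item. The helper name split_quoted is an assumption made for this example.

```python
import csv
from typing import List


def split_quoted(line: str) -> List[str]:
    # csv.reader keeps commas that sit inside double quotes and
    # strips the surrounding quotes from each item
    return next(csv.reader([line]))
```

With the string from the question, split_quoted('abc,"de,f",hi,"hello","te,st&"') yields the five items abc, de,f, hi, hello and te,st& with no trimming step needed.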
Use split on the string (this example is Swift):
let fullName = "First,Last"
let fullNameArr = fullName.characters.split{$0 == ","}.map(String.init)
fullNameArr[0] // First
fullNameArr[1] // Last

Find group of strings starting and ending by a character using regular expression

I have a string, and I want to extract, using regular expressions, groups of characters that are between the character : and the other character /.
typically, here is a string example I'm getting:
'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh'
So I want to retrieve 45.72643,4.91203 and also hereanotherdata,
as they are both between the characters : and /.
I tried this syntax on a simpler string where the pattern occurs only once,
[tt]=regexp(str,':(\w.*)/','match')
tt = ':45.72643,4.91203/'
but it works only if the pattern occurs once. If I use it on a string containing the pattern multiple times, I get everything between the first : and the last /.
How can I indicate that the pattern will occur multiple times, and how can I retrieve each occurrence?
Use lookaround and a lazy quantifier:
regexp(str, '(?<=:).+?(?=/)', 'match')
Example (Matlab R2016b):
>> str = 'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
>> result = regexp(str, '(?<=:).+?(?=/)', 'match')
result =
1×2 cell array
'45.72643,4.91203' 'hereanotherdata'
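The difference between the greedy attempt in the question and the lazy lookaround pattern can be reproduced in Python, whose re module supports the same constructs. This is only a sketch in a different language, not MATLAB output.

```python
import re

s = 'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh'

# Greedy: .* runs to the last '/', so both fields are swallowed into one match
greedy = re.findall(r':(\w.*)/', s)

# Lazy with lookarounds: .+? stops at the first '/' after each ':'
lazy = re.findall(r'(?<=:).+?(?=/)', s)

print(greedy)  # ['45.72643,4.91203/Rou:hereanotherdata']
print(lazy)    # ['45.72643,4.91203', 'hereanotherdata']
```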
In most languages this is hard to do with a single regexp. Ultimately you'll only ever get back the one string, and you want to get back multiple strings.
I've never used Matlab, so it may be possible in that language, but based on other languages, this is how I'd approach it...
I can't give you the exact code, but a search indicates that in Matlab there is a function called strsplit, example...
C = strsplit(data,':')
That should break your original string up into an array of strings, using the ":" as the break point. You can then ignore the first array element (as it contains the text before the first ":"), loop over the rest of the array, and use regexp to extract everything that comes before a "/".
So for instance...
'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh'
Breaks down into an array with parts...
1 - 'abcd'
2 - '45.72643,4.91203/Rou'
3 - 'hereanotherdata/defgh'
Then Ignore 1, and extract everything before the "/" in 2 and 3.
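The split-then-extract steps above can be sketched as follows, in Python rather than MATLAB. The function name between and its parameters are made up for this illustration, and str.partition is used in place of a second regexp.

```python
from typing import List


def between(text: str, start: str = ":", end: str = "/") -> List[str]:
    results = []
    # Skip the first chunk: it holds whatever precedes the first start delimiter
    for chunk in text.split(start)[1:]:
        head, sep, _ = chunk.partition(end)
        if sep:  # keep only chunks that actually contain the end delimiter
            results.append(head)
    return results
```

For the example string this returns the two fields 45.72643,4.91203 and hereanotherdata.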
As John Mawer and Adriaan mentioned, strsplit is a good place to start. You can use it for both ':' and '/', but then you cannot tell which delimiter produced each piece. If you apply strsplit twice, you can tell where each ':' starts:
A='abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
B=cellfun(@(x) strsplit(x,'/'),strsplit(A,':'),'uniformoutput',0);
Now each cell of B holds one ':'-separated piece, itself split on '/'. Pieces that contained a '/' have more than one element, so select those cells and take the first element of each:
C=cellfun(@(x) x{1},B(cellfun('length',B)>1),'uniformoutput',0)
C =
1×2 cell array
'45.72643,4.91203' 'hereanotherdata'
Starting in R2016b you can use extractBetween:
>> str = 'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
>> result = extractBetween(str,':','/')
result =
2×1 cell array
{'45.72643,4.91203'}
{'hereanotherdata' }
If all your text elements have the same number of delimiters this can be vectorized too.

Match return substring between two substrings using regexp

I have a list of records that are character vectors. Here's an example:
'1mil_0,1_1_1_lb200_ks_drivers_sorted.csv'
'1mil_0_1_lb100_ks_drivers_sorted.csv'
'1mil_1_1_lb2_100_100_ks_drivers_sorted.csv'
'1mil_1_1_lb100_ks_drivers_sorted.csv'
From these names I would like to extract whatever's between the two substrings 1mil_ and _ks_drivers_sorted.csv.
So in this case the output would be:
0,1_1_1_lb200
0_1_lb100
1_1_lb2_100_100
1_1_lb100
I'm using MATLAB so I thought to use regexp to do this, but I can't understand what kind of regular expression would be correct.
Or are there some other ways to do this without using regexp?
Let the data be:
x = {'1mil_0,1_1_1_lb200_ks_drivers_sorted.csv'
'1mil_0_1_lb100_ks_drivers_sorted.csv'
'1mil_1_1_lb2_100_100_ks_drivers_sorted.csv'
'1mil_1_1_lb100_ks_drivers_sorted.csv'};
You can use lookbehind and lookahead to find the two limiting substrings, and match everything in between:
result = cellfun(@(c) regexp(c, '(?<=1mil_).*(?=_ks_drivers_sorted\.csv)', 'match'), x);
Or, since the regular expression only produces one match, the following simpler alternative can be used (thanks @excaza for noticing):
result = regexp(x, '(?<=1mil_).*(?=_ks_drivers_sorted\.csv)', 'match', 'once');
In your example, either of the above gives
result =
4×1 cell array
'0,1_1_1_lb200'
'0_1_lb100'
'1_1_lb2_100_100'
'1_1_lb100'
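The same lookbehind/lookahead pattern also works unchanged in Python's re module, which makes it easy to try outside MATLAB. A minimal sketch follows; the helper name extract_middle is an assumption for this example.

```python
import re
from typing import Optional

PATTERN = re.compile(r'(?<=1mil_).*(?=_ks_drivers_sorted\.csv)')


def extract_middle(name: str) -> Optional[str]:
    # Return the text between the two fixed substrings, or None if absent
    m = PATTERN.search(name)
    return m.group(0) if m else None


names = [
    '1mil_0,1_1_1_lb200_ks_drivers_sorted.csv',
    '1mil_0_1_lb100_ks_drivers_sorted.csv',
    '1mil_1_1_lb2_100_100_ks_drivers_sorted.csv',
    '1mil_1_1_lb100_ks_drivers_sorted.csv',
]
result = [extract_middle(n) for n in names]
```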
For me, the easiest way is to replace the parts of the string you don't need with an empty string; whatever remains is what you need.
If you have a list, you can do this in a loop.
Example replacing "1mil_" with "" and "_ks_drivers_sorted.csv" with "":
newChr = strrep(chr,'1mil_','')
newChr = strrep(newChr,'_ks_drivers_sorted.csv','')

Part of a string from a string using regular expressions

I have a string of 5 characters, of which the first two characters should come from one list and the next three from another list.
How could I validate such strings with regular expressions?
Example:
List for the first two characters: {VBNET, CSNET, HTML}
List for the next three characters: {BEGINNER, EXPERT, MEDIUM}
My strings are going to be: VBBEG, CSBEG, etc.
My regular expression should check that the first two characters of the input string are one of VB, CS, HT, and that the remaining three are likewise taken from the second list.
Would the following expression work for you in a more general case (so that you don't have hardcoded values): (^..)(.*$)
- returns the first two letters in the first group, and the remaining letters in the second group.
something like this:
^(VB|CS|HT)(BEG|EXP|MED)$
This recipe works for me:
^(VB|CS|HT)(BEG|EXP|MED)$
I guess (VB|CS|HT)(BEG|EXP|MED) should do it.
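Whether the pattern is anchored matters when validating. A short Python sketch of the difference, with a hypothetical helper name is_valid:

```python
import re

CODE = re.compile(r'(VB|CS|HT)(BEG|EXP|MED)')


def is_valid(code: str) -> bool:
    # fullmatch behaves like wrapping the pattern in ^...$:
    # the whole string must be prefix + suffix and nothing else
    return CODE.fullmatch(code) is not None


# Without anchors, a search also succeeds on longer strings
unanchored_hit = CODE.search("XVBBEGX") is not None
```

Here is_valid("VBBEG") is true and is_valid("XVBBEGX") is false, while the unanchored search finds a match inside "XVBBEGX".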
If your strings are as well-defined as this, you don't even need regex - simple string slicing would work.
For example, in Python we might say:
mystring = "HTEXP"
prefix = mystring[0:2]
suffix = mystring[2:5]
if prefix in ['HT', 'CS', 'VB'] and suffix in ['BEG', 'MED', 'EXP']:
    pass  # valid!
else:
    pass  # not valid. :(
Don't use regex where elementary string operations will do.

Regex for Comma delimited list

What is the regular expression to validate a comma delimited list like this one:
12365, 45236, 458, 1, 99996332, ......
I suggest doing it the following way:
(\d+)(,\s*\d+)*
which would work for a list containing 1 or more elements.
This regex extracts an element from a comma separated list, regardless of contents:
(.+?)(?:,|$)
If you just replace the comma with something else, it should work for any delimiter.
It depends a bit on your exact requirements. I'm assuming: all numbers, any length, numbers cannot have leading zeros nor contain commas or decimal points. individual numbers always separated by a comma then a space, and the last number does NOT have a comma and space after it. Any of these being wrong would simplify the solution.
([1-9][0-9]*,[ ])*[1-9][0-9]*
Here's how I built that mentally:
[0-9] any digit.
[1-9][0-9]* leading non-zero digit followed by any number of digits
[1-9][0-9]*, as above, followed by a comma
[1-9][0-9]*,[ ] as above, followed by a space
([1-9][0-9]*,[ ])* as above, repeated 0 or more times
([1-9][0-9]*,[ ])*[1-9][0-9]* as above, with a final number that doesn't have a comma.
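To see how strict this pattern is, it can be tried with Python's re.fullmatch, which requires the whole string to match. This is a sketch under the assumptions stated in this answer (no leading zeros, exactly a comma and one space between numbers); is_number_list is a made-up name.

```python
import re

LIST_RE = re.compile(r'([1-9][0-9]*,[ ])*[1-9][0-9]*')


def is_number_list(text: str) -> bool:
    # The entire string must be "n, n, n" with no trailing separator
    return LIST_RE.fullmatch(text) is not None
```

It accepts "12365, 45236, 458, 1, 99996332" and rejects "12, 034" (leading zero), "12,34" (no space) and "12, " (trailing separator).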
Match duplicate comma-delimited items:
(?<=,|^)([^,]*)(,\1)+(?=,|$)
Reference.
This regex can be used to split the values of a comma-delimited list. List elements may be quoted, unquoted or empty. Commas inside a pair of quotation marks are not matched.
,(?!(?<=(?:^|,)\s*"(?:[^"]|""|\\")*,)(?:[^"]|""|\\")*"\s*(?:,|$))
Reference.
/^\d+(?:, ?\d+)*$/
I used this for a list of items that had to be alphanumeric, with no underscore at the front of any item.
^(([0-9a-zA-Z][0-9a-zA-Z_]*)([,][0-9a-zA-Z][0-9a-zA-Z_]*)*)$
You might want to specify language just to be safe, but
(\d+, ?)+(\d+)?
ought to work
I had a slightly different requirement, to parse an encoded dictionary/hashtable with escaped commas, like this:
"1=This is something, 2=This is something,,with an escaped comma, 3=This is something else"
I think this is an elegant solution, with a trick that avoids a lot of regex complexity:
if (string.IsNullOrEmpty(encodedValues))
{
return null;
}
else
{
var retVal = new Dictionary<int, string>();
var reFields = new Regex(@"([0-9]+)\=(([A-Za-z0-9\s]|(,,))+),");
foreach (Match match in reFields.Matches(encodedValues + ","))
{
var id = match.Groups[1].Value;
var value = match.Groups[2].Value;
retVal[int.Parse(id)] = value.Replace(",,", ",");
}
return retVal;
}
I think it can be adapted to the original question with an expression like @"([0-9]+),\s?" and parse on Groups[0].
I hope it's helpful to somebody and thanks for the tips on getting it close to there, especially Asaph!
In JavaScript, use split to help out, and catch any negative digits as well:
'-1,2,-3'.match(/(-?\d+)(,\s*-?\d+)*/)[0].split(',');
// ["-1", "2", "-3"]
// may need trimming if digits are space-separated
The following will match any comma delimited word/digit/space combination
(((.)*,)*)(.)*
Why don't you work with groups:
^(\d+(, )?)+$
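One thing worth checking with this grouped pattern is how it treats a trailing separator. The Python sketch below compares it with the anchored pattern /^\d+(?:, ?\d+)*$/ given in an earlier answer. The variable names are made up for this example.

```python
import re

grouped = re.compile(r'^(\d+(, )?)+$')     # separator is optional after every number
strict = re.compile(r'^\d+(?:, ?\d+)*$')   # separator only allowed between numbers

trailing = "1, 2, "
grouped_accepts_trailing = grouped.match(trailing) is not None
strict_accepts_trailing = strict.match(trailing) is not None
```

Because the ", " is optional inside the repeated group, the grouped pattern accepts "1, 2, " while the strict one rejects it. Both accept "1, 2, 3".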
If you have a more complicated regex, e.g. one for valid URLs rather than just numbers, you could do the following: loop through each element and test each of them individually against your regex:
const validRelativeUrlRegex = /^(^$|(?!.*(\W\W))\/[a-zA-Z0-9\/-]+[^\W_]$)/;
const relativeUrls = "/url1,/url-2,url3";
const startsWithComma = relativeUrls.startsWith(",");
const endsWithComma = relativeUrls.endsWith(",");
const areAllURLsValid = relativeUrls
.split(",")
.every(url => validRelativeUrlRegex.test(url));
const isValid = areAllURLsValid && !endsWithComma && !startsWithComma