Regex, search for prefix, excluding suffix [duplicate]

How do I write a regular expression that checks whether a string starts with a certain pattern and does NOT end with another pattern?
Example:
Must StartsWith: "US.INR.USD.CONV"
Should not end with: ".VALUE"
Passes Regex: "US.INR.USD.CONV.ABC.DEF.FACTOR"
Fails Regex Check: "US.INR.USD.CONV.ABC.DEF.VALUE"
I am using C#.

You can use this regex based on negative lookahead:
^US\.INR\.USD\.CONV(?!.*?\.VALUE$).*$
Explanation:
^US\.INR\.USD\.CONV - Match US.INR.USD.CONV at start of input
(?!.*?\.VALUE$) - Negative lookahead to make sure the line does not end with .VALUE
.*$ - Match the rest of the input up to the end
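A minimal sketch of the lookahead pattern above, run against the two sample strings from the question. Python's `re` module is an assumption here (the question uses C#), and `is_valid` is a hypothetical helper name; the pattern itself is unchanged.

```python
import re

# The prefix must match at the start; the negative lookahead then rejects
# any string whose tail is ".VALUE".
PATTERN = re.compile(r"^US\.INR\.USD\.CONV(?!.*?\.VALUE$).*$")

def is_valid(key: str) -> bool:
    """Return True if key has the required prefix and no .VALUE suffix."""
    return PATTERN.match(key) is not None

print(is_valid("US.INR.USD.CONV.ABC.DEF.FACTOR"))  # True
print(is_valid("US.INR.USD.CONV.ABC.DEF.VALUE"))   # False
```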

^US\.INR\.USD\.CONV.*(?<!\.VALUE)$
Try this. See the demo:
https://regex101.com/r/fA6wE2/26
Just use a negative lookbehind to make sure .VALUE does not come right before $, the end of the string.
(?<!\.VALUE)$ ==> Makes sure the regex engine looks behind when it reaches the end of the string and checks that `.VALUE` is not there.
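The lookbehind variant can be sketched the same way. This assumes Python's `re`, which only accepts fixed-width lookbehinds; `\.VALUE` is six characters wide, so it qualifies. `passes` is a hypothetical helper name.

```python
import re

# The lookbehind is evaluated at the end of the string: the six characters
# immediately before "$" must not be ".VALUE".
PATTERN_LB = re.compile(r"^US\.INR\.USD\.CONV.*(?<!\.VALUE)$")

def passes(key: str) -> bool:
    """Return True if key starts with the prefix and does not end in .VALUE."""
    return PATTERN_LB.search(key) is not None

print(passes("US.INR.USD.CONV.ABC.DEF.FACTOR"))  # True
print(passes("US.INR.USD.CONV.ABC.DEF.VALUE"))   # False
```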

You don't need regular expressions for that. You can just use String.StartsWith and String.EndsWith:
if (val.StartsWith("US.INR.USD.CONV") && !val.EndsWith(".VALUE"))
{
    // valid
}
And, as you mention in your comment to anubhava's answer, you can do this to check for ".PERCENT" at the end as well:
if (val.StartsWith("US.INR.USD.CONV") &&
    !val.EndsWith(".VALUE") &&
    !val.EndsWith(".PERCENT"))
{
    // valid
}
IMHO this makes the code much more readable, and it will almost certainly perform faster as well.
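For comparison, the same regex-free check can be sketched in Python (an assumption; the answer itself is C#). `is_valid_key` is a hypothetical name.

```python
def is_valid_key(val: str) -> bool:
    """Prefix/suffix check without any regular expression."""
    # str.endswith accepts a tuple, so several forbidden suffixes
    # can be tested in a single call.
    return val.startswith("US.INR.USD.CONV") and not val.endswith((".VALUE", ".PERCENT"))

print(is_valid_key("US.INR.USD.CONV.ABC.DEF.FACTOR"))   # True
print(is_valid_key("US.INR.USD.CONV.ABC.DEF.PERCENT"))  # False
```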

Related

How to test url string using regular expression

Below is my code
/config\/info\/newplan/.test(string)
which returns true when /config/info/newplan/ is found in the string.
However, I would like to test several conditions at the same time, like below:
/config\/info\/newplan/.test(string) || /config\/info\/oldplan/.test(string) || /config\/info\/specplan/.test(string)
which returns true if the string ends with either "newplan", "oldplan" or "specplan".
My question is how to write better code that does not repeat config\/info\/xxxx so many times.
Use an alternation group:
/config\/info\/(?:new|old|spec)plan/.test(string)
               ^^^^^^^^^^^^^^^^
Pattern details:
config\/info\/ - a literal config/info/ substring
(?:new|old|spec) - a non-capturing group (where | separates alternatives) matching any one of the substrings: new, old or spec
plan - a literal plan substring
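A short sketch of the alternation group, assuming Python's `re` instead of the JavaScript in the question. `search` scans the whole string, which is roughly what `.test()` does; `mentions_plan` and the sample paths are hypothetical.

```python
import re

# One shared prefix, one non-capturing group for the part that varies.
PLAN_PATH = re.compile(r"config/info/(?:new|old|spec)plan")

def mentions_plan(url: str) -> bool:
    """Return True if any of the three plan paths occurs in url."""
    return PLAN_PATH.search(url) is not None

print(mentions_plan("/app/config/info/oldplan"))   # True
print(mentions_plan("/app/config/info/testplan"))  # False
```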
This would be your best bet:
config\/info\/(newplan|oldplan|specplan)\/
or, as a complete test:
/config\/info\/(newplan|oldplan|specplan)\//.test(string)
Please see the example at [https://regex101.com/r/NyP1HP/1]; it does not allow other possibilities like the following:
/config/info/new1plan/
/config/info/newoldplan/
/config/info/specplan1/

Get all matches for a certain pattern using RegEx

I am not really a RegEx expert and hence asking a simple question.
I have a few parameters that I need to use which are in a particular pattern
For example
$$DATA_START_TIME
$$DATA_END_TIME
$$MIN_POID_ID_DLAY
$$MAX_POID_ID_DLAY
$$MIN_POID_ID_RELTM
$$MAX_POID_ID_RELTM
And these will be replaced at runtime in a string with their values (a SQL statement).
For example I have a simple query
select * from asdf where asdf.starttime = $$DATA_START_TIME and asdf.endtime = $$DATA_END_TIME
Now when I try to use the RegEx pattern
\$\$[^\W+]\w+$
I do not get all the matches (I get only the last match).
I am trying to test my usage here https://regex101.com/r/xR9dG0/2
If someone could correct my mistake, I would really appreciate it.
Thanks!
This will do the job:
\$\$\w+/g
Just some clarifications on why your regex does what it does:
\$\$[^\W+]\w+$
An unescaped $ means end of string, so your pattern only matches something at the very end of the string. That's why you get only the last match.
The character class [^\W+] doesn't really make sense. A class starting with [^ negates the characters inside it, \W matches non-word characters, and + inside a class is a literal + character. So you are saying "match anything that is not a non-word character and not a + sign", which amounts to a single word character. I guess that was not what you wanted.
To match the word that follows, \w+ alone will do. The global modifier /g ensures that matching does not stop at the first match.
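A minimal sketch of the corrected pattern on the query from the question, assuming Python, where `findall` plays the role of the /g flag. The `values` mapping is hypothetical and only illustrates the replacement step the question describes.

```python
import re

PARAM = re.compile(r"\$\$\w+")

query = ("select * from asdf where asdf.starttime = $$DATA_START_TIME "
         "and asdf.endtime = $$DATA_END_TIME")

# findall returns every non-overlapping match, not just the first or last.
names = PARAM.findall(query)
print(names)  # ['$$DATA_START_TIME', '$$DATA_END_TIME']

# Hypothetical runtime values, keyed by the full placeholder text.
values = {"$$DATA_START_TIME": "100", "$$DATA_END_TIME": "200"}
filled = PARAM.sub(lambda m: values[m.group(0)], query)
print(filled)
```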
This should work, based on what you said you wanted to match. It also won't match $$lower_case_strings, if that's what you wanted. If not, add the "i" flag as well.
\${2}[A-Z_]+/g

Regular expression for validating complicated username

So, the conditions are:
At least 1 character, max 20 characters
Starts with [a-zA-Z]
Contains [a-zA-Z0-9.-]
Ends with [a-zA-Z0-9]
My expression is:
^(?=[a-zA-Z])+[a-zA-Z0-9.-]*[a-zA-Z0-9]{1,20}$
It works nicely. However, it doesn't work properly with a username's length. I can enter a thirty-character username and still find a match. What's wrong with it?
I tend to find complicated regexps a poor choice when wanting to validate a string against multiple rules. They cause unreadable code that's difficult to maintain.
How about (in pseudocode)
.length >= 1 && .length <= 20
&& /^[a-z0-9.-]+$/i
&& /^[a-z]/i
&& /[a-z0-9]$/i
i.e. check the length, then check the legal character validity, then check the opening and closing characters, exactly as described in your question text.
You could also combine the first two lines so that you're only using regexps:
/^[a-z0-9.-]{1,20}$/i
&& /^[a-z]/i
&& /[a-z0-9]$/i
I'd be surprised if this was slower than a one-liner regexp, but it's certainly more readable.
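The pseudocode above might look like this in Python (an assumption; the answer names no language). The character classes are spelled out instead of using a case-insensitive flag, and `is_valid_username` is a hypothetical name.

```python
import re

def is_valid_username(name: str) -> bool:
    """Check each rule from the question separately."""
    return (
        1 <= len(name) <= 20                                       # length
        and re.fullmatch(r"[a-zA-Z0-9.-]+", name) is not None      # legal characters
        and re.match(r"[a-zA-Z]", name) is not None                # first character
        and re.search(r"[a-zA-Z0-9]\Z", name) is not None          # last character
    )

print(is_valid_username("Valid.UserName"))  # True
print(is_valid_username("Invalid."))        # False
```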
If it contains only [a-zA-Z0-9.-], starts with [a-zA-Z] and ends with [a-zA-Z0-9], it doesn't start with [-0-9.] and doesn't end with [.-]
^(?![-0-9.])[a-zA-Z0-9.-]{1,20}(?<![.-])$
Note: Works only in regex flavors that support negative lookbehind.
Try this:
^[a-zA-Z]$|^(?=.{2,20}$)[a-zA-Z][a-zA-Z0-9.-]*[a-zA-Z0-9]$
You could use the below regex,
^(?=.{1,20}$)[a-zA-Z][a-zA-Z0-9.-]*[a-zA-Z0-9]$
If the string does not start with [a-zA-Z] the regex will fail. The rest is easier to understand.
^(?=[a-zA-Z])[a-zA-Z0-9.-]{0,19}[a-zA-Z0-9]$
The following is a fairly simple solution:
^[a-zA-Z]$|^[a-zA-Z]{1}[a-zA-Z0-9.-]{0,18}[a-zA-Z0-9]{1}$
Broken down:
Either: a single character in the group [a-zA-Z]
Or: Exactly one character in group [a-zA-Z], up to 18 characters in the group [a-zA-Z0-9.-] and finally 1 character from the group [a-zA-Z0-9].
Gives the expected result (match or no match) for each of the following:
Valid
Valid.UserName
Valid1-1UserName
0-Invalid
Invalid.
Invalid-ThisIsTooLong
V
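The test strings listed above can be checked against the single-regex solution with a short script. This is a sketch assuming Python's `re`; the redundant {1} quantifiers are dropped, which does not change what the pattern matches.

```python
import re

USERNAME = re.compile(r"^[a-zA-Z]$|^[a-zA-Z][a-zA-Z0-9.-]{0,18}[a-zA-Z0-9]$")

samples = [
    "Valid", "Valid.UserName", "Valid1-1UserName",
    "0-Invalid", "Invalid.", "Invalid-ThisIsTooLong", "V",
]

# Map each sample to whether the pattern accepts it.
results = {s: USERNAME.match(s) is not None for s in samples}
for name, ok in results.items():
    print(f"{name!r}: {'match' if ok else 'no match'}")
```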

Replace a character by another, unless it is located in between braces

What I would like to do with the following string is replace every comma "," with a tab, unless the comma is between braces { }.
Say I have:
goldRigged,1,0,0,0,1,0,0,0,1,"{"LootItemID": "goldOre", "Amount": 1}"
The result should be (the comma between the braces is kept):
goldRigged\t1\t0\t0\t0\t1\t0\t0\t0\t1\t"{"LootItemID": "goldOre", "Amount": 1}"
I already have: \"(\\{((.*?))\\})\" which allows me to match what is between { }.
The idea would be to exclude that content somehow and match any commas with something like \",^(\\{((.*?))\\})\"
But I guess that doing so will exclude the comma itself.
What you would need is called a negative lookahead and a negative lookbehind. However, this would make for quite a complex expression:
Match all commas that are not preceded by an opening brace, as long as they were not previously preceded by a closing brace (plus the reverse logic for the right side of the comma). The resulting expression is expensive to evaluate because the regex engine constantly needs to scan back and forth from its current position, which is rather inefficient.
Instead, iterate over all characters of your string. When you find an opening brace, set an escape hint. Remove it when you find a closing brace. When you find a comma, replace it if the escape hint is not set. Write the result to some sort of string buffer, and your solution will be significantly more efficient than the regex.
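The character-by-character approach described above can be sketched as follows. Python is an assumption, `commas_to_tabs` is a hypothetical name, and the flag-based logic assumes braces are never nested.

```python
def commas_to_tabs(line: str) -> str:
    """Replace commas with tabs, except between { and }."""
    out = []               # acts as the string buffer
    inside_braces = False  # the "escape hint" from the description
    for ch in line:
        if ch == "{":
            inside_braces = True
        elif ch == "}":
            inside_braces = False
        if ch == "," and not inside_braces:
            out.append("\t")
        else:
            out.append(ch)
    return "".join(out)

row = 'goldRigged,1,0,0,0,1,0,0,0,1,"{"LootItemID": "goldOre", "Amount": 1}"'
print(commas_to_tabs(row))
```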
You want to use a negative lookaround to achieve this:
(?<![\{\}]),*(?![\{\}]) should work, try here: http://regex101.com/r/gG3oU1
Use negative lookahead (?!expr) and negative lookbehind (?<!expr) in your regular expression.
For example, you can write:
System.Text.RegularExpressions.Regex.Replace(
    "goldRigged,1,0,0,0,1,0,0,0,1, {\"LootItemID\": \"goldOre\", \"Amount\": 1}",
    @"(?<!\{[^\}].*)[,](?![^\{]*\})", "\t");
Does your input line contain the { only in the last token?
If yes, then you can try this brute-force approach:
echo 'goldRigged,1,0,0,0,1,0,0,0,1,"{"LootItemID": "goldOre", "Amount": 1}"' | awk -F'{' '{one=$1; gsub(",","\t",one); printf("%s{%s\n",one,$2);}'
The regex below is an expensive way of doing it. As suggested by @Sniffer, a parser would be nicer here :)
(?=,.*?"{),|(?!,.*?\}),
First alternation
(?=,.*?"{), - make sure comma is outside the sequence "{
Second alternation
(?!,.*?\}), - make sure comma isn't inside the sequence }"
There will be edge cases that haven't been accounted for; that's where the parser comes in.
I think you actually need only one lookahead:
,(?=[^{}]*({|$))
reads: a comma, followed by some non-braces and then either an open brace or the end.
Example in JS:
> x = 'goldRigged,1,0,0,0,1,0,0,0,1,"{"LootItemID": "goldOre", "Amount": 1}",some,more{stuff,ff}end'
> x.replace(/,(?=[^{}]*({|$))/g, "#")
"goldRigged#1#0#0#0#1#0#0#0#1#"{"LootItemID": "goldOre", "Amount": 1}"#some#more{stuff,ff}end"
Note that this doesn't work if braces can be nested. In that case you need either a regex engine with recursion (?R) or a proper parser.
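For the nested case, a very small hand-written scanner is often enough. The sketch below assumes Python and swaps the regex for a depth counter; `replace_commas_nested` and the sample input are hypothetical, and unbalanced closing braces are simply ignored.

```python
def replace_commas_nested(line: str, replacement: str = "\t") -> str:
    """Replace commas that sit outside every pair of braces."""
    depth = 0  # number of braces currently open
    out = []
    for ch in line:
        if ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
        # Only commas at depth 0 are outside all braces.
        out.append(replacement if ch == "," and depth == 0 else ch)
    return "".join(out)

print(replace_commas_nested("a,{b,{c,d},e},f", "#"))  # a#{b,{c,d},e}#f
```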

Regex to match a string not followed by anything

I am trying to figure out a regex sequence that will match the first item in the list below but not the others; {Some-Folder} is variable.
http://www.url.com/{Some-Folder}/
http://www.url.com/{Some-Folder}/thing/key/
http://www.url.com/{Some-Folder}/thing/119487302/
http://www.url.com/{Some-Folder}/{something-else}
Essentially I want to be able to detect anything that is of the form:
http://www.url.com/{Some-Folder}/
or
http://www.url.com/{Some-Folder}
but not
http://www.url.com/{Some-Folder}/{something-else}
So far I have
http://www.url.com/[A-Z,-]*\/^.
but this doesn't match anything
http://www.url.com/[^/]+/?$
Or, in the few parsers that use \Z as end of text,
http://www.url.com/[^/]+/?\Z
I customized a regex I've used for URL parsing before. It's not perfect, and it will need even more work once gTLDs become more widely used. Anyway, here it is:
\bhttps?:\/\/[a-z0-9.-]+\.(?:[a-z]{2,4}|museum|travel)\/[^\/\s]+(?:\/\b)?
You may want to add the case-insensitive flag for whichever language you're using.
Demo: http://rubular.com/r/HyVXU30Hvp
You may use the following regex:
(?m)http:\/\/www\.example\.com\/[^\/]+\/?$
Explanation:
(?m) : Set the m modifier which makes ^ and $ match start and end of line respectively
http:\/\/www\.example\.com\/ : match http://www.example.com/
[^\/]+ : match anything except / one or more times
\/? : optionally match /
$ : declare end of line
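A brief sketch of this pattern with the multiline flag, assuming Python's `re` (where `re.MULTILINE` corresponds to (?m)). The sample URLs follow the question's list, with a plain "Some-Folder" standing in for the variable part.

```python
import re

# With MULTILINE, "$" matches at the end of every line, not only at the
# end of the whole text.
FOLDER_URL = re.compile(r"http://www\.example\.com/[^/]+/?$", re.MULTILINE)

text = """http://www.example.com/Some-Folder/
http://www.example.com/Some-Folder/thing/key/
http://www.example.com/Some-Folder/thing/119487302/
http://www.example.com/Some-Folder/something-else
http://www.example.com/Some-Folder"""

matches = FOLDER_URL.findall(text)
print(matches)
# ['http://www.example.com/Some-Folder/', 'http://www.example.com/Some-Folder']
```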
I've been looking for an answer to this exact problem. aaaaaa123456789's answer almost worked for me. But the $ and \Z didn't work. My solution is:
http://www.url.com/[^/]+/?.{0}