Regex match contiguous non-word string in a string - regex

How would I create a regex which matches every contiguous non-word string "[a].[b]" in a string? I don't care about spaces, newlines, or any other invisible characters, as long as it matches only the contiguous string. Anything else is invalid.
[a].[b] // valid for "[a].[b]"
[a].[b] // valid for "[a].[b]"
[a].[b] // valid for "[a].[b]"
\n[a].[b] // valid for "[a].[b]"
[a].[b]$[c] // invalid because of "$" (or any other character) and everything after
[c]$[a].[b] // invalid because of "$" (or any other character) and everything before
[c].[a].[b] // invalid because of "[c]."
The problem I'm having is if I try
[\ \n\r]
it matches the space before " [a].[b]", which is not what I want. I want spaces to be ignored, because I don't want to replace anything besides "[a].[b]". But of course only when it is a contiguous string: in "somethinganythingbutspaceandnewline[a].[b]" I don't want to replace anything.
Thank you.

If I understand you correctly, you want the string [a].[b] with optional leading and trailing whitespace. If that's the case, I suggest the pattern \A\s*\[a\]\.\[b\]\s*\Z, e.g. (C# code)
string pattern = @"\A\s*\[a\]\.\[b\]\s*\Z";
string source = "\n[a].[b] \t ";
if (Regex.IsMatch(source, pattern))
    Console.Write("Match");
else
    Console.Write("Not Match");
Pattern:
\A - beginning of the text
\s* - zero or more leading whitespaces
\[a\]\.\[b\] - string to find (note the escaping)
\s* - zero or more trailing whitespaces
\Z - end of the text
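For comparison, the same anchored pattern can be sketched in Python. The names ONLY_AB and is_only_ab are hypothetical and exist only for this illustration; the pattern itself is the one broken down above.

```python
import re

# \A and \Z anchor to the absolute start and end of the input in Python's re module.
ONLY_AB = re.compile(r"\A\s*\[a\]\.\[b\]\s*\Z")


def is_only_ab(text):
    """Return True when text is "[a].[b]" plus optional surrounding whitespace."""
    return ONLY_AB.match(text) is not None


# Samples taken from the question: one valid input and three invalid ones.
for sample in ["\n[a].[b] \t ", "[a].[b]$[c]", "[c]$[a].[b]", "[c].[a].[b]"]:
    print(repr(sample), is_only_ab(sample))
```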
Edit: As far as I can see, the core of the match is a constant, [a].[b], so I doubt you really need the matched text itself, which is always "[a].[b]". If you do, you can try a lookahead and lookbehind construction (C# code):
string pattern = @"(?<=\A\s*)\[a\]\.\[b\](?=\s*\Z)";
string source = "\n[a].[b] \t ";
var match = Regex.Match(source, pattern);
if (match.Success)
    Console.Write($"Matched: '{match.Value}'");
Now
(?<=\A\s*) - zero or more leading spaces should be matched but not included into the match
(?=\s*\Z) - zero or more trailing spaces should be matched but not included into the match
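Variable-width lookbehind such as (?<=\A\s*) works in .NET, but Python's built-in re module only accepts fixed-width lookbehind. A rough Python equivalent, assuming a capture group is an acceptable substitute, keeps the whitespace in the overall match and reads the core from group 1. CORE and extract_core are hypothetical names.

```python
import re

# The surrounding whitespace is matched but stays outside the capture group.
CORE = re.compile(r"\A\s*(\[a\]\.\[b\])\s*\Z")


def extract_core(text):
    """Return the captured "[a].[b]" without its surrounding whitespace, or None."""
    m = CORE.match(text)
    return m.group(1) if m else None


print(extract_core("\n[a].[b] \t "))
```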
Edit 2: in case you have several [a].[b] separated by white spaces (see comments below)
// requires: using System.Linq; using System.Text.RegularExpressions;
string pattern = @"(?<=\A|\s+)\[a\]\.\[b\](?=\s+|\Z)";
string source = "[a].[b] [a].[b][a].[b] [a].[b] \t ";
string result = string.Join(", ", Regex
    .Matches(source, pattern)
    .OfType<Match>()
    .Select(match => match.Value));
Console.Write(result);
Outcome (2 matches):
[a].[b], [a].[b]
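A similar whitespace-delimited search can be sketched in Python with negative lookarounds. (?<!\S) means "not preceded by a non-whitespace character", which covers both the start of the text and a preceding whitespace character; (?!\S) does the same on the right. This is a substitute for the .NET lookbehind above, not a literal translation, and the names are hypothetical.

```python
import re

# Both lookarounds are fixed-width, so Python's re module accepts them.
STANDALONE_AB = re.compile(r"(?<!\S)\[a\]\.\[b\](?!\S)")


def find_standalone(text):
    """Return every "[a].[b]" that is delimited by whitespace or the text boundaries."""
    return STANDALONE_AB.findall(text)


print(find_standalone("[a].[b] [a].[b][a].[b] [a].[b] \t "))
```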

Related

Regex To Match String With All Words Contains Certain Format

I want to validate a field of string so that it only accept string that contains words with certain format.
Example accepted string:
#key;
#key1; #key2;#key3;
Example rejected string:
key;
%key1X #key2X$key3X
My regex:
\B(\#[a-zA-Z0-9_; ]+\b)(\;)
It seems my regex still accepts a string as long as it has at least one word in a valid format, while I only want it to be accepted if all the words are in the correct format.
Current example:
%key1; %key2 #keysz;#key3; #key4;
The current example above is still accepted because it contains #keysz; and #key3;, while I want it to be rejected because it also contains %key1;, %key2 and #key4;.
I've done some searching and the closest I could find is this question, but it returns a similar result to my current regex.
What did I do wrong in my regex? What is the right regex?
Sorry if this is a dumb question, but I'm a newbie in regex.
The main thing needed are start ^ and end $ anchors. The rest can be simplified too:
^( *#\w+;)+$
See live demo.
Breaking it down:
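^ = start of input
" *" (a space followed by *) = 0-n spaces
# = a literal hash (these don't need escaping in regex)
\w+ = one or more word characters (letters, digits and the underscore)
; = a literal semicolon
(...)+ = one or more repetitions of the whole group
$ = end of input
If underscores may appear in the input but must not be accepted, then use: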
^( *#[A-Za-z0-9]+;)+$
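As a quick check of the anchored approach, here is a minimal Python sketch. It assumes re.fullmatch is an acceptable stand-in for the explicit ^ and $ anchors and uses re.ASCII so that \w is limited to letters, digits and the underscore. TAGS and is_valid_tag_list are hypothetical names.

```python
import re

# fullmatch requires the whole string to match, like wrapping the pattern in ^...$.
TAGS = re.compile(r"( *#\w+;)+", re.ASCII)


def is_valid_tag_list(text):
    """Return True when text consists only of #word; items with optional leading spaces."""
    return TAGS.fullmatch(text) is not None


# Accepted and rejected samples from the question.
for sample in ["#key;", "#key1; #key2;#key3;", "key;", "%key1; %key2 #keysz;#key3; #key4;"]:
    print(repr(sample), is_valid_tag_list(sample))
```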
Your regex accepts the whole string because the pattern (\B(\#[a-zA-Z0-9_; ]+\b)(\;)) does not specify where matching must start and end, so the regex engine tries to match at every position of the string you run it on.
You specify where the regex should match by adding anchors (^ for the beginning and $ for the end) to the pattern.
You can edit your pattern to look like this: /(?:\s|^)(#[a-zA-Z0-9_; ]+?);(?:\s|$)/gm
Explanation:
/(?:\s|^)
- (?: starts a non-capturing group, which means that whatever is matched inside these parentheses is not captured as a group in the result. \s|^ means the match must start at a whitespace character or at the beginning of the string.
(#[a-zA-Z0-9_; ]+);
- () is a regular capture group, which means that things captured in this group are included in the result.
You don't need to insert a '\' before every symbol
(?:\s|$)/
- another non capture group, specifying to match a white space or end position of a string.
gm
- global and multiline flags of javascript regex
Here is an example:
let regex_pattern = /(?:\s|^)(#[a-zA-Z0-9_; ]+);(?=\s|$)/gm
let input1 = " #key;" // string with just one word
let input2 = "#key1; #key2;#key3;" // string with one whole word and another word which will match your pattern
let input3 = "soemthing random #key;andjointstring" // a string with a word that starts like the pattern but is not a whole word
console.log(input1.match(regex_pattern)) // it matches
console.log(input2.match(regex_pattern)) // it matches
console.log(input3.match(regex_pattern)) // it does not match (null)

Regex extraction of substrings ignoring internal character used to match

I'm matching the strings of a key-value pair between " characters with "(.*?)". How can I ignore any extra " characters within the value part?
example string {"1"=>"email#example.com"}
You may use
String pat = "(?<=\\{|=>)\"(.*?)\"(?=\\}|=>)";
See the regex demo
Details
(?<=\{|=>) - a positive lookbehind that matches a location immediately preceded with { or =>
" - a double quotation mark
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
" - a double quotation mark
(?=\}|=>) - a positive lookahead that matches a location immediately followed with } or =>.
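The same idea can be tried in Python, with one adjustment: Python's re module rejects a lookbehind whose alternatives differ in width ({ is one character, => is two), so this sketch splits it into two separate lookbehinds. QUOTED and quoted_parts are hypothetical names, and the second sample input is made up to show an embedded quote.

```python
import re

# Two fixed-width lookbehinds replace the single (?<=\{|=>) used above.
QUOTED = re.compile(r'(?:(?<=\{)|(?<==>))"(.*?)"(?=\}|=>)')


def quoted_parts(text):
    """Return the contents of each quoted key or value, keeping inner quotes."""
    return QUOTED.findall(text)


print(quoted_parts('{"1"=>"email#example.com"}'))
print(quoted_parts('{"1"=>"say "hi" now"}'))
```

The closing quote is only accepted when } or => follows it, which is why quotes inside the value are skipped.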

Get everything except special words

I need a regex which matches everything except for several words.
The input string is something like:
This&nbsp;Is&nbsp;A&nbsp;&lt;Test&gt;
It should match
This Is A Test
So I want to have everything around &nbsp;, &lt; and &gt;
I've tried something like [^&nbsp;] to ignore all appearances of &nbsp;, but this excludes every character.
/&[a-zA-Z]{2,8};/g
Breakdown:
& - match & literally
[a-zA-Z]{2,8} - match any characters in ranges a-z and A-Z from 2 to 8 times
; - until a semi colon
The longest special character that you could encounter is &thetasym; - ϑ, and so I've taken this into account in the regex.
The proper formatting replaces each of the special characters with a space, and replaces multiple spaces in a row with a single space
let regex = /&[a-zA-Z]{2,8};/g,
string = "This&nbsp;Is&nbsp;A&nbsp;&lt;Test&gt;",
properlyFormatted = string.replace(regex, " ").replace(/\ +/g, " ");
console.log(properlyFormatted);
The alternative:
/&(?:lt|gt|nbsp);/g
Breakdown:
& - match & literally
(?:lt|gt|nbsp) - match any group in lt, gt, nbsp
; - directly followed by a semi colon
This regex will only take into account the specific characters you described.
let regex = /&(?:lt|gt|nbsp);/g,
string = "This&nbsp;Is&nbsp;A&nbsp;&lt;Test&gt;",
properlyFormatted = string.replace(regex, " ").replace(/\ +/g, " ");
console.log(properlyFormatted);
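The two-step replacement (entity to space, then collapse runs of spaces) can also be sketched in Python. The final strip() is an addition not present in the JavaScript above; it removes the space left behind by a trailing entity. ENTITY and strip_entities are hypothetical names.

```python
import re

# Matches named entities such as &nbsp; or &lt; (2 to 8 letters between & and ;).
ENTITY = re.compile(r"&[a-zA-Z]{2,8};")


def strip_entities(text):
    """Replace each named entity with a space, then collapse repeated spaces."""
    spaced = ENTITY.sub(" ", text)
    return re.sub(r" +", " ", spaced).strip()


print(strip_entities("This&nbsp;Is&nbsp;A&nbsp;&lt;Test&gt;"))
```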

How can I check it with regular Expression?

I have a long input string that contains certain field names embedded in it. For instance:
SELECT some-name, some-name FROM [some-table] WHERE [some-column] = 'some-value'
The actual field name may change, but it is always in the form of word-word. I need to perform a regex replace on the string so that the output will look like this:
SELECT some - name, some - name FROM [some-table] WHERE [some-column] = 'some - value'
In other words, when the field name is enclosed in square brackets, it should be left untouched, but when it is not, spaces should be inserted on either side of the dash. There are no nested square brackets, and there may be one or more such field names in the string.
You can do this:
Regex.Replace(input, "(?<!\[[^-\]]*)(\w+)-(\w+)(?![^-\]]*\])", "$1 - $2")
Here's an explanation of the pattern:
(?<!\[[^-\]]*) - This is a negative look-behind. It asserts that matches cannot be immediately preceded by text that matches the sub-pattern \[[^-\]]*. In other words, the matches we are looking for cannot be preceded by a [ character followed by any number of characters that are not a - or a ].
(\w+)-(\w+) - Matches one or more word-characters, then a dash, and then one or more word characters following the dash. By enclosing the sub-patterns on either side of the dash in capturing groups, we can then refer to their values as $1 and $2 in the replacement pattern.
(?![^-\]]*\]) - This is a negative look-ahead. Similar to the negative look-behind, it asserts that matches cannot be immediately followed by text which matches the sub pattern [^-\]]*\]. In other words, a match cannot be followed by any number of characters that are not a - or a ] and then a closing ].
See a demo.
At first glance, you might assume that you could simply assert that it must not be immediately preceded by a [ character and that it must not be immediately followed by a ] character. In other words, (?<!\[)(\w+)-(\w+)(?!\]). However, that pattern would still match the text ome-nam in the input [some-name] because the text ome-nam is not immediately preceded or followed by the brackets.
Dim regex As Regex = New Regex("\[[^-]*-[^-]*\]")
Dim match As Match = regex.Match("A long string containing square brackets [some-name]")
If match.Success Then
Console.WriteLine(match.Value)
End If
Or you could use Regex.IsMatch:
Return Regex.IsMatch("A long string containing square brackets [some-name]",
"\[[^-]*-[^-]*\]")
You may match and capture the [...] substrings, and then match only the remaining hyphens (those outside the brackets) and replace them:
Dim nStr As String = "SELECT 'some-name' FROM [some-name]"
Dim nResult = Regex.Replace(nStr, "(\[.+?])|\s*-\s*", New MatchEvaluator(Function(m As Match)
If m.Groups(1).Success Then
Return m.Groups(1).Value
Else
Return " - "
End If
End Function))
So, what is happening is:
(\[[^]]+]) - matches and stores the value of [...] substring inside the Group(1) buffer (or \[.+?] can be used here to match a [, then 1 or more any characters and then ] - with RegexOptions.Singleline flag so that . could match a newline, too)
(?<!\s)-(?!\s) - matches any hyphen not preceded ((?<!\s)) or followed ((?!\s)) with whitespace (\s). Actually, we may even use \s*-\s* (where \s* stands for zero or more whitespaces as many as possible since * is a greedy quantifier matching zero or more occurrences of the quantified subpattern) here to remove any whitespace there is to make sure we just insert 1 space before and after -.
If Group 1 matches, then we just re-insert it (Return m.Groups(1).Value), else we insert the space-enclosed hyphen Return " - ".
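The capture-and-skip technique described above carries over to Python with a replacement function in place of the MatchEvaluator. This is a minimal sketch using the [^\]]+ variant of the bracket pattern; the names are hypothetical.

```python
import re

# Group 1 captures a whole [...] substring; the second branch matches a bare hyphen
# together with any whitespace around it.
BRACKETED_OR_HYPHEN = re.compile(r"(\[[^\]]+\])|\s*-\s*")


def space_hyphens(sql):
    """Put single spaces around hyphens, leaving [...] substrings untouched."""
    # When group 1 took part in the match, re-insert it; otherwise emit " - ".
    return BRACKETED_OR_HYPHEN.sub(lambda m: m.group(1) or " - ", sql)


query = "SELECT some-name, some-name FROM [some-table] WHERE [some-column] = 'some-value'"
print(space_hyphens(query))
```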
Just to check if it exists, you could try
\[[^\]]+-[^\]]+\]
It matches a literal [ and then any characters, except ], up to (including) a hyphen. Then again any characters, except ], up to a literal ].
See it here at regex101.
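For the existence check alone, a short Python sketch with re.search might look like this (the function name is hypothetical):

```python
import re

# A [, then non-] characters up to a hyphen, then non-] characters up to the closing ].
BRACKETED_HYPHEN = re.compile(r"\[[^\]]+-[^\]]+\]")


def has_bracketed_hyphen(text):
    """Return True when text contains a hyphenated name inside square brackets."""
    return BRACKETED_HYPHEN.search(text) is not None


print(has_bracketed_hyphen("A long string containing square brackets [some-name]"))
```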
I don't know the VB.NET syntax, but you can use a regex such as
/[\s\'](\w+)\-(\w+)/g
This finds each (\w+)-(\w+) that is preceded by a space or ', so that you can rebuild the string from capture groups 1 and 2 with " - " between them.
See the sample here

Regular Expression remove leading blank and dash character

Given a string like String a="- = - - What is your name?";
how can I remove the leading equals signs, dashes and spaces to get the clean text,
"What is your name?"
If you want to remove the leading non-letter characters, you can match:
^[^a-zA-Z]+
and replace it with '' (empty string).
Explanation:
first ^ - anchor that matches at the beginning
[] - character class
second ^ - negation inside a character class
+ - one or more of the preceding element
So the regex matches one or more non-letter characters at the beginning of the string.
In your case it will get rid of all the leading spaces, hyphens and equals signs; in short, everything before the first letter.
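A minimal Python sketch of this match-and-replace, using re.sub with an empty replacement (the function name is hypothetical):

```python
import re


def strip_leading_non_letters(text):
    """Remove everything before the first ASCII letter."""
    # ^ anchors at the start; [^a-zA-Z]+ consumes one or more non-letter characters.
    return re.sub(r"^[^a-zA-Z]+", "", text)


print(strip_leading_non_letters("- = - - What is your name?"))
```

Note that this also removes leading digits, since the character class excludes everything except letters.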
$a=~s/- = - - //;
In Javascript you could do it like this
var a = "- = - - What is your name?";
a = a.replace(/^([-=\s]*)([a-zA-Z0-9])/gm,"$2");
Java:
String replaced = a.replaceFirst("^[-= ]*", "");
Assuming Java, try this regex:
/^\W*(.*)$/
and retrieve your string from capture group 1.
\W* matches all leading non-word characters
(.*) then matches all characters to the end, beginning with the first word character
^ and $ are the boundaries; you could even do without $ in this case.
Tip: try the excellent Java regex tutorial for reference.
In Python:
>>> "- = - - What is your name?".lstrip("-= ")
'What is your name?'
To remove any kind of whitespace, use .lstrip("-= \t\r\n").
In Javascript, I needed to do this and did it using the following regex:
^[\s\-]+
and replace it with '' (empty string) like this:
yourStringValue.replace(/^[\s\-]+/, '');