Regex extraction of substrings ignoring internal character used to match

I'm matching the key and the value of a key-value pair, each enclosed in double quotes, with "(.*?)". How can I ignore any extra " characters inside the value part?
Example string: {"1"=>"email@example.com"}

You may use
String pat = "(?<=\\{|=>)\"(.*?)\"(?=\\}|=>)";
See the regex demo
Details:
(?<=\{|=>) - a positive lookbehind that matches a location immediately preceded by { or =>
" - a double quotation mark
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
" - a double quotation mark
(?=\}|=>) - a positive lookahead that matches a location immediately followed by } or =>.
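A minimal Java sketch of the pattern above, looping over Matcher.find() and collecting Group 1. The class and method names (KeyValueExtract, extract) are hypothetical, and the second input, with quotes inside the value, is an assumed example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueExtract {
    // The answer's pattern: a quoted token that starts right after { or =>
    // and ends right before } or =>, so " characters inside it stay in Group 1.
    static final Pattern PAT = Pattern.compile("(?<=\\{|=>)\"(.*?)\"(?=\\}|=>)");

    // Hypothetical helper: returns Group 1 of every match, in order.
    static List<String> extract(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = PAT.matcher(s);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(extract("{\"1\"=>\"email@example.com\"}")); // [1, email@example.com]
        System.out.println(extract("{\"1\"=>\"say \"hi\" now\"}"));    // [1, say "hi" now]
    }
}
```

Because the quantifier is lazy, a value whose inner " happens to be immediately followed by => or } would be cut short at that point.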

Related

Select all single quotes in regex field

I have this field in my JSON data:
"pinyin": "bei1 'ai1",
I want to select any single quote ', like the one before ai1.
I tried this:
(?<="pinyin": "\w*)\'+(?!")
but it didn't work.

You can use
(?<="pinyin": "[\w\s]*)'(?!")
See this regex demo. Details:
(?<="pinyin": "[\w\s]*) - a positive lookbehind that matches a location that is immediately preceded by "pinyin": " and then any zero or more word or whitespace chars
' - a single quotation mark
(?!") - a negative lookahead that fails the match if there is a " char immediately to the right of the current location.
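A Java sketch of the same pattern, with one assumption spelled out: Java has historically required a lookbehind to have a bounded maximum length, so [\w\s]* is capped at an arbitrary 100 characters here. The names PinyinQuotes, quoteIndexes and stripQuotes are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PinyinQuotes {
    // The answer's pattern, with [\w\s]* capped at an assumed 100 characters
    // so the lookbehind has a bounded maximum length.
    static final Pattern QUOTE =
        Pattern.compile("(?<=\"pinyin\": \"[\\w\\s]{0,100})'(?!\")");

    // Hypothetical helper: start index of every matched single quote.
    static List<Integer> quoteIndexes(String line) {
        List<Integer> idx = new ArrayList<>();
        Matcher m = QUOTE.matcher(line);
        while (m.find()) {
            idx.add(m.start());
        }
        return idx;
    }

    // Hypothetical helper: deletes the matched quotes.
    static String stripQuotes(String line) {
        return QUOTE.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        String line = "\"pinyin\": \"bei1 'ai1\",";
        System.out.println(quoteIndexes(line)); // [16]
        System.out.println(stripQuotes(line));  // "pinyin": "bei1 ai1",
    }
}
```

Like the original pattern, this only finds quotes preceded solely by word and whitespace characters since the start of the value, so a second ' later in the same value would not match.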

A regular expression for matching a group followed by a specific character

So I need to match the following:
1.2.
3.4.5.
5.6.7.10
((\d+)\.(\d+)\.((\d+)\.)*) works fine for the first line, but the problem is that there could be one or more lines.
\n only appears if there is more than one line.
As a string, I get it like this: "1.2.\n3.4.5.\n1.2."
So my issue is: if there is only one line, there must be no \n at the end, but if there are several lines, every line except the last must end with \n.

Here is the pattern I suggest:
^\d+(?:\.\d+)*\.?(?:\n\d+(?:\.\d+)*\.?)*$
Demo
Here is a brief explanation of the pattern:
^ from the start of the string
\d+ match a number
(?:\.\d+)* followed by a dot and another number, zero or more times
\.? followed by an optional trailing dot
(?:\n followed by a newline
\d+(?:\.\d+)*\.?)* and another path sequence, zero or more times
$ end of the string
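The whole-string check can be sketched in Java with Matcher.matches(), which requires the entire input to match. The class and method names are hypothetical:

```java
import java.util.regex.Pattern;

class DottedLines {
    // The answer's pattern: one dotted number sequence, then zero or more
    // further sequences, each introduced by a single \n.
    static final Pattern LINES =
        Pattern.compile("^\\d+(?:\\.\\d+)*\\.?(?:\\n\\d+(?:\\.\\d+)*\\.?)*$");

    static boolean isValid(String s) {
        // matches() must consume the whole input, so a trailing "\n" fails.
        return LINES.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("1.2."));                   // true
        System.out.println(isValid("1.2.\n3.4.5.\n5.6.7.10")); // true
        System.out.println(isValid("1.2.\n"));                 // false
        System.out.println(isValid("1.2.\n\n3.4."));           // false
    }
}
```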

You might check if there is a newline at the end using a positive lookahead (?=.*\n):
(?=.*\n)(\d+)\.(\d+)\.((\d+)\.)*
See a regex demo
Edit
You could use an alternation that either matches the pattern when the next line starts with the same pattern, or matches the pattern when it is not followed by a newline.
^(?:\d+\.\d+\.(?:\d+\.)*(?=.*\n\d+\.\d+\.)|\d+\.\d+\.(?:\d+\.)*(?!.*\n))
Regex demo
^ Start of string
(?: Non capturing group
\d+\.\d+\. Match 1+ digits and a dot, two times
(?:\d+\.)* Repeat 0+ times matching 1+ digits and a dot
(?=.*\n\d+\.\d+\.) Positive lookahead, assert that the rest of the line is followed by a newline and the start of the pattern
| Or
\d+\.\d+\. Match 1+ digits and a dot, two times
(?:\d+\.)* Repeat 0+ times matching 1+ digits and a dot
(?!.*\n) Negative lookahead, assert that the rest of the line is not followed by a newline (this is the last line)
) Close non capturing group
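A Java sketch of the alternation. One assumption is added: the MULTILINE flag, so that ^ matches at the start of every line rather than only at the start of the input; without it, only the first line is tested. The names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinesWithLookahead {
    // The answer's alternation; MULTILINE is an added assumption.
    static final Pattern LINE = Pattern.compile(
        "^(?:\\d+\\.\\d+\\.(?:\\d+\\.)*(?=.*\\n\\d+\\.\\d+\\.)"
            + "|\\d+\\.\\d+\\.(?:\\d+\\.)*(?!.*\\n))",
        Pattern.MULTILINE);

    // Hypothetical helper: every full match, in order.
    static List<String> findAll(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = LINE.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("1.2.\n3.4.5.\n1.2.")); // [1.2., 3.4.5., 1.2.]
        System.out.println(findAll("1.2.\nfoo\n3.4."));    // [3.4.]
    }
}
```

In the second input the first line is rejected: the next line does not start with the pattern, and the line is not the last one.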

(\d+\.*)+\n* will match the text you provided. If you need to make sure the final line also ends with a ., then (\d+\.)+\n* will work.

Most programming languages offer the m flag, which is the multiline modifier. Enabling it lets $ match at the end of each line as well as at the end of the string.
The solution below only appends $ to your current regex and sets the m flag. How the flag is set may vary depending on your programming language.
var text = "1.2.\n3.4.5.\n1.2.\n12.34.56.78.123.\nthis 1.2. shouldn't hit",
regex = /((\d+)\.(\d+)\.((\d+)\.)*)$/gm,
match;
while (match = regex.exec(text)) {
console.log(match);
}
You could simplify the regex to /(\d+\.){2,}$/gm, then split the full match based on the dot character to get all the different numbers. I've given a JavaScript example below, but getting a substring and splitting a string are pretty basic operations in most languages.
var text = "1.2.\n3.4.5.\n1.2.\n12.34.56.78.123.\nthis 1.2. shouldn't hit",
regex = /(\d+\.){2,}$/gm;
/* Slice is used to drop the dot at the end, otherwise resulting in
* an empty string on split.
*
* "1.2.3.".split(".") //=> ["1", "2", "3", ""]
* "1.2.3.".slice(0, -1) //=> "1.2.3"
* "1.2.3".split(".") //=> ["1", "2", "3"]
*/
console.log(
text.match(regex)
.map(match => match.slice(0, -1).split("."))
);
For more information about regex flags/modifiers, have a look at Regular Expression Reference: Mode Modifiers.
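For comparison, a Java sketch of the split approach, where the inline (?m) flag plays the role of JavaScript's m flag. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitDottedLines {
    // (?m) turns on MULTILINE so $ also matches at the end of each line.
    static final Pattern SEQ = Pattern.compile("(?m)(\\d+\\.){2,}$");

    static List<List<String>> numbers(String text) {
        List<List<String>> out = new ArrayList<>();
        Matcher m = SEQ.matcher(text);
        while (m.find()) {
            String match = m.group();
            // Drop the trailing dot, then split on literal dots. Java's split
            // already discards trailing empty strings, so the substring step
            // mainly mirrors the JavaScript slice(0, -1).
            out.add(Arrays.asList(match.substring(0, match.length() - 1).split("\\.")));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "1.2.\n3.4.5.\n1.2.\n12.34.56.78.123.\nthis 1.2. shouldn't hit";
        System.out.println(numbers(text)); // [[1, 2], [3, 4, 5], [1, 2], [12, 34, 56, 78, 123]]
    }
}
```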

Regex match contingent non word in string

How would I create a regex that matches all contiguous occurrences of the non-word string "[a].[b]" in a string? I don't care about spaces, newlines, or any other invisible characters, as long as the regex matches only the contiguous string. Anything else is invalid.
[a].[b] // valid for "[a].[b]"
[a].[b] // valid for "[a].[b]"
[a].[b] // valid for "[a].[b]"
\n[a].[b] // valid for "[a].[b]"
[a].[b]$[c] // invalid because of "$" (or any other character) and everything after
[c]$[a].[b] // invalid because of "$" (or any other character) and everything before
[c].[a].[b] // invalid because of "[c]."
The problem I'm having is that if I try
[\ \n\r]
it matches the space before " [a].[b]", which is not what I want. I want spaces to be ignored because I don't want to replace anything besides "[a].[b]". But of course only when it is a contiguous string: "somethinganythingbutspaceandnewline[a].[b]" I don't want to replace.
Thank you.

If I understand you correctly, you want the string [a].[b] with possible leading and trailing whitespace. If that's your case, I suggest the \A\s*\[a\]\.\[b\]\s*\Z pattern, e.g. (C# code):
string pattern = @"\A\s*\[a\]\.\[b\]\s*\Z";
string source = "\n[a].[b] \t ";
if (Regex.IsMatch(source, pattern))
Console.Write("Match");
else
Console.Write("Not Match");
Pattern:
\A - beginning of the text
\s* - zero or more leading whitespaces
\[a\]\.\[b\] - string to find (please, notice escapements)
\s* - zero or more trailing whitespaces
\Z - end of the text
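The same whole-text check can be sketched in Java, which also supports \A and \Z. The class and method names are hypothetical:

```java
import java.util.regex.Pattern;

class WholeTextCheck {
    // The answer's pattern: optional whitespace, the literal [a].[b],
    // optional whitespace, and nothing else between \A and \Z.
    static final Pattern WHOLE = Pattern.compile("\\A\\s*\\[a\\]\\.\\[b\\]\\s*\\Z");

    static boolean isOnlyAB(String s) {
        return WHOLE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isOnlyAB("\n[a].[b] \t ")); // true
        System.out.println(isOnlyAB("[a].[b]$[c]"));   // false
        System.out.println(isOnlyAB("[c].[a].[b]"));   // false
    }
}
```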
Edit: As far as I can see, the core of the match is a constant, [a].[b], so I doubt you really need the match's text, which is always "[a].[b]". If you do, you can try a lookbehind and lookahead construction (C# code):
string pattern = @"(?<=\A\s*)\[a\]\.\[b\](?=\s*\Z)";
string source = "\n[a].[b] \t ";
var match = Regex.Match(source, pattern);
if (match.Success)
Console.Write($"Matched: '{match.Value}'");
Now:
(?<=\A\s*) - zero or more leading whitespace characters are matched but not included in the match
(?=\s*\Z) - zero or more trailing whitespace characters are matched but not included in the match
Edit 2: in case you have several [a].[b] separated by whitespace:
// requires: using System.Linq; using System.Text.RegularExpressions;
string pattern = @"(?<=\A|\s+)\[a\]\.\[b\](?=\s+|\Z)";
string source = "[a].[b] [a].[b][a].[b] [a].[b] \t ";
string result = string.Join(", ", Regex
  .Matches(source, pattern)
  .OfType<Match>()
  .Select(match => match.Value));
Console.Write(result);
Outcome (2 matches):
[a].[b], [a].[b]
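A Java sketch of Edit 2. Inside the lookarounds, \s+ is shortened to \s: being preceded or followed by one or more whitespace characters is the same as being preceded or followed by one, and the shorter form keeps the lookbehind bounded, which Java has historically required. The names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SeparatedAB {
    // [a].[b] preceded by the start of input or whitespace,
    // and followed by whitespace or the end of input.
    static final Pattern AB =
        Pattern.compile("(?<=\\A|\\s)\\[a\\]\\.\\[b\\](?=\\s|\\Z)");

    static List<String> findAll(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = AB.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> found = findAll("[a].[b] [a].[b][a].[b] [a].[b] \t ");
        System.out.println(found.size() + ": " + String.join(", ", found)); // 2: [a].[b], [a].[b]
    }
}
```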

How can I check it with regular Expression?

I have a long input string that contains certain field names embedded in it. For instance:
SELECT some-name, some-name FROM [some-table] WHERE [some-column] = 'some-value'
The actual field name may change, but it is always in the form of word-word. I need to perform a regex replace on the string so that the output will look like this:
SELECT some - name, some - name FROM [some-table] WHERE [some-column] = 'some - value'
In other words, when the field name is enclosed in square brackets, it should be left untouched, but when it is not, spaces should be inserted on either side of the dash. There are no nested square brackets, and the string may contain one or more such words.

You can do this:
Regex.Replace(input, "(?<!\[[^-\]]*)(\w+)-(\w+)(?![^-\]]*\])", "$1 - $2")
Here's an explanation of the pattern:
(?<!\[[^-\]]*) - This is a negative look-behind. It asserts that matches cannot be immediately preceded by text that matches the sub-pattern \[[^-\]]*. In other words, the matches we are looking for cannot be preceded by a [ character followed by any number of characters that are not a - or a ].
(\w+)-(\w+) - Matches one or more word-characters, then a dash, and then one or more word characters following the dash. By enclosing the sub-patterns on either side of the dash in capturing groups, we can then refer to their values as $1 and $2 in the replacement pattern.
(?![^-\]]*\]) - This is a negative look-ahead. Similar to the negative look-behind, it asserts that matches cannot be immediately followed by text which matches the sub-pattern [^-\]]*\]. In other words, a match cannot be followed by any number of characters that are not a - or a ] and then a closing ].
See a demo.
At first glance, you might assume that you could simply assert that it must not be immediately preceded by a [ character and that it must not be immediately followed by a ] character. In other words, (?<!\[)(\w+)-(\w+)(?!\]). However, that pattern would still match the text ome-nam in the input [some-name] because the text ome-nam is not immediately preceded or followed by the brackets.
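A Java sketch of the same replacement, under one assumption: the lookbehind's [^-\]]* is capped at an arbitrary 100 characters, since Java has historically required lookbehinds to have a bounded maximum length (the lookahead keeps its unbounded *). The names are hypothetical:

```java
import java.util.regex.Pattern;

class HyphenSpacing {
    // The answer's pattern, with the lookbehind bounded at an assumed 100 chars.
    static final Pattern WORD_DASH_WORD =
        Pattern.compile("(?<!\\[[^-\\]]{0,100})(\\w+)-(\\w+)(?![^-\\]]*\\])");

    static String spaceHyphens(String input) {
        // $1 and $2 refer to the two captured words, as in the .NET replacement.
        return WORD_DASH_WORD.matcher(input).replaceAll("$1 - $2");
    }

    public static void main(String[] args) {
        System.out.println(spaceHyphens(
            "SELECT some-name, some-name FROM [some-table] WHERE [some-column] = 'some-value'"));
        // SELECT some - name, some - name FROM [some-table] WHERE [some-column] = 'some - value'
    }
}
```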

Dim regex As Regex = New Regex("\[[^-]*-[^-]*\]")
Dim match As Match = regex.Match("A long string containing square brackets [some-name]")
If match.Success Then
Console.WriteLine(match.Value)
End If
Or you could use Regex.IsMatch:
Return Regex.IsMatch("A long string containing square brackets [some-name]",
"\[[^-]*-[^-]*\]")

You may match and capture the [...] substrings, match the remaining hyphens (with any surrounding whitespace) separately, and then decide in a match evaluator what to put back for each match:
Dim nStr As String = "SELECT 'some-name' FROM [some-name]"
Dim nResult = Regex.Replace(nStr, "(\[.+?])|\s*-\s*", New MatchEvaluator(Function(m As Match)
If m.Groups(1).Success Then
Return m.Groups(1).Value
Else
Return " - "
End If
End Function))
So, what is happening is:
(\[[^]]+]) - matches and stores the value of [...] substring inside the Group(1) buffer (or \[.+?] can be used here to match a [, then 1 or more any characters and then ] - with RegexOptions.Singleline flag so that . could match a newline, too)
(?<!\s)-(?!\s) - matches any hyphen not preceded ((?<!\s)) or followed ((?!\s)) with whitespace (\s). Actually, we may even use \s*-\s* (where \s* stands for zero or more whitespaces as many as possible since * is a greedy quantifier matching zero or more occurrences of the quantified subpattern) here to remove any whitespace there is to make sure we just insert 1 space before and after -.
If Group 1 matches, then we just re-insert it (Return m.Groups(1).Value), else we insert the space-enclosed hyphen Return " - ".
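The match-and-keep idea translates to Java 9+ via Matcher.replaceAll(Function<MatchResult, String>), which plays the role of the MatchEvaluator. This sketch uses the [^\]]+ variant from the explanation and the \s*-\s* alternative; the names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeepBrackets {
    // Group 1 captures a whole [...] substring; the second alternative
    // matches a hyphen plus any whitespace around it.
    static final Pattern P = Pattern.compile("(\\[[^\\]]+\\])|\\s*-\\s*");

    static String normalize(String s) {
        // The function's result is itself treated as a replacement string,
        // so quoteReplacement keeps any $ or \ inside the brackets literal.
        return P.matcher(s).replaceAll(m ->
            m.group(1) != null ? Matcher.quoteReplacement(m.group(1)) : " - ");
    }

    public static void main(String[] args) {
        System.out.println(normalize("SELECT 'some-name' FROM [some-name]"));
        // SELECT 'some - name' FROM [some-name]
    }
}
```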

Just to check if it exists, you could try
\[[^\]]+-[^\]]+\]
It matches a literal [ and then any characters except ], up to and including a hyphen. Then again any characters except ], up to a literal ].
See it here at regex101.

I don't know the VB.NET syntax, but you can use a regex such as
/[\s\'](\w+)\-(\w+)/g
It finds (\w+)-(\w+) preceded by a space or ', and you can replace the match with capture group 1, " - ", and capture group 2.
See the sample here

regex is allowing character while using \s

How do I write a regular expression that checks whether the entered value is empty, all whitespace, or all digits? In all other cases it should return false. I tried
\s*|\d+, but it also allows other characters.

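Your regex \s*|\d+ just matches zero or more whitespace characters or one or more digits anywhere inside the input string, so the string as a whole may still contain whitespace mixed with digits and other characters. Because \s* can also match an empty string, a search with this pattern succeeds on any input.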
You can use the following regex:
^(?:\s+|\d+)?$
See demo
The regex matches:
^ - start of string
(?:\s+|\d+)? - 1 or 0 occurrences (due to (?:...)?) of...
\s+ - 1 or more whitespace symbols or
\d+ - 1 or more digits
$ - end of string
In plain words:
^....$ - make sure we require a full string match, any partial matches inside a string are not allowed
(?:...)? - all the texts matched with the subpatterns inside this group are optional, thus allowing an empty string match (i.e. return true if the string is empty).
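A quick Java check of the fixed pattern with Matcher.matches(), alongside the original \s*|\d+ used with find() to show why it accepted other characters. The names are hypothetical:

```java
import java.util.regex.Pattern;

class BlankOrDigits {
    // The answer's pattern: empty, all whitespace, or all digits.
    static final Pattern FIXED = Pattern.compile("^(?:\\s+|\\d+)?$");
    // The question's original pattern, for comparison.
    static final Pattern ORIGINAL = Pattern.compile("\\s*|\\d+");

    static boolean isBlankOrDigits(String s) {
        // matches() requires the whole input to match, so ^ and $ are
        // redundant here but kept to mirror the answer.
        return FIXED.matcher(s).matches();
    }

    static boolean originalFinds(String s) {
        // find() searches anywhere; \s* matches the empty string,
        // so this succeeds on any input.
        return ORIGINAL.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "   ", "123", "12 3", "abc"}) {
            System.out.println("\"" + s + "\" -> " + isBlankOrDigits(s)
                + " (original: " + originalFinds(s) + ")");
        }
    }
}
```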
