Get everything except special words - regex

I need a regex which matches everything except for several words.
The input-string is something like:
This&nbsp;Is&nbsp;A&nbsp;&lt;Test&gt;
It should match
This Is A Test
So I want to match everything around &nbsp;, &lt; and &gt;.
I've tried something like [^&nbsp;] to ignore all appearances of &nbsp;, but this excludes each of those characters individually.

/&[a-zA-Z]{2,8};/g
Breakdown:
& - match & literally
[a-zA-Z]{2,8} - match any characters in ranges a-z and A-Z from 2 to 8 times
; - followed by a literal semicolon
The longest special character that you could encounter is &thetasym; (ϑ), whose name is 8 letters long, and so I've taken this into account in the upper bound of the quantifier.
The proper formatting replaces each of the special characters with a space, and then replaces multiple spaces in a row with a single space:
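let regex = /&[a-zA-Z]{2,8};/g,
    string = "This&nbsp;Is&nbsp;A&nbsp;&lt;Test&gt;",
    // trim() removes the space left behind by the trailing &gt;
    properlyFormatted = string.replace(regex, " ").replace(/ +/g, " ").trim();
console.log(properlyFormatted); // This Is A Test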
The alternative:
/&(?:lt|gt|nbsp);/g
Breakdown:
& - match & literally
(?:lt|gt|nbsp) - a non-capturing group that matches one of lt, gt or nbsp
; - directly followed by a semi colon
This regex will only take into account the specific characters you described.
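let regex = /&(?:lt|gt|nbsp);/g,
    string = "This&nbsp;Is&nbsp;A&nbsp;&lt;Test&gt;",
    // trim() removes the space left behind by the trailing &gt;
    properlyFormatted = string.replace(regex, " ").replace(/ +/g, " ").trim();
console.log(properlyFormatted); // This Is A Test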
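The two variants above differ only in which entity names they accept, so the alternation can also be generated from a list. What follows is a minimal sketch rather than part of the original answer: the helper name stripEntities and its default list are assumptions, and a final trim() is added so that no space is left where an entity ended the string.

```javascript
// Hypothetical helper (name and signature are illustrative only).
// Assumes the entity names contain only letters, so they need no regex escaping.
function stripEntities(input, names = ["lt", "gt", "nbsp"]) {
  const entity = new RegExp(`&(?:${names.join("|")});`, "g");
  return input
    .replace(entity, " ") // each listed entity becomes a space
    .replace(/ +/g, " ")  // collapse runs of spaces
    .trim();              // drop leading/trailing space
}

console.log(stripEntities("This&nbsp;Is&nbsp;A&nbsp;&lt;Test&gt;"));
// Entities that are not in the list, such as &amp;, are left untouched.
console.log(stripEntities("Fish&nbsp;&amp;&nbsp;Chips"));
```

Passing a longer list widens the match without falling back to the catch-all [a-zA-Z]{2,8} form.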

Related

Notepad++: replace occurrences of characters before another character

I have a file with text like this:
"Title" = "Body"
And I would like to remove both " before the =, to leave it like this:
Title = "Body"
So far I managed to select the first block of text with:
.+(=)
That selects everything up to the =, but I can't find how to replace (or delete) both " .
Any suggestions?
You could use a capture group in the replacement, and match the double quotes to be removed while asserting an equals sign at the right.
Find what:
"([^"]+)"(?=\h*=)
" Match literally
([^"]+) Capture group 1, match 1+ times any char other than "
" Match literally
(?=\h*=) Positive lookahead, assert an = sign at the right
Replace with:
$1
To match the whole pattern from the start to the end of the string, you might also use 2 capture groups and use those in the replacement.
^"([^"]+)"(\h*=\h*"[^"]+")$
In the replacement use $1$2
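The same lookahead idea carries over to engines other than the one in Notepad++. Here is a minimal JavaScript sketch, assuming [ \t] as a stand-in for \h (which JavaScript does not support) and a hypothetical function name:

```javascript
// Hypothetical helper: drop the quotes around any quoted text that is
// followed (after optional spaces or tabs) by an equals sign.
function unquoteKeys(text) {
  return text.replace(/"([^"]+)"(?=[ \t]*=)/g, "$1");
}

console.log(unquoteKeys('"Title" = "Body"'));
console.log(unquoteKeys('"A" = "x"\n"B"= "y"'));
```

The quoted value on the right-hand side is not followed by an equals sign, so the lookahead fails there and the value keeps its quotes.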
You can use
(?:\G(?!^)|^(?=.*=))[^"=\v]*\K"
Replace with an empty string.
Details:
(?:\G(?!^)|^(?=.*=)) - end of the previous successful match (\G(?!^)) or (|) start of a line that contains = somewhere on it (^(?=.*=))
[^"=\v]* - any zero or more chars other than ", = and vertical whitespace
\K - omit the text matched
" - a " char (matched, consumed and removed)
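\G and \K are not available in every engine (JavaScript has neither). Where they are missing, the effect of the pattern above, removing every " that precedes the first = on a line, can be approximated without them. This is a sketch under that assumption, with a hypothetical helper name:

```javascript
// Hypothetical helper: remove every " that appears before the first "=" on a line.
function stripQuotesBeforeEquals(line) {
  const eq = line.indexOf("=");
  if (eq === -1) return line; // no "=": leave the line untouched
  const left = line.slice(0, eq).replace(/"/g, "");
  return left + line.slice(eq); // the right-hand side keeps its quotes
}

const text = '"Title" = "Body"\n"no equals sign here"';
console.log(text.split("\n").map(stripQuotesBeforeEquals).join("\n"));
```

Lines without an equals sign are returned unchanged, which mirrors the ^(?=.*=) condition in the regex.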

Regex to parse a name with one or more words after a double number and before 2 or more spaces

Problem:
How can I create a regex to parse a name like "DISNAY LAND 2.0 GCP" from an Array of lines in Scala like this:
DE1ALAT0002  32.4756  -86.4393  106.1  ZQ  DISNAY LAND 2.0 GCP  23456
// For use in code:
val regex = """(?:[\d\.\d]){2}\s*(?:[\d.\d])\s*(ZQ)\s*([A-Z])""".r // my attempt
val getName = row match {
case regex(name) => name
case _ =>
}
The only things I'm sure of:
1) there is a varying number of spaces between values
2) the useful value "DISNAY LAND 2.0 GCP" comes after a double number and the letters "ZQ"
3) the words of the name are separated by a single space, and the name may consist of one or many words
4) the name ends with two or more spaces
Sorry if I'm repeating a question, but after a long search I did not find the right solution.
Many thanks for your answers.
You may use an .unanchored pattern like
\d\.\d+\s+ZQ\s+(\S+(?:\s\S+)*)
Details:
\d\.\d+ - 1 digit, . and then 1+ digits
\s+ - 1+ whitespaces
ZQ - ZQ substring
\s+ - 1+ whitespaces (here, the left-hand side context definition ends, now, starting to capture the value we need to return)
(\S+(?:\s\S+)*) - Capturing group 1:
\S+ - 1 or more non-whitespace chars
(?:\s\S+)* - a non-capturing group that matches 0 or more sequences of a single whitespace (\s) and then 1+ non-whitespace chars (so, up to the double whitespace or end of string).
Scala demo:
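val regex = """\d\.\d+\s+ZQ\s+(\S+(?:\s\S+)*)""".r.unanchored
// The two spaces before 23456 are what ends the captured name
val row = "DE1ALAT0002  32.4756  -86.4393  106.1  ZQ  DISNAY LAND 2.0 GCP  23456"
val getName = row match {
  case regex(name) => name
  case _ => "" // no match: fall back to an empty string
}
print(getName)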
Output: DISNAY LAND 2.0 GCP
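The pattern relies on the name being followed by two or more whitespace characters (or the end of the string). The sketch below uses the same pattern in JavaScript to show both cases. The function name extractName is hypothetical, and the double spaces in the sample row are an assumption based on condition 4 of the question.

```javascript
// Hypothetical helper: returns the captured name, or null when nothing matches.
function extractName(row) {
  const m = /\d\.\d+\s+ZQ\s+(\S+(?:\s\S+)*)/.exec(row);
  return m ? m[1] : null;
}

// Columns separated by two spaces: the name stops before the trailing number.
console.log(extractName("DE1ALAT0002  32.4756  -86.4393  106.1  ZQ  DISNAY LAND 2.0 GCP  23456"));
// Only one space before the trailing column: it is absorbed into the name.
console.log(extractName("1.5 ZQ A B 7"));
```

If the trailing column can be separated by a single space, the pattern needs another right-hand boundary, since a single whitespace character is allowed inside the name.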

Regex match contiguous non-word in string

How would I create a regex which matches every contiguous non-word "[a].[b]" in a string? I don't care about spaces, newlines or any other invisible character, as long as it only matches the contiguous string. Anything else is invalid.
[a].[b] // valid for "[a].[b]"
[a].[b] // valid for "[a].[b]"
[a].[b] // valid for "[a].[b]"
\n[a].[b] // valid for "[a].[b]"
[a].[b]$[c] // invalid because of "$" (or any other character) and everything after
[c]$[a].[b] // invalid because of "$" (or any other character) and everything before
[c].[a].[b] // invalid because of "[c]."
The problem I'm having is if I try
[\ \n\r]
it also matches the space before " [a].[b]", which is not what I want. I want whitespace to be ignored, because I don't want to replace anything besides "[a].[b]". But of course only when it is a contiguous string: in "somethinganythingbutspaceandnewline[a].[b]" I don't want to replace anything.
Thank you.
If I understand you correctly, you want the [a].[b] string with optional leading and trailing whitespace. If that's the case, I suggest the pattern \A\s*\[a\]\.\[b\]\s*\Z, e.g. (C# code):
string pattern = @"\A\s*\[a\]\.\[b\]\s*\Z";
string source = "\n[a].[b] \t ";
if (Regex.IsMatch(source, pattern))
    Console.Write("Match");
else
    Console.Write("Not Match");
Pattern:
\A - beginning of the text
\s* - zero or more leading whitespaces
\[a\]\.\[b\] - string to find (please note the escaping)
\s* - zero or more trailing whitespaces
\Z - end of the text
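JavaScript has no \A or \Z, but without the m flag ^ and $ anchor to the whole input, so the same whole-string check can be sketched as follows (the function name is hypothetical):

```javascript
// Hypothetical helper: true only when the input is [a].[b] with optional
// surrounding whitespace and nothing else.
function isOnlyAB(text) {
  return /^\s*\[a\]\.\[b\]\s*$/.test(text);
}

console.log(isOnlyAB("\n[a].[b] \t "));
console.log(isOnlyAB("[a].[b]$[c]"));
console.log(isOnlyAB("[c].[a].[b]"));
```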
Edit: As far as I can see, the core of the match is a constant, [a].[b], so I doubt you really need the matched text itself (it is always "[a].[b]"). If you do, you can try a lookahead and lookbehind construction (C# code):
string pattern = @"(?<=\A\s*)\[a\]\.\[b\](?=\s*\Z)";
string source = "\n[a].[b] \t ";
var match = Regex.Match(source, pattern);
if (match.Success)
    Console.Write($"Matched: '{match.Value}'");
Now
(?<=\A\s*) - zero or more leading spaces should be matched but not included into the match
(?=\s*\Z) - zero or more trailing spaces should be matched but not included into the match
Edit 2: in case you have several [a].[b] separated by whitespace:
// requires: using System.Linq; using System.Text.RegularExpressions;
string pattern = @"(?<=\A|\s+)\[a\]\.\[b\](?=\s+|\Z)";
string source = "[a].[b] [a].[b][a].[b] [a].[b] \t ";
string result = string.Join(", ", Regex
    .Matches(source, pattern)
    .OfType<Match>()
    .Select(match => match.Value));
Console.Write(result);
Outcome (2 matches):
[a].[b], [a].[b]
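The whitespace-delimited variant can be reproduced in JavaScript, assuming an engine with lookbehind support (ES2018 or later). This is a sketch with a hypothetical function name. A single \s is enough inside the lookarounds, because they only need to see the one character adjacent to the match.

```javascript
// Hypothetical helper: every [a].[b] that is delimited by whitespace
// or by the start/end of the input.
function findStandalone(text) {
  return text.match(/(?<=^|\s)\[a\]\.\[b\](?=\s|$)/g) || [];
}

const found = findStandalone("[a].[b] [a].[b][a].[b] [a].[b] \t ");
console.log(found.length, found.join(", "));
```

The two occurrences that touch each other in the middle are skipped, since neither is delimited by whitespace on both sides.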

Relevant Regular Expression in scala

I want to keep only the last term of a string separated by dots
Example:
My string is:
abc"val1.val2.val3.val4"zzz
Expected string after I apply the regex:
abc"val4"zzz
In other words, of the dot-separated (.) parts inside the quotes, I want to keep only the last one.
The most relevant I tried was
val json="""abc"val1.val2.val3.val4"zzz"""
val sortie="""(([A-Za-z0-9]*)\.([A-Za-z0-9]*){2,10})\.([A-Za-z0-9]*)""".r.replaceAllIn(json, a=> a.group(3))
the result was:
abc".val4"zzz
Could you suggest a different regex solution, please?
Thanks
You may use
val s = """abc"val1.val2.val3.val4"zzz"""
val res = "(\\w+\")[^\"]*\\.([^\"]*\")".r replaceAllIn (s, "$1$2")
println(res)
// => abc"val4"zzz
Pattern details:
(\\w+\") - Group 1 capturing 1+ word chars and a "
[^\"]* - 0+ chars other than "
\\. - a dot
([^\"]*\") - Group 2 capturing 0+ chars other than " and then a ".
In the replacement, $1 and $2 insert the text captured by Group 1 and Group 2, respectively.
Maybe without Regex at all:
scala> json.split("\"").map(_.split("\\.").last).mkString("\"")
res4: String = abc"val4"zzz
This assumes you want each "token" (separated by ") to become the last dot-separated inner token.
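The two ideas above, a regex to find the quoted part and a split to pick the last token, can also be combined. This JavaScript sketch assumes the quotes are balanced and uses a hypothetical function name:

```javascript
// Hypothetical helper: inside every "..." pair, keep only the text after the last dot.
function keepLastToken(s) {
  return s.replace(/"([^"]*)"/g, (whole, inner) => '"' + inner.split(".").pop() + '"');
}

console.log(keepLastToken('abc"val1.val2.val3.val4"zzz'));
console.log(keepLastToken('x.y "plain" and "a.b"'));
```

Because each quoted string is consumed as a whole, text outside the quotes (x.y in the second call) is never touched, even when it contains dots.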

How can I check it with regular Expression?

I have a long input string that contains certain field names embedded in it. For instance:
SELECT some-name, some-name FROM [some-table] WHERE [some-column] = 'some-value'
The actual field name may change, but it is always in the form of word-word. I need to perform a regex replace on the string so that the output will look like this:
SELECT some - name, some - name FROM [some-table] WHERE [some-column] = 'some - value'
In other words, when the field name is enclosed in square brackets, it should be left untouched, but when it is not, spaces should be inserted on either side of the dash. There are no nested square brackets, and the string may contain one or more such field names.
You can do this:
Regex.Replace(input, "(?<!\[[^-\]]*)(\w+)-(\w+)(?![^-\]]*\])", "$1 - $2")
Here's an explanation of the pattern:
(?<!\[[^-\]]*) - This is a negative look-behind. It asserts that matches cannot be immediately preceded by text that matches the sub-pattern \[[^-\]]*. In other words, the matches we are looking for cannot be preceded by a [ character followed by any number of characters that are not a - or a ].
(\w+)-(\w+) - Matches one or more word-characters, then a dash, and then one or more word characters following the dash. By enclosing the sub-patterns on either side of the dash in capturing groups, we can then refer to their values as $1 and $2 in the replacement pattern.
(?![^-\]]*\]) - This is a negative look-ahead. Similar to the negative look-behind, it asserts that matches cannot be immediately followed by text which matches the sub pattern [^-\]]*\]. In other words, a match cannot be followed by any number of characters that are not a - or a ] and then a closing ].
At first glance, you might assume that you could simply assert that the match must not be immediately preceded by a [ character and must not be immediately followed by a ] character. In other words, (?<!\[)(\w+)-(\w+)(?!\]). However, that pattern would still match the text ome-nam in the input [some-name], because the text ome-nam is not immediately preceded or followed by the brackets.
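The difference between the two patterns is easy to see side by side. The following is a JavaScript sketch, assuming an engine that supports variable-length lookbehind (ES2018 or later); the helper name spaceDashes is hypothetical.

```javascript
// The naive pattern only inspects the characters directly next to the match.
const naive = /(?<!\[)(\w+)-(\w+)(?!\])/g;
// The pattern from the answer looks for an enclosing bracket on either side.
const robust = /(?<!\[[^-\]]*)(\w+)-(\w+)(?![^-\]]*\])/g;

// Hypothetical helper: put spaces around the dash in every match.
function spaceDashes(input, pattern) {
  return input.replace(pattern, "$1 - $2");
}

console.log(spaceDashes("[some-name]", naive));  // still changes the bracketed name
console.log(spaceDashes("[some-name]", robust)); // leaves it alone
console.log(spaceDashes(
  "SELECT some-name, some-name FROM [some-table] WHERE [some-column] = 'some-value'",
  robust
));
```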
Dim regex As Regex = New Regex("\[[^-]*-[^-]*\]")
Dim match As Match = regex.Match("A long string containing square brackets [some-name]")
If match.Success Then
Console.WriteLine(match.Value)
End If
Or you could use Regex.IsMatch:
Return Regex.IsMatch("A long string containing square brackets [some-name]",
"\[[^-]*-[^-]*\]")
You may match and capture the [...] substrings, so that they can be put back unchanged, and then match only the remaining hyphens (together with any whitespace around them) to replace them:
Dim nStr As String = "SELECT 'some-name' FROM [some-name]"
Dim nResult = Regex.Replace(nStr, "(\[.+?])|\s*-\s*", New MatchEvaluator(Function(m As Match)
If m.Groups(1).Success Then
Return m.Groups(1).Value
Else
Return " - "
End If
End Function))
So, what is happening is:
(\[.+?]) - matches a [...] substring and stores its value in Group 1: a [, then 1 or more characters, as few as possible, and then ]. Add the RegexOptions.Singleline flag if . should match a newline, too; (\[[^]]+]) can be used instead.
\s*-\s* - matches a hyphen together with any whitespace around it (\s* stands for zero or more whitespaces, as many as possible, since * is a greedy quantifier), which makes sure we end up with exactly 1 space before and after the -. To match only hyphens that are not preceded or followed by whitespace, (?<!\s)-(?!\s) can be used instead.
If Group 1 matches, then we just re-insert it (Return m.Groups(1).Value), else we insert the space-enclosed hyphen Return " - ".
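The same match-and-skip technique works in any engine that accepts a replacement callback and needs no lookarounds. Below is a JavaScript sketch with a hypothetical function name, using [^\]]+ for the bracket contents:

```javascript
// Hypothetical helper: bracketed parts are captured and put back unchanged,
// every other hyphen is normalised to " - ".
function spaceOutsideBrackets(input) {
  return input.replace(/(\[[^\]]+\])|\s*-\s*/g, (match, bracketed) =>
    bracketed !== undefined ? bracketed : " - "
  );
}

console.log(spaceOutsideBrackets("SELECT 'some-name' FROM [some-name]"));
console.log(spaceOutsideBrackets("a  -  b"));
```

When the first alternative does not take part in the match, the callback receives undefined for the group, which is the JavaScript counterpart of checking m.Groups(1).Success.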
Just to check if it exists, you could try
\[[^\]]+-[^\]]+\]
It matches a literal [ and then any characters, except ], up to (including) a hyphen. Then again any characters, except ], up to a literal ].
Actually I don't know the vb.net syntax but you can use regex as
/[\s\'](\w+)\-(\w+)/g
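This finds a (\w+)-(\w+) that is preceded by a space or ' and lets you replace it with capture group 1, " - " and capture group 2. Note that the preceding space or ' is part of the match, so it has to be captured and re-inserted as well.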