Multiline Regular Expression replace

Ok, there are lots of regular expressions out there, but as always, none of them seem to match what I'm trying to do.
I have a text file:
F00220034277909272011
H001500020003000009272011
D001500031034970000400500020000000025000000515000000000
D001500001261770008003200010000000025000000132500000000
H004200020001014209272011
D004200005355800007702200005142000013420000000000000000
D004200031137360000779000005000000012000000000000000000
H050100180030263709272011
D050100001876700006000300019500000025000000250000001500
D050100001247060000071500030000000025000000280000000000
D050100002075670000430400020000000025000000515000000000
D050100008342500007702600005700000010000000000000000700
D050100009460270000702100015205000025000000000000006205
D050100008135120000702400015000000010000000000000001000
D050100006938430000702200026700000010000000000000001000
D050100006423710008000200025700000000000000000000001000
D050100009488040008000600007175000000000000000000001000
D050100001299190000800100016300000000000000000000003950
D050100001244850000800400005407000000000000000000001607
D050100001216280000840200020000000000000001000000006200
D050100001216840000479000008175000000000000100000001000
D050100001265880000410200014350000000000000100000001000
D050100007402650002000300026700000000000000100000001000
D050100001305150002000200016175000000000001000000000000
D050100005435430000899700022350000000000001000000000000
D050100031113850000500200008200000000250000100000001000
and, with a multiline regex (.NET flavored), I want to do a replace so that I get:
H050100180030263709272011
D050100001876700006000300019500000025000000250000001500
D050100001247060000071500030000000025000000280000000000
D050100002075670000430400020000000025000000515000000000
D050100008342500007702600005700000010000000000000000700
D050100009460270000702100015205000025000000000000006205
D050100008135120000702400015000000010000000000000001000
D050100006938430000702200026700000010000000000000001000
D050100006423710008000200025700000000000000000000001000
D050100009488040008000600007175000000000000000000001000
D050100001299190000800100016300000000000000000000003950
D050100001244850000800400005407000000000000000000001607
D050100001216280000840200020000000000000001000000006200
D050100001216840000479000008175000000000000100000001000
D050100001265880000410200014350000000000000100000001000
D050100007402650002000300026700000000000000100000001000
D050100001305150002000200016175000000000001000000000000
D050100005435430000899700022350000000000001000000000000
D050100031113850000500200008200000000250000100000001000
so that, basically, I grab everything that starts with [HD]0501 and nothing else.
I know this seems more suited to a match than a replace, but I'm going through a pre-built engine that accepts a Regex pattern string and a regex replace string only.
What can I supply for a pattern and a replace string to get my desired result? Multiline Regex is a hardcoded configuration.
I originally thought something like this would work:
search:
(?<Match>^[HD]0501\d+$), but this matched nothing.
search:
(?!^[HD]0501\d+$), but this matched a bunch of empty strings, and I couldn't figure out what to put for the replace string.
search:
(?!(?<Omit>^[HD]0501\d+$)), "Group 'Omit' not found."
It seems this should be simple, but as always, Regex manages to make me feel dumb. Help would be greatly appreciated.

Try matching the following pattern:
(?m)^(?![HD]0501).+(\r?\n)?
and replace it with an empty string.
The following demo:
using System;
using System.Text.RegularExpressions;
namespace Test
{
class MainClass
{
public static void Main (string[] args)
{
string input = @"F00220034277909272011
H001500020003000009272011
D001500031034970000400500020000000025000000515000000000
D001500001261770008003200010000000025000000132500000000
H004200020001014209272011
D004200005355800007702200005142000013420000000000000000
D004200031137360000779000005000000012000000000000000000
H050100180030263709272011
D050100001876700006000300019500000025000000250000001500
D050100001247060000071500030000000025000000280000000000
D050100002075670000430400020000000025000000515000000000
D050100008342500007702600005700000010000000000000000700
D050100009460270000702100015205000025000000000000006205
D050100008135120000702400015000000010000000000000001000
D050100006938430000702200026700000010000000000000001000
D050100006423710008000200025700000000000000000000001000
D050100009488040008000600007175000000000000000000001000
D050100001299190000800100016300000000000000000000003950
D050100001244850000800400005407000000000000000000001607
D050100001216280000840200020000000000000001000000006200
D050100001216840000479000008175000000000000100000001000
D050100001265880000410200014350000000000000100000001000
D050100007402650002000300026700000000000000100000001000
D050100001305150002000200016175000000000001000000000000
D050100005435430000899700022350000000000001000000000000
D050100031113850000500200008200000000250000100000001000";
string regex = @"(?m)^(?![HD]0501).+(\r?\n)?";
Console.WriteLine(Regex.Replace(input, regex, ""));
}
}
}
prints:
H050100180030263709272011
D050100001876700006000300019500000025000000250000001500
D050100001247060000071500030000000025000000280000000000
D050100002075670000430400020000000025000000515000000000
D050100008342500007702600005700000010000000000000000700
D050100009460270000702100015205000025000000000000006205
D050100008135120000702400015000000010000000000000001000
D050100006938430000702200026700000010000000000000001000
D050100006423710008000200025700000000000000000000001000
D050100009488040008000600007175000000000000000000001000
D050100001299190000800100016300000000000000000000003950
D050100001244850000800400005407000000000000000000001607
D050100001216280000840200020000000000000001000000006200
D050100001216840000479000008175000000000000100000001000
D050100001265880000410200014350000000000000100000001000
D050100007402650002000300026700000000000000100000001000
D050100001305150002000200016175000000000001000000000000
D050100005435430000899700022350000000000001000000000000
D050100031113850000500200008200000000250000100000001000
A quick explanation:
(?m)
enable multi-line mode so that ^ matches the start of a new line;
^
match the start of a new line;
(?![HD]0501)
look ahead to see if there's no "H0501" or "D0501";
.+
match one or more chars other than \n, i.e. the rest of the line;
(\r?\n)?
match an optional line break.
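
The same idea can be tried outside .NET. The following is a minimal sketch in Python, assuming its re module treats (?m), ^ and the negative lookahead the same way for this pattern; the function name keep_0501_lines is made up for illustration, and the sample is a shortened copy of the question's data.

```python
import re

# Same pattern as in the answer above; (?m) switches on multiline mode.
PATTERN = re.compile(r"(?m)^(?![HD]0501).+(\r?\n)?")


def keep_0501_lines(text: str) -> str:
    """Delete every non-empty line that does not start with H0501 or D0501."""
    return PATTERN.sub("", text)


# Shortened sample taken from the question's file.
sample = (
    "F00220034277909272011\n"
    "H001500020003000009272011\n"
    "D001500031034970000400500020000000025000000515000000000\n"
    "H050100180030263709272011\n"
    "D050100001876700006000300019500000025000000250000001500\n"
)

filtered = keep_0501_lines(sample)
print(filtered, end="")
```

Because the replacement is the empty string, the pattern has to consume the trailing line break as well, otherwise blank lines would be left behind.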

Related

Dart RegEx is not splitting String

I'm new to RegEx.
I want to get all Syllables out of my String using this RegEx:
/[^aeiouy]*[aeiouy]+(?:[^aeiouy]*\$|[^aeiouy](?=[^aeiouy]))?/gi
And I implemented it in Dart like this:
void main() {
String test = 'hairspray';
final RegExp syllableRegex = RegExp("/[^aeiouy]*[aeiouy]+(?:[^aeiouy]*\$|[^aeiouy](?=[^aeiouy]))?/gi");
print(test.split(syllableRegex));
}
The problem:
I'm getting the whole word back in the List instead of it being split.
What do I need to change to get the word divided into a List?
I tested the RegEx on regex101 and it shows matches there.
But when I'm using it in Dart with firstMatch I get null.

You need to:
Use a plain string pattern, without regex delimiters, as the regex pattern in Dart.
Drop the flags: i is implemented as the caseSensitive option of RegExp, and g is implemented as the RegExp#allMatches method.
Match and extract with your pattern rather than split.
You can use
String test = 'hairspray';
final RegExp syllableRegex = RegExp(r"[^aeiouy]*[aeiouy]+(?:[^aeiouy]*$|[^aeiouy](?=[^aeiouy]))?",
caseSensitive: false);
for (Match match in syllableRegex.allMatches(test)) {
print(match.group(0));
}
Output:
hair
spray
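
To see why extracting works where splitting does not, here is a small sketch in Python (used only for illustration; the Dart code above is the actual fix). It assumes Python's re module handles this pattern the same way: findall plays the role of allMatches, and re.IGNORECASE plays the role of the i flag.

```python
import re

# The pattern from the answer, without delimiters or flags in the string.
SYLLABLE = re.compile(
    r"[^aeiouy]*[aeiouy]+(?:[^aeiouy]*$|[^aeiouy](?=[^aeiouy]))?",
    re.IGNORECASE,
)

word = "hairspray"

# Extract: collect every non-overlapping match of the pattern.
syllables = SYLLABLE.findall(word)

# Split: keep only what lies between the matches.
leftovers = SYLLABLE.split(word)

print(syllables)
print(leftovers)
```

Since the pattern describes the syllables themselves, the matches cover the whole word and splitting leaves only empty strings between them.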

How to create "blocks" with Regex

For a project of mine, I want to create 'blocks' with Regex.
\xyz\yzx //wrong format
x\12 //wrong format
12\x //wrong format
\x12\x13\x14\x00\xff\xff //correct format
When using Regex101 to test my regular expressions, I came to this result:
([\\x(0-9A-Fa-f)])/gm
This leads to incorrect output, because
12\x
still gets detected as a correct string even though the order is wrong. It needs to be in the order specified below, and in no other order:
backslash x 0-9A-Fa-f 0-9A-Fa-f
Can anyone explain how that works and why it works in that way? Thanks in advance!

To match a \ followed by x, followed by 2 hex chars, anywhere in the string, you need to use
\\x[0-9A-Fa-f]{2}
To make it match all non-overlapping occurrences, use the specific modifiers (like /g in JavaScript/Perl) or specific functions in your programming language (Regex.Matches in .NET, or preg_match_all in PHP, etc.).
The ^(?:\\x[0-9A-Fa-f]{2})+$ regex validates a whole string that consists only of blocks like the one above. This is due to the ^ (start of string) and $ (end of string) anchors. Note that (?:...)+ is a non-capturing group that can repeat 1 or more times in the string (due to the + quantifier).
Some Java demo:
String s = "\\x12\\x13\\x14\\x00\\xff\\xff";
// Extract valid blocks
Pattern pattern = Pattern.compile("\\\\x[0-9A-Fa-f]{2}");
Matcher matcher = pattern.matcher(s);
List<String> res = new ArrayList<>();
while (matcher.find()){
res.add(matcher.group(0));
}
System.out.println(res); // => [\x12, \x13, \x14, \x00, \xff, \xff]
// Check if a string consists of valid "blocks" only
boolean isValid = s.matches("(?i)(?:\\\\x[a-f0-9]{2})+");
System.out.println(isValid); // => true
Note that we may shorten [0-9A-Fa-f] to [0-9a-f] if we add a case insensitive modifier (?i) to the start of the pattern, or just use \p{XDigit}, which matches any hexadecimal digit in a Java regex.
The String#matches method always requires the whole string to match, so we do not need the leading ^ and trailing $ anchors when using the pattern inside it.
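
The difference between extracting blocks and validating the whole string can also be sketched in Python. This is only an illustration under the assumption that Python's re behaves like the Java code above for this pattern; extract_blocks and is_valid are made-up helper names, and fullmatch takes the place of the ^...$ anchors.

```python
import re

# One block: a backslash, the letter x, then exactly two hex digits.
BLOCK = re.compile(r"\\x[0-9A-Fa-f]{2}")
# A whole string made of one or more such blocks and nothing else.
ALL_BLOCKS = re.compile(r"(?:\\x[0-9A-Fa-f]{2})+")


def extract_blocks(s: str) -> list[str]:
    """Return every non-overlapping block found anywhere in s."""
    return BLOCK.findall(s)


def is_valid(s: str) -> bool:
    """True only if s consists of valid blocks from start to end."""
    return ALL_BLOCKS.fullmatch(s) is not None


# The four sample strings from the question (raw strings keep the backslashes).
for candidate in [r"\xyz\yzx", r"x\12", r"12\x", r"\x12\x13\x14\x00\xff\xff"]:
    print(candidate, is_valid(candidate), extract_blocks(candidate))
```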

Regex Match whole word string in coldfusion

I'm trying this example
first example
keyword = "star";
myString = "The dog sniffed at the star fish and growled";
regEx = "\b"& keyword &"\b";
if (reFindNoCase(regEx, myString)) {
writeOutput("found it");
} else {
writeOutput("did not find it");
}
Example output -> found it
second example
keyword = "star";
myString = "The dog sniffed at the .star fish and growled";
regEx = "\b"& keyword &"\b";
if (reFindNoCase(regEx, myString)) {
writeOutput("found it");
} else {
writeOutput("did not find it");
}
output -> found it
But I want to find only the whole word; the punctuation is the issue for me. How can I use a regex so that the second example outputs: did not find it

ColdFusion does not support lookbehind, so you cannot use a real "zero-width boundary" check. Instead, you can use groupings (and fortunately a lookahead):
regEx = "(^|\W)"& keyword &"(?=\W|$)";
Here, (^|\W) matches either the start of a string or a non-word character (\W), and (?=\W|$) makes sure the keyword is followed by either a non-word character (\W) or the end of string ($).
However, make sure you escape your keyword before passing it to the regex. ColdFusion 10 and later provide reEscape() to prepare string literals for native RE methods.
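Another way, which treats punctuation such as the leading dot in .star as part of the word, is to require whitespace or the start/end of the string on both sides (TABLE_NAME is the variable holding the keyword here):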
<cfset regEx = "(^|\s)" & TABLE_NAME & "($|\s)">
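
The two patterns behave differently on the question's second example, which a quick sketch in Python makes visible. This assumes Python's re treats \W, \s and the lookahead the same way as ColdFusion's engine for these inputs; the function names are made up, and re.escape stands in for reEscape().

```python
import re


def found_nonword_delimited(keyword: str, text: str) -> bool:
    """(^|\\W)keyword(?=\\W|$): any non-word character counts as a boundary."""
    pattern = r"(^|\W)" + re.escape(keyword) + r"(?=\W|$)"
    return re.search(pattern, text, re.IGNORECASE) is not None


def found_space_delimited(keyword: str, text: str) -> bool:
    """(^|\\s)keyword($|\\s): only whitespace or the string edges count."""
    pattern = r"(^|\s)" + re.escape(keyword) + r"($|\s)"
    return re.search(pattern, text, re.IGNORECASE) is not None


first = "The dog sniffed at the star fish and growled"
second = "The dog sniffed at the .star fish and growled"

for text in (first, second):
    print(found_nonword_delimited("star", text), found_space_delimited("star", text))
```

Under these assumptions the \W variant still reports a hit for .star, because the dot is itself a non-word character; only the whitespace variant gives "did not find it" for the second example.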

Use regex to find a phrase with symbols in an URL

I have several pages with the current url:
onclick="location.href='https://www.mydomain.com/shop/bags
at the end of each url there's something like this:
?cid=Black'"
or
?cid=Beige'"
or
?cid=Green'"
What I need is a regex to find ?cid= in each url and then replace everything from ?cid= to the ending '
Currently I have this:
.?cid=.*?'
This finds occurrences of ?cid= in EVERY line of code. I only want it to find occurrences in onclick="location.href='https://www.mydomain.com/shop/bags
Anyone got any solutions for this?
UPDATE
Sorry for the initial confusion. I'm using this program http://www.araxis.com/replace-in-files/index-eur.html which allows the use of regexes to find elements. I think it says it allows Perl-style regex.
Thanks

You can use lookaround syntax to match ?cid=something preceded by the URL and followed by a '
This pattern should work:
(?<=\Qhttps://www.mydomain.com/shop/bags\E)\?cid=[^']++(?=')
If you replace that pattern with your replacement then the entire bit from ?cid until ' will be replaced.
Here is an example in Java (ignore the slightly different syntax):
public static void main(String[] args) {
final String[] in = {
"onclick=\"location.href='https://www.mydomain.com/shop/bags?cid=Black'",
"onclick=\"location.href='https://www.mydomain.com/shop/bags?cid=Beige'",
"onclick=\"location.href='https://www.mydomain.com/shop/bags?cid=Green'"
};
final Pattern pattern = Pattern.compile("(?<=\\Qhttps://www.mydomain.com/shop/bags\\E)\\?cid=[^']++(?=')");
for(final String string : in) {
final Matcher m = pattern.matcher(string);
final String replaced = m.replaceAll("SOMETHING_ELSE");
System.out.println(replaced);
}
}
Output
onclick="location.href='https://www.mydomain.com/shop/bagsSOMETHING_ELSE'
onclick="location.href='https://www.mydomain.com/shop/bagsSOMETHING_ELSE'
onclick="location.href='https://www.mydomain.com/shop/bagsSOMETHING_ELSE'
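This assumes, obviously, that your tool supports lookaround.
This should certainly work if you just use Perl directly rather than via your magic tool:
perl -pi -e 's{(?<=\Qhttps://www.mydomain.com/shop/bags\E)\?cid=[^\x27]++(?=\x27)}{SOMETHING_ELSE}g' *some_?glob*.pattern
(\x27 stands for the single quote, which cannot be escaped inside a single-quoted shell string, and the s{...}{...} delimiters avoid a clash with the slashes in the URL.)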
EDIT
Another idea is to use a capturing group and a backreference, replace
(\Qhttps://www.mydomain.com/shop/bags\E)\?cid=[^']++
With
$1SOMETHING_ELSE
Another test case in Java:
public static void main(String[] args) {
final String[] in = {
"onclick=\"location.href='https://www.mydomain.com/shop/bags?cid=Black'",
"onclick=\"location.href='https://www.mydomain.com/shop/bags?cid=Beige'",
"onclick=\"location.href='https://www.mydomain.com/shop/bags?cid=Green'"
};
final Pattern pattern = Pattern.compile("(\\Qhttps://www.mydomain.com/shop/bags\\E)\\?cid=[^']++");
for(final String string : in) {
final Matcher m = pattern.matcher(string);
final String replaced = m.replaceAll("$1SOMETHING_ELSE");
System.out.println(replaced);
}
}
Output:
onclick="location.href='https://www.mydomain.com/shop/bagsSOMETHING_ELSE'
onclick="location.href='https://www.mydomain.com/shop/bagsSOMETHING_ELSE'
onclick="location.href='https://www.mydomain.com/shop/bagsSOMETHING_ELSE'

Find
(onclick="location.href='https://www.mydomain.com/shop/bags.*?)\?cid=.*?'
Replace
$1something'

You can use this pattern:
\?cid=[^']*
The idea is to use a character class that excludes the final single quote, so you can avoid using a lazy quantifier.
Note: you can use a possessive quantifier, if supported, to give the regex engine less work:
\?cid=[^']*+
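
The lookbehind approach and the capturing-group approach from the answers above can be compared side by side. The following is a sketch in Python, with some stated assumptions: Python's re has no \Q...\E, so re.escape is used instead, and a plain + replaces the possessive ++ (which Python only supports from version 3.11). The function names are made up for illustration.

```python
import re

BASE = "https://www.mydomain.com/shop/bags"

# Lookbehind version: only "?cid=..." is matched, the URL is just asserted.
LOOKAROUND = re.compile(r"(?<=" + re.escape(BASE) + r")\?cid=[^']+(?=')")
# Capturing-group version: the URL is matched too and put back via group 1.
CAPTURE = re.compile("(" + re.escape(BASE) + r")\?cid=[^']+")


def replace_with_lookaround(line: str, new: str) -> str:
    # A function is used as the replacement so `new` is inserted literally.
    return LOOKAROUND.sub(lambda m: new, line)


def replace_with_group(line: str, new: str) -> str:
    return CAPTURE.sub(lambda m: m.group(1) + new, line)


line = "onclick=\"location.href='https://www.mydomain.com/shop/bags?cid=Black'\""
print(replace_with_lookaround(line, "SOMETHING_ELSE"))
print(replace_with_group(line, "SOMETHING_ELSE"))
```

Both variants leave lines alone where ?cid= does not directly follow the shop URL, which is what the original .?cid=.*?' pattern could not guarantee.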

Regular expression problem with back slashes

I'm having trouble with what seems like a simple regex capture. I'm using AutoIt's stringRegExp() function.
The source string is:
1 U:\some text here\more text over here\06-17-2011\Folder 2\161804\abc9831\xyz10007569.JPG
I'm trying to capture "abc9831" and "161804". The "abc" part can be "abc", "def", or "ghi", followed by a string of digits. The "161804" can be replaced with any string of text. Everything is case insensitive. I'm currently using the following regex pattern:
(?i)\\\\.+\\\\((abc\d+)|(def\d+)|(ghi\d+))
But it's only capturing the "abc9831" part. How do I pick up the text string preceding it?

When the regex below is used in AutoIt's StringRegExp() function (using the flag "1" to return an array of matches), it returns 161804\abc9831. Is this what you're wanting to return?
.*\\([^\\]+\\[a-z]{3}\d+)\\.*
Here's an example you can run yourself:
#include <Array.au3>
$string = 'U:\some text here\more text over here\06-17-2011\Folder 2\161804\abc9831\xyz10007569.JPG'
$capture = StringRegExp($string,'.*\\([^\\]+\\[a-z]{3}\d+)\\.*',1)
_ArrayDisplay($capture)

(?i)\\\\(.+\\\\(abc\d+)|(def\d+)|(ghi\d+))
should do the trick if you want it all in one string (with a \ in between).
If you want two separate captures:
(?i)\\\\(.+)\\\\((abc\d+)|(def\d+)|(ghi\d+))

Edit: New version...
The raw regex is \b(\d+)\\((?:abc|def|ghi)\d+). The escaped string is \\b(\\d+)\\\\((?:abc|def|ghi)\\d+)
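
For comparison, here is a sketch of the two-capture idea in Python rather than AutoIt. It is one possible reading of the question, assuming the folder to capture is the one directly before the abc/def/ghi folder and that Python's re handles the escaped backslashes the same way; the names FOLDERS, parent and tagged are made up.

```python
import re

# \  <any folder name>  \  <abc|def|ghi + digits>  \
FOLDERS = re.compile(r"\\([^\\]+)\\((?:abc|def|ghi)\d+)\\", re.IGNORECASE)

# The source path from the question, as a raw string so backslashes stay literal.
path = r"U:\some text here\more text over here\06-17-2011\Folder 2\161804\abc9831\xyz10007569.JPG"

match = FOLDERS.search(path)
# Group 1 is the preceding folder, group 2 the abc/def/ghi folder.
parent, tagged = match.groups() if match else (None, None)
print(parent, tagged)
```

Using [^\\]+ instead of .+ keeps the first capture from running across several folder names.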
