Regular expression using positive lookbehind not working in Alteryx

I am trying to match the 2nd word after "Vores ref.:" using a positive lookbehind. It works in online testers like https://regexr.com/, but my tool, Alteryx, doesn't allow quantifiers like + in a lookbehind.
"ABC This is an example Vores ref.: 23244-2234 LW782837673 Test 2324324"
(?<=Vores\sref.:\s\d+-\d+\s+)\w+ correctly matches LW782837673 on regexr.com, but not in Alteryx.
How can I rewrite this lookbehind expression, which relies on quantifiers, and still get what I want?
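For comparison only, Python's built-in re module has a similar restriction: a lookbehind must be fixed-width, so the asker's pattern is rejected when it is compiled. Python is used here purely as an illustration; this sketch says nothing about how Alteryx itself reports the problem.

```python
import re

# The asker's pattern: \d+ and \s+ inside the lookbehind make its width variable.
variable_width = r"(?<=Vores\sref.:\s\d+-\d+\s+)\w+"

try:
    re.compile(variable_width)
    compiled = True
except re.error as exc:
    # CPython reports that a look-behind requires a fixed-width pattern.
    compiled = False
    print(exc)
```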

You can use a replacement approach here using
.*?\bVores\s+ref\.:\s+\d+-\d+\s+(\w+).*
Replace with $1.
See the regex demo.
Details:
.*? - any 0+ chars other than line break chars, as few as possible
\bVores - whole word Vores
\s+ - one or more whitespaces
ref\.: - ref.: substring
\s+ - one or more whitespaces
\d+-\d+ - one or more digits, - and one or more digits
\s+ - one or more whitespaces
(\w+) - Capturing group 1: one or more word chars.
.* - any 0+ chars other than line break chars, as many as possible.
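A minimal sketch of the replacement approach above, using Python's re.sub as a stand-in for the tool's replace feature (an assumption for illustration). Note that Python writes the Group 1 reference as \1 rather than $1.

```python
import re

text = "ABC This is an example Vores ref.: 23244-2234 LW782837673 Test 2324324"

# The pattern consumes the whole line; only Group 1 is kept in the replacement.
pattern = r".*?\bVores\s+ref\.:\s+\d+-\d+\s+(\w+).*"
result = re.sub(pattern, r"\1", text)  # \1 plays the role of $1
print(result)  # LW782837673
```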

You can use a capture group instead.
Note that the dot must be escaped as \. to match it literally.
\bVores\sref\.:\s\d+-\d+\s+(\w+)
The pattern matches:
\bVores\sref\.:\s\d+-\d+\s+ Your lookbehind pattern, turned into an ordinary match
(\w+) Capture group 1, match 1+ word characters
Regex demo
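The capture-group idea can be sketched in Python (chosen only for illustration): the text formerly inside the lookbehind is matched normally, and the wanted word is read from Group 1.

```python
import re

text = "ABC This is an example Vores ref.: 23244-2234 LW782837673 Test 2324324"

# No lookbehind needed: the prefix is consumed, the target word is Group 1.
m = re.search(r"\bVores\sref\.:\s\d+-\d+\s+(\w+)", text)
word = m.group(1) if m else None
print(word)  # LW782837673
```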

Related

Negating duplicate words pattern

I am new to regex and have the following pattern that detects duplicate words separated with dashes
\b(\w+)-+\1\b
// matches: hey-hey
// not matches: hey-hei
What I really need is a negated version of this pattern.
I've tried a negative lookahead, but it doesn't work:
(?!\b(\w+)-+\1\b)

You can use
\b(\w+)-+(?!\1\b)\w+
See the regex demo. Details:
\b - a word boundary
(\w+) - Group 1: one or more word chars
-+ - one or more hyphens
(?!\1\b)\w+ - one or more word chars, provided the text at this position is not the Group 1 value followed by a word boundary (i.e. the second word is not identical to the first).
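To see the negated pattern in action, here is a small Python check (Python's re supports backreferences inside lookaheads). The sample strings are made up for illustration.

```python
import re

# First word in Group 1, then hyphens, then a word that must differ from Group 1.
pattern = re.compile(r"\b(\w+)-+(?!\1\b)\w+")

samples = ["hey-hey", "hey-hei", "hey--hey", "hey-heyo"]
results = {s: bool(pattern.search(s)) for s in samples}
print(results)
```

"hey-heyo" matches because \b in the lookahead stops "hey" from counting as a duplicate when it is only a prefix of the second word.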

Regex to capture everything after optional token

I have fields which contain data in the following possible formats (each line is a different possibility):
AAA - Something Here
AAA - Something Here - D
Something Here
Note that the first group of letters (AAA) can be of varying lengths.
What I am trying to capture is the "Something Here" or "Something Here - D" (if it exists) using PCRE, but I can't get the Regex to work properly for all three cases. I have tried:
- (.*) which works fine for cases 1 and 2 but obviously not 3;
(?<= - )(.*) which also works fine for cases 1 and 2;
(?! - )(.+)| - (.+) works for cases 2 and 3 but not 1.
I feel like I'm on the verge of it but I can't seem to crack it.
Thanks in advance for your help.
Edit: I realized that I was unclear in my requirements. If there is a trailing " - D" (the letter in the data is arbitrary but should only be a single character), that needs to be captured as well.

About the patterns that you tried:
- (.*) This pattern matches the first occurrence of - and then the rest of the line. It matches too much for the second example, as the .* also matches the second occurrence of -.
(?<= - )(.*) This pattern matches the same as the first one, but without the leading - , as it only asserts that - occurs directly to the left.
(?! - )(.+)| - (.+) This pattern uses a negative lookahead, which asserts that what is directly to the right is not - . As none of the examples start with - , the whole line is matched directly after the negative lookahead due to .+, and the second part after the alternation | is never reached.
If the first group of letters can be of varying length, you could make the match specific by matching either 1 or more uppercase characters [A-Z]+ or 1+ word characters \w+.
For a broader match, you could match 1 or more non-whitespace characters using \S+:
^(?:\S+\h-\h)?\K\S+(?:\h(?!-\h)\S+)*
Explanation
^ Start of string
(?:\S+\h-\h)? Optionally match the first group of non whitespace chars followed by - between horizontal whitespace chars
\K Clear the match buffer (Forget what is currently matched)
\S+ Match 1+ non whitespace characters
(?: Non capture group
\h(?!-\h) Match a horizontal whitespace char and assert what is directly to the right is not - followed by another horizontal whitespace char
\S+ Match 1+ non whitespace chars
)* Close the non-capture group and repeat it 0+ times to match more "words" separated by spaces
Regex demo
Edit
To match an optional hyphen and trailing single character, you could add an optional non capturing group (?:-\h\S\h*)?$ and assert the end of the string if the pattern should match the whole string:
^(?:\S+\h-\h)?\K\S+(?:\h(?!-\h)\S+)*\h*(?:-\h\S\h*)?$
Regex demo
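A rough Python translation of the final pattern, for illustration only. Python's re has no \K or \h, so Group 1 takes the place of \K and [ \t] approximates \h (PCRE's \h also covers other horizontal space characters).

```python
import re

pattern = re.compile(
    r"^(?:\S+[ \t]-[ \t])?"         # optional "AAA - " prefix
    r"(\S+(?:[ \t](?!-[ \t])\S+)*"  # words, stopping before a " - " separator
    r"[ \t]*(?:-[ \t]\S[ \t]*)?)$"  # optional trailing "- D"
)

lines = ["AAA - Something Here", "AAA - Something Here - D", "Something Here"]
captured = [pattern.search(line).group(1) for line in lines]
print(captured)
```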

You may use
^(?:.*? - )?\K.*?(?= - | *$)
^(?:.*?\h-\h)?\K.*?(?=\h-\h|\h*$)
See the regex demo
Details
^ - start of string
(?:.*? - )? - an optional non-capturing group matching any 0+ chars other than line break chars, as few as possible, up to and including the first " - " (space, hyphen, space)
\K - match reset operator
.*? - any 0+ chars other than line break chars as few as possible
(?= - | *$) - either " - " (space, hyphen, space) or 0+ spaces up to the end of the string must immediately follow on the right.
Note that \h matches any horizontal whitespace chars.
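A Python sketch of the first variant, with Group 1 standing in for \K (which Python's re does not support). As written, the pattern stops at the first " - " after the prefix, so the second sample line yields Something Here without the trailing - D.

```python
import re

# Mirrors ^(?:.*? - )?\K.*?(?= - | *$) with a capture group instead of \K.
pattern = re.compile(r"^(?:.*? - )?(.*?)(?= - | *$)")

lines = ["AAA - Something Here", "AAA - Something Here - D", "Something Here"]
captured = [pattern.search(line).group(1) for line in lines]
print(captured)
```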

^(?:[A-Z]+ - \K)?.*\S
demo
Since "Something Here" can be anything, there's no reason to describe the optional trailing letter separately in the pattern; nothing more complicated is needed.
With this pattern I assume that you are not interested in the trailing spaces, which is why it ends with \S. If you want to keep them, remove the \S and change the previous quantifier to +.
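The same idea in Python, for illustration: because Python's re lacks \K, the optional prefix is matched outside Group 1 and the rest of the line is captured.

```python
import re

# Equivalent of ^(?:[A-Z]+ - \K)?.*\S using a capture group.
pattern = re.compile(r"^(?:[A-Z]+ - )?(.*\S)")

lines = ["AAA - Something Here", "AAA - Something Here - D", "Something Here"]
captured = [pattern.search(line).group(1) for line in lines]
print(captured)
```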

Matching all non whitespace characters after a string in Regex

I'm trying to match all the non whitespace characters after a string in Regex. In this example, I want to match "b" without the whitespaces and the slashes around it:
a: /b/
I tried using (?<=a:)([^\s\/]+) but it doesn't work.

You still need to account for / before b, not just for whitespace.
You may use a \K based regex (if your regex flavor is PCRE/Onigmo/Boost):
a:\s*\/\K[^\s\/]+
See the regex demo.
Also, if you are using a regex engine that supports variable-width lookbehind patterns, you may use
(?<=a:\s*\/)[^\s\/]+
See this regex demo.
Else, you need to capture your substring with parentheses:
a:\s*\/([^\s\/]+)
See this regex demo.
Details
a: - the literal string a:
\s* - 0+ whitespaces
\/ - a / char
\K - a match reset operator
[^\s\/]+ - 1+ chars other than whitespace and /.
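Python's re supports neither \K nor variable-width lookbehind, so in Python the capture-group variant is the one that applies. A minimal sketch:

```python
import re

text = "a: /b/"

# Capture-group variant a:\s*\/([^\s\/]+): the value is read from Group 1.
m = re.search(r"a:\s*\/([^\s\/]+)", text)
value = m.group(1) if m else None
print(value)  # b
```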

Matching Word Regex

Hello, I want to match this word with a regex:
(Parc Installé)
from this text:
31/1/2017 17:19:23,4245986,ct0001#Intotel.int,Parc Installé,100.100.30.100
I wrote this regex: ',[A-Za-zA-zÀ-ú+ \/\w+0-9._%+-]+,'
But the result is: 4245986 and Parc Installé.
How can I match only Parc Installé?

You may try a regex based on a lookahead that requires a comma followed by digits/dots up to the end of the string:
[^,]+(?=\s*,[\d.]+$)
See this regex demo
Details:
[^,]+ - 1 or more chars other than ,
(?=\s*,[\d.]+$) - a lookahead requiring
\s* - zero or more whitespaces
, - a comma
[\d.]+ - 1+ digits or dots up to...
$ - ... the end of string
To make it a bit more restrictive, you may replace the lookahead with (?=\s*,\d+(?:\.\d+){3}$) to require 4 sequences of dot-separated 1+ digits. See this regex demo.
If lookaheads are not supported (as is the case with the RE2 engine), you might want to use a capturing-group-based solution:
([^,]+)\s*,[\d.]+$
Here, the part within (...) will be captured into Group 1 and will be accessible via a backreference or via a function like =REGEXEXTRACT in Google Spreadsheets, which only retrieves the contents of a capturing group if one is present in the pattern.
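The three variants described above can be compared side by side in Python (used here only as an illustration engine; Python's re supports lookaheads, so all three run).

```python
import re

text = "31/1/2017 17:19:23,4245986,ct0001#Intotel.int,Parc Installé,100.100.30.100"

# Lookahead: the field must be followed by a final comma-separated run of digits/dots.
lookahead = re.search(r"[^,]+(?=\s*,[\d.]+$)", text)

# Stricter lookahead: the last field must be four dot-separated digit runs.
strict = re.search(r"[^,]+(?=\s*,\d+(?:\.\d+){3}$)", text)

# Capture-group version for engines without lookahead support.
captured = re.search(r"([^,]+)\s*,[\d.]+$", text)

print(lookahead.group(0), strict.group(0), captured.group(1))
```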

How to rearrange code using Regular Expressions in HaxeDevelop / FlashDevelop Find and Replace

I'm trying to turn cast(("Sparkles"), GetBitmapData); into GetBitmapData("Sparkles");
I've got this for my find code:
cast\(\(\"\.*\"\),\ .*\);
but this replace doesn't work:
$2\(\"$1\"\);
What do I need to do to make this work?

Your regex does not contain any capturing groups, yet you try to access them with numbered backreferences. Besides, you escaped the dot, so \.* just matches 0+ literal dots.
You may use the following regex replacement:
Find what: cast\(\("(.*?)"\),\s*(\w+)\);
Replace with: $2("$1");
Here is a .NET regex demo (the FlashDevelop search-and-replace feature uses the .NET regex flavor).
Pattern details:
cast\(\(" - a cast((" substring
(.*?) - Group 1 (referred to with $1) capturing any 0+ chars as few as possible up to the first...
"\), - a "), substring
\s* - 0+ whitespaces
(\w+) - Group 2 (referred to with $2) capturing 1+ word chars (letters/digits/_)
\); - a ); substring.
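The same find/replace can be tried outside the editor, for example with Python's re.sub (an illustration, not the FlashDevelop feature itself). Python's replacement template writes the group references as \1 and \2 instead of $1 and $2.

```python
import re

source = 'cast(("Sparkles"), GetBitmapData);'

# Group 1: the quoted string; Group 2: the function name.
result = re.sub(r'cast\(\("(.*?)"\),\s*(\w+)\);', r'\2("\1");', source)
print(result)  # GetBitmapData("Sparkles");
```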
