Regex match on Rubular but not in Ruby

I'm trying to return "A1RKKUPIHCS9HS" from the string cheese:
cheese = "#<struct Peddler::Marketplace id=\"A1RKKUPIHCS9HS\",..."
I tried both scan and match like this:
cheese.match(/(?<=id=\\").{14}/)
cheese.scan(/(?<=id=\\")./)
It works on Rubular, but when I try it in Ruby, it doesn't. No idea why.

Enter the following as your test string at Rubular:
#<struct Peddler::Marketplace id="A1RKKUPIHCS9HS",...
That is, do not put the string in double quotes or escape double quotes within the string. Rubular will take care of that, just as it surrounds your regex with two forward slashes.
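You want your regex to be /(?<=id=").{14}/. That is equivalent to /(?<=id=\").{14}/: the double quote need not be escaped, but escaping it leaves it unchanged and does no harm. Ruby treats double (and single) quotes within a regex as ordinary characters with no special meaning. Your pattern fails because \\" in a regex literal matches a backslash followed by a double quote, and cheese contains no backslash: the \" in the double-quoted Ruby string literal is just an escaped quote, so the string's actual characters are id="A1RKKUPIHCS9HS".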
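To see why the escaped backslash breaks the match, here is a minimal sketch in Python's re rather than Ruby (an assumption: for this fixed-width lookbehind and a literal backslash, Python and Ruby behave the same way). The name cheese is reused from the question.

```python
import re

# Same characters as the Ruby literal in the question: the string holds quotes, not backslashes.
cheese = '#<struct Peddler::Marketplace id="A1RKKUPIHCS9HS",...'

# \\" in the pattern means "a backslash, then a quote", which never occurs in cheese.
broken = re.search(r'(?<=id=\\").{14}', cheese)

# A plain quote in the lookbehind matches the quote that is actually there.
fixed = re.search(r'(?<=id=").{14}', cheese)

print(broken)         # None
print(fixed.group())  # A1RKKUPIHCS9HS
```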

Just out of curiosity, using String#[]:
cheese = "#<struct Peddler::Marketplace id=\"A1RKKUPIHCS9HS\",..."
cheese[/(?<=id=").*?(?=")/]
#⇒ "A1RKKUPIHCS9HS"
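Because the lazy .*? stops at the next quote, this version does not assume the ID is exactly 14 characters long. A rough Python equivalent, assuming re.search stands in for String#[] (the helper extract_id and the shorter second ID are hypothetical):

```python
import re

def extract_id(text):
    # Lookbehind for id=", then as few characters as possible before the next quote.
    m = re.search(r'(?<=id=").*?(?=")', text)
    return m.group() if m else None

long_id = extract_id('#<struct Peddler::Marketplace id="A1RKKUPIHCS9HS",...')
short_id = extract_id('#<struct Peddler::Marketplace id="ABC123",...')  # hypothetical input
print(long_id, short_id)  # A1RKKUPIHCS9HS ABC123
```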

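You could use .scan(/id="(.{14})"/) as a simpler alternative: it captures the ID in a group instead of using a lookbehind. Because the pattern contains a group, scan returns [["A1RKKUPIHCS9HS"]], one inner array of captures per match.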
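The capture-group approach can be sketched with Python's re.findall, used here as a stand-in for scan (an assumption for illustration). With a single group, findall returns a flat list of strings, whereas Ruby's scan returns an inner array per match.

```python
import re

cheese = '#<struct Peddler::Marketplace id="A1RKKUPIHCS9HS",...'

# id=" and the closing quote are consumed by the match; only the group's 14 characters are returned.
ids = re.findall(r'id="(.{14})"', cheese)
print(ids)  # ['A1RKKUPIHCS9HS']
```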

Related

lookaround in POSIX regex to match all spaces except the last (for gsub)

...freaking out because of this simple problem:
I'm using an Ingest pipeline with the gsub processor to replace all (white)spaces except the last.
E.g.:
"hello world regex is fubar " to result in "hello, world, regex, is, fubar"
How can I convert the PCRE syntax (which, as I found out, won't work with gsub's TRE patterns)
"/\s(?=.\S*)/g"
To POSIX, like...
"/[[:space:]](?=.[[:space:]]*)/g"
(only spaces exchanged, not the lookaround)
Edit: As I can only provide the regex as a string, I cannot use any processor other than gsub. '\s' and '\S' are apparently marked as "unknown".
It worked using " +([^ ])"; another solution would be " +(.)" (both without the double quotes), with the replacement string ,$1.
Thanks to Wiktor Stribiżew for pointing this out.
For whatever reason the POSIX class [:space:] does not work, which is why [[:space:]]+(.) did not work either, even though it is a correct regex.
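What the working pattern does can be sketched with Python's re.sub (an assumption: Python writes the backreference as \1 where the gsub replacement uses $1). Note that ,$1 inserts a comma with no space after it, and the trailing space is left alone because " +([^ ])" needs a non-space character after the run of spaces.

```python
import re

text = "hello world regex is fubar "

# Each run of spaces followed by a non-space becomes a comma plus that character.
result = re.sub(r" +([^ ])", r",\1", text)
# The " +(.)" variant gives the same result on this input.
alternative = re.sub(r" +(.)", r",\1", text)

print(repr(result))  # 'hello,world,regex,is,fubar '
```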

Regex: Exact match string ending with specific character

I'm using Java. I have a comma-separated list of strings in this form:
aa,aab,aac
aab,aa,aac
aab,aac,aa
I want to use regex to remove aa and the trailing ',' if it is not the last string in the list. I need to end up with the following result in all 3 cases:
aab,aac
Currently I am using the following pattern:
"aa[,]?"
However it is returning:
b,c
If lookarounds are available, you can write:
,aa(?![^,])|(?<![^,])aa,
with an empty string as replacement.
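Java's regex engine supports both lookarounds used here. As a language-neutral check, here is a sketch with Python's re.sub, assuming its one-character lookbehind behaves like Java's:

```python
import re

PATTERN = r",aa(?![^,])|(?<![^,])aa,"
inputs = ["aa,aab,aac", "aab,aa,aac", "aab,aac,aa"]

# (?![^,]): not followed by a non-comma, i.e. followed by a comma or the end.
# (?<![^,]): not preceded by a non-comma, i.e. preceded by a comma or the start.
results = [re.sub(PATTERN, "", s) for s in inputs]
print(results)  # ['aab,aac', 'aab,aac', 'aab,aac']
```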
Otherwise, with POSIX ERE syntax, you can do it with capture groups:
^(aa(,|$))+|(,aa)+(,|$)
with the 4th group as replacement (so $4 or \4)
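A Python sketch of the capture-based version (illustration only). Python writes the backreference as \4 and, since Python 3.5, substitutes an empty string for a group that did not take part in the match, which is what happens when the first alternative removes a leading "aa,":

```python
import re

PATTERN = r"^(aa(,|$))+|(,aa)+(,|$)"
inputs = ["aa,aab,aac", "aab,aa,aac", "aab,aac,aa"]

# Group 4 is the comma (or empty end) after a middle or last "aa"; putting it back
# keeps exactly one separator. A leading "aa," leaves group 4 unset, so it becomes "".
results = [re.sub(PATTERN, r"\4", s) for s in inputs]
print(results)  # ['aab,aac', 'aab,aac', 'aab,aac']
```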
Without knowing your regex flavor, I propose this solution for the case that it supports \b.
I use Perl as the demo environment and replace with "_" to make the matches visible.
perl -pe "s/\baa,|,aa\b/_/"
\b is the "word boundary" anchor, i.e. it matches at any start or end of something that looks like a word. It lets the pattern handle line end, line start, blank and comma.
Using it, two alternatives suffice to cover all the cases in your sample input.
Output (interleaved with the input, once with lines ending in a newline and once with lines ending in a blank):
aa,aab,aac
_aab,aac
aab,aa,aac
aab_,aac
aab,aac,aa
aab,aac_
aa,aab,aac
_aab,aac
aab,aa,aac
aab_,aac
aab,aac,aa
aab,aac_
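Since the goal is deletion rather than the "_" marker, here is the same pattern with an empty replacement, sketched in Python's re, which treats \b as a word boundary much as Perl does. Note that re.sub replaces every occurrence, like s///g.

```python
import re

PATTERN = r"\baa,|,aa\b"
inputs = ["aa,aab,aac", "aab,aa,aac", "aab,aac,aa"]

# \b keeps ",aa" from matching the start of ",aab" or ",aac": there is no word
# boundary between two letters.
results = [re.sub(PATTERN, "", s) for s in inputs]
print(results)  # ['aab,aac', 'aab,aac', 'aab,aac']
```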
If \b is unknown to your regex engine, please state which one you are using, i.e. which tool (e.g. perl, awk, notepad++, sed, ...). In that case it might also be necessary to replace instead of delete, i.e. to fine-tune "," or "" as the replacement. To support that, please show the context of your regex, i.e. the replacement mechanism you are using. If you are deleting, please switch to replacing beforehand.
(I picked up a point from a comment by gisek that the capturing groups are not needed. I usually use () generously, including in other syntaxes; in my opinion, not having to think about or look up evaluation order saves time and reduces risk overall. But after testing, I use this terser, more elegant form.)
If your regex engine supports positive lookaheads and positive lookbehinds, this should work:
,aa(?=,)|(?<=,)aa,|(,|^)aa(,|$)
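Assuming an empty replacement (the answer does not state one), the alternation can be checked with a Python sketch:

```python
import re

PATTERN = r",aa(?=,)|(?<=,)aa,|(,|^)aa(,|$)"
inputs = ["aa,aab,aac", "aab,aa,aac", "aab,aac,aa"]

# The first two alternatives handle an "aa" with a comma on both sides; the third
# covers "aa" at the very start or end of the list.
results = [re.sub(PATTERN, "", s) for s in inputs]
print(results)  # ['aab,aac', 'aab,aac', 'aab,aac']
```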
You could probably use the following and replace it with nothing:
(aa,|,aa$)
Either aa, when it's at the beginning or in the middle of the string,
or ,aa$ when it's at the end of the string
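A Python sketch of this simpler pattern. It handles the three sample lines, but nothing anchors aa to the start of an item, so an item that merely ends in aa (the hypothetical "baa" below, not from the question) would also lose those letters:

```python
import re

PATTERN = r"(aa,|,aa$)"
inputs = ["aa,aab,aac", "aab,aa,aac", "aab,aac,aa"]
results = [re.sub(PATTERN, "", s) for s in inputs]
print(results)  # ['aab,aac', 'aab,aac', 'aab,aac']

# Hypothetical input: "aa," also matches inside "baa,".
edge = re.sub(PATTERN, "", "baa,aac")
print(edge)  # baac
```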
As you want to delete aa followed by a comma or the end of the line, this should do the trick: ,aa(?=,|$)|^aa,
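A Python sketch of this lookahead version with an empty replacement:

```python
import re

PATTERN = r",aa(?=,|$)|^aa,"
inputs = ["aa,aab,aac", "aab,aa,aac", "aab,aac,aa"]

# The lookahead checks for a following comma or the end without consuming it,
# so the separator after a middle "aa" is kept.
results = [re.sub(PATTERN, "", s) for s in inputs]
print(results)  # ['aab,aac', 'aab,aac', 'aab,aac']
```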

How to do regular Expression in AutoIt Script

In an AutoIt script I am unable to write a regular expression for the string below. The numbers in it always change.
Actual String = _WinWaitActivate("RX_IST2_AM [PID:942564 NPID:10991 SID:498702881] sbivvrwm060.dev.ib.tor.Test.com:30000","")
The PID, NPID and SID values change, and the rest is always constant.
What I have tried is:
_WinWaitActivate("RX_IST2_AM [PID:'([0-9]{1,6})' NPID:'([0-9]{1,5})' SID:'([0-9]{1,9})' sbivvrwm060.dev.ib.tor.Test.com:30000","")
Can someone please help me?
As stated in the documentation, you should write the prefix REGEXPTITLE: and surround everything with square brackets, and "escape" all special characters, including the dots (.) and spaces ( ), with a backslash (\). Instead of [0-9] you can use \d, for example "[REGEXPTITLE:RX_IST2_AM\ \[PID:(\d{1,6})\ NPID:(\d{1,5})\ SID:(\d{1,9})\] sbivvrwm060\.dev\.ib\.tor\.Test\.com:30000]" as your parameter for the Win...(...) functions.
You can even omit the round brackets ((...)) but keep their content if you don't need to capture the content for further processing, as you would with StringRegExp(...) or StringRegExpReplace(...). With the _WinWaitActivate(...) function, capturing makes no sense anyway, because it only matches and does not replace or return anything from your regular expression.
According to regex101, both versions work, with and without the round brackets. You should always use a tool like this site to confirm that your expression actually works for your input string.
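As a sketch of the same check outside regex101, here is the part after REGEXPTITLE: run through Python's re (an assumption: AutoIt's engine is PCRE-based, and for this ASCII title its handling of \d and these escapes should agree with Python's):

```python
import re

title = "RX_IST2_AM [PID:942564 NPID:10991 SID:498702881] sbivvrwm060.dev.ib.tor.Test.com:30000"

# Escapes kept as in the answer; "\ " is an escaped space, "\[" and "\]" are literal brackets.
PATTERN = r"RX_IST2_AM\ \[PID:(\d{1,6})\ NPID:(\d{1,5})\ SID:(\d{1,9})\] sbivvrwm060\.dev\.ib\.tor\.Test\.com:30000"

m = re.search(PATTERN, title)
print(m.groups())  # ('942564', '10991', '498702881')
```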
I'm not familiar with AutoIt, but remember that the whole pattern has to match before any group captures anything. For example, (goat)s will NOT capture the word goat if your string is goat or goater, because neither contains goats.
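The goat example can be checked with a few lines of Python's re (used only as an illustration; the same rule applies to PCRE-style engines generally):

```python
import re

results = {}
for text in ["goat", "goater", "goats"]:
    m = re.search(r"(goat)s", text)
    # Group 1 exists only when the whole pattern, including the "s", has matched.
    results[text] = m.group(1) if m else None

print(results)  # {'goat': None, 'goater': None, 'goats': 'goat'}
```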
You have forgotten to add a ] in your regex, so your pattern doesn't match the string and the capture groups are not extracted. Also, I'm not sold on the use of ': single quotes in a regex are matched literally, and your title contains none. Based on this page, you can do something like StringRegExp(yourstring, 'RX_IST2_AM \[PID:([0-9]{1,6}) NPID:([0-9]{1,5}) SID:([0-9]{1,9})\]', $STR_REGEXPARRAYGLOBALMATCH), with the square brackets escaped so that they are matched literally instead of opening a character class; the returned array then holds the three numbers.
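Why the square brackets need escaping can be shown with a Python sketch (an assumption: Python's re, like PCRE, reads an unescaped [ as the start of a character class, so the unescaped pattern should be rejected by AutoIt's engine as well):

```python
import re
import warnings

title = "RX_IST2_AM [PID:942564 NPID:10991 SID:498702881] sbivvrwm060.dev.ib.tor.Test.com:30000"

unescaped = r"RX_IST2_AM [PID:([0-9]{1,6}) NPID:([0-9]{1,5}) SID:([0-9]{1,9})]"
escaped = r"RX_IST2_AM \[PID:([0-9]{1,6}) NPID:([0-9]{1,5}) SID:([0-9]{1,9})\]"

# Unescaped, "[PID:([0-9]" is read as one character class, which leaves a ")" unbalanced.
try:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # silence any "nested set" FutureWarning
        re.compile(unescaped)
    compiles = True
except re.error:
    compiles = False

# With the brackets escaped, findall returns one tuple holding the three captures.
captures = re.findall(escaped, title)
print(compiles, captures)  # False [('942564', '10991', '498702881')]
```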

RegEx for quoted string with missing open parenthesis

What regex finds a quoted string that has only a closing parenthesis at the end, like this:
"People)"
But not
"(People)"
Something like "[^(]+?\)" should fit the bill. You might also need to escape the quotation marks and the backslash, depending on the language or tool you embed the pattern in.
Some details on how this regex works are available here.
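A quick Python check of the pattern (re.search is used, so the quoted string may sit inside longer text; that usage is an assumption):

```python
import re

PATTERN = r'"[^(]+?\)"'

results = {}
for text in ['"People)"', '"(People)"']:
    # [^(]+? refuses any "(" between the opening quote and the ")".
    results[text] = bool(re.search(PATTERN, text))

print(results)  # {'"People)"': True, '"(People)"': False}
```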
Can you try the following?
String REGEX_TEST_STRING="\"People)\"";
System.out.println(REGEX_TEST_STRING.matches("\"P.*\\)\""));
This code prints true for "People)" and false for "(People)". Note that the pattern only accepts strings that start with P.
HTH.
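Java's String.matches requires the whole string to match, so the closest Python analogue is re.fullmatch. A sketch of the same check:

```python
import re

# The Java literal "\"P.*\\)\"" is the regex "P.*\)" wrapped in literal quotes.
PATTERN = r'"P.*\)"'

plain = bool(re.fullmatch(PATTERN, '"People)"'))
with_open = bool(re.fullmatch(PATTERN, '"(People)"'))
print(plain, with_open)  # True False
```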

regular expression to split up searchphrase

I was hoping someone could help me write a regex for C++ that matches words in a search phrase, and explain it bit by bit for learning purposes.
What I need is a regex that matches strings within double quotes, like "Hello you all", and single words that start or end with *, like *ack or overfl*.
For the quote part I have \"[\^\\s][\^\"]*\" but I can't figure out the wildcard (*) part, and how I should combine it with the quote regex.
Try this regular expression:
(?:\*?\w+\*?|"(?:[^\x5C"]+|\x5C(?:\x5C\x5C)*")*")+
For readability I replaced the backslash characters with \x5C.
The expression "(?:[^\x5C"]+|\x5C(?:\x5C\x5C)*")*" also matches "foo \"bar\"" and other properly escaped quote sequences (but only " may be escaped).
So foo* bar *baz *quux* "foo \"bar\"" is split into:
foo*
bar
*baz
*quux*
"foo \"bar\""
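A sketch of that split using Python's re.findall, which, like iterating with std::sregex_iterator in C++, returns every non-overlapping match. All groups in the pattern are non-capturing, so findall returns whole matches. Python is used only for illustration:

```python
import re

PATTERN = r'(?:\*?\w+\*?|"(?:[^\x5C"]+|\x5C(?:\x5C\x5C)*")*")+'

tokens = ["foo*", "bar", "*baz", "*quux*", r'"foo \"bar\""']
text = " ".join(tokens)

# Each space-separated token comes back as one match.
matches = re.findall(PATTERN, text)
print(matches)
```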
If you don’t want to match bar in the example above, use this:
(?:\*\w+|\w+\*|"(?:[^\x5C"]+|\x5C(?:\x5C\x5C)*")*")+
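Running the stricter pattern on the same input, sketched in Python's re, drops bar as intended. Note, though, that under Python's re this returns *quux rather than *quux*: the first alternative \*\w+ matches *quux, and nothing in the pattern forces the trailing star to be included.

```python
import re

PATTERN = r'(?:\*\w+|\w+\*|"(?:[^\x5C"]+|\x5C(?:\x5C\x5C)*")*")+'
text = r'foo* bar *baz *quux* "foo \"bar\""'

matches = re.findall(PATTERN, text)
print(matches)  # ['foo*', '*baz', '*quux', '"foo \\"bar\\""']
```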
As long as there is no quote nesting (nesting in general is something regex is bad at):
"(?:(?<=\\)"|[^"])*"|\*[^\s]+|[^\s]+\*
This regex also allows escaped double quotes (\"), if you need them, and the match includes the enclosing double quotes.
This regex matches:
"A string in quotes, possibly containing \"escaped quotes\""
*a_search_word_beginning_with_a_star
a_search_word_ending_with_a_star*
*a_search_word_enclosed_in_stars*
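The four sample cases can be checked in one go with Python's re (illustration only). One caveat for the C++ use case: the (?<=\\) lookbehind works in Python's re, but as far as I know the ECMAScript grammar of std::regex does not support lookbehind, so the pattern may need adapting there.

```python
import re

PATTERN = r'"(?:(?<=\\)"|[^"])*"|\*[^\s]+|[^\s]+\*'

tokens = [
    r'"A string in quotes, possibly containing \"escaped quotes\""',
    "*a_search_word_beginning_with_a_star",
    "a_search_word_ending_with_a_star*",
    "*a_search_word_enclosed_in_stars*",
]
text = " ".join(tokens)

# No capturing groups, so findall yields each full match in order.
matches = re.findall(PATTERN, text)
print(matches)
```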
Be aware that it will break at strings like this:
A broken \"string "with the quotes all \"mangled up\""
If you expect (read: can't entirely rule out the possibility) to get these, please don't use regex, but write a small quote-aware parser. For a one-shot search and replace activity or input in a guaranteed format, the regex is okay to use.
For validating/parsing user input, it is not okay to use. That's where I would recommend a parser. Knowing the difference is the key.