regex grab a word - regex

I'm trying to grab a value from the source with a regex, but only the name of entries with this type.
"name":"HELP-PERP","posOnly":false,"price":40.3,"priceIncrement":0.01,"quote":null,"quoteV":73851918.483,"restricted":false,"sizeIncrement":0.01,"type":"future",
So far I have \b(\w*-PERP\w*)\b
This grabs the word HELP-PERP but returns duplicates, so I'm trying to grab only the occurrence that matches type = future.
In other words: grab HELP-PERP when it is on the same line as type":"future".
I'm a total noob at this; I've tried several things on regex101 and can't come up with anything :(
Thank you

You can use
/\w*-PERP\w*\b(?=.*type":"future")/g
See the regex demo.
Details
\w*-PERP\w* - zero or more word chars, -PERP, and again zero or more word chars
\b - a word boundary
(?=.*type":"future") - a positive lookahead that matches a location in string that is immediately followed with any zero or more chars other than line break chars as many as possible (.*) and then a type":"future" string.
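To see the lookahead in action outside regex101, here is a minimal Python sketch. The first sample line is a shortened copy of the one in the question; the second line (ABC-PERP with type spot) is made up for contrast and is not from the original data.

```python
import re

# First line is the sample from the question (shortened); the second is a
# hypothetical non-future entry added for contrast.
lines = [
    '"name":"HELP-PERP","posOnly":false,"price":40.3,"restricted":false,"type":"future",',
    '"name":"ABC-PERP","posOnly":false,"price":1.0,"restricted":false,"type":"spot",',
]

# The lookahead only checks that type":"future" follows on the same line;
# it consumes nothing, so the match itself is just the -PERP word.
pattern = re.compile(r'\w*-PERP\w*\b(?=.*type":"future")')

names = [m.group(0) for line in lines for m in pattern.finditer(line)]
print(names)  # ['HELP-PERP']
```

Because . does not match line breaks by default, the check stays within the line that contains the name.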

Related

How to exclude a specific string with REGEX? (Perl)

For example, I have these strings
APPLEJUCE1A
APPLETREE2B
APPLECAKE3C
APPLETEA1B
APPLEWINE3B
APPLEWINE1C
I want all of these strings except those that have TEA or WINE1C in them.
APPLEJUCE1A
APPLETREE2B
APPLECAKE3C
APPLEWINE3B
I've already tried the following, but it didn't work:
^APPLE(?!.*(?:TEA|WINE1C)).*$
Any help is appreciated as I'm also kinda new to this.
If you indeed have multiple strings as you claim, there's no need to jam all that into one regex pattern.
/^APPLE/ && !/TEA|WINE1C/
If you have a single string, the best approach is probably to split it into lines (split /\n/), but you could also use a single regex match
/^APPLE(?!.*(?:TEA|WINE1C)).*/mg
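For comparison, both approaches can be sketched in Python (an illustration only; the thread itself is about Perl). The variable names are mine, and the single-pattern version groups the alternation as (?:TEA|WINE1C) so that .* applies to both branches.

```python
import re

text = "\n".join([
    "APPLEJUCE1A", "APPLETREE2B", "APPLECAKE3C",
    "APPLETEA1B", "APPLEWINE3B", "APPLEWINE1C",
])

# Two simple tests per line: starts with APPLE, and contains neither TEA nor WINE1C.
kept = [
    line for line in text.split("\n")
    if re.match(r"APPLE", line) and not re.search(r"TEA|WINE1C", line)
]

# Single multiline pattern over the whole string; re.M lets ^ match at each line start.
single = re.findall(r"^APPLE(?!.*(?:TEA|WINE1C)).*", text, flags=re.M)

print(kept)            # ['APPLEJUCE1A', 'APPLETREE2B', 'APPLECAKE3C', 'APPLEWINE3B']
print(single == kept)  # True
```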
You can use
^APPLE(?!.*TEA)(?!.*WINE1C).*
See the regex demo.
Details:
^ - start of string
APPLE - a fixed string
(?!.*TEA) - no TEA allowed anywhere to the right of the current location
(?!.*WINE1C) - no WINE1C allowed anywhere to the right of the current location
.* - any zero or more chars other than line break chars as many as possible.
If you only want to reject a string that contains both of them (a case that is not in the current example data):
^APPLE(?!.*(WINE1C|TEA).*(?!\1)(?:TEA|WINE1C)).*
Explanation
^ Start of string
APPLE match literally
(?! Negative lookahead
.*(WINE1C|TEA) Capture either one of the values in group 1
.* Match 0+ characters
(?!\1)(?:TEA|WINE1C) Match either one of the values as long as it is not the same as previously matched in group 1
) Close the lookahead
.* Match the rest of the line
Regex demo
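A quick way to check the "both values" variant is to run it over a few strings. This is a Python sketch; the last two sample strings are invented here, since the original data has no line containing both TEA and WINE1C.

```python
import re

# Rejects a string only when TEA and WINE1C both occur after APPLE, in either order.
both = re.compile(r"^APPLE(?!.*(WINE1C|TEA).*(?!\1)(?:TEA|WINE1C)).*")

samples = [
    "APPLEJUCE1A",     # neither value
    "APPLETEA1B",      # only TEA
    "APPLEWINE1C",     # only WINE1C
    "APPLETEAWINE1C",  # hypothetical: both, TEA first
    "APPLEWINE1CTEA",  # hypothetical: both, WINE1C first
]

accepted = [s for s in samples if both.match(s)]
print(accepted)  # ['APPLEJUCE1A', 'APPLETEA1B', 'APPLEWINE1C']
```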

Regex for replacing anything other than characters, more than one spaces and number only in end with empty char

I want to keep only letters, spaces, and a number at the end of the string, and replace everything else with an empty string. In other words: any number or spaces that come at the start or in the middle of the string are replaced with an empty string.
Example
Input | Output
Ndd12 | Ndd12
12Ndd12 | Ndd12
Ndd 12 | Ndd 12
Nav G45up | Nav Gup
Attempted Code
regexp_replace(df1[col_name], "(^[A-Za-z]+[0-9 ])", "")
You may use:
\d+(?!\d*$)|[^\w\n]+(?!([A-Z]|$))
RegEx Demo
Explanation:
\d+(?!\d*$): Match 1+ digits that are not followed by 0+ digits and end of line
|: OR
[^\w\n]+(?!([A-Z]|$)): Match 1+ non-word characters that are not followed by an uppercase letter or the end of line
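As a rough check, the pattern can be tried with Python's re module on three of the rows from the question (a sketch only; the question itself targets regexp_replace on a DataFrame column). The Ndd 12 row is left out because the number of spaces in the original input is not recoverable from the example as given.

```python
import re

pattern = re.compile(r"\d+(?!\d*$)|[^\w\n]+(?!([A-Z]|$))")

# Input -> expected output, taken from the question's example rows.
rows = {
    "Ndd12": "Ndd12",
    "12Ndd12": "Ndd12",
    "Nav G45up": "Nav Gup",
}

results = {src: pattern.sub("", src) for src in rows}
for src, out in results.items():
    print(repr(src), "->", repr(out))
```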
if you use python, you can use regular expressions.
You can use the re module.
import re
new_string = re.sub(r"[^a-zA-Z0-9]","",s)
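Where ^ at the start of the brackets negates the character class, so the pattern matches every character that is not a letter or a digit.
Regular expressions are available in most other languages as well, so the same pattern should carry over.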
I came up with this regex to capture all characters that you want to remove from the string.
^\d+|(?<=\w)\d+(?![\d\s])|(?<=\s)\s+
Do
regexp_replace(df1[col_name], r"^\d+|(?<=\w)\d+(?![\d\s])|(?<=\s)\s+", "")
Regex Demo
Explanation:
^\d+ - captures all digits in a sequence from the start.
(?<=\w)\d+(?![\d\s]) - positive lookbehind for a word character, then a sequence of digits, then a negative lookahead that rejects a following digit or whitespace character. (Captures the digits in G45up.)
(?<=\s)\s+ - positive lookbehind for a whitespace character followed by one or more whitespace characters, capturing all additional spaces.
Note : This regex could be inefficient when matching large strings as it uses expensive look-arounds.
^\d+|(?<=\w)\d+(?![\d\s])|(?<=\s)\s+|(?<=\w)\W|\W(?=\w)|(?<!\w)\W|\W(?!\w)

Multiline PCRE, multiple conditions

Just starting out with regex and have hit a stumbling block. Hoping someone might be able to explain the workaround.
Trying to carry out a multi-line search. I wish to use "*" as the 'flag', so to speak: if a line contains an asterisk it should match. The digits at the start of the line should be output, so should the word "Match" in the linked example, excluding the asterisk itself.
I assume my use of "|" is dividing the regex into two conditions, when it actually needs to satisfy both to match.
https://regex101.com/r/Pu56bi/2
(?m)(^\d+)|(?<=\*).*$
Any help kindly appreciated.
You could use a positive lookahead, as in
^(?=.*?\*)(\d+).+?(Match)$
See your modified example on regex101.com.
If Match is always at the end of the string, you could match the digits at the start of the string, then match an * and Match at the end of the string.
Use a word boundary \b to prevent the run of digits from being part of a longer word.
^(\d+)\b.*\*.*\b(Match)$
Regex demo
If there can be text after the word Match, you can assert the * using a positive lookahead.
^(?=.*\*)(\d+)\b.*\b(Match)\b.*$
Regex demo
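The lookahead variant can be sketched in Python as follows. The three input lines are invented for illustration (the contents of the linked regex101 example are not reproduced here), and re.M plays the role of the (?m) flag from the question.

```python
import re

# Hypothetical input: only the first and third lines contain an asterisk.
text = "\n".join([
    "123 some text * Match",
    "456 some text Match",
    "789 * more text Match trailing words",
])

# (?=.*\*) asserts an asterisk somewhere on the line without consuming it;
# the two groups then capture the leading digits and the word Match.
pattern = re.compile(r"^(?=.*\*)(\d+)\b.*\b(Match)\b.*$", re.M)

pairs = pattern.findall(text)
print(pairs)  # [('123', 'Match'), ('789', 'Match')]
```

Both conditions (digits at the start, asterisk on the line) must hold for the same line, which is what the | in the original attempt could not express.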

Ignore one word with regex

I know there are several similar questions already asked. But can't fix this issue with regex.
I have a sentence like
Lorem Ipsum is http://stack.com text of the http://stack.com/wp-admin
printing and typesetting industry.
I want to catch the word "stack.com" but not stack.com/wp-admin
I have tried a few regexes but they are not working.
^(?!stack.com$).*
The ^(?!stack.com$).* regex matches any string (even an empty one) that is not equal to stack.com; note that the unescaped dot also matches any character.
To match stack.com but not inside stack.com/wp-admin, you need a negative lookahead:
/stack\.com(?!\/wp-admin)/
           ^^^^^^^^^^^^^^
Or better, with word boundaries to only match whole words:
/\bstack\.com\b(?!\/wp-admin)/
See the regex demo
Details:
\b - a leading word boundary
stack\.com - a literal string stack.com (a dot must be escaped)
\b - a trailing word boundary
(?!\/wp-admin) - a negative lookahead that fails the match if there is /wp-admin immediately to the right of the current location.
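Here is a small Python sketch of the same idea, using the sentence from the question. In Python the slash needs no escaping because patterns are not written between / delimiters; the variable names are only for illustration.

```python
import re

text = "Lorem Ipsum is http://stack.com text of the http://stack.com/wp-admin"

# Match stack.com as a whole word, unless /wp-admin follows immediately.
pattern = re.compile(r"\bstack\.com\b(?!/wp-admin)")

spans = [m.span() for m in pattern.finditer(text)]
print(len(spans))                  # 1
print(text[spans[0][1]:][:5])      # ' text'
```

Only the first occurrence is reported; the second one is skipped because the lookahead sees /wp-admin right after it.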

Regular expression in Vim to match group capture

I want to find the words which contain the same string repeated twice.
(e.g. wookokss(ok/ok), ccsssscc(ss/ss)).
I think the expression is \(\w*\)\0.
Another try is to find the words which consist of the same string repeated twice. My answer is \<\(\w*\)\0\>. (word beginning + grouping(word) + group capture + word ending)
But they don't work. Could anybody help me?
To find a string of at least two characters that is repeated twice in a row within a word, you can use
/\(\w\{2,}\)\1
To match a whole word which contains the aforementioned string, you can use
/\<\w\{-}\(\w\{2,}\)\1\w\{-}\>
A little bit of explanation
\1 - matches the same string that was matched by the first sub-expression in \( and \) (\0 refers to the whole matched pattern, e.g. in a :substitute replacement, not to a group)
\{n,} - matches at least n of the preceding atom, as many as possible
\{-} - matches 0 or more of the preceding atom, as few as possible
\w - the word character ([0-9A-Za-z_])
\< - the beginning of a word
\> - the end of a word
More in :help pattern
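For readers more used to PCRE-style syntax, the same back-reference idea can be sketched with Python's re module. This is a rough translation, not the Vim patterns themselves: \b stands in for \< and \>, *? for \{-}, and Python's \w is Unicode-aware by default, unlike Vim's [0-9A-Za-z_]. The test words other than wookokss and ccsssscc are made up.

```python
import re

# Part of a word: some string of 2+ word chars immediately repeated.
contains_repeat = re.compile(r"(\w{2,})\1")
# Whole word that contains such a repeat (lazy \w*? on both sides).
whole_word = re.compile(r"\b\w*?(\w{2,})\1\w*?\b")
# Word that consists of nothing but one string repeated twice.
consists_of_repeat = re.compile(r"(\w+)\1")

repeated = {w: contains_repeat.search(w).group(1) for w in ["wookokss", "ccsssscc"]}
words = [m.group(0) for m in whole_word.finditer("foo wookokss bar ccsssscc")]
exact = [w for w in ["okok", "wookokss", "abcabc"] if consists_of_repeat.fullmatch(w)]

print(repeated)  # {'wookokss': 'ok', 'ccsssscc': 'ss'}
print(words)     # ['wookokss', 'ccsssscc']
print(exact)     # ['okok', 'abcabc']
```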
1.) words which contain the same string repeated twice. (e.g. wookokss(ok/ok),
To find words containing two or more repeated word characters try
\(\w\{2,}\)\1
\1 matches what's captured in first group.
2.) find the words which consist of the same string repeated twice...
To capture one or more word characters (\w\+) followed by what was captured in the first group (\1),
\<\(\w\+\)\1\>
should be about it. Have a look at this tutorial.
For the first one use (.{2,})\1 example here: https://regex101.com/r/gK0mM2/2
That is assuming that you only look for duplicate strings that have more than 1 character.
and for the second one ^(.{2,})\1$ example here: https://regex101.com/r/lC2yT7/2
Edit: changed the second expression, it now also looks for strings with at least 2 characters