Include 'Somestring' but excluding those that contain 'Otherstring' [duplicate] - regex

This question already has answers here:
Regular expression for a string containing one word but not another
(5 answers)
Closed 3 years ago.
I was reading this SO post on excluding cases with a given string.
The selected answer there uses ^((?!hede).)*$ to exclude string 'hede'.
I have strings like the following:
apples_IOS_QA
apples_Android_QA
oranges
bananas
banannas_QA
apples_Android
apples_IOS
QA_apples_IOS // note sometimes 'QA' is at the beginning of the string
I'd like to return non QA versions of apples.
Tried (within a Presto SQL query):
and regexp_like(game_name, '^((?!QA).*$^apples.*)')
No results returned
Then tried:
and regexp_like(game_name, '^apples.*(!?QA)')
This runs and returns apples but gives me QA results only when in fact I wanted to exclude those results.
Then tried:
and regexp_like(game_name, '^apples.*[^(QA)]')
This returns apples results only but includes those with string 'QA' within them.
How can I regex filter to include 'apples' but exclude any cases that contain sub string 'QA'?

You may use this lookahead based regex:
^(?!.*QA).*apples.*$
(?!.*QA) is a negative lookahead assertion. It needs to be placed right after ^ so that the condition is applied to the entire string, since it has .* before QA.
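To sanity-check the pattern outside the database, here is a minimal Python sketch run against the sample names from the question. It only exercises the regex logic; it assumes the SQL engine's regex dialect treats lookaheads the same way Python's re module does.

```python
import re

# Sample names copied from the question.
names = [
    "apples_IOS_QA",
    "apples_Android_QA",
    "oranges",
    "bananas",
    "banannas_QA",
    "apples_Android",
    "apples_IOS",
    "QA_apples_IOS",
]

# (?!.*QA) rejects any string containing QA anywhere;
# .*apples.* then requires "apples" somewhere in the string.
pattern = re.compile(r"^(?!.*QA).*apples.*$")

matches = [name for name in names if pattern.search(name)]
print(matches)
```

Only the two apples entries without QA should remain, including the case where QA would have appeared at the start of the string.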

Related

How can I regex match a column in psql with or operators? [duplicate]

This question already has an answer here:
What regex matches any text that is entirely between square brackets?
(1 answer)
Closed 2 months ago.
I am trying to match a string containing a number, beginning with a [ or a space, and ending with a ] or a comma.
I have written the query as:
SELECT column FROM table WHERE column ~ '(\[| )1732529(\]|,)';
But the results contain rows where the entry includes other numbers as well such as: [1732604] or [1732561, 1738189].
I am expecting the return to include only rows where the entry matches the expression. For example: [1732529] or [1732529, 1728373]
Any advice on what I'm doing wrong with my regex is appreciated. This is my first time using it in psql.
Ended up getting a solution from ChatGPT to use the regex: r'^[[\s]1732529[],]' and it seems to work!

Combine two references into one or some serious regex work [duplicate]

This question already has an answer here:
Parse parameters and values of smarty-like string in PHP
(1 answer)
Closed 2 years ago.
Question, basically
If I have a regex ((key1)(value1)|(key2)(value2)), key1 is ref'd by $2 & key2 by $4. Is it possible to combine these into the same reference? (I'm guessing no)
Thus $7 might be key & $8 might be value, regardless of which capture group it originated in
Any regex masters who can solve the below? I've spent a couple hours on it and am kinda stuck.
I would like it to work across different regex engines with minimal modifications. Been testing with PCRE on regexr.com
What I'm doing
I'm trying to make a file format that is parsed into key/value pairs with a single regex.
There's just a few rules:
Keys are a string of characters at the start of a line, followed by a colon (:).
So far, I'm just using [a-z]+ for the keys, but that will be expanded to some more characters. I don't think that will functionally change the regex.
values can be multi-line
all white-space is trimmed from values
I don't think I've added this to the regex yet
Values end when another key begins
delimiters can be used to wrap values in the format key:DELIM: then the value, then :DELIM: on its own line.
Delimiter can be an empty string, thus :: serves as a delimiter
The regex I have
Correctly matches non-delimited keys & values
([a-z]+):((?:(?:.|\n|\r)(?!^[a-z]+:))+)
Correctly matches delimited keys & values
([a-z]+):([A-Z]*:)((.|\r|\n)*)^:\2
Matches everything correctly, BUT requires two sets of references
(?:(?:([a-z]+):([A-Z]*:)((.|\r|\n)*)^:\2)|([a-z]+):((?:(?:.|\n|\r)(?!^[a-z]+:))+))
$1 & $5 are keys. $3 & $6 are values
Sample Input
key: value 1
nightmare:DELIM:
notakey:
obviously not a key
notakey:
:DELIM:
abc: value 2
new line
anotherkey:: value
nostring: on this one
::
Which would yield These key/value pairs
key
value 1
nightmare
notakey:
obviously not a key
notakey:
abc
value 2
new line
anotherkey
value
nostring: on this one
My latest attempt
My latest attempt got me here, but it doesn't actually match anything:
^([a-z]+): # key CP#1
((?:[A-Z]*:)? # delimiter, optional
(?:\s*(\r?\n|$)) # whitespace, new line OR end of file (line?)
) # CP#2
( # value, CP#3
(?:(?:
(?:.|\n|\r) # characters we want
(?!^[a-z]+:) # But NOT if those characters make up a key
)+)
| # or
((.|\r|\n)*) # characters we want
^:\2 # Ends with delimiter
) # delimited value
Thanks to the commenter for the ?| operator, which turns out to be what I needed.
((key1)(value1)|(key2)(value2)) => (?|(key1)(value1)|(key2)(value2)).
(?|(?:([a-z]+):([A-Z]*:)((.|\r|\n)*)^:\2)|([a-z]+):()((?:(?:.|\n|\r)(?!^[a-z]+:))+)) basically does it, though the final product certainly still needs more work.
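Not every engine supports the (?|...) branch reset group; Python's built-in re module, for one, rejects it. For such engines, a possible workaround (a different technique from the one above) is to keep separate groups per alternative and merge them after matching. The sketch below uses a deliberately simplified, hypothetical pattern rather than the file-format regex from the question.

```python
import re

# Hypothetical simplified pattern with two alternatives:
#   key=digits  -> groups 1 and 2
#   key:word    -> groups 3 and 4
PAIR = re.compile(r"(?:([a-z]+)=(\d+)|([a-z]+):(\w+))")

def pairs(text):
    """Return (key, value) tuples regardless of which alternative matched."""
    result = []
    for m in PAIR.finditer(text):
        # Only one alternative takes part in a match; the other groups are None.
        if m.group(1) is not None:
            result.append((m.group(1), m.group(2)))
        else:
            result.append((m.group(3), m.group(4)))
    return result

print(pairs("a=1 b:two"))
```

This costs a few lines of post-processing per match but keeps the pattern portable across engines.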

python regular expression splitting a string by all combinations of string starting with vowel [duplicate]

This question already has answers here:
How to get all overlapping matches in python regex that may start at the same location in a string?
(2 answers)
Closed 2 years ago.
problem
Split a string into all substrings that start with a vowel. For example, a string like
BANANA needs to be split into ANANA, ANAN, ANA, AN, A, ANA, AN, A, A
What I tried
import re
data_k=re.findall(r'(?=([AEIOU].*))','BANANA')
data_2=[s[:i] for s in data_k for i in range(1,len(s)+1)]
data_2
Is there a faster method to do this? For large strings this gives me a memory error, especially the second operation, where I split each value in the list.
Here is a solution without regular expressions. It only gives the number of substrings starting with a vowel, because trying to build all combinations ends in a memory error for large strings.
Screenshot of the solution: https://i.stack.imgur.com/aiA3j.jpg
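A counting approach along those lines can be sketched as follows. This is not the code from the screenshot, just one way to do it: a vowel at index i starts exactly len(s) - i substrings, so the count needs no intermediate list. The function names and the uppercase-only vowel set are assumptions; a generator is included for cases where the substrings themselves are needed one at a time.

```python
VOWELS = frozenset("AEIOU")  # assumption: uppercase input, as in BANANA

def count_vowel_substrings(s):
    """Count substrings starting with a vowel without building them."""
    n = len(s)
    return sum(n - i for i, ch in enumerate(s) if ch in VOWELS)

def iter_vowel_substrings(s):
    """Lazily yield each substring that starts with a vowel."""
    n = len(s)
    for i, ch in enumerate(s):
        if ch in VOWELS:
            for j in range(i + 1, n + 1):
                yield s[i:j]

print(count_vowel_substrings("BANANA"))
print(list(iter_vowel_substrings("BANANA")))
```

The count runs in linear time and constant extra memory. The generator still produces a quadratic number of substrings overall, but it never holds more than one of them at once.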

Pattern matching using regex in grep at end of line [duplicate]

This question already has answers here:
RegEx to match full string
(4 answers)
Closed 3 years ago.
I have a file, let's say abc.txt, which contains below kind of data:
AB8PDSYU_DFRH
AB8PDSPO_RET
AB8PDSYT_DPRO
AB0PDSTR_GHRJT
AB0PDSQW_GTJY
My expected output is just to be in format A{either B0 or B8}PDS{exactly 2 char}_{exactly 4 char}, as per this rule, my output should be only:
AB8PDSYU_DFRH
AB8PDSYT_DPRO
AB0PDSQW_GTJY
I am using the below grep command:
grep -E '^A(B0|B8)PDS[[:alpha:]]{2}_[[:alpha:]]{4}' abc.txt
and getting output:
AB8PDSYU_DFRH
AB8PDSYT_DPRO
AB0PDSTR_GHRJT
AB0PDSQW_GTJY
I have specified [[:alpha:]]{4}, which ideally should match exactly 4 alphabetic characters only. But it is not working like that, and AB0PDSTR_GHRJT shows up in the output as well.
Please let me know what I am missing here.
You need a way to say that nothing more may follow your match; otherwise the pattern matches just a part of the line. Use $ to mark the end of the string, or [[:space:]] (equivalent to \s) for any whitespace.
I'm no expert in grep; depending on whether it treats the input as multiline or not, one of these should work:
^A(B0|B8)PDS[[:alpha:]]{2}_[[:alpha:]]{4}$
^A(B0|B8)PDS[[:alpha:]]{2}_[[:alpha:]]{4}($|[[:space:]])
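The effect of the missing end anchor can be reproduced with a small Python sketch on the sample lines from the question. Python's re module has no POSIX [[:alpha:]] class, so [A-Za-z] stands in for it here as an approximation.

```python
import re

# Sample lines copied from the question.
lines = [
    "AB8PDSYU_DFRH",
    "AB8PDSPO_RET",
    "AB8PDSYT_DPRO",
    "AB0PDSTR_GHRJT",
    "AB0PDSQW_GTJY",
]

# Without $, the pattern may match just a prefix of the line.
unanchored = re.compile(r"^A(?:B0|B8)PDS[A-Za-z]{2}_[A-Za-z]{4}")
# With $, nothing may follow the four trailing letters.
anchored = re.compile(r"^A(?:B0|B8)PDS[A-Za-z]{2}_[A-Za-z]{4}$")

loose = [line for line in lines if unanchored.search(line)]
strict = [line for line in lines if anchored.search(line)]

print(loose)
print(strict)
```

The unanchored version keeps AB0PDSTR_GHRJT because its first four trailing letters satisfy {4}; the anchored version drops it.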

How to find a specific string followed by a number, with any number of characters between? [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 6 years ago.
I'm trying to write a regex for the following pattern:
[MyLiteralString][0 or more characters without restriction][at least 1 digit]
I thought this should do it:
(theColumnName)[\s\S]*[\d]+
As it looks for the literal string theColumnName, followed by any number of characters (whitespace or otherwise), and then at least one digit. But this matches more than I want, as you can see here:
https://www.regex101.com/r/HBsst1/1
(EDIT) Second set of more complex data - https://www.regex101.com/r/h7PCv7/1
Using the sample data in that link, I want the regex to identify the two occurrences of theColumnName] VARCHAR(10) and nothing more.
I have 300+ SQL scripts which contain create statements for every type of database object: procedures, tables, triggers, indexes, functions -- everything. Because of that, I can't be too strict with my regex.
A stored procedure's file might include text like LEFT(theColumnName, 10) which I want to identify.
A create table statement would be like theColumnName VARCHAR(12).
So it needs to be very flexible as the number(s) isn't always the same. Sometimes it's 10, sometimes it's 12, sometimes it's 51 -- all kinds of different numbers.
Basically, I'm looking for the regex equivalent of this C# code:
//Get file data
string[] lines = File.ReadAllLines(filePath);
//Let's assume the first line contains 'theColumnName'
int theColumnNameIndex = lines[0].IndexOf("theColumnName");
if (theColumnNameIndex >= 0)
{
    //Get the text following 'theColumnName'
    string temp = lines[0].Remove(0, theColumnNameIndex + "theColumnName".Length);
    //Iterate over our substring
    foreach (char c in temp)
    {
        if (Char.IsDigit(c))
        {
            //do a thing
        }
    }
}
(theColumnName).*?[\d]+
That'll make it stop capturing after the first number it sees.
The difference between * and *? is about greediness vs. laziness. .*\d for example would match abcd12ad4 in abcd12ad4, whereas .*?\d would have its first match as abcd1. Check out this page for more info.
Btw, if you don't want to match newlines, use a . (period) instead of [\s\S].
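The greedy versus lazy difference can be checked with a short Python sketch. The first two searches use the abcd12ad4 example from above; the SQL-like line is a made-up sample, not the data behind the regex101 links.

```python
import re

text = "abcd12ad4"
greedy = re.search(r".*\d", text).group()   # .* grabs as much as it can
lazy = re.search(r".*?\d", text).group()    # .*? stops at the first digit

# Hypothetical line resembling a CREATE TABLE column definition.
sql = "[theColumnName] VARCHAR(10) NULL, other INT(5)"
greedy_sql = re.search(r"(theColumnName)[\s\S]*[\d]+", sql).group()
lazy_sql = re.search(r"(theColumnName).*?[\d]+", sql).group()

print(greedy, lazy)
print(greedy_sql)
print(lazy_sql)
```

The greedy pattern runs on to the last digit in the input, while the lazy one ends right after the first number following theColumnName.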