Pattern matching using regex in grep at end of line [duplicate] - regex

This question already has answers here:
RegEx to match full string
(4 answers)
Closed 3 years ago.
I have a file, let's say abc.txt, which contains below kind of data:
AB8PDSYU_DFRH
AB8PDSPO_RET
AB8PDSYT_DPRO
AB0PDSTR_GHRJT
AB0PDSQW_GTJY
My expected output should contain only lines in the format A{either B0 or B8}PDS{exactly 2 characters}_{exactly 4 characters}. As per this rule, my output should be only:
AB8PDSYU_DFRH
AB8PDSYT_DPRO
AB0PDSQW_GTJY
I am using the below grep command:
grep -E '^A(B0|B8)PDS[[:alpha:]]{2}_[[:alpha:]]{4}' abc.txt
and getting output:
AB8PDSYU_DFRH
AB8PDSYT_DPRO
AB0PDSTR_GHRJT
AB0PDSQW_GTJY
I have specified [[:alpha:]]{4}, which should match exactly 4 letters. But it is not working like this, and I am getting AB0PDSTR_GHRJT in the output as well.
Please let me know what I am missing here.

Your pattern only has to match part of the line, so you need a way to say that nothing more may follow the match: use $ to anchor at the end of the line, or [[:space:]] (equivalent to \s) to require whitespace.
I'm no expert in grep, but since it matches line by line, both of these should work; the second also accepts a match that is followed by whitespace:
^A(B0|B8)PDS[[:alpha:]]{2}_[[:alpha:]]{4}$
^A(B0|B8)PDS[[:alpha:]]{2}_[[:alpha:]]{4}($|[[:space:]])
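The effect of the missing end anchor can be reproduced outside grep. The following is a minimal sketch in Python's re module, not grep itself; it assumes [A-Za-z] as a stand-in for [[:alpha:]] (Python's re has no POSIX classes), and the function names are made up for illustration.

```python
import re

# Sample lines from the question.
LINES = [
    "AB8PDSYU_DFRH",
    "AB8PDSPO_RET",
    "AB8PDSYT_DPRO",
    "AB0PDSTR_GHRJT",
    "AB0PDSQW_GTJY",
]

# [A-Za-z] stands in for [[:alpha:]] (assumption: ASCII letters only).
PATTERN = re.compile(r"A(B0|B8)PDS[A-Za-z]{2}_[A-Za-z]{4}")


def prefix_matches(lines):
    """Like the original grep: anchored at the start only."""
    return [line for line in lines if PATTERN.match(line)]


def whole_line_matches(lines):
    """Like adding $: the pattern must consume the entire line."""
    return [line for line in lines if PATTERN.fullmatch(line)]


print(prefix_matches(LINES))      # still includes AB0PDSTR_GHRJT
print(whole_line_matches(LINES))  # AB0PDSTR_GHRJT is gone
```

Without the end anchor, the first four letters of GHRJT already satisfy {4}, and the trailing T is simply ignored.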

Related

How can I regex match a column in psql with or operators? [duplicate]

This question already has an answer here:
What regex matches any text that is entirely between square brackets?
(1 answer)
Closed 2 months ago.
I am trying to match a string containing a number, beginning with a [ or a space, and ending with a ] or a comma.
I have written the query as:
SELECT column FROM table WHERE column ~ '(\[| )1732529(\]|,)';
But the results contain rows where the entry includes other numbers as well such as: [1732604] or [1732561, 1738189].
I am expecting the return to include only rows where the entry matches the expression. For example: [1732529] or [1732529, 1728373]
Any advice on what I'm doing wrong with my regex is appreciated. This is my first time using it in psql.
Ended up getting a solution from ChatGPT to use the regex: r'^[[\s]1732529[],]' and it seems to work!
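As a rough check of the two patterns, here is a sketch using Python's re module rather than PostgreSQL's own regex engine (an assumption: the two flavors agree on these simple constructs; the brackets inside the character classes are escaped explicitly). The rows are taken from the question, plus one hypothetical row where the number is not the first element.

```python
import re

# Pattern from the question and the ^-anchored variant, rewritten with
# explicit escapes for Python's re (assumed equivalent for this purpose).
UNANCHORED = re.compile(r"(\[| )1732529(\]|,)")
ANCHORED = re.compile(r"^[\[\s]1732529[\],]")

ROWS = [
    "[1732604]",
    "[1732561, 1738189]",
    "[1732529]",
    "[1732529, 1728373]",
    "[1728373, 1732529]",  # hypothetical row: number is not first
]


def matching_rows(pattern, rows):
    """Return the rows in which the pattern is found anywhere."""
    return [row for row in rows if pattern.search(row)]


print(matching_rows(UNANCHORED, ROWS))
print(matching_rows(ANCHORED, ROWS))
```

In this sketch the unanchored pattern does not match [1732604] or [1732561, 1738189], while the ^-anchored one only matches when 1732529 is the first element, so it may miss rows where the number appears later in the list.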

Grep regex isn't fully matching invalid hexadecimal color codes

I have a text file containing valid and invalid hexadecimal color codes. I want to filter out the invalid codes and print just the valid ones. For a code to be valid it must start with a hash symbol, be 6 or 8 characters long after the hash, and contain only the characters a-f or 0-9. My grep command below is stored in a makefile, but that doesn't seem to be the reason my regex isn't working: it is getting rid of the codes that are shorter than 6 characters, but not the ones that are 7 or more than 8 characters long, or the codes that have letters outside a-f. For testing purposes, I am using -v to print the invalid codes, as I want to see which codes match and which don't.
grep -vE '^#([a-f0-9]{6})|([a-f0-9]{8})$$' colours.txt
Codes:
#b293a6
#ead58f
#31511bxf
#a69d36a2
#067806
#afe6e
#7f0bf7ef
#dd85
#042847421
#1a283af
Output wanted:
#b293a6
#ead58f
#a69d36a2
#067806
#7f0bf7ef
Output I'm currently getting:
#afe6e
#dd85
Updated output:
sean@sean-VirtualBox:~/Desktop$ head colours.txt | cat -A
#b293a6^M$
#ead58f^M$
#a69d36a2^M$
#067806^M$
#7f0bf7ef^M$
#f8b366^M$
#042847421^M$
#8946d7^M$
#c927d4^M$
#3e568bff^M$
Your alternation is grouped incorrectly. ^#([a-f0-9]{6})|([a-f0-9]{8})$$ means ^#[a-f0-9]{6} or [a-f0-9]{8}$, because | splits the whole pattern and each anchor applies to only one alternative.
You may use this grep, which (because of -v) prints the invalid codes:
grep -ivE '^#[a-f0-9]{6}([a-f0-9]{2})?$' file
#31511bxf
#afe6e
#dd85
#042847421
#1a283af
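The grouping problem, and the effect of the carriage returns (^M) visible in the cat -A output above, can be sketched in Python's re module. This is an illustration under assumptions, not the grep invocation itself: the function name is hypothetical, and stripping "\r" is one possible way to handle CRLF input.

```python
import re

# Mis-grouped pattern from the question: "^#" + 6 hex digits, OR 8 hex digits + "$".
BROKEN = re.compile(r"^#([a-f0-9]{6})|([a-f0-9]{8})$")

# Corrected pattern: 6 hex digits, optionally followed by 2 more.
VALID = re.compile(r"#[a-f0-9]{6}([a-f0-9]{2})?", re.IGNORECASE)


def valid_codes(lines):
    """Return the valid colour codes, tolerating CRLF line endings."""
    result = []
    for line in lines:
        code = line.rstrip("\r\n")  # drop the ^M that cat -A revealed
        if VALID.fullmatch(code):   # fullmatch acts like ^...$
            result.append(code)
    return result


CODES = ["#b293a6", "#ead58f", "#31511bxf", "#a69d36a2", "#067806",
         "#afe6e", "#7f0bf7ef", "#dd85", "#042847421", "#1a283af"]

# The broken pattern accepts a 7-digit code because its first
# alternative is not anchored at the end.
print(bool(BROKEN.search("#1a283af")))
print(valid_codes(code + "\r\n" for code in CODES))
```

If the carriage returns are left in place, a pattern ending in $ sees an extra character before the end of each line, so removing them first (or allowing an optional \r before $) is likely needed for the real file.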

Include 'Somestring' but excluding those that contain 'Otherstring' [duplicate]

This question already has answers here:
Regular expression for a string containing one word but not another
(5 answers)
Closed 3 years ago.
I was reading this SO post on excluding cases with a given string.
The selected answer there uses ^((?!hede).)*$ to exclude string 'hede'.
I have strings like the following:
apples_IOS_QA
apples_Android_QA
oranges
bananas
banannas_QA
apples_Android
apples_IOS
QA_apples_IOS // note sometimes 'QA' is at the beginning of the string
I'd like to return non QA versions of apples.
Tried (within a Presto SQL query):
and regexp_like(game_name, '^((?!QA).*$^apples.*)')
No results returned
Then tried:
and regexp_like(game_name, '^apples.*(!?QA)')
This runs and returns apples but gives me QA results only when in fact I wanted to exclude those results.
Then tried:
and regexp_like(game_name, '^apples.*[^(QA)]')
This returns apples results only but includes those with string 'QA' within them.
How can I write a regex filter that includes 'apples' but excludes any cases that contain the substring 'QA'?
You may use this lookahead based regex:
^(?!.*QA).*apples.*$
(?!.*QA) is a negative lookahead assertion. It needs to be placed right after ^ so that the condition applies to the entire string, since it has .* before QA.
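The lookahead pattern can be tried against the sample names from the question. This is a small sketch in Python's re module (an assumption: Presto's regexp_like treats this lookahead the same way); the function name is made up for illustration.

```python
import re

# Negative lookahead at the start: fail if "QA" occurs anywhere in the string.
PATTERN = re.compile(r"^(?!.*QA).*apples.*$")

NAMES = [
    "apples_IOS_QA",
    "apples_Android_QA",
    "oranges",
    "bananas",
    "banannas_QA",
    "apples_Android",
    "apples_IOS",
    "QA_apples_IOS",
]


def non_qa_apples(names):
    """Keep names that contain 'apples' and do not contain 'QA'."""
    return [name for name in names if PATTERN.search(name)]


print(non_qa_apples(NAMES))
```

Because the assertion is evaluated at position 0 and scans ahead with .*, it rejects QA at the beginning (QA_apples_IOS) as well as at the end.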

How to find a specific string followed by a number, with any number of characters between? [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 6 years ago.
I'm trying to write a regex for the following pattern:
[MyLiteralString][0 or more characters without restriction][at least 1 digit]
I thought this should do it:
(theColumnName)[\s\S]*[\d]+
As it looks for the literal string theColumnName, followed by any number of characters (whitespace or otherwise), and then at least one digit. But this matches more than I want, as you can see here:
https://www.regex101.com/r/HBsst1/1
(EDIT) Second set of more complex data - https://www.regex101.com/r/h7PCv7/1
Using the sample data in that link, I want the regex to identify the two occurrences of theColumnName] VARCHAR(10) and nothing more.
I have 300+ SQL scripts which contain create statements for every type of database object: procedures, tables, triggers, indexes, functions -- everything. Because of that, I can't be too strict with my regex.
A stored procedure's file might include text like LEFT(theColumnName, 10) which I want to identify.
A create table statement would be like theColumnName VARCHAR(12).
So it needs to be very flexible as the number(s) isn't always the same. Sometimes it's 10, sometimes it's 12, sometimes it's 51 -- all kinds of different numbers.
Basically, I'm looking for the regex equivalent of this C# code:
//Get file data
string[] lines = File.ReadAllLines(filePath);
//Let's assume the first line contains 'theColumnName'
int theColumnNameIndex = lines[0].IndexOf("theColumnName");
if (theColumnNameIndex >= 0)
{
    //Get the text following 'theColumnName'
    string temp = lines[0].Remove(0, theColumnNameIndex + "theColumnName".Length);
    //Iterate over our substring
    foreach (char c in temp)
    {
        if (Char.IsDigit(c))
        {
            //do a thing
        }
    }
}
(theColumnName).*?[\d]+
That'll make it stop capturing after the first number it sees.
The difference between * and *? is about greediness vs. laziness. .*\d for example would match abcd12ad4 in abcd12ad4, whereas .*?\d would have its first match as abcd1. Check out this page for more info.
Btw, if you don't want to match newlines, use a . (period) instead of [\s\S]
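The greedy versus lazy behaviour described above can be checked with a short sketch in Python's re module. The abcd12ad4 string is from the answer; SQL_LINE is a hypothetical sample line made up for illustration, not taken from the linked regex101 data.

```python
import re

SAMPLE = "abcd12ad4"

# Greedy: .* grabs as much as possible, then backs off to the last digit.
greedy = re.search(r".*\d", SAMPLE).group()
# Lazy: .*? grabs as little as possible, stopping at the first digit.
lazy = re.search(r".*?\d", SAMPLE).group()

# Hypothetical line with two column definitions (an assumption).
SQL_LINE = "[theColumnName] VARCHAR(10) NOT NULL, [other] VARCHAR(25)"

greedy_sql = re.search(r"theColumnName.*\d+", SQL_LINE).group()
lazy_sql = re.search(r"theColumnName.*?\d+", SQL_LINE).group()

print(greedy)      # abcd12ad4
print(lazy)        # abcd1
print(greedy_sql)  # runs on to the last digit on the line
print(lazy_sql)    # theColumnName] VARCHAR(10
```

Note that \d+ itself stays greedy in the lazy version, so it still picks up the whole number 10 rather than just the first digit.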

Replace dots with underscores in right part of the line [duplicate]

This question already has answers here:
Substitution of characters limited to part of each input line
(4 answers)
Closed 6 years ago.
Say I have this piece of text:
some.blah.key={{blah.woot.wiz}}
some.another.foo.key={{foo.bar.qix.name}}
+ many other lines with a variable number of words separated by dots within {{ and }}
I'd like the following outcome after replacing dots with underscores in the right part (between the {{ and }} delimiters):
some.blah.key={{blah_woot_wiz}}
some.another.foo.key={{foo_bar_qix_name}}
...
I'm looking for the appropriate regex to perform the replacement in a one-liner sed command.
I have a lead with this one: https://regex101.com/r/8wsLHo/1 but it captures all dots, including those in the left part, which I don't want.
I tried this variation to exclude those in the left part, but then it doesn't capture anything anymore: https://regex101.com/r/d7WAmX/1
You can use a loop:
sed ':a;s/\({{[^}]*\)\./\1_/;ta' file
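:a defines a label "a".
s/\({{[^}]*\)\./\1_/ replaces one dot that follows {{ (with no } in between) with an underscore.
ta jumps back to "a" whenever a substitution was made, so the loop repeats until no such dot is left.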
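The same loop idea can be sketched in Python: substitute one dot per pass and repeat until nothing changes. This is an illustration of the technique, not a replacement for the sed command, and the function name is hypothetical.

```python
import re

# Same shape as the sed pattern: "{{", then any non-"}" characters, then a dot.
PATTERN = re.compile(r"(\{\{[^}]*)\.")


def underscore_placeholders(line):
    """Replace dots inside {{...}} one at a time until none remain."""
    while True:
        # count=1 mirrors sed's single substitution per loop iteration.
        line, replaced = PATTERN.subn(r"\1_", line, count=1)
        if replaced == 0:  # like "ta" not branching: nothing was substituted
            return line


print(underscore_placeholders("some.blah.key={{blah.woot.wiz}}"))
print(underscore_placeholders("some.another.foo.key={{foo.bar.qix.name}}"))
```

The dots on the left of = are untouched because the pattern only starts matching at {{.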
I came up with this quite complex one-liner:
sed "h;s/=.*/=/;x;s/.*=//;s/\./_/g;H;x;s/\n//"
explanations:
h: put line in hold buffer
s/=.*/=/: clobber right part after =
x: swap to put line in main buffer again, first part in hold buffer
s/.*=//: clobber left part before =
s/\./_/g: perform replacement of dots now that there's only right part in main buffer
H: append main buffer to hold buffer
x: swap buffers again
s/\n//: remove the linefeed, otherwise both parts appear on separate lines
That was quite fun, but maybe sed is not the best tool for this operation; it rather belongs to code golf.
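For comparison, the split-at-= idea behind the hold-buffer one-liner is short in a general-purpose language. The following is a minimal sketch in Python, assuming exactly one = per line (lines without = are returned unchanged here, which is a choice of this sketch and not the behaviour of the sed command); the function name is hypothetical.

```python
def underscore_right_side(line):
    """Replace dots with underscores only to the right of the first '='."""
    # partition returns (left, "=", right), or (line, "", "") if there is no "=".
    left, separator, right = line.partition("=")
    return left + separator + right.replace(".", "_")


print(underscore_right_side("some.blah.key={{blah.woot.wiz}}"))
print(underscore_right_side("some.another.foo.key={{foo.bar.qix.name}}"))
```

Unlike the loop-based answer, this replaces every dot after =, not only those between {{ and }}, which matches what the one-liner above does.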