Grep exact matching string (path) using wildcards (regex)

I have a text file with a list of paths, and I'd like to select only the path to one specific yml file inside every directory with a specific prefix (foo).
When I run grep -Eo "^config/foo.*/db.yml", it also selects db.yml files in subdirectories, which is not what I'm expecting :(
Actual output:
config/footestto/db.yml
config/foodummy/db.yml
config/footestto/prod/db.yml
Expected output:
config/footestto/db.yml
config/foodummy/db.yml
Could you please help me? There could be something wrong with my regex. Thanks.

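$ egrep -w '^config\/foo[a-z]+\/db\.yml' test.txt
Your regex is incorrect because .* also matches /, so it can span several directory levels. In the pattern above:
\/ matches the character /
[a-z]+ matches one or more lowercase letters
until it finds /db.yml (\. matches a literal dot)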
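As a hedged variation on the answer above, the sketch below swaps [a-z]+ for [^/]+ (my substitution, not the answerer's), on the assumption that directory names may also contain digits or other characters. The sample input is the three paths from the question.

```shell
# Sample input taken from the question (three candidate paths).
paths='config/footestto/db.yml
config/foodummy/db.yml
config/footestto/prod/db.yml'

# [^/]+ cannot cross a directory separator, so only db.yml files that sit
# directly inside a config/foo* directory match; \. matches a literal dot
# and $ anchors the match at the end of the line.
matches=$(printf '%s\n' "$paths" | grep -E '^config/foo[^/]+/db\.yml$')
printf '%s\n' "$matches"
```

The path under prod/ is filtered out because [^/]+ would have to match across the extra slash.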

Related

Remove end of file after matching regex keeping the expression matched in multiple files (sed?)

I'm cleaning up a lot of markdown files to import them into Pelican (a static website generator). While compiling I get errors about the date format in multiple files. What I need to do is keep the date (yyyy-mm-dd) and delete everything after it up to the end of the line. This is the last try I've made with sed and regex:
sed -i "s/\(\d{4}-\d{2}-\d{2}\)\*/\1 /g" *.md
My hope was that sed would capture the whole pattern within the parentheses as \1 and then keep it as the substitution string.
This is an example of the errors (all numbers change):
ERROR: Could not process ./2010-12-28-the-open-internet-a-case-for-net-neutrality.html.md
| ValueError: '2010-12-28 21:22:00.000000000 +01:00 true' is not a valid date
ERROR: Could not process ./2011-05-27-two-one-must-read-internet-business-book.html.md
| ValueError: '2011-05-27 13:08:00.000000000 +02:00 true' is not a valid date
I've looked around SO but all I've found is about static strings, while mine change all the time.
Thanks for your help.
Please take care with these files; at least make a backup before using sed on them.
This can be done by giving the -i flag an extension: -i.bckup.
I am not sure whether you want to modify the content of the files or the file names themselves.
An expression that would only keep the date would be:
sed -r 's/([^-]*[-][^-]*[-][^-]*).*/\1/'
I suspect your sed does not treat \d as a metacharacter meaning [0-9], so use [0-9] instead.
sed -i -r 's/([0-9]{4}-[0-9]{2}-[0-9]{2}).*/\1/' *.md
Note:
# with the -r extended regex option you do not escape your pattern groupings ()
# no need for the /g option since you are removing everything after the first match
# .* is probably the wildcard you meant to use. * matches any number of the preceding pattern and . matches any single character.
Here is a command line test:
echo '2011-05-27 13:08:00.000000000 +02:00 true' | sed -r 's/([0-9]{4}-[0-9]{2}-[0-9]{2}).*/\1/'
which outputs:
2011-05-27
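To try the in-place edit together with the backup advice without touching real posts, here is a minimal sketch on a throwaway file. The file name post.md and the "Date:" prefix are hypothetical (the question only shows the date value), and -E is used in place of -r on the assumption that your sed accepts it, as GNU and BSD sed do.

```shell
# Work in a scratch directory so no real markdown files are modified.
tmpdir=$(mktemp -d)

# Hypothetical metadata line; only the date value comes from the question.
printf 'Date: 2010-12-28 21:22:00.000000000 +01:00 true\n' > "$tmpdir/post.md"

# -i.bak edits the file in place and keeps the original as post.md.bak.
sed -i.bak -E 's/([0-9]{4}-[0-9]{2}-[0-9]{2}).*/\1/' "$tmpdir/post.md"

edited=$(cat "$tmpdir/post.md")
backup=$(cat "$tmpdir/post.md.bak")
printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"

# Clean up the scratch directory.
rm -rf "$tmpdir"
```

If the edited line looks right, the same command can be pointed at *.md, and the .bak files removed once the import works.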

Regex to match a path, excluding the final directory. Perl error "search pattern not terminated at -e"

Say I have a directory path
/abc/de/fgh/i/jk
And I just want to match
/abc/de/fgh/i/
How would I do this? I have tried:
(\/.*\/)*
In https://regexr.com/ and it seems to do the trick.
However, when I try:
path="/abc/de/fgh/i/jk"
sliced=`echo $path | perl -pe '(\/.*\/)*'`
I get the error:
Search pattern not terminated at -e line 1
Is there a better way to do this besides perl? What is wrong here anyway to give a perl error? Is my regex even correct here?
There is a coreutil to do this, dirname:
path="/abc/de/fgh/i/jk"
sliced=`dirname "$path"`
However, it also removes the trailing /. If you, for whatever reason, need it, just append it to sliced:
sliced=`dirname "$path"`/
Be sure to quote $path; it won't work with paths with spaces in them if you don't.
Your Perl program isn't working because you haven't put any regex operators around it. It ignores the first \ since it isn't in a string or regex, then finds the /, which begins a regex. Since there's no unescaped / to end it, it gives you that error. You probably want something like s/(\/.*\/)*.*/$1/, which should erase everything after the last /.
Use sed, but match what you want to remove (rather than what you want to keep):
echo "$path" | sed 's,[^/]*$,,'
Outputs /abc/de/fgh/i/
The regex matches all non-slash chars at the end and replaces them with nothing, removing them.
Why not use parameter expansion?
echo "${path%/*}/"
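The non-Perl suggestions in this thread can be compared side by side. This is a small sketch using the example path from the question; the variable names are only illustrative.

```shell
path="/abc/de/fgh/i/jk"

# dirname drops the last component and the trailing slash, so add it back.
with_dirname="$(dirname "$path")/"

# %/* removes the shortest suffix that starts with a slash.
with_expansion="${path%/*}/"

# sed deletes the run of non-slash characters at the end of the line.
with_sed=$(printf '%s\n' "$path" | sed 's,[^/]*$,,')

printf '%s\n' "$with_dirname" "$with_expansion" "$with_sed"
```

All three give the same result for this path. Parameter expansion avoids spawning a process, which may matter inside a loop.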

grep regex to find emails of a certain tld

I'd like to run a grep command that searches a text file and should match email address with a certain tld.
Example, if the text file contains the following lines
tom#google.com
mark#google.com
tom.comber#google.cz
And I'm searching for the .com tld emails:
It should match tom#google.com and mark#google.com but not tom.comber#google.cz
I'm currently using the following grep command, which matches pretty much every string that contains .com. I want it to match specifically the tld of the domain:
grep -rnwi "/Users/Me/Desktop/Folder/" -e ".com"
EDIT
grep -rnwi '#.+\.com$' "/Users/Me/Desktop/Folder/" matches nothing, but grep -rnwi "/Users/Me/Desktop/Folder/" -e "hotmail.com" matches plenty. I don't want just hotmail.com but all .com emails.
EDIT2: This seems to match nothing either. Is it because I'm searching in multiple text files in a folder?
grep -rnwi '#.\+\.com$' "/Users/Me/Desktop/Folder/"
EDIT3: I wasn't totally clear. There are characters after the .tld extension, so I had to leave off the trailing $. That works.
Do:
grep '#.\+\.com$' file.txt
#.\+ matches a # followed by one or more characters
\.com$ matches literal .com at the end
To do the same for other TLDs, replace com at the end with that TLD.
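A quick way to check the pattern is to run it against the three sample lines from the question. The sketch below uses grep -E (my substitution) so that + needs no backslash, and it assumes each address ends its line.

```shell
# Sample lines copied from the question.
emails='tom#google.com
mark#google.com
tom.comber#google.cz'

# #.+ needs at least one character after the #, and \.com$ requires the
# line to end in a literal .com, so "tom.comber" alone does not qualify.
com_only=$(printf '%s\n' "$emails" | grep -E '#.+\.com$')
printf '%s\n' "$com_only"
```

If other text follows the address on the same line, as in EDIT3, the $ anchor has to be dropped or replaced with something that describes what may follow.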

extract pattern using powershell script

My bad, I have updated the question: it's using PowerShell.
My file contains thousands of lines like the one below:
<dependency org="${abcd}" name="some-random-name" rev="100.100" conf="compile;runtime"/>
I would like to get only the output like:
name="some-random-name"
How can I achieve this? Please help.
This probably will solve your issue:
cat <file> | grep -oP 'name="[\w-]*"'
Explaining:
grep is the tool that prints lines matching a pattern
-o option prints only the matching parts
-P option uses Perl-style regex, which allows the \w metacharacter
[\w-]* matches any string of length >= 0 containing only 'word' characters or dashes
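Not every grep is built with -P support, so here is a hedged alternative using only -o and -E. It assumes a Unix-like shell is available (the question mentions PowerShell) and that attribute values never contain a double quote.

```shell
# Sample line from the question; single quotes keep ${abcd} literal.
line='<dependency org="${abcd}" name="some-random-name" rev="100.100" conf="compile;runtime"/>'

# [^"]* matches everything up to the closing quote of the attribute value,
# and -o prints only the matched part of the line.
name_attr=$(printf '%s\n' "$line" | grep -oE 'name="[^"]*"')
printf '%s\n' "$name_attr"
```

For a whole file, the same pattern can be run as grep -oE 'name="[^"]*"' followed by the file name.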

Unix: Cut string by regex delimiter

I have a function that prints out the longest path in a directory tree. Let's say the function prints this: ./.mozilla/firefox/z6upkljn.default/storage/permanent/chrome/idb/2918063365piupsah.files
What I want to do is cut this string at the point where a user-defined regex matches.
For example if user puts in regex like: *de?a*, the only match is z6upkljn.default. So at the end, the output will be ./.mozilla/firefox
Here is a code sample I found: sed 's/My_expression.*//'
where My_expression is the user-defined regular expression that acts as the cutting delimiter.
It works for this input: $ echo /homes/eva/xm/xmikfi00 | sed 's/mik.*//', where the output is /homes/eva/xm/x, as expected.
But if I enter a simple regex, $ echo /homes/eva/xm/xmifki00 | sed 's/mi?.*//', the output is /homes/eva/xm/xmifki00, i.e. the input comes back unchanged. Can anyone help me get the same output as in the previous example?
I'll be glad for any help or suggestions, thanks.
Sed uses (by default) POSIX BREs, not EREs, so ? is treated as a literal question mark and the pattern never matches. If what you're trying to match with your ? is "any character", use a .: echo /homes/eva/xm/xmifki00 | sed 's/mi..*//'.
See man 7 re_format for more details.
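The difference can be seen by running the same input through three variants. This is a sketch; it assumes a sed that accepts -E for extended regexes, as GNU and BSD sed do.

```shell
input=/homes/eva/xm/xmifki00

# BRE (default): ? is a literal question mark, so nothing matches and the
# input is printed unchanged.
bre_literal=$(printf '%s\n' "$input" | sed 's/mi?.*//')

# BRE with . for "any single character": cuts from "mi" onwards.
bre_dot=$(printf '%s\n' "$input" | sed 's/mi..*//')

# ERE: ? makes the preceding "i" optional, so the pattern can match at the
# first "m" in the string (the one in "homes") and cuts far too early.
ere_optional=$(printf '%s\n' "$input" | sed -E 's/mi?.*//')

printf '%s\n' "$bre_literal" "$bre_dot" "$ere_optional"
```

The third variant shows that switching to EREs would not help here either: in a regex, ? means "optional", not "any character" as it does in a shell glob.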