Hopefully this is a simple mistake on my part; I am fairly new to regex in general. Basically, I am trying to extract the name of a website from a text file.
myfile.txt example:
Hello please enjoy your stay at%sbananas.com%sfor the rest of the day. Bye now!
I am trying to extract only the word bananas from this. My regex is as follows:
/(?<=m%s)(.*?)(?=\.com)/
It works just fine on RegExr online, but I can't figure out how to get it to work with grep: it doesn't return any results. I have tried several variants of the following:
grep "/(?<=m%s)(.*?)(?=\.com)/" myfile.txt
grep -E "/(?<=m%s)(.*?)(?=\.com)/" myfile.txt
grep '/(?<=m%s)(.*?)(?=\.com)/' myfile.txt
grep "(?<=m%s)(.*?)(?=\.com)" myfile.txt
grep '(?<=m%s)(.*?)(?=\.com)' myfile.txt
Nothing seems to work. I would love it if someone could point me in the right direction.
The problem with regular expressions in grep and other Unix tools is that they usually support one, two or three different kinds of regular expressions. These are:
Basic regular expressions (BRE)
Extended regular expressions (ERE or EREG)
Perl compatible regular expressions (PCRE or PREG)
Your pattern is in PCRE syntax, so you need to tell grep to interpret it as such (using -P). Note that I also removed the m between = and % (I don't know what that was supposed to do).
grep -Po "(?<=%s)(.*?)(?=\.com)" myfile.txt
With -o, you tell grep to print only the matching part of each line. My grep man page describes PCRE support in grep as experimental, so there may be cases where you get a segmentation fault or where the evaluation takes an unusually long time.
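As a quick check of the command above, here is a minimal sketch that runs the sample line from the question through grep -Po. It assumes a GNU grep built with PCRE support. The sed line is a fallback I am adding for greps without -P; it is not part of the original answer.

```shell
# Sample line from the question.
line='Hello please enjoy your stay at%sbananas.com%sfor the rest of the day. Bye now!'

# -P: interpret the pattern as PCRE (needed for lookarounds and the lazy .*?).
# -o: print only the matched text instead of the whole line.
with_grep=$(printf '%s\n' "$line" | grep -Po '(?<=%s)(.*?)(?=\.com)')

# Assumed fallback when -P is unavailable: capture the name in a sed group.
with_sed=$(printf '%s\n' "$line" | sed -n 's/.*%s\([^%]*\)\.com.*/\1/p')

echo "$with_grep"
echo "$with_sed"
```

For this input both commands should print bananas.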
I'm trying to take a pattern that I can match in Notepad++ with a regular expression and use it with grep. I want to match email:32characters(a-f0-9):3characters(ANYCHARACTER/SYMBOL)
Here's an example:
Stack#overflow.com:999999999999999999999999999999a1:&U,
So far I've been able to match most of it using:
[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,6}:[a-f0-9]{32}:
But I'm unsure how to match the last 3 characters (which can be anything).
Furthermore, when I run:
grep "[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,6}:[a-f0-9]{32}:" input.txt > output.txt
nothing is written to my file, which seems strange to me. I am using Cygwin Terminal on Windows to perform these greps.
Use grep -E or egrep if that is available in your environment.
-E, --extended-regexp
Interpret pattern as an extended regular expression (i.e. force grep to behave as egrep).
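To illustrate the difference, here is a small sketch comparing the default basic mode with -E. The sample line is hypothetical (same shape as the one in the question, with a 32-character hex string I made up), and the trailing .{3} is my own addition for the "any 3 characters" part. In GNU grep's basic mode an unescaped + or {2,6} is matched literally, which would explain why the original command finds nothing.

```shell
# Hypothetical sample line in the same shape as the question's example.
line='Stack#overflow.com:0123456789abcdef0123456789abcdef:&U,'

# Pattern from the question plus .{3} for the final three arbitrary characters.
pattern='[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,6}:[a-f0-9]{32}:.{3}'

# Basic mode: + and {..} are literal characters here, so nothing matches.
# (grep -c exits non-zero on zero matches, hence the || true.)
bre_count=$(printf '%s\n' "$line" | grep -c "$pattern" || true)

# Extended mode: + and {..} act as quantifiers.
ere_count=$(printf '%s\n' "$line" | grep -Ec "$pattern" || true)

echo "basic: $bre_count, extended: $ere_count"
```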
I'm trying to write a Linux bash script that takes as input a CSV file with lines in the following format (any field can be blank):
something,something,,number,something,something,something,something,something,something,,,
something,something.something,,number,something,something,something,something,something,something,,,
and I need to produce output in the following format (if a line contains a ., the field has to be split into substring1,substring2 and one , character removed; otherwise the line is left unchanged):
something,something,,number,something,something,something,something,something,something,,,
something,something,something,number,something,something,something,something,something,something,,,
I tried to parse each line of the file and check whether it matches a regex, but the command starts a never-ending loop (I don't know why), and moreover I don't know how to split the substring to get substring1,substring2 as output.
for f in /filepath/filename.csv
do
while read p; do
if [[$p == .\..]] ; then echo $p; fi
done <$f
done
Thanks in advance!
I can't provide you with working code at the moment, but here is a piece of quick advice:
1. Try the tool called sed.
2. Learn about "capture groups" in regex to see how to split the text based on expressions.
To separate strings, awk will be useful:
echo "Hello.world" | awk -F"." '{print "STR1="$1", STR2="$2 }'
Hope it will help.
As your task is more about transforming unrelated lines of text than about parsing fields of CSV-formatted files, sed is indeed the tool to use.
Learning to use sed properly, even for the most basic tasks, is synonymous with learning regular expressions. The following invocation of sed transforms your input sample into your expected output:
sed 's/\.\([^,]*\),/,\1/g' input.csv >output.csv
In the above example, s/// is the replacement command.
From the manpage:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful,
replace that portion matched with replacement. [...]
Explaining the regexp and replacement of the above command in full is probably out of scope for the question, so I'll finish my answer here... Hope it helps!
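For readers who want a short breakdown anyway, here is a sketch that annotates the pieces of the sed command and runs it on the two sample lines from the question. The variable names (rest, input, output) are only for illustration.

```shell
# Shared tail of both sample lines from the question.
rest=',number,something,something,something,something,something,something,,,'

# Line 1 has no dot; line 2 has "something.something" in the second field.
input="something,something,$rest
something,something.something,$rest"

# s/REGEXP/REPLACEMENT/g
#   \.          a literal dot
#   \([^,]*\)   group 1: everything up to the next comma
#   ,           that comma (part of the match, so it is dropped)
#   ,\1         replacement: a comma followed by group 1
output=$(printf '%s\n' "$input" | sed 's/\.\([^,]*\),/,\1/g')

printf '%s\n' "$output"
```

The dot becomes a comma and the empty field after it disappears, so the number of fields stays the same; lines without a dot pass through untouched.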
OK, I managed to use a regexp, but the following command still does not work:
sed '\([^,]*\),\([^,]*\)\.\([^,]*\),,\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),/\1,\2,\3,\4,\5,\6,\7,\8,\9,\10,\11,\12,'
sed: -e expression #1, char 125: unknown command: `\'
grep '[:digit:]{1,}-{1,}' *.txt| wc -l
This command outputs: 0
grep '1-' *.txt| wc -l
However, this command outputs: 10598
Both commands are being run from the same directory. The first command should have returned a count greater than or equal to that of the second command. Can anyone shed some light on what is going on here?
echo 1 | grep '[:digit:]'
#nothing....
grep uses a different syntax: you need [[:digit:]] or [0-9].
The unescaped {1,} syntax is not supported by basic grep, where the braces have to be written as \{1,\}. You can also use other modes, like the extended one with -E. Note: normally one would use + for matching one or more characters.
General note: always test regexes in small parts to see that each part really does what you thought it does. Once the expression gets complicated, it's really hard to tell what went wrong.
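Following that advice, here is a small sketch that tests the corrected pattern on a tiny input first. The two-line sample is made up for illustration; only the first line contains a digit followed by a dash.

```shell
# Hypothetical sample: only the first line has digit(s) followed by '-'.
sample='build 1- done
no digits here-'

# Basic mode: [[:digit:]] bracket expression, interval braces escaped.
bre=$(printf '%s\n' "$sample" | grep -c '[[:digit:]]\{1,\}-\{1,\}')

# Extended mode: + means "one or more".
ere=$(printf '%s\n' "$sample" | grep -Ec '[[:digit:]]+-+')

echo "basic: $bre, extended: $ere"
```

Both counts should be 1 for this sample, after which the same pattern can be run against *.txt.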
I'm trying to search my codebase for instances of the following pattern:
m_vParts[foo] =
And similar instances with varying whitespace. So I came up with this regex:
m_vParts\[.*\]\s*=[^=]\s*
When I test this at http://gskinner.com/RegExr/ and other regex-tester type sites, it finds exactly what I want. However, when I actually grep (or egrep) I get no results. My guess is my regex isn't well-formed for grep's dialect of regexes, but I'm not sure exactly where I'm off.
Here is the actual command I give:
[e]grep -Irn "m_vParts\[.*\]\s*=[^=]\s*" .
I've tried with both single and double quotes.
Here's a small sample of code that is representative of the codebase:
pcTab->m_vParts[iLastPart] = pcPart;
if ( m_pcCurrentTab->m_vParts[i]== pcPart )
I would expect that the first line would be a match, and the second line would not.
Also, I should note that I'm using GnuWin32 grep on Windows 7 x64.
Thanks in advance for any guidance here; very much trying to avoid the non-automated search :)
Just add quotes around the re:
$ vim 1.txt
$ egrep 'm_vParts\[.*\]\s*=[^=]\s*' 1.txt
pcTab->m_vParts[iLastPart] = pcPart;
As you can see, all works.
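If a particular grep build still returns nothing, one thing worth ruling out is \s, which is a GNU extension rather than part of POSIX. The following is a hedged sketch of the same pattern with the POSIX class [[:space:]] instead; whether this is what goes wrong on GnuWin32 is an assumption I have not verified.

```shell
# The two sample lines from the question.
code='pcTab->m_vParts[iLastPart] = pcPart;
if ( m_pcCurrentTab->m_vParts[i]== pcPart )'

# Same regex as above, with \s replaced by the POSIX class [[:space:]].
matches=$(printf '%s\n' "$code" | grep -E 'm_vParts\[.*\][[:space:]]*=[^=][[:space:]]*')

printf '%s\n' "$matches"
```

Only the assignment line should be printed; the == comparison is rejected by [^=].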