I'm trying to print the content of an HTML table cell.
I thought the easiest way to do this was with grep,
but for some reason the regex works on regexr.com but not with grep.
Maybe it has something to do with escaping? I tried escaping all the less-than and greater-than symbols (< and >).
This is the code I'm using:
wget -q -O login.html --save-cookies cookies.txt --keep-session-cookies --post-data 'username=sssss&password=fffff' http://ffffff/login
wget -q -O page.html --load-cookies cookies.txt http://ffffff/somepage |grep -P '(?<=<tr><td class=list2>www</td><td class=list2 align=center>A</td><td class=list2 >)(.*?)(?=</td><td class=list2 align=center><input type=checkbox name=arecs5)' |recode html...ascii
Can anybody help me, please? I'm from the Netherlands, so sorry for my English.
I also tried adding the -c option, and it printed 0.
EDIT:
I added my full code and found one mistake: I didn't have the -O parameter to output the page's HTML. It still doesn't work, though; it prints nothing.
Traditional grep doesn't support lookarounds the way you're using it.
Try using grep -P (PCRE):
grep -P 'pattern' file
Consider using ack or ag, which support PCRE natively.
Finally, it works.
I added -qO- to wget. I don't know why, but adding a - after the -O makes it work.
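For what it's worth, the trailing - matters because -O page.html sends the page to a file and leaves nothing on stdout for grep to read, while -O - (or -qO-) writes the page to stdout. Here is a minimal sketch of that difference, with printf standing in for wget. The cell value 192.0.2.10 and the shortened pattern are made up for illustration, and grep -P assumes a GNU grep built with PCRE support:

```shell
# Stand-in for the downloaded page; the real script would get this from wget.
# The cell content 192.0.2.10 is a hypothetical value.
html='<tr><td class=list2>www</td><td class=list2 align=center>A</td><td class=list2 >192.0.2.10</td><td class=list2 align=center><input type=checkbox name=arecs5></td></tr>'

# Read HTML on stdin and print only the text between the lookbehind
# and the lookahead (-o prints the match, not the whole line).
extract_cell() {
  grep -oP '(?<=<td class=list2 >).*?(?=</td>)'
}

# Output redirected to a file (like wget -O page.html): the pipe is empty.
tmp=$(mktemp)
from_file=$(printf '%s\n' "$html" > "$tmp" | extract_cell || true)

# Output written to stdout (like wget -qO-): grep sees the page.
from_stdout=$(printf '%s\n' "$html" | extract_cell)

rm -f "$tmp"
printf 'file: [%s]\nstdout: [%s]\n' "$from_file" "$from_stdout"
```

The first result is empty and the second contains the cell text, which matches what happened with the two wget invocations.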
I can split one large mp3 file into several files based on silence using the mp3splt command below:
mp3splt -f -t 4.0 -a -d split audio_file.mp3
and I get
split/audio_file_000m_00s_005m_00s.mp3
but how can I get
split/000m_00s_005m_00s_audio_file.mp3
or increment by one in the front
split/000_audio_file_000m_00s_005m_00s.mp3
split/001_audio_file_005m_00s_010m_00s.mp3
I looked at the syntax http://wiki.librivox.org/index.php/How_To_Split_With_Mp3Splt but couldn't figure out what needs to change in my syntax.
I'm using ubuntu 16.04 64bit linux
You need to set the -o (output format) option.
Try something like:
mp3splt -o @N3_@f -f -t 4.0 -a -d split audio_file.mp3
Giving you:
001_audio_file.mp3,
002_audio_file.mp3,
003_audio_file.mp3…
The man page is a little messy, but it's all there.
I used
mp3splt -o @N3_@mm_@ss_@f -f -t 4.0 -a -d split audio_file.mp3
which gives me
/split/001_000m_00s_audio_file.mp3
/split/002_004m_00s_audio_file.mp3
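If you would rather keep mp3splt's default file names and only add a counter in front (the 000_audio_file_000m_00s_005m_00s.mp3 form from the question), a rename pass after splitting is another option. This is only a sketch and not an mp3splt feature; prefix_with_counter is a made-up helper name, and it assumes the default names sort in track order:

```shell
# prefix_with_counter DIR: rename DIR/*.mp3 to DIR/NNN_<original name>,
# numbering from 000 in glob (alphabetical) order.
prefix_with_counter() {
  dir=$1
  i=0
  # The glob is expanded once, before any rename, so renamed files
  # are not picked up again by the loop.
  for f in "$dir"/*.mp3; do
    [ -e "$f" ] || continue   # nothing matched: skip the literal pattern
    mv -- "$f" "$dir/$(printf '%03d' "$i")_${f##*/}"
    i=$((i + 1))
  done
}
```

After the split you would call it as: prefix_with_counter split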
I wonder why in the new version of grep (Ubuntu 16.04) my bash script stopped working:
...
COMMIT_REGEX='^\[[A-Z]+-[0-9]+\] \s*\S+(?:.|\n|\r)*\s* \(review: ([a-z]+\.[a-z]+|MYSELF)\)$'
if ! grep -Paz "$COMMIT_REGEX" "$1"; then
...
I get "grep: unescaped ^ or $ not supported with -Pz". I've tried to escape ^ and $ symbols, but it doesn't help.
In Ubuntu 15.10 the script works perfectly.
It seems that the problem is the result of a bug with grep -Pz (credit to Lars Fischer for finding the relevant report).
I would suggest dropping the -P switch and using -E instead:
commit_re='^\[[A-Z]+-[0-9]+\] \s*\S+(.|\n|\r)*\s* \(review: ([a-z]+\.[a-z]+|MYSELF)\)$'
if ! grep -qEaz "$commit_re" "$1"; then
The only changes that I've made are to change -P to -E and add the -q (quiet) switch, since you're only interested in the return code. You don't really need a non-capturing group, so I changed it to a normal one.
I also don't like to see ALL_CAPS variable names as they should really be reserved for use by the shell.
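If -z keeps causing trouble, another way around it is to avoid multi-line matching altogether and check the first and last lines separately. This is a different technique from the regex above and slightly looser (it ignores everything in between and assumes the review tag is on the last line); check_commit_msg is a made-up name:

```shell
# check_commit_msg FILE: succeed if the first line starts with a ticket tag
# such as "[ABC-123] " followed by text, and the last line ends with
# " (review: name.surname)" or " (review: MYSELF)".
check_commit_msg() {
  head -n 1 "$1" | grep -qE '^\[[A-Z]+-[0-9]+\] +[^[:space:]]' &&
    tail -n 1 "$1" | grep -qE ' \(review: ([a-z]+\.[a-z]+|MYSELF)\)$'
}
```

In the hook it would be used like the original test: if ! check_commit_msg "$1"; then ...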
On a Mac, how do I use grep? I have a large text file (200 MB). The sample data is below. I want to run grep with a regex and get ONLY the following data values in my terminal output:
00424730350000190100130JEAN DANIELE &
I want everything up to and including 82700. Once I have this information, I can copy it into another file for other purposes. Right now I just get back tons of information.
Sample Record:
00424730350000190100130JEAN DANIELE & 82700 TINEPORK CT LAT BORAN AK 12345 3342843470224201400003980000002664300001216IWD QD0415200800004005880002281300000671IWD QM0330200500004900000001836800000431IWD QM0325199900002455270001147700000969IWD QM
Sample grep commands I wrote:
grep -E "^(.*?)82700" MYFILE.TXT
grep -E "^(.*?)[0-9]" MYFILE.TXT
This still doesn't work: it gives back tons of information, and the 82700 can be any value. Any help or suggestions? Thank you.
For the sample data
grep -E -o "^[0-9]{23}[^0-9]+[0-9]+" MYFILE.TXT
seems to do the job:
00424730350000190100130JEAN DANIELE & 82700
using grep (BSD grep) 2.5.1-FreeBSD on Darwin 14.4.0.
Please comment if this needs adjustment or further detail.
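The part that makes the difference is -o: without it grep prints every matching line in full, however narrow the pattern is. Also, as far as I know, grep -E has no lazy .*? quantifier, which is why the pattern spells out the pieces instead. A small sketch on a shortened copy of the sample record (leading_fields is just an illustrative name):

```shell
# Shortened copy of the sample record from the question.
record='00424730350000190100130JEAN DANIELE & 82700 TINEPORK CT LAT BORAN AK 12345'

# ^[0-9]{23}  the 23-digit prefix at the start of the line
# [^0-9]+     the name part, i.e. everything up to the next digit
# [0-9]+      the first number after the name (82700 here, but any value)
leading_fields() {
  grep -E -o '^[0-9]{23}[^0-9]+[0-9]+'
}

printf '%s\n' "$record" | leading_fields
# prints: 00424730350000190100130JEAN DANIELE & 82700
```

On the real file, redirecting the output (for example leading_fields < MYFILE.TXT > out.txt) saves the copy-and-paste step.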
I am working on building a .sed file to start scripting the setup of multiple Apache servers. I am trying to get sed to match the default webmaster email addresses in the .conf file, which works great with the egrep below. However, when I use sed to do a search and replace, I get no errors back, but it also doesn't substitute anything. I test this by running the same egrep command again.
egrep -o '\b[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+(\.[A-Za-z]{2,4})?\b' /home/test/httpd.conf
returns
admin@your-domain.com
root@localhost
webmaster@dummy-host.example.com
The sed command I'm trying to use is
sed -i '/ServerAdmin/ s/\b[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+(\.[A-Za-z]{2,4})?\b/MY_ADMIN_ADDRESS@gmail.com/g' /home/test/httpd.conf
After running it, I try to verify the result by running the egrep again, and it returns the same three email addresses, indicating that nothing was replaced.
Don't assume that any two tools use the same regular expression syntax. If you're going to be doing replacements with sed, use sed to test - not egrep. It's easy to use sed as if it were a grep command: sed -ne '/pattern/p'.
sed must be told to use extended regular expressions with the -r option. Note that -r has to be given separately from -i, because -ir is read as -i with the backup suffix "r". That makes the sed command:
sed -i -r '/ServerAdmin/ s/\b[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+(\.[A-Za-z]{2,4})?\b/MY_ADMIN_ADDRESS@gmail.com/g' /home/test/httpd.conf
Many thanks to Kent for pointing out that the address it was missing wasn't following a ServerName.
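Following the advice to test with sed itself, here is a rough sketch that runs the substitution without -i so the result can be inspected before the file is touched. The three-line conf excerpt is made up, the pattern is simplified (no \b, which is a GNU extension), and -E is used as the equivalent of -r in current GNU sed:

```shell
# Hypothetical excerpt of an httpd.conf; only ServerAdmin lines should change.
conf='ServerAdmin root@localhost
ServerName www.example.com
# contact webmaster@dummy-host.example.com'

# -E turns on extended regular expressions, so + works unescaped.
# No -i here: the result goes to stdout so it can be checked first.
replace_admin() {
  sed -E '/ServerAdmin/ s/[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+/MY_ADMIN_ADDRESS@gmail.com/g'
}

result=$(printf '%s\n' "$conf" | replace_admin)
printf '%s\n' "$result"
```

Only the first line changes; the address in the comment line stays as it was because that line does not contain ServerAdmin.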
Given the input
echo abc123def | grep -o '[0-9]*'
On one computer (with GNU grep 2.5.4), this returns 123, and on another (with GNU grep 2.5.1) it returns the empty string. Is there some explanation for why grep 2.5.1 fails here, or is it just a bug? I'm using grep -o in this way in a bash script that I'd like to be able to run on different computers (which may have different versions of grep). Is there a "right way" to get consistent behavior?
Yes, 2.5.1's -o handling was buggy:
http://www.mail-archive.com/bug-grep@gnu.org/msg00993.html
Grep is probably not the right tool for this; sed or tr or even perl might be better depending on what the actual task is.
You can use the shell; it's faster:
$ str=abc123def
$ echo ${str//[a-z]/}
123
I had the same issue and found that egrep was installed on that machine. A quick solution was using
echo abc123def | egrep -o '[0-9]*'
This will give similar results:
echo abc123def | sed -n 's/[^0-9]*\([0-9]\+\).*/\1/p'
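One more thing that may help portability: [0-9]* also matches the empty string, and as far as I can tell that is the case the old -o handling tripped over. Requiring at least one digit sidesteps it, and tr avoids regexes entirely. A sketch of both; the function names are just for illustration:

```shell
# Print each run of digits on its own line. Requiring at least one digit
# means the pattern can never match the empty string.
digit_runs() {
  grep -o '[0-9][0-9]*'
}

# Delete every character except digits and newlines; no regex involved.
digits_only() {
  tr -cd '0-9\n'
}

echo abc123def | digit_runs    # prints 123
echo abc123def | digits_only   # prints 123
```

The two differ when a line holds several runs of digits: digit_runs prints each run on its own line, while digits_only joins them into one number per input line.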
Your question is a near-duplicate of this one.
Because you are using a regex, you must use either:
grep -E
egrep (like Sebastian posted).
Good luck!