search multiple strings in a single line with multiple spaces in between - regex

I want to search for the whole line below from my /etc/pam.d/su file to use it in a script:
auth required pam_wheel.so use_uid
There might be multiple spaces in between, and the line might also be commented out, with multiple #'s.
This is what I'm using:
grep "^#*auth +required +pam_wheel\.so +use_uid$"
but it doesn't yield anything.
I'm certainly doing something wrong, but what is it? Sorry, I've always been bad with regular expressions.

egrep is the way to go, but the question said "multiple" spaces. That can be done like this:
egrep "^([[:space:]]*#)*[[:space:]]*auth[[:space:]]+required[[:space:]]+pam_wheel\.so[[:space:]]+use_uid[[:space:]]*$"
A backslashed space ("\ ") is not listed among the special escapes in regex(7), so the POSIX character class can be used instead. You could also use [[:blank:]] rather than [[:space:]] to restrict the match to spaces and tabs.
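A quick way to sanity-check the pattern on throwaway input (the sample lines here are only illustrative):
echo "auth   required  pam_wheel.so   use_uid" | egrep "^([[:space:]]*#)*[[:space:]]*auth[[:space:]]+required[[:space:]]+pam_wheel\.so[[:space:]]+use_uid[[:space:]]*$"
echo "##auth required pam_wheel.so use_uid" | egrep "^([[:space:]]*#)*[[:space:]]*auth[[:space:]]+required[[:space:]]+pam_wheel\.so[[:space:]]+use_uid[[:space:]]*$"
Both lines should be printed back; pointing the same egrep at /etc/pam.d/su should then find the real entry whether or not it is commented out.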

You can use grep -E (extended regexp):
grep -E "^\ +auth\ +required\ +pam_wheel\.so +use_uid$"
this works:
echo " auth required pam_wheel.so use_uid" | grep -E "^\ +auth\ +required\ +pam_wheel\.so +use_uid$"
gives
auth required pam_wheel.so use_uid

Well this finally works for me:
[root@server4 ~]# egrep "^#*auth.*required.*pam_wheel\.so.*use_uid" /etc/pam.d/su
#auth required pam_wheel.so use_uid
I think the issue is in how we are specifying the spaces.
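For what it's worth, the reason the original grep found nothing is that in basic regular expressions + is a literal plus sign, not a quantifier. A small illustration on made-up input:
echo "auth  required" | grep "auth +required"      # no match: + is literal in basic grep
echo "auth  required" | grep "auth \+required"     # matches (GNU grep's \+ means "one or more")
echo "auth  required" | grep -E "auth +required"   # matches: -E makes + a quantifier
So either escape the + (GNU grep), switch to -E/egrep, or fall back to .* as above.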

Related

sed regular expression does not work as expected. Differs on pipe and file

I have a string in a text file where I want to replace the version number. Quotation marks can vary from ' to ". Also, spaces around the = may or may not be present:
$data['MODULEXXX_VERSION'] = "1.0.0";
For testing I use
echo "_VERSION'] = \"1.1.1\"" | sed "s/\(_VERSION.*\)[1-9]\.[1-9]\.[1-9]/\11.1.2/"
which works perfectly.
When I change it to search in the file (the file contains the same string):
sed "s/\(_VERSION.*\)[1-9]\.[1-9]\.[1-9]/\11.1.2/" -i test.php
it does not find anything.
After playing with the search part of the regex, I found one more odd thing:
sed "s/\(_VERSION.*\)[1-9]\./\1***/" -i test.php
works and changes the string to $data['MODULEXXX_VERSION'] = "***0.0";, but
sed "s/\(_VERSION.*\)[1-9]\.[1-9]/\1***/" -i test.php
does not find anything anymore. Why?
I am using Ubuntu 17.04 desktop.
Can anyone explain what I am doing wrong? What would be the best command for replacing version numbers in the file for the string $data['MODULEXXX_VERSION'] = "***0.0";?
The main problem is that [1-9] doesn't match the 0s in the version number. You need to use [0-9].
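So the command from the question should start matching once the character class is widened, e.g. (a sketch against the question's test.php, otherwise unchanged):
sed -i 's/\(_VERSION.*\)[0-9]\.[0-9]\.[0-9]/\11.1.2/' test.php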
Besides that, you may use the following sed command:
sed -r 's/(.*_VERSION['\''"]]\s*=\s*).*/\1"1.0.1";/' conf.php
This doesn't look at the current value, it simply replaces everything after the =.
I've used -r, which enables extended POSIX regular expressions and makes it a bit simpler to formulate the pattern.
Another, probably cleaner approach is to store conf.php as a template like conf.php.tpl and then use a template engine to render the file. Or, if you really want to use sed, the file may look like:
$data['FOO_VERSION'] = "FOO_VERSION_TPL";
Then just use:
sed 's/FOO_VERSION_TPL/1.0.1/' conf.php.tpl > conf.php
If there are multiple values to replace:
sed \
-e 's/FOO/BAR/' \
-e 's/HELLO/WORLD/' \
conf.php.tpl > conf.php
But I recommend a template engine instead of sed. That becomes more important when the content of the variables to replace may contain characters special to regular expressions.

sed : match all instances of regex in infile1.txt, and output only these to outfile2.txt

I have a text file infile1 with thousands of lines.
I wish to use sed to extract the occurrences of a regex pattern match to outfile2.
NB
Each instance of the regex pattern match may occur more than once on each line of infile1.
Each instance of the extracted regex pattern should be printed to a new line in outfile2.
Does anyone know the sed syntax I should place the regex into?
PS: the regex pattern is
\(Google[ ]{1,3}“[a-zA-Z0-9 ]{1,100}[., ]{0,3}”\)
Thank you :)
I think you want
grep -oE 'Google[ ]{1,3}"[a-zA-Z0-9 ]{1,100}[., ]{0,3}"' filename
-o tells grep to print only the matches, each on a line of its own, and -E instructs it to interpret the regex in extended POSIX syntax, which your regex appears to be.
Note that [ ] could be replaced with just a space, and you might want to use [[:alnum:] ] instead of [a-zA-Z0-9 ] to cover umlauts and suchlike if they exist in the current locale.
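A quick test on sample text (assuming the PDF text really uses straight quotes as in the command above; swap in the curly “ ” from the question if not):
echo 'intro Google "Alpha Beta" middle Google "Gamma" end' | grep -oE 'Google[ ]{1,3}"[a-zA-Z0-9 ]{1,100}[., ]{0,3}"'
gives
Google "Alpha Beta"
Google "Gamma"
with each match on its own line, even though both occur on the same input line.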
Addendum: It is also possible to do this with sed. I don't recommend it, but you could write (using GNU sed):
sed -rn 's/Google[ ]{1,3}"[A-Za-z0-9 ]{1,100}[., ]{0,3}"/\n&\n/g; s/[^\n]*\n([^\n]*\n)/\1/g; s/\n[^\n]*$//p' filename
To make this work with older versions of BSD sed, use -En instead of -rn. -r and -E enable extended regex syntax. -r was historically used by GNU sed, -E by BSD sed; newer versions of them support both for compatibility. -n disables auto-printing.
The code works as follows:
# mark all occurrences of the regex by circumscribing them with newlines
s/Google[ ]{1,3}"[A-Za-z0-9 ]{1,100}[., ]{0,3}"/\n&\n/g
# Isolate every other line from the pattern space (the matches). This will
# leave the part behind the last match...
s/[^\n]*\n([^\n]*\n)/\1/g
# ...so we remove it afterwards and print the result of the transformation if it
# happened (the s///p flag does that). The transformation will not happen if
# there were no matches in the line (because then no newlines will have been
# inserted), so in those cases nothing will be printed.
s/\n[^\n]*$//p
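Run against the same sample text as the grep example above, this should (relying on GNU sed's handling of \n in the pattern and replacement) print each match on its own line:
echo 'intro Google "Alpha Beta" middle Google "Gamma" end' | sed -rn 's/Google[ ]{1,3}"[A-Za-z0-9 ]{1,100}[., ]{0,3}"/\n&\n/g; s/[^\n]*\n([^\n]*\n)/\1/g; s/\n[^\n]*$//p'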
It can be done with sed too, but it isn't pretty:
sed -n ':start /foo/{ h; s/\(foo\).*/\1/; s/.*\(foo\)/\1/; p; g; s/foo\(.*\)/\1/; b start; }' infile1 >outfile2
-- provided that you replace the four occurrences of foo above with your pattern Google[ ]{1,3}“[a-zA-Z0-9 ]{1,100}[., ]{0,3}”.
Yeah, I told you it isn't pretty. :)

looking for regExp to return line between two strings that works with pdfgrep

Though I'm not totally new to regExp, they always give me headaches, especially when not all forms of regular expressions can be used.
The pattern has to work with pdfgrep as the information I'm looking for is inside a pdf Document.
Obviously the document is multiline
The resulting pattern will be used in a bash script, if that makes any difference
The keywords usually can be found more than once in the same file, while I need only the data between the first occurrences of both keywords
The data looks like:
some text
some more text
even more information Date
02.Feb.2014
Customer
some more text
some more information
even more information Date
02.Feb.2014
Customer
some more text
some more information
...
The result of the command should be: 02.Feb.2014
I don't know which characters might be around this date (tabs, spaces ...) and I don't want to rely on them.
I tried
pdfgrep -h 'Date(.*?)Customer' *.pdf
which gave no result at all.
Next try was
pdfgrep -h '(?<=Date)(.*)(?=Customer)' *.pdf
which resulted in an error "Invalid preceding regular expression"
The best shot I can come up until now is
pdfgrep -h '(Date)[[:space:]]{,1}.{,100}[[:space:]](Customer){,1}' *.pdf
This returns all matching dates together with the first keyword, but I'd like a more elegant way, which a regexp should be able to provide.
I'd appreciate any useful hint ;)
Regards
Manuel
The only document you should ever read when using grep, awk, or sed regular expressions is here. It cleared a lot of stuff up for me.
sed -n -e '/even more information Date/ {' \
-e ' n' \
-e ' s/^[[:space:]]*//' \
-e ' p' \
-e '}'
UNIX regular expressions only look at individual lines in the file; you can't capture stuff in an RE across lines.
The above sed command looks for a line matching even more information Date, reads the next line, removes the leading whitespace, and prints that line (the one with 02.Feb.2014 on it). The -n option suppresses automatic output (lines are printed only when sed is explicitly told to).
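For instance, fed the sample lines from the question (a minimal check, not the full PDF workflow), GNU sed prints just the date:
printf 'even more information Date\n   02.Feb.2014\nCustomer\n' | sed -n -e '/even more information Date/ {' -e 'n' -e 's/^[[:space:]]*//' -e 'p' -e '}'
gives
02.Feb.2014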
The hint to use gs in combination with sed does the trick, though I had to do some testing until it worked as desired.
The command used now is:
gs -q -dBATCH -dNOPAUSE -sDEVICE=txtwrite -dFirstPage=1 -dLastPage=1 \
-sOutputFile=- /path/to/my.pdf 2>/dev/null | sed -n -e '/Date/ {' \
-e'n' -e's/^[[:space:]]*//' -e 'p' -e '}'
Thanks to all contributors :)

Regular Expression to parse Common Name from Distinguished Name

I am attempting to parse (with sed) just First Last from the following DN(s) returned by the DSCL command in OSX terminal bash environment...
CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu
I have tried multiple regexes from this site and others with questions very close to what I wanted... mainly this question... I have tried following the advice to the best of my ability (I don't necessarily consider myself a newbie... but definitely a newbie to regex).
DSCL returns a list of DNs, and I would like to only have First Last printed to a text file. I have attempted using sed, but I can't seem to get the correct function. I am open to other commands to parse the output. Every line begins with CN= and then there is a comma between Last and OU=.
Thank you very much for your help!
I think all of the regular expression answers provided so far are buggy, insofar as they do not properly handle quoted ',' characters in the common name. For example, consider a distinguishedName like:
CN=Doe\, John,CN=Users,DC=example,DC=local
Better to use a real library able to parse the components of a distinguishedName. If you're looking for something quick on the command line, try piping your DN to a command like this:
echo "CN=Doe\, John,CN=Users,DC=activedir,DC=local" | python -c 'import ldap; import sys; print ldap.dn.explode_dn(sys.stdin.read().strip(), notypes=1)[0]'
(depends on having the python-ldap library installed). You could cook up something similar with PHP's built-in ldap_explode_dn() function.
Two cut commands is probably the simplest (although not necessarily the best):
DSCL | cut -d, -f1 | cut -d= -f2
First, split the output from DSCL on commas and print the first field ("CN=First Last"); then split that on equal signs and print the second field.
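For example, using the DN from the question as sample input:
echo "CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu" | cut -d, -f1 | cut -d= -f2
gives
First Last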
Using sed:
sed 's/^CN=\([^,]*\).*/\1/' input_file
^ matches start of line
CN= literal string match
\([^,]*\) everything until a comma
.* rest
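Tested against the DN from the question:
echo "CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu" | sed 's/^CN=\([^,]*\).*/\1/'
gives
First Last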
http://www.gnu.org/software/gawk/manual/gawk.html#Field-Separators
awk -v RS=',' -v FS='=' '$1=="CN"{print $2}' foo.txt
I like awk too, so I print the substring from the fourth character:
DSCL | awk -F',' '{print substr($1,4)}' > filterednames.txt
This regex will parse a distinguished name, giving name and val as capture groups for each match.
When DN strings contain commas, they are meant to be quoted; this regex correctly handles both quoted and unquoted strings, and also handles escaped quotes in quoted strings:
(?:^|,\s?)(?:(?<name>[A-Z]+)=(?<val>"(?:[^"]|"")+"|[^,]+))+
Here it is, nicely formatted:
(?:^|,\s?)
(?:
(?<name>[A-Z]+)=
(?<val>"(?:[^"]|"")+"|[^,]+)
)+
Here's a link so you can see it in action:
https://regex101.com/r/zfZX3f/2
If you want a regex to get only the CN, then this adapted version will do it:
(?:^|,\s?)(?:CN=(?<val>"(?:[^"]|"")+"|[^,]+))

grep - search for "<?\n" at start of a file

I have a hunch that I should probably be using ack or egrep instead, but what should I use to basically look for
<?
at the start of a file? I'm trying to find all files that contain the PHP short open tag, since I migrated a bunch of legacy scripts to a relatively new server with the latest PHP 5.
I know the regex would probably be '/^<\?\n/'
I RTFM and ended up using:
grep -RlIP '^<\?\n' *
The -P option enables full Perl-compatible regexes.
If you're looking for all php short tags, use a negative lookahead
/<\?(?!php)/
will match <? but will not match <?php
[meder ~/project]$ grep -rP '<\?(?!php)' .
find . -name "*.php" | xargs grep -nHo "<?[^p^x]"
The x in the character class excludes the XML start tag (<?xml).
If you're worried about Windows line endings, just add \r?.
grep '^<?$' filename
Do you mean a literal "backslash n" or do you mean a newline?
For the former:
grep '^<?\\n' [files]
For the latter:
grep '^<?$' [files]
Note that grep will search all lines, so if you want to find matches just at the beginning of the file, you'll need to either filter each file down to its first line, or ask grep to print out line numbers and then only look for line-1 matches.
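For example, to restrict the check to the first line of each file (somefile.php and the find invocation are just illustrative):
head -n 1 somefile.php | grep '^<?$'
find . -name '*.php' -exec sh -c 'head -n 1 "$1" | grep -q "^<?$" && echo "$1"' _ {} \;
The second command lists every .php file whose first line is exactly <?.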