Using Grep + Regex linux

Using Grep + Regex linux - regex

I am trying to find the value "PASS_MAX_DAYS" equal to 90 or less in the /etc/login.defs file But it does not work, I am testing on a suse 12 server but the command does not work.
grep "^PASS_MAX_DAYS\s*([0-9]|[1-8][0-9]|90)" /etc/login.defs
Thanks for your support and time

It is better to test the number against your value, and not test a string against any pattern (as already suggested in comments). For example like this:
awk '/^PASS_MAX_DAYS/ && $2<=90' /etc/login.defs
This way, you can easily modify your command, if your limit changes to 30 or to 365 days. Also I guess values like 090 are still valid for that configuration.

grep, by default, doesn't understand extended regular expressions ..
grep -E "^PASS_MAX_DAYS\s*([0-9]|[1-8][0-9]|90)\s*$" /etc/login.defs
will give you SOME result.
That said, how many different entries for PASS_MAX_DAYS do you expect in that file?

Related

Finding files using regular expressions/wildcards

Within a particular directory, I have a series of files that are labelled sequentially:image0000.png, image0001.png, image0002.png, etc.. They are labelled by number, but I don't necessarily know how many preceding zeroes there are in the filename, i.e. whether it will be image0001.png or image00001.png.
Within a bash script, I wish to find a single file at a time (over a for loop), and then apply some processing to the file. This search could start at zero and end before I've reached the end, or could be of varying steps. To expand, I could want to find image0000.png, image0001.png, image0002.png and so forth, or I could start at image0010.png and find every other file, i.e. the next two would be image0012.png and image0014.png.
To try and find the first file (image0000.png), I've tried using find and ls, with the following outputs:
$ find video/figs/ -name 'image*[0]0.png'
video/figs/image00100.png
video/figs/image00000.png
$ ls video/figs/image*[0]0.png
-rw-r--r-- 1 user machine 165K Feb 19 09:06 video/figs/image00000.png
-rw-r--r-- 1 user machine 207K Feb 19 09:06 video/figs/image00100.png
Similar results occur for finding the second (i.e., find video/figs/ -name 'image*[0]0.png' finds image00101.png and image00001.png. So it's finding the file I want (image00001.png), but is also finding one that I don't (image00101.jpg). Can anyone help me understand why, and fix it?

I would use ls and grep for that:
ls | grep -oP 0*[1-9]+.png
Example:
$:/tmp/test$ ls
00001.png 00002.png 00010.png 00013.png 00201.png
$:/tmp/test$ ls | grep -oP 0*[1-9]+.png
00001.png
00002.png
00013.png
01.png

I suspect you don't want to dive into subdirectories, and find files, sorted by number, spread over subdirs.
So find isn't necessary.
ls image*{08..10}.png
image00010.png image0008.png image0009.png image0010.png image008.png image009.png
Part 2 of your question, only find every other file:
ls image*{08..10..2}.png
image00010.png image0008.png image0010.png image008.png
Maybe you know for-loops. It's like that,
for (i in 8 to 10 by 2)
or
for (int i=8; i <= 10; i+=2)
Restricting the search to find image image00010.png but not imageAB010.png wouldn't work.
The reason to exclude 101 is still unclear. Maybe it's only a sorting thing.
With directories, which aren't the PWD, there is no big difference:
ls video/figs/image*{08..10..2}.png
Note, that instead of ls, you use just the program, you want to process on the files, if the program is able to handle more than one file at a time, like ls.

Sincere thanks to everyone who contributed an answer - perhaps I explained it poorly, or I was too wedded to the code I'd already written to use any of the provided answers. However, I've found the following solutions:
1) Why did I find more answers than I expected?
find video/figs/ -name 'image*[0]0.png' uses very limited comprehension of wildcards, and thus the above was interpreted as finding a file with name image<wildcard>00.png. There is no way, using the -name option, to restrict the application of * to match only a given character (in this case, only find zero or more matches to 0.
2) How do I find the image files with an unknown number of padding zeroes?
The following is a MWE from my final code. It demonstrates how to search within a given directory SEARCH_DIR (not necessarily including subdirectories, but I haven't checked)
f1=0 # Starting number
f2=10 # End number
df=2 # number to skip between images
for ((f=$f1; f<=$f2; f=$f+$df)); do
export iFile=$(find $SEARCH_DIR -regex '.*/image0*'$f'.png')
done
The export ensures the variable is available to sub-processes, with the iFile=$() syntax allowing me to export the result of the command to the variable iFile. The bit within the parentheses is the bit I was looking for: find $SEARCH_DIR -regex '.*/image[0]*'$f'.png'
a) find $SEARCH_DIR specifies the location for the search
b) -regex specifies to use regular expressions, which are more powerful than standard bash scripting and allow me to limit wildcards as required
c) '.*/image0*'$f'.png': The regular expression search looks over the entire string, so apparently I need the initial .*/ to perform the match. The 0* now performs as I originally wanted - the * wildcard is now searching for zero or more matches of the preceding term, which here is 0 (so if I wanted to search for zero or more matches of any digit, I would use [0-9]*). The $f term is to search for the numbered file in the for loop.

What is the difference b/w two sed commands below?

Information about the environment I am working in:
$ uname -a
AIX prd231 1 6 00C6B1F74C00
$ oslevel -s
6100-03-10-1119
Code Block A
( grep schdCycCleanup $DCCS_LOG_FILE | sed 's/[~]/ \
/g' | grep 'Move(s) Exist for cycle' | sed 's/[^0-9]*//g' ) > cycleA.txt
Code Block B
( grep schdCycCleanup $DCCS_LOG_FILE | sed 's/[~]/ \n/g' | grep 'Move(s) Exist for cycle' | sed 's/[^0-9]*//g' ) > cycleB.txt
I have two code blocks(shown above) that make use of sed to trim the input down to 6 digits but one command is behaving differently than I expected.
Sample of input for the two code blocks
Mar 25 14:06:16 prd231 ajbtux[33423660]: 20160325140616:~schd_cem_svr:1:0:SCHD-MSG-MOVEEXISTCYCLE:200705008:AUDIT:~schdCycCleanup - /apps/dccs/ajbtux/source/SCHD/schd_cycle_cleanup.c - line 341~ SCHD_CYCLE_CLEANUP - Move(s) Exist for cycle 389210~
I get the following output when the sample input above goes through the two code blocks.
cycleA.txt content
389210
cycleB.txt content
25140616231334236602016032514061610200705008341389210
I understand that my last piped sed command (sed 's/[^0-9]*//g') is deleting all characters other than numbers so I omitted it from the block codes and placed the output in two additional files. I get the following output.
cycleA1.txt content
SCHD_CYCLE_CLEANUP - Move(s) Exist for cycle 389210
cycleB1.txt content
Mar 25 15:27:58 prd231 ajbtux[33423660]: 20160325152758: nschd_cem_svr:1:0:SCHD-MSG-MOVEEXISTCYCLE:200705008:AUDIT: nschdCycCleanup - /apps/dccs/ajbtux/source/SCHD/schd_cycle_cleanup.c - line 341 n SCHD_CYCLE_CLEANUP - Move(s) Exist for cycle 389210 n
I can see that the first code block is removing every thing other that (SCHD_CYCLE_CLEANUP - Move(s) Exist for cycle 389210) and is using the tilde but the second code block is just replacing the tildes with the character n. I can also see that it is necessary in the first code block for a line break after this(sed 's/[~]/ ) and that is why I though having \n would simulate a line break but that is not the case. I think my different output results are because of the way regular expressions are being used. I have tried to look into regular expressions and searched about them on stackoverflow but did not obtain what I was looking for. Could someone explain how I can achieve the same result from code block B as code block A without having part of my code be on a second line?
Thank you in advance

This is an example of the XY problem (http://xyproblem.info/). You're asking for help to implement something that is the wrong solution to your problem. Why are you changing ~s to newlines, etc when all you need given your posted sample input and expected output is:
$ sed -n 's/.*schdCycCleanup.* \([0-9]*\).*/\1/p' file
389210
or:
$ awk -F'[ ~]' '/schdCycCleanup/{print $(NF-1)}' file
389210
If that's not all you need then please edit your question to clarify your requirements for WHAT you are trying to do (as opposed to HOW you are trying to do it) as your current approach is just wrong.

Etan Reisner's helpful answer explains the problem and offers a single-line solution based on an ANSI C-quoted string ($'...'), which is appropriate, given that you originally tagged your question bash.
(Ed Morton's helpful answer shows you how to bypass your problem altogether with a different approach that is both simpler and more efficient.)
However, it sounds like your shell is actually something different - presumably ksh88, an older version of the Korn shell that is the default sh on AIX 6.1 - in which such strings are not supported[1]
(ANSI C-quoted strings were introduced in ksh93, and are also supported not only in bash, but in zsh as well).
Thus, you have the following options:
With your current shell, you must stick with a two-line solution that contains an (\-escaped) actual newline, as in your code block A.
Note that $(printf '\n') to create a newline does not work, because command substitutions invariably trim all trailing newlines, resulting in the empty string in this case.
Use a more modern shell that supports ANSI C-quoted strings, and use Etan's answer. http://www.ibm.com/support/knowledgecenter/ssw_aix_61/com.ibm.aix.cmds3/ksh.htm tells me that ksh93 is available as an alternative shell on AIX 6.1, as /usr/bin/ksh93.
If feasible: install GNU sed, which natively understands escape sequences such as \n in replacement strings.
[1] As for what actually happens when you try echo 'foo~bar~baz' | sed $'s/[~]/\\\n/g' in a POSIX-like shell that does not support $'...': the $ is left as-is, because what follow is not a valid variable name, and sed ends up seeing literal $s/[~]/\\\n/g, where the $ is interpreted as a context address applying to the last input line - which doesn't make a difference here, because there is only 1 line. \\ is interpreted as plain \, and \n as plain n, effectively replacing ~ instances with literal \n sequences.

GNU sed handles \n in the replacement the way you expect.
OS X (and presumably BSD) sed does not. It treats it as a normal escaped character and just unescapes it to n. (Though I don't see this in the manual anywhere at the moment.)
You can use $'' quoting to use \n as a literal newline if you want though.
echo 'foo~bar~baz' | sed $'s/[~]/\\\n/g'

Complex changes to a URL with sed

I am trying to parse an RSS feed on the Linux command line which involves formatting the raw output from the feed with sed.
I currently use this command:
feedstail -u http://www.heise.de/newsticker/heise-atom.xml -r -i 60 -f "{published}> {title} {link}" | sed 's/^\(.\{3\}\)\(.\{13\}\)\(.\{6\}\)\(.\{3\}\)\(.*\)/\1\3\5/'
This gives me a number of feed items per line that look like this:
Sat 20:33 GMT> WhatsApp-Ausfall: Server-Probleme blockieren Messaging-Dienst http://www.heise.de/newsticker/meldung/WhatsApp-Ausfall-Server-Probleme-blockieren-Messaging-Dienst-2121664.html/from/atom10?wt_mc=rss.ho.beitrag.atom
Notice the long URL at the end. I want to shorten this to better fit on the command line. Therefore, I want to change my sed command to produce the following:
Sat 20:33 GMT> WhatsApp-Ausfall: Server-Probleme blockieren Messaging-Dienst http://www.heise.de/-2121664
That means cutting everything out of the URL except a dash and that seven digit number preceeding the ".html/blablabla" bit.
Currently my sed command only changes stuff in the date bit. It would have to leave the title and start or the URL alone and then cut stuff out of it until it reaches the seven digit number. It needs to preserve that and then cut everything after it out. Oh yeah, and we need to leave a dash right in front of that number too.
I have no idea how to do that and can't find the answer after hours of googling. Help?
EDIT:
This is the raw output of a line of feedstail -u http://www.heise.de/newsticker/heise-atom.xml -r -i 60 -f "{published}> {title} {link}", in case it helps:
Sat, 22 Feb 2014 20:33:00 GMT> WhatsApp-Ausfall: Server-Probleme blockieren Messaging-Dienst http://www.heise.de/newsticker/meldung/WhatsApp-Ausfall-Server-Probleme-blockieren-Messaging-Dienst-2121664.html/from/atom10?wt_mc=rss.ho.beitrag.atom
EDIT 2:
It seems I can only pipe that output into one command. Piping it through multiple ones seems to break things. I don't understand why ATM.

Unfortunately (for me), I could only think of solving this with extended regexp syntax (either -E or -r flag on different systems):
... | sed -E 's|(://[^/]+/).*(-[0-9]+)\.html/.*|\1\2|'
UPDATE: In basic regexp syntax, the best I can do is
... | sed 's|\(://[^/]*/\).*\(-[0-9][0-9]*\)\.html/.*|\1\2|'

The key to writing this sort of regular expression is to be very careful about what the boundaries of what you expect are, so as to avoid the random gunk that you want to get rid of causing you problems. Also, you should bear in mind that you can use characters other than / as part of a s operation's delimiters.
sed 's!\(http://www\.heise\.de/\)newsticker/meldung/[^./]*\(-[0-9]+\)\.html[^ ]*!\1\2!'
Be aware that getting the RE right can be quite tricky; assume you'll need to test it! (This is a key part of the “now you have two problems” quote; REs very easily become horrendous.)

Something like this maybe?
... | awk -F'[^0-9]*' '{print "http://www.heise.de/-"$2}'

This might work for you (GNU sed):
sed 's|\(//[^/]*/\).*\(-[0-9]\{7\}\).*|\1\2|' file
You can place the first sed command so:
feedstail -u http://www.heise.de/newsticker/heise-atom.xml -r -i 60 -f "{published}> {title} {link}" |
sed 's/^\(.\{3\}\)\(.\{13\}\)\(.\{6\}\)\(.\{3\}\)\(.*\)/\1\3\5/;s|\(//[^/]*/\).*\(-[0-9]\{7\}\).*|\1\2|'

Bash script - mass modify files sed regular expression

I have a set of .csv files (all in one folder) with the format shown below:
170;151;104;137;190;125;170;108
195;192;164;195;171;121;133;104
... (a lot more rows) ...
The thing is I screwed up a bit and it should look like this
170;151;104.137;190.125;170;108
195;192;164.195;171.121;133;104
In case the difference is too subtle to notice:
I need to write a script that changes every third and fifth semicolon into a period in every row in efery file in that folder.
My research indicate that I have to devise some clever sed s/ command in my script. The problem is I'm not very good with regular expressions. From reading the tutorial it's probably gonna involve something with /3 and /5.

Here's a really short way to do it:
sed 's/;/./3;s/;/./4' -iBAK *
It replaces the 3rd and then the 5th (which is now the 4th) instances of the ; with ..
I tested it on your sample (saved as sample.txt):
$ sed 's/;/./3;s/;/./4' <sample.txt
170;151;104.137;190.125;170;108
195;192;164.195;171.121;133;104
For safety, I have made my example back up your originals as <WHATEVER>.BAK. To prevent this, change -iBAK to -i.
This script may not be totally portable but I've tested it on Mac 10.8 with BSD sed (no idea what version) and Linux with sed (gsed) 4.1.4 (2003). #JonathanLeffler notes that it's standard POSIX sed as of 2008. I also just found it and like it a lot.
Golf tip: If you run the command from bash, you can use brace expansion to achieve a supremely short version:
sed -es/\;/./{3,4} -i *

Here's one way:
sed -i 's/^\([^;]*;[^;]*;[^;]*\);\([^;]*;[^;]*\);/\1.\2./' foldername/*
(Disclaimer: I did test this, but some details of sed are not fully portable. I don't think there's anything non-portable in the above, so it should be fine, but please make a backup copy of your folder first, before running the above. Just in case.)

Extracting username from UNIX path using Regex

I need to get a username from an Unix path with this format:
/home/users/myusername/project/number/files
I just want "myusername" I've been trying for almost a hour and I'm completely clueless.
Any idea?
Thanks!

Maybe just /home/users/([a-zA-Z0-9_\-]*)/.*?
Note that the critical part [a-zA-Z0-9_\-]* has to contain all valid characters for unix usernames. I took from here, that a username should only contain digits, characters, dashes and underscores.
Also note that the extracted username is not the whole matching, but the first group (indicated by (...)).

The best answer to this depends on what you are trying to achieve. If you want to know the user who owns that file then you can use the stat command, this unfortunately has slightly different syntax dependant on the operating system however the following two commands work
Max OS/X
stat -f '%Su' /home/users/myusername/project/number/files
Redhat/Fedora/Centos
stat -c '%U' /home/users/myusername/project/number/files
If you really do want the string following /home/users then the either of the Regexes provided above will do that, you could use that in a bash script as follows (Mac OS/X)
USERNAME=$(echo '/home/users/myusername/project/number/files' | \
sed -E -e 's!^/home/users/([^/]+)/.*$!\1!g')

Check http://rubular.com/r/84zwJmV62G. The first match, not the entire match, is the username.

in a bourne shell something like :
string="/home/users/STRINGWEWANT/some/subdir/here"
echo $string | awk -F\/ '{print $3}'
would be one option, assuming its always the third element of the path. There are more lightweight that use only the shell builtins :
echo ${x#*users/}
will strip out everything up to and including 'users/'
echo ${y%%/*}
Will strip out the remainder.
So to put it all together :
export path="/home/users/STRINGWEWABT/some/other/dirs"
export y=`echo ${path#*users/}` && echo ${y%%/*}
STRINGWEWABT
Also checkout the bash manpage and search for "Parameter Expansion"

(\/home\/users\/)([^\/]+)
The 2nd capture group (index 1) will be myusername

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Using Grep + Regex linux - regex

I am trying to find the value "PASS_MAX_DAYS" equal to 90 or less in the /etc/login.defs file But it does not work, I am testing on a suse 12 server but the command does not work. grep "^PASS_MAX_DAYS\s*([0-9]|[1-8][0-9]|90)" /etc/login.defs Thanks for your support and time

grep, by default, doesn't understand extended regular expressions .. grep -E "^PASS_MAX_DAYS\s([0-9]|[1-8][0-9]|90)\s$" /etc/login.defs will give you SOME result. That said, how many different entries for PASS_MAX_DAYS do you expect in that file?

Related

Finding files using regular expressions/wildcards

What is the difference b/w two sed commands below?

Complex changes to a URL with sed

Bash script - mass modify files sed regular expression

Extracting username from UNIX path using Regex

Categories

Resources

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Using Grep + Regex linux - regex

I am trying to find the value "PASS_MAX_DAYS" equal to 90 or less in the /etc/login.defs file But it does not work, I am testing on a suse 12 server but the command does not work. grep "^PASS_MAX_DAYS\s*([0-9]|[1-8][0-9]|90)" /etc/login.defs Thanks for your support and time

grep, by default, doesn't understand extended regular expressions .. grep -E "^PASS_MAX_DAYS\s*([0-9]|[1-8][0-9]|90)\s*$" /etc/login.defs will give you SOME result. That said, how many different entries for PASS_MAX_DAYS do you expect in that file?

Related

Finding files using regular expressions/wildcards

What is the difference b/w two sed commands below?

Complex changes to a URL with sed

Bash script - mass modify files sed regular expression

Extracting username from UNIX path using Regex

Categories

Resources

grep, by default, doesn't understand extended regular expressions .. grep -E "^PASS_MAX_DAYS\s([0-9]|[1-8][0-9]|90)\s$" /etc/login.defs will give you SOME result. That said, how many different entries for PASS_MAX_DAYS do you expect in that file?