I need to parse a file and randomize the last digits of a given string wherever the pattern is found.
I can get the desired result in a simple case, but it fails in a more complex one.
I am wondering what is wrong with the second case.
This example here works.
echo 'AB111-1-13' | sed 's/\(AB111\)-\([0-9]*\)-\([0-9]*\)/echo \1-\2-$(echo \3*$RANDOM | bc )/ge'
But this one doesn't work.
echo '<http://name/link#AB111-1-13>' | sed 's/\(AB111\)-\([0-9]*\)-\([0-9]*\)/echo \1-\2-$(echo \3*$RANDOM | bc )/ge'
Any ideas?
EDIT
This is the error message when trying to run the second example.
sh: -c: line 0: syntax error near unexpected token `newline'
sh: -c: line 0:'
The GNU sed e flag executes the pattern space as a shell command.
In your first example your pattern space starts as AB111-1-13 and becomes echo AB111-1-$(echo 13*$RANDOM | bc ), which is a valid shell command and gets executed. (I should point out that bc is entirely unnecessary here, as the shell can perform integer arithmetic just fine by itself: echo $((13 * RANDOM)).)
But in your second example your pattern space starts as <http://name/link#AB111-1-13> and becomes <http://name/link#echo AB111-1-$(echo 13*$RANDOM | bc )>, which is very much not a valid shell command, so you get a shell error when sed tries to execute it (the trailing > is a redirection with nothing after it, hence the syntax error near unexpected token newline).
So don't use sed for this. Use something that can evaluate arbitrary expressions like awk or perl or python, etc.
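Following that advice, here is a minimal awk sketch of the same transformation. It assumes the identifier always has the shape AB111-<digits>-<digits> and that multiplying the last number by a random integer in 0..32767 (the range of bash's $RANDOM) is what you want; the function name randomize_tail is hypothetical. awk's srand() seeds from the time of day, so two runs within the same second produce the same number.

```shell
# Hypothetical helper: multiply the last number of AB111-<n>-<m> by a random
# integer in [0, 32767], mirroring $RANDOM in the original sed/bc command.
randomize_tail() {
  awk 'BEGIN { srand() }                        # seed from the time of day
  {
    if (match($0, /AB111-[0-9]+-[0-9]+/)) {
      id = substr($0, RSTART, RLENGTH)          # e.g. AB111-1-13
      n = split(id, parts, "-")                 # parts[n] is the last number
      parts[n] = parts[n] * int(rand() * 32768)
      pre = substr($0, 1, RSTART - 1)           # text before the match
      post = substr($0, RSTART + RLENGTH)       # text after the match
      $0 = pre parts[1] "-" parts[2] "-" parts[n] post
    }
    print
  }'
}

# Usage: echo '<http://name/link#AB111-1-13>' | randomize_tail
```

Lines without the identifier pass through unchanged.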
I'm trying to write a script that, among other things, automatically enables multilib. That means that in my /etc/pacman.conf file, I have to turn this
#[multilib]
#Include = /etc/pacman.d/mirrorlist
into this
[multilib]
Include = /etc/pacman.d/mirrorlist
without accidentally removing # from lines like these
#[community-testing]
#Include = /etc/pacman.d/mirrorlist
I already accomplished this by using this code
linenum=$(rg -n '\[multilib\]' /etc/pacman.conf | cut -f1 -d:)
sed -i "$((linenum))s/#//" /etc/pacman.conf
sed -i "$((linenum+1))s/#//" /etc/pacman.conf
but I'm wondering whether this can be done in a single line of code without any arithmetic.
With GNU sed: find the line starting with #[multilib], append the next line to the pattern space (N), and then remove all # from the pattern space (s/#//g).
sed -i '/^#\[multilib\]/{N;s/#//g}' /etc/pacman.conf
If the two lines contain further #, then these are also removed.
Could you please try the following, written against the shown samples only. It assumes you only want to deal with the [multilib] header and the line right after it.
awk '
/^#*\[multilib\]$/ || found{
found=$0~/^#*\[multilib\]$/?1:""
sub(/^#+/,"")
}
1
' Input_file
Explanation:
First, check whether the line is the (possibly commented) [multilib] header or whether variable found is set; if so, run the instructions inside the block. Matching the exact header keeps lines like #[multilib-testing] out.
Inside the block, set found to 1 if the line is the header, otherwise clear it, so that only the one line after the header gets processed.
Use awk's sub function to remove one or more leading hashes.
The trailing 1 prints every line, changed or not, so the rest of the file is kept.
This will work using any awk in any shell on every UNIX box:
$ awk '$0 == "#[multilib]"{c=2} c&&c--{sub(/^#/,"")} 1' file
[multilib]
Include = /etc/pacman.d/mirrorlist
and if you had to uncomment 500 lines instead of 2 lines then you'd just change c=2 to c=500 (as opposed to typing N 500 times as with the currently accepted solution). Note that you also don't have to escape any characters in the string you're matching on. So in addition to being robust and portable this is a much more generally useful idiom to remember than the other solutions you have so far. See printing-with-sed-or-awk-a-line-following-a-matching-pattern/17914105#17914105 for more.
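As a sketch of that generalization, the header and the line count can be passed in as awk variables instead of being hard-coded. uncomment_block is a hypothetical name, and passing the header with -v assumes it contains no backslashes (awk processes escape sequences in -v values).

```shell
# uncomment_block FILE [N]: strip one leading # from the line that is exactly
# "#[multilib]" and from the N-1 lines after it (N defaults to 2).
uncomment_block() {
  awk -v hdr='#[multilib]' -v n="${2:-2}" '
    $0 == hdr { c = n }           # exact string match, no regex escaping needed
    c && c-- { sub(/^#/, "") }    # true for n lines, counting c down to 0
    1                             # print every line
  ' "$1"
}
```

For example, uncomment_block /etc/pacman.conf prints the modified file; redirect to a temporary file and move it back to apply the change.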
A perl one-liner:
perl -0777 -api.back -e 's/#(\[multilib]\R)#/$1/' /etc/pacman.conf
This modifies the file in place and keeps a backup of the original in /etc/pacman.conf.back.
If there is only one [multilib] entry, you can use ed with the shell's printf:
printf '/^#\[multilib\]$/;+1s/^#//\n,p\nQ\n' | ed -s /etc/pacman.conf
Change Q to w to actually write the changes to pacman.conf.
Match #[multilib]
; include the next address
+1 the next line (plus one line below)
s/^#// remove the leading #
,p prints everything to stdout
Q quit ed unconditionally, without the warning about unsaved changes.
-s suppresses ed's diagnostics, such as byte counts.
Ed can do this.
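cat > edjoin.txt << 'EOF'
/^#\[multilib\]$/;+j
s/#//
s/#/\
/
wq
EOF
ed -s pacman.conf < edjoin.txt
rm -v ./edjoin.txt
The EOF delimiter is quoted so the shell passes the backslash-newline in the second s command through to ed unchanged (in an unquoted here-document it would be removed as a line continuation), and > instead of >> avoids appending to a leftover edjoin.txt.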
This will only work on the first match. If you have more matches, repeat as necessary.
This might work for you (GNU sed):
sed '/^#\[multilib\]/,+1 s/^#//' file
Focus on a range of lines (in this case, two) where the first line begins #[multilib] and remove the first character in those lines if it is a #.
N.B. The [ and ] must be escaped in the regexp, otherwise they form a bracket expression that matches a single character that is m, u, l, t, i or b. The range can be extended by changing +1 to +n if you want to uncomment the matching line plus the n lines after it.
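If you need this on a sed without GNU's addr,+N range, here is a sketch of a portable equivalent (the function name uncomment_multilib is hypothetical): on the commented header, strip its #, let n print it and load the next line, then strip that line's #. Like the range version, it only touches the header and the one line after it.

```shell
# uncomment_multilib [FILE...]: portable sketch, no GNU addr,+N range needed.
uncomment_multilib() {
  sed '/^#\[multilib\]$/{
s/^#//
n
s/^#//
}' "$@"
}
```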
To remove all comments in a [multilib] section, perhaps:
sed '/^#\?\[[^]]*\]$/h;G;/^#\[multilib\]/M s/^#//;P;d' file
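That one-liner is dense, so here it is spread over several lines with comments, wrapped in a hypothetical uncomment_section function. The idea, as I read it: every line gets the most recent section header appended to it (G), and the leading # is stripped when that header is #[multilib]. It needs GNU sed for \? and the M flag.

```shell
# uncomment_section [FILE...]: hypothetical wrapper around the one-liner above (GNU sed).
uncomment_section() {
  sed '
    # Keep the most recent section header, commented or not, in the hold space.
    /^#\?\[[^]]*\]$/h
    # Append that header to the current line: "line<newline>header".
    G
    # M lets ^ match after the embedded newline, so this also matches when the
    # appended header is #[multilib]; the s command has no M flag, so its ^
    # is the start of the original line.
    /^#\[multilib\]/M s/^#//
    # Print the original line only, then start the next cycle.
    P
    d
  ' "$@"
}
```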
Weird problem here that I don't seem to see repeated anywhere else, so posting here. Thanks in advance.
I have the following multiline sed code that is printing further sed and copy commands into a script (yep, using a script to insert code into a script). The code looks like this:
sed -i -r '/(rpub )([$][a-zA-Z0-9])/i\
sed -i '\''/#PBS -N/d'\'' \1\
cp \1 '"$filevariable"'' $masterscript
which is supposed to do the following:
1.) Open the master script
2.) Navigate to each instance of rpub $[a-zA-Z0-9] in the script
3.) Insert the second line (sed) and third line (cp) as lines before the rpub instance, using \1 as a backreference to the matched $[a-zA-Z0-9] from step 2.
This works great; all lines print well enough in relation to each other. However, all of my \1 references are appearing explicitly, minus their backslashes. So all of my \1's are appearing as 1.
I know my pattern match specifications are working correctly, as they nail all instances of rpub $[a-zA-Z0-9] well enough, but I guess I'm just not understanding the use of backreferences. Anyone see what is going on here?
Thanks.
EDIT 1
Special thanks to Ed Morton below. I implemented the following, which gets me 99% of the way there, but I still can't close the gap because of unexpected behavior:
awk -v fv="$filevariable" '
match($0, /rpub( [$][[:alnum:]])/, a)
{
print "sed -i '\''/#PBS -N/d'\''", a[1]
}
1' "$masterscript" > tmpfile && mv tmpfile "$masterscript"
Note: I removed one of the multiline print statements, as it isn't important here. But, as I said, though this gets me much closer I am still having an issue where the printed lines appear between every line in the masterscript; it is as if the matching function is considering every line to be a match. This is my fault, as I should probably have specified that I'd like the following to occur:
stuff here
stuff here
rpub $name
stuff here
rpub $othername
stuff here
would become:
stuff here
stuff here
inserted line $name
rpub $name
stuff here
inserted line $othername
rpub $othername
Any help would be greatly appreciated. Thanks!
It LOOKS like what you're trying to do could be written simply in awk as:
awk -i inplace -v fv="$filevariable" '
match($0,/rpub ([$][[:alnum:]]+)/,a) {
print "sed -i \"/#PBS -N/d\"", a[1]
print "cp", a[1], fv
}
1' "$masterscript"
but without sample input and expected output it's just a guess.
The above uses GNU awk for inplace editing and the 3rd arg for match().
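About the every-line problem in EDIT 1: there the { sits on the line after match(...), so awk reads match(...) as a pattern with no action (which prints the matching lines) and the { ... } block as a separate action with no pattern, which runs for every line. Keeping the { on the same line as the pattern avoids that. If GNU awk is not available, here is a sketch for any POSIX awk that uses RSTART/RLENGTH instead of the third match() argument and captures the whole variable name ($name, whereas [$][[:alnum:]] without a + captures only $n). add_cmds is a hypothetical name, [A-Za-z0-9_] stands in for [[:alnum:]] because some older awks lack character classes, and writing the result back still needs a temporary file as in EDIT 1.

```shell
# add_cmds FV FILE: print two generated commands before each "rpub $var" line.
add_cmds() {
  awk -v fv="$1" '
  match($0, /rpub [$][A-Za-z0-9_]+/) {
    var = substr($0, RSTART + 5, RLENGTH - 5)   # skip the 5 characters "rpub "
    print "sed -i \"/#PBS -N/d\"", var
    print "cp", var, fv
  }
  1' "$2"
}
```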
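A backreference such as \1 only works inside a regex or the replacement of an s command, and it refers to a parenthesized group of that same regex. The text of an i\ command is inserted literally, and sed just strips a backslash in front of an ordinary character, which is why your \1 shows up as 1. Your second line is also a separate invocation of sed, so nothing is saved from the match on the first line.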
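Here is a sketch of the sed-only route, assuming GNU sed (for -E and for \n meaning a newline in the replacement): do the insertion inside an s command, where \1 does refer to the captured group, and put the matched text back with &. The text "inserted line" is the placeholder from the question's expected output, and insert_before_rpub is a hypothetical name.

```shell
# insert_before_rpub [FILE...]: GNU sed; \1 is the captured $name, & the whole match.
insert_before_rpub() {
  sed -E 's/^rpub ([$][A-Za-z0-9_]+)/inserted line \1\n&/' "$@"
}
```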
I am attempting to parse (with sed) just First Last from the following DN(s) returned by the DSCL command in OSX terminal bash environment...
CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu
I have tried multiple regexs from this site and others with questions very close to what I wanted... mainly this question... I have tried following the advice to the best of my ability (I don't necessarily consider myself a newbie...but definitely a newbie to regex..)
DSCL returns a list of DNs, and I would like to only have First Last printed to a text file. I have attempted using sed, but I can't seem to get the correct function. I am open to other commands to parse the output. Every line begins with CN= and then there is a comma between Last and OU=.
Thank you very much for your help!
I think all of the regular expression answers provided so far are buggy, insofar as they do not properly handle escaped ',' characters in the common name. For example, consider a distinguishedName like:
CN=Doe\, John,CN=Users,DC=example,DC=local
Better to use a real library able to parse the components of a distinguishedName. If you're looking for something quick on the command line, try piping your DN to a command like this:
echo "CN=Doe\, John,CN=Users,DC=activedir,DC=local" | python3 -c 'import ldap, sys; print(ldap.dn.explode_dn(sys.stdin.read().strip(), notypes=True)[0])'
(depends on having the python-ldap library installed). You could cook up something similar with PHP's built-in ldap_explode_dn() function.
Two cut commands is probably the simplest (although not necessarily the best):
DSCL | cut -d, -f1 | cut -d= -f2
First, split the output from DSCL on commas and print the first field ("CN=First Last"); then split that on equal signs and print the second field.
Using sed:
sed 's/^CN=\([^,]*\).*/\1/' input_file
^ matches start of line
CN= literal string match
\([^,]*\) everything until a comma
.* rest
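Building on this, here is a minimal sketch (hypothetical function cn_of, using sed -E, which both GNU and macOS sed accept) that also steps over backslash-escaped commas such as Doe\, John and then removes the escaping backslashes. It does not handle the full DN escaping rules (for example hex escapes such as \2C), so a real DN parser remains the safer choice.

```shell
# cn_of [FILE...]: print the CN value of lines that start with CN=.
cn_of() {
  # 1st s: keep CN text made of non-comma characters or backslash escapes.
  # 2nd s: drop the escaping backslash (Doe\, John -> Doe, John).
  sed -E 's/^CN=(([^,\\]|\\.)*).*/\1/; s/\\(.)/\1/g' "$@"
}
```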
http://www.gnu.org/software/gawk/manual/gawk.html#Field-Separators
awk -v RS=',' -v FS='=' '$1=="CN"{print $2}' foo.txt
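One caveat with RS=',' on input that has one DN per line: the newline between two DNs stays inside a record (for example DC=edu, a newline, then CN=...), so that record's first field is DC and the next DN's CN is skipped. Here is a sketch that instead walks each line's comma-separated components; like the one-liner, it prints every CN component and does not handle escaped commas. cn_values is a hypothetical name.

```shell
# cn_values [FILE...]: print every CN=... value, reading one DN per line.
cn_values() {
  awk -F',' '{
    for (i = 1; i <= NF; i++) {
      eq = index($i, "=")                    # position of the first =
      if (eq > 1 && substr($i, 1, eq - 1) == "CN")
        print substr($i, eq + 1)
    }
  }' "$@"
}
```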
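I like awk too, so I print the first comma-separated field from the fourth char onwards (FS is set with -F, because assigning FS inside the main block only takes effect from the next line, so the first line would still be split on whitespace):
DSCL | awk -F',' '{print substr($1,4)}' > filterednames.txt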
This regex will parse a distinguished name, giving name and val capture groups for each match.
When DN strings contain commas, they are meant to be quoted. This regex correctly handles both quoted and unquoted strings, and also handles escaped quotes in quoted strings:
(?:^|,\s?)(?:(?<name>[A-Z]+)=(?<val>"(?:[^"]|"")+"|[^,]+))+
Here it is, nicely formatted:
(?:^|,\s?)
(?:
(?<name>[A-Z]+)=
(?<val>"(?:[^"]|"")+"|[^,]+)
)+
Here's a link so you can see it in action:
https://regex101.com/r/zfZX3f/2
If you want a regex to get only the CN, then this adapted version will do it:
(?:^|,\s?)(?:CN=(?<val>"(?:[^"]|"")+"|[^,]+))
The overarching problem:
So I have a file name that comes in the form of
JohnSmith14_120325_A10_6.raw
and I want to match it using regex. I have a couple of issues in building a working example but unfortunately my issues won't be solved unless I get the basics.
So I have just recently learned about piping and one of the cool things I learned was that I can do the following.
X=ll_paprika.sc (don't ask)
VAR=`echo $X | cut -d_ -f2`
echo $VAR
which gives me paprika.sc
Now when I try to execute the pipe idea in grep, nothing happens.
x=ll_paprika.sc
VAR=`echo $X | grep *.sc`
echo $VAR
Can anyone explain what I am doing wrong?
Second question:
How does one match a single underscore using regex?
Here's what I am ultimately trying to do;
VAR=`echo $X | grep -e "^[a-bA-Z][a-bA-Z0-9]*(_){1}[0-9]*(_){1}[a-bA-Z0-9]*(_){1}[0-9](\.){1}(raw)"
So the basic idea of my pattern here is that the file name must start with a letter, and then it can have any number of letters and numbers following it, and it must have an _ to delimit a series of numbers, another _ to delimit the next set of numbers and characters, and another _ to delimit the next set of numbers, and then it must have a single period followed by raw. This looks grossly wrong and ugly (because I am not sure about the syntax). So how does one match a file extension? Can someone put up a simple example for something like ll_paprika.sc so that I can figure out how to do my own regex?
Thanks.
x=ll_paprika.sc
VAR=`echo $X | grep *.sc`
echo $VAR
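The reason this isn't doing what you want is that grep matches a line and returns the whole line, not just the part that matched. Two more things get in the way: *.sc is unquoted, so the shell may expand it to file names in the current directory before grep sees it, and in a basic regex a leading * is a literal asterisk, so '*.sc' does not match ll_paprika.sc at all. A quoted pattern such as '\.sc$' does match, and grep then returns that whole line and sticks it in $VAR.
If you want just a part of it, the cut line is probably better. There is a grep -o option that returns only the matching portion, but for this you'd basically have to put in the thing you were looking for, at which point why bother?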
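A small sketch of the difference, with hypothetical helper names: the same quoted pattern returns the whole line with plain grep, while grep -o (supported by GNU and BSD grep, though not required by POSIX) returns only the text the regex matched.

```shell
# Quote the pattern so the shell cannot glob-expand it, and escape the dot.
whole_line()   { printf '%s\n' "$1" | grep '\.sc$'; }          # the whole matching line
matched_part() { printf '%s\n' "$1" | grep -o '[^_]*\.sc$'; }  # only the matched text
```

With X=ll_paprika.sc, whole_line "$X" prints ll_paprika.sc and matched_part "$X" prints paprika.sc.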
the file name must start with a letter
grep -E "^[a-zA-Z]
and then it can have any number of letters and numbers following it
[a-zA-Z0-9]*
and it must have an _ delimit a series of numbers and another _ to delimit the next set of numbers and characters and another _ to delimit the next set of numbers
_[0-9]+_[a-zA-Z0-9]+_[0-9]+
and then it must have a single period followed by raw.
\.raw$"
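Putting the pieces together, here is a sketch that tests a name against the full pattern. is_raw_name is a hypothetical helper; the pattern is in single quotes so the shell leaves \. and $ alone, -E enables + as "one or more", and -q makes grep report the result only through its exit status.

```shell
# is_raw_name NAME: exit status 0 if NAME has the JohnSmith14_120325_A10_6.raw shape.
is_raw_name() {
  printf '%s\n' "$1" |
    grep -Eq '^[a-zA-Z][a-zA-Z0-9]*_[0-9]+_[a-zA-Z0-9]+_[0-9]+\.raw$'
}
```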
For the first, use:
VAR=`echo $X | grep -E '\.sc$'`
For the second, you can try this alternative instead:
VAR=`echo $X | grep -E '^[[:alpha:]][[:alnum:]]*_[[:digit:]]+_[[:alnum:]]+_[[:digit:]]+\.raw'`
Note that your character classes from your expression differ from the description that follows in that they seem to only be permissive of a-b for lower case characters in some places. This example is permissive of all alphanumeric characters in those places.
Using sed, how do I return everything from the last occurrence of a match to the end of the file?
(FYI this has been simplified)
So far I've tried:
sed -n '/ Statistics |/,$p' logfile.log
Which returns all lines from the first match onwards (almost the entire file)
I've also tried:
linenum=`tail -400 logfile.log | grep -n " Statistics |" | tail -1 | cut -d: -f1`
sed "$linenum,\$!d" logfile.log
This works, but I can't run it over an ssh connection as a single command; I really need it all to be in one pipeline.
Format of the log file is as follows:
(There are statistics headers with sub data written to the log file every minute, the purpose of this command is to return the most recent Statistics header together with any associated errors that occur after the header)
Statistics |
Stuff
More Stuff
Even more Stuff
Statistics |
Stuff
More Stuff
Error: incorrect value
Statistics |
Stuff
More Stuff
Even more Stuff
Statistics |
Stuff
Error: error type one
Error: error type two
EOF
Return needs to be:
Statistics |
Stuff
Error: error type one
Error: error type two
Your example script has a space before Statistics but your sample data doesn't seem to. This has a regex which assumes Statistics is at beginning of line; tweak if that's incorrect.
sed -n '/^Statistics |/h;/^Statistics |/!H;$!b;x;p'
When you see Statistics, replace the hold space with the current line (h). Otherwise, append to the hold space (H). If we are not at the end of file, stop here (b). At end of file, print out the hold space (x retrieve contents of hold space; p print).
In a sed script, commands are optionally prefixed by an "address". Most commonly this is a regex, but it can also be a line number. The address /^Statistics |/ selects all lines matching the regular expression; /^Statistics |/! selects lines not matching the regular expression; and $! matches all lines except the last line in the file. Commands with no explicit address are executed for all input lines.
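If the remote sed is not GNU sed, note that POSIX only ends a b label at a newline, so $!b;x;p may be read as a branch to a label named ";x;p". Here is a sketch that sidesteps this by giving each piece its own -e, since sed joins -e arguments as separate lines; last_stats is a hypothetical name.

```shell
# last_stats [FILE...]: same logic as the one-liner, one command per -e.
last_stats() {
  sed -n \
    -e '/^Statistics |/h' \
    -e '/^Statistics |/!H' \
    -e '$!b' \
    -e 'x;p' \
    "$@"
}
```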
Edit: explained the script in some more detail and added the following.
Note that if you need to pass this to a remote host using ssh, you will need additional levels of quoting. One possible workaround if it gets too complex is to store this script on the remote host, and just ssh remotehost path/to/script. Another possible workaround is to change the addressing expressions so that they don't contain any exclamation marks (these are problematic on the command line, e.g. in Bash, where ! can trigger history expansion inside double quotes).
sed -n '/^Statistics |/{h;b};H;${x;p}'
This is somewhat simpler, too!
A third possible workaround, if your ssh pipeline's stdin is not tied up for other things, is to pipe in the script from your local host.
echo '/^Statistics |/h;/^Statistics |/!H;$!b;x;p' |
ssh remotehost sed -n -f - file
If you have tac available:
tac INPUTFILE | sed '/^Statistics |/q' | tac
This might work for you:
sed '/Statistics/h;//!H;$!d;x' file
Statistics |
Stuff
Error: error type one
Error: error type two
If you're happy with an awk solution, this kinda works (apart from getting an extra blank line):
awk '/^Statistics/ { buf = "" } { buf = buf "\n" $0 } END { print buf }' input.txt
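The extra blank line comes from building buf as a newline followed by each line, so the output starts with an empty line. Here is a sketch that appends the newline after each line instead and prints the buffer with printf; last_stats_awk is a hypothetical name.

```shell
# last_stats_awk [FILE...]: print the last Statistics block without a blank line.
last_stats_awk() {
  awk '/^Statistics/ { buf = "" }   # new section: drop the old buffer
       { buf = buf $0 "\n" }        # newline after each line, not before
       END { printf "%s", buf }' "$@"
}
```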
sed ':a;N;$!ba;s/.*Statistics/Statistics/g' INPUTFILE
should work (GNU sed 4.2.1).
It reads the whole file to one string, then replaces everything from the start to the last Statistics (word included) with Statistics, and prints what's remaining.
HTH
This might also work; it is a slightly simpler version of the sed solutions given by others above:
sed -n 'H; /^Statistics |/h; ${g;p;}' logfile.log
Output:
Statistics |
Stuff
Error: error type one
Error: error type two