Indent line ranges with sed?


I'm trying to convert old-fashioned phpBB code blocks to Markdown using sed.
Please consider the following data sample:
cat sed.txt
[code]xxxx-YYY-xxxx[/code]
Some text
[code]yyyy-ZZZ-yyyy[/code]
More text
Bogus code block[/code]
[code]zzzz-XXX-zzzz[/code]
After long trial and error I've ended up with the following strategy:
sed -ne '
/\[code\].*\[\/code\]/ {
s#\[/*code\]##g
s#^#\n\n #
s#$#\n\n#p
}' sed.txt | cat -Av
$
$
xxxx-YYY-xxxx$
$
$
$
$
yyyy-ZZZ-yyyy$
$
$
$
$
zzzz-XXX-zzzz$
$
$
This works great, but I find it would be easier and more natural to do it this way:
sed -ne '
/\[code\].*\[\/code\]/ {
s#\[/*code\]#\n\n#g
s#^# #p
}' sed.txt | cat -Av
$
$
xxxx-YYY-xxxx$
$
$
$
$
yyyy-ZZZ-yyyy$
$
$
$
$
zzzz-XXX-zzzz$
$
$
But that does not work as expected. Any suggestions as to why, and how to get around this?
Thank you

sed '/\[code\].*\[\/code\]/ {
s#\[code]#& #g
s#\[/*code\]#\
\
#g
}' sed.txt
The order of the substitutions matters, and it changed between your two samples.
I also changed the behavior a bit: -n and p are not needed for this sample text (though they may be if the input comes from a larger structure).
(Tested on AIX, so with a POSIX sed.)

This might work for you (GNU sed):
sed -nr 's/^\[(code\])(.*)\[\/\1$/\n\n \2\n\n/p' file | sed -n l
N.B. In your script you first replace the tags with two newlines and then prepend the 4 spaces to the pattern space, so the indentation is added in front of the first newline, not in front of the text.
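To make the ordering point concrete, here is a minimal sketch, assuming GNU sed (it relies on `\n` in the replacement) and a hypothetical helper name `indent_code_lines`. It strips the tags first and then wraps the remaining text in one step, so the 4 spaces land directly in front of the text rather than in front of the leading newlines:

```shell
# Hypothetical helper: turn single-line [code]...[/code] lines into
# Markdown-style indented blocks (4 spaces) surrounded by blank lines.
# Assumes GNU sed, because \n is used in the replacement text.
indent_code_lines() {
  sed -n '/\[code\].*\[\/code\]/ {
    s#\[/*code\]##g
    s#.*#\n\n    &\n\n#p
  }' "$@"
}

# Example run on two lines of the sample data; only the [code] line is printed.
printf '%s\n' '[code]xxxx-YYY-xxxx[/code]' 'Some text' | indent_code_lines
```

Because `&` stands for the whole (already tag-free) line, the indentation and the blank lines are placed in a single substitution and their relative order cannot get mixed up.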

Related

awk to parse the ldap data between two strings linux

Hi, I want to extract the string between two strings, but in my case the leading string (such as KDP2002 or KDP1005) is not constant across the output: the number after KDP keeps changing, and the KDP+number prefix should not be printed.
$ ldapsearch -x -LLL -o ldif-wrap=no -b ou=Projects,ou=People,ou=KDI,o=KDP cn="alltest1p1" KDPHomeDirectory
dn: cn=alltest1p1,ou=Projects,ou=People,ou=KDI,o=KDP
KDPHomeDirectory: nisMapName=auto.home,ou=KDI_US-CDC01,ou=Locations,ou=KDI,o=KDP#0#Quality=scratch,NisMap=KDP2002:/proj/KDP2002_alltest1p1_scratch_c/q,Quota=20000,Id=scratch_c
KDPHomeDirectory: nisMapName=auto.home,ou=KDI_US-CDC01,ou=Locations,ou=KDI,o=KDP#0#Quality=economy,NisMap=KDP2002:/proj/KDP2002_alltest1p1/q,Quota=10000
KDPHomeDirectory: nisMapName=auto.home,ou=KDI_US-CDC01,ou=Locations,ou=KDI,o=KDP#0#Quality=scratch,NisMap=KDP2002:/proj/KDP2002_alltest1p1_scratch/q,Quota=20000,Id=scratch
KDPHomeDirectory: nisMapName=auto.home,ou=KDI_US-CDC01,ou=Locations,ou=KDI,o=KDP#0#Quality=scratch,NisMap=KDP2002:/proj/KDP2002_alltest1p1_scratch_a/q,Quota=20000,Id=scratch_a
Attempt that partially works:
$ ldapsearch -x -LLL -o ldif-wrap=no -b ou=Projects,ou=People,ou=KDI,o=KDP cn="alltest1p1" KDPHomeDirectory | grep -o -P '(?<=NisMap=).*(?=,Quota)'
KDP2002:/proj/KDP2002_alltest1p1/q
KDP2002:/proj/KDP2002_alltest1p1_scratch/q
KDP2002:/proj/KDP2002_alltest1p1_scratch_a/q
Expected output:
/proj/KDP2002_alltest1p1/q
/proj/KDP2002_alltest1p1_scratch/q
/proj/KDP2002_alltest1p1_scratch_a/q

I would use GNU sed for this task in the following way. Let the content of file.txt be
KDPHomeDirectory: nisMapName=auto.home,ou=KDI_US-CDC01,ou=Locations,ou=KDI,o=KDP#0#Quality=scratch,NisMap=KDP2002:/proj/KDP2002_alltest1p1_scratch_c/q,Quota=20000,Id=scratch_c
KDPHomeDirectory: nisMapName=auto.home,ou=KDI_US-CDC01,ou=Locations,ou=KDI,o=KDP#0#Quality=economy,NisMap=KDP2002:/proj/KDP2002_alltest1p1/q,Quota=10000
KDPHomeDirectory: nisMapName=auto.home,ou=KDI_US-CDC01,ou=Locations,ou=KDI,o=KDP#0#Quality=scratch,NisMap=KDP2002:/proj/KDP2002_alltest1p1_scratch/q,Quota=20000,Id=scratch
KDPHomeDirectory: nisMapName=auto.home,ou=KDI_US-CDC01,ou=Locations,ou=KDI,o=KDP#0#Quality=scratch,NisMap=KDP2002:/proj/KDP2002_alltest1p1_scratch_a/q,Quota=20000,Id=scratch_a
then
sed 's/.*KDP2002:\([^,]*\).*/\1/' file.txt
gives output
/proj/KDP2002_alltest1p1_scratch_c/q
/proj/KDP2002_alltest1p1/q
/proj/KDP2002_alltest1p1_scratch/q
/proj/KDP2002_alltest1p1_scratch_a/q
Explanation: I use a single capturing group, denoted by \( and \), which contains zero or more (*) non-comma characters ([^,]) located right after KDP2002:. The pattern is prefixed and suffixed with .* so that it spans the whole line, which is then replaced by the captured group.
(tested in GNU sed 4.2.2)
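Since the question says the number after KDP changes between entries, the same idea can be generalized. The following is a sketch under that assumption, not the answer's original command: `KDP[0-9]*:` replaces the hard-coded `KDP2002:`, the match is anchored on `NisMap=`, and `-n` with the `p` flag suppresses lines (such as the `dn:` line) that contain no `NisMap=` entry. The function name `extract_nismap_path` is hypothetical.

```shell
# Hypothetical helper: print the path that follows NisMap=KDP<digits>:
# and stop at the next comma. Lines without a NisMap entry are skipped.
extract_nismap_path() {
  sed -n 's/.*NisMap=KDP[0-9]*:\([^,]*\).*/\1/p' "$@"
}

# Example with one line from the question's sample data.
echo 'KDPHomeDirectory: nisMapName=auto.home,ou=KDI_US-CDC01,ou=Locations,ou=KDI,o=KDP#0#Quality=economy,NisMap=KDP2002:/proj/KDP2002_alltest1p1/q,Quota=10000' |
  extract_nismap_path
```

It can also be fed directly from the ldapsearch pipeline in place of the grep command.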

1st solution: With your shown samples only, please try following GNU awk code.
awk -v RS='=KDP[0-9]+:([^,]+)' 'RT{split(RT,arr,":");print arr[2]}' Input_file
2nd solution: With any awk version, using awk's match function, with your shown samples please try following code.
awk '
match($0,/=KDP[0-9]+:([^,]+)/){
split(substr($0,RSTART,RLENGTH),arr,":")
print arr[2]
}
' Input_file

Using gnu-grep you can use:
grep -oP '=KDP\d+:\K[^,]+'
/proj/KDP2002_alltest1p1_scratch_c/q
/proj/KDP2002_alltest1p1/q
/proj/KDP2002_alltest1p1_scratch/q
/proj/KDP2002_alltest1p1_scratch_a/q
Here \K resets/discards matched info to give you desired output after KDP\d+: only.
Alternatively you can use this gnu-awk command:
awk 'match($0, /=KDP[0-9]+:([^,]+)/, a) {print a[1]}' file
/proj/KDP2002_alltest1p1_scratch_c/q
/proj/KDP2002_alltest1p1/q
/proj/KDP2002_alltest1p1_scratch/q
/proj/KDP2002_alltest1p1_scratch_a/q

Why doesn't this sed expression remove lines with Korean as expected?

I combined these two answers to produce this sed command:
sed '/[\u3131-\uD79D]/d' text.txt # Remove all lines with Korean characters
However it outputs only the lines with Korean characters:
$ cat text.txt
1
00:00:00,000 --> 00:00:05,410
안녕하세요 오늘은 버터플라이 가드를 하고 있는 상대에게
Hello, today we're going to explain how to use the
$ sed '/[\u3131-\uD79D]/d' text.txt # Korean characters pattern fails
안녕하세요 오늘은 버터플라이 가드를 하고 있는 상대에게
$ sed '/Hello/d' text.txt # Simple pattern works
1
00:00:00,000 --> 00:00:05,410
안녕하세요 오늘은 버터플라이 가드를 하고 있는 상대에게
$ sed '/[0-9]/d' text.txt # Simple range works
안녕하세요 오늘은 버터플라이 가드를 하고 있는 상대에게
Hello, today we're going to explain how to use the
$ sed --version # Git Bash for Windows 2.33.0.windows.2
sed (GNU sed) 4.8
Is this a bug with sed? I was able to use the equivalent command in gVim successfully:
:g/[\u3131-\uD79D]/d

This is due to the collation order of the range in the bracket expression, since sed follows POSIX. You need a locale whose collation order sorts by Unicode code point, such as C.UTF-8, and you then need to encode the range endpoints in UTF-8. There is an explanation of the details here.
This is how you apply it to your range in a bash shell (I tested on Linux):
$ # first get octal representation of range unicode code points
$ # iconv is to convert to utf-8 in case your locale is not utf-8
$ printf "\u3131\uD79D" | iconv -t utf-8 | od -An -to1
343 204 261 355 236 235
$ # format it as a sed range
$ printf '\o%s\o%s\o%s-\o%s\o%s\o%s' $(printf "\u3131\uD79D" | iconv -t utf-8 | od -An -to1); echo
\o343\o204\o261-\o355\o236\o235
$ # use the range in sed
$ LC_ALL=C.UTF-8 sed '/[\o343\o204\o261-\o355\o236\o235]/d' text.txt
...
$
Here is the output:
$ LC_ALL=C.UTF-8 sed '/[\o343\o204\o261-\o355\o236\o235]/d' text.txt
1
00:00:00,000 --> 00:00:05,410
Hello, today we're going to explain how to use the
$ sed '/[\u3131-\uD79D]/d' text.txt # Korean characters pattern fails
$ sed '/Hello/d' text.txt # Simple pattern works
1
00:00:00,000 --> 00:00:05,410
$ sed '/[0-9]/d' text.txt # Simple range works
Hello, today we're going to explain how to use the
$
EDIT: helper script/functions
This bash script or its functions can be used to obtain a sed unicode range:
#!/bin/bash
# sur - sed unicode range
#
# Converts a unicode range into an octal utf-8 range suitable for sed
#
# Usage:
# sur \\u452 \\u490
#
# sur \\u3131 \\uD79D
to_octal() {
printf "$1" | iconv -t utf-8 | od -An -to1 | sed 's/ \([0-9][0-9]*\)/\\o\1/g'
}
sur () {
echo "$(to_octal $1)-$(to_octal $2)"
}
sur $1 $2
To use the script, make sure it is executable and in your PATH. Here is an example of how to use the functions; I just copied and pasted them into a bash shell:
$ to_octal() {
> printf "$1" | iconv -t utf-8 | od -An -to1 | sed 's/ \([0-9][0-9]*\)/\\o\1/g'
> }
$
$ sur () {
> echo "$(to_octal $1)-$(to_octal $2)"
> }
$
$ sur \\u3131 \\uD79D
\o343\o204\o261-\o355\o236\o235
$ sur \\u452 \\u490
\o321\o222-\o322\o220
$

How do I replace the second occurrence of a whitespace in each line with 'sed' or 'awk'?

I have a file hashes which has many lines that look like this:
wget https://ipfs.io/ipfs/QmbKi6XiMmf4YfvKXhqVPymD1HDwJ3WqukjyLuEvnrZrCz The_Supremes_-_My_World_Is_Empty_Without_You_(lyrics).mkv
All the lines in hashes will follow the pattern:
wget https://ipfs.io/ipfs/hashthatis46characterlong nameOfAfileWithoutSpaces
as they are written by my script with the following lines of code:
find ~/pCloudDrive/VisualArts/Films/Fiction_Movies -maxdepth 1 -type f -size +200M -exec ipfs add --nocopy {} \;>>~/CS/ipfs/hashes && \
sed -i 's;added ;wget https://ipfs.io/ipfs/;g' ~/CS/ipfs/hashes
All hashes are going to be 46-character long and they typically start with 'Qm' but this may not necessarily be
the case in the future.
I want to replace the second space of each line of this file with ' -O ' so that it looks like:
wget https://ipfs.io/ipfs/hashthatis46characterlong -O nameOfAfileWithoutSpaces
I tried sed 's/[0-9A-z]{46,46}\s/& -O /g' hashes but to no avail - I get the following output:
sed: -e expression #1, char 27: Invalid range end
How do I do this? Would awk present a better solution for this problem than sed?

Using GNU awk and gensub() to change the second occurrence on each record:
$ awk '{print gensub(/ /," -O ","2")}' file
For example:
$ echo 1 2 3 4 5 | awk '{print gensub(/ /," -O ","2")}'
1 2 -O 3 4 5

As simple as this
sed 's/ / -O /2' input
where the trailing 2 in the sed command means "the second occurrence".
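As a small runnable illustration of the numeric flag, here is a sketch with hypothetical function names. The second variant is an assumption about what the question's attempt was aiming for: in basic regular expressions a bare `{46,46}` is taken literally, so it uses `-E` (extended syntax) with an explicit `[0-9A-Za-z]` class instead of the `A-z` range, whose validity can depend on the locale.

```shell
# Replace only the second space on each line; the trailing 2 selects the
# second match of the pattern.
add_output_flag() {
  sed 's/ / -O /2' "$@"
}

# Stricter variant (assumption: hashes are exactly 46 alphanumeric characters).
# -E enables extended regular expressions so {46} works as an interval.
add_output_flag_strict() {
  sed -E 's#^(wget https://ipfs\.io/ipfs/[0-9A-Za-z]{46}) #\1 -O #' "$@"
}

echo 'wget https://ipfs.io/ipfs/QmbKi6XiMmf4YfvKXhqVPymD1HDwJ3WqukjyLuEvnrZrCz movie.mkv' |
  add_output_flag
```

The simple variant is usually enough here because the lines are generated by a script and always have the same three fields.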

As you have nameOfAfileWithoutSpaces, it is possible to get the desired result another way using GNU sed, namely:
s/\([^[:space:]]*\)$/-O \1/
This captures the non-whitespace characters that are followed by the end of the line ($) and replaces them with -O followed by those characters. I tested it using sed.js.org; for the input
wget https://ipfs.io/ipfs/hashthatis46characterlong nameOfAfileWithoutSpaces
wget https://ipfs.io/ipfs/hashthatis46characterlong anotherName
output is
wget https://ipfs.io/ipfs/hashthatis46characterlong -O nameOfAfileWithoutSpaces
wget https://ipfs.io/ipfs/hashthatis46characterlong -O anotherName

Another awk:
$ awk '{$3="-O" OFS $3}1' file

Regex and sed in sh script not evaluating properly

first post here. Trying to capture just the integer output from an SNMP reply with regex. I've used a regex tester to come up with the correct pattern match but sed refuses to output the result. This is just a primitive fact finding script right now, it'll grow into something more complex but right now this is my stumbling block.
The reply to each line of the snmpget statements are:
IF-MIB::ifInOctets.1001 = Counter32: 692749329
IF-MIB::ifOutOctets.1001 = Counter32: 3119381688
I want to capture just the value after "Counter32: " and the regex (?<=: )(\d+) accomplishes that in the testers I could find online.
#!/bin/sh
SED_IFACES="-e '/(?<=: )(\d+)/g'"
INTERNET_IN=`snmpget -v 2c -c public 123.45.678.9 1.3.6.1.2.1.2.2.1.10.1001` | eval sed $SED_IFACES
INTERNET_OUT=`snmpget -v 2c -c public 123.45.678.9 1.3.6.1.2.1.2.2.1.16.1001` | eval sed $SED_IFACES
echo $INTERNET_IN
echo $INTERNET_OUT

$ cat file
IF-MIB::ifInOctets.1001 = Counter32: 692749329
IF-MIB::ifOutOctets.1001 = Counter32: 3119381688
$ awk '{print $NF}' file
692749329
3119381688
$ sed 's/.* //' < file
692749329
3119381688

You can do
sed 's/^.*Counter32: \(.*\)$/\1/'
Which captures the value and prints it out with the \1.
Also note that you are using Perl regular expressions in your example, and sed does not support these. It is also missing the substitution "s/" part.
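Putting this together for the script in the question, here is a minimal sketch. Note one further detail it assumes needs fixing: in the original script the pipe to sed sits outside the backticks, so the variable never receives sed's output. `fake_snmpget` is a hypothetical stand-in that just echoes one of the sample replies, so the sketch runs without an SNMP agent; in real use it would be replaced by the snmpget command line from the question.

```shell
#!/bin/sh
# Hypothetical stand-in for the real snmpget call; prints a sample reply.
fake_snmpget() {
  echo 'IF-MIB::ifInOctets.1001 = Counter32: 692749329'
}

# The whole pipeline is inside $(...), so the variable gets sed's output.
# No eval is needed when the sed expression is written inline.
INTERNET_IN=$(fake_snmpget | sed 's/^.*Counter32: \(.*\)$/\1/')

echo "$INTERNET_IN"
```

Quoting `"$INTERNET_IN"` in the echo keeps the value intact even if a reply ever contains unexpected whitespace.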

Renaming files with sed, escaping issues

I'm trying to write a bash script to remove spaces, underscores and dots and replace them with dashes. I also set to lowercase and remove brackets. That's the (long) second sed command, which seems to work.
The first sed call escapes the original names with spaces with '\ ' like when I tab complete, and this is the issue I think.
If I replace 'mv -i' with 'echo' I get what I think I want: the original filename escaped with backslashes and then the new name. If I paste this into the terminal it works, but with mv in the script the spaces cause problems. The escaping doesn't work.
#!/bin/bash
for a in "$@"; do
mv -i $(echo "$a" | sed -e 's/ /\\\ /g') $(echo "$a" | sed -e 's/\(.*\)/\L\1/' -e 's/_/-/g' -e 's/ /-/g' -e 's/---/--/g' -e 's/(//g' -e 's/)//g' -e 's/\[//g' -e 's/\]//g' -e 's/\./-/g' -e 's/-\([^-]*\)$/\.\1/')
done
The other solution is to put quotes around the names, but I can't work out how I would do this. I feel like I've got close, but I'm stumped.
I've also considered the 'rename' command, but you cannot do multiple operations like you can with sed.
Please point out any other issues, this is one of my first scripts. I'm not sure I got the "$@" or "$a" bits completely correct.
Cheers.
edit:
sample input filename
I am a Badly [named] (file) - PLEASE.rename_me.JPG
should become
i-am-a-badly-named-file--please-rename-me.jpg
edit2: my solution, tweaked from gniourf_gniourf's really helpful pure bash answer:
#!/bin/bash
for a in "$@"; do
b=${a,,} #lowercase
b=${b//[_[:space:]\.]/-} #subst dot,space,underscore with dash
b=${b//---/--} #remove triple dash
b=${b//[()\[\]]/} #remove brackets
if [ "${b%-*}" != "$b" ]; then #if there is a dash (prevents filename.filename)
b=${b%-*}.${b##*-} #replace final dash with a dot for extension
fi
if [ "$a" != "$b" ]; then #if there has been a change
echo '--->' "$b" #
#mv -i -- "$a" "$b" #rename
fi
done
This only fails if the file had spaces etc. and no extension (e.g. this BAD_filename becomes this-bad.filename). But these are media files and should have an extension, so I would have to sort them anyway.
Again, corrections and improvements welcome. I'm new at this stuff.

Try doing this with rename:
rename -n 's/[_\s\.]/-/g' *files
from the shell prompt. It's very useful; you can put some Perl code inside if needed.
You can remove the -n (dry-run mode switch) once your tests are valid.
There are other tools with the same name which may or may not be able to do this, so be careful.
If you run the following command (linux)
$ file $(readlink -f $(type -p rename))
and you have a result like
.../rename: Perl script, ASCII text executable
then this seems to be the right tool =)
If not, to make it the default (usually already the case) on Debian and derivatives like Ubuntu:
$ sudo update-alternatives --set rename /path/to/rename
(replace /path/to/rename with the path of your Perl rename command).
Last but not least, this tool was originally written by Larry Wall, the father of Perl.

Just for the record, look:
$ a='I am a Badly [named] (file) - PLEASE.rename_me.JPG'
$ # lowercase that
$ echo "${a,,}"
i am a badly [named] (file) - please.rename_me.jpg
$ # Cool! let's save that somewhere
$ b=${a,,}
$ # substitution 's/[_ ]/-/g:
$ echo "${b//[_ ]/-}"
i-am-a-badly-[named]-(file)---please.rename-me.jpg
$ # or better, yet:
$ echo "${b//[_[:space:]]/-}"
i-am-a-badly-[named]-(file)---please.rename-me.jpg
$ # Cool! let's save that somewhere
$ c=${b//[_[:space:]]/-}
$ # substitution 's/---/--/g' (??)
$ echo "${c//---/--}"
i-am-a-badly-[named]-(file)--please.rename-me.jpg
$ d=${c//---/--}
$ # substitution 's/()[]//g':
$ echo "${d//[()\[\]]/}"
i-am-a-badly-named-file--please.rename-me.jpg
$ e="${d//[()\[\]]/}"
$ # substitution 's/\./-/g':
$ echo "${e//\./-}"
i-am-a-badly-named-file--please-rename-me-jpg
$ f=${e//\./-}
$ # substitution 's/-\([^-]*\)$/\.\1/':
$ echo "${f%-*}.${f##*-}"
i-am-a-badly-named-file--please-rename-me.jpg
$ # Done!
Now, here's a 100% bash implementation of what you're trying to achieve:
#!/bin/bash
for a in "$@"; do
b=${a,,}
b=${b//[_[:space:]]/-}
b=${b//---/--}
b=${b//[()\[\]]/}
b=${b//\./-}
b=${b%-*}.${b##*-}
mv -i -- "$a" "$b"
done
Yeah, done!
All of this is standard and known as shell parameter expansion.
Remark. For a more robust script, you could check whether a has an extension (read: a period in its name), otherwise the last substitution of the algorithm fails a little bit. For this, put the following line just below the for statement:
[[ $a != *.* ]] && { echo "Oh no, file \`$a' has no extension..."; continue; }
(and isn't the *.* part of this line so cute?)
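For completeness, the original sed approach can also be made to work by quoting rather than escaping. The following is a sketch, assuming GNU sed (for `\L`) and a hypothetical helper name `clean_name`; it reuses the substitutions from the question and leaves the actual `mv` as a dry-run `echo`:

```shell
#!/bin/bash
# Hypothetical helper: compute the cleaned-up name with the question's sed
# chain. \L (lowercase) is a GNU sed extension.
clean_name() {
  printf '%s\n' "$1" | sed -e 's/\(.*\)/\L\1/' -e 's/_/-/g' -e 's/ /-/g' \
    -e 's/---/--/g' -e 's/(//g' -e 's/)//g' -e 's/\[//g' -e 's/\]//g' \
    -e 's/\./-/g' -e 's/-\([^-]*\)$/\.\1/'
}

for a in "$@"; do
  b=$(clean_name "$a")
  if [ "$a" != "$b" ]; then
    # Dry run: drop the echo to really rename. The double quotes keep names
    # with spaces as single arguments, so no backslash escaping is needed.
    echo mv -i -- "$a" "$b"
  fi
done
```

The key change is that `"$a"` and `"$b"` are passed quoted, so the first sed call that added backslashes becomes unnecessary.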
