How to extract a substring using sed on OS X?

I'm trying to iterate over each file and folder inside a directory and extract part of the file name into a variable, but I can't make sed work correctly. I get either all of the file name or none of it.
This version of the script should capture the entire file name:
#!/bin/bash
for f in *
do
substring=`echo $f | sed -E -n 's/(.*)/\1/'`
echo "sub: $substring"
done
But instead I get nothing:
sub:
sub:
sub:
sub:
...
This version should give me just the first character in the filename:
#!/bin/bash
for f in *
do
substring=`echo $f | sed -E 's/^([a-zA-Z])/\1/'`
echo "sub: $substring"
done
But instead I get the whole file name:
sub: Adlm
sub: Applications
sub: Applications (Parallels)
sub: Desktop
...
I've tried numerous variations, and what it basically boils down to is this: if I use -n I get nothing, and if I don't, I get the whole file name.
Can someone show me how to get just the first character?
More generally, my goal is to extract a substring and store it in a variable, so if anybody has a better approach to that, it would be appreciated as well.
Thanks in advance.

If you want to modify a shell parameter, you probably want to use a parameter expansion.
for f in *; do
# This version should expand to the whole parameter
echo "$f"
# This version should expand to the first character in the filename
echo "${f::1}"
done
Parameter expansions are not as powerful as sed, but they are built into the shell (no separate process or subshell needs to be launched), and there are expansions for:
Substrings (as above)
Replacing and substituting characters
Altering the case of strings (bash 4+)
and more.
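A minimal sketch of the three kinds of expansion listed above, applied to one of the file names from the question. The variable names are only for illustration. The case conversions need bash 4 or later; the /bin/bash that ships with macOS is version 3.2, so on OS X you may need a separately installed newer bash for those two lines.

```shell
#!/bin/bash
# One of the file names from the question.
name="Applications (Parallels)"

first=${name:0:1}     # substring: offset 0, length 1
under=${name// /_}    # replace every space with an underscore
upper=${name^^}       # convert to upper case (bash 4+)
lower=${name,,}       # convert to lower case (bash 4+)

printf '%s\n' "$first" "$under" "$upper" "$lower"
```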

This version of the script should capture the entire file name:
sed -E -n 's/(.*)/\1/'
But instead I get nothing.
-n suppresses sed's automatic printing, so nothing is output unless you print explicitly. Remove -n, or add the p flag so that each line on which the substitution succeeds is printed:
sed -E -n 's/(.*)/\1/p'
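To see what -n and p do together, here is a small sketch with made-up input: with -n, only the lines on which the substitution succeeded are printed, so p also acts as a filter.

```shell
#!/bin/bash
# Made-up input: one name starts with a letter, the other with a digit.
input=$'Desktop\n2016-report'

# -n: no automatic printing; p: print the line only if s/// made a substitution.
matched=$(printf '%s\n' "$input" | sed -E -n 's/^([a-zA-Z]).*/\1/p')

printf '%s\n' "$matched"
```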
This version should give me just the first character in the filename:
sed -E 's/^([a-zA-Z])/\1/'
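But instead I get the whole file name.
That substitution replaces the first letter with itself and leaves the rest of the line alone, so the output is unchanged. Perhaps what you wanted was to match the rest of the name with .* so that it gets dropped: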
sed -E 's/^([a-zA-Z]).*/\1/'
I also suggest quoting your variables:
substring=$(echo "$f" | sed ...)
Finally, if you're using Bash, the simpler method is substring expansion, as suggested by kojiro.
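A minimal sketch putting this answer's fixes together: $(...) for command substitution, quoted expansions, and a pattern that consumes the rest of the name with .*. The helpers first_char_sed and first_char_bash are hypothetical names; the loop only shows that sed and substring expansion agree on names that start with a letter.

```shell
#!/bin/bash
# Hypothetical helper: first character via sed (the pattern matches the whole line).
first_char_sed() {
  printf '%s\n' "$1" | sed -E 's/^([a-zA-Z]).*/\1/'
}

# Hypothetical helper: the same thing with bash substring expansion.
first_char_bash() {
  printf '%s\n' "${1:0:1}"
}

for f in "Adlm" "Applications (Parallels)" "Desktop"; do
  substring=$(first_char_sed "$f")
  echo "sub: $substring ($(first_char_bash "$f"))"
done
```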

You forgot to add .* after the capturing group in sed:
$ for i in *; do substring=$(echo "$i" | sed -E 's/^(.).*$/\1/'); echo "sub: $substring"; done
It's better to use . instead of [a-zA-Z]: if the first character is not a letter (a digit or an underscore, for example), [a-zA-Z] does not match and sed prints the whole name unchanged.
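A quick sketch of that difference, using two made-up names that do not start with a letter: with [a-zA-Z] the substitution fails and the name comes back unchanged, while . always matches the first character.

```shell
#!/bin/bash
for name in "_notes" "2016report"; do
  letters=$(printf '%s\n' "$name" | sed -E 's/^([a-zA-Z]).*$/\1/')   # no match: unchanged
  any=$(printf '%s\n' "$name" | sed -E 's/^(.).*$/\1/')              # first character
  echo "$name -> [a-zA-Z]: $letters, .: $any"
done
```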

I prefer awk to sed; it is easier for me to understand.
#!/bin/bash
#set -x
for f in *
do
substring=$(printf '%s\n' "$f" | awk '{print substr($0,1,1)}')
echo "sub: $substring"
done
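If you need more than the first character, awk's substr(s, start, length) takes a 1-based start position and a length. A minimal sketch with a hypothetical helper name, awk_substr, that passes the positions in with -v and works on the whole line ($0), so names containing spaces are handled:

```shell
#!/bin/bash
# Hypothetical helper: print LEN characters of STR starting at 1-based position START.
awk_substr() {
  printf '%s\n' "$1" | awk -v start="$2" -v len="$3" '{ print substr($0, start, len) }'
}

part=$(awk_substr "Desktop" 1 4)
paren=$(awk_substr "Applications (Parallels)" 14 11)
echo "$part"
echo "$paren"
```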

Related

replace string with underscore and dots using sed or awk

I have a bunch of files with file names made up of underscores and dots; here is one example:
META_ALL_whrAdjBMI_GLOBAL_August2016.bed.nodup.sortedbed.roadmap.sort.fgwas.gz.r0-ADRL.GLND.FET-EnhA.out.params
I want to remove the part .bed.nodup.sortedbed.roadmap.sort.fgwas.gz, so the expected output file name would be META_ALL_whrAdjBMI_GLOBAL_August2016.r0-ADRL.GLND.FET-EnhA.out.params
I am using these sed commands but neither one works:
stringZ=META_ALL_whrAdjBMI_GLOBAL_August2016.bed.nodup.sortedbed.roadmap.sort.fgwas.gz.r0-ADRL.GLND.FET-EnhA.out.params
echo $stringZ | sed -e 's/\([[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.\)//g'
echo $stringZ | sed -e 's/\[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.//g'
Any solution in sed or awk would help a lot.
Don't use external utilities and regexes for such a simple task! Use parameter expansions instead.
stringZ=META_ALL_whrAdjBMI_GLOBAL_August2016.bed.nodup.sortedbed.roadmap.sort.fgwas.gz.r0-ADRL.GLND.FET-EnhA.out.params
echo "${stringZ/.bed.nodup.sortedbed.roadmap.sort.fgwas.gz}"
To rename all the files containing .bed.nodup.sortedbed.roadmap.sort.fgwas.gz, use this:
shopt -s nullglob
substring=.bed.nodup.sortedbed.roadmap.sort.fgwas.gz
for file in *"$substring"*; do
echo mv -- "$file" "${file/"$substring"}"
done
Note: I left echo in front of mv so that nothing gets renamed; the commands are only displayed on your terminal. Remove echo if you're satisfied with what you see.
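To try the loop without touching real files, here is a sketch that creates a scratch directory with mktemp -d, makes one empty file with the example name, and runs the loop with mv enabled. The scratch directory exists only for illustration.

```shell
#!/bin/bash
shopt -s nullglob
substring=.bed.nodup.sortedbed.roadmap.sort.fgwas.gz

# Scratch directory with one empty file named like the example.
dir=$(mktemp -d)
touch "$dir/META_ALL_whrAdjBMI_GLOBAL_August2016${substring}.r0-ADRL.GLND.FET-EnhA.out.params"

cd "$dir" || exit 1
for file in *"$substring"*; do
  mv -- "$file" "${file/"$substring"}"   # delete the first occurrence of $substring
done
ls
```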
Your regex isn't really much more general than the fixed string would be, but if you want to make it work, you need to allow more than one lowercase character between the dots. Right now each [[:lower:]] matches exactly one character; you can fix that with \+ after each [[:lower:]] (a GNU sed extension), like this:
printf '%s' "$stringZ" | sed -e 's/\([[:lower:]]\+\.[[:lower:]]\+\.[[:lower:]]\+\.[[:lower:]]\+\.[[:lower:]]\+\.[[:lower:]]\+\.[[:lower:]]\+\.\)//g'
With
stringZ="META_ALL_whrAdjBMI_GLOBAL_August2016.bed.nodup.sortedbed.roadmap.sort.fgwas.gz.r0-ADRL.GLND.FET-EnhA.out.params"
it gives me the output:
META_ALL_whrAdjBMI_GLOBAL_August2016.r0-ADRL.GLND.FET-EnhA.out.params
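Because \+ in a basic regular expression is a GNU sed extension, the command above may not work with the BSD sed on OS X. A minimal sketch of a POSIX-style equivalent, which spells "one or more" as [[:lower:]][[:lower:]]* and repeats the group seven times with the \{7\} interval:

```shell
#!/bin/bash
stringZ="META_ALL_whrAdjBMI_GLOBAL_August2016.bed.nodup.sortedbed.roadmap.sort.fgwas.gz.r0-ADRL.GLND.FET-EnhA.out.params"

# Seven runs of "lowercase letters followed by a dot", written without \+.
result=$(printf '%s\n' "$stringZ" | sed 's/\([[:lower:]][[:lower:]]*\.\)\{7\}//')
echo "$result"
```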
Try this:
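#!/bin/bash
# Loop over the glob directly instead of parsing ls, and quote every expansion.
for line in META*; do
  # Escape the dots so they match literal dots rather than any character.
  f2=$(printf '%s\n' "$line" | sed 's/\.bed\.nodup\.sortedbed\.roadmap\.sort\.fgwas\.gz//')
  # Skip files that do not contain the substring (mv would complain about identical names).
  [ "$f2" != "$line" ] && mv -- "$line" "$f2"
done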

sed to insert a string after matching pattern on a same line

I need to insert a command (as a string) into an existing file after a certain match. The existing line is a long make command, and I only need to modify it by inserting another string at a specific location. I tried using sed, but it either adds a new line before/after the matching string or replaces it. Is it possible to accomplish this with sed, or should I be using something else? Could you please give me some hints?
Example:
The file contains two make commands, and I am only interested in the second one, the one without bbnote.
oe_runmake_call() {
bbnote make -j 8 CROSS_COMPILE=arm-poky-linux-gnueabi- CC="arm-poky-linux-gnueabi-gcc" "$@"
make -j 8 CROSS_COMPILE=arm-poky-linux-gnueabi- CC="my_command_here arm-poky-linux-gnueabi-gcc" --sysroot=/some/path "$@"
}
Thanks in advance!
Here's the code:
http://hastebin.com/tigatoquje.go
You could do something like this with sed:
sed -r 's:(^\s+make.+ CC=\"):\1your_command_here :g' file.log >outfile.log
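or with sed in-place editing (write -r -i rather than -ir, because GNU sed reads -ir as -i with the backup suffix r, so -r would not take effect):
sed -r -i 's:(^\s+make.+ CC=\"):\1your_command_here :g' file.log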
Without the -r option (basic regular expressions, so the parentheses and + must be escaped):
sed 's:\(^\s\+make.\+ CC=\"\):\1your_command_here :g' file.log > outfile.log
Output:
oe_runmake_call() {
bbnote make -j 8 CROSS_COMPILE=arm-poky-linux-gnueabi- CC="arm-poky-linux-gnueabi-gcc" "$@"
make -j 8 CROSS_COMPILE=arm-poky-linux-gnueabi- CC="your_command_here arm-poky-linux-gnueabi-gcc" --sysroot=/some/path "$@"
}
How:
sed -r 's:(^\s+make.+ CC=\"):\1your_command_here :g'
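-r = use extended regular expressions (GNU sed; -E is equivalent)
(^\s+make.+ CC=\") = capture group covering everything from the start of the indented make line up to and including CC="
\1your_command_here = \1 puts the captured text back, followed by the command text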
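A minimal sketch, assuming GNU sed (for -i without a backup suffix and for \s), that rebuilds a shortened copy of the example in a temporary file and inserts a hypothetical string, my_wrapper, right after CC=" on the indented make line only.

```shell
#!/bin/bash
# Shortened copy of the example function; the make lines are indented.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
oe_runmake_call() {
    bbnote make -j 8 CC="arm-poky-linux-gnueabi-gcc" "$@"
    make -j 8 CC="arm-poky-linux-gnueabi-gcc" "$@"
}
EOF

# GNU sed: -E for extended regex, -i for in-place editing.
# Only a line whose first word is "make" matches, so the bbnote line is left alone.
sed -E -i 's:^(\s+make .* CC="):\1my_wrapper :' "$tmp"

result=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
```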
You could use perl.
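Replace YOUR_COMMAND with what you want added; it is inserted right after CC=" on the line that starts (after any indentation) with make. This assumes your file is file.txt and keeps a backup in file.txt.bak:
perl -i.bak -pe '/^\s*make/ and s/CC="/CC="YOUR_COMMAND /' file.txt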

How to cut a string from a string

My script gets this string, for example:
/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
Let's say I don't know how long the string is before /importance.
I want a new variable that will keep only the /importance/lib1/lib2/lib3/file from the full string.
I tried sed 's/.*importance//', but it gives me the path without the importance part.
Here is the command in my code:
find <main_path> -name file | sed 's/.*importance//'
I am not familiar with regular expressions, so I need your help, please :)
Sorry, my friends, I got my question wrong:
I don't need the output /importance/lib1/lib2/lib3/file but /importance/lib1/lib2/lib3 with no /file in the output.
Can you help me?
I would use awk:
$ echo "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file" | awk -F"/importance/" '{print FS$2}'
/importance/lib1/lib2/lib3/file
Which is the same as:
$ awk -F"/importance/" '{print FS$2}' <<< "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
/importance/lib1/lib2/lib3/file
That is, we set the field separator to /importance/, so that the first field is what comes before it and the second is what comes after. To print /importance/ itself, we use FS.
All together, and to save it into a variable, use:
var=$(find <main_path> -name file | awk -F"/importance/" '{print FS$2}')
Update
I don't need the output /importance/lib1/lib2/lib3/file but /importance/lib1/lib2/lib3 with no /file in the output.
Then you can use something like dirname to get the path without the name itself:
$ dirname "$(awk -F"/importance/" '{print FS$2}' <<< "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file")"
/importance/lib1/lib2/lib3
Instead of replacing everything up to importance with nothing, replace it with /importance:
~$ echo "$var"
/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
~$ sed 's:.*importance:/importance:' <<< "$var"
/importance/lib1/lib2/lib3/file
As noted by @lurker, if importance can also appear inside other directory names, you can include the surrounding slashes to be safe:
~$ sed 's:.*/importance/:/importance/:' <<< "/dir1/dirimportance/importancedir/..../importance/lib1/lib2/lib3/file"
/importance/lib1/lib2/lib3/file
With GNU sed:
echo '/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file' | sed -E 's#.*(/importance.*)#\1#'
Output:
/importance/lib1/lib2/lib3/file
pure bash
kent$ a="/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
kent$ echo ${a/*\/importance/\/importance}
/importance/lib1/lib2/lib3/file
external tool: grep
kent$ grep -o '/importance/.*' <<<$a
/importance/lib1/lib2/lib3/file
I tried to use sed 's/.*importance//' but it's giving me the path without the importance....
You were very close. All you had to do was put /importance back in the replacement:
sed 's/.*importance/\/importance/'
However, I would use Bash's built-in parameter expansion. It avoids starting an external process, so it is faster.
The expansion ${foo##pattern} takes the shell variable foo and removes the longest prefix that matches the glob pattern:
file_name="/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
file_name=/importance${file_name##*/importance}
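The difference between # (shortest matching prefix) and ## (longest matching prefix) only shows up when the pattern can match more than once. A sketch with a made-up path in which importance appears twice:

```shell
#!/bin/bash
# Made-up path with two "/importance" components.
p="/a/importance/b/importance/lib1/file"

shortest=${p#*/importance}   # removes up to the FIRST /importance
longest=${p##*/importance}   # removes up to the LAST /importance

echo "$shortest"   # /b/importance/lib1/file
echo "$longest"    # /lib1/file
```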
Removing the /file at the end, as you asked:
echo '<path>' | sed -r 's#.*(/importance.*)/[^/]*#\1#'
Input: /dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
Returns: /importance/lib1/lib2/lib3
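The same result can be had without sed by continuing with parameter expansion: ## strips everything up to the last /importance, and % with the pattern /* strips the shortest suffix that starts with a slash, that is, the last path component. A minimal sketch; the variable name rel is chosen only for illustration:

```shell
#!/bin/bash
path="/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"

rel=/importance${path##*/importance}   # /importance/lib1/lib2/lib3/file
rel=${rel%/*}                          # drop the last component
echo "$rel"
```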
See this "Match groups" tutorial.

replace string to asterisk bash

I am trying to get a path from the user as input.
The user will enter a specific path for a specific application:
script.sh /var/log/dbhome_1/md5
I want to convert the directory number (in this case, 1) to * (an asterisk). Later on, the script will do some logic on this path.
When I try sed on the input, I'm stuck with the number:
echo "/var/log/dbhome_1/md5" | sed "s/dbhome_*/dbhome_\*/g"
and the output is:
/var/log/dbhome_*1/md5
I know I have some problems with the asterisk as a wildcard versus a literal character...
Maybe a regex will help here?
Code for GNU sed:
sed "s#1/#\*/#"
$ echo "/var/log/dbhome_1/md5" | sed "s#1/#\*/#"
/var/log/dbhome_*/md5
Or, more generally:
sed "s#[0-9]\+/#\*/#"
$ echo "/var/log/dbhome_1234567890/md5" | sed "s#[0-9]\+/#\*/#"
/var/log/dbhome_*/md5
Use this instead:
echo "/var/log/dbhome_1/md5" | sed "s/dbhome_[0-9]\+/dbhome_\*/g"
[0-9] is a character class that contains all digits
Thus [0-9]\+ matches one or more digits
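\+ is a GNU extension in basic regular expressions, so with the BSD sed on OS X the usual POSIX spelling of "one or more digits" is [0-9][0-9]*. In the replacement text, * has no special meaning, so it needs no backslash. A sketch:

```shell
#!/bin/bash
# POSIX BRE: [0-9][0-9]* means one digit followed by zero or more digits.
out=$(echo "/var/log/dbhome_1234567890/md5" | sed 's/dbhome_[0-9][0-9]*/dbhome_*/')
echo "$out"
```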
If your script is written in bash (which I assume from the tag, although the name script.sh suggests a plain sh script), you might as well use pure bash: /var/log/dbhome_1/md5 will very likely be in the positional parameter $1, and you can get what you want with:
echo "${1//dbhome_+([[:digit:]])/dbhome_*}"
If this fails, it's probably because the extglob shell option is turned off. In that case, turn it on with:
shopt -s extglob
Demo:
$ shopt -s extglob
$ a=/var/log/dbhome_1234567/md5
$ echo "${a//dbhome_+([[:digit:]])/dbhome_*}"
/var/log/dbhome_*/md5
Done!
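In a script, the simplest approach is to enable extglob once near the top, before any code that uses the +(...) pattern. A minimal sketch with a hypothetical helper name:

```shell
#!/bin/bash
# Enable extended globbing once, before any code that uses +(...).
shopt -s extglob

# Hypothetical helper: replace the digits after "dbhome_" with a literal "*".
wildcard_dbhome() {
  local path=$1
  printf '%s\n' "${path//dbhome_+([[:digit:]])/dbhome_*}"
}

wildcard_dbhome /var/log/dbhome_1/md5         # /var/log/dbhome_*/md5
wildcard_dbhome /var/log/dbhome_1234567/md5   # /var/log/dbhome_*/md5
```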

Sed substitute recursively

echo ddayaynightday | sed 's/day//g'
It ends up as daynight.
Is there any way to make it keep substituting until there are no more matches?
My preferred form, for this case:
echo ddayaynightday | sed -e ':loop' -e 's/day//g' -e 't loop'
This is the same as everyone else's, except that it uses multiple -e options to supply the three lines and uses the t command (which means "branch if a substitution succeeded") to iterate.
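A sketch wrapping that loop in a hypothetical helper, tested on a made-up string where one pass of s/day//g is not enough: in dadayy, removing the inner day leaves a new day, which the t loop then removes as well.

```shell
#!/bin/bash
# Hypothetical helper: delete "day" repeatedly until none is left.
strip_day() {
  printf '%s\n' "$1" | sed -e ':loop' -e 's/day//g' -e 't loop'
}

strip_day ddayaynightday   # night
strip_day dadayy           # prints an empty line
```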
This might work for you:
echo ddayaynightday | sed ':a;s/day//g;ta'
night
The g flag deliberately doesn't re-match against the substituted portion of the string. What you'll need to do is a bit different. Try this:
echo ddayaynightday | sed $':begin\n/day/{ s///; bbegin\n}'
Due to BSD sed's quirks, the embedded newlines are required. If you're using GNU sed, you may be able to get away with
sed ':begin;/day/{ s///; bbegin }'
With bash:
str=ddayaynightday
while true; do tmp=${str//day/}; [[ $tmp = "$str" ]] && break; str=$tmp; done
echo "$str"
The following works:
$ echo ddayaynightday | sed ':loop;/day/{s///g;b loop}'
night
Depending on your system, the ; may not work to separate commands, so you can use the following instead:
echo ddayaynightday | sed -e ':loop' -e '/day/{s///g
b loop}'
Explanation:
:loop # Create the label 'loop'
/day/{ # if the pattern space matches 'day'
s///g # remove all occurrence of 'day' from the pattern space
b loop # go back to the label 'loop'
}
If the b loop portion of the command is not executed, the current contents of the pattern space are printed and the next line is read.
OK, here they are: a while loop and the string length (${#str}) in bash.
Using them, one can implement my idea:
Repeat until the string's length stops changing.
There is neither a flag nor a way to write a regex that means "substitute until there are no more matches".
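A minimal sketch of that idea in bash (a reconstruction, not the answer's original code), using ${#str} for the string length; the helper name strip_until_stable is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: remove every "day" from $1, repeating until the length stops changing.
strip_until_stable() {
  local str=$1 old_len
  while :; do
    old_len=${#str}
    str=${str//day/}
    (( ${#str} == old_len )) && break
  done
  printf '%s\n' "$str"
}

strip_until_stable ddayaynightday   # night
```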