How to parse an input parameter using regex in a bash script?

I have the following bash script (I execute it using msysgit). The file is named git-open:
#!/usr/bin/env bash
tempfile=`mktemp` || exit 1
git show $1 > $tempfile
notepad++ -multiInst -notabbar -nosession -noPlugin $tempfile
rm $tempfile
I invoke it through git like so:
git open master:Applications/Survey/Source/Controller/SurveyManager.cpp
Before I open this in notepad++, I want it to append the extension to the temporary file so that the editor automatically applies the correct syntax highlighting. If there is no extension specified, then mktemp shouldn't have to add an extension.
How can I modify my script above to work like this? I have very little experience with linux scripting, so I'm not sure how to implement a regex for this (assuming regex is necessary).

You can pass mktemp a template for your file name.
tempfile=$(mktemp -t git-open.XXXXXXXX.${1##*.}) || exit 1

Regular expressions are overkill for this. Glob patterns in parameter expansion with prefix removal are completely sufficient.
tempfile=`mktemp`.${1##*.}
The ${1##*.} means "expand $1 but remove the longest prefix that matches the globbing pattern *.". The * matches anything and the . matches itself, so this removes everything up to and including the last dot. What remains is the extension.
Instead of the ## you can also use # for the shortest prefix, % for the shortest suffix, and %% for the longest suffix.
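For example (an illustrative value, not taken from the question):
p=archive.tar.gz
echo "${p##*.}"   # gz          (longest prefix matching *. removed)
echo "${p#*.}"    # tar.gz      (shortest prefix matching *. removed)
echo "${p%%.*}"   # archive     (longest suffix matching .* removed)
echo "${p%.*}"    # archive.tar (shortest suffix matching .* removed)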
Ok, you probably want to handle cases where there is no extension. That can be done with help of case and more glob patterns:
case ${1##*/} in
  *.*) suffix=.${1##*.};;
  *)   suffix='';;
esac
tempfile=`mktemp`$suffix
This will take the filename without leading directories, test whether it contains . and use the suffix only if it does. Or you can compare the expansion with the original as devnull suggests.
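Putting the pieces together, the whole git-open script could end up looking something like this. This is only a sketch: the separate base variable and the extra rm for the original, suffix-less temp file are my additions, and I have not tested it under msysgit.
#!/usr/bin/env bash
# Sketch combining the suggestions above
base=$(mktemp) || exit 1
case ${1##*/} in
  *.*) tempfile=$base.${1##*.};;   # the argument has an extension: append it
  *)   tempfile=$base;;            # no extension: keep the plain temp name
esac
git show "$1" > "$tempfile"
notepad++ -multiInst -notabbar -nosession -noPlugin "$tempfile"
rm -f "$base" "$tempfile"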

Related

Sed - How to read a file line by line, go to the path mentioned in the file and then replace a string?

I am on a new project where I need to add a string to all the API names that are exported.
Someone hinted this can be done with simple sed commands.
What is really needed is, for example:
In my project there are, say, 100 files, and many of them contain something like the pattern below.
In file1, some line contains: export(xyz);
In file2, some line contains: export (abc);
What is needed here is to replace the
xyz with xyz_temp and
abc with abc_temp.
Now the problem is these APIs are in different folders and different files.
Fortunately, I got to know that we can redirect the output of the cscope tool to a file containing the matching patterns.
So I redirected the result of a search for the "export" string and got the following. Say I have exported the cscope result to a file export_api.txt as below:
/path1/file1.txt export(xyz);
/path2/file2.txt export(abc);
Now, I am not sure how to use sed to automate
reading this export_api.txt,
reading each line, and
replacing the strings as above.
Any direction would be highly appreciated.
Thanks in advance.
If you have a list of files which need to be changed and your replacement only needs to append _temp, then this can be accomplished with a single sed call:
sed -i 's/export(\(abc\|xyz\));/export(\1_temp);/' files...
-i will modify the files in-place, overwriting them.
If you don't care what exactly you are replacing and just want to append a suffix to every export expression, match any identifier instead. Here is one such pattern (a complete command using it is shown after the list below):
export(\([^)]*\))
Depending on your expressions and valid identifier names, you might want to or need to change this to one of:
export(\(.*\))
export(\([_a-zA-Z][_a-zA-Z0-9]*\))
export(\([_a-zA-Z"'][_a-zA-Z0-9"']*\))
export(\([_a-zA-Z]*\))
…
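For instance, with the generic identifier pattern the whole command might look like this (a sketch only; it uses _temp as requested in the question, and you may need to adjust the pattern to your identifiers):
sed -i 's/export(\([^)]*\));/export(\1_temp);/' files...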
Another option would be to only match lines containing "export(" and then replace the closing parenthesis (given that your input lines contain the token ");" only once):
sed -i '/export(/s/);/_temp);/' files...
# or reusing the complete match:
sed -i '/export(/s/);/_temp&/' files...
This avoids the backreference and makes the regular expressions simpler, because they can now be of fixed size.
You can use the read builtin to parse the line in your export_api.txt file, then call sed on each file. Pattern match the export snippet to choose the correct sed invocation. The way read is invoked here assumes that your path and snippet are delimited by IFS and that path does not contain any whitespace or separators:
while read -r path snippet; do
    case "$snippet" in
        *abc*) sed -i 's/export(abc);/export(abc_temp);/' "$path" ;;
        *xyz*) sed -i 's/export(xyz);/export(xyz_temp);/' "$path" ;;
    esac
done < export_api.txt
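For completeness, here is a sketch that derives the identifier from the snippet itself instead of hard-coding abc and xyz. The name variable and the parameter expansions are my additions and untested; like the version above, it will not match lines written with a space, such as export (abc);.
while read -r path snippet; do
    name=${snippet#*\(}      # drop everything up to and including "("
    name=${name%%)*}         # drop ")" and everything after it
    [ -n "$name" ] && sed -i "s/export($name);/export(${name}_temp);/" "$path"
done < export_api.txt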
NOTE: this will change/overwrite your files; if anything goes wrong they might be left in a broken state.
PS I wonder why you cannot use your IDE to search/replace those occurrences?

regular expression for "11th to 16th letter"

I am new to regular expressions and need help reading files on a Unix system. I want to apply a regular expression to the ls command.
I have below files :
DLERMS08001708161708209683.csv.gz
DLERMS13001708161330170816.csv.gz
DLERMS13001708171330170816.csv.gz
and would like to extract the files which have 170816 from the 11th to the 16th character.
I tried the command ls *170816*.gz, but I am getting 3 filenames instead of two; I want only the first two. Could you please help?
I should also add that my third filename already contains 170816 at the end (DLERMS13001708171330170816.csv.gz); I want to avoid matching it in my ls output.
Using bash parameter-expansion alone,
for file in *.csv.gz; do
    [ -e "$file" ] || continue
    [ "${file:10:6}" == "170816" ] && printf "%s\n" "$file"
done
${PARAMETER:OFFSET:LENGTH}
This one can expand only a part of a parameter's value, given a position to start and optionally a length. If LENGTH is omitted, the parameter is expanded up to the end of the string. If LENGTH is negative, it is taken as a second offset into the string, counting from the end of the string.
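A couple of illustrative expansions (the negative LENGTH form needs a reasonably recent bash, 4.2 or newer as far as I know):
f=DLERMS08001708161708209683.csv.gz
echo "${f:10:6}"     # 170816            (six characters starting at offset 10)
echo "${f:10:-7}"    # 1708161708209683  (from offset 10 up to 7 characters before the end)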
Based on the comments below, the OP apparently wants to copy the matching files to an alternate path, in which case the printf should be replaced with cp and the necessary arguments:
[ "${file:10:6}" == "170816" ] && cp -- "$file" path/to/destination
Firstly, be careful not to confuse regular expressions with shell glob patterns (which is what you want here).
Your glob could be:
??????????170816*.gz
Which matches 10 unknown characters followed by the sequence you specified.
Depending on your next step, you might not need to use ls at all, for example you can loop over these files like this:
for file in ??????????170816*.gz; do
    something_with "$file"
done
Or output the files that match using one of the following:
echo ??????????170816*.gz
printf '%s\n' ??????????170816*.gz
If there is a possibility that no files match, then you may wish to consider enabling nullglob (using shopt -s nullglob), which would expand to nothing in that case.
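For example (a minimal sketch):
shopt -s nullglob
for file in ??????????170816*.gz; do
    printf '%s\n' "$file"   # with nullglob, this loop body simply never runs if nothing matches
done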
If you want to use globbing, it's not the same as using regular expressions.
In your example you can use "?" as a placeholder for matching a single character:
Hence, to achieve the output you want, use ls with the pattern below:
ls ??????????170816*
You want to use the "any single character" wildcard ? (not a regex) the appropriate number of times.
ls DLERMS????170816*.csv.gz
Regexes are much more flexible/powerful and overkill for this simple use case.
But as far as I know, ls does not support them, so you would have to go via other bash tools to identify the files in case you ever need to actually use regexes for anything.
I also reflected what I perceive to be another common part of your filenames, the DLERMS at the beginning; if that is NOT common, replace those letters with ?, too.
A solution with find and regex
find . -regextype egrep -regex "^.{12}170816.*\.gz"
find matches the regular expression against the whole path, which here starts with ./, so .{12} covers the leading ./ plus the first ten characters of the filename; 170816 then has to appear at characters 11 to 16 of the filename (13 to 18 of the path).
I don't think you can use regular expressions with ls directly, but with egrep, it works fine.
ls * | egrep "DLERMS[0-9]{4}170816[0-9]{10}.csv.gz"
[0-9]{4} - any number, four times.
[0-9]{10} - any number, ten times.
Instead of egrep you can also use grep -E; the -E option enables extended regular expressions, so special characters such as {, (, and | do not need to be escaped with \.
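For example, the same match written with grep -E (escaping the literal dots, which is a good idea in either form):
ls | grep -E 'DLERMS[0-9]{4}170816[0-9]{10}\.csv\.gz'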

How to rename a file using regex capture group in Linux?

I want to rename a_1.0.tgz to b_1.0.tgz, since 1.0 may be changed to any version number, how can I achieve that?
For example, I can use mv a*.tgz b.tgz if I don't need to keep the version number.
zsh comes with the utility zmv, which is intended for exactly that. While zmv does not support regex, it does provide capture groups for filename generation patterns (aka globbing).
First, you might need to enable zmv. This can be done by adding the following to your ~/.zshrc:
autoload -Uz zmv
You can then use it like this:
zmv 'a_(*)' 'b_$1'
This will rename any file matching a_* so that a_ is replaced by b_. If you want to be less general, you can of course adjust the pattern:
to rename only .tgz files:
zmv 'a_(*.tgz)' 'b_$1'
to rename only .tgz files while changing the extension to .tar.gz
zmv 'a_(*).tgz' 'b_$1.tar.gz'
to only rename a_1.0.tgz:
zmv 'a_(1.0.tgz)' 'b_$1'
To be on the safe side, you can run zmv with the option -n first. This will only print what would happen, but not actually change anything. For more information, have a look at man zshcontrib.
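For example, a dry run for the file from the question could look like this (as far as I know, -n prints the mv commands it would run without executing them):
zmv -n 'a_(*.tgz)' 'b_$1'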
I'm not too familiar with zsh so I don't know if it supports regular expressions but I don't think you really need them here.
You can match the file using a glob and use a substitution:
for file in a_[0-9].[0-9].tgz; do
    echo "$file" "${file/a/b}"
done
In the glob pattern, [0-9] matches any number between 0 and 9. ${file/a/b} substitutes the first occurrence of a with b.
Change the echo to mv if you're happy with the result.
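For example, the final version could then be:
for file in a_[0-9].[0-9].tgz; do
    mv "$file" "${file/a/b}"
done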
Assuming you would like to replace the first character in all files matching a*.tgz with the letter b:
for f in a*.tgz; do
    echo mv "$f" "b${f:1}"
done
Remove the echo when you are certain that this does what you want it to do.
The ${f:1} uses the ${name:offset} parameter expansion. From the zshexpn manual (on OS X):
If offset is non-negative, then if the variable name is a
scalar substitute the contents starting offset characters
from the first character of the string, [...]

Rename multiple files with regular expression

I downloaded some files from the internet. In the names of those files each space character has been replaced by "%20". I want to rename all of them, but the number of files is too high, so a manual approach would be clumsy. I know this can be done from the command line with regular expressions, but I am not very familiar with them, so a little help is needed.
In summary, I want to rename all files in a directory by replacing every "%20" pattern with a space (" "). How can I do it?
Sample:
17%20Clipping.cpp --> 17 Clipping.cpp
14%20Mouse%20(Button)%20Listener.cpp --> 14 Mouse (Button) Listener.cpp
You can rename a group of files using the rename command, which accepts regular expressions.
For example, to rename all files matching "*.bak" to strip the extension, you might say
rename 's/\.bak$//' *.bak
To translate uppercase names to lower, you'd use
rename 'y/A-Z/a-z/' *
and in your case:
rename 's/%20/ /g' *.cpp
(The added g makes the substitution apply to every %20 in a name, not just the first; your second sample filename contains several.)
I would recommend against putting spaces in filenames (maybe use underscore instead). Regardless, here is a command that will do it:
for i in *%20*; do new=$(echo "$i" | sed 's/%20/ /g'); echo mv "$i" "$new"; done
In its current form it merely prints the commands it would execute. Once you're sure it does what you want, remove the echo.
As @ronmrdechai suggests, the following is an improvement:
for i in *%20*; do echo mv "$i" "${i//\%20/ }"; done
The backslash is needed in the pattern because % is a metacharacter (match at end) in this case, and the doubled slash makes the substitution replace every occurrence rather than only the first.

Create directory based on part of filename

First of all, I'm not a programmer — just trying to learn the basics of shell scripting and trying out some stuff.
I'm trying to create a function for my bash script that creates a directory based on a version number in the filename of a file the user has chosen in a list.
Here's the function:
lav_mappe () {
    shopt -s failglob
    echo "[--- Choose zip file, or x to exit ---]"
    echo ""
    echo ""
    select zip in $SRC/*.zip
    do
        [[ $REPLY == x ]] && . $HJEM/build
        [[ -z $zip ]] && echo "Invalid choice" && continue
        echo
        grep ^[0-9]{1}\.[0-9]{1,2}\.[0-9]{1,2}$ $zip; mkdir -p $MODS/out/${ver}
    done
}
I've tried messing around with some other commands too:
for ver in $zip; do
    grep "^[0-9]{1}\.[0-9]{1,2}\.[0-9]{1,2}$" $zip; mkdir -p $MODS/out/${ver}
done
And also find | grep — but I'm doing it wrong :(
But it ends up saying "no match" for my regex pattern.
I'm trying to take the filename the user has selected, then grep it for the version number (ALWAYS x.xx.x somewhere in the filename), and finally create a directory with just that.
Could someone give me some pointers what the command chain should look like? I'm very unsure about the structure of the function, so any help is appreciated.
EDIT:
Ok, this is what the complete function looks like now. (Please note, the sed(1) commands, apart from the directory creation, were not written by me, just implemented in my code.)
Pastebin (Long code.)
I've got news for you. You are writing a Bash script, so you are a programmer!
Your Regular Expression (RE) is of the "wrong" type. Vanilla grep uses a form known as "Basic Regular Expressions" (BRE), but your RE is in the form of an Extended Regular Expression (ERE). BREs are used by vanilla grep, vi, more, etc.; EREs are used by just about everything else: awk, Perl, Python, Java, .Net, etc. The other problem is that you are trying to look for the pattern in the file's contents, not in the filename!
There is an egrep command, or you can use grep -E, so:
echo $zip|grep -E '^[0-9]\.[0-9]{1,2}\.[0-9]{1,2}$'
(note that single quotes are safer than double). By the way, you use ^ at the front and $ at the end, which means the filename ONLY consists of a version number, yet you say the version number is "somewhere in the filename". You don't need the {1} quantifier, that is implied.
BUT, you don't appear to be capturing the version number either.
You could use sed (we also need the -E):
ver=$(echo $zip| sed -E 's/.*([0-9]\.[0-9]{1,2}\.[0-9]{1,2}).*/\1/')
The \1 on the right means "replace everything (that's why we have the .* at front and back) with what was matched in the parentheses group".
That's a bit clunky, I know.
Now we can do the mkdir (there is no merit in putting everything on one line, and it makes the code harder to maintain):
mkdir -p "$MODS/out/$ver"
The braces in ${ver} are unnecessary in this case, but it is a good idea to enclose path names in double quotes in case any of the components contain embedded white-space.
So, good effort for a "non-programmer", particularly in generating that RE.
Now for Lesson 2
Be careful about using this solution in a general loop. Your question specifically uses select, so we cannot predict which files will be used. But what if we wanted to do this for every file?
Using the solution above in a for or while loop would be inefficient. Calling external processes inside a loop is always bad. There is nothing we can do about the mkdir without using a different language like Perl or Python. But sed, by its nature, is iterative, and we should use that feature.
One alternative would be to use shell pattern matching instead of sed. This particular pattern would not be impossible in the shell, but it would be difficult and raise other questions. So let's stick with sed.
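(As an aside, and only as a sketch using the same variable names as above: bash's own [[ =~ ]] operator matches EREs rather than globs and can capture the version without any external process.)
if [[ $zip =~ [0-9]\.[0-9]{1,2}\.[0-9]{1,2} ]]; then
    ver=${BASH_REMATCH[0]}     # the matched version number
    mkdir -p "$MODS/out/$ver"
fi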
A problem we have is that echo places a space between each field, and that gives us a couple of issues. sed delimits each record with a newline "\n", so echo on its own won't do here. We could replace each space with a newline, but that would be an issue if there were spaces inside a filename. We could do some trickery with IFS and globbing, but that leads to unnecessary complications. So instead we will fall back to good old ls. Normally we would not want to use ls, because shell globbing is more efficient, but here we are using the feature that it places a newline after each filename when its output is redirected through a pipe.
while read ver
do
    mkdir "$ver"
done < <(ls $SRC/*.zip | sed -E 's/.*([0-9]{1}\.[0-9]{1,2}\.[0-9]{1,2}).*/\1/')
Here I am using process substitution, and this loop will only call ls and sed once. BUT, it calls the mkdir program n times.
Lesson 3
Sorry, but that's still inefficient. We are creating a child process for each iteration; creating a directory needs only one kernel API call, yet we spawn a whole process just for that. Let's use a more sophisticated language like Perl:
#!/usr/bin/perl
use warnings;
use strict;
my $SRC = '.';
for my $file (glob("$SRC/*.zip"))
{
    $file =~ s/.*([0-9]{1}\.[0-9]{1,2}\.[0-9]{1,2}).*/$1/;
    mkdir $file or die "Unable to create $file; $!";
}
You might like to note that your RE has made it through to here! But now we have more control, and no child processes (mkdir in Perl is a built-in, as is glob).
In conclusion, for small numbers of files, the sed loop above will be fine. It is simple, and shell based. Calling Perl just for this from a script will probably be slower since perl is quite large. But shell scripts which create child processes inside loops are not scalable. Perl is.