Updating version numbers before git commit using OS X Bash sed - regex

I wish to use git log --oneline | wc -l to provide the revision number for sed substitution in the following Lua file (simplified) during each commit using a script and $GIT/hooks:
-- settings.lua
local module = {}
-- major, minor, build, and revision number
module.version = "10.0.0.0"
return module
I'm testing from the command-line with this:
sed -E 's/(version\s*=\s*\"\d+\.\d+\.\d+\.)/\1foo"/' settings.lua
All sed outputs is the following verbatim copy of the input:
-- settings.lua
local module = {}
-- major, minor, build, and revision number
module.version = "10.0.0.0"
return module
I must be totally misunderstanding the things I'm reading about sed and its purpose in life. Coming from a 20-year background in C/C++, and now C#, I'm amazed I can't get my head around this. I know that pattern matches!
My understanding is that sed will match version = "10.0.0.0" and then change the entire matched string to version = "10.0.0.foo", after which I can simply add a > settings.lua to replace the original file with the substitution in place.
As usual I've drifted way too far off track thinking something was going to be simple but wasn't and just because I want it. I was really enjoying the Lua part too.

In sed, \d wouldn't work the way you expect it to work. Use [0-9] or [[:digit:]] instead of \d
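A minimal sketch of the corrected substitution, assuming the four-part version string from the question. The helper name set_revision and the temporary file are made up for illustration. Note that the pattern also has to consume the final number and the closing quote: the original pattern stops at the last dot, so even with \d fixed it would insert foo" in front of the old 0" instead of replacing it.

```shell
# set_revision FILE REV  (hypothetical helper)
# Print FILE with the fourth component of the version string replaced by REV.
# Uses only POSIX character classes, since \d and \s are not portable in sed.
set_revision() {
    sed -E "s/(version[[:space:]]*=[[:space:]]*\"[0-9]+\.[0-9]+\.[0-9]+\.)[0-9]+\"/\1$2\"/" "$1"
}

# Demo on a throwaway copy of the sample line.
demo=$(mktemp)
printf '%s\n' 'module.version = "10.0.0.0"' > "$demo"
set_revision "$demo" 42    # prints: module.version = "10.0.0.42"
rm -f "$demo"
```

In a hook, the revision could come from git rev-list --count HEAD, which prints a bare number (wc -l on OS X pads its output with spaces). Write the result to a temporary file and move it over the original; redirecting straight onto settings.lua would truncate the file before sed reads it.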
EDIT
Another way to do it:
sed -i.bak '/module.version/s/"$/foo"/' File
AMD$ cat File
-- settings.lua
local module = {}
-- major, minor, build, and revision number
module.version = "10.0.0.0foo"
return module
For lines matching module.version, substitute the last " with foo".
The above command will edit the file in place, keeping a backup (File.bak).

Related

Using regex in git shell when checking out multiple branches

If I have the following git shell command:
for branch in `git branch -a | grep remotes | grep -v master | sed 's/^.*old\/\(.*\)/\1/g'`; do git branch --track $branch remotes/old/$branch; done
This checks out every single remote branch that exists on the old remote and tracks them using the same name that they have on that remote. However, what if I wanted to slightly change the name that the local branches are checked out as?
What if I have the following remote branches:
release/1.2.1.0
release/1.2.1.1
And I want to check them out under the same parent folder release but I only want the last 3 digits in the version number. So I want my local branches to be:
release/2.1.0
release/2.1.1
I have a simple JavaScript regex that matches the last 3 digits of the version string: (?:\d\.)(\d.*)
This uses a non-capturing group to toss out the first digit followed by the period. The question is, how do I apply that regex to the $branch variable in the git shell bash script above?
First, avoid git branch for loops like this. The correct tool here is git for-each-ref, which is designed to work with scripting languages (git branch is aimed at users and the output format may change in the future, for instance).
To loop over all remote-tracking branches, simply tell for-each-ref to scan the remote-tracking branch namespace. Since you want, more specifically, the remote named old, you can do that very easily by adding /old as well:
git for-each-ref refs/remotes/old
The output here defaults to a triple of objectname objecttype refname. We only care about the refname part (and we can use the :short modifier to drop refs/remotes/ as well, if we like, although we still need to drop the old/ too so we could get away without the modifier). Thus we want to include --format=%(refname:short).
Moving on to bash, bash has built-in regular expression support. Its RE syntax is not quite the same, though, so your existing RE must change. Here is one that probably works for your needs:
bash$ x=1.2.3.4
bash$ [[ $x =~ ([0-9]\.)([0-9.]+) ]] && echo ${BASH_REMATCH[2]}
2.3.4
(There is a bit of subtlety here: using $x changes the way the =~ match applies, which in our case is probably good. As an old school Unix person I generally prefer using expr myself, but in this case I might resort to doing this in Python, which has Perl-style REs, and Javascript/ECMAscript REs are modeled on Perl's. But all that is more or less irrelevant. The most important is that this RE is slightly sub-par as a version number matcher. For instance, it matches strings like "1.3..6". We're safe in that these are invalid branch names—double dots are verboten since they would conflict with the set subtraction syntax in gitrevisions—but it's generally a bit sloppy; with some work we could come up with a tighter expression. It also fails to match revisions starting with two or more digits, but your original RE did as well, so I left that in on purpose.)
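As a rough sketch of what a tighter expression might look like: the pattern below is anchored at both ends, allows a multi-digit first component, and rejects empty components such as "1.3..6". It is written with sed -E so it runs in any POSIX shell; the same ERE should also work with bash's =~, with the result in BASH_REMATCH[1]. The helper name strip_major is made up for illustration.

```shell
# strip_major VERSION  (hypothetical helper)
# Print VERSION without its leading component, or nothing if VERSION is not
# a well-formed dotted number with at least two components.
strip_major() {
    printf '%s\n' "$1" |
        sed -E -n 's/^[0-9]+\.([0-9]+(\.[0-9]+)*)$/\1/p'
}

strip_major 1.2.1.0     # prints: 2.1.0
strip_major 12.3.4      # prints: 3.4
strip_major 1.3..6      # prints nothing (rejected)
```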
Reading in a loop in shell, using -r is generally wise (see Etan Reisner's comment), although in this case we could omit it safely since git controls branch names. I will use it in the example just for form's sake.
Putting these all together:
warn() {
    # Print all arguments (not the argument count) to stderr.
    echo "warning: $*" 1>&2
}
# Given an input name old/release/\d\.(\d|\.)+, make
# a local branch named release/\2 (more or less).
make_local_release_branch() {
    local relnum newname
    relnum=${1#old/release/}
    [[ $relnum =~ ([0-9]\.)([0-9.]+) ]] || {
        warn "remote-tracking branch $1 does not conform to name style, ignored"
        return
    }
    newname=release/${BASH_REMATCH[2]}
    git rev-parse -q --verify "refs/heads/$newname" >/dev/null && {
        warn "branch $newname already exists, remote-tracking branch $1 ignored"
        return
    }
    # Create the local branch under the shortened name, tracking the remote one.
    git branch --track "$newname" "$1"
}
git for-each-ref --format='%(refname:short)' refs/remotes/old |
while read -r rmtbranch; do
    case $rmtbranch in
    old/release/[0-9]*) make_local_release_branch "$rmtbranch";;
    *) warn "skipping remote branch $rmtbranch -- not old/release/[digit]";;
    esac
done
(this whole thing is entirely untested).

What is the difference between the two sed commands below?

Information about the environment I am working in:
$ uname -a
AIX prd231 1 6 00C6B1F74C00
$ oslevel -s
6100-03-10-1119
Code Block A
( grep schdCycCleanup $DCCS_LOG_FILE | sed 's/[~]/ \
/g' | grep 'Move(s) Exist for cycle' | sed 's/[^0-9]*//g' ) > cycleA.txt
Code Block B
( grep schdCycCleanup $DCCS_LOG_FILE | sed 's/[~]/ \n/g' | grep 'Move(s) Exist for cycle' | sed 's/[^0-9]*//g' ) > cycleB.txt
I have two code blocks (shown above) that use sed to trim the input down to 6 digits, but one command behaves differently than I expected.
Sample of input for the two code blocks
Mar 25 14:06:16 prd231 ajbtux[33423660]: 20160325140616:~schd_cem_svr:1:0:SCHD-MSG-MOVEEXISTCYCLE:200705008:AUDIT:~schdCycCleanup - /apps/dccs/ajbtux/source/SCHD/schd_cycle_cleanup.c - line 341~ SCHD_CYCLE_CLEANUP - Move(s) Exist for cycle 389210~
I get the following output when the sample input above goes through the two code blocks.
cycleA.txt content
389210
cycleB.txt content
25140616231334236602016032514061610200705008341389210
I understand that my last piped sed command (sed 's/[^0-9]*//g') is deleting all characters other than digits, so I omitted it from the code blocks and redirected the output to two additional files. I get the following output.
cycleA1.txt content
SCHD_CYCLE_CLEANUP - Move(s) Exist for cycle 389210
cycleB1.txt content
Mar 25 15:27:58 prd231 ajbtux[33423660]: 20160325152758: nschd_cem_svr:1:0:SCHD-MSG-MOVEEXISTCYCLE:200705008:AUDIT: nschdCycCleanup - /apps/dccs/ajbtux/source/SCHD/schd_cycle_cleanup.c - line 341 n SCHD_CYCLE_CLEANUP - Move(s) Exist for cycle 389210 n
I can see that the first code block is removing everything other than (SCHD_CYCLE_CLEANUP - Move(s) Exist for cycle 389210) and is using the tilde, but the second code block is just replacing the tildes with the character n. I can also see that the first code block needs a line break after this (sed 's/[~]/ ), which is why I thought \n would simulate a line break, but that is not the case. I think my different results come from the way regular expressions are being used. I have tried to look into regular expressions and searched about them on Stack Overflow but did not find what I was looking for. Could someone explain how I can achieve the same result from code block B as from code block A without having part of my code on a second line?
Thank you in advance
This is an example of the XY problem (http://xyproblem.info/). You're asking for help to implement something that is the wrong solution to your problem. Why are you changing ~s to newlines, etc when all you need given your posted sample input and expected output is:
$ sed -n 's/.*schdCycCleanup.* \([0-9]*\).*/\1/p' file
389210
or:
$ awk -F'[ ~]' '/schdCycCleanup/{print $(NF-1)}' file
389210
If that's not all you need then please edit your question to clarify your requirements for WHAT you are trying to do (as opposed to HOW you are trying to do it) as your current approach is just wrong.
Etan Reisner's helpful answer explains the problem and offers a single-line solution based on an ANSI C-quoted string ($'...'), which is appropriate, given that you originally tagged your question bash.
(Ed Morton's helpful answer shows you how to bypass your problem altogether with a different approach that is both simpler and more efficient.)
However, it sounds like your shell is actually something different - presumably ksh88, an older version of the Korn shell that is the default sh on AIX 6.1 - in which such strings are not supported[1].
(ANSI C-quoted strings were introduced in ksh93, and are also supported not only in bash, but in zsh as well).
Thus, you have the following options:
With your current shell, you must stick with a two-line solution that contains an (\-escaped) actual newline, as in your code block A.
Note that $(printf '\n') to create a newline does not work, because command substitutions invariably trim all trailing newlines, resulting in the empty string in this case.
Use a more modern shell that supports ANSI C-quoted strings, and use Etan's answer. http://www.ibm.com/support/knowledgecenter/ssw_aix_61/com.ibm.aix.cmds3/ksh.htm tells me that ksh93 is available as an alternative shell on AIX 6.1, as /usr/bin/ksh93.
If feasible: install GNU sed, which natively understands escape sequences such as \n in replacement strings.
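For a single-line variant that does not need $'...', one possible sketch is to keep the newline in a variable and splice it into the sed script, or to hand the job to tr, which interprets \n itself. This uses only plain quoting and should therefore work in older shells too, though it has not been tried on AIX. The names nl, tilde_to_lines_sed and tilde_to_lines_tr are made up for illustration.

```shell
# A variable holding exactly one newline. The closing quote sits on the
# next line on purpose.
nl='
'

# Replace each ~ with a newline. Inside double quotes, \\ becomes a single
# backslash, so sed sees a backslash-escaped literal newline, which is the
# portable form for replacement text.
tilde_to_lines_sed() {
    sed "s/~/\\${nl}/g"
}

# The same split with tr, which understands \n on its own.
tilde_to_lines_tr() {
    tr '~' '\n'
}

printf '%s\n' 'foo~bar~baz' | tilde_to_lines_sed
printf '%s\n' 'foo~bar~baz' | tilde_to_lines_tr
```

Both commands print foo, bar and baz on separate lines. The variable assignment still spans two lines, but it can live once at the top of a script while the pipelines themselves stay on one line.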
[1] As for what actually happens when you try echo 'foo~bar~baz' | sed $'s/[~]/\\\n/g' in a POSIX-like shell that does not support $'...': the $ is left as-is, because what follows is not a valid variable name, and sed ends up seeing literal $s/[~]/\\\n/g, where the $ is interpreted as a context address applying to the last input line - which doesn't make a difference here, because there is only 1 line. \\ is interpreted as plain \, and \n as plain n, effectively replacing ~ instances with literal \n sequences.
GNU sed handles \n in the replacement the way you expect.
OS X (and presumably BSD) sed does not. It treats it as a normal escaped character and just unescapes it to n. (Though I don't see this in the manual anywhere at the moment.)
You can use $'' quoting to use \n as a literal newline if you want though.
echo 'foo~bar~baz' | sed $'s/[~]/\\\n/g'

Bash script - mass modify files sed regular expression

I have a set of .csv files (all in one folder) with the format shown below:
170;151;104;137;190;125;170;108
195;192;164;195;171;121;133;104
... (a lot more rows) ...
The thing is I screwed up a bit and it should look like this
170;151;104.137;190.125;170;108
195;192;164.195;171.121;133;104
In case the difference is too subtle to notice:
I need to write a script that changes every third and fifth semicolon into a period in every row in every file in that folder.
My research indicates that I have to devise some clever sed s/ command in my script. The problem is I'm not very good with regular expressions. From reading the tutorial it's probably going to involve something with /3 and /5.
Here's a really short way to do it:
sed -i.BAK 's/;/./3;s/;/./4' *
It replaces the 3rd and then the 5th (which is now the 4th) instance of ; with a period.
I tested it on your sample (saved as sample.txt):
$ sed 's/;/./3;s/;/./4' <sample.txt
170;151;104.137;190.125;170;108
195;192;164.195;171.121;133;104
For safety, I have made my example back up your originals as <WHATEVER>.BAK. To prevent this, change -i.BAK to -i (GNU sed) or -i '' (BSD sed).
This script may not be totally portable but I've tested it on Mac 10.8 with BSD sed (no idea what version) and Linux with sed (gsed) 4.1.4 (2003). @JonathanLeffler notes that it's standard POSIX sed as of 2008. I also just found it and like it a lot.
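If the in-place flag is a portability worry, the same two substitutions can be applied through an explicit backup file. This is only a sketch: the helper name fix_csv_dir and the .bak suffix are assumptions, and it limits itself to *.csv so that other files in the folder are left alone.

```shell
# fix_csv_dir DIR  (hypothetical helper)
# Rewrite every .csv file in DIR, turning the 3rd and 5th semicolon of each
# row into a period. Avoids -i, whose syntax differs between GNU and BSD sed.
fix_csv_dir() {
    for f in "$1"/*.csv; do
        [ -f "$f" ] || continue            # the glob matched nothing
        cp "$f" "$f.bak" &&                # keep the original next to it
            sed 's/;/./3;s/;/./4' "$f.bak" > "$f"
    done
}

# Demo in a throwaway directory.
dir=$(mktemp -d)
printf '%s\n' '170;151;104;137;190;125;170;108' > "$dir/sample.csv"
fix_csv_dir "$dir"
cat "$dir/sample.csv"     # prints: 170;151;104.137;190.125;170;108
rm -rf "$dir"
```

The operation is not idempotent: a second run would convert two more semicolons in every row and overwrite the backups, so it should be run once per folder.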
Golf tip: If you run the command from bash, you can use brace expansion to achieve a supremely short version:
sed -es/\;/./{3,4} -i *
Here's one way:
sed -i 's/^\([^;]*;[^;]*;[^;]*\);\([^;]*;[^;]*\);/\1.\2./' foldername/*
(Disclaimer: I did test this, but some details of sed are not fully portable. I don't think there's anything non-portable in the above, so it should be fine, but please make a backup copy of your folder first, before running the above. Just in case.)

find and replace within file

I have a requirement to search for a pattern which is something like :
timeouts = {default = 3.0; };
and replace it with
timeouts = {default = 3000.0;.... };
i.e. multiply the timeout by a factor of 1000.
Is there any way to do this for all files in a directory?
EDIT :
Please note that some of the files in the directory are symlinks. Is there any way to get this done for symlinks also?
Please note that timeouts also exists as a substring in the files, so I want to make sure that only this line gets replaced. Any solution using sed, awk, or perl is acceptable.
Give this a try:
for f in *
do
sed -i 's/\(timeouts = {default = [0-9]\+\)\(\.[0-9]\+;\)\( };\)/\1000\2....\3/' "$f"
done
It will make the replacements in place for each file in the current directory. Some versions of sed require a backup extension after the -i option. You can supply one like this:
sed -i .bak ...
Some versions don't support in-place editing. You can do this:
sed '...' "$f" > tmpfile && mv tmpfile "$f"
Note that this is obviously not actually multiplying by 1000, so if the number is 3.1 it would become "3000.1" instead of 3100.0.
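If real multiplication is needed while staying in the shell, awk can do the arithmetic. The following is a sketch only: it assumes the exact spacing "timeouts = {default = N;" shown in the question, formats the result with one decimal place, and the helper name scale_timeout is made up for illustration.

```shell
# scale_timeout  (hypothetical helper)
# Read text on stdin and multiply the number in "timeouts = {default = N;"
# by 1000, leaving every other line untouched.
scale_timeout() {
    awk '
    match($0, /timeouts = \{default = [0-9.]+;/) {
        head = substr($0, 1, RSTART - 1)       # text before the match
        hit  = substr($0, RSTART, RLENGTH)     # the matched setting
        tail = substr($0, RSTART + RLENGTH)    # text after the match
        num = hit
        sub(/^timeouts = \{default = /, "", num)
        sub(/;$/, "", num)
        $0 = head sprintf("timeouts = {default = %.1f;", num * 1000) tail
    }
    { print }'
}

printf '%s\n' 'timeouts = {default = 3.1; };' | scale_timeout
# prints: timeouts = {default = 3100.0; };
```

Because this writes to stdout, applying it to a file means redirecting to a temporary file and moving that over the original, as in the tmpfile example above.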
You can do this:
perl -pi -e 's/(timeouts\s*=\s*\{default\s*=\s*)([0-9.-]+)/$1 . $2*1000/e' *
One suggestion for whichever solution above you decide to use - it may be worth it to think through how you could refactor to avoid having to modify all of these files for a change like this again.
Do all of these scripts have similar functionality?
Can you create a module that they would all use for shared subroutines?
In the module, could you have a single line that would allow you to have a multiplier?
For me, anytime I need to make similar changes in more than one file, it's the perfect time to be lazy to save myself time and maintenance issues later.
$ perl -pi.bak -e 's/\w+\s*=\s*\{\s*\w+\s*=\s*\K(-?[0-9.]+)/sprintf "%0.1f", 1000 * $1/eg' *
Notes:
The regex matches just the number (see \K in perlre)
The /e means the replacement is evaluated
I include a sprintf in the replacement just in case you need finer control over the formatting
Perl's -i can operate on a bunch of files
EDIT
It has been pointed out that some of the files are symbolic links. Given that this process is not idempotent (running it twice on the same file is bad), you had better generate a unique list of files in case one of the links points to a file that appears elsewhere in the list. Here is an example with find, though the code for a pre-existing list should be obvious.
$ find -L . -type f -exec realpath {} \; | sort -u | xargs -d '\n' perl ...
(Assumes none of your filenames contain a newline!)

Extracting username from UNIX path using Regex

I need to get a username from an Unix path with this format:
/home/users/myusername/project/number/files
I just want "myusername". I've been trying for almost an hour and I'm completely clueless.
Any idea?
Thanks!
Maybe just /home/users/([a-zA-Z0-9_\-]*)/.*?
Note that the critical part [a-zA-Z0-9_\-]* has to contain all valid characters for unix usernames. I took from here, that a username should only contain digits, characters, dashes and underscores.
Also note that the extracted username is not the whole matching, but the first group (indicated by (...)).
The best answer to this depends on what you are trying to achieve. If you want to know the user who owns that file, you can use the stat command. Unfortunately its syntax differs slightly between operating systems, but the following two commands work:
Mac OS X
stat -f '%Su' /home/users/myusername/project/number/files
Redhat/Fedora/Centos
stat -c '%U' /home/users/myusername/project/number/files
If you really do want the string following /home/users, then either of the regexes provided above will do that. You could use one in a bash script as follows (Mac OS X):
USERNAME=$(echo '/home/users/myusername/project/number/files' | \
sed -E -e 's!^/home/users/([^/]+)/.*$!\1!g')
Check http://rubular.com/r/84zwJmV62G. The first match, not the entire match, is the username.
In a Bourne shell, something like:
string="/home/users/STRINGWEWANT/some/subdir/here"
echo $string | awk -F\/ '{print $4}'
would be one option, assuming it is always the third component of the path (awk field 4, because the leading / produces an empty first field). There are more lightweight options that use only shell builtins:
echo ${x#*users/}
will strip out everything up to and including 'users/'
echo ${y%%/*}
Will strip out the remainder.
So to put it all together :
path="/home/users/STRINGWEWANT/some/other/dirs"
y=${path#*users/} && echo "${y%%/*}"
STRINGWEWANT
Also check out the bash manpage and search for "Parameter Expansion".
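The two expansions can be wrapped into a small function. This is a sketch that assumes the fixed /home/users/ prefix from the question; the function name username_from_path is made up for illustration.

```shell
# username_from_path PATH  (hypothetical helper)
# Print the path component that follows /home/users/, using only
# parameter expansion. Returns non-zero if PATH lacks that prefix.
username_from_path() {
    rest=${1#/home/users/}            # drop the fixed prefix
    [ "$rest" != "$1" ] || return 1   # prefix was not present
    printf '%s\n' "${rest%%/*}"       # drop everything after the next /
}

username_from_path /home/users/myusername/project/number/files
# prints: myusername
```

No external process is started, which matters mainly when the function is called many times in a loop.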
(\/home\/users\/)([^\/]+)
The 2nd capture group (index 1) will be myusername