Replace names in git log using sed on macOS (for Gource) - regex

I'm trying to make a nice Gource video of our software development project. Using Gource I can generate a combined git log of all repos with:
first run gource --output-custom-log ../logs/repo1.txt for each repo, then
cat *.txt | sort -n > combined.txt
This generates combined.txt, a pipe-delimited file like:
1551272464|John|A|repo1/file1.txt
1551272464|john_doe|A|repo1/folder/file9.py
1551272464|Doe, John|A|repo2/filex.py
So it's: EPOCH|Committer name|A, M or D|committed file
The actual problem I want to solve is that my developers have used different git clients with different committer names, so I'd like to replace all of their names with a single version. I don't mind writing a separate sed for each case.
So find "John", "john_doe" and "Doe, John" and replace them with "John Doe". And it has to work on my MacBook.
So I tried sed -i -r "s/John/user_john/g" combined.txt, but the problem is that it matches both "John" and "Doe, John" and replaces just the "John" part, so I need a fuzzy match that replaces the whole column.
Who can help me get the correct regex?

A regex would almost certainly be the wrong approach for this as you'd get false matches unless you were extremely careful and it's inefficient.
Just create an aliases file containing a line for each name you want in your output followed by all the names that should be mapped to it and then you can do this to change them all clearly, simply, robustly, portably, and efficiently in one call to awk:
$ cat tst.awk
BEGIN { FS="[|]" ; OFS="|" }
NR==FNR {
    for (i=2; i<=NF; i++) {
        alias[$i] = $1
    }
    next
}
$2 in alias { $2 = alias[$2] }
{ print }
$ cat aliases
John Doe|John|john_doe|Doe, John
Susan Barker|Susie B|Barker, Susan
$ cat file
1551272464|John|A|repo1/file1.txt
1551272464|Susie B|A|repo2/filex.py
1551272464|john_doe|A|repo1/folder/file9.py
1551272464|Doe, John|A|repo2/filex.py
1551272464|Barker, Susan|A|repo2/filex.py
$ awk -f tst.awk aliases file
1551272464|John Doe|A|repo1/file1.txt
1551272464|Susan Barker|A|repo2/filex.py
1551272464|John Doe|A|repo1/folder/file9.py
1551272464|John Doe|A|repo2/filex.py
1551272464|Susan Barker|A|repo2/filex.py
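A minimal end-to-end sketch of the same awk approach, assuming hypothetical file names aliases.txt and combined.txt. Because the awk that ships with macOS has no in-place editing option, it writes to a temporary file and then replaces the original. The extra "Johnny" line is illustrative and shows that only exact, whole-field matches are renamed.

```shell
#!/usr/bin/env bash
# Sketch: normalise committer names in a Gource custom log using an alias file.
# aliases.txt / combined.txt and their contents are hypothetical examples.
set -euo pipefail
cd "$(mktemp -d)"

# One line per canonical name, followed by every spelling that maps to it.
cat > aliases.txt <<'EOF'
John Doe|John|john_doe|Doe, John
EOF

cat > combined.txt <<'EOF'
1551272464|John|A|repo1/file1.txt
1551272464|john_doe|A|repo1/folder/file9.py
1551272464|Doe, John|A|repo2/filex.py
1551272465|Johnny|M|repo3/readme.md
EOF

# macOS awk has no in-place option: write to a temp file, then move it over the original.
awk 'BEGIN { FS = "[|]"; OFS = "|" }
     NR == FNR { for (i = 2; i <= NF; i++) alias[$i] = $1; next }
     $2 in alias { $2 = alias[$2] }
     { print }' aliases.txt combined.txt > combined.tmp && mv combined.tmp combined.txt

cat combined.txt
```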

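As @WiktorStribizew mentioned, with GNU sed you can do:
sed -i -r "s/Doe, John|john_doe|John/user_john/g" combined.txt
On macOS (BSD sed), -i requires an explicit backup suffix (-i '' for none) and -E is the portable flag for extended regexes, so there use sed -i '' -E "s/Doe, John|john_doe|John/user_john/g" combined.txt instead. Note that the pattern is not anchored to the name column, so it also rewrites "John" inside longer names or file paths.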
And with that, you can even do:
sed -i -r -e "s/Doe, John|john_doe|John/user_john/g" -e "s/Wayne, Bruce|bruce_wayne|Bruce/user_bruce/g" combined.txt
And add more replacements to chain with the -e option:
-e script, --expression=script
add the script to the commands to be executed
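For macOS, a sketch of the chained approach that should also work with BSD sed: it uses -E, writes to a temporary file instead of relying on -i, and anchors each pattern to the second (name) field so that longer names such as "Johnny" are left alone. The sample lines and the Bruce Wayne mapping are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: several anchored renames in one sed call, avoiding GNU-only options.
# Sample data and name mappings are illustrative.
set -euo pipefail
cd "$(mktemp -d)"
printf '%s\n' \
    '1551272464|John|A|repo1/file1.txt' \
    '1551272464|Doe, John|A|repo2/filex.py' \
    '1551272464|bruce_wayne|A|repo3/cave.py' \
    '1551272464|Johnny|A|repo4/a.txt' > combined.txt

# [|] is a literal pipe; requiring it on both sides limits the match to field 2.
sed -E \
    -e 's/^([0-9]+)[|](Doe, John|john_doe|John)[|]/\1|John Doe|/' \
    -e 's/^([0-9]+)[|](Wayne, Bruce|bruce_wayne|Bruce)[|]/\1|Bruce Wayne|/' \
    combined.txt > combined.tmp && mv combined.tmp combined.txt

cat combined.txt
```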

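Try GNU sed:
sed -E "s/^(\w+\|)(john([ _]doe)?|doe,\s*john)\|/\1John Doe|/i" combined.txt
The trailing \| keeps names such as "Johnny" from matching, and [ _] is used because \s is not a whitespace class inside a bracket expression. Once you've checked the output, add the -i option to edit the file in place (sed -Ei ...). macOS ships BSD sed, but GNU sed can be installed with Homebrew (brew install gnu-sed) and run as gsed.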

Related

Use bash to remove symbols from text file

I have a bunch of txt-files containing stuff like this:
text_i_need_to_remove{text_i_need_to_retain}
text_i need_to_remove{text_i_need_to_retain}
...
How do I remove text before curly braces (and curly braces themselves) and retain just only text_i_need_to_retain?
Delete everything up to the {, and the } at the end of the line:
:%s/.*{\|}$//g
From the bash shell, you can use text-processing tools like sed and awk. Assume the file is named ip.txt.
1) With sed, using a regex very similar to the one we used inside vim (the \| alternation in a basic regex is a GNU sed extension). The -i flag makes the change in place, i.e. it modifies the input file itself.
$ sed -i 's/.*{\|}$//g' ip.txt
2) With awk, one can again use substitution or, in this case, split the line on curly brackets and print only the second field.
$ awk -F'{|}' '{print $2}' ip.txt > tmp && mv tmp ip.txt
If you have GNU awk, there is an -i inplace option for in-place editing:
$ gawk -i inplace -F'{|}' '{print $2}' ip.txt
To make the change in all files in the current directory, use
sed -i 's/.*{\|}$//g' *
Or, if they have a common extension, say .txt, use
sed -i 's/.*{\|}$//g' *.txt
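A sketch of the multi-file variant that avoids the GNU-only \| alternation and GNU-style -i, so it should also run with BSD/macOS sed. It assumes the files are in the current directory and end in .txt; note that ^[^{]*{ removes text up to the first {, whereas .*{ removes up to the last one.

```shell
#!/usr/bin/env bash
# Sketch: keep only the text inside {...} on each line of every *.txt file.
# Sample file a.txt is illustrative.
set -euo pipefail
cd "$(mktemp -d)"
printf '%s\n' 'text_i_need_to_remove{keep_me}' 'other_prefix{keep_me_too}' > a.txt

for f in ./*.txt; do
    # Two plain BRE substitutions: drop everything up to the first "{", then a trailing "}".
    sed -e 's/^[^{]*{//' -e 's/}$//' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

cat a.txt
```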
:%s/^.*{\(.*\)}$/\1/ or in bash, sed 's/^.*{\(.*\)}$/\1/' foo.txt
\(.*\) is a capture group which feeds into \1, and looks like a lumbering zombie.
You can use this in vim:
:%s/^.*{// | %s/}$//
You can also use this script. First run it with sed but without -i to check the output; if everything is OK, add the -i option as below:
#!/bin/bash
for item in /dir/where/my/files/are/*
do
    sed -i 's/^.*{//;s/}$//' "$item"
done
(sed -i: in-place replacement)
Or simply use:
sed -i 's/^.*{//;s/}$//' /dir/where/my/files/are/*
Perl can be used to do the substitution on all files:
perl -i -pe 's/.*{|}$//g' *.txt

sed to insert a string after matching pattern on a same line

I need to insert a command (as string) to an existing file after a certain match. The existing string is a long make command and I only need to modify it by inserting another string at specific location. I tried using sed but it either adds a new line before/after the matching string or replaces it. I'd like to know if at least it is possible to accomplish what I want with sed or should I be using something else? Could you please provide me with some hints?
Example:
The file contains two make commands and I am only interested in the second one without bbnote.
oe_runmake_call() {
    bbnote make -j 8 CROSS_COMPILE=arm-poky-linux-gnueabi- CC="arm-poky-linux-gnueabi-gcc" "$@"
    make -j 8 CROSS_COMPILE=arm-poky-linux-gnueabi- CC="my_command_here arm-poky-linux-gnueabi-gcc" --sysroot=/some/path "$@"
}
Thanks in advance!
Here's the code:
http://hastebin.com/tigatoquje.go
You could do something like this using sed:
sed -r 's:(^\s+make.+ CC=\"):\1your_command_here :g' file.log >outfile.log
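or with sed in-place editing (keep -r separate: with GNU sed, -ir would make "r" the backup suffix instead of enabling extended regexes):
sed -r -i 's:(^\s+make.+ CC=\"):\1your_command_here :g' file.log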
Without the -r (extended regex) option:
sed 's:\(^\s\+make.\+ CC=\"\):\1your_command_here :g' file.log > outfile.log
Outputs:
oe_runmake_call() {
    bbnote make -j 8 CROSS_COMPILE=arm-poky-linux-gnueabi- CC="arm-poky-linux-gnueabi-gcc" "$@"
    make -j 8 CROSS_COMPILE=arm-poky-linux-gnueabi- CC="your_command_here arm-poky-linux-gnueabi-gcc" --sysroot=/some/path "$@"
}
How:
sed -r 's:(^\s+make.+ CC=\"):\1your_command_here :g'
-r = use extended regular expressions
(^\s+make.+ CC=\") = a line starting with whitespace and make, up to and including CC="; the whole match is captured as group 1
\1your_command_here = \1 reference capture group then add command text
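A self-contained sketch of the same capture-group idea, using the POSIX [[:space:]] class instead of \s so it should work with both GNU and BSD sed -E. The build.sh file is a trimmed, hypothetical version of the function from the question, and my_command_here is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: insert a string right after CC=" on the indented "make" line only.
# build.sh and my_command_here are hypothetical placeholders.
set -euo pipefail
cd "$(mktemp -d)"
cat > build.sh <<'EOF'
oe_runmake_call() {
    bbnote make -j 8 CC="arm-poky-linux-gnueabi-gcc" "$@"
    make -j 8 CC="arm-poky-linux-gnueabi-gcc" "$@"
}
EOF

# Group 1 = line start, whitespace, "make", up to and including CC="; \1 puts it back.
# The "bbnote make" line does not start with whitespace followed by make, so it is untouched.
sed -E 's/^([[:space:]]+make .* CC=")/\1my_command_here /' build.sh > build.sh.new

cat build.sh.new
```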
You could use perl.
Replace YOUR_COMMAND with what you want added. This assumes your file is in file.txt:
perl -i.bak -pl -e '/^\s*make/ and s/(CC=")/${1}YOUR_COMMAND /' file.txt

How to cut a string from a string

My script gets this string for example:
/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
Let's say I don't know how long the string is up to /importance.
I want a new variable that will keep only the /importance/lib1/lib2/lib3/file from the full string.
I tried to use sed 's/.*importance//' but it's giving me the path without the importance....
Here is the command in my code:
find <main_path> -name file | sed 's/.*importance//'
I am not familiar with the regex, so I need your help please :)
Sorry, my friends, I got my question wrong:
I don't need the output /importance/lib1/lib2/lib3/file but /importance/lib1/lib2/lib3 with no /file in the output.
Can you help me?
I would use awk:
$ echo "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file" | awk -F"/importance/" '{print FS$2}'
/importance/lib1/lib2/lib3/file
Which is the same as:
$ awk -F"/importance/" '{print FS$2}' <<< "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
/importance/lib1/lib2/lib3/file
That is, we set the field separator to /importance/, so that the first field is what comes before it and the 2nd one is what comes after. To print /importance/ itself, we use FS!
All together, and to save it into a variable, use:
var=$(find <main_path> -name file | awk -F"/importance/" '{print FS$2}')
Update
I don't need the output /importance/lib1/lib2/lib3/file but
/importance/lib1/lib2/lib3 with no /file in the output.
Then you can use something like dirname to get the path without the name itself:
$ dirname "$(awk -F"/importance/" '{print FS$2}' <<< "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file")"
/importance/lib1/lib2/lib3
Instead of substituting everything up to importance with nothing, replace it with /importance:
~$ echo $var
/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
~$ sed 's:.*importance:/importance:' <<< $var
/importance/lib1/lib2/lib3/file
As noted by @lurker, if importance can also appear as part of another directory name, you could add the slashes to be safe:
~$ sed 's:.*/importance/:/importance/:' <<< "/dir1/dirimportance/importancedir/..../importance/lib1/lib2/lib3/file"
/importance/lib1/lib2/lib3/file
With GNU sed:
echo '/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file' | sed -E 's#.*(/importance.*)#\1#'
Output:
/importance/lib1/lib2/lib3/file
pure bash
kent$ a="/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
kent$ echo ${a/*\/importance/\/importance}
/importance/lib1/lib2/lib3/file
external tool: grep
kent$ grep -o '/importance/.*' <<<$a
/importance/lib1/lib2/lib3/file
I tried to use sed 's/.*importance//' but it's giving me the path without the importance....
You were very close. All you had to do was substitute /importance back in:
sed 's:.*/importance:/importance:'
However, I would use bash's built-in parameter expansion. It doesn't start an external process, so it is much faster.
The expansion ${foo##pattern} takes the shell variable foo and removes the longest prefix that matches the glob pattern:
file_name="/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
file_name="/importance${file_name##*/importance}"
Removing the /file at the end, as you asked:
echo '<path>' | sed -r 's#.*(/importance.*)/[^/]*#\1#'
Input /dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
Returns: /importance/lib1/lib2/lib3
See this "Match groups" tutorial.
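For the updated question, here is a pure-bash sketch that combines both steps with parameter expansion and no external processes. importance_dir is a hypothetical helper name; it returns a non-zero status when the path has no /importance/ component.

```shell
#!/usr/bin/env bash
# Sketch: keep "/importance/..." from a path and drop the last component.
# importance_dir is a hypothetical helper name used for illustration.
importance_dir() {
    local path=$1
    local rest=${path##*/importance/}   # strip up to and including the last "/importance/"
    if [[ $rest == "$path" ]]; then     # nothing was stripped: no "/importance/" in the path
        return 1
    fi
    rest="/importance/$rest"
    printf '%s\n' "${rest%/*}"          # remove the shortest "/..." suffix, i.e. "/file"
}

importance_dir "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
# prints /importance/lib1/lib2/lib3
```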

Regex with sed to parse archive name

I'd like to parse different kinds of Java archive with the sed command line tool.
Archives can have the following extensions:
.jar, .war, .ear, .esb
What I'd like to get is the name without the extension, e.g. for Foobar.jar I'd like to get Foobar.
This seems fairly simple, but I cannot come up with a solution that works and is also robust.
I tried something along the lines of sed s/\.+(jar|war|ear|esb)$//, but could not make it work.
You were nearly there:
sed -E 's/\.+(jar|war|ear|esb)$//' file
You just needed to add the -E flag so that sed interprets the expression as an extended regex. And of course, respect the sed 's/something/new/' syntax.
Test
$ cat a
aaa.jar
bb.war
hello.ear
buuu.esb
hello.txt
$ sed -E 's/\.+(jar|war|ear|esb)$//' a
aaa
bb
hello
buuu
hello.txt
Using sed:
s='Foobar.jar'
sed -r 's/\.(jar|war|ear|esb)$//' <<< "$s"
Foobar
OR, better, do it in bash itself (note that [jwe]ar covers .jar, .war and .ear, but not .esb):
echo "${s%.[jwe]ar}"
Foobar
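If .esb must be handled too, bash's extended globbing can list all four extensions. A minimal sketch (strip_archive_ext is a hypothetical helper name) that strips one of these extensions only when it is at the very end of the name:

```shell
#!/usr/bin/env bash
# Sketch: strip .jar/.war/.ear/.esb from the end of a name, leave anything else alone.
# strip_archive_ext is a hypothetical helper name.
shopt -s extglob   # enables @(a|b|c) patterns in parameter expansion

strip_archive_ext() {
    # % removes the shortest matching suffix; @(...) matches exactly one of the alternatives
    printf '%s\n' "${1%.@(jar|war|ear|esb)}"
}

for name in Foobar.jar bb.war hello.ear buuu.esb hello.txt app.jar.bak; do
    strip_archive_ext "$name"
done
```

Names with other extensions, including app.jar.bak, are printed unchanged.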
You need to escape the |, ( and ), and quote the expression with ', if you don't use an option like -r or -E (the \| alternation in a basic regex is a GNU sed extension):
echo "test.jar" | sed 's/\.\(jar\|war\|ear\|esb\)$//'
test
The + is also not needed, since you normally have only one dot.
On traditional UNIX (tested with AIX/ksh):
File='Foobar.jar'
echo ${File%.*}
From a list containing only these kinds of files:
YourList | sed 's/\....$//'
From a list containing all kinds of files:
YourList | sed -n 's/\.[jew]ar$//p
t
s/\.esb$//p'

Grep regex contained in a file (not grep -f option!)

I am reading some equipment configuration output and check if the configuration is correct, according to the HW configuration. The template configurations are stored as files with all the params, and the lines contain regular expressions (basically just to account for variable number of spaces between "object", "param" and "value" in the output, also some index variance)
First of all, I cannot use grep -f $template $output, since I have to process each line of the template separately. I have something like this running
while read line
do
attempt=`grep -E "$line" $file`
# ...etc
done < $template
Which works just fine if the template doesn't contain regex.
Problem: grep interprets the search pattern literally when it is read from a file. I tested the regexes themselves; they work fine from the command line.
With this background, the question is:
How can I read regexes from a file (line by line) and have grep not interpret them literally?
Using the following script:
#!/usr/bin/env bash
# multi-grep
regexes="$1"
file="$2"
while IFS= read -r rx ; do
result="$(grep -E "$rx" "$file")"
grep -q -E "$rx" "$file" && printf 'Look ma, a match: %s!\n' "$result"
done < "$regexes"
And files with the following contents:
$ cat regexes
RbsLocalCell=S.C1.+eulMaxOwnUuLoad.+100
$ cat data
RbsLocalCell=S1C1 eulMaxOwnUuLoad 100
I get this result:
$ ./multi-grep regexes data
Look ma, a match: RbsLocalCell=S1C1 eulMaxOwnUuLoad 100!
This works for different spacing as well
$ cat data
RbsLocalCell=S1C1 eulMaxOwnUuLoad 100
$ ./multi-grep regexes data
Look ma, a match: RbsLocalCell=S1C1 eulMaxOwnUuLoad 100!
Seems okay to me.
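One difference between this script and the loop in the question is IFS= read -r. Without -r, read treats backslashes as escape characters and removes them, so a regex such as a\.b arrives at grep as a.b. The sketch below, with an illustrative one-line regexes file, shows the effect:

```shell
#!/usr/bin/env bash
# Sketch: how "read" without -r changes a regex read from a file.
# The file name "regexes" and the pattern a\.b are illustrative only.
set -euo pipefail
cd "$(mktemp -d)"
printf '%s\n' 'a\.b' > regexes

# Plain read treats the backslash as an escape and drops it: a\.b becomes a.b
IFS= read line < regexes
plain=$line

# read -r keeps the line exactly as written in the file
IFS= read -r line < regexes
raw=$line

printf 'read:    %s\nread -r: %s\n' "$plain" "$raw"

# "a.b" also matches "axb"; "a\.b" only matches a literal dot, so grep -c prints 1 then 0.
printf 'axb\n' | grep -cE "$plain"
printf 'axb\n' | grep -cE "$raw" || true
```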
Use the -F option, or fgrep.
What's more, you seem to want to match full lines: add the -x option as well.
Another point: make sure the pattern is not interpreted in some wrong way by the shell by putting "$line" in quotes.
All in all, it looks like you'd be better off writing a Perl script than a shell script.