Substitute a regex pattern using awk

I am trying to write a regex expression to replace one or more '+' symbols present in a file with a space. I tried the following:
echo This++++this+++is+not++done | awk '{ sub(/\++/, " "); print }'
This this+++is+not++done
Expected:
This this is not done
Any ideas why this did not work?

Use gsub which does global substitution:
echo This++++this+++is+not++done | awk '{gsub(/\++/," ");}1'
The sub function replaces only the 1st match; to replace all matches, use gsub.
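To make the difference concrete, here is a small side-by-side sketch. It uses the bracket form [+]+ instead of \++ purely as a portability assumption, since the handling of an escaped + can differ between awk implementations; the variable names are illustrative only.

```shell
input='This++++this+++is+not++done'

# sub() replaces only the first (leftmost) match on each line.
first_only=$(printf '%s\n' "$input" | awk '{ sub(/[+]+/, " "); print }')

# gsub() replaces every match on the line.
all_matches=$(printf '%s\n' "$input" | awk '{ gsub(/[+]+/, " "); print }')

printf '%s\n' "$first_only"    # This this+++is+not++done
printf '%s\n' "$all_matches"   # This this is not done
```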

Or the tr command:
echo This++++this+++is+not++done | tr -s '+' ' '
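A short sketch of what tr -s does here, and one side effect worth knowing about: the squeeze applies to every run of spaces after translation, so spaces that were already doubled in the input are collapsed as well. The second input string is a made-up example.

```shell
# tr maps every + to a space, then -s squeezes runs of the resulting spaces.
squeezed=$(printf '%s\n' 'This++++this+++is+not++done' | tr -s '+' ' ')
printf '%s\n' "$squeezed"      # This this is not done

# Side effect: spaces that were already doubled in the input are squeezed too.
side_effect=$(printf '%s\n' 'a++b  c' | tr -s '+' ' ')
printf '%s\n' "$side_effect"   # a b c
```

If existing whitespace must be preserved exactly, the awk gsub or sed variants are the safer choice.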

The idiomatic awk solution would be just to translate the input field separator to the output separator:
$ echo This++++this+++is+not++done | awk -F'[+]+' '{$1=$1}1'
This this is not done
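The trick in this approach is that assigning to any field (even $1=$1) makes awk rebuild the record using the output field separator OFS. A minimal sketch, assuming an underscore as the desired output separator just to make the rebuild visible:

```shell
# -F sets the input separator to "one or more +"; OFS is chosen here for illustration.
joined=$(printf '%s\n' 'This++++this+++is+not++done' |
  awk -F'[+]+' -v OFS='_' '{ $1 = $1; print }')

printf '%s\n' "$joined"   # This_this_is_not_done
```

With the default OFS (a single space), the same program yields the space-separated result shown above.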

Try this
echo "This++++this+++is+not++done" | sed -re 's/(\+)+/ /g'

You could use sed too.
echo This++++this+++is+not++done | sed -e 's/+\{1,\}/ /g'
This matches one or more + and replaces it with a space.

For this case I recommend sed, which is powerful for substitution and has a short syntax.
Solution sed:
echo This++++this+++is+not++done | sed -En 's/\++/ /gp'
Result:
This this is not done
For awk:
You must use the gsub function for global substitution (more than one substitution per line).
The syntax:
gsub(regexp, replacement [, target])
If the third parameter is omitted, $0 is the target.
The target must be a variable, field, or array element. gsub works on the target in place, overwriting it with the result.
Solution awk:
echo This++++this+++is+not++done | awk 'gsub(/\++/," ")'
Result:
This this is not done
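As a sketch of the optional target argument described above, the following restricts the substitution to the second field and also prints gsub's return value, which is the number of substitutions made. The input line and the variable names are made up for illustration.

```shell
line='a++b c+++d'

# Substitute only inside $2; n receives the number of replacements made.
result=$(printf '%s\n' "$line" |
  awk '{ n = gsub(/[+]+/, "-", $2); print n ": " $0 }')

printf '%s\n' "$result"   # 1: a++b c-d
```

The first field keeps its + signs because only $2 was named as the target.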

echo "This++++this+++is+not++done" | sed 's/++*/ /g'

If you have access to Node.js on your computer, you can do it by installing rexreplace:
npm install -g rexreplace
and then run
rexreplace '\++' ' ' myfile.txt
Or, if you have more files in a directory data, you can do
rexreplace '\++' ' ' data/*.txt

Related

Bash extract project name from url github

I have the following code:
url='https://github.com/Project/name-app.git'
echo $url
I need to get back the name of the project, regardless of the owner of the project.
Result:
name-app

You can use string manipulation in Bash:
url='https://github.com/Project/name-app.git'
url="${url##*/}" # Remove all up to and including last /
url="${url%.*}" # Remove up to the . (including) from the right
echo "$url"
# => name-app
Another approach with awk:
url='https://github.com/Project/name-app.git'
url=$(awk -F/ '{sub(/\..*/,"",$NF); print $NF}' <<< "$url")
echo "$url"
Here, the field delimiter is set to /, the last field is taken, everything from its first . onward is removed, and the result is printed.
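The two approaches above differ when the project name itself contains a dot, or when the URL has no .git suffix. A minimal sketch of a helper that strips only a literal trailing .git; the function name repo_name and the extra URLs are hypothetical, chosen only to show the edge cases:

```shell
# repo_name URL: print the last path component without a trailing .git
repo_name() {
  name=${1##*/}       # drop everything up to and including the last /
  name=${name%.git}   # drop a trailing .git, if present
  printf '%s\n' "$name"
}

repo_name 'https://github.com/Project/name-app.git'   # name-app
repo_name 'https://github.com/Project/my.lib.git'     # my.lib
repo_name 'https://github.com/Project/my.lib'         # my.lib
```

Using ${name%.git} rather than ${name%.*} is a deliberate choice in this sketch: the latter would shorten my.lib to my when no .git suffix is present.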

You can use grep regexp:
url='https://github.com/Project/name-app.git'
echo $url | grep -oP ".*/\K[^\.]*"
You can use bash expansion:
url='https://github.com/Project/name-app.git'
mytemp=${url##*/}
echo ${mytemp%.*}

1st solution: With your shown samples, please try the following awk code.
echo "$url" | awk -F'\\.|/' '{print $(NF-1)}'
OR, to make sure the last value always ends with .git, tweak the above code slightly:
echo "$url" | awk -F'\\.|/' '$NF=="git"{print $(NF-1)}'
2nd solution: Using sed, try the following:
echo "$url" | sed 's/.*\///;s/\..*//'

# my preferred solution
url='https://github.com/Project/name-app.git'
basename "$url" .git
# => name-app
# others
sed 's/\(.*\/\)\([^\/]*\)\.\(\w*\)$/\2/g' <<<"$url"
# group 1: https://github.com/Project/
# group 2: name-app
awk -F"[/.]" '{print $(NF-1)}' <<<"$url"
awk -F"[/.]" '{print $6}' <<<"$url"
tr '/.' '\n' <<<"$url" | tail -2 | grep -v git

bash - Extract part of string

I have a string something like this
xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace=
I want to extract the following from it :
AppointmentManagementService.xsd6.xsd
I have tried using regex, bash and sed with no success. Can someone please help me out with this?
The regex that I used was this :
/AppointmentManagementService.xsd\d{1,2}.xsd/g

Your string is:
nampt@nampt-desktop:$ cat 1
xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace=
Try with awk:
cat 1 | awk -F "\"" '{print $2}'
Output:
AppointmentManagementService.xsd6.xsd

sed doesn't recognize \d, use [0-9] or [[:digit:]] instead:
sed 's/^.*schemaLocation="\([^"]\+[[:digit:]]\{1,2\}\.xsd\)".*$/\1/g'
## or
sed 's/^.*schemaLocation="\([^"]\+[0-9]\{1,2\}\.xsd\)".*$/\1/g'
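A variant sketch of the same idea that sticks to POSIX basic regular expressions (\{1,\} instead of the GNU-only \+) and uses -n with the p flag, so that lines without a match print nothing instead of being echoed unchanged. The variable names and the second input line are illustrative assumptions.

```shell
line='xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace='

# Print only the captured file name, and only when the pattern matches.
loc=$(printf '%s\n' "$line" |
  sed -n 's/^.*schemaLocation="\([^"]*\.xsd[0-9]\{1,2\}\.xsd\)".*$/\1/p')

# A line without schemaLocation produces no output at all.
none=$(printf '%s\n' 'no location here' |
  sed -n 's/^.*schemaLocation="\([^"]*\.xsd[0-9]\{1,2\}\.xsd\)".*$/\1/p')

printf '%s\n' "$loc"   # AppointmentManagementService.xsd6.xsd
```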

You can use bash native regex matching:
$ in='xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace='
$ if [[ $in =~ \"(.+)\" ]]; then echo "${BASH_REMATCH[1]}"; fi
Output:
AppointmentManagementService.xsd6.xsd
Based on your example, if you want to allow at least 1 and at most 2 digits in the .xsd... component, you can fine-tune the regex:
$ if [[ $in =~ \"(AppointmentManagementService.xsd[0-9]{1,2}.xsd)\" ]]; then echo "${BASH_REMATCH[1]}"; fi

Using PCRE in GNU grep:
grep -oP 'schemaLocation="\K.*?(?=")'
This outputs the text matched between schemaLocation=" and the very next occurrence of ".
Reference:
https://unix.stackexchange.com/a/13472/109046

We can also use the cut command for this purpose:
[root@code]# echo "xsd:import schemaLocation=\"AppointmentManagementService.xsd6.xsd\" namespace=" | cut -d\" -f 2
AppointmentManagementService.xsd6.xsd

s='xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace='
echo $s | sed 's/.*schemaLocation="\(.*\)" namespace=.*/\1/'

How to remove a space between matching words?

I've read a lot of questions about how to replace spaces from a file but I have the following problem:
I have a file like so:
<foo>"crazy foo"</foo> <bar>dull-bar</bar>
and I'm trying to remove spaces between > < and only those ones so the file would be like:
<foo>"crazy foo"</foo><bar>dull-bar</bar>
So far I've tried to remove them using sed and tr. sed is not working at all, and using tr '> <' '><' outputs:
<foo>"crazy foo"</foo><<bar>dull-bar</bar>

sed -i -e "s/> *</></g" YourFile
-i means YourFile is modified in place. Remove this option to test your command and display the result on standard output.
The " *" (a space followed by *) matches zero or more spaces.
The g at the end of the sed expression means "replace all occurrences".
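A quick sketch to check the behaviour without touching a file: the substitution only fires where a > is followed by optional spaces and a <, so the space inside "crazy foo" is left alone. The variable names are illustrative.

```shell
line='<foo>"crazy foo"</foo> <bar>dull-bar</bar>'

# Remove spaces only when they sit between a closing > and an opening <.
compact=$(printf '%s\n' "$line" | sed 's/> *</></g')

printf '%s\n' "$compact"   # <foo>"crazy foo"</foo><bar>dull-bar</bar>
```

This also shows why tr is the wrong tool here: tr maps single characters to single characters and has no notion of a space "between" two other characters.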

You could try something like this:
echo '<foo>"crazy foo"</foo> <bar>dull-bar</bar>' | sed 's/>[[:space:]]*</></g'

awk -F"\"" '{print $3}' file.txt | sed 's/ //g'

Remove everything after 2nd occurrence in a string in unix

I would like to remove everything after the 2nd occurrence of a particular
pattern in a string. What is the best way to do it in Unix? What is the most elegant and simple method to achieve this: sed, awk, or just Unix commands like cut?
My input would be
After-u-math-how-however
Output should be
After-u
Everything after the 2nd - should be stripped out. The regex should also match
zero occurrences of the pattern, so zero or one occurrence should be ignored and
from the 2nd occurrence everything should be removed.
So if the input is as follows
After
Output should be
After

Something like this would do it.
echo "After-u-math-how-however" | cut -f1,2 -d'-'
This will split up (cut) the string into fields, using a dash (-) as the delimiter. Once the string has been split into fields, cut will print the 1st and 2nd fields.
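The question also asks about inputs with zero or one delimiter. A small sketch of how cut behaves in those cases; the helper name first_two is hypothetical. Without the -s option, cut passes through lines that contain no delimiter at all, which is exactly what is wanted here.

```shell
# first_two STRING: keep at most the first two dash-separated fields
first_two() {
  printf '%s\n' "$1" | cut -d'-' -f1,2
}

first_two 'After-u-math-how-however'   # After-u
first_two 'After-u'                    # After-u
first_two 'After'                      # After
```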

This might work for you (GNU sed):
sed 's/-[^-]*//2g' file

You could use the following regex to select what you want:
^[^-]*-\?[^-]*
For example:
echo "After-u-math-how-however" | grep -o "^[^-]*-\?[^-]*"
Results:
After-u

@EvanPurkisher's cut -f1,2 -d'-' solution is IMHO the best one, but since you asked about sed and awk:
With GNU sed for -r:
$ echo "After-u-math-how-however" | sed -r 's/([^-]+-[^-]*).*/\1/'
After-u
With GNU awk for gensub():
$ echo "After-u-math-how-however" | awk '{$0=gensub(/([^-]+-[^-]*).*/,"\\1",1)}1'
After-u
Can be done with non-GNU sed using \( and *, and with non-GNU awk using match() and substr() if necessary.

awk -F - '{print $1 (NF>1? FS $2 : "")}' <<<'After-u-math-how-however'
Split the line into fields based on field separator - (option spec. -F -) - accessible as special variable FS inside the awk program.
Always print the 1st field (print $1), followed by:
If there's more than 1 field (NF>1), append FS (i.e., -) and the 2nd field ($2)
Otherwise: append "", i.e.: effectively only print the 1st field (which in itself may be empty, if the input is empty).

This can be done in pure bash (which means no fork, no external process). Read into an array split on '-', then slice the array:
$ IFS=-
$ read -ra val <<< After-u-math-how-however
$ echo "${val[*]}"
After-u-math-how-however
$ echo "${val[*]:0:2}"
After-u

awk '$0 = $2 ? $1 FS $2 : $1' FS=-
Result
After-u
After

This will do it in awk:
echo "After" | awk -F "-" '{printf "%s",$1; for (i=2; i<=2 && i<=NF; i++) printf "-%s",$i; print ""}'

How to use sed to identify a string in brackets?

I want to find the string that is placed within the brackets. How do I use sed to pull out the string?
# cat /sys/block/sdb/queue/scheduler
noop anticipatory deadline [cfq]
I'm not getting the exact result
# cat /sys/block/sdb/queue/scheduler | sed 's/\[*\]//'
noop anticipatory deadline [cfq
I'm expecting an output
cfq

It can be easier with grep, in case the position of the text in between brackets changes:
$ grep -Po '(?<=\[)[^]]*' file
cfq
This is a look-behind: whenever a [ is found, fetch all the characters that follow it, up to a ].
See another example:
$ cat a
noop anticipatory deadline [cfq]
hello this [is something] we want to [enclose] yeah
$ grep -Po '(?<=\[)[^]]*' a
cfq
is something
enclose
You can also use awk for this, in case it is always in the same position:
$ awk -F'[][]' '{print $2}' file
cfq
This sets the field separators to [ and ] and prints the second field.
And with sed:
$ sed 's/[^[]*\[\([^]]*\).*/\1/g' file
cfq
It is a bit messy, but basically it looks for the block of text in between [] and prints it back.
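A slightly stricter sketch of the sed approach, assuming you would rather get no output than an unchanged line when there are no brackets: it anchors the pattern, requires the closing ], and uses -n with the p flag. The variable names and the second input are illustrative.

```shell
# Print only the text inside the first [...] pair, and only if one exists.
active=$(printf '%s\n' 'noop anticipatory deadline [cfq]' |
  sed -n 's/^[^[]*\[\([^]]*\)].*$/\1/p')

# A line without brackets prints nothing.
missing=$(printf '%s\n' 'noop anticipatory deadline' |
  sed -n 's/^[^[]*\[\([^]]*\)].*$/\1/p')

printf '%s\n' "$active"   # cfq
```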

I found one possible solution-
cut -d "[" -f2 | cut -d "]" -f1
so the exact solution is
# cat /sys/block/sdb/queue/scheduler | cut -d "[" -f2 | cut -d "]" -f1

Another potential solution is awk:
s='noop anticipatory deadline [cfq]'
awk -F'[][]' '{print $2}' <<< "$s"
cfq

Another way by gnu grep :
grep -Po "\[\K[^]]*" file
with pure shell:
while read line; do [[ "$line" =~ \[([^]]*)\] ]] && echo "${BASH_REMATCH[1]}"; done < file

Another awk
echo 'noop anticipatory deadline [cfq]' | awk '{gsub(/.*\[|\].*/,x)}8'
cfq

perl -lne 'print $1 if(/\[([^\]]*)\]/)'
