regex -- grepping for alphabetic characters only

I have a quick regex question.
Let's say I have a list of packages:
packageA-0:8.39-6.fc24.x86_64
packageB-0:6.4-1.fc24.x86_64
packageB-utils-0:3.63-2.fc24.x86_64
What I want returned is:
packageA
packageB
packageB-utils
I've tried
grep -oP '^[a-z]*' myfile.txt
and
awk -F"[_-]" '{print $1}' myfile.txt
Any ideas? I think I'm sort of close, but I just can't get packageB-utils.

.*?(?=-\d)
.*? => everything, non-greedy
(?=-\d) => until "-" followed by a digit
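The lookahead above needs a PCRE-capable tool. A minimal sketch, assuming GNU grep built with -P support; the function name pkg_names is made up for illustration. Anchoring with ^ matters here: without it, grep -o keeps looking for more matches on the same line and can also print version fragments such as 0:8.39.

```shell
#!/usr/bin/env bash
# Print each package name: the shortest prefix followed by "-<digit>",
# i.e. everything before the epoch/version. Needs GNU grep with -P (PCRE).
pkg_names() {
  grep -oP '^.*?(?=-\d)' "$@"
}

# Sample lines from the question.
printf '%s\n' \
  'packageA-0:8.39-6.fc24.x86_64' \
  'packageB-0:6.4-1.fc24.x86_64' \
  'packageB-utils-0:3.63-2.fc24.x86_64' | pkg_names
```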

Try this. It selects everything up to the last letter:
grep -o "^[a-zA-Z-]*[a-zA-Z]" file.txt
Or, if your package names also contain digits, you can use sed to trim everything from -0:... onward:
sed 's|-[0-9]*:.*||' file.txt
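A sketch comparing the two commands on a hypothetical package name that contains a digit (python3-libs is an illustrative input, not from the question; the function names are made up):

```shell
#!/usr/bin/env bash
# Letters-only approach: longest run of letters and dashes that ends in a letter.
names_letters_only() {
  grep -o '^[a-zA-Z-]*[a-zA-Z]' "$@"
}

# Trim approach: delete from "-<epoch>:" (e.g. "-0:") to the end of the line,
# so digits inside the package name are kept.
names_trim_epoch() {
  sed 's|-[0-9]*:.*||' "$@"
}

line='python3-libs-0:3.5.1-1.fc24.x86_64'   # hypothetical input
echo "$line" | names_letters_only   # python (stops before the digit)
echo "$line" | names_trim_epoch     # python3-libs
```

The letters-only grep stops before the 3, while the sed version keeps everything before the -0: epoch.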

With sed using grouping:
sed -rn 's/([A-Za-z\-]+)\-(.*)/\1/p' packages.txt
Should yield:
packageA
packageB
packageB-utils
packages.txt contains:
packageA-0:8.39-6.fc24.x86_64
packageB-0:6.4-1.fc24.x86_64
packageB-utils-0:3.63-2.fc24.x86_64
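The escaped hyphens are not required: a - written last inside a bracket expression is literal, and outside brackets - has no special meaning. A minimal sketch with -E (accepted by GNU sed and the BSD seds) instead of -r, feeding the sample lines through a here-document in place of packages.txt; pkg_from_line is a made-up name:

```shell
#!/usr/bin/env bash
# Greedy group: the longest run of letters and dashes that is still followed
# by "-". For these inputs that is everything before "-0:".
pkg_from_line() {
  sed -En 's/([A-Za-z-]+)-.*/\1/p'
}

pkg_from_line <<'EOF'
packageA-0:8.39-6.fc24.x86_64
packageB-0:6.4-1.fc24.x86_64
packageB-utils-0:3.63-2.fc24.x86_64
EOF
```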

Related

Sed version extract

I am trying to extract the version number from a string. I am unable to find the exact regex to find what I need.
For example:
1012-EPS-Test-OF-Something-1.3
I need sed to only extract 1.3 from the above line.
I have tried quite a few things so far, such as the following, but it is clearly not working:
sed 's/[^0-9.0-9]*//'

With your shown samples, the easiest way could be this: feed the value of the shell variable to awk as input, set the field separator to - and print the last field.
echo "$var" | awk -F'-' '{print $NF}'
2nd solution: in case the last field (with - as the field delimiter) could contain something other than the version number, use awk's match function.
echo "$var" |
awk -F'-' 'match($NF,/[0-9]+(\.[0-9]+)*/){print substr($NF,RSTART,RLENGTH)}'
3rd solution: using GNU grep with -P, try the following. .*- matches everything up to the - before the version number, and \K tells grep to forget that part of the match, so only what the rest of the regex matches is printed.
echo "$var" | grep -oP '.*-\K\d+(\.\d+)*'
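The three approaches side by side, as a sketch wrapped in hypothetical helper functions; the third needs GNU grep with -P. The extra input foo-v2.10rc is invented to show where the first two differ:

```shell
#!/usr/bin/env bash
# 1) Print the last "-"-separated field as is.
last_field() { awk -F'-' '{print $NF}'; }

# 2) Print only the dotted number found inside the last field.
version_in_last_field() {
  awk -F'-' 'match($NF, /[0-9]+(\.[0-9]+)*/) {print substr($NF, RSTART, RLENGTH)}'
}

# 3) GNU grep with -P: \K drops everything matched before it from the output.
version_after_dash() { grep -oP '.*-\K\d+(\.\d+)*'; }

var='1012-EPS-Test-OF-Something-1.3'
echo "$var" | last_field              # 1.3
echo "$var" | version_in_last_field   # 1.3
echo "$var" | version_after_dash      # 1.3

echo 'foo-v2.10rc' | last_field              # v2.10rc
echo 'foo-v2.10rc' | version_in_last_field   # 2.10
```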

This should work in any grep that supports -E and -o (the <<< here-string requires bash):
s='1012-EPS-Test-OF-Something-1.3'
grep -Eo '[0-9]+(\.[0-9]+)+' <<< "$s"
1.3

This might work for you (GNU sed):
sed -n 's/.*[^0-9.]//p' file
The regexp is greedy: .* swallows the whole line, then steps back one character at a time until it finds a match for [^0-9.]. The substitution removes that front portion (up to and including the last character that is neither a digit nor a dot) and prints the remainder.
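A sketch of that sed command (the function name trailing_version is made up) together with two edge cases: a line ending in a letter matches as a whole and prints an empty line, while a line made only of digits and dots has no [^0-9.] to match, so nothing is substituted and nothing is printed.

```shell
#!/usr/bin/env bash
# Delete everything up to and including the last character that is neither
# a digit nor a dot; -n plus the p flag prints only lines that were changed.
trailing_version() { sed -n 's/.*[^0-9.]//p'; }

echo '1012-EPS-Test-OF-Something-1.3' | trailing_version   # 1.3
echo 'no-version-here' | trailing_version                  # (empty line)
echo '1.3' | trailing_version                              # (no output)
```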

You can use Bash parameter expansion to get the last part after -:
s='1012-EPS-Test-OF-Something-1.3'
s="${s##*-}"
As a full script:
#!/bin/bash
s='1012-EPS-Test-OF-Something-1.3'
s="${s##*-}"
echo "$s"
# => 1.3
See 10.1. Manipulating Strings in the Advanced Bash-Scripting Guide:
${string##substring}
    Deletes longest match of $substring from front of $string.
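The same parameter-expansion family can also remove the shortest match, or work from the end of the string. A short sketch on the same string:

```shell
#!/usr/bin/env bash
s='1012-EPS-Test-OF-Something-1.3'

echo "${s##*-}"   # longest match of "*-" removed from the front: 1.3
echo "${s#*-}"    # shortest match of "*-" removed from the front: EPS-Test-OF-Something-1.3
echo "${s%-*}"    # shortest match of "-*" removed from the back: 1012-EPS-Test-OF-Something
echo "${s%%-*}"   # longest match of "-*" removed from the back: 1012
```

If the string contains no -, the pattern does not match and ${s##*-} expands to the whole string.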

sed regex with alternative on Solaris doesn't work

Currently I'm trying to use sed with regex on Solaris but it doesn't work.
I need to show only the lines matching my regex.
sed -n -E '/^[a-zA-Z0-9]*$|^a_[a-zA-Z0-9]*$/p'
input file:
grtad
a_pitr
_aupa
a__as
baman
12353
ai345
ki_ag
-MXx2
!!!23
+_)#*
I want to show only the lines matching the above regex:
grtad
a_pitr
baman
12353
ai345
Is there another way to express the alternation? Is it possible in Perl?
Thanks for any solutions.

With Perl
perl -ne 'print if /^(a_)?[a-zA-Z0-9]*$/' input.txt
The (a_)? matches a_ one-or-zero times, so optionally. It may or may not be there.
The (a_) also captures the match, which is not needed, so you can use (?:a_)? instead. The ?: makes () only group what is inside (so ? applies to the whole thing) without remembering it.

With grep
$ grep -xiE '(a_)?[a-z0-9]*' ip.txt
grtad
a_pitr
baman
12353
ai345
-x match whole line
-i ignore case
-E extended regex; if not available, use grep -xi '\(a_\)\?[a-z0-9]*' (\? in a basic regex is a GNU extension)
(a_)? match a_ zero or one time
[a-z0-9]* zero or more letters or digits
With sed
sed -nE '/^(a_)?[a-zA-Z0-9]*$/p' ip.txt
Or, with GNU sed, using the I flag for a case-insensitive match:
sed -nE '/^(a_)?[a-z0-9]*$/Ip' ip.txt
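If neither -E nor GNU extensions are available (the original problem was Solaris sed), one option is to drop the alternation and give sed one address per alternative. This is a sketch, not taken from the answers above, assuming a POSIX-conforming sed; the helper name is made up:

```shell
#!/usr/bin/env bash
# One sed address per alternative, using only basic regular expressions.
# The first pattern forbids "_", so the two can never match the same line
# and no line is printed twice.
match_names_bre() {
  sed -n -e '/^[a-zA-Z0-9]*$/p' -e '/^a_[a-zA-Z0-9]*$/p' "$@"
}

printf '%s\n' grtad a_pitr _aupa a__as baman 12353 ai345 ki_ag -MXx2 '!!!23' '+_)#*' |
  match_names_bre
```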

regex match specific pattern

I have
[root@centos64 ~]# cat /tmp/out
[
"i-b7a82af5",
"i-9d78f4df",
"i-92ea58d0",
"i-fa4acab8"
]
I would like to pipe through sed or grep to match the format "x-xxxxxxxx", i.e. a mix of a-z and 0-9: always 1 character, a dash, then 8 characters, and omit everything else:
[root@centos64 ~]# cat /tmp/out| sed s/x-xxxxxxxx/
i-b7a82af5
i-9d78f4df
i-92ea58d0
i-fa4acab8
I know this is basic, but I can only find examples of text substitution.

grep -Eo '[a-z0-9]-[a-z0-9]{8}' file
The -E option makes it recognize extended regular expressions, so it can use {8} to match 8 repetitions.
The -o option makes it only print the part of the line that matches the regexp.
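A sketch reproducing the /tmp/out contents with printf (instance_ids is a made-up name). The pattern is not bounded on the right, so a longer ID, like the hypothetical one on the last line, would be cut to its first 8 characters after the dash:

```shell
#!/usr/bin/env bash
# One lowercase letter/digit, a dash, then exactly 8 more; -o prints only the
# matched text, so quotes, commas and brackets are dropped.
instance_ids() {
  grep -Eo '[a-z0-9]-[a-z0-9]{8}' "$@"
}

printf '%s\n' '[' '"i-b7a82af5",' '"i-9d78f4df",' '"i-92ea58d0",' '"i-fa4acab8"' ']' | instance_ids
printf '%s\n' '"i-0123456789abcdef0"' | instance_ids   # i-01234567 (truncated)
```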

Why not just print whatever's between the quotes:
$ sed -n 's/[^"]*"\([^"]*\).*/\1/p' file
i-b7a82af5
i-9d78f4df
i-92ea58d0
i-fa4acab8
$ awk -F\" 'NF>1{print $2}' file
i-b7a82af5
i-9d78f4df
i-92ea58d0
i-fa4acab8

Through GNU sed,
$ sed -nr 's/.*([a-z0-9]-[a-z0-9]{8}).*/\1/p' file
i-b7a82af5
i-9d78f4df
i-92ea58d0
i-fa4acab8

I think this is all you need: [0-9a-zA-Z]-[0-9a-zA-Z]{8}.

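This should work if each ID is on a line by itself: ^[a-z0-9]-[a-zA-Z0-9]{8}$ (with the quoted input above, the ^ and $ anchors prevent a match, so drop them and use grep -Eo).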

Getting defined substring with help of sed or egrep

Hi everyone!
I want to get a specific substring from the stdout of a command.
stdout:
{"response":
{"id":"110200dev1","success":"true","token":"09ad7cc7da1db13334281b84f2a8fa54"},"success":"true"}
I need to get the hex string after token, without quotation marks; the hex string is 32 characters long. I suppose it can be done with sed or egrep. I don't want to use awk here, because the stdout changes very often.

This is an alternative GNU awk solution for when grep -P isn't available:
awk -F: '{gsub(/"/, "")} NF==2&&$1=="token"{print $2}' RS='[{},]' <<< "$string"
09ad7cc7da1db13334281b84f2a8fa54

grep's nature is extracting things:
grep -Po '"token":"\K[^"]+'
-P option interprets the pattern as a Perl regular expression.
-o option shows only the matching part that matches the pattern.
\K throws away everything that it has matched up to that point.
Or an option using sed...
sed 's/.*"token":"\([^"]*\)".*/\1/'
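Both commands as a sketch, assuming the whole response arrives on one line (the helper names are made up; the grep version needs GNU grep with -P):

```shell
#!/usr/bin/env bash
json='{"response":{"id":"110200dev1","success":"true","token":"09ad7cc7da1db13334281b84f2a8fa54"},"success":"true"}'

# GNU grep with PCRE: \K drops '"token":"' from the printed match.
token_grep() { grep -Po '"token":"\K[^"]+'; }

# POSIX sed: keep only the text between '"token":"' and the next quote.
token_sed() { sed 's/.*"token":"\([^"]*\)".*/\1/'; }

echo "$json" | token_grep
echo "$json" | token_sed
```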

With sed:
your-command | sed 's/.*"token":"\([^"]*\)".*/\1/'

YourStreamOrFile | sed -n 's/.*"token":"\([a-f0-9]\{32\}\)".*/\1/p'
Unlike the versions without -n, this does not print the whole input line when it contains no 32-character hex token; it prints nothing.
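A sketch of the difference, using a made-up response that has no token:

```shell
#!/usr/bin/env bash
# Without -n, a line that does not match is printed unchanged.
token_loose() { sed 's/.*"token":"\([^"]*\)".*/\1/'; }

# With -n and the p flag, only a 32-character lowercase hex token is printed.
token_strict() { sed -n 's/.*"token":"\([a-f0-9]\{32\}\)".*/\1/p'; }

no_token='{"response":{"success":"false"}}'   # made-up response without a token
echo "$no_token" | token_loose    # prints the whole line
echo "$no_token" | token_strict   # prints nothing
```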

Remove everything after 2nd occurrence in a string in unix

I would like to remove everything after the 2nd occurrence of a particular pattern in a string. What is the best way to do it in Unix? What is the most elegant and simple method to achieve this: sed, awk, or just Unix commands like cut?
My input would be
After-u-math-how-however
Output should be
After-u
Everything after the 2nd - should be stripped out. The solution should also handle zero occurrences of the pattern: with zero or one occurrence the string should be left as is, and from the 2nd occurrence on everything should be removed.
So if the input is as follows
After
Output should be
After

Something like this would do it.
echo "After-u-math-how-however" | cut -f1,2 -d'-'
This will split up (cut) the string into fields, using a dash (-) as the delimiter. Once the string has been split into fields, cut will print the 1st and 2nd fields.
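A sketch covering both inputs from the question. cut prints a line that contains no delimiter unchanged (unless -s is given), so the no-dash case needs no special handling; first_two is a made-up name:

```shell
#!/usr/bin/env bash
# Keep fields 1 and 2 of a "-"-delimited string.
first_two() { cut -d'-' -f1,2; }

echo 'After-u-math-how-however' | first_two   # After-u
echo 'After' | first_two                      # After
```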

This might work for you (GNU sed):
sed 's/-[^-]*//2g' file
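The 2g flags tell GNU sed to replace the 2nd and every later match of -[^-]*, i.e. every -field except the first. A sketch (keep_two is a made-up name):

```shell
#!/usr/bin/env bash
# GNU sed: delete the 2nd and every later "-field".
keep_two() { sed 's/-[^-]*//2g'; }

echo 'After-u-math-how-however' | keep_two   # After-u
echo 'After' | keep_two                      # After
```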

You could use the following regex to select what you want:
^[^-]*-\?[^-]*
For example:
echo "After-u-math-how-however" | grep -o "^[^-]*-\?[^-]*"
Results:
After-u
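The same pattern in extended-regex form, where ? needs no backslash, as a sketch (the helper name is made up):

```shell
#!/usr/bin/env bash
# ERE form: the first field, an optional dash, then the second field.
first_two_grep() { grep -Eo '^[^-]*-?[^-]*'; }

echo 'After-u-math-how-however' | first_two_grep   # After-u
echo 'After' | first_two_grep                      # After
```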

@EvanPurkisher's cut -f1,2 -d'-' solution is IMHO the best one, but since you asked about sed and awk:
With GNU sed for -r
$ echo "After-u-math-how-however" | sed -r 's/([^-]+-[^-]*).*/\1/'
After-u
With GNU awk for gensub():
$ echo "After-u-math-how-however" | awk '{$0=gensub(/([^-]+-[^-]*).*/,"\\1","")}1'
After-u
It can be done with non-GNU sed using \(...\) and * instead of -r and +, and with non-GNU awk using match() and substr() if necessary.
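A sketch of the non-GNU variants, assuming only POSIX sed and awk (helper names are made up): [^-][^-]* stands in for [^-]+, and awk's match() sets RSTART and RLENGTH.

```shell
#!/usr/bin/env bash
# POSIX sed (basic regex): \( \) for the group, "[^-][^-]*" for "[^-]+".
keep_two_sed() { sed 's/\([^-][^-]*-[^-]*\).*/\1/'; }

# POSIX awk: match() sets RSTART and RLENGTH; print that prefix,
# or the whole line when there is no "-" at all.
keep_two_awk() {
  awk '{ if (match($0, /^[^-]+-[^-]*/)) print substr($0, RSTART, RLENGTH); else print }'
}

echo 'After-u-math-how-however' | keep_two_sed   # After-u
echo 'After-u-math-how-however' | keep_two_awk   # After-u
echo 'After' | keep_two_awk                      # After
```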

awk -F - '{print $1 (NF>1? FS $2 : "")}' <<<'After-u-math-how-however'
Split the line into fields based on the field separator - (option -F -), which is accessible as the special variable FS inside the awk program.
Always print the 1st field (print $1), followed by:
If there's more than 1 field (NF>1), append FS (i.e., -) and the 2nd field ($2)
Otherwise: append "", i.e.: effectively only print the 1st field (which in itself may be empty, if the input is empty).
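The same awk program as a sketch, run on both inputs (keep_two_fs is a made-up name):

```shell
#!/usr/bin/env bash
# Print field 1, plus "-" and field 2 only when a second field exists.
keep_two_fs() { awk -F - '{print $1 (NF>1 ? FS $2 : "")}'; }

echo 'After-u-math-how-however' | keep_two_fs   # After-u
echo 'After' | keep_two_fs                      # After
```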

This can be done in pure bash (which means no fork, no external process). Read into an array split on '-', then slice the array:
$ IFS=-
$ read -ra val <<< After-u-math-how-however
$ echo "${val[*]}"
After-u-math-how-however
$ echo "${val[*]:0:2}"
After-u
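Note that IFS=- typed at the prompt stays in effect for the rest of the session. A sketch that confines it to a function with local (keep_two_bash is a made-up name); "${val[*]}" joins the elements with the first character of IFS, which is why - comes back between them:

```shell
#!/usr/bin/env bash
# Pure bash: split on "-" into an array, then join the first two elements.
keep_two_bash() {
  local IFS=-            # only changes IFS inside this function
  local -a val
  read -ra val <<< "$1"
  echo "${val[*]:0:2}"   # [*] joins with the first character of IFS
}

keep_two_bash 'After-u-math-how-however'   # After-u
keep_two_bash 'After'                      # After
```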

awk '$0 = $2 ? $1 FS $2 : $1' FS=-
Result (for the inputs After-u-math-how-however and After):
After-u
After
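Because the assignment itself is the pattern, a line is printed only when the new $0 is true in awk, so a second field that is the number 0 is treated like a missing one. A sketch with a made-up input (v-0-x) and a variant that tests NF instead; both helper names are made up:

```shell
#!/usr/bin/env bash
# Terse form: the assignment is the pattern, so the line is printed only if
# the new $0 is "true" (not empty, not numerically zero).
keep_two_terse() { awk '$0 = $2 ? $1 FS $2 : $1' FS=-; }

# Variant that tests NF instead of the value of $2 and always prints.
keep_two_nf() { awk -F- '{ print (NF > 1 ? $1 FS $2 : $1) }'; }

echo 'v-0-x' | keep_two_terse   # v   ("0" in $2 counts as false)
echo 'v-0-x' | keep_two_nf      # v-0
```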

This will do it in awk (printing the second field only when it exists, and ending with a newline):
echo "After" | awk -F "-" '{printf "%s", $1; if (NF > 1) printf "-%s", $2; print ""}'

We Keep Coding
