Using regular expression to extract substring - regex

I want to extract the text from < up to the next space in my log files.
$>cat messages.log
2013-03-24 19:32:37.231 <F280 [192.168.178.22]:5000 -- Unknown>, Msg:[Test1]
2013-03-24 19:32:37.547 <F281 [192.168.178.22]:5000 -- Unknown>, Msg:[Test2
Test3
Test4]
2013-03-24 19:32:38.833 <F280 [192.168.178.22]:5000 -- Unknown>, Msg:[Test5]
2013-03-24 19:32:42.222 <F281 [192.168.178.22]:5000 -- Unknown>, Msg:[Test6]
$>sed 's/.*\<\(.*\) \[.*/\1|/g' messages.log
F280|
F281|
Test3
Test4]
F280|
F281|
I almost got what I wanted except for the output with the newlines. So I'd like to have the following result:
F280|F281|F280|F281
What should the regular expression look like?

I wouldn't write an unreadable regexp for this; I'd use awk here:
$ awk -F'[< ]' '/^[0-9]+/{s?s=s"|"$4:s=s$4}END{print s}' file
F280|F281|F280|F281

Try this:
sed -n '/</{s/^.*<\([^ ]\+\) .*$/\1|/g;H;${x;s/\n//g;s/|$//;p}}' messages.log

Try something like that (you'll have nested groups), or turn on multiline option in regex:
(^.+<(\w+) .+$)+

Do you have to use a single command, or can other commands be combined?
I'd suggest:
grep "<.* " messages.log | sed 's/.*\<\(.*\) \[.*/\1|/g' | tr -d '\n' | sed 's/.$//'
The first grep removes lines that don't follow your desired pattern; its output is then fed to your sed command.
At that point the output should look like:
F280|
F281|
F280|
F281|
The tr command then removes the newline character at the end of each line (i.e., it concatenates the results), and the final sed removes the trailing pipe delimiter.
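The filter-extract-join idea above can also be sketched with sed -n and paste. This is only a minimal sketch, assuming a POSIX sed and paste; the temporary file and the variable names log and result are illustrative, and the sample data is the log from the question.

```shell
# Recreate the sample log from the question in a temporary file (illustrative).
log=$(mktemp)
cat > "$log" <<'EOF'
2013-03-24 19:32:37.231 <F280 [192.168.178.22]:5000 -- Unknown>, Msg:[Test1]
2013-03-24 19:32:37.547 <F281 [192.168.178.22]:5000 -- Unknown>, Msg:[Test2
Test3
Test4]
2013-03-24 19:32:38.833 <F280 [192.168.178.22]:5000 -- Unknown>, Msg:[Test5]
2013-03-24 19:32:42.222 <F281 [192.168.178.22]:5000 -- Unknown>, Msg:[Test6]
EOF

# -n plus the p flag prints only lines where the substitution succeeded,
# so continuation lines without '<' are dropped.
# paste -s joins the remaining lines, using '|' as the delimiter.
result=$(sed -n 's/^[^<]*<\([^ ]*\) .*/\1/p' "$log" | paste -sd'|' -)
echo "$result"   # F280|F281|F280|F281

rm -f "$log"
```

Because paste only puts the delimiter between items, there is no trailing pipe to strip afterwards.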

Related

extract substring with SED

I have the following strings, for example:
input1 = abc-def-ghi-jkl
input2 = mno-pqr-stu-vwy
I want to extract the first word between two "-" characters.
For the first string I want to get: def
For the second string I want to get: pqr
I want to use sed for this. Could you help me, please?
Use
sed 's,^[^-]*-\([^-]*\).*,\1,' file
The string after the first - will be captured up to the second - and the rest will be matched, then the matched line will be replaced with the group text.
With bash:
var='input1 = abc-def-ghi-jkl'
var=${var#*-} # remove shortest prefix `*-`, this removes `input1 = abc-`
echo "${var%%-*}" # remove longest suffix `-*`, this removes `-ghi-jkl`
Or with awk:
awk -F'-' '{print $2}' <<<'input1 = abc-def-ghi-jkl'
Use - as input field separator and print the second field.
Or with cut:
cut -d'-' -f2 <<<'input1 = abc-def-ghi-jkl'
When you want to use sed, you can choose between solutions like
# Double processing
echo "$input1" | sed 's/[^-]*-//;s/-.*//'
# Normal approach
echo "$input1" | sed -r 's/^[^-]*-([^-]*).*/\1/'
# Funny alternative
echo "$input1" | sed -r 's/(^[^-]*-|-.*)//g'
The obvious "external" tool would be cut. You can also look at a Bash builtin solution like
[[ ${input1} =~ ([^-]*)-([^-]*) ]] && printf %s "${BASH_REMATCH[2]}"
grep solution (in my opinion this is the most natural approach, as you are only trying to find matches to a regular expression - you are not looking to edit anything, so there should be no need for the more advanced command sed)
grep -oP '^[^-]*-\K[^-]*(?=-)' << EOF
> abc-qrs-bobo-the-clown
> 123-45-6789
> blah-blah-blah
> no dashes here
> mahi-mahi
> EOF
Output
qrs
45
blah
Explanation
Look at the inputs first, included here for completeness as a heredoc (more likely you would name your file as the last argument to grep.) The solution requires at least two dashes to be present in the string; in particular, for mahi-mahi it will find no match. If you want to find the second mahi as a match, you can remove the lookahead assertion at the end of the regular expression (see below).
The regular expression does this. First note the command options: -o to return only the matched substring, not the entire line; and -P to use Perl extensions. Then, the regular expression: start from the beginning of the line (^); look for zero or more non-dash characters followed by dash, and then (\K) discard this part of the required match from the substrings found to match the pattern. Then look for zero or more non-dash characters again - this will be returned by the command. Finally, require a dash following this pattern, but do not include it in the match. This is done with a lookahead (marked by (?= ... )).
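Where grep -P is not available, the two behaviours described above (requiring a dash after the captured word, or not) can be sketched with plain sed instead. Note that this swaps in sed -n with the p flag rather than grep; the variable names strict and relaxed are purely illustrative.

```shell
# Sample input taken from the heredoc in the answer above.
input='abc-qrs-bobo-the-clown
123-45-6789
blah-blah-blah
no dashes here
mahi-mahi'

# Require a second dash after the captured word (mirrors the lookahead version).
strict=$(printf '%s\n' "$input" | sed -n 's/^[^-]*-\([^-]*\)-.*/\1/p')

# Do not require a trailing dash (mirrors dropping the lookahead).
relaxed=$(printf '%s\n' "$input" | sed -n 's/^[^-]*-\([^-]*\).*/\1/p')

printf 'strict:\n%s\n' "$strict"     # qrs, 45, blah
printf 'relaxed:\n%s\n' "$relaxed"   # qrs, 45, blah, mahi
```

The only difference between the two commands is the literal dash after the capture group, which is what decides whether mahi-mahi produces a match.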

Print 1 Occurrence for Each Pattern Match

I have a file that contains a pattern at the beginning of each line:
./bob/some/text/path/index.html
./bob/some/other/path/index.html
./bob/some/text/path/index1.html
./sue/some/text/path/index.html
./sue/some/text/path/index2.html
./sue/some/other/path/index.html
./john/some/text/path/index.html
./john/some/other/path/index.html
./john/some/more/text/index1.html
... etc.
I came up with the following code to match the ./{name}/ pattern and would like to print one occurrence of each name. However, it either prints every line matching that pattern, or prints just one match and stops when I use the -m 1 flag.
I've tried it as a simple grep line (below) and also put it in a for loop:
name=$(grep -iEoha -m 1 '\.\/([^/]*)\/' ./without_localnamespace.txt)
echo $name
My expected results are:
./bob/
./sue/
./john/
Actual Results are:
./bob/
awk -F'/' '!a[$2]++{print $1 FS $2 FS}' input
./bob/
./sue/
./john/
You can do
cut -d "/" -f2 ./without_localnamespace.txt | sort -u
You seem to want unique occurrences, use
grep -Eoha '\./[^/]*/' ./without_localnamespace.txt | uniq
Regarding the pattern, you do not need to escape forward slashes, they are not special regex metacharacters. The -i flag is redundant here, too.
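One caveat worth illustrating: uniq only collapses adjacent duplicates, so the grep pipeline relies on each name's lines being grouped together, as they are in the sample. A minimal sketch of the awk "seen" idiom, which keeps the first occurrence regardless of order, might look like this. The interleaved sample paths and the variable names paths and result are made up for illustration.

```shell
# Illustrative input in which the names are NOT grouped together.
paths=$(printf '%s\n' \
  './bob/some/text/path/index.html' \
  './sue/some/text/path/index.html' \
  './bob/some/other/path/index.html' \
  './john/some/text/path/index.html' \
  './sue/some/other/path/index.html')

# With "/" as the field separator, $1 is "." and $2 is the name.
# seen[$2]++ is 0 (false) the first time a name appears, so !seen[$2]++
# is true only once per name and first-seen order is preserved.
result=$(printf '%s\n' "$paths" | awk -F/ '!seen[$2]++ { print $1 FS $2 FS }')
printf '%s\n' "$result"
# ./bob/
# ./sue/
# ./john/
```

Unlike sort -u, this does not reorder the names.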

How to use 'sed' to add dynamic prefix to each number in integer list?

How can I use sed to add a dynamic prefix to each number in an integer list?
For example:
I have the string "A-1,2,3,4,5" and I want to transform it into "A-1,A-2,A-3,A-4,A-5". In other words, I want to add the prefix of the first integer, i.e. "A-", to each number in the list.
If I have string like "B-1,20,300" then I want to transform it to string "B-1,B-20,B-300".
I am not able to use RegEx Capturing Groups because for global match they do not retain their value in subsequent matches.
When it comes to looping constructs in sed, I like to use newlines as markers for the places I have yet to process. This makes matching much simpler, and I know they're not in the input because my input is a text line.
For example:
$ echo A-1,2,3,4,5 | sed 's/,/\n/g;:a s/^\([^0-9]*\)\([^\n]*\)\n/\1\2,\1/; ta'
A-1,A-2,A-3,A-4,A-5
This works as follows:
s/,/\n/g # replace all commas with newlines (insert markers)
:a # label for looping
s/^\([^0-9]*\)\([^\n]*\)\n/\1\2,\1/ # replace the next marker with a comma followed
# by the prefix
ta # loop unless there's nothing more to do.
The approach is similar to @potong's, but I find the regex much more readable: \([^0-9]*\) captures the prefix, \([^\n]*\) captures everything up to the next marker (i.e. everything that's already been processed), and then it's just a matter of reassembling it in the substitution.
Don't use sed, just use the other standard UNIX text manipulation tool, awk:
$ echo 'A-1,2,3,4,5' | awk '{p=substr($0,1,2); gsub(/,/,"&"p)}1'
A-1,A-2,A-3,A-4,A-5
$ echo 'B-1,20,300' | awk '{p=substr($0,1,2); gsub(/,/,"&"p)}1'
B-1,B-20,B-300
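The awk command above takes the first two characters as the prefix. A hedged variant that instead treats everything up to and including the first "-" as the prefix could look like the following sketch. The function name add_prefix is hypothetical, and it assumes the prefix contains no "&" or backslash, since those are special in a gsub replacement.

```shell
# Hypothetical helper: prefix every list item with the text up to the first "-".
add_prefix() {
  printf '%s\n' "$1" | awk '{
    i = index($0, "-")             # position of the first dash, 0 if none
    if (i == 0) { print; next }    # no dash: leave the line unchanged
    p = substr($0, 1, i)           # prefix including the dash, e.g. "A-"
    gsub(/,/, "," p)               # re-insert the prefix after every comma
    print
  }'
}

add_prefix 'A-1,2,3,4,5'   # A-1,A-2,A-3,A-4,A-5
add_prefix 'B-1,20,300'    # B-1,B-20,B-300
add_prefix 'AB-7,8'        # AB-7,AB-8
```

Using index() rather than a fixed width means multi-character prefixes are handled the same way as single-letter ones.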
This might work for you (GNU sed):
sed -E ':a;s/^((([^-]+-)[^,]+,)+)([0-9])/\1\3\4/;ta' file
Uses pattern matching and a loop to replace a number following a comma by the first column prefix and that number.
Assuming this is for shell scripting (the set syntax below is csh/tcsh), you can do so with two seds:
set string = "A-1,2,3,4,5"
set prefix = `echo $string | sed 's/^\([A-Z]\).*/\1/'`
echo $string | sed 's/,\([0-9]\)/,'$prefix'-\1/g'
Output is
A-1,A-2,A-3,A-4,A-5
With
set string = "B-1,20,300"
Output is
B-1,B-20,B-300
Could you please try the following (if you are OK with awk)? Note that it hard-codes the prefix A-.
awk '
BEGIN{
FS=OFS=","
}
{
for(i=1;i<=NF;i++){
if($i !~ /^A/&&$i !~ /\"A/){
$i="A-"$i
}
}
}
1' Input_file
If your data is in a file named d (tested with GNU sed):
sed -E 'h;s/^(\w-).+/\1/;x;G;:s s/,([0-9]+)(.*\n(.+))/,\3\1\2/;ts; s/\n.+//' d

Extract substring from string with sed

I want to extract MIB-Objects from snmpwalk output. The output FILE looks like:
RFC1213-MIB::sysDescr.0.0.0.0.192.168.1.2 = STRING: "Linux debian 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u1 (2017-06-18) x86_64"
RFC1213-MIB::sysObjectID.0 = OID: RFC1155-SMI::enterprises.8072.3.2.10
..
First, I read the output file, split each line at the = character, and strip the RFC1213-MIB:: prefix and everything from .0 to the end of the string.
while read -r; do echo "${REPLY%%=*}" | sed -e 's/RFC1213-MIB::\(.*\)\.0/\1/'; done <$FILE
My current output:
sysDescr.0.0.0.192.168.1.2
sysObjectID
How can I remove the other values? Is there a better solution of extracting sysDescr, sysObjectID?
With awk:
awk -F'[:.]' '{print $3}'
(define : and . as field delimiters and display the 3rd field)
with sed (Gnu):
sed 's/^[^:]*::\|\.0.*//g'
(delete either the leading run of non-colon characters followed by :: at the start of the line, or the first .0 and everything after it up to the end of the line)
Maybe you can try with:
sed 's/RFC1213-MIB::\([^\.]*\).*/\1/' $FILE
This will get everything that is not a dot (.) following the RFC1213-MIB:: string.
If you don't want to use sed, you can use parameter expansion instead. sed runs as an external process, so it won't be as fast as parameter expansion, which is built into bash.
while IFS= read -r line; do line=${line#*::}; line=${line%%.*}; echo "$line"; done < file
line=${line#*::} assumes RFC1213-MIB does not have two colons and will be split from sysDescr with two colons.
line=${line%%.*} assumes sysDescr will have a . after it.
If you have more examples, that you think won't work, I can update my answer.
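The two expansions can be checked in isolation against the sample lines from the question. This is a minimal sketch: the function name mib_object is hypothetical, and the STRING value in the first sample line is abbreviated for readability.

```shell
# Hypothetical helper: print the MIB object name from one snmpwalk line.
# The body runs in a subshell, so the variable "name" does not leak out.
mib_object() (
  name=${1#*::}      # drop the shortest prefix ending in "::" (RFC1213-MIB::)
  name=${name%%.*}   # drop everything from the first "." onwards
  printf '%s\n' "$name"
)

mib_object 'RFC1213-MIB::sysDescr.0.0.0.0.192.168.1.2 = STRING: "Linux debian"'
# sysDescr
mib_object 'RFC1213-MIB::sysObjectID.0 = OID: RFC1155-SMI::enterprises.8072.3.2.10'
# sysObjectID
```

The second call shows why # (shortest match) matters: the line contains a later :: in RFC1155-SMI::enterprises, and ## would have removed everything up to that one instead.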

Regular expressions with grep

So I have a bunch of data that all looks like this:
janitor#1/2 of dorm#1/1
president#4/1 of class#2/2
hunting#1/1 hat#1/2
side#1/2 of hotel#1/1
side#1/2 of hotel#1/1
king#1/2 of hotel#1/1
address#2/2 of girl#1/1
one#2/1 in family#2/2
dance#3/1 floor#1/2
movie#1/2 stars#5/1
movie#1/2 stars#5/1
insurance#1/1 office#1/2
side#1/1 of floor#1/2
middle#4/1 of December#1/2
movie#1/2 stars#5/1
one#2/1 of tables#2/2
people#1/2 at table#2/1
Some lines have prepositions, others don't so I thought I could use regular expressions to clean it up. What I need is each noun, the # sign and the following number on its own line. So for example, the first lines of output should look like this in the final file:
janitor#1
dorm#1
president#4
etc...
The list is stored in a file called NPs. My code to do this is:
cat NPs | grep -E '\b(\w*[#][1-9]).' >> test
When I open test, however, it's the exact same as the input file. Any input as to what I'm missing? It doesn't seem like it should be a hard operation, so maybe I'm missing something about syntax? I'm using this command from a shell script that is called in bash.
Thanks in advance!
This should do what you need.
The -o option will show only the part of a matching line that matches the PATTERN.
grep -Eo '[a-z#]+[1-9]' NPs > test
or even with the -P option, which interprets the PATTERN as a Perl regular expression:
grep -Po '[\w#]*(?=/)' NPs > test
Using grep:
$ grep -o "\w*[#]\w*" inputfile
janitor#1
dorm#1
president#4
class#2
hunting#1
hat#1
side#1
hotel#1
side#1
hotel#1
king#1
hotel#1
address#2
girl#1
one#2
family#2
dance#3
floor#1
movie#1
stars#5
movie#1
stars#5
insurance#1
office#1
side#1
floor#1
middle#4
December#1
movie#1
stars#5
one#2
tables#2
people#1
table#2
grep and its variants extract entire lines from the text if they match the pattern. If you need to modify lines, you should use sed, like
cat NPs | sed 's/^\(\b\w*[#][1-9]\).*$/\1/g'
You need sed, not grep. (Or awk, or perl.) It looks like this would do what you want:
cat NPs | sed 's?/.*??'
or simply
sed 's?/.*??' NPs
s means "substitute". The next character is the delimiter between regular expressions. Usually it's "/", but since you need to search for "/", I used "?" instead. "." refers to any character, and "*" says "zero or more of what preceded me". Whatever is between the last two delimiters is the replacement string. In this case it's empty, so you're replacing "/" followed by zero or more of any character, with the empty string.
EDIT: Oh, I see now that you wanted to extract the last item on the line, too. Well, I'm sure that others' suggested regexps would work. If it were my problem, I'd probably filter the file in two steps, perhaps piping the results from one step to the next, or using multiple substitutions with sed: First delete the "of"s and middle spaces, and add newlines, and then run sed as above. It's not as cool as doing it all in one regexp, but each step is easier to understand. For even more simplicity and uncoolness, use three steps, replacing " of " with space in the first step. Since others have provided complete solutions, I won't work out the details.
Grep by default just searches for the text, so in your case it is printing the lines that match. I think you want to investigate sed instead to perform the replacement. (And you don't need to cat the file, just grep PATTERN filename)
To get your output on separate lines, this worked for me:
sed 's|/.||g' NPs | sed 's/ .. /=/' | tr "=" "\n"
This uses two seds in a row to do different substitutions, and tr to insert line feeds.
The -o option in grep, which causes it to print out only the matching text, as described in another answer, is probably even simpler!
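As a minimal sketch of that -o approach, assuming a grep that supports -o (GNU and BSD grep both do), the following prints each word, the # sign, and the number on its own line. The three sample lines are taken from the question; the temporary file and the variable names nps and result are illustrative.

```shell
# A few lines from the question's NPs file, written to a temporary file.
nps=$(mktemp)
cat > "$nps" <<'EOF'
janitor#1/2 of dorm#1/1
hunting#1/1 hat#1/2
middle#4/1 of December#1/2
EOF

# [[:alpha:]]+ covers upper- and lower-case letters (so "December" stays whole),
# and [0-9]+ stops at the "/" that follows the sense number.
result=$(grep -Eo '[[:alpha:]]+#[0-9]+' "$nps")
printf '%s\n' "$result"
# janitor#1
# dorm#1
# hunting#1
# hat#1
# middle#4
# December#1

rm -f "$nps"
```

Prepositions such as "of" are skipped automatically because they are not followed by a # sign.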
An awk version:
awk '/#/ {print $NF}' RS="/" NPs
janitor#1
dorm#1
president#4
class#2
hunting#1
hat#1
side#1
hotel#1
side#1
hotel#1
king#1
hotel#1
address#2
girl#1
one#2
family#2
dance#3
floor#1
movie#1
stars#5
movie#1
stars#5
insurance#1
office#1
side#1
floor#1
middle#4
December#1
movie#1
stars#5
one#2
tables#2
people#1
table#2