Using awk, how do I match pattern and variants?

I've been struggling with this for a while in regex testers, but what came up there as a correct regex pattern actually failed in awk. I've got a large, tab-delimited file with numerous types of data. I want to print the rows where a specific column contains the characters XYZ, including the values that follow them.
In the specific column I'm interested in I have values like:
XYZ
ABCDE
XYZ/WORDS
XYZ/ABCDE
ABFE
XYZ
The pattern that succeeded in the regex tester was something like:
XYZ(.....)*
It obviously fails when implemented as:
awk '{if ($1=="XYZ(......)*") print$0}'
What regex character do I use to denote that I want everything after the slash (/), including the original pattern (XYZ)?
Specifically, I want to capture all instances of XYZ and print the other columns that go along with them (hence the print$0). That is, I want to capture these values:
XYZ
XYZ/WORDS
XYZ/ABCDE
Thank you

Setup: (assuming actual data file does not include blank lines)
$ cat x
XYZ 1 2 3 4
ABCDE 1 2 3 4
XYZ/WORDS 1 2 3 4
XYZ/ABCDE 1 2 3 4
ABFE 1 2 3 4
XYZ 1 2 3 4
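If you merely want to print all rows where the first field starts with XYZ, use the regex-match operator ~ (the == operator compares strings literally, which is why the attempt in the question fails):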
$ awk '$1 ~ /^XYZ/' x
XYZ 1 2 3 4
XYZ/WORDS 1 2 3 4
XYZ/ABCDE 1 2 3 4
XYZ 1 2 3 4
If this doesn't produce the expected results, please update the question with more details (including a more representative set of input data and the expected output).
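For comparison, here is a minimal Python sketch of the same filter. It assumes the input is tab-delimited, as described in the question, and that the column of interest is the first one; the function name rows_starting_with is hypothetical. The ^ anchor plays the same role as in the awk pattern /^XYZ/.

```python
import re

def rows_starting_with(lines, prefix="XYZ", column=0, sep="\t"):
    """Yield lines whose given column starts with prefix (like awk '$1 ~ /^XYZ/').

    Hypothetical helper: assumes tab-separated input and a 0-based column index.
    """
    pattern = re.compile("^" + re.escape(prefix))
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split(sep)
        # Skip lines that are too short to have the requested column.
        if len(fields) > column and pattern.match(fields[column]):
            yield line

if __name__ == "__main__":
    sample = [
        "XYZ\t1\t2\t3\t4",
        "ABCDE\t1\t2\t3\t4",
        "XYZ/WORDS\t1\t2\t3\t4",
        "XYZ/ABCDE\t1\t2\t3\t4",
        "ABFE\t1\t2\t3\t4",
        "XYZ\t1\t2\t3\t4",
    ]
    for row in rows_starting_with(sample):
        print(row)
```

Note that awk's default field separator splits on runs of blanks and tabs; for a strictly tab-delimited file, awk -F'\t' '$1 ~ /^XYZ/' is the closer equivalent.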

Related

Sumif and IF, to add up one column and compare it to another column

I have two lists that I want to compare to see if they match, but in one list the numbers are broken down into individual lots, so I need to sum them first to make sure they match the other list, which only shows the total amount.
Here's an example:
List 1
5 ABC
6 ABC
7 ABC
1 CDE
5 CDE
2 CDE
List 2
18 ABC
8 CDE
So I want to make sure that the sum of the ABC and CDE in List 1 matches the amount of ABC and CDE in List 2. I can do this using multiple columns, but I am trying for a more...elegant way (one nested formula).

If you are looking for confirmation that the numbers match, you can use the following:
=SUMIF($B$3:$B$11,E1,$A$3:$A$11)=SUMIF($B$15:$B$17,E1,$A$15:$A$17)
What this does is check whether the sum of ABC (the item name in E1) in List 1 equals the sum of ABC in List 2; it returns TRUE if they are equal and FALSE if they are not.
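The same per-item check can be sketched outside Excel. The Python below is a minimal illustration using the sample data from the question; the function name totals_match is hypothetical. It sums List 1 per item (the SUMIF part) and compares each sum to the total in List 2.

```python
from collections import defaultdict

def totals_match(lots, totals):
    """Hypothetical helper: compare per-item sums of lots against expected totals.

    lots:   iterable of (amount, item) pairs, like List 1
    totals: dict mapping item -> expected total, like List 2
    Returns a dict mapping item -> True/False, like SUMIF(...)=SUMIF(...).
    """
    sums = defaultdict(int)
    for amount, item in lots:
        sums[item] += amount  # equivalent of SUMIF over List 1
    return {item: sums[item] == expected for item, expected in totals.items()}

list1 = [(5, "ABC"), (6, "ABC"), (7, "ABC"), (1, "CDE"), (5, "CDE"), (2, "CDE")]
list2 = {"ABC": 18, "CDE": 8}
print(totals_match(list1, list2))  # prints {'ABC': True, 'CDE': True}
```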

Find and KEEP all DUPLICATE lines (instead of unique lines) in a text file

How can I identify and keep duplicate, triplicate, etc. lines in Notepad++, i.e., all lines that occur more than once? In other words, how can I delete only the unique lines?
For example, here are seven (7) separate lists and the desired true duplicate lines of each list (shown as 7 columns; regard each column as an individual list or file!). (The lists are shown side by side here only to save space; in real life, each of the 7 lists occurs alone, independently of the others, in a separate file!)
list1 list2 list3 list4 list5 list6 list7
1 0 0 0 0 0 0
2 1 1 1 1 1 1
3 2 2 2 2 2 2
4 3 3 3 3 3 3
4 4 4 4 4 4 4
4 4 4 4 4 4 4
5 4 4 4 4 4 4
6 5 5 5 5 5 5
7 5 5 5 5 5 5
8 6 6 6 6 6 6
9 6 6 6 6 6 6
abc 7 7 7 7 7 7
abd 8 8 8 8 8 8
abd 9 9 9 9 9 9
abe <CR> 9 9 9 9
<CR> 99 99
<CR>
[Lines that occur multiple times in the above lists:]
4 4 4 4 4 4 4
4 4 4 4 4 4 4
4 4 4 4 4 4 4
abd 5 5 5 5 5 5
abd 5 5 5 5 5 5
6 6 6 6 6 6
6 6 6 6 6 6
9 9 9 9
9 9 9 9
There are many solutions for eliminating duplicates (e.g., TextFX; notepad++ delete duplicate and original lines to keep unique lines), but I cannot find any solution for keeping only the duplicates.
((.*)\R(\2\R)+)*\K.+\R
@Lars Fischer: This regex works almost correctly, except that the last entry of the (presorted) list needs to be a unique line followed by an empty <CR> line. One (suboptimal) workaround is to insert an artificial (helper) unique line (e.g., zzz) followed by an empty line <CR> as the last two lines.
(END OF QUESTION)
UPDATE 3: This question is reposted per Stack Overflow's "ask a new question" instruction. (@AdrianHHH, @B. Desai, @Paolo Forgia, @greg-449, @Erik von Asmuth drew the incorrect conclusion that this question is a duplicate of notepad++ delete duplicate and original lines to keep unique lines.) This question is definitely not a duplicate of the one @AdrianHHH et al. quote.
UPDATE 2: @AdrianHHH This question is no more "broad" (in fact, one can hardly be more specific) and no less researched than other Notepad++ questions, including https://stackoverflow.com/questions/29303148, which @AdrianHHH et al. cited (wrongly) as the same question.
UPDATE:
@AdrianHHH, @B. Desai, @Paolo Forgia, @greg-449, @Erik von Asmuth
This question is different from:
https://stackoverflow.com/questions/29303148
because Q 29303148 (i) does not ask how to identify and keep only the lines that occur multiple times, and (ii) none of its answers provides a solution for that. Q 29303148 asks "...I just need the unique lines."

Here is a solution based on regular expressions and bookmarks. It works for a sorted file (i.e., each duplicated line is followed by its duplicates):
Open the Mark Dialog (Search -> Mark ....)
click Clear all Marks on the right
check Bookmark line
check Wrap around
Find What: ((.*)\R(\2\R?)+)*\K.*
Check regular expression and uncheck . matches newline
Mark All
Click Close
Search -> Bookmark -> Remove Bookmarked Lines
Explanation
The regular expression is made up of three parts:
((.*)\R(\2\R?)+)* : this is an optional block of duplicates consisting of one or more line blocks
the outer ( ... )* matches zero or more such blocks of duplicated lines (if, in your example, the three 4s were followed by two 5s, we would need to match a sequence of duplicate blocks)
(.*)\R(\2\R?)+: \2 references the content of (.*); these are all the duplicates of one line
the second \R is an optional line break (due to the ?). Thus it is possible to match a duplicate on the last line of the file if that line does not end with a line break
If there is a block of duplicated lines after the cursor position from which you start, this will match it.
now \K discards what we have matched so far (the duplicates) and "puts the cursor" before the first unique line
.* matches the next (unique) line and bookmarks it
Using Mark All, we bookmark all such unique lines, so that we can remove them using the Remove Bookmarked Lines entry in the Search -> Bookmark menu.
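Outside Notepad++, the same "keep only repeated lines of a sorted file" idea can be sketched in Python. Python's re module does not support \K or \R, so this sketch swaps in a different technique, itertools.groupby, which groups adjacent equal lines; like the regex, it assumes the input is sorted. The function name keep_duplicate_lines is hypothetical.

```python
from itertools import groupby

def keep_duplicate_lines(lines):
    """Hypothetical helper: return only the lines that occur more than once.

    Assumes equal lines are adjacent (sorted input), like the bookmark regex;
    every copy of a repeated line is kept, and single (unique) lines are dropped.
    """
    result = []
    for _, group in groupby(lines):
        copies = list(group)
        if len(copies) > 1:
            result.extend(copies)
    return result

list1 = ["1", "2", "3", "4", "4", "4", "5", "6", "7", "8", "9",
         "abc", "abd", "abd", "abe"]
print(keep_duplicate_lines(list1))  # prints ['4', '4', '4', 'abd', 'abd']
```

Because groupby only compares neighbors, unsorted input would need counting instead, for example with collections.Counter.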

How remove text wrap using vim text editor?

I'm trying to write a Vim command to remove text wrapping. I am using the following code, but it doesn't produce the exact output. For example, in \string{this indicate newline}, if "this" appears on the first line, "indicate" on the second line, and so on, how can I remove the wrapping? Is it possible?
:%s/\\string{\zs\(\_[^}]*\)\ze}/\1/gec
Edit based on OP's comment:
For example, input: \string{1 <enterkey> 2 <enterkey> 3 <enterkey> 4 <enterkey> 5 <enterkey>}. I need output: \string{1 2 3 4 5}.
Before I have:
\string{1
2
3
4
5
}
After I want:
\string{1 2 3 4 5}
Before I have (new pattern):
\string{1
{2}
{3}
4
5}
After I want:
\string{1 {2} {3} 4 5}

This line does what you want:
%s/\\string{\_[^}]*/\=substitute(submatch(0),"\n",' ','g')/
it changes:
foobar
\string{1
2
3
4
5
}
foobar
into:
foobar
\string{1 2 3 4 5 }
foobar
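A Python sketch of the same idea as the substitute(submatch(0), ...) answer: find each \string{...} block and replace the line breaks inside it with spaces. Unlike the Vim pattern \_[^}]*, which stops at the first }, the regex below assumes at most one level of nested braces, so it also covers the "new pattern" with {2} and {3}. It also drops the space that would otherwise be left before the closing brace. STRING_BLOCK and unwrap_string_blocks are hypothetical names.

```python
import re

# Matches \string{...}, allowing one level of nested {...} groups inside it
# (an assumption, so that the "{2}" / "{3}" example also works).
STRING_BLOCK = re.compile(r"\\string\{(?:[^{}]|\{[^{}]*\})*\}")

def _unwrap(match):
    text = match.group(0)
    # Replace each line break (and surrounding whitespace) with a single space,
    text = re.sub(r"\s*\n\s*", " ", text)
    # then drop the space left in front of the final closing brace, if any.
    return re.sub(r"\s+\}$", "}", text)

def unwrap_string_blocks(source):
    """Hypothetical helper: join the lines inside every \\string{...} block."""
    return STRING_BLOCK.sub(_unwrap, source)

example = "foobar\n\\string{1\n2\n3\n4\n5\n}\nfoobar"
print(unwrap_string_blocks(example))
```

For the first example this yields \string{1 2 3 4 5} between the two foobar lines, and for the second it yields \string{1 {2} {3} 4 5}.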

It would be easier to understand your question if you gave a longer example of text and showed what you want to do with it. If I understand correctly, you would like to remove the line wrap on lines that contain \string{this.
You could use :%g/\\string{this/j. It executes the j (join) command on every line matching the \\string{this pattern, joining that line with the next one.
Input:
some text
\string{this
indicate}
more text
Turns into:
some text
\string{this indicate}
more text

Not matching in perl regex

I have a variable, and I want to print "success" if it doesn't contain a specific string. But it always prints "success", even when the string is there.
$mystring = " 1 2 3 4 5 TEST=/my/user/test this/is/test
3 4 5 6 8 NEW=/my/new/offer this/is/offer
3 4 5 2 2 FINAL=/final/test/offer /lets/see/this";
if (($mystring !~ m/1 2 3 4 5 TEST=\/my\/user\/test this\/is\/test/i) or
($mystring !~ m/3 4 5 2 2 FINAL=\/final\/test\/offer \/lets\/see\/this/i))
{
print "success";
}
It's printing "success" even if $mystring contains the string. Any help will be appreciated.

Your script is missing a ; at the end of the $mystring declaration. And the second regular expression is unterminated, missing /i at the end.
With those changes your script works fine. It prints "success" if one of the regexes does not match. In your example script, both regexes match, and it does not print "success".
If you mean to print "success" if either regex matches, use =~ instead of !~.
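To make the boolean logic concrete, here is a small Python sketch (not Perl) that evaluates the same two patterns against the same string. The variable names either_missing, both_missing and either_present are illustrative. It shows why the or of two negated matches is false when both strings are present, and how joining the two !~ tests with and, or switching to =~, changes the result.

```python
import re

mystring = (
    " 1 2 3 4 5 TEST=/my/user/test this/is/test\n"
    "3 4 5 6 8 NEW=/my/new/offer this/is/offer\n"
    "3 4 5 2 2 FINAL=/final/test/offer /lets/see/this"
)

first = re.search(r"1 2 3 4 5 TEST=/my/user/test this/is/test", mystring, re.I)
second = re.search(r"3 4 5 2 2 FINAL=/final/test/offer /lets/see/this", mystring, re.I)

# Same condition as the Perl script (!~ ... or !~ ...): true if at least one is missing.
either_missing = not first or not second
# Corresponds to joining the two !~ tests with "and": true only if neither is present.
both_missing = not first and not second
# Corresponds to using =~ with "or": true if at least one is present.
either_present = bool(first or second)

print(either_missing, both_missing, either_present)  # prints False False True
```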

regex: extract 3 specific numbers from a string | example

Given the following two lines, for example:
May 22 00:46:38.340 prod-lab03c-rd1 fpc4 XETH(4/2): %PFE-6: Link 0 XFP: Low Rx power warning set
[May 24 11:24:28.299 LOG: Notice] MIC(0/1) link 1 SFP receive power low warning set
I would like to store the following numbers in three variables:
[1] the 1st number after the "(", it could be 1 or more digits
[2] the 1st number after the "/", it could be 1 or more digits
[3] the first number after the "(L|l)ink" word, it could be 1 or more digits
Could you please help me with this?
Many thanks

To get the first number after the first (, we can use .*\((\d+). Then to get the first number after the /, we can use /(\d+)\). And then to get the first number after "link": [lL]ink (\d+). We put these together to get
^.*\((\d+)/(\d+)\).*[lL]ink (\d+)
The three numbers will be in the three capture groups.

This works:
\((.*?)/(.*?)\).*?[Ll]ink (\d+)
Given your input, it returns:
group 1   group 2   group 3
4         2         0
0         1         1

A fully tested example in Perl:
#!/usr/bin/env perl
use strict;
use warnings;
my @a;
$a[0] = "May 22 00:46:38.340 prod-lab03c-rd1 fpc4 XETH(4/2): %PFE-6: Link 0 XFP: Low Rx power warning set\n";
$a[1] = "[May 24 11:24:28.299 LOG: Notice] MIC(0/1) link 1 SFP receive power low warning set";
foreach (@a) {
print "\$1=$1|\$2=$2|\$3=$3\n" if m!\((\d+)/(\d+)\).*?(?:L|l)ink\s+(\d+)!;
}
The output:
$1=4|$2=2|$3=0
$1=0|$2=1|$3=1
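The same extraction in Python, as a sketch using the pattern from the Perl answer with the standard re module. It assumes, as in the sample lines, that the "(N/M)" group appears before the Link/link word; PATTERN is an illustrative name.

```python
import re

# Digits inside "(N/M)", then the first number after "Link" or "link".
PATTERN = re.compile(r"\((\d+)/(\d+)\).*?[Ll]ink\s+(\d+)")

lines = [
    "May 22 00:46:38.340 prod-lab03c-rd1 fpc4 XETH(4/2): %PFE-6: Link 0 XFP: Low Rx power warning set",
    "[May 24 11:24:28.299 LOG: Notice] MIC(0/1) link 1 SFP receive power low warning set",
]

for line in lines:
    match = PATTERN.search(line)
    if match:
        # Store the three captured numbers in three variables.
        first, second, third = (int(n) for n in match.groups())
        print(first, second, third)
```

This prints 4 2 0 for the first line and 0 1 1 for the second, matching the Perl output above.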
