Use regex to reduce a line?

I have a line such as this:
andy_1972 * andy#ip.address 0 0 0 0 0 0 119075 224 1342751704 1348550270
I want the end result to be just the first word and one of the numbers, like this:
andy_1972 119075
That is, I want to trim the line down to the first word and the fourth number from the end of the line.
How can I do this using a regex? I'm using Notepad++.

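This will match the first word and the fourth-from-last number, capturing them as groups 1 and 2:
^(\w+).* (\d+) \d+ \d+ \d+$
In Notepad++, use it in the Replace dialog with the search mode set to "Regular expression" and \1 \2 as the replacement text.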

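In Perl (the pattern itself is Perl-compatible, so it also works in PCRE) that would be:
my $string = "andy_1972 * andy#ip.address 0 0 0 0 0 0 119075 224 1342751704 1348550270";
# Only print when the match succeeded; otherwise $1 and $2 are stale or undefined.
if ($string =~ /^(\w+).* (\d+) \d+ \d+ \d+$/) {
    print "$1 $2\n";
}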

Using cut, assuming the number you want is always the 10th space-separated field:
echo andy_1972 \* andy#ip.address 0 0 0 0 0 0 119075 224 1342751704 1348550270 |
cut -d' ' -f1,10
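
For comparison, here is a minimal sketch of the same regex applied in Python rather than Perl or Notepad++. The function name reduce_line and the choice to return non-matching lines unchanged are assumptions made for illustration, not part of the original answers.

```python
import re

# Pattern from the answer above: first word, then the fourth number from the end.
PATTERN = re.compile(r"^(\w+).* (\d+) \d+ \d+ \d+$")


def reduce_line(line: str) -> str:
    """Return 'word number' for a matching line, or the line unchanged (assumed policy)."""
    match = PATTERN.match(line)
    if match is None:
        return line
    return f"{match.group(1)} {match.group(2)}"


sample = "andy_1972 * andy#ip.address 0 0 0 0 0 0 119075 224 1342751704 1348550270"
print(reduce_line(sample))
```

Unlike the cut approach, this does not depend on the total number of fields, only on the line ending in at least four space-separated numbers.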

Related

"email ip" regex into log file

I have a log file that looks like this:
'User_001','Entered server','email#aol.com','2','','','0','YES','0','0',','0','192.168.1.1','192.168.1.2','0','0','0','0','0','0','0','0','0','1','0','','0','0','0','1'
'User_002','Entered server','email#aol.com','2','','','0','NO','0','0',','0','192.168.1.3','192.168.1.4','0','0','0','0','0','0','0','0','0','1','0','','0','0','0','1'
OR
User_001 Entered server email#aol.com 2 Pool_1 YES 0 0 0 192.168.1.1 192.168.1.2 0 0 0 0 0 0 0 0 0 1 0 0 1
User_002 Entered server email#aol.com 2 Pool_1 NO 0 0 0 192.168.1.3 192.168.1.4 0 0 0 0 0 0 0 0 0 1 0 0 1
I'm trying to write a regex that exports the contents in "Email IP" format.
I tried a regex like:
([A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,6}(.*)([0-9]{1,3}[\.]){3}[0-9]{1,3})
But of course it doesn't work, since it also captures everything between the two matched strings.
How can I ignore the content between the two matches?
I tried to negate that part of the regex, without success.
Thanks to everyone in advance!
P.S. I need to do this using grep.

This is my ugly regex solution (that works):
([a-z0-9]+#[a-z0-9.]+).*?([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})
https://www.regex101.com/r/APfJS1/1
const regex = /([a-z0-9]+#[a-z0-9.]+).*?([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})/gi;
const str = `User_001','Entered server','email#aol.com','2','','','0','YES','0','0',','0','192.168.1.1','192.168.1.2','0','0','0','0','0','0','0','0','0','1','0','','0','0','0','1'`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
But as mentioned in the comments, a good CSV parser will probably be better!
The same in PHP:
$re = '/([a-z0-9]+#[a-z0-9.]+).*?([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})/i';
$str = 'User_001\',\'Entered server\',\'email#aol.com\',\'2\',\'\',\'\',\'0\',\'YES\',\'0\',\'0\',\',\'0\',\'192.168.1.1\',\'192.168.1.2\',\'0\',\'0\',\'0\',\'0\',\'0\',\'0\',\'0\',\'0\',\'0\',\'1\',\'0\',\'\',\'0\',\'0\',\'0\',\'1\'';
preg_match_all($re, $str, $matches);
// Print the entire match result
print_r($matches);
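
As a rough sketch of how the two capture groups turn into "Email IP" output, here is the same pattern in Python. The helper name email_ip_pairs is hypothetical, and taking only the first match per line is an assumption; the sample lines are copied from the question.

```python
import re

# Same pattern as in the answer above; group 1 is the email, group 2 the first IP after it.
EMAIL_IP = re.compile(
    r"([a-z0-9]+#[a-z0-9.]+).*?([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})",
    re.IGNORECASE,
)


def email_ip_pairs(lines):
    """Return one (email, ip) tuple per line that matches (assumed: first match only)."""
    pairs = []
    for line in lines:
        match = EMAIL_IP.search(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


log_lines = [
    "User_001 Entered server email#aol.com 2 Pool_1 YES 0 0 0 192.168.1.1 192.168.1.2 0 0 0 0 0 0 0 0 0 1 0 0 1",
    "'User_002','Entered server','email#aol.com','2','','','0','NO','0','0',','0','192.168.1.3','192.168.1.4','0','0','0','0','0','0','0','0','0','1','0','','0','0','0','1'",
]

for email, ip in email_ip_pairs(log_lines):
    print(email, ip)
```

The lazy .*? between the groups is what skips the unwanted fields: it still has to match them, but because they are outside the parentheses they never appear in the output.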

How to cut out a specific substring?

I wonder whether a Perl regex can cut a substring out of a string when that substring may occur zero or more times.
For example:
"foo bar //baz" and "foo bar" should both result in "foo bar", cutting off everything after a double slash, if there is one.
I know this could easily be achieved using other methods, but I'm interested in whether a regex one-liner is possible.
I tried ($new_string) = ($string =~ /(.*?)(\/\/)*.*/)
But that does not work.

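Here is what each part of your pattern matches against "foo bar //baz":
(.*?)     matches 0 chars at position 0 ("").
(\/\/)*   matches 0 chars at position 0 ("").
.*        matches 13 chars at position 0 ("foo bar //baz").
The lazy (.*?) is satisfied with an empty match, (\/\/)* is allowed to match zero times, and the trailing .* then consumes the whole string, so the capture is always empty.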
What you want is to delete everything from the first // onward:
( my $new_string = $string ) =~ s{//.*}{};
or, on Perl 5.14+, using the non-destructive /r flag:
my $new_string = $string =~ s{//.*}{}r;
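
A minimal Python sketch of the same substitution, for readers who want to experiment outside Perl. Both function names are made up for this example. Note that the direct port keeps the space before the slashes; the second variant, which also strips that whitespace, is an assumption about what the asker wants rather than something the answer states.

```python
import re


def strip_comment(text: str) -> str:
    """Direct port of s{//.*}{}: remove '//' and everything after it."""
    return re.sub(r"//.*", "", text)


def strip_comment_and_space(text: str) -> str:
    """Assumed variant: also remove whitespace immediately before '//'."""
    return re.sub(r"\s*//.*", "", text)


for sample in ("foo bar //baz", "foo bar"):
    print(repr(strip_comment(sample)), repr(strip_comment_and_space(sample)))
```

Because the substitution simply does nothing when there is no //, the "zero or more times" case needs no special handling.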

Remove spaces between words only not between numbers

I have a string consisting of words, special characters (*, |, ( etc.) and floating-point numbers. I want to remove the whitespace only between words and special characters; spaces between numbers should not be removed. How can I do this in Perl?
E.g.:
Rama 1 * 2.34 * ( L - 0.45 ) XYZ 10 20.05 30.06 40 P > 25.
After conversion it should be:
Rama1*2.34*(L-0.45)XYZ 10 20.05 30.06 40 P>25.

You can use lookarounds here:
(?<!\d)\h+|\h+(?!\d)
See the demo:
https://regex101.com/r/uF4oY4/62

You can use the lookaround-based regex below.
perl -pe 's/\s+(?=\D)|(?<=\D)\s+//g' file
Example:
$ echo 'Rama 1 * 2.34 * ( L - 0.45 ) XYZ 10 20.05 30.06 40 P > 25.' | perl -pe 's/\s+(?=\D)|(?<=\D)\s+//g'
Rama1*2.34*(L-0.45)XYZ10 20.05 30.06 40P>25.
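Note that this also removes the space in "XYZ 10" and "40 P". To remove only the whitespace next to special characters, use: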
$ echo 'Rama 1 * 2.34 * ( L - 0.45 ) XYZ 10 20.05 30.06 40 P > 25.' | perl -pe 's/(?<=[^\s\w])\s+|\s+(?=[^\w\s])//g'
Rama 1*2.34*(L-0.45)XYZ 10 20.05 30.06 40 P>25.
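
The two substitutions above can be compared side by side. This is a small Python sketch rather than Perl; Python's re module has no \h, so \s is used, as in the second answer. The variable names are illustrative only.

```python
import re

text = "Rama 1 * 2.34 * ( L - 0.45 ) XYZ 10 20.05 30.06 40 P > 25."

# Drop whitespace that has a non-digit character on either side.
non_digit_neighbour = re.sub(r"\s+(?=\D)|(?<=\D)\s+", "", text)

# Drop whitespace only when it touches a non-word, non-space character.
symbol_neighbour = re.sub(r"(?<=[^\s\w])\s+|\s+(?=[^\w\s])", "", text)

print(non_digit_neighbour)  # Rama1*2.34*(L-0.45)XYZ10 20.05 30.06 40P>25.
print(symbol_neighbour)     # Rama 1*2.34*(L-0.45)XYZ 10 20.05 30.06 40 P>25.
```

Neither result is identical to the output requested in the question, which joins "Rama1" but keeps "XYZ 10"; that combination would need a rule the question does not spell out.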

Codegolf regex match

On Code Golf I found this answer: https://codegolf.stackexchange.com/a/34345/29143 , which contains this Perl one-liner:
perl -e '(q x x x 10) =~ /(?{ print "hello\n" })(?!)/;'
Running it through -MO=Deparse gives:
'          ' =~ /(?{ print "hello\n" })(?!)/;
^^^^^^^^^^^^
 10 spaces
The explanation says that (?!) never matches, so the regex engine retries the match at each character. OK, but why does it print hello 11 times and not 10?

A regular expression is tried at positions in the string, which include the position before each character and also the position after the last character.
The following zero-width regular expression matches before each of the 5 characters of the string and also after the last one, which demonstrates why you got 11 prints instead of just 10.
use strict;
use warnings;
my $string = 'ABCDE';
# Zero width Regular expression
$string =~ s//x/g;
print $string;
Outputs:
xAxBxCxDxEx
^ ^ ^ ^ ^ ^
1 2 3 4 5 6

It's because when you have a string of n characters there are n+1 positions in the string where the pattern is tested.
Example with "abc":
a b c
^ ^ ^ ^
| | | |
| | | +--- end of the string
| | +----- position of c
| +------- position of b
+--------- position of a
The position at the end of the string can be a little counter-intuitive, but it exists. To illustrate this, consider the pattern /c$/, which succeeds on the example string (think about where in the string the end anchor is tested), or /(?<=c)/, which succeeds at the last position.

Take a look at the following:
$x = "abc"; $x =~ s/.{0}/x/; print("$x\n"); # xabc
$x = "abc"; $x =~ s/.{1}/x/; print("$x\n"); # xbc
$x = "abc"; $x =~ s/.{2}/x/; print("$x\n"); # xc
$x = "abc"; $x =~ s/.{3}/x/; print("$x\n"); # x
Nothing surprising. You can match anywhere between 0 and 3 of the three characters, and place an x at the position where you left off. That's four positions for three characters.
Also consider 'abc' =~ /^abc\z/.
Starting at position 0, ^ matches zero chars.
Starting at position 0, a matches one char.
Starting at position 1, b matches one char.
Starting at position 2, c matches one char.
Starting at position 3, \z matches zero chars.
Again, that's a total of four positions needed for a three character string.
Only patterns that can match zero characters can match at the last position, but there are plenty of those (^, \z, \b, (?=...), (?!...), (?<=...), (?:...)?, etc).
You can think of the positions as the edges of the characters, if that helps.
|a|b|c|
0 1 2 3
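
The n+1 count can also be checked directly. The following Python sketch (not Perl) uses an empty pattern, which can match at every position, to list the candidate positions; the variable names are illustrative.

```python
import re

text = "ABCDE"

# The empty pattern matches at every position: before each character and at the end.
positions = [m.start() for m in re.finditer(r"", text)]
print(positions)               # [0, 1, 2, 3, 4, 5]

# Same idea as the s//x/g example above.
print(re.sub(r"", "x", text))  # xAxBxCxDxEx

# Ten spaces, as in the one-liner: eleven positions where a match can be attempted.
attempts = len(re.findall(r"", " " * 10))
print(attempts)                # 11
```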

Regular expression, tcl

I'm trying to extract specific lines from a trace file like the one below:
- 0.118224 0 7 ack 40 ------- 1 2.0 7.0 0 2
r 0.118436 1 2 tcp 40 ------- 2 7.1 2.1 0 1
+ 0.118436 1 2 ack 40 ------- 2 3.1 2.1 0 3
- 0.118436 1 2 ack 40 ------- 2 4.1 2.1 0 3
r 0.120256 0 7 ack 40 ------- 1 2.0 7.0 0 2
I want to extract every line that has the following form:
r x.xxxxx 1 2 xxx xx ------- x numbers.x 2.x x x.
Note: x means any value, and numbers can be anything from 3 to 7.
Here is my attempt, which is not working:
if {[regexp \r+ ([0-9.]+) 1 2.*- ([3-7.]+) 2.*- ([0-9.]+) $line -> time]}
Any suggestions?

Here's another approach: extract the fields you want to use for comparison.
while {[gets $f line] != -1} {
    lassign [split $line] a - b c - - - - d e - -
    if {
        $a eq "r" &&
        $b == 1 &&
        $c == 2 &&
        3 <= floor($d) && floor($d) <= 7 &&
        floor($e) == 2
    } {
        puts $line
    }
}

You have to escape the . with a \, because an unescaped . means "any character" in a regexp.
So your regexp could look like:
if {[regexp {r \d\.\d{5} 1 2 \d{3} \d{2} ------- \d [3-7]\.\d 2\.\d \d \d} $line -> time ]} {
    # ...
}
Now you have to place () around the part you want to capture into time.
By the way, I used the following transformation on your description of what you want to match:
set input {r x.xxxxx 1 2 xxx xx ------- x numbers.x 2.x x x}
set re [subst [regsub -all {x{2,}} $input {\\\\d{[string length \0]}}]]
set re [string map {. {\.} x {\d} numbers {[3-7]}} $re]
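
The field-based approach from the first answer can be sketched in Python as well. The function name wanted and the variable names are hypothetical, and treating lines with an unexpected field count or non-numeric values as non-matching is an assumption. The trace lines are the ones from the question.

```python
import math


def wanted(line: str) -> bool:
    """True for 'r' events from node 1 to node 2 whose two address fields are 3..7.x and 2.x."""
    fields = line.split()
    if len(fields) != 12:  # assumed: every trace line has exactly 12 fields
        return False
    try:
        first_addr = float(fields[8])
        second_addr = float(fields[9])
    except ValueError:
        return False
    return (
        fields[0] == "r"
        and fields[2] == "1"
        and fields[3] == "2"
        and 3 <= math.floor(first_addr) <= 7
        and math.floor(second_addr) == 2
    )


trace = [
    "- 0.118224 0 7 ack 40 ------- 1 2.0 7.0 0 2",
    "r 0.118436 1 2 tcp 40 ------- 2 7.1 2.1 0 1",
    "+ 0.118436 1 2 ack 40 ------- 2 3.1 2.1 0 3",
    "- 0.118436 1 2 ack 40 ------- 2 4.1 2.1 0 3",
    "r 0.120256 0 7 ack 40 ------- 1 2.0 7.0 0 2",
]

matching = [line for line in trace if wanted(line)]
print("\n".join(matching))
```

Comparing parsed fields avoids having to encode the exact number of digits in a pattern, which matters here because the sample timestamps have six decimal places while the description shows five.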
