How to make regex accept strings with a dot - regex

I'm debugging a Perl script and ran into the error described below.
In the script, I have a -td option designed to accept a string like 1n, 5n, 0.3n, 0.8n
But when I use the last two values from that list, the script does not work as intended; it only works with the first two.
To give you an overview, here are the relevant portions of the script, followed by my question:
if (scalar(@ARGV) < 1){ &get_usage() };
# Getoptions Setup ##
GetOptions (
'h|help' => \$help,
'v|version' => \$version,
'i|input=s' => \$input,
'o|output=s' => \$output,
... # more options here
'td=s' => \$td_val, # this is the line in question
'val=s' => \$v_var,
... # more options here
) or die get_usage(); # this will only call usage of script or help
... # more codes here
get_input_arg(); # this function will validate the entries user had inputted
#assigning to a placeholder value for creating a new file
$td_char="\ttd=$td_val" if $td_val;
$td_char=" " if !$td_val;
... # fast forward ...
sub get_input_arg{
...
# here you go, there is wrong in the following regex to accept values such as 0.8n
unless (($td_val=~/^\d+(m|u|n|p)s$/)||($td_val=~/^\d+(m|u|n|p|s)$/)||($td_val=~/^\d+$/)){#
print "\n-td error!\nEnter the required value!\n";
get_usage();
... # more functions here ...
}
For explanation:
On the console user will input -td 5n
The 5n will be assigned to td_val and to td_char and be used for printing later
This 5n will be validated by get_input_arg() function which will pass to the regex unless line.
For 5n input, the script works as intended, but with -td 0.8n the validation fails and the error message after the unless line is printed on the console.
I know the regex fails to match when 0.8n is used as the td input, but I don't know how to fix it. Thanks in advance!

You can use
unless (($td_val=~/^\d+(\.\d+)?[munp]s$/)||($td_val=~/^\d+(\.\d+)?[munps]$/)||($td_val=~/^\d+(\.\d+)?$/))
Explanation:
Your regex has \d+, which matches only integers. Replace it with \d+(\.\d+)? (an integer part followed by an optional decimal part).
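As a rough illustration of the same idea outside Perl, here is a minimal Python sketch. The names TD_PATTERN and is_valid_td are made up for this example, and the three alternatives from the answer are folded into one pattern with an optional unit, which is an assumption about what the script wants to accept.

```python
import re

# \d+(?:\.\d+)?         integer part, then an optional decimal part
# (?:[munp]s|[munps])?  optional unit such as "ns", "n" or "s"
TD_PATTERN = re.compile(r"\d+(?:\.\d+)?(?:[munp]s|[munps])?")


def is_valid_td(value: str) -> bool:
    """Return True when the whole string is a number with an optional unit."""
    return TD_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for sample in ["1n", "5n", "0.3n", "0.8n", "10", "2.5ns", ".8n", "0.n"]:
        print(sample, is_valid_td(sample))
```

Note that a value such as .8n (no leading digit) is still rejected, because the integer part is mandatory in this pattern.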


php regexp to search replace string functions to mb string functions

The solution was to use lookaheads and lookbehinds. Lookarounds solved my issue, because each replacement was consuming characters that the next match needed.
We've been working for a while on modernizing some of our older projects (which perhaps carry some bad/old coding habits) and making them PHP 7-ready.
In this process i have made some adjustments in the .php files of the project so that for example
The problem at hand is that I'm facing some issues with Danish characters in PHP string functions (strlen, substr etc.) and would like them to use the mb_string functions instead. From what I can read on the internet, using the "overload" function is not the way to go, so I've decided to do a file-based search and replace.
My search-and-replace function looks like this right now (updated thanks to @SeanBright)
$testfile = file_get_contents($file);
$array = array ( 'strlen'=>'mb_strlen',
'strpos'=>'mb_strpos',
'substr'=>'mb_substr',
'strtolower'=>'mb_strtolower',
'strtoupper'=>'mb_strtoupper',
'substr_count'=>'mb_substr_count',
'split'=>'mb_split',
'mail'=>'mb_send_mail',
'ereg'=>'mb_ereg',
'eregi'=>'mb_eregi',
'strrchr' => 'mb_strrchr',
'strichr' => 'mb_strichr',
'strchr' => 'mb_strchr',
'strrpos' => 'mb_strrpos',
'strripos' => 'mb_strripos',
'stripos' => 'mb_stripos',
'stristr' => 'mb_stristr'
);
foreach($array as $function_name => $mb_function_name){
$search_string = '/(^|[\s\[{;(:!\=\><?.,\*\/\-\+])(?<!->)(?<!new )' . $function_name . '(?=\s?\()/i';
// The pattern has a single capture group; the lookahead consumes nothing, so only ${1} is needed
$testfile = preg_replace($search_string, '${1}' . $mb_function_name, $testfile, -1, $count);
}
print "<pre>";
print $testfile;
The $file has this content:
<?php
print strtoupper('test');
print strtolower'test');
print substr('tester',0,1);
print astrtoupper('test');
print bstrtolower('test');
print csubstr(('tester',0,1);
print [substr('tester',0,1)];
print {substr('tester',0,1)};
substr('test',0,1);
substr('test',0,1);
(substr('test',0,1));
!substr();
if(substr()==substr()=>substr()<substr()){
?substr('test');
}
"test".substr('test');
'asd'.substr('asd');
'asd'.substr('asd');
substr( substr('asdsadsadasd',0,-1),strlen("1"),strlen("100"));
substr (substr ('Asdsadsadasd',0,-1), strlen("1"), strlen("100"));
substr(substr(substr('Asdsadsadasd',0,-1),0,-1), strlen("1"), strlen("100"));
mailafsendelse(substr('asdsadsadasd',0,-1), strlen("1"), strlen("100"));
mail(test);
substr ( tester );
substr ( tester );
mail mail mail mail ( tester );
$mail->mail ();
$mail -> mail ();
new Mail();
new mail ();
strlen ( tester )*strlen ( tester )+strlen ( tester )/strlen ( tester )-strlen ( tester )
;
The point here is that the actual PHP code does not have to be valid syntax. I just wanted to make it work in different scenarios.
My regex problem is that I cannot figure out why this line:
substr(substr(substr('Asdsadsadasd',0,-1),0,-1), strlen("1"), strlen("100"));
is not working. The 1st and 3rd substr are replaced correctly, but the 2nd is not, so the result looks like this:
mb_substr(substr(mb_substr('Asdsadsadasd',0,-1),0,-1), mb_strlen("1"), mb_strlen("100"));
As a note, my search string is made to work with all sorts of characters in front of the function name, and it requires that the character AFTER the function name is a "("
In a perfect world I would also like to exclude string functions that are methods in classes, for example $order->mail(), which would send an email. I would NOT like this to be converted to $order->mb_send_mail()
From my understanding, all parameters are the same between the original and the mb_ functions, so that should not be a problem.
Complete script can be found here
https://github.com/welrachid/phpStringToMBString
The problem is that some of the characters you are using to delimit your function call checks are being consumed by matching. If you switch the last group to be a positive lookahead, this will fix the problem:
$search_string = '/([ \[{\n\t\r;(:!=><?\.,])'.($function_name).'([\ |\t]{0,1})(?=[(]{1})/i';
(The addition is the ?= at the start of the last group.)
Your current expression also won't match function calls at the beginning of the line. The following handles that and also simplifies things a bit:
$search_string = '/(^|[\s\[{;(:!=><?.,])' . $function_name . '(?=\s?\()/i';
I've set up an example on regex101.com.
You might even be able to get away with:
$search_string = '/(^|\W)' . $function_name . '(?=\s?\()/i';
Where \W will match a non-word character.
Update
To prevent matching method calls, you can add a negative lookbehind to your pattern:
$search_string = '/(^|[\s\[{;(:!=><?.,])(?<!->)' . $function_name . '(?=\s?\()/i';
(The addition is the (?<!->) placed right before the function name.)
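To see why the lookahead matters, here is a small sketch in Python's re module rather than PHP; the lookaround syntax used here behaves the same way in both engines. The helper names convert_consuming and convert_lookahead are invented for this illustration, and the consuming pattern is a simplified stand-in for the question's original expression.

```python
import re

DELIMS = r"(^|[\s\[{;(:!=><?.,])"


def convert_consuming(code: str, name: str, mb_name: str) -> str:
    # The "(" after the name is part of the match, so it is no longer
    # available as the leading delimiter of a directly nested call.
    pattern = re.compile(DELIMS + re.escape(name) + r"(\s?)(\()", re.IGNORECASE)
    return pattern.sub(lambda m: m.group(1) + mb_name + m.group(2) + m.group(3), code)


def convert_lookahead(code: str, name: str, mb_name: str) -> str:
    # (?=\s?\() only checks that "(" follows; (?<!->) rejects method calls.
    pattern = re.compile(DELIMS + r"(?<!->)" + re.escape(name) + r"(?=\s?\()", re.IGNORECASE)
    return pattern.sub(lambda m: m.group(1) + mb_name, code)


if __name__ == "__main__":
    nested = "substr(substr(substr('Asdsadsadasd',0,-1),0,-1),0,1);"
    print(convert_consuming(nested, "substr", "mb_substr"))
    print(convert_lookahead(nested, "substr", "mb_substr"))
    print(convert_lookahead("$order->mail();", "mail", "mb_send_mail"))
```

The consuming version skips the middle substr, as in the question, while the lookahead version rewrites all three calls and leaves $order->mail() alone.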

Does using multiline in logstash filter print out the data?

I am trying to use multiline to combine a number of lines in a logfile that share the same starting symbol. In my case the starting symbol is #S#. It would look something like this:
#S# dsifj sdfojosf sfjosdfoisdjf
#S# dsfj sdojifoig dfpkgokdfgk 89s7fsjlk sdf
#S# lsdffm dg;;dfgl djfg 930`e`fsd
...
...
...
Note: The random characters are just used to imitate the content of the actual log.
The following is what I wrote for the multiline statement:
multiline {
type => "table_init"
pattern => "#S#"
negate => true
what => "next"
}
I am assuming what I wrote does combine them into one line, but I am wondering whether this prints out the line or whether I need to use grok to parse the entire line before it prints. Any thoughts and input will be helpful. Thank you.
If you are trying to match up all lines that DO match "#S#", then you should have negate set to false. You use negate when you want to get all lines that DO NOT match a certain pattern.
As for your actual question, multiline takes all the relevant lines and puts them into the "message" field, including newline characters (\n, and I assume \r if you are running Windows as well though I have never checked). You can then grok this entire message to get the data you want.
So if you set up your output like so:
output { stdout { codec => rubydebug } }
You should find that the outputted message will read something like:
"message" => "#S# dsifj sdfojosf sfjosdfoisdjf \n#S# dsfj sdojifoig dfpkgokdfgk 89s7fsjlk sdf\n#S# lsdffm dg;;dfgl djfg 930`e`fsd"
if you set up your multiline filter correctly.
Hope this helps!

Capturing specific part of domain name in R using regex

I am trying to capture domain names from a long string in R. The domain names are as follows.
11.22.44.55.url.com.localhost
The regex I am using is as following,
(gsub("(.*)\\.([^.]*url[^.]*)\\.(.*)","\\2","11.22.44.55.test.url.com.localhost",ignore.case=T)[1])
When I test it, I get the right answer that is
url.com
But when I run it as a job on a large dataset, (I run this using R and Hadoop), the result ends up being this,
11.22.44.55.url
And sometimes when the domain is
11.22.44.55.test.url.com.localhost
but I never get
url.com
I am not sure how this could happen. I know while I test it individually its fine but while running it on my actual dataset it fails. Am I missing any corner case that is causing a problem?
Additional information on the dataset, each of these domain addresses is an element in a list, stored as a string, I extract this and run the gsub on it.
This solution is based on using sub twice. First,".localhost" is removed from the string. Then, the URL is extracted:
# example strings
test <- c("11.22.44.55.url.com.localhost",
"11.22.44.55.test.url.com.localhost",
"11.22.44.55.foo.bar.localhost")
sub(".*\\.(\\w+\\.\\w+)$", "\\1", sub("\\.localhost", "", test))
# [1] "url.com" "url.com" "foo.bar"
This solution works also for strings ending with "url.com" (without ".localhost").
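For readers who want to try the two-step idea outside R, here is a minimal Python sketch of the same approach. The function name last_two_labels is made up for this example, and count=1 is used to mirror R's sub, which replaces only the first occurrence.

```python
import re


def last_two_labels(host: str) -> str:
    """Drop the first '.localhost', then keep the last two dot-separated labels."""
    without_suffix = re.sub(r"\.localhost", "", host, count=1)
    return re.sub(r".*\.(\w+\.\w+)$", r"\1", without_suffix)


if __name__ == "__main__":
    hosts = [
        "11.22.44.55.url.com.localhost",
        "11.22.44.55.test.url.com.localhost",
        "11.22.44.55.foo.bar.localhost",
    ]
    print([last_two_labels(h) for h in hosts])
```

Because the second pattern is anchored at the end of the string, the number of labels in front of the domain does not affect the result.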
Why not try something simpler, split on ., and pick the parts you want
x <-unlist(strsplit("11.22.44.55.test.url.com.localhost",
split=".",fixed=T))
paste(x[6],x[7],sep=".")
I'm not 100% sure what you're going for with the match, but this will grab "url" plus the next word/numeric sequence after that. I think the "*" wildcard is too greedy, so I made use of the "+", which matches 1 or more characters, rather than 0 or more (like "*").
>oobar = c(
>"11.22.44.55.url.com.localhost",
>"11.22.44.55.test.url.cog.localhost",
>"11.22.44.55.test.url.com.localhost"
>)
>f = function(url) (gsub("(.+)[\\.](url[\\.]+[^\\.]+)[\\.](.+)","\\2",url,ignore.case=TRUE))
>f(oobar)
[1] "url.com" "url.cog" "url.com"

Vim: Delete the text matching a pattern IF submatch(1) is empty

This command line parses a contact list document that may or may not have either a phone, email or web listed. If it has all three then everything works great - appending the return from the FormatContact() at the end of the line for data uploading:
silent!/^\d/+1|ki|/\n^\d\|\%$/-1|kj|'i,'jd|let @a = substitute(@",'\s*Phone: \([^,]*\)\_.*','\1',"")|let @b = substitute(@",'^\_.*E-mail:\s\[\d*\]\([-_@.0-9a-zA-Z]*\)\_.*','\1',"")|let @c = substitute(@",'^\_.*Web site:\s*\[\d*\]\([-_.:/0-9a-zA-Z]*\)\_.*','\1',"")|?^\d\+?s/$/\=','.FormatContact(@a,@b,@c)
or, broken down:
silent!/^\d/+1|ki|/\n^\d\|\%$/-1|kj|'i,'jd
let @a = substitute(@",'\s*Phone: \([^,]*\)\_.*','\1',"")
let @b = substitute(@",'^\_.*E-mail:\s\[\d*\]\([-_@.0-9a-zA-Z]*\)\_.*','\1',"")
let @c = substitute(@",'^\_.*Web site:\s*\[\d*\]\([-_.:/0-9a-zA-Z]*\)\_.*','\1',"")
?^\d\+?s/$/\=','.FormatContact(@a,@b,@c)
I created three separate searches so as not to make any ONE search fail if one atom failed to match because - again - the contact info may or may not exist per contact.
The problem that solution created was that when the pattern does not match I get the whole @" into @a. Instead, I need it to be empty when the match does not occur. I need each variable represented (phone, email, web) whether it is empty or not.
I see no flags that can be set in the substitution function that
will do this.
Is there a way to return "" if \1 is empty?
Is there a way to create an optional atom so the search query(ies) could still account for an empty match so as to properly record it as empty?
Instead of using substitutions that replace the whole captured text
with its part of interest, one can match only that target part. Unlike
substitution routines, matching ones either locate the text conforming
to the given pattern, or report that there is no such text. Thus,
using the matchstr() function in preference to substitute(), the
parsing code listed in the question can be changed as follows:
let @a = matchstr(@", '\<Phone:\s*\zs[^,]*')
let @b = matchstr(@", '\<E-mail:\s*\[\d*\]\zs[-_@.0-9a-zA-Z]*')
let @c = matchstr(@", '\<Web site:\s*\[\d*\]\zs[-_.:/0-9a-zA-Z]*')
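The same distinction between substituting and matching exists in most regex libraries. As a sketch only, here it is in Python instead of Vim script: re.sub returns the input unchanged when nothing matches, while re.search reports the miss so it can be mapped to an empty string. The contact text and the helper name field are invented for this illustration.

```python
import re

# Hypothetical contact record with a phone and a web site but no e-mail.
contact = "1 Acme Ltd\nPhone: 555-1234, Fax: 555-9999\nWeb site: [2]www.example.com\n"


def field(pattern: str, text: str) -> str:
    """Return capture group 1 of the first match, or '' when there is none."""
    found = re.search(pattern, text)
    return found.group(1) if found else ""


phone = field(r"\bPhone:\s*([^,]*)", contact)
email = field(r"\bE-mail:\s*\[\d*\]([-_@.0-9a-zA-Z]*)", contact)
web = field(r"\bWeb site:\s*\[\d*\]([-_.:/0-9a-zA-Z]*)", contact)

# Substitution-style extraction: with no match, the whole text comes back.
substituted = re.sub(
    r"^.*E-mail:\s*\[\d*\]([-_@.0-9a-zA-Z]*).*", r"\1", contact, flags=re.DOTALL
)

if __name__ == "__main__":
    print(repr(phone), repr(email), repr(web))
    print(substituted == contact)
```

Every field is always present in the result, and a missing one is simply the empty string, which is what the question asks for.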
Just in case you want linewise processing, consider using it in combination with :global, e.g.
let @a=""
g/text to match/let @A=substitute(getline("."), '^.*\(' . @/ . '\).*$', '\1\r', '')
This collects the matched text from every line that contains it, separated by newlines, which you can then print:
echo @a
The beautiful thing here is that you can make it work with the last-used search pattern very easily:
g//let @A=substitute(getline("."), '^.*\(' . @/ . '\).*$', '\1\r', '')

Regular Expression to find string in Expect buffer

I'm trying to find a regex that works to match a string of escape characters (an Expect response, see this question) and a six digit number (with alpha-numeric first character).
Here's the whole string I need to identify:
\r\n\u001b[1;14HX76196
Ultimately I need to extract the string:
X76196
Here's what I have already:
interact {
#...
#...
#this expression does not identify the screen location
#I need to find "\r\n\u001b[1;14H" AND "([a-zA-Z0-9]{1})[0-9]{5}$"
#This regex was what I was using before.
-nobuffer -re {^([a-zA-Z0-9]{1})?[0-9]{5}$} {
set number $interact_out(0,string)
}
I need to identify the escape characters to verify that it is a field in that screen region. So I need a regex that includes that first portion, but the backslashes are confusing me...
Also once I have the full string in the $number variable, how do I isolate just the number in another variable in Tcl?
If you just want the number at the end, then this should be enough...
[0-9]{6}
Update with new information
Assuming \n is a newline character, rather than a literal \ followed by a literal n, you can do this...
\r\n\u001B\[1;14H(X[0-9]{5})
I found out a few things with some more digging. First of all I wasn't looking at the output of the program but the input of the user. I needed to add the "-o" flag to look at the program output. I also shortened the regex to just the necessary part.
The regex example from @rikh led me to look at why his regex and my own were failing, and that was because I was looking at the input rather than the output. So the original regex that I tried wasn't at fault; the data being looked at was (the "-o" flag was missing).
Here's the complete answer to my problem.
interact {
#...
-o -nobuffer -re {(\[1;14H[a-zA-Z0-9]{1})[0-9]{5}} {
#get number in place
set numraw $interact_out(0,string)
#get just number out
set num [string range $numraw 6 11]
#switch to lowercase
set num [string tolower $num]
send_user " stored number: $num"
}
}
I'm a noob with Expect and Tcl so if any of this doesn't make sense or if you have any more insights into the interact flags, please set me straight.
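On the second part of the question (isolating just the number), a capture group around the field avoids counting character offsets. The following is only a sketch in Python rather than Tcl, assuming the buffer contains the real control characters (CR, LF, ESC) and not literal backslashes; the variable names are made up for the example.

```python
import re

# Sample output from the question: CR, LF, ESC, the cursor-positioning
# sequence "[1;14H", then the six-character field.
screen_output = "\r\n\x1b[1;14HX76196"

# The "[" is escaped because it is literal here; the parentheses capture the field.
FIELD = re.compile(r"\r\n\x1b\[1;14H([A-Za-z0-9][0-9]{5})")

match = FIELD.search(screen_output)
number = match.group(1).lower() if match else ""

# Offset-based equivalent of "string range $numraw 6 11" on the shorter match.
numraw = "[1;14HX76196"
by_offset = numraw[6:12].lower()

if __name__ == "__main__":
    print(number)
    print(by_offset)
```

In Tcl, regexp with a submatch variable can play the same role as group(1) here, so the string range step would not be needed.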