I am trying to get the word before, and the decimal string after, a ' - ' separator whose surrounding spaces are not guaranteed.
Consider this string
"some str (targetWord - 12434 trailing string)"
this string is not guaranteed to have spaces before or after the '-'
so it could look like one of the following
"some str (targetWord-12434 trailing string)"
"some str (targetWord- 12434 trailing string)"
"some str (targetWord -12434 trailing string)"
"some str (targetWord- 12434 trailing string)"
So far I have the following
$allServices = (Get-Service "Known Service Prefix*").DisplayName
foreach ($service in $allServices){
$service = $service.split('\((.*?)\)')[1] #esc( 'Match any non greedy' esc)
if($service.split()[0] -Match '-'){
$arr_services += $service.split('( - )')[0..1]
}else{
$arr_services += ($service -replace '-','').split()[0..1]
}
}
This works to handle the simple cases of ' - ' and '-', but can't handle anything else. I feel like this is the kind of problem that could be handled by one line of REGEX, or at most two.
What I want to end up with is an array of strings, where the evens (including zero) are the targetWord, and the odd values are the decimal strings.
My issue isn't that I can't make this happen, it's that it looks like crap...
what I mean is my goal is to try and use REGEX to get each word, ignore the '-', and push out to a growing array the targetWord & decimalString.
I see this as more of a puzzle than anything and am trying to use this to improve my REGEX skills. Any help is appreciated!
A single regex passed to the -match operator should suffice:
$arr_services = $allServices | ForEach-Object {
if ($_ -match '\((?<word>\w+) *- *(?<number>\d+)') {
# Output the word and number consecutively.
$Matches.word, $Matches.number
}
}
# Output the resulting array.
$arr_services
Note how the pipeline output can be directly collected in a variable as an array ($arr_services = ...) - no need to iteratively "add" to an array. If you need to ensure that $arr_services is always an array - even if the pipeline outputs only one object, use [array] $arr_services = ...
With your sample strings, the above yields (a flat array of consecutive word-number pairs):
targetWord
12434
targetWord
12434
targetWord
12434
targetWord
12434
As for the regex:
\( matches a literal (
\w+ matches a nonempty run (+) of word characters (\w - letters, digits, _), captured in named capture group word ((?<word>...)).
*- * matches a literal - surrounded by any number of spaces - including none (*).
\d+ matches a nonempty run of digits (\d), captured in named capture group number ((?<number>...)).
If the -match operator finds a match, the results are reflected in the automatic $Matches variable, a hashtable that enables accessing named capture groups directly by name.
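For comparison, the same pattern can be tried outside PowerShell. The following is a minimal Python sketch, not part of the original answer: the helper name `extract_pairs` is made up for illustration, and Python spells named groups as `(?P<name>...)`. It applies the equivalent regex to the sample strings and collects a flat list of word/number pairs.

```python
import re

# Same idea as the PowerShell regex: "(" + word, optional spaces around "-", digits.
PATTERN = re.compile(r"\((?P<word>\w+) *- *(?P<number>\d+)")

def extract_pairs(strings):
    """Return a flat list: word, number, word, number, ... (hypothetical helper)."""
    result = []
    for s in strings:
        m = PATTERN.search(s)
        if m:  # strings without the pattern are skipped
            result.extend([m.group("word"), m.group("number")])
    return result

samples = [
    "some str (targetWord - 12434 trailing string)",
    "some str (targetWord-12434 trailing string)",
    "some str (targetWord- 12434 trailing string)",
    "some str (targetWord -12434 trailing string)",
]
print(extract_pairs(samples))
```

Even indexes of the returned list hold the words and odd indexes hold the numbers, which is the layout the question asks for.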
here's one way to handle the data set you posted. it presumes all the strings will have the same general format that you posted. that means it WILL FAIL if your sample data set is not realistic. [grin]
$InStuff = @(
'some str (targetWord - 12434 trailing string)'
'some str (targetWord-12434 trailing string)'
'some str (targetWord- 12434 trailing string)'
'some str (targetWord -12434 trailing string)'
'some str (targetWord- 12434 trailing string)'
)
$Results = foreach ($IS_Item in $InStuff)
{
$Null = $IS_Item -match '.+\((?<Word>.+) *- *(?<Number>\d{1,}) .+\)'
[PSCustomObject]@{
Word = $Matches.Word.Trim()
Number = $Matches.Number
}
}
$Results
output ...
Word Number
---- ------
targetWord 12434
targetWord 12434
targetWord 12434
targetWord 12434
targetWord 12434
Related
I want to find all users with a first name that has an empty space at the beginning or ending.
It could look like: "Juliette " or " Juliette"
For now I only have the regex to match when the space is at the end of string:
^[ab]:[[:space:]]|$
I didn't find how to match the empty space at the beginning of the string and I don't know if it's possible to accomplish both of these conditions in one regex ?
Thanks for your help.
Test for Strippable Whitespace without Regexp
There's a little trick you can use with String#strip!, which returns nil if it can't find whitespace to strip. For example:
# returns a hash mapping str to true if it has leading/trailing
# whitespace; otherwise maps it to false
def strippable? str
{ str => !!str.dup.strip! }
end
# leading space, trailing space, no space
test_values = [ ' foo', 'foo ', 'foo' ]
test_values.map { |str| strippable? str }
#=> [{" foo"=>true}, {"foo "=>true}, {"foo"=>false}]
This doesn't rely on a regular expression, but rather on properties of the String and the Boolean result of an inverted #strip!. Regardless of whether the Ruby engine uses regular expressions under the hood, these types of String methods are often faster than comparable Regexp matches, but your mileage and specific use cases may vary.
Alternatives with Regexp
Using the same test data as above, you could do something similar with a regular expression. For example:
# leading space, trailing space, no space
test_values = [ ' foo', 'foo ', 'foo' ]
# test start/end of string
test_values = [ ' foo', 'foo ', 'foo' ].grep /\A\s+|\s+\z/
#=> [" foo", "foo "]
# test start/end of line
test_values = [ ' foo', 'foo ', 'foo' ].grep /^\s+|\s+$/
#=> [" foo", "foo "]
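The two approaches above (compare against the stripped string, or search for whitespace at either edge) carry over to other languages. Here is a rough Python sketch of both, assuming the same three test values; `strippable` and `EDGE_WS` are illustrative names, not anything from the Ruby answer.

```python
import re

def strippable(s):
    """True if s has leading or trailing whitespace (hypothetical helper)."""
    return s != s.strip()

# \A and \Z anchor at the very start and very end of the string in Python.
EDGE_WS = re.compile(r"\A\s+|\s+\Z")

test_values = [" foo", "foo ", "foo"]

print([strippable(s) for s in test_values])           # [True, True, False]
print([s for s in test_values if EDGE_WS.search(s)])  # [' foo', 'foo ']
```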
Benchmarks
require 'benchmark'
ITERATIONS = 1_000_000
TEST_VALUES = [ ' foo', 'foo ', 'foo' ]
def regex_grep array
array.grep /^\s+|\s+$/
end
def string_strip array
array.map { |str| { str => !!str.dup.strip! } }
end
Benchmark.bmbm do |x|
n = ITERATIONS
x.report('regex') { n.times { regex_grep TEST_VALUES } }
x.report('strip') { n.times { string_strip TEST_VALUES } }
end
user system total real
regex 1.539269 0.001325 1.540594 ( 1.541438)
strip 1.256836 0.001357 1.258193 ( 1.259955)
A quarter of a second over a million iterations may not seem like much, but it can add up on significantly larger data sets or iteration counts. Whether that matters for this particular use case is up to you, but native String methods (however the interpreter implements them under the hood) are generally faster than regular expression pattern matching. Of course there are edge cases, but that's what benchmarks are for!
You can use
/\A([a-zA-Z]+ | [a-zA-Z]+)\z/
/\A(?:[[:alpha:]]+[[:space:]]|[[:space:]][[:alpha:]]+)\z/
/\A(?:\p{L}+[\p{Z}\t]|[\p{Z}\t]\p{L}+)\z/
See the Rubular demo (with line anchors instead of string anchors used for the demo purposes)
Details:
\A - a string start anchor
(...) - a capturing group
(?:...) - a non-capturing group (it is preferred here since you are not extracting, just validating)
[a-zA-Z]+ - any one or more ASCII letters
\p{L}+ - any one or more Unicode letters
| - or
\z - end of string anchor.
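To see what the first pattern accepts and rejects, here is a small Python sketch (an illustration only; `has_edge_space` is a made-up name). It uses `fullmatch`, which requires the whole string to match and so plays the role of the `\A...\z` anchors. Note that, as written, the pattern accepts a single space at one end only, so a name with spaces at both ends is rejected.

```python
import re

# One or more ASCII letters with exactly one space at one end.
NAME_WITH_EDGE_SPACE = re.compile(r"(?:[a-zA-Z]+ | [a-zA-Z]+)")

def has_edge_space(name):
    """True if name is letters plus one leading or one trailing space."""
    return NAME_WITH_EDGE_SPACE.fullmatch(name) is not None

for name in ["Juliette ", " Juliette", "Juliette", " Juliette "]:
    print(repr(name), has_edge_space(name))
```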
I want to find string ?Allen in the string array but there is question mark in keyword and it causes some problems.
I write this code to find string in array
@arr = ("My name is ?Allen",
"My name is ?Allens",
"My name is s?Allen",
"My name is s?Allens",
"My name is ?allen");
$keyword = "?Allen";
for (my $i=0; $i <= 4; $i++){
if ($arr[$i] =~ /\b$keyword\b/){
print "str $i = match\n";
}else{
print "str $i = no\n";
}
}
finally I get this result
str 0 = match
str 1 = no
str 2 = match
str 3 = no
str 4 = no
but I want to find only first index array as matching string like this:
str 0 = match
str 1 = no
str 2 = no
str 3 = no
str 4 = no
Note that your regex contains non-word special chars that you need to quote before using them in the actual pattern. Also, the fact that the special chars can appear at the leading/trailing positions means you cannot expect \b to always work the same (since its meaning is context dependent). Thus, you may fix the code with
/(?<!\S)\Q$keyword\E(?!\S)/
where
(?<!\S) - requires a whitespace char or start of string before
\Q$keyword\E - a literal search string (see Quoting Metacharacters)
(?!\S) - that should be followed with a whitespace or end of string.
Another alternative for \Q...\E (mentioned by Dave Cross) is using quotemeta:
This is the internal function implementing the \Q escape in double-quoted strings.
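The same combination (escape the keyword, then require whitespace or a string edge on both sides) can be sketched in Python, where `re.escape` plays the role of `\Q...\E` / quotemeta. This is only an illustration of the idea, and `contains_token` is a hypothetical helper name.

```python
import re

def contains_token(text, keyword):
    """True if keyword occurs delimited by whitespace or the string edges."""
    # re.escape neutralizes metacharacters such as "?" in the keyword.
    pattern = r"(?<!\S)" + re.escape(keyword) + r"(?!\S)"
    return re.search(pattern, text) is not None

arr = [
    "My name is ?Allen",
    "My name is ?Allens",
    "My name is s?Allen",
    "My name is s?Allens",
    "My name is ?allen",
]

for i, s in enumerate(arr):
    print(f"str {i} = {'match' if contains_token(s, '?Allen') else 'no'}")
```

Only the first string is reported as a match, which is the result the question asks for.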
I am looking for a perl regex which will validate a string containing only the letters ACGT. For example "AACGGGTTA" should be valid while "AAYYGGTTA" should be invalid, since the second string has "YY" which is not one of A,C,G,T letters. I have the following code, but it validates both the above strings
if($userinput =~/[A|C|G|T]/i)
{
$validEntry = 1;
print "Valid\n";
}
Thanks
Use a character class, and make sure you check the whole string by using the start of string token, \A, and end of string token, \z.
You should also use * or + to indicate how many characters you want to match -- * means "zero or more" and + means "one or more."
Thus, the regex below is saying "between the start and the end of the (case insensitive) string, there should be one or more of the following characters only: a, c, g, t"
if($userinput =~ /\A[acgt]+\z/i)
{
$validEntry = 1;
print "Valid\n";
}
Using the character-counting tr operator:
if( $userinput !~ tr/ACGT//c )
{
$validEntry = 1;
print "Valid\n";
}
tr/characterset// counts how many characters in the string are in characterset; with the /c flag, it counts how many are not in the characterset. Using !~ instead of =~ negates the result, so it will be true if there are no characters not in characterset or false if there are characters not in characterset.
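The counting idea can be mirrored in Python without any regex. The sketch below is an approximation of the tr approach rather than a translation of it, and `only_acgt` is a made-up name. Like `tr/ACGT//c`, it is case-sensitive, and it reports an empty string as valid because there are no offending characters to count.

```python
def only_acgt(s):
    """True if s contains no character outside A, C, G, T."""
    # Count the characters that are NOT in the set, like tr/ACGT//c does.
    invalid = sum(1 for ch in s if ch not in "ACGT")
    return invalid == 0

print(only_acgt("AACGGGTTA"))  # True
print(only_acgt("AAYYGGTTA"))  # False
print(only_acgt(""))           # True (nothing invalid to count)
```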
Your character class [A|C|G|T] contains |. | does not stand for alternation in a character class, it only stands for itself. Therefore, the character class would include the | character, which is not what you want.
Your pattern is not anchored. The pattern /[ACGT]+/ would match any string that contains one or more of any of those characters. Instead, you need to anchor your pattern, so that only strings that contain just those characters from beginning to end are matched.
$ can match a newline. To avoid that, use \z to anchor at the end. \A anchors at the beginning (although it doesn't make a difference whether you use that or ^ in this case, using \A provides a nice symmetry).
So, your check should be written:
if ($userinput =~ /\A [ACGT]+ \z/ix)
{
$validEntry = 1;
print "Valid\n";
}
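The anchoring point is easy to demonstrate in Python as well, where `$` has the same trailing-newline behaviour and `fullmatch` acts as the strict start-to-end anchor. A minimal sketch, with `is_valid_dna` as an illustrative name:

```python
import re

DNA = re.compile(r"[ACGT]+", re.IGNORECASE)

def is_valid_dna(s):
    """True only if the entire string consists of A, C, G, T (any case)."""
    return DNA.fullmatch(s) is not None

print(is_valid_dna("AACGGGTTA"))  # True
print(is_valid_dna("AAYYGGTTA"))  # False

# "$" also matches just before a trailing newline, so this loose check passes:
print(bool(re.search(r"^[ACGT]+$", "ACGT\n")))  # True
# The strict whole-string check rejects it:
print(is_valid_dna("ACGT\n"))                   # False
```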
Any s at the beginning of the word should be converted to a $.
Any s inside the word should be converted to a 5.
To match an s at the start of the word, use \b to match word boundaries and \w to match alphanumerics:
/\bs\w/
(As @Matthew points out, the \w is really superfluous:)
/\bs/
Once you've replaced all s at the start of a word, then the only remaining ones are inside the word (I'm assuming that you also want to replace s at the end of a word with 5) so you can simply use
/s/
For completeness, here's how to put it all together (I'm going to assume JavaScript):
function pimpMyEsses(str)
{
return str.replace(/\bs/gi, '$').replace(/s/gi, '5');
}
console.log(pimpMyEsses('slither quantum Sassy. arcades'));
// > "$lither quantum $a55y. arcade5"
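If the target language turns out not to be JavaScript, the same two-step replacement is short in most regex flavours. Here is a Python sketch of it, assuming the same case-insensitive behaviour; `replace_esses` is just an illustrative name.

```python
import re

def replace_esses(text):
    """Leading s/S of each word becomes "$"; every remaining s/S becomes "5"."""
    # Step 1: an s right after a word boundary is a word-initial s.
    step1 = re.sub(r"\bs", "$", text, flags=re.IGNORECASE)
    # Step 2: whatever s is left must be inside or at the end of a word.
    return re.sub(r"s", "5", step1, flags=re.IGNORECASE)

print(replace_esses("slither quantum Sassy. arcades"))
# $lither quantum $a55y. arcade5
```

The order of the two steps matters: running the second substitution first would leave no word-initial s for the first one to find.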
Depending on the language it may be possible to capture the substitutions with a single regular expression and replace them procedurally. Here's a PHP example:
<?php
$word = 'sassy';
preg_match_all('/\b(s)|([^s]+)|(s)/', $word, $matches, PREG_SET_ORDER);
/* captures:
* $matches = array(
* array('s','s'),
* array('a','','a'),
* array('s','','','s'),
* array('s','','','s'),
* array('y','','y')
* )
*/
$newword = '';
foreach ($matches as $m){
if ($m[1]) $newword .= '$'; # leading s --> $
elseif ($m[2]) $newword .= $m[2]; # not an s --> as-is
else $newword .= '5'; # any other s --> 5
}
echo $newword;
Because I've used \b to match a word-boundary before the "leading s", the string 'sassy socks' becomes '$a55y $ock5'
If you want only the s at the start of "sassy" to become a $, change the regular expression to:
'/^(s)|([^s]+)|(s)/'
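The single-pass idea can also be written with a replacement callback instead of a manual loop over matches. The following Python sketch is a variation on the PHP approach rather than a port of it: it only matches the s characters and lets everything else pass through untouched. The function name and the `anchor` parameter are assumptions made for this example.

```python
import re

def replace_esses_single_pass(text, anchor=r"\b"):
    """Replace a leading s with "$" and any other s with "5" in one pass.

    anchor=r"\b" treats the start of every word as "leading";
    anchor="^" treats only the start of the string as "leading".
    """
    pattern = anchor + r"(s)|(s)"

    def repl(m):
        # Group 1 participates only when the anchored alternative matched.
        return "$" if m.group(1) else "5"

    return re.sub(pattern, repl, text)

print(replace_esses_single_pass("sassy socks"))              # $a55y $ock5
print(replace_esses_single_pass("sassy socks", anchor="^"))  # $a55y 5ock5
```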
You can do:
/^(s)/ to select only the first "s";
/(?:[^s])(?:(s)[^s]*)+/ to select all other "s". Note that the first character will be skipped (which is independent of);
Explanation: ignore the first character; then, one or more times, match an "s" and skip the following characters that are not "s".
Next step: you need to determine which language you will use.
I have a situation where I need to remove the last n numeric characters after a / character.
For example:
/iwmout/sourcelayer/iwm_service/iwm_ear_layer/pomoeron.xml@@/main/lsr_int_vnl46a/61
After the last /, I need the number 61 stripped out of the line so that the output is,
/iwmout/sourcelayer/iwm_service/iwm_ear_layer/pomoeron.xml@@/main/lsr_int_vnl46a/
I tried using chop, but it removes only the last character, i.e. 1, in the above example.
The last part, i.e. 61, above can be anything, like 221 or 2 or 100 anything. I need to strip out the last numeric characters after the /. Is it possible in Perl?
A regex substitution for removing the last digits:
my $str = '/iwmout/sourcelayer/iwm_service/iwm_ear_layer/pomoeron.xml@@/main/lsr_int_vnl46a/61';
$str =~ s/\d+$//;
\d+ matches a series of digits, and $ matches the end of the line. They are replaced with the empty string.
@Tim's answer of $str =~ s/\d+$// is right on; however, if you wanted to strip the last n digit characters of a string but not necessarily all of the trailing digit characters you could do something like this:
my $s = "abc123456";
my $n = 3; # Just the last 3 chars.
$s =~ s/\d{$n}$//; # $s == "abc123"
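Both variants (strip all trailing digits, or exactly the last n) fit into one small helper. Here is a Python sketch of that, purely as an illustration: `strip_trailing_digits` is a hypothetical name, the shortened path is a stand-in for the one in the question, and `\Z` is Python's end-of-string anchor.

```python
import re

def strip_trailing_digits(s, n=None):
    """Remove all trailing digits, or exactly the last n digits if n is given."""
    quantifier = "+" if n is None else "{%d}" % n
    # \Z anchors at the very end of the string.
    return re.sub(r"\d" + quantifier + r"\Z", "", s)

print(strip_trailing_digits("main/lsr_int_vnl46a/61"))  # main/lsr_int_vnl46a/
print(strip_trailing_digits("abc123456", 3))            # abc123
```

If the string ends in fewer than n digits, the pattern does not match and the string is returned unchanged.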
// Removes a given suffix (here "ld") from the end of a string.
// Requires the Apache Commons Lang 3 jar on the classpath.
import org.apache.commons.lang3.StringUtils;
public class Hello {
public static void main(String[] args) {
String str = "Hello World";
System.out.println(StringUtils.removeEnd(str, "ld"));
}
}