Trim end of string - regex

I'm having trouble trimming off some characters at the end of a string. The string usually looks like:
C:\blah1\blah2
But sometimes it looks like:
C:\blah1\blah2.extra
I need to extract out the string 'blah2'. Most of the time, that's easy with a substring command. But on the rare occasions when the '.extra' portion is present, I need to first trim that part off.
The thing is, '.extra' always begins with a dot, but then is followed by various combinations of letters with various lengths. So wildcards will be necessary. Essentially, I need to script, "If the string contains a dot, trim off the dot and anything following it."
$string.replace(".*","") doesn't work. Nor does $string.replace(".\*",""). Nor does $string.replace(".[A-Z]","").
Also, I can't get at it from the beginning of the string either. 'blah1' is unknown and of various lengths. I have to get at 'blah2' from the end of the string.

Assuming that the string is always a path to a file with or without an extension (such as ".extra"), you can use Path.GetFileNameWithoutExtension():
PS C:\> [System.IO.Path]::GetFileNameWithoutExtension("C:\blah1\blah2")
blah2
PS C:\> [System.IO.Path]::GetFileNameWithoutExtension("C:\blah1\blah2.extra")
blah2
The path doesn't even have to be rooted:
PS C:\> [System.IO.Path]::GetFileNameWithoutExtension("blah1\blah2.extra")
blah2
If you want to implement similar functionality on your own, that should be fairly simple as well: use String.LastIndexOf() to find the last \ in the string and use that as your starting argument for Substring():
function Extract-Name {
    param($NameString)
    # Extract part after the last occurrence of \
    if($NameString -like '*\*') {
        $NameString = $NameString.Substring($NameString.LastIndexOf('\') + 1)
    }
    # Remove the first . and anything after it (assign the result back,
    # otherwise both the trimmed and the untrimmed string are output)
    if($NameString -like '*.*') {
        $NameString = $NameString.Remove($NameString.IndexOf("."))
    }
    $NameString
}
And you'll see similar results:
PS C:\> Extract-Name "C:\blah1\blah2.extra"
blah2
PS C:\> Extract-Name "C:\blah124323\blah2.extra"
blah2
PS C:\> Extract-Name "C:\blah124323\blah2"
blah2
PS C:\> Extract-Name "abc124323\blah2"
blah2
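For comparison, here is a rough Python sketch of the same two steps (keep what follows the last backslash, then cut at the first dot). The name extract_name is made up for this illustration and is not part of the PowerShell answer above.

```python
def extract_name(name_string: str) -> str:
    """Return the last path component, cut at its first dot (if any)."""
    # Keep only the part after the last backslash. rfind returns -1 when
    # there is no backslash, so the slice then starts at 0 and keeps everything.
    name = name_string[name_string.rfind("\\") + 1:]
    # Cut at the first dot, if there is one.
    dot = name.find(".")
    return name if dot == -1 else name[:dot]


for sample in (r"C:\blah1\blah2.extra", r"C:\blah124323\blah2", r"abc124323\blah2"):
    print(extract_name(sample))
```

Like the PowerShell function, this cuts at the first dot in the name, so a name such as blah2.a.b also comes back as blah2.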

As the other posters have said, you can use special file name manipulators for this. If you'd like to do it with regular expressions, use the -replace operator rather than the .Replace() method (which only replaces literal text):
$string -replace "\..*",""
The \..* regex matches a dot (\.) and then any string of characters (.*).
Let me address each of the patterns you tried, read as regular expressions:
$string.replace(".*","")
The reason this doesn't work is that . and * are both special characters in regular expressions: . is a wildcard character that matches any character, and * means "match the previous character zero or more times." So .* means "any string of characters."
$string.replace(".\*","")
In this instance, you're escaping the * character, meaning that the regex treats it literally, so the regex matches any single character (.) followed by a star (\*).
$string.replace(".[A-Z]","")
In this case, the regex will match any character (.) followed by any single capital letter ([A-Z]).
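To make the differences between these patterns concrete, here is a small sketch that runs them through Python's re module on the sample path from the question. This assumes Python's regex flavour, which agrees with .NET for these simple patterns; one difference is called out in the comments.

```python
import re

path = r"C:\blah1\blah2.extra"

# Escaped dot, then anything: trims the first dot and everything after it.
trimmed = re.sub(r"\..*", "", path)
# Unescaped .* matches the whole string, so nothing is left.
everything_gone = re.sub(r".*", "", path)
# .\* needs a literal star after some character; the path has none.
star = re.sub(r".\*", "", path)
# .[A-Z] needs some character followed by a capital letter. Python's re is
# case-sensitive by default (PowerShell's -replace is not), and the only
# capital here is the leading C, which has nothing before it.
capital = re.sub(r".[A-Z]", "", path)

print(trimmed)                # C:\blah1\blah2
print(repr(everything_gone))  # ''
print(star == path, capital == path)  # True True
```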

If the strings are actual paths, using Get-Item would be another option:
$path = 'C:\blah1\blah2.something'
(Get-Item $path).BaseName
The Replace() method can't be used here, because it doesn't support wildcards or regular expressions.

Related

Replace a sequence of characters with a sequence of different characters of same length using regular expressions

I have a string which starts with spaces. I want to replace the leading spaces with equal number of dashes -. I don't want to replace any other spaces which may occur elsewhere in the string.
If I use /^\s*/-/, it replaces all the leading spaces with a single dash. If I use /^\s/-/, it replaces only the first space with a dash. If I remove the anchor, /\s/-/, it replaces every occurrence of a space in the string, which is not acceptable.
My string looks like this in general:
<n-leading-spaces><a-non-space-character><remaining-characters>
Example (pipes added to show the boundary):
|   ajfn ssfdjn ng jnv sjfj%nv sjfj n s ;sn |
After substitution (pipes added to show the boundary):
|---ajfn ssfdjn ng jnv sjfj%nv sjfj n s ;sn |
NOTE: I cannot use any code snippet. I just want to know whether this can be done using just regex patterns. (Forgive my formatting as I'm new to markdown. I welcome formatting corrections)
You can use the following solution to replace a sequence of characters with a sequence of different characters of same length using regular expressions:
my $string = '    ajfn ssfdjn ng jnv sjfj%nv sjfj n s ;sn ';
$string =~ s/^(\s+)/"-" x length($1)/eg;
print $string;
Returns '----ajfn ssfdjn ng jnv sjfj%nv sjfj n s ;sn '
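The same idea (compute the replacement from the length of the match) can be sketched in Python with a callback passed to re.sub. Like the Perl /e version, this relies on code rather than on a pure pattern; dash_leading_whitespace is a name invented for this example.

```python
import re


def dash_leading_whitespace(text: str) -> str:
    # The callback receives the match for the leading run of whitespace
    # and returns one dash per matched character.
    return re.sub(r"^\s+", lambda m: "-" * len(m.group(0)), text)


print(dash_leading_whitespace("   ajfn ssfdjn ng "))  # ---ajfn ssfdjn ng 
```

Spaces after the first non-space character are left alone because the pattern is anchored with ^.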

PowerShell Regex - word with wildcards and commas

Trying to do a replace on what I understand to be a simple operation but hitting a wall.
I can replace a word with a comma on the end:
$firstval = 'ssonp,RDPNP,LanmanWorkstation,webclient,MfeEpePcNP,PRNetworkProvider'
($firstval) -replace 'webclient+,',''
ssonp,RDPNP,LanmanWorkstation,MfeEpePcNP,PRNetworkProvider
But I haven't been able to work out how to add a wildcard to the word, or how to handle multiple words with wildcards, each followed by a comma, e.g.:
w* client+,* fee*, etc
(spaces added to stop being interpreted as formatting within the question)
Played with a few permutations and tried examples from other questions, without any luck.
The -replace operator takes a regular expression as its first parameter. You seem to be confusing wildcards and regular expressions. Your pattern w*client+,*fee*,, though a valid regular expression, seems to be intended to use wildcards.
The regular expression equivalent of the * wildcard is .*, where . means "any character" and * means "0 or more occurrences". Thus, the regular expression equivalent of w*client, would be w.*client,, and, similarly the regular expression equivalent of *fee*, would be .*fee.*,. Since the string to be searched has comma-separated values, however, we don't want our patterns to include "any character" (.*) but rather "any character but comma" ([^,]*). Therefore, the patterns to use become w[^,]*client, and [^,]*fee[^,]*,, respectively.
To search for both words in a string, separate the two patterns with |. The following builds such a pattern and tests it against strings with a match in various locations:
# Match w*client or *fee*
$wordPattern = 'w[^,]*client|[^,]*fee[^,]*';
# Match $wordPattern and at most one comma before or after
$wordWithAdjacentCommaPattern = '({0}),?|,({0})$' -f $wordPattern;
"`$wordWithAdjacentCommaPattern: $wordWithAdjacentCommaPattern";
# Replace single value
'webclient', `
# Replace first value
'webclient,middle,last', `
# Replace middle value
'first,webclient,last', `
# Replace last value
'first,middle,webclient' `
| ForEach-Object -Process { '"{0}" => "{1}"' -f $_, ($_ -replace $wordWithAdjacentCommaPattern); };
This outputs the following:
$wordWithAdjacentCommaPattern: (w[^,]*client|[^,]*fee[^,]*),?|,(w[^,]*client|[^,]*fee[^,]*)$
"webclient" => ""
"webclient,middle,last" => "middle,last"
"first,webclient,last" => "first,last"
"first,middle,webclient" => "first,middle"
A non-regex alternative you might consider would be to split your input string into individual values, filter out values that match certain wildcards, and reassemble what's left into comma-separated values:
(
'ssonp,RDPNP,LanmanWorkstation,webclient,MfeEpePcNP,PRNetworkProvider' -split ',', -1, 'SimpleMatch' `
| Where-Object { $_ -notlike 'w*client' -and $_ -notlike '*fee*'; } `
) -join ',';
By the way, you used the regular expression webclient+, to match and remove the text webclient, from your string (looks like the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkProvider\Order\ProviderOrder registry value). Just a note that, with the +, that will search for the literal text webclien followed by 1 or more occurrences of t followed by the literal text ,. Thus, that will match webclientt,, webclienttt,, webclientttttttttt,, etc. as well as webclient,. If you are only interested in matching webclient, then you can just use the pattern webclient, (no +).
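Both approaches from this answer (the combined regex and the split/filter/join alternative) can be sketched in Python as well. This is only an illustration under a few assumptions: re.IGNORECASE and the lower() calls are there to imitate PowerShell's case-insensitive -replace and -like, and the function names are made up.

```python
import re
from fnmatch import fnmatchcase

# "w*client" or "*fee*", written so that a match cannot cross a comma
WORD = r"w[^,]*client|[^,]*fee[^,]*"
# The word plus a trailing comma, or a leading comma plus the word at the end
WORD_WITH_COMMA = re.compile(rf"({WORD}),?|,({WORD})$", re.IGNORECASE)


def remove_by_regex(values: str) -> str:
    return WORD_WITH_COMMA.sub("", values)


def remove_by_wildcard(values: str, wildcards=("w*client", "*fee*")) -> str:
    # Split, drop every value matching one of the wildcards, rejoin.
    kept = [
        value
        for value in values.split(",")
        if not any(fnmatchcase(value.lower(), w.lower()) for w in wildcards)
    ]
    return ",".join(kept)


order = "ssonp,RDPNP,LanmanWorkstation,webclient,MfeEpePcNP,PRNetworkProvider"
print(remove_by_regex(order))
print(remove_by_wildcard(order))
```

The split-based version avoids the comma bookkeeping entirely, which is the main reason to prefer it when the input really is a flat comma-separated list.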

Evaluation Search and Replace in Perl

I have a file formatted like this:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
9558 9629 gene
locus_tag CeraR_t011
gene trnR-UCU
11296 9773 CDS
locus_tag CeraR_p012
gene atpA
product ATP synthase CF1 alpha subunit
transl_except (pos:complement(10268..10270), aa:Q)
transl_except (pos:complement(11192..11194), aa:Q)
transl_except (pos:complement(13267..13269), aa:M)
11296 9773 gene
locus_tag CeraR_p012
gene atpA
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I need to add 809 to both of the values following pos:complement in each instance. I have been attempting with the search and replace modifier as so:
$line =~ s!complement((\d+)..(\d+)!complement(($1+809)..($2+809)!eg
however, the ( after complement is always interpreted as part of an evaluation rather than simply a character. I have tried every combination of backslashes, apostrophes, and quotes to make it just a character but nothing seems to work.
Any advice would be appreciated
Since the replacement is evaluated as code (that is what the /e modifier does), you must write it as an expression, using quoted strings and concatenation:
$line =~ s/complement\(\K(\d+)\.\.(\d+)/($1+809) . '..' . ($2+809)/eg;
Note: since \K discards everything matched to its left from the match result, you don't need to repeat the beginning of the match in the replacement string. The dots are escaped so that they match only literal dots.
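As a cross-check of the arithmetic, here is a hedged Python sketch of the same substitution. Python's re has no \K, so this version captures the two numbers and rebuilds the "complement(" prefix in a callback; shift_complement and bump are illustrative names only.

```python
import re

OFFSET = 809  # the amount from the question


def shift_complement(line: str, offset: int = OFFSET) -> str:
    def bump(match: re.Match) -> str:
        # Convert both captured numbers, add the offset, and rebuild the text.
        start = int(match.group(1)) + offset
        end = int(match.group(2)) + offset
        return f"complement({start}..{end}"

    # \( and \.\. match the literal parenthesis and the two literal dots.
    return re.sub(r"complement\((\d+)\.\.(\d+)", bump, line)


print(shift_complement("transl_except (pos:complement(10268..10270), aa:Q)"))
# transl_except (pos:complement(11077..11079), aa:Q)
```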

Regular expression extract filename from line content

I'm very new to regular expression. I want to extract the following string
"109_Admin_RegistrationResponse_20130103.txt"
from this file content, the contents is selected per line:
01-10-13 10:44AM 47 107_Admin_RegistrationDetail_20130111.txt
01-10-13 10:40AM 11 107_Admin_RegistrationResponse_20130111.txt
The regular expression should not pick the second line, only the first line should return a true.
Your Regex has a lot of different mistakes...
Your line does not start with your required filename but you put an ^ there
missing + in your character group [a-zA-Z], hence only able to match a single character
does not include _ in your character group, hence it won't match Admin_RegistrationResponse
missing \ before d, so d{2} matches only the literal text dd.
As per M42's answer (which I left out), you also need to escape your dot . too, or it would match 123_abc_12345678atxt too (notice the a before txt)
Your regex should be
\d+_[a-zA-Z_]+_\d{4}\d{2}\d{2}\.txt$
which can be simplified as
\d+_[a-zA-Z_]+_\d{8}\.txt$
since \d{4}\d{2}\d{2} is redundant, unless you want capturing groups for the date parts, in which case you would use:
\d+_[a-zA-Z_]+_(\d{4})(\d{2})(\d{2})\.txt$
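A quick way to try the capturing-group version is a short Python sketch like the one below. The outer group around the whole file name is an addition for this example, and the sample line is taken from the question.

```python
import re

# Outer group: whole file name; inner groups: year, month, day.
FILE_NAME = re.compile(r"(\d+_[a-zA-Z_]+_(\d{4})(\d{2})(\d{2})\.txt)$")

line = "01-10-13 10:40AM 11 107_Admin_RegistrationResponse_20130111.txt"
match = FILE_NAME.search(line)
if match:
    name, year, month, day = match.groups()
    print(name, year, month, day)

# With the dot escaped, a name ending in "atxt" is rejected.
print(FILE_NAME.search("123_abc_12345678atxt"))  # None
```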
Remove the anchors and escape the dot:
\d+[a-zA-Z_]+\d{8}\.txt
I'm a newbie in PHP, but I think you can use the explode() function in PHP, or an equivalent in your language.
$string = "01-09-13 10:17AM 11 109_Admin_RegistrationResponse_20130103.txt";
$pieces = explode("_", $string);
$stringout = "";
// Join the pieces back together (this drops the underscores)
for ($i = 0; $i < count($pieces); $i++) {
    $stringout = $stringout . $pieces[$i];
}

Replace patterns that are inside delimiters using a regular expression call

I need to clip out all the occurrences of the pattern '--' that are inside single quotes in a long string (leaving intact the ones that are outside single quotes).
Is there a RegEx way of doing this?
(using it with an iterator from the language is OK).
For example, starting with
"xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
I should end up with:
"xxxx rt / $ 'dfdffggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g 'ggh' vcbcvb"
So I am looking for a regex that could be run from the following languages as shown:
+-------------+------------------------------------------+
| Language | RegEx |
+-------------+------------------------------------------+
| JavaScript | input.replace(/someregex/g, "") |
| PHP | preg_replace('/someregex/', "", input) |
| Python | re.sub(r'someregex', "", input) |
| Ruby | input.gsub(/someregex/, "") |
+-------------+------------------------------------------+
I found another way to do this from an answer by Greg Hewgill at Qn138522
It is based on using this regex (adapted to contain the pattern I was looking for):
--(?=[^\']*'([^']|'[^']*')*$)
Greg explains:
"What this does is use the non-capturing match (?=...) to check that the character x is within a quoted string. It looks for some nonquote characters up to the next quote, then looks for a sequence of either single characters or quoted groups of characters, until the end of the string. This relies on your assumption that the quotes are always balanced. This is also not very efficient."
The usage examples would be :
JavaScript: input.replace(/--(?=[^']*'([^']|'[^']*')*$)/g, "")
PHP: preg_replace("/--(?=[^']*'([^']|'[^']*')*$)/", "", $input)
Python: re.sub(r"--(?=[^']*'([^']|'[^']*')*$)", "", input)
Ruby: input.gsub(/--(?=[^\']*'([^']|'[^']*')*$)/, "")
I have tested this for Ruby and it provides the desired result.
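For anyone who wants to check the lookahead pattern outside Ruby, here is a small Python sketch using the example string from the question. It assumes, as Greg's explanation does, that the single quotes are balanced.

```python
import re

# "--" followed (lookahead only) by: non-quote characters, one quote, then
# single non-quote characters or complete quoted groups up to the end.
# That holds only when an odd number of quotes follows, i.e. inside quotes.
INSIDE_QUOTES = re.compile(r"--(?=[^']*'([^']|'[^']*')*$)")

text = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
result = INSIDE_QUOTES.sub("", text)
print(result)
# xxxx rt / $ 'dfdffggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g 'ggh' vcbcvb
```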
This cannot be done with regular expressions, because you need to maintain state on whether you're inside single quotes or outside, and regex is inherently stateless. (Also, as far as I understand, single quotes can be escaped without terminating the "inside" region).
Your best bet is to iterate through the string character by character, keeping a boolean flag on whether or not you're inside a quoted region - and remove the --'s that way.
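A minimal Python sketch of that character-by-character approach might look like this. It assumes quotes are never escaped, and strip_dashes_in_quotes is a made-up name.

```python
def strip_dashes_in_quotes(text: str) -> str:
    out = []
    inside = False  # True while we are between a pair of single quotes
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "'":
            # Every quote toggles the inside/outside state.
            inside = not inside
            out.append(ch)
            i += 1
        elif inside and text.startswith("--", i):
            # Skip a double dash that sits inside quotes.
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


sample = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
print(strip_dashes_in_quotes(sample))
```

Supporting escaped quotes would mean one more branch that copies the escape character and the quote after it without toggling the flag.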
If bending the rules a little is allowed, this could work:
import re
p = re.compile(r"((?:^[^']*')?[^']*?(?:'[^']*'[^']*?)*?)(-{2,})")
txt = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
print(re.sub(p, r'\1-', txt))
Output:
xxxx rt / $ 'dfdf-fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '-ggh-' vcbcvb
The regex:
( # Group 1
(?:^[^']*')? # Start of string, up till the first single quote
[^']*? # Inside the single quotes, as few characters as possible
(?:
'[^']*' # No double dashes inside these single quotes, jump to the next.
[^']*?
)*? # as few as possible
)
(-{2,}) # The dashes themselves (Group 2)
If there were different delimiters for start and end, you could use something like this:
-{2,}(?=[^'`]*`)
Edit: I realized that if the string does not contain any quotes, it will match all double dashes in the string. One way of fixing it would be to change
(?:^[^']*')?
in the beginning to
(?:^[^']*'|(?!^))
Updated regex:
((?:^[^']*'|(?!^))[^']*?(?:'[^']*'[^']*?)*?)(-{2,})
Hm. There might be a way in Python if there are no quoted apostrophes, given that there is the (?(id/name)yes-pattern|no-pattern) construct in regular expressions, but it goes way over my head currently.
Does this help?
def remove_double_dashes_in_apostrophes(text):
    return "'".join(
        part.replace("--", "") if (ix&1) else part
        for ix, part in enumerate(text.split("'")))
Seems to work for me. What it does, is split the input text to parts on apostrophes, and replace the "--" only when the part is odd-numbered (i.e. there has been an odd number of apostrophes before the part). Note about "odd numbered": part numbering starts from zero!
You can use the following sed script, I believe:
:again
s/'\(.*\)--\(.*\)'/'\1\2'/g
t again
Store that in a file (rmdashdash.sed) and do whatever exec magic in your scripting language allows you to do the following shell equivalent:
sed -f rmdashdash.sed < file containing your input data
What the script does is:
:again <-- just a label
s/'\(.*\)--\(.*\)'/'\1\2'/g
substitute, for the pattern ' followed by anything followed by -- followed by anything followed by ', just the two anythings within quotes.
t again <-- feed the resulting string back into sed again.
Note that this script will convert '----' into '', since it is a sequence of two --'s within quotes. However, '---' will be converted into '-'.
Ain't no school like old school.