Match exactly two backslashes - regex

I'm trying to match exactly two \ characters (the first two encountered from the left) in a string with PowerShell's regex -replace operator, in order to replace them with /. Using \\{2} doesn't work, as it only matches two consecutive backslashes. I've tried \\.+?\\, but that matches the whole substring between them.
I'm new to regexp, and nothing I found on various sites has helped me with this issue. And I know I can do that with a for loop that runs twice, but I'd first like to know if it could be done with regexp better.
EDIT: I'm looking to do something like this:
IN: \aaa\bbb(d\c)
OUT: /aaa/bbb(d\c)

You may use
$s -replace '\\([^\\]+)\\','/$1/'
Here, \\([^\\]+)\\ matches a \, then matches and captures any 1+ chars other than \ into Group 1 (later access with $1 from the replacement pattern) and then matches \, and replaces the match with /, the value in Group 1 and /.
To only replace the first occurrence, you may use
$s -replace '(?s)\\([^\\]+)\\(.*)','/$1/$2'
where the trailing (.*) will capture the rest of the string (if any) into Group 2 and the $2 replacement backreference will paste that part of the string back into the result. (?s) will allow . to match line break chars that it does not match by default.
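For comparison, here is a minimal sketch of the same idea in Python rather than PowerShell. It assumes Python's re module, where the replacement backreference is written \1 instead of $1 and a count argument limits the number of substitutions, so the trailing (.*) trick is not needed. The second sample string is a made-up input, used only to show the difference between replacing every match and replacing the first one.

```python
import re

# A backslash, one or more non-backslash characters (group 1), then a backslash.
pattern = re.compile(r"\\([^\\]+)\\")

s = r"\aaa\bbb(d\c)"          # input from the question
first_only = pattern.sub(r"/\1/", s, count=1)
print(first_only)             # /aaa/bbb(d\c)

# Hypothetical input with more backslash pairs, to show what count=1 changes.
s2 = r"\a\b\c\d"
every_match = pattern.sub(r"/\1/", s2)            # replaces each matched pair
first_match = pattern.sub(r"/\1/", s2, count=1)   # stops after the first pair
print(every_match)            # /a/b/c/d
print(first_match)            # /a/b\c\d
```

The pattern itself is unchanged from the PowerShell version; only the replacement syntax and the way the first-match limit is expressed differ.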

Related

Powershell Drop the last part of a string with multiple "."

I'm trying to write a regex in PowerShell to get only a specific part of a string. I know a way to do this without regex, but I suspect a regex would be more concise. I have a string that looks like this:
Some/Stuff/Here/Then.drop.last
Ideally, I want to write a regex that gets me just:
Then.drop
PS> 'Some/Stuff/Here/Then.drop.last' -replace '.*/(.+)\..*', '$1'
Then.drop
.*/ greedily matches everything up to the last /
(.+)\. greedily matches everything up to the last literal . and captures everything before that . in the first capture group ($1) - which is your string of interest.
.* matches the remaining part of the string.
Using $1 as the replacement string then replaces the overall match - the entire input string - with what the first capture group matched.
For more information about PowerShell's -replace operator, see this answer.
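The greedy backtracking described above can be checked with a small sketch in Python. This assumes Python's re module instead of PowerShell's -replace (so the replacement is \1 rather than $1); the split-based line is only a cross-check that does not use regex at all.

```python
import re

path = "Some/Stuff/Here/Then.drop.last"

# .*/   greedy: consumes everything up to the last slash
# (.+)  greedy, then backtracks to just before the last literal dot
# \..*  the last dot and whatever follows it
result = re.sub(r".*/(.+)\..*", r"\1", path)
print(result)  # Then.drop

# Cross-check without regex: last path segment, minus its last dot-suffix.
no_regex = path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
print(no_regex)  # Then.drop
```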

Extract certain part of a string in Perl

I have the following Perl strings. The lengths and the patterns are different. The file is always named *log.999
my $file1 = '/user/mike/desktop/sys/syslog.1';
my $file2 = '/user/mike/desktop/movie/dnslog.2';
my $file3 = '/haselog.3';
my $file4 = '/user/mike/desktop/movie/dns-sys.log';
I need to extract the words before log. In this case, sys, dns, hase and dns-sys.
How can I write a regular expression to extract them?
\w+(?=log\b)
matches one or more word characters (letters, digits, or underscore) that are followed by log (but not logging etc.)
If the filename format is fixed, you can make the regex more reliable by using
\w+(?=log\.\d+$)
The main property of shown strings is that the *log* phrase is last.
Then anchor the pattern, so we wouldn't match a log somewhere in the middle
my ($name) = $string =~ /(\w+)log\.[0-9]+$/;
or, if the .N extension is optional,
my ($name) = $string =~ /(\w+)log(?:\.[0-9]+)?$/;
The above uses the \w+ pattern to capture the text preceding log. But that text may also contain non-word characters (-, ., etc), in which case we would use [^/]+ to capture everything after the last /, as pointed out in Abigail's answer. With .N optional, per question in the comments
my ($name) = $string =~ m{ ([^/]+) log (?: \.[0-9]+ )? $}x;
where I added the /x modifier, with which spaces inside the pattern are ignored, which can aid readability.
I use a set of delimiters other than / to be able to use / inside without escaping it, and then the m is compulsory. The [^...] is a negated character class, matching any character not listed inside. So [^/]+log matches all successive characters which are not /, coming before log.
The non capturing group (?: ... ) groups patterns inside, so that ? applies to the whole group, but doesn't needlessly capture them.
The (?:\.[0-9]+)? pattern was written specifically to disallow things like log. (nothing after the dot) and log5. But if these are acceptable, change it to the simpler \.?[0-9]*
Update Corrected a typo in code: for optional .N there is +, not *
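As a rough illustration of how the \w+ and [^/]+ variants differ, here is a sketch in Python rather than Perl. It assumes Python's re module, with re.VERBOSE standing in for Perl's /x modifier. The last two paths and the extract helper are hypothetical additions, not part of the question.

```python
import re

paths = [
    "/user/mike/desktop/sys/syslog.1",
    "/user/mike/desktop/movie/dnslog.2",
    "/haselog.3",
    "/tmp/dns-syslog.7",   # hypothetical: non-word character before "log"
    "/tmp/authlog",        # hypothetical: no .N suffix
]

# Word characters only, and the .N suffix is required.
strict = re.compile(r"(\w+)log\.[0-9]+$")

# Anything except a slash, and the .N suffix is optional.
flexible = re.compile(r"""
    ([^/]+)          # capture everything after the last slash, up to ...
    log              # ... the final literal "log"
    (?: \.[0-9]+ )?  # optional .N suffix, grouped but not captured
    $                # end of string
""", re.VERBOSE)

def extract(pattern, path):
    """Return group 1 of the first match, or None when nothing matches."""
    m = pattern.search(path)
    return m.group(1) if m else None

strict_names = [extract(strict, p) for p in paths]
flexible_names = [extract(flexible, p) for p in paths]
print(strict_names)    # ['sys', 'dns', 'hase', 'sys', None]
print(flexible_names)  # ['sys', 'dns', 'hase', 'dns-sys', 'auth']
```

The fourth path shows why [^/]+ matters when the name contains a hyphen: \w+ silently captures only the part after it.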

Remove all characters after a certain match

I am using Notepad++ to remove some unwanted strings from the end of a pattern and this for the life of me has got me.
I have the following sets of strings:
myApp.ComboPlaceHolderLabel,
myApp.GridTitleLabel);
myApp.SummaryLabel + '</b></div>');
myApp.NoneLabel + ')') + '</label></div>';
I would like to leave just myApp.[variable] and get rid of, e.g. ,, );, + '...', etc.
Using Notepad++, I can match the strings themselves using ^myApp.[a-zA-Z0-9].*?\b (it's a bit messy, but it works for what I need).
But in reality, I need negate that regex, to match everything at the end, so I can replace it with a blank.
You don't need negation. Just put your regex inside a capturing group and add an extra .*$ at the end. $ matches the end of a line. All the matched characters (the whole line) are replaced by the characters captured in the first group. Note that . matches any character, so you need to escape the dot to match a literal dot.
^(myApp\.[a-zA-Z0-9].*?\b).*$
Replacement string:
\1
OR
Match only the following characters and then replace it with an empty string.
\b[,); +]+.*$
I think this works equally well:
^(myApp\.\w+).*$
Replacement string:
\1
From difference between \w and \b regular expression meta characters:
\w stands for "word character", usually [A-Za-z0-9_]. Notice the inclusion of the underscore and digits.
(^.*?\.[a-zA-Z]+)(.*)$
Use this. Replace with
$1
See demo.
http://regex101.com/r/lU7jH1/5
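The two approaches from the answers above (keep the captured prefix, or delete the unwanted tail) can be compared side by side with a short sketch. It assumes Python's re module rather than Notepad++'s regex engine, so each line is processed separately instead of relying on per-line anchors in an editor buffer.

```python
import re

lines = [
    "myApp.ComboPlaceHolderLabel,",
    "myApp.GridTitleLabel);",
    "myApp.SummaryLabel + '</b></div>');",
    "myApp.NoneLabel + ')') + '</label></div>';",
]

# Approach 1: match the whole line, keep only what group 1 captured.
keep_prefix = re.compile(r"^(myApp\.\w+).*$")
kept = [keep_prefix.sub(r"\1", line) for line in lines]

# Approach 2: match only the unwanted tail and replace it with nothing.
drop_tail = re.compile(r"\b[,); +]+.*$")
trimmed = [drop_tail.sub("", line) for line in lines]

print(kept)
# ['myApp.ComboPlaceHolderLabel', 'myApp.GridTitleLabel',
#  'myApp.SummaryLabel', 'myApp.NoneLabel']
print(kept == trimmed)  # True
```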

How to grep for this pattern in Unix

I want to grep for this particular pattern. The pattern is as follows
**xMT123xMT123x**ABCxxxxxxxxxxxxxxxxxx_123_29887
inside the file test.txt which has the following data
NNN**xMT123xMT123x**ABCxxxxxxxxxxxxxxxxxx_123_29887_20140628.csv
I tried using grep "**xMT123xMT123x**ABCxxxxxxxxxxxxxxxxxx_123_29887" test.txt but it's not returning anything. Please advise.
EDIT:
Hi, basically I'm inside a loop, and only sometimes do I get files with this pattern. Currently I'm running grep "$i" test.txt, which works in all cases except when I encounter such patterns.
And I'm actually grepping for the exact file number and file sequence. So if it says 123_29887, it will be 123_29887. Thanks.
You could use:
grep -P "(?i)\*\*[a-z\d]+\*\*[a-z]+_\d+_\d+" somepath
(?i) turns on case-insensitive mode
\*\* matches the two opening stars
[a-z\d]+ matches letters and digits
\*\* matches two more stars
[a-z]+ matches letters
_\d+_\d+ matches underscore, digits, underscore, digits
If you need to be more specific (for instance, you know that a group of digits always has three digits), you can replace parts of the expression: for instance, \d+ becomes \d{3}
Matching a Literal but Yet Unknown Pattern: \Q and \E
If you receive literal patterns that you need to match, such as **xMT123xMT123x**ABCxxxxxxxxxxxxxxxxxx_123_29887, the issue is that special regex characters such as * need to be escaped. If the whole string is a literal, we do this by escaping the whole string between \Q and \E:
grep -P "\Q**xMT123xMT123x**ABCxxxxxxxxxxxxxxxxxx_123_29887\E" somepath
And in a loop, of course, you can build that regex programmatically by concatenating \Q and \E on both sides.
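The same "escape the whole literal" idea can be sketched in Python, as an illustration of the technique rather than a grep replacement. This assumes Python's re module, which has no \Q...\E; its re.escape function plays the equivalent role. Note that Python rejects the unescaped string outright, whereas grep accepted it and simply found nothing.

```python
import re

literal = "**xMT123xMT123x**ABCxxxxxxxxxxxxxxxxxx_123_29887"
line = "NNN**xMT123xMT123x**ABCxxxxxxxxxxxxxxxxxx_123_29887_20140628.csv"

# Used as-is, the leading '*' is a quantifier with nothing to repeat.
try:
    re.compile(literal)
    raw_is_valid = True
except re.error:
    raw_is_valid = False

# re.escape backslash-escapes the metacharacters so '*' matches literally.
escaped = re.escape(literal)
found = re.search(escaped, line) is not None

print(raw_is_valid)  # False
print(found)         # True
```

When the text is purely literal, as here, a plain substring test (literal in line) gives the same answer without involving regex at all.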

Regex Expression Not Matching

I have the following regular expression:
"\[(\d+)\].?\s+(\S+)\s+(\/+?)\\r\n"
I am pretty new to regex. I have this regexp and a string that I am trying to see if it matches or not. I believe it should match it but my program says it doesn't, and an online analyser says they do not match. I am pretty sure I am missing something small. Here is my string:
[1]+ Stopped sleep 60
However, when I use this online tool to check for a match, it reports none, and my program says the same. Why doesn't the string above match the regexp? Any ideas?
You appear to have escaped the \ before the \r, so the regex searches for a literal backslash followed by the letter r.
RegExp interpretation and allowed characters vary slightly with implementation, so you should give your execution context, but this is probably generic enough.
Decomposing your regexp gives
\[ - an open bracket character.
(\d+) - one or more digits; save this as capture group 1 ($1).
\] - a close bracket character.
.? - 0 or 1 character, of any kind
\s+ - 1 or more spaces.
(\S+) - 1 or more non-space characters; save this as $2
\s+ - 1 or more spaces
(\/+?) - 1 or more forward-slash characters, matched lazily (as few as possible); save this as $3
(+? is a lazy quantifier, so it does not make the group optional; it is still an odd construct here)
\\r\n" - an (incorrectly specified) end of line sequence, I think.
First of all, if you want to match the end of a line, use $, not \r\n. That should match the end of a line in most contexts. ^ matches the beginning of a line.
Second, I can't tell from your regexp what you are trying to capture after the "Stopped" word, so I'm going to assume you want the rest as one block, including internal spaces. A reg-exp basically the same as yours will do it.
"\[(\d+)\].?\s+(\S+)\s+(.+)\s*$"
This captures
$1 = 1,
$2 = Stopped
$3 = sleep 60
This is basically the same as yours except for the end, which grabs everything after "Stopped" up to the end of the line as a single capture group, $3, without leading blanks. Because .+ is greedy, any trailing blanks end up in $3 as well; use (.+?) instead of (.+) if you want them excluded. If you want to do additional parsing, replace the (.+) as appropriate. Note that there must be at least 1 non-blank character after "Stopped " for this to match. If you want it to match even if there is no string $3, use \s*(.*)\s*$ instead of \s+(.+)\s*$
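A quick way to check the decomposition above is to run both expressions against the sample line. The sketch below assumes Python's re module, since the question does not say which engine is in use. The variant with (.+?) and the padded input are illustrative additions that show how a lazy quantifier keeps trailing blanks out of the last group.

```python
import re

line = "[1]+ Stopped sleep 60"

# The original expression: \\r requires a literal backslash followed by 'r'.
original = re.compile(r"\[(\d+)\].?\s+(\S+)\s+(\/+?)\\r\n")
print(original.search(line))  # None

# The suggested expression, anchored with $ instead of \r\n.
suggested = re.compile(r"\[(\d+)\].?\s+(\S+)\s+(.+)\s*$")
groups = suggested.match(line).groups()
print(groups)  # ('1', 'Stopped', 'sleep 60')

# With trailing blanks (hypothetical input), greedy .+ keeps them in group 3,
# while lazy .+? leaves them to the \s* that follows.
padded = line + "   "
lazy = re.compile(r"\[(\d+)\].?\s+(\S+)\s+(.+?)\s*$")
greedy_tail = suggested.match(padded).group(3)
lazy_tail = lazy.match(padded).group(3)
print(repr(greedy_tail))  # 'sleep 60   '
print(repr(lazy_tail))    # 'sleep 60'
```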
Try to use this pattern:
\[\d+\]\+\s*\w+\s*\w+\s*\d+