Regex Help matching quotes - regex

I'm having a regex problem. I would like a regex that will match the '\nGO at the end of my file (see below). I have the following so far:
^\'*GO
but it matches the quote symbol?
EOF:
WHERE (dbo.Property.Archived <> 1)
'
GO

In Perl, \Z matches at the end of the string (or just before a single trailing newline) and, unlike $, is not affected by multiline mode. Use this to match GO on the last line of a file if the file is loaded into a string:
^GO\Z
GNU regex implementations (for example GNU sed) use \' instead of \Z; this is a GNU extension rather than part of POSIX.
To match exactly the newline and then the word GO in your example, you want this:
\nGO
You can also do this:
\n.*GO
This last regular expression will match what you want in your example, but the .* part allows any characters other than a line break (or nothing at all) between the newline and GO.
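As a rough illustration of the patterns above, here is a minimal sketch in Python rather than Perl. The sample text is modelled on the question and is assumed to end right after GO. Note that Python's \Z matches only at the absolute end of the string, so it behaves like Perl's \z, not Perl's \Z.

```python
import re

# Sample file contents modelled on the question; assumed to end right after GO.
text = "WHERE (dbo.Property.Archived <> 1)\n'\nGO"

# Quote, newline, then GO at the very end of the string.
quote_then_go = re.search(r"'\nGO\Z", text)

# GO alone on the last line: with re.MULTILINE, ^ also matches after a newline.
go_last_line = re.search(r"^GO\Z", text, re.MULTILINE)

print(quote_then_go is not None)
print(go_last_line is not None)
```

If the file may end with a newline after GO, Python's \Z will no longer match directly after GO; writing \n?\Z is one way to allow for that.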

Related

How do I use regex to match up to a pattern, if present, and otherwise to the end of the string?

I am parsing the command line portion of a string from a log, which occasionally appears at the end of the string.
The question "In regex, match either the end of the string or a specific character" does not help me, because when I try adding |$ to the end of the non-capture group like this
(?:CommandLine|Process Command Line):\s?([\S\s]+)(?:CurrentDirectory|\s+Token|$)
it includes CurrentDirectory following the command line as part of the match, which I do not want.
The string could look like this:
string1 = 'OriginalFileName: PowerShell.EXE CommandLine: powershell -Enc QQBkAGQALQBUAHkAcABlACAALQBBAHMAcwBlAG0AYgBsAHkATgBhAG0AZQAgAFMAeQBzAHQAZQBtAC4AUwBlAGMAdQByA CurrentDirectory: C:\Windows\system32\'
In this case I can extract the portion
'QQBkAGQALQBUAHkAcABlACAALQBBAHMAcwBlAG0AYgBsAHkATgBhAG0AZQAgAFMAeQBzAHQAZQBtAC4AUwBlAGMAdQ'
using this pattern
'(?:CommandLine|Process Command Line):\s?([\S\s]+)(?=CurrentDirectory|\s+Token)'
but this fails in cases like this that end with a \n:
string2 = 'CommandLine: powershell -Enc QQBkAGQALQBUAHkAcABlACAALQBBAHMAcwBlAG0AYgBsAHkATgBhAG0AZQAgAFMAeQBzAHQAZQBtAC4AUwBlAGMAdQBHQAeQBzAG\n'
or like this, where the part I want to extract is at the end of the string:
string3 = 'CommandLine: powershell -Enc QQBkAGQALQBUAHkAcABlACAALQBBAHMAcwBlAG0AYgBsAHkATgBhAG0AZQAgAFMAeQBzAHQAZQBtAC4AUwBlAGMAdQByAGkAdAB5ADsACgBmAHUAb'
How can I match up until CurrentDirectory|\s+Token if either one exists, but otherwise go to end of the string?
Here is an example of my pattern matching most of the command line portions of the string, but not the last one at the end of the string: regex101.com/r/kETNpo/1
You may use this regex for all of your matches:
(?:CommandLine|Process Command Line):\s?([\S\s]+?)(?:\bCurrentDirectory|\s+Token|\Z)
Important changes from your original regex are:
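[\S\s]+?: Makes the match non-greedy (lazy), so it stops at the first position where one of the following alternatives can match.
(?:\bCurrentDirectory|\s+Token|\Z): Non-capturing group that matches the full word CurrentDirectory, or Token after 1+ whitespace characters, or else the end of input.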
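A minimal sketch of that pattern using Python's re module, which is an assumption since the question does not name an engine beyond regex101. The Base64 payloads are shortened stand-ins, and the extract helper with its strip() call is hypothetical, added only to trim the whitespace the lazy group picks up before the terminator.

```python
import re

PATTERN = re.compile(
    r"(?:CommandLine|Process Command Line):\s?([\S\s]+?)"
    r"(?:\bCurrentDirectory|\s+Token|\Z)"
)

# Shortened stand-ins for string1, string2 and string3 from the question.
string1 = ("OriginalFileName: PowerShell.EXE CommandLine: powershell -Enc QQBk "
           "CurrentDirectory: C:\\Windows\\system32\\")
string2 = "CommandLine: powershell -Enc QQBk\n"
string3 = "CommandLine: powershell -Enc QQBk"


def extract(s):
    """Return the command line portion, or None if the pattern does not match."""
    m = PATTERN.search(s)
    # strip() is an addition here: it drops trailing whitespace or a newline.
    return m.group(1).strip() if m else None


for s in (string1, string2, string3):
    print(extract(s))
```

All three inputs yield the same command line, because the lazy group stops at CurrentDirectory when it is present and otherwise runs to the end of the string.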

Why does `perl -pe 's/$/\n/g'` add 2 blank lines?

I'm working through the one liner book and came across
perl -pe 's/$/\n/' file
which inserts a blank line after each line: the substitution puts a newline at the end of the line, in addition to the existing newline, which results in a blank line.
As this is the first example without g at the end of the pattern, I tried
perl -pe 's/$/\n/g' file
this results in 2 blank lines between lines.
I would have expected no difference since there is only one $ per line so replacing all of them should be the same as replacing just the first one.
What's going on here?
/$/ matches the “end of string”. This might be the end of string (like /\z/), or just before a newline before the end of string (like /(?=\n\z)/).
(Additionally, /$/m matches the “end of line”. This might be the end of string, or just before a newline (like /(?=\n)/).)
With your substitution s/$/\n/g, the regex matches twice: once before the newline, then again at the end of string:
The first match is before the newline:
"foo\n"
#   ^ match
A newline is placed before the current match end:
"foo\n\n"
#   ^ insert before
The next match is at the end of string:
"foo\n\n"
#       ^ match
A newline is inserted before the current match end:
"foo\n\n\n"
#       ^ insert before
No further match is found.
The solution: if $ is too DWIMmy for you, always match \z or \n explicitly, possibly together with lookaheads like (?=\n). Consider matching all Unicode line separators with \R instead of just \n.
This isn't a sound understanding of the situation. $ is a badly-defined and unintuitive metacharacter:
It is a zero-width match.
It will match before a newline character at the end of the bound string.
It will match at the end of the bound string.
With the /m modifier in place, it will also match before any newline character anywhere, but not immediately after it unless it is the last character of the string.
\z is much more useful: it only ever matches at the end of the string.
"by setting the end of the line to new line"
Mentioning "lines" at all is misleading, and you should be careful to explain in comments what meaning you're applying. If you have
my $s = "xxx\n";
then
say pos($s) while $s =~ /$/g;
will produce
3
4
i.e. both before and after the newline, because it happens to be at the end of the string
This is also why your s/$/\n/g adds two newlines: there are two zero-width matches for /$/ within this string, and a global substitution finds them and replaces them both with a newline, resulting in three newlines instead of the original one
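The same behaviour can be reproduced outside Perl. The following is a sketch in Python, whose re module treats $ the same way in this respect; Python's \Z plays the role of Perl's \z here.

```python
import re

s = "xxx\n"

# $ matches twice: just before the trailing newline, and at the very end.
positions = [m.start() for m in re.finditer(r"$", s)]
print(positions)  # [3, 4]

# A global substitution therefore inserts two newlines.
doubled = re.sub(r"$", "\n", s)
print(repr(doubled))  # 'xxx\n\n\n'

# Anchoring on the absolute end of the string inserts only one.
once = re.sub(r"\Z", "\n", s)
print(repr(once))  # 'xxx\n\n'
```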
It's unclear what you intended
Adding a newline to the end of a string, regardless of what's there already, is s/\z/\n/ or just $s .= "\n"
If you want to ensure that, say, there are exactly two newlines at the end of a string, then replace any existing trailing linefeeds with exactly two using s/\n*\z/\n\n/
As you can see, \z is much more useful than $
And don't forget \R if you're dealing with cross-platform data. It will match any standard line terminator, including CR, LF and CRLF
If this still leaves you with a problem then please ask again. I was going to write about zero-width matches but it's hard to know whether my answer is clear without it

\1 not defined in the RE

In my script, I'm passing in a markdown file and, using sed, I'm trying to find lines that do not start with one or more # and are not empty, and then surround those lines with <p></p> tags.
My reasoning:
^[^#]+ At beginning of line, find lines that do not begin with 1 or more #
.\+ Then find lines that contain one or more character (aka not empty lines)
Then replace the matched line with <p>\1</p>, where \1 represents the matched line.
However, I'm getting "\1 not defined in the RE". Is my reasoning above correct and how do I fix this error?
BODY=$(sed -E 's/^[^#]+.\+/<p>\1</p>/g' "$1")
Backslash followed by a number is replaced with the match for the Nth capture group in the regexp, but your regexp has no capture groups.
If you want to replace the entire match, use &:
BODY=$(sed -E 's%^[^#].*%<p>&</p>%' "$1")
You don't need to use .+ to find non-empty lines -- the fact that it has a character at the beginning that doesn't match # means it's not empty. And you don't need + after [^#] -- all you care is that the first character isn't #. You also don't need the g modifier when the regexp matches the entire line -- that's only needed to replace multiple matches per line.
And since your replacement string contains /, you need to either escape it or change the delimiter to some other character.
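For comparison, here is a sketch of the same substitution in Python instead of sed. The function name wrap_paragraphs is hypothetical. In Python the whole match is written \g<0> rather than &, and the class is [^#\n] because Python's [^#] would also match a newline.

```python
import re


def wrap_paragraphs(markdown: str) -> str:
    """Wrap every non-empty line that does not start with # in <p></p> tags."""
    # \g<0> is the entire match, like & in sed.
    # [^#\n] keeps the class from matching a newline, which would otherwise
    # let an empty line swallow the line that follows it.
    return re.sub(r"^[^#\n].*", r"<p>\g<0></p>", markdown, flags=re.MULTILINE)


sample = "# Title\n\nhello\nworld\n## Sub\n"
print(wrap_paragraphs(sample))
```

No capture group is needed, which is the same point the sed fix makes with &.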

regular expression in sublime text 2 to match text

I have a string that looks like this:
lonfksa.newsvine.com
and I have tons of files that look like this:
http://ricambi.ru/avtomobilnie-novosti/lexus-gotovit-k-debiutu-obnovlenniy-rx
http://www.kiwibox.com/hoytboar/blog/entry/121424391/modis-tshirt-tips-untuk-womens-clothing/
http://www.euro-rockradio.com/archives/category/interview
http://lonfksa.newsvine.com/_news/2014/04/18/23538711-vampir-romantis-clothing
http://www.fam-hinterseer.de/cgi-bin/info.php?a%5B%5D=%3Ca+href%3Dhttp%3A%2F%2Fwww.shopious.com%3Ecart+means+payment%3C%2Fa%3E
http://www.kiwibox.com/donniehihp/blog/entry/116146741/skin-care-beauty-makeup-tips-for-female/
http://www.kiwibox.com/karlagbr/blog/archive/2014/9/7/
I wanted to match the line that contains:
lonfksa.newsvine.com
and I tried the following regex but it doesn't work:
(?s)lonfksa.newsvine.com(?s)
what regex should I use to match the whole line that has this string?
You can make use of the multiline flag, which makes the ^ and $ anchors match at the start and end of each line respectively:
(?m)^.*lonfksa\.newsvine\.com.*$
Mind that you need to escape a dot in regex to make it match a literal dot. Your regex (?s)lonfksa.newsvine.com(?s) contains unescaped dots that match any character (even a newline since you are using a singleline inline option (?s)). The final inline option (?s) is not necessary, it does not do anything.
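A small sketch of both points, using Python's re module as a stand-in (Sublime Text uses its own regex engine, so treat the exact behaviour as an assumption). The URL list is a shortened copy of the sample from the question.

```python
import re

urls = (
    "http://www.euro-rockradio.com/archives/category/interview\n"
    "http://lonfksa.newsvine.com/_news/2014/04/18/23538711-vampir-romantis-clothing\n"
    "http://www.kiwibox.com/karlagbr/blog/archive/2014/9/7/\n"
)

# (?m) lets ^ and $ anchor at each line; escaped dots match literal dots only.
line_pattern = re.compile(r"(?m)^.*lonfksa\.newsvine\.com.*$")
matches = line_pattern.findall(urls)
print(matches)

# An unescaped dot matches any character, so this unrelated text also matches.
loose = re.search(r"lonfksa.newsvine.com", "lonfksaXnewsvineYcom")
strict = re.search(r"lonfksa\.newsvine\.com", "lonfksaXnewsvineYcom")
print(loose is not None, strict is not None)
```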
Try this regex :
^.*lonfksa\.newsvine\.com.*\b

Regular expression to match last line break in file

In my quest to learn flex I'm having a scanner echo input adding line numbers.
After every line I display a counter and increment it.
Trouble is there is always a lone line number at the end of the display.
I need a regex that will ignore all line breaks except for the last one.
I tried [\n/<<EOF>>] to no avail.
Any thoughts?
I don't know which regex engine flex uses, but you can use this regex:
\z
\z asserts the position at the very end of the string.
Matches the end of a string only. Unlike $, this is not affected by
multiline mode, and, in contrast to \Z, will not match before a
trailing newline at the end of a string.
If above regex doesn't work then you can use this one:
(?<=[\S\s])$
Edit: since flex seems to work slightly differently from other regex engines, you could use this regex:
[\s\S]$
to get the last character of each line. You can then iterate over all the lines until you reach the last one. Here is an online flex regex engine tool:
http://ryanswanson.com/regexp/#start
Try the regex below. It will search for a newline character at the end of the line.
\n$
Have you tried simply doing:
\n$
The \n matches the newline, the $ matches end of string.
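To illustrate how \n$ behaves in a general-purpose engine, here is a sketch using Python's re module. This is not a flex rule: flex's pattern syntax differs, so the example only shows the semantics described above. The sample text is made up.

```python
import re

text = "line one\nline two\nline three\n"

# Without multiline mode, $ only matches at the end of the string (or just
# before a trailing newline), so \n$ finds the final line break only.
hits = [m.start() for m in re.finditer(r"\n$", text)]
print(hits == [len(text) - 1])

# Dropping only that last line break, with \Z as the absolute end of string.
trimmed = re.sub(r"\n\Z", "", text)
print(repr(trimmed))
```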