Regular expression to match last line break in file - regex

In my quest to learn flex, I'm writing a scanner that echoes its input with line numbers added.
After every line I print a counter and increment it.
The trouble is that there is always a lone line number at the end of the output.
I need a regex that will ignore all line breaks except for the last one.
I tried [\n/<<EOF>>] to no avail.
Any thoughts?

I don't know which regex engine Flex uses, but you can use this regex:
\z
\z asserts position at the very end of the string.
Matches the end of a string only. Unlike $, this is not affected by
multiline mode, and, in contrast to \Z, will not match before a
trailing newline at the end of a string.
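For a quick local check of these semantics, here is a small sketch using Python's re module (Python is an assumption, used only for illustration; it is not flex's engine). In Python, \Z behaves like the \z described above and never matches before a trailing newline, while $ without re.MULTILINE also matches just before a final newline:

```python
import re

text = "line one\nline two\n"

# Without MULTILINE, Python's $ matches at the very end AND just before a trailing newline.
dollar_positions = [m.start() for m in re.finditer(r"$", text)]

# Python's \Z (like \z in Perl/PCRE) matches only at the very end of the string.
end_positions = [m.start() for m in re.finditer(r"\Z", text)]

# The last line break is the newline immediately followed by the end of the string.
last_break = re.search(r"\n\Z", text)

print(dollar_positions)    # [17, 18]
print(end_positions)       # [18]
print(last_break.start())  # 17
```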
If the regex above doesn't work, you can use this one:
(?<=[\S\s])$
Edit: since flex seems to work slightly differently from other regex engines, you could use this regex:
[\s\S]$
to match the last character of each line, then iterate over all the lines until you reach the last one. Here is an online flex regex tester:
http://ryanswanson.com/regexp/#start

Try the regex below; it searches for a newline character at the end of the line.
\n$

Have you tried simply doing:
\n$
The \n matches the newline, and the $ matches the end of the string.
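The lone number usually comes from printing the counter after every line break, including the final one. One alternative that sidesteps the regex entirely is to emit the number when a line starts rather than when one ends. A minimal sketch of that idea in Python (not flex; number_lines is a hypothetical helper):

```python
def number_lines(text: str) -> str:
    """Prefix each line of text with its 1-based line number.

    str.splitlines(keepends=True) does not yield an empty extra element after
    a trailing line break (unlike splitting on the newline character), so no
    lone number is emitted after the final line.
    """
    numbered = []
    for n, line in enumerate(text.splitlines(keepends=True), start=1):
        numbered.append(f"{n}: {line}")
    return "".join(numbered)


print(number_lines("first\nsecond\n"), end="")
```

In a flex scanner the analogous change would be printing the counter before the first character of each line instead of in the rule that matches the newline; the exact rules for that are not shown here.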

Related

Regex to change all past a certain pattern to Uppercase

I have an xml file that has a value like
JOBNAME="JBDSR14353_Some_other_Descriptor"
I am looking for an expression that will go through the file and change all of the characters inside the quotes to uppercase. Is there a regex that will search for JOBNAME="anything within the quotes" and change it to uppercase? Or a command that will find JOBNAME= and change everything on that line to uppercase? I know I can just search for JOBNAME=, use VU in Vim to uppercase the line, store that in a macro and run it, but I was wondering if there is a way to get this done with a regex.
Here's an alternative with :substitute, as you had originally intended. This works better than @Zach's solution with gU_ when there's other text in the line:
:%s/JOBNAME="[^"]\+"/\U&/g
"[^"]\+" matches the quoted text (it avoids overmatching by allowing only non-quote characters inside, which also handles multiple quoted values on one line)
\U turns the remainder of the replacement uppercase
for simplicity, the entire match (&) is uppercased here, but one could have also used capture groups (\(...\)), or match limiting with \zs
You can use the :g command which executes a command on lines that match a pattern:
:g/JOBNAME/norm! gU_
This runs gU_, which uppercases the entire current line, on every line that matches JOBNAME.
If there are other things on the same line that you don't want to capitalize, here is a solution for only the words in quotes:
:g/JOBNAME/norm! f"gU;
f" moves to the next quote on the line. gU uppercases the text covered by a motion; the motion used here is ;, which repeats the last f command and so jumps to the next ".
To do this with substitution you can use the \U atom which makes everything after it uppercase.
:%s/JOBNAME="\zs.*\ze"/\U&
\zs and \ze mark the start and end of the match and & is the whole match. This means that only the part between quotes is replaced.
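Outside Vim, the same idea (match the quoted value, then uppercase only that part) can be sketched with Python's re.sub and a replacement function standing in for \U. The sample line and its owner attribute are made up for illustration:

```python
import re

# Hypothetical sample line; only the value inside JOBNAME="..." should change.
line = '<job JOBNAME="JBDSR14353_Some_other_Descriptor" owner="ops">'

# [^"]* stops at the first closing quote, like [^"]\+ in the Vim command,
# so other quoted attributes on the same line are left alone.
pattern = re.compile(r'(JOBNAME=")([^"]*)(")')

# The replacement function plays the role of Vim's \U: it uppercases group 2 only.
result = pattern.sub(lambda m: m.group(1) + m.group(2).upper() + m.group(3), line)

print(result)  # <job JOBNAME="JBDSR14353_SOME_OTHER_DESCRIPTOR" owner="ops">
```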

regex match only specific file lines

I have a file containing lines like:
13
13-55
some text 11
I want to create a regex that matches only the first two types of lines, but not the last one.
The regex I created is [0-9\-]+
You have to specify that you are matching from the beginning to the end of the string.
Try the following regex:
^[0-9-]+$
Try using anchors (^ and $ to denote beginning and end of string respectively) and use the multiline option (this one depends on the language/engine/environment of the regex).
^[0-9-]+$
Note, you can drop the backslash for the - if it's at the beginning or end of a character class.
If you want to match lines that consist only of digits and hyphens:
^[0-9-]+$
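A quick way to confirm the anchored pattern against the sample lines, sketched with Python's re module, where re.MULTILINE plays the role of the multiline option mentioned above:

```python
import re

text = "13\n13-55\nsome text 11\n"

# re.MULTILINE lets ^ and $ match at every line boundary, not only at the
# start and end of the whole string; the anchors reject "some text 11".
matches = re.findall(r"^[0-9-]+$", text, flags=re.MULTILINE)

# Without anchors, the "11" inside the third line would also be found.
unanchored = re.findall(r"[0-9-]+", text)

print(matches)     # ['13', '13-55']
print(unanchored)  # ['13', '13-55', '11']
```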

Regex to match only the first line?

Is it possible to make a regex match only the first line of a text? So if I have the text:
This is the first line.
This is the second line.
...
It would match "This is the first line.", whatever the first line is.
That sounds more like a job for reading the first line from the filehandle.
You should be able to match the first line with:
/^(.*)$/m
(as always, this is PCRE syntax)
The /m modifier makes ^ and $ match at embedded newlines (the start and end of every line). Since there's no /g modifier, it will just process the first occurrence, which is the first line, and then stop.
If you're using a shell, use:
head -n1 file
or as a filter:
commandmakingoutput | head -n1
Please clarify your question in case this is not what you're looking for.
In case you need the very first line no matter what, here you go:
\A.*
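\A matches only at the very start of the string, and . does not match a newline (unless the s/DOTALL flag is set), so this selects the first line no matter what.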
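Both the /m approach and \A.* can be tried with Python's re module (used here only as an illustration; re.search stops at the first match, much like omitting /g):

```python
import re

text = "This is the first line.\nThis is the second line.\n"

# With MULTILINE, ^ and $ match at line boundaries; re.search returns only
# the first match, which is the first line.
first = re.search(r"^(.*)$", text, flags=re.MULTILINE).group(1)

# \A anchors at the very start of the string; . does not match a newline
# by default, so .* stops at the end of the first line.
also_first = re.search(r"\A.*", text).group()

print(first)       # This is the first line.
print(also_first)  # This is the first line.
```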
Yes, you can.
Example in javascript:
"This is the first line.\n This is the second line.".match(/^.*$/m)[0];
Returns
"This is the first line."
EDIT
Regex explanation:
match(/^.*$/m)[0]
^: beginning of line
.*: any character (.), 0 or more times (*)
$: end of line
m: multiline mode (^ and $ also match at line breaks; . still does not match \n)
[0]: the first element of the result array, i.e. the whole match
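There is also a negative lookbehind approach, which works perfectly for my purposes in PowerGREP. Note that it needs a flavor that allows variable-length lookbehind; Perl and many other engines reject an unbounded pattern like \s+ inside a lookbehind.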
Regex:
(?<!\s+)^(.+)$
where
(?<!\s+) is a negative lookbehind: the regex matches only at positions that
are not preceded by whitespace (\s also matches a line break)
^ is the start of a string
(.+) captures one or more characters
$ is the end of the string
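Variable-length lookbehind is not portable. The sketch below uses Python's re module as an example engine: it rejects (?<!\s+) at compile time, while a one-character lookbehind, (?<![\s\S]), is fixed-width and succeeds only where nothing precedes the position, that is, at the very start of the string:

```python
import re

text = "First line\nsecond line\nthird line"

# Python's re only supports fixed-width lookbehind, so \s+ inside (?<! ... )
# is rejected when the pattern is compiled.
try:
    re.compile(r"(?<!\s+)^(.+)$", re.MULTILINE)
    variable_lookbehind_ok = True
except re.error:
    variable_lookbehind_ok = False

# (?<![\s\S]) fails wherever any character precedes, so even with MULTILINE
# only the ^ at position 0 survives.
first_only = re.findall(r"(?<![\s\S])^(.+)$", text, flags=re.MULTILINE)

print(variable_lookbehind_ok)  # False
print(first_only)              # ['First line']
```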

How can I match at the beginning of any line, including the first, with a Perl regex?

According to the Perl documentation on regexes:
By default, the "^" character is guaranteed to match only the beginning of the string ... Embedded newlines will not be matched by "^" ... You may, however, wish to treat a string as a multi-line buffer, such that the "^" will match after any newline within the string ... you can do this by using the /m modifier on the pattern match operator.
The "after any newline" part means that it will only match at the beginning of the 2nd and subsequent lines. What if I want to match at the beginning of any line (1st, 2nd, etc.)?
EDIT: OK, it seems that the file has a BOM (3 bytes) at the beginning, and that's what's messing me up. Is there any way to get ^ to match anyway?
EDIT: So in the end it works (as long as there's no BOM), but now it seems that the Perl documentation is wrong, since it says "after any newline".
The ^ does match the 1st line with the /m flag:
~:1932$ perl -e '$a="12\n23\n34";$a=~s/^/:/gm;print $a'
:12
:23
:34
To match with a BOM you need to include it in the match (below, the BOM is written out as the UTF-8 bytes \xEF\xBB\xBF):
~:1939$ perl -e '$a="\xEF\xBB\xBF12\n23\n34";$a=~s/^(\d)/<$1>:/mg;print $a'
12
<2>:3
<3>:4
~:1940$ perl -e '$a="\xEF\xBB\xBF12\n23\n34";$a=~s/^(?:\xEF\xBB\xBF)?(\d)/<$1>:/mg;print $a'
<1>:2
<2>:3
<3>:4
You can use the /^(?:\xEF\xBB\xBF)?/mg regex to match at the beginning of the line anyway, if you want to preserve the BOM.
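The same behavior can be reproduced with Python's re module, used here only for illustration: with re.MULTILINE, ^ matches before the first line too; a leading BOM (U+FEFF after decoding) blocks a ^(\d) match on that line; and either making the BOM optional in the pattern or decoding with the standard "utf-8-sig" codec fixes it:

```python
import re

# Without a BOM, ^ with re.MULTILINE matches before the first line as well.
colon_prefixed = re.sub(r"^", ":", "12\n23\n34", flags=re.MULTILINE)

# UTF-8 bytes whose first line starts with a byte order mark.
raw = b"\xef\xbb\xbf12\n23\n34"

# Decoded as plain UTF-8, the BOM survives as U+FEFF and blocks ^(\d) on line 1.
with_bom = raw.decode("utf-8")
blocked = re.sub(r"^(\d)", r"<\1>:", with_bom, flags=re.MULTILINE)

# Making the BOM optional at the start of a line (as in the Perl example)
# lets line 1 match too; the BOM is consumed by the match.
tolerant = re.sub(r"^\ufeff?(\d)", r"<\1>:", with_bom, flags=re.MULTILINE)

# Alternatively, the "utf-8-sig" codec strips a leading BOM while decoding.
plain = re.sub(r"^(\d)", r"<\1>:", raw.decode("utf-8-sig"), flags=re.MULTILINE)

print(tolerant == plain)  # True
```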
Conceptually, there's assumed to be a newline before the beginning of the string. Consequently, /^a/ will find a letter 'a' at the beginning of a string.
Put an empty line at the beginning of the file; the first real line then follows a newline, which keeps things simple and avoids making the regex hard to read.
Yes, the BOM. It might appear at the beginning of the file, so put an empty line at the beginning of the file. The BOM is not matched by \s, and it cannot be seen with the naked eye. It cost me hours when a BOM made my regex fail.

what can be the regex for the following string

I am doing this in groovy.
Input:
hip_abc_batch hip_ndnh_4_abc_copy_from_stgig abc_copy_from_stgig
hiv_daiv_batch hip_a_de_copy_from_staging abc_a_de_copy_from_staging
I want to get the last column. Basically, anything that starts with abc_.
I tried the following regex (it works for the second line but not the first):
/abc_.*/
but on the first line that gives me everything from abc_batch onward.
I am looking for a regex that will fetch me anything that starts with abc_,
but I cannot use /^abc_.*/ since the whole string does not start with abc_.
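It sounds like you're looking for "words" (i.e., sequences that don't include spaces) that begin with abc_. You might try:
/\babc_\S*/
The \b means (in some regular expression flavors) "word boundary", so the abc_ inside hip_abc_batch is skipped, and \S* takes the rest of the word up to the next whitespace.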
Try this:
/\s(abc_.*)$/m
Here is a commented version so you can understand how it works:
\s # match one whitespace character
(abc_.*) # capture a string that starts with "abc_" and is followed
# by any character zero or more times
$ # match the end of the line (because of the "m" switch)
Since the regular expression has the "m" switch it will be a multi-line expression. This allows the $ to match the end of each line rather than the end of the entire string itself.
You don't need to trim the whitespace, as the capture group contains just the text. After a cursory scan of this tutorial I believe this is the way to grab the value of a capture group using Groovy (Groovy's slashy strings take no trailing modifiers, so the "m" switch is written inline as (?m)):
def matcher = (yourString =~ /(?m)\s(abc_.*)$/)
// this is how you would extract the value from
// the matcher object: [0] is the first match, [1] its first capture group
matcher[0][1]
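For comparison, here is the same extraction sketched with Python's re module (an illustration, not Groovy), with (?m) as the inline form of the m switch, plus a word-boundary variant using \b and \S*:

```python
import re

text = (
    "hip_abc_batch hip_ndnh_4_abc_copy_from_stgig abc_copy_from_stgig\n"
    "hiv_daiv_batch hip_a_de_copy_from_staging abc_a_de_copy_from_staging\n"
)

# (?m) makes $ match at the end of every line. Requiring \s before abc_
# skips "hip_abc_batch", where abc_ follows "_" rather than whitespace.
last_columns = re.findall(r"(?m)\s(abc_.*)$", text)

# Word-boundary variant: \b fails between "_" and "a", and \S* stops at whitespace.
words = re.findall(r"\babc_\S*", text)

print(last_columns)  # ['abc_copy_from_stgig', 'abc_a_de_copy_from_staging']
```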
I think you are looking for this: \s(abc_[a-zA-Z_]*)$
If you are using Perl and you read all lines into one string, don't forget to set the m option on your regex (it stands for "treat string as multiple lines").
Oh, and Regex Coach is your free friend.