regex match file with multiple extension - regex

I have several strings like this
XYZ_TEST_2017.txt
ASD_TEST_2017.txt.tmp
I need to extract only those strings ending with .txt
So I'm using this regex:
[A-Z]{3}_TEST_[0-9]{4}.txt
However I still get the strings with multiple extensions like the second one (.txt.tmp)
See my regex demo.
How can I handle it?

To have your regex match everything up to the end, append an "end-of-text marker" ($) to your pattern like this:
[A-Z]{3}_TEST_[0-9]{4}\.txt$
As you may have noticed, I also escaped the dot, otherwise this filename would match as well:
SOM_TEST_1234Etxt
The dot (.) would match any character (depending on your flags, even newline and carriage return), in this case, the E before txt.

Related

Is there regex to remove space and newline from xml input file

I would like to change an xml which is in format
<input>My
Input</input>
<input2>My
input2</input2>
to
<input>My Input</input>
<input2>My input2</input2>
The input xml file has more than 10000 records with xml in the above format which breaks the software to work properly.
Need a regex to fix it in one stroke.
I tried ('//n','') but it is not functioning as expected
If your regex flavor supports Lookbehinds, you may use something like this:
(?<!>)(\s)*[\r\n]+
..and replace with \1.
This will match any number of new-line characters, preceded by zero or more other whitespace characters and not preceded by the > character. Then, it will replace them with a whitespace character (if present) or nothing.
Demo.
If Lookbehind is not supported, you may use:
([^>])(\s)*[\r\n]+
..and replace with \1\2.

Regular expressions: inserting a word and NOT replacing the found key

I have a list of items, such as:
this_thing.ety
other-stuff.ety
34-pairings.ety
I want to do this:
"At the beginning of every line, insert "images/"
so the result of search/replace with reg exp would yield:
images/this_thing.ety
images/other-stuff.ety
images/34-pairings.ety
I am using:
^.
as my anchor to find the beginning of each line but everything I've tried to add "images/" has resulted in actually replacing that first character. I am using Notepad ++, but can use anything.
I thought using ${foo} was on the right track but I'm missing something here.
In a regex ^.is matching begin of line and a character. If you replace this by 'image', first character, which matched, will be replaced. Empty line wont have 'image' but stay identical (they don't match ^.)
Just use ^ as regexp for begin of line
. is the any character symbol, but can only account for one character. You will want to use ^..*$ or ^.+$ if your version of regex allows so that every line that contains at least one character will be fully replaced. With replace, it would look like this
s/^(.+)$/images\/\1/
where the \1 re-inserts the part in parenthesis in the regex. In older versions of regex, try
s/^^\(..*\)$/\1/

Regular expression matching space but at the end of line

I'm trying to replace multiple spaces with a single one, but at the start of the line.
Example:
___abc___def__
___ghi___jkl__
should turn to
___abc_def__
___ghi_jkl__
Note that I've replaced space with underscore
A simple search using the following pattern:
([^\s])\s+
matches the space at the end of the first line up to the space at the beginning of the next one.
So, if I replace with \1_, I get the following:
___abc_def_ghi_jkl
And that is absolutely not what I expect and regex engines, e.g., PowerGREP or the one in Visual Studio, don't behave that way.
If you want to match only horizontal spaces, use \h:
Find what: (?<=\S)\h+(?=\S)
Replace with: (a space)
There are several possible interpretations of the question. For each of them the replacement will be a single space character.
If spaces is plural and means space characters but not tabs then use
a find string of (^ {2,})|( {2,}$).
If spaces is plural and should includes tabs then use a find string
of (^[ \t]{2,})|([ \t]{2,}$).
If any leading or trailing spaces and tabs (one or more) is to be
replaced with a space then use a find string of (^[ \t]+)|([ \t]+$).
The general form of each of these is (^...)|(...$). The | means an alternation so either the preceding or the following bracketed expression can match. Hence the find what text can match either at the beginning or the end of a line. The ... varies depending on exactly what needs to be matched. Specifying [ \t] means only the two characters space and tab, whereas \s includes the line-end characters.
Ok, so the intention was to replace this:
Hey diddle diddle, \n<br/>
The Cat and the fiddle,\n
with this:
Hey diddle diddle,\n<br/>
The Cat and the fiddle,\n
A slightly modified version of Toto's answer did the trick:
(?<=\S)\h+(?=\S)|\s+$
finding any space(s) between word-characters and trailing space at the end of the line.

Regex to change all past a certain pattern to Uppercase

I have an xml file that has a value like
JOBNAME="JBDSR14353_Some_other_Descriptor"
I am looking for an expression that will go through the file and change all of the characters in the quotes to Uppercase letters. Is there a Regex expression that will search for JOBNAME="Anything within the quotes" and change them to uppercase? Or a command that will find JOBNAME= and change all on that line to uppercase letters? I know that can just do a search for JOBNAME= and then use a VU command in vim to throw the line to uppercase store that to a macro and run that, but I was wondering if there was a way to get this done with a regex??
Here's an alternative with :substitute, as you had originally intended. This works better than #Zach's solution with gU_ when there's other text in the line:
:%s/JOBNAME="[^"]\+"/\U&/g
"[^"]\+" matches the quoted text (non-greedily by matching only non-quotes inside, to handle multiple quotes in a line)
\U turns the remainder of the replacement uppercase
for simplicity, the entire match (&) is uppercased here, but one could have also used capture groups (\(...\)), or match limiting with \zs
You can use the :g command which executes a command on lines that match a pattern:
:g/JOBNAME/norm! gU_
This will execute the gU_, which capitalizes all letters on a line, on all the lines that match JOBNAME
If there are other things on the same line that you don't want to capitalize, here is a solution for only the words in quotes:
:g/JOBNAME/norm! f"gU;
f" goes to the next quote. gU capitalizes with a motion. The motion used is ; which searches for the next " (repeats the last f command).
To do this with substitution you can use the \U atom which makes everything after it uppercase.
:%s/JOBNAME="\zs.*\ze"/\U&
\zs and \ze mark the start and end of the match and & is the whole match. This means that only the part between quotes is replaced.

what can be the regex for the following string

I am doing this in groovy.
Input:
hip_abc_batch hip_ndnh_4_abc_copy_from_stgig abc_copy_from_stgig
hiv_daiv_batch hip_a_de_copy_from_staging abc_a_de_copy_from_staging
I want to get the last column. basically anything that starts with abc_.
I tried the following regex (works for second line but not second.
\abc_.*\
but that gives me everything after abc_batch
I am looking for a regex that will fetch me anything that starts with abc_
but I can not use \^abc_.*\ since the whole string does not start with abc_
It sounds like you're looking for "words" (i.e., sequences that don't include spaces) that begin with abc_. You might try:
/\babc_.*\b/
The \b means (in some regular expression flavors) "word boundary."
Try this:
/\s(abc_.*)$/m
Here is a commented version so you can understand how it works:
\s # match one whitepace character
(abc_.*) # capture a string that starts with "abc_" and is followed
# by any character zero or more times
$ # match the end of the string
Since the regular expression has the "m" switch it will be a multi-line expression. This allows the $ to match the end of each line rather than the end of the entire string itself.
You don't need to trim the whitespace as the second capture group contains just the text. After a cursory scan of this tutorial I believe this is the way to grab the value of a capture group using Groovy:
matcher = (yourString =~ /\s(abc_.*)$/m)
// this is how you would extract the value from
// the matcher object
matcher[0][1]
I think you are looking for this: \s(abc_[a-zA-Z_]*)$
If you are using perl and you read all lines into one string, don't forget to set the the m option on your regex (that stands for "Treat string as multiple lines").
Oh, and Regex Coach is your free friend.