Wrap each matching word with quotation marks - regex

I have several lines in Notepad++ which look similar to this:
A8s KQo QTs A9s A9s AJo AJo 99 KQo A5s
What I would like to do is wrap each word in quotation marks, followed by a comma if possible.
I've tried matching against [A-Za-z\d]{2-3}, but I don't get any matches.
Desired result:
"A8s", "KQo", "QTs", //etc...

What nickb said is true, but you might want to consider adding word boundaries:
\b[A-Za-z0-9]{2,3}\b
Otherwise, if your input also had longer words, like
A8s KQo ABCD 1234
you would get results like
"A8s" "KQo" "ABC"D "123"4
The word boundary makes sure that you can only match entire words.
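To see the difference in practice, here is a minimal sketch. It assumes Python's re module rather than Notepad++, and uses \g<0> to refer to the whole match:

```python
import re

line = "A8s KQo ABCD 1234"

# Without word boundaries, longer words are matched partially.
without_boundaries = re.sub(r"[A-Za-z0-9]{2,3}", r'"\g<0>"', line)

# With \b on both sides, only whole words of 2 or 3 characters match.
with_boundaries = re.sub(r"\b[A-Za-z0-9]{2,3}\b", r'"\g<0>"', line)

print(without_boundaries)  # "A8s" "KQo" "ABC"D "123"4
print(with_boundaries)     # "A8s" "KQo" ABCD 1234
```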

Because in quantifiers, you need a comma, not a dash:
[A-Za-z\d]{2,3}
            ^
Otherwise, you were literally matching the characters {2-3}, so your current regex would match things like:
A{2-3}
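The same behaviour can be checked with a short sketch. It assumes Python's re module, which also treats {2-3} as literal text; other engines may differ in detail:

```python
import re

sample = "A{2-3} A8s"

# {2-3} is not a valid quantifier, so it is matched as literal characters.
literal = re.findall(r"[A-Za-z\d]{2-3}", sample)

# {2,3} is a real quantifier: two or three characters from the class.
quantified = re.findall(r"[A-Za-z\d]{2,3}", sample)

print(literal)     # ['A{2-3}']
print(quantified)  # ['A8s']
```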
You probably want to wrap this in a capturing group, like this:
([A-Za-z\d]{2,3})
And then replace it with a reference to what was captured, but surrounded by quotes, similar to this:
"$1",


Preserve all groups in regexp

I have a question regarding regexp. I have text like this:
embedded-software-entwickler
Basically, I want to replace each - with something else while preserving the groups, so that I can easily write $1#$2#$3 with # as the replacement for -.
My current regexp is ([a-zäöüß]+)(-), but it does not match the third word, entwickler.
How about something simple like this:
([\w]*?)-([\w]*?)-([\w]*)
Replace with:
$1#$2#$3
Here we match word characters with \w, using the lazy quantifier *? for the first two groups and the greedy quantifier * for the last one, and separate the groups with -.
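As a quick check of the three-group pattern, here is a minimal sketch assuming Python's re module, where the replacement is written \1#\2#\3 instead of $1#$2#$3:

```python
import re

text = "embedded-software-entwickler"
pattern = r"([\w]*?)-([\w]*?)-([\w]*)"

# Each of the three words ends up in its own capture group.
groups = re.match(pattern, text).groups()

# The groups are then joined back together with # instead of -.
result = re.sub(pattern, r"\1#\2#\3", text)

print(groups)  # ('embedded', 'software', 'entwickler')
print(result)  # embedded#software#entwickler
```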
If you would like to include spaces, numbers, special characters, etc. in each section, you can use something like this:
([\s\S]*?)-([\s\S]*?)-([\s\S]*)
If you prefer something dynamic, you could try something like this:
([^\-]+)-
Replace with:
$1#
Demo: https://regex101.com/r/p6zQTO/1/
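The dynamic variant works for any number of segments because the substitution is applied to every match. A minimal sketch, assuming Python's re module (replace_dashes is a hypothetical helper name used only for illustration):

```python
import re

def replace_dashes(text: str) -> str:
    # Each run of non-dash characters followed by a dash is rewritten
    # as the same run followed by #. The last segment has no trailing
    # dash, so it is left untouched.
    return re.sub(r"([^\-]+)-", r"\1#", text)

print(replace_dashes("embedded-software-entwickler"))  # embedded#software#entwickler
print(replace_dashes("a-b-c-d-e"))                     # a#b#c#d#e
```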
Alternative way to match each group plus the replacement:
([^\-]*)-([^\-]*)
Replace with:
$1#$2
Demo: https://regex101.com/r/p6zQTO/2/
If all you need is to change every '-' into '#', a transliteration such as tr/-/#/ would be a simpler and better substitution.
If you need to group and extract for other purposes, then try something like /(\w+)(?:-(\w+))*/
((?:...) groups but doesn't capture.)
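For comparison, here is a rough Python sketch of both ideas. The use of str.replace and str.split in place of tr is an assumption for illustration. One caveat worth knowing: when a capture group sits inside a repeated group, Python's re keeps only the last repetition.

```python
import re

text = "embedded-software-entwickler"

# Plain character replacement, no regex needed.
replaced = text.replace("-", "#")

# Splitting gives every segment, however many there are.
parts = text.split("-")

# A repeated capture group only remembers its final iteration.
last_only = re.fullmatch(r"(\w+)(?:-(\w+))*", text).groups()

print(replaced)   # embedded#software#entwickler
print(parts)      # ['embedded', 'software', 'entwickler']
print(last_only)  # ('embedded', 'entwickler')
```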

How do you "quantify" a variable number of lines using a regexp?

Say you know the starting and ending lines of some section of text, but the chars in some lines and the number of lines between the starting and ending lines are variable, à la:
aaa
bbbb
cc
...
...
...
xx
yyy
Z
What quantifier do you use, something like:
aaa\nbbbb\ncc\n(.*\n)+xx\nyyy\nZ\n
to parse those sections of text as a group?
You can use the s flag to match multiline text, like this:
~\w+ ~s.
There is a similar question here:
Javascript regex multiline flag doesn't work
If I understood correctly, you know that your text begins with aaa\nbbbb\ncc and ends with xx\nyyy\nZ\n. You could use aaa.+?bbbb.+?cc(.+?)xx.+?yyy.+?Z, where every quantifier is non-greedy so that you don't accidentally capture two sections at once. The text in between would be in match group 1. You also need to turn on the setting that makes the dot match newlines.
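A minimal sketch of this approach, assuming Python's re module, where the "dot matches newline" setting is the re.DOTALL flag (the middle lines first, second, third are made-up sample data):

```python
import re

text = "aaa\nbbbb\ncc\nfirst\nsecond\nthird\nxx\nyyy\nZ\n"

# re.DOTALL lets . match newlines, so .+? can span several lines.
pattern = re.compile(r"aaa.+?bbbb.+?cc(.+?)xx.+?yyy.+?Z", re.DOTALL)

match = pattern.search(text)

# Group 1 holds everything between the known start and end lines.
middle_lines = match.group(1).strip().splitlines()

print(middle_lines)  # ['first', 'second', 'third']
```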
Try this:
aaa( |\n)bbbb( |\n)cc( |\n)( |\n){0,1}(.|\n)*xx( |\n)yyy( |\n)Z
( |\n) matches a space or a newline (so your starting and ending phrases can be split into different lines)
At the end of the day what worked for me using Kate was:
( )+aaa\n( )+bbbb\n( )+cc\n(.|\n)*( )+xx\n( )+yyy\n( )+Z\n
Using such regexps, you can clear pages of quite a bit of junk.

I can find the expressions but I don't want to replace everything of it (regex in vim)

I have been trying for some time now, but I can't figure it out. I have text that looks like this:
xxx "fy1":
xxx.xxx = fy;
xxx;
xxx "tm1":
xxx.xxx = tm;
xxx;
...
And I want it to look like this:
xxx "fy1":
xxx.xxx = fy1;
xxx;
...
My problem is that I can find all occurrences where I want to put a "1" with
s/[ft][ymdhi];/???/g
but everything I put in place of the question marks replaces the letters too. The only thing I want to do is put a number after those two letters.
I thought about something with
\w{2}
in the search string too, but that finds everything with two letters before a semicolon, so I think I need the
[ft][ymdhi]
Thanks in advance!
You can use capture groups so that the replacement can reference the matched text. Alternatively, you can use Vim's \zs to leave part of the matched text untouched.
For example, this line does the job:
%s/[ft][ymdhi]\zs;/1;/g
See :h \zs and :h \ze for details.
If you add \ze to this example too, it would be:
%s/[ft][ymdhi]\zs\ze;/1/g
It works too.
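Outside Vim, a zero-width lookbehind plays a similar role to \zs. The following Python sketch is an analogue under that assumption, not the Vim command itself:

```python
import re

text = 'xxx "fy1":\nxxx.xxx = fy;\nxxx;\nxxx "tm1":\nxxx.xxx = tm;\nxxx;\n'

# (?<=...) checks that the two letters precede the semicolon without
# making them part of the match, so only ";" is replaced by "1;".
result = re.sub(r"(?<=[ft][ymdhi]);", "1;", text)

print(result)
```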
You want to use \(...\) along with \1:
:%s/\([ft][ymdhi]\);/\11;/
The escaped parentheses \(...\) store what was matched between them. The stored text can later be used again with \1.
In your case the \11 has nothing to do with eleven; it's just a coincidence that the text you remembered with the parentheses is followed by a 1.
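Other engines treat this case differently. As a point of comparison (a sketch assuming Python's re module, not Vim), Python would read \11 as a reference to group 11, so the backreference has to be written as \g<1> when a digit follows it:

```python
import re

text = "xxx.xxx = fy;"

# \g<1> is group 1; the literal 1 after it is just text.
result = re.sub(r"([ft][ymdhi]);", r"\g<1>1;", text)

print(result)  # xxx.xxx = fy1;
```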

Regex validation of filename failing

I'm trying to validate a filename having letters "CAT" or "DOG" followed by 8 numerics, and ending in ".TXT".
Examples:
CAT20000101.TXT
DOG20031212.TXT
This would NOT match:
ATA12330000.TXT
CAT200T0101.TXT
DOG20031212.TX1
Here's the regex I am trying to make work:
(([A-Z]{3})([0-9]{8})([\.TXT]))\w+
Why is the last section (.TXT) failing against non-matching file extensions?
See example: http://regexr.com/3a7fo
Inside a character class there is no grouping, so [\.TXT] matches a single character (., T, or X) rather than the literal string .TXT.
You can use this regex:
^[A-Z]{3}[0-9]{8}\.TXT$
For only matching CAT and DOG use:
^(CAT|DOG)[0-9]{8}\.TXT$
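The anchored pattern can be checked against the examples from the question. This is a minimal sketch assuming Python's re module; is_valid_filename is a hypothetical helper name:

```python
import re

FILENAME_RE = re.compile(r"^(CAT|DOG)[0-9]{8}\.TXT$")

def is_valid_filename(name: str) -> bool:
    # True only for CAT or DOG, then exactly 8 digits, then .TXT
    return FILENAME_RE.match(name) is not None

for name in ["CAT20000101.TXT", "DOG20031212.TXT",
             "ATA12330000.TXT", "CAT200T0101.TXT", "DOG20031212.TX1"]:
    print(name, is_valid_filename(name))
```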
Lose the unnecessary parentheses:
[A-Z]{3}[0-9]{8}[\.TXT]\w+
Lose the unnecessary, pattern-breaking character class [] around \.TXT:
[A-Z]{3}[0-9]{8}\.TXT\w+
Lose the \w+ at the end:
[A-Z]{3}[0-9]{8}\.TXT
Change [A-Z]{3} to (?:CAT|DOG):
(?:CAT|DOG)[0-9]{8}\.TXT
Voilà.
It's failing because \.TXT is in square brackets, which makes it a character class that matches only a single character from the set (., T, or X). Just use (\.TXT).
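To see why the original pattern accepts a wrong extension, it helps to look at what each group captured. A small sketch, assuming Python's re module:

```python
import re

original = re.compile(r"(([A-Z]{3})([0-9]{8})([\.TXT]))\w+")

match = original.search("DOG20031212.TX1")

# The character class consumes only the dot; the trailing \w+ then
# swallows "TX1", so the bad extension is accepted.
print(match.group(4))  # .
print(match.group(0))  # DOG20031212.TX1
```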
Remove the square brackets: change [\.TXT] to \.TXT
Your example, modified: http://regexr.com/3a7fu

regex for match inside a word

Say I have the following similar texts:
_startOneEnd
_startTwoEnd
_startThreeEnd
I want to match on:
begins with _start
ends with End
and I want to capture the bit in between, e.g. One, Two, Three in the examples above.
Can anyone suggest a regex to capture this?
If each line of input contains only the text similar to your examples, something like this should work:
/^_start(.*)End$/
The ^ anchors the pattern to the start of the string. The $ anchors it to the end of the string. The parentheses capture the middle part.
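Applied to the three examples, the anchored pattern behaves as follows (a minimal sketch assuming Python's re module, with the surrounding / delimiters dropped):

```python
import re

names = ["_startOneEnd", "_startTwoEnd", "_startThreeEnd"]

# group(1) is whatever sits between the fixed prefix and suffix.
captured = [re.match(r"^_start(.*)End$", name).group(1) for name in names]

print(captured)  # ['One', 'Two', 'Three']
```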
In C#, you may use this:
(?<=_start).*(?=End)
It isn't clear if the part in the middle may only be the examples given.
If so, use this:
_start((One)|(Two)|(Three))End
If not, and it can be anything, try this:
_start(.*?)End
Note that the match is non-greedy.
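The non-greedy quantifier matters when several occurrences share one line. A short sketch, assuming Python's re module and a made-up input with two occurrences:

```python
import re

text = "_startOneEnd _startTwoEnd"

# Lazy: stops at the first End after each _start.
lazy = re.findall(r"_start(.*?)End", text)

# Greedy: runs to the last End on the line, merging both occurrences.
greedy = re.findall(r"_start(.*)End", text)

print(lazy)    # ['One', 'Two']
print(greedy)  # ['OneEnd _startTwo']
```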