How to select the first space only, not including other characters - regex

I want to fix some formatted strings with 'find and replace' in Visual Studio Code.
To do so, I have to select only the first space in each line, without the characters before it.
The format goes like this :
dc932a17 3919734822 5234dce7debe.mp4
e_f943 4961243553 03be639fa8b7.mp4
9cbcc2 4365389628 e741018829d6.mp4
543419d 4639618462 d0bd72c9b737.mp4
Desired outputs look like :
dc932a17-3919734822 5234dce7debe.mp4
e_f943-4961243553 03be639fa8b7.mp4
9cbcc2-4365389628 e741018829d6.mp4
543419d-4639618462 d0bd72c9b737.mp4
So what I want to select are :
dc932a17 3919734822 5234dce7debe.mp4
|------|^These spaces
^Not these characters
So I made a regex like this :
^(?:([a-zA-Z0-9_]+))\s
But this selects all the characters before the first space as well as the space itself.
dc932a17 3919734822 5234dce7debe.mp4
|-------|
^Selected
Is there anything I got wrong?
condition:
The characters' length before spaces vary. I can't use alt+shift+Drag selection

You can use
Find: ^(\S+)\s+
Replace: $1-
See the regex demo.
Note: If there are any leading whitespace chars on the qualifying line, you need to add \s* after ^(, i.e. ^(\s*\S+)\s+.
Details:
^ - start of a line
(\s*\S+) - Group 1 ($1): zero or more whitespace chars (but not line break chars here since the pattern does not contain \r or \n) and then one or more non-whitespace chars
\s+ - one or more whitespaces (except line break chars).
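
For readers who want to try the same replacement outside the editor, here is a minimal Python sketch. The function name `join_first_two_fields` is made up for illustration. One assumption to note: in Python's `re` module `\s` can match line breaks, so the sketch uses `[^\S\n]+` (whitespace other than a newline) to keep each match on one line.

```python
import re

lines = """dc932a17 3919734822 5234dce7debe.mp4
e_f943 4961243553 03be639fa8b7.mp4
9cbcc2 4365389628 e741018829d6.mp4
543419d 4639618462 d0bd72c9b737.mp4"""

# ^(\S+)      - Group 1: the first run of non-whitespace chars on a line
# [^\S\n]+    - the whitespace right after it, excluding newlines
FIRST_GAP = re.compile(r"^(\S+)[^\S\n]+", re.MULTILINE)


def join_first_two_fields(text):
    """Replace the first whitespace gap on every line with a hyphen."""
    return FIRST_GAP.sub(r"\1-", text)


print(join_first_two_fields(lines))
```

Only the first gap changes because `^` anchors every match to the start of a line, so the second space on each line is never reached.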

Related

KVP extraction but allow the separator in the value

I have a string of key-value pairs separated by a space character (I believe only ever one), but the values may themselves contain spaces and other whitespace (e.g. newlines, tabs), e.g.
a=1 b=cat c=1 and 2 d=3
becomes:
a=1
b=cat
c=1 and 2
d=3
i.e. I want to extract all the pairs as groups.
I cannot figure out the regex. My sample doesn't include newline but that could also happen
I've tried the basics like:
(.+?=.+?)
\s?([^\s]+)
but these fail with spaces and newlines. I'm also writing code around it, so I can tidy up any leading/trailing characters where needed; I'd just rather do it in regex than scan one character at a time.
You can use
([^\s=]+)=([\w\W]*?)(?=\s+[^\s=]+=|$)
See the regex demo. Details:
([^\s=]+) - Group 1: one or more chars other than whitespace and = char
= - a = char
([\w\W]*?) - Group 2: any zero or more chars, as few as possible
(?=\s+[^\s=]+=|$) - a positive lookahead that requires one or more whitespaces followed with one or more chars other than whitespace and = followed with = or end of string immediately to the right of the current location.
A better idea to match any character instead of [\w\W] is by using a . and the singleline/dotall modifier (if supported, see How do I match any character across multiple lines in a regular expression?), here is an example:
(?s)([^\s=]+)=(.*?)(?=\s+[^\s=]+=|$)
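
As a quick check of the pattern above, here is a minimal Python sketch. The helper name `parse_pairs` is hypothetical, and `re.DOTALL` plays the role of the `(?s)` modifier.

```python
import re

# Key, "=", then the shortest value that is followed either by
# whitespace + another "key=" or by the end of the string.
PAIR = re.compile(r"([^\s=]+)=(.*?)(?=\s+[^\s=]+=|$)", re.DOTALL)


def parse_pairs(text):
    """Return a list of (key, value) tuples found in text."""
    return PAIR.findall(text)


print(parse_pairs("a=1 b=cat c=1 and 2 d=3"))
# A value may also contain a newline:
print(parse_pairs("a=1 b=x\ny c=2"))
```

The lookahead is what lets a value keep its inner spaces: the value only ends where the next `key=` begins.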

RegEx string to find two strings and delete the rest of the text in the file including lines that don't contain the strings [duplicate]

I need to find certain words and delete the rest in a text file with Notepad++.
I want to use regex to find variations on thban..... The variable always has at most 5 chars after it (see dots).
My search string hits the last line, but it matches the whole line. I just want the word preserved.
When this works, I also want to keep the words containing C3.....
The rest of the text file can be deleted.
It should also be case-insensitive.
(?!thban\w+).*\r?\n?
THBANES900 and C3950 bla bla
THBAN
..THBANES901.. C3850 bla bla
THBANMP900
**..thbanes900..**
This should result in
THBANES900 C3950
THBAN
THBANES901 C3850
THBANMP900
thbanes900
Maybe just capture those words of interest instead of replacing everything else? In Notepad++ search for pattern:
^.*\b(thban\S{0,5})(?:.*(\sC3\w+))?.*$|.+
See the Online Demo
^ - Start string anchor.
.*\b - Any character other than newline zero or more times upto a word-boundary.
( - Open 1st capture group.
thban\S{0,5} - Match "thban" and zero to 5 non-whitespace chars.
) - Close 1st capture group.
(?: - Open non-capturing group.
.* - Any character other than newline zero or more times.
( - Open 2nd capture group.
\sC3\w+ - A whitespace character, then "C3" and one or more word characters.
) - Close 2nd capture group.
)? - Close non-capturing group and make it optional.
.* - Any character other than newline zero or more times.
$ - End string anchor.
| - Alternation (OR).
.+ - Any character other than newline once or more.
Replace with:
$1$2
After this, you may end up with empty lines, which you can swiftly remove using the built-in option. I'm unaware of the English terms, so I made a GIF to show you where to find these buttons:
In the English UI the relevant checkbox is "Match case". Make sure it is not ticked, so that the search ignores case.
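
To experiment with this pattern outside Notepad++, here is a minimal Python sketch. The function name `keep_words` and the extra "some unrelated line" sample line are made up for illustration. It assumes `re.IGNORECASE` stands in for the unticked case option and `re.MULTILINE` makes `^` and `$` work per line.

```python
import re

sample = """THBANES900 and C3950 bla bla
THBAN
..THBANES901.. C3850 bla bla
THBANMP900
**..thbanes900..**
some unrelated line"""

PATTERN = re.compile(
    r"^.*\b(thban\S{0,5})(?:.*(\sC3\w+))?.*$|.+",
    re.IGNORECASE | re.MULTILINE,
)


def keep_words(text):
    """Keep only the thban... word and the optional C3... word of each line."""
    # Equivalent of replacing with $1$2; a group that did not match counts as "".
    replaced = PATTERN.sub(
        lambda m: (m.group(1) or "") + (m.group(2) or ""), text
    )
    # Drop the empty lines left behind by lines that had nothing to keep.
    return "\n".join(line for line in replaced.split("\n") if line)


print(keep_words(sample))
```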
You may use
Find What: (?|\b(thban\S{0,5})|\s(C3\w+))|(?s:.)
Replace With: (?1$1\n:)
Screenshot & settings
Details
(?| - start of a branch reset group:
\b(thban\S{0,5}) - Group 1: a word boundary, then thban and any 0 to 5 non-whitespace chars
| - or
\s(C3\w+) - a whitespace char, and then Group 1: C3 and one or more word chars
) - end of the branch reset group
| - or
(?s:.) - any one char (including line break chars)
The replacement is
(?1 - if Group 1 matched,
$1\n - Group 1 value with a newline
: - else, replace with empty string
) - end of the conditional replacement pattern
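
Python's `re` module has neither branch reset groups nor conditional replacement, so the following is only a rough sketch of the same idea, with assumptions stated: two ordinary capture groups replace the branch reset group, and a callback (the hypothetical `extract_words`) plays the role of `(?1$1\n:)`.

```python
import re

# Alternative 1: a thban... word; alternative 2: whitespace + a C3... word;
# alternative 3: any single char (including line breaks), which gets dropped.
TOKEN = re.compile(r"\b(thban\S{0,5})|\s(C3\w+)|[\s\S]", re.IGNORECASE)


def extract_words(text):
    """Emit each kept word followed by a newline and delete everything else."""
    def repl(match):
        word = match.group(1) or match.group(2)
        return word + "\n" if word else ""
    return TOKEN.sub(repl, text)


print(extract_words("THBANES900 and C3950 bla bla\nTHBAN\n"))
```

Unlike the line-based approach, this puts every kept word on its own line, which matches what the `$1\n` replacement does.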

How to clear lines after the last regex match

I have a huge log of records that I need to turn into a table.
Each line has a record, preceded by date and time, something like this:
27/11/2019 16:35 - i don't need this
28/11/2019 17:25 - don't need this either
30/11/2019 11:33 - stuff i'm looking for
01/12/2019 08:11 - stuff that i'm also looking for
03/11/2019 09:39 - don't need this
I want to completely clear the file from all the lines that I don't need.
I'm able to clear most of the lines that I don't want if I use the following regex and substitution patterns (in notepad++, using the flag in which dot matches newline):
.+?(?<datetime>[\d\/]+\s[\d:]+)\s-\s(?<mystuff>stuff[^\n]+)
'${datetime};${mystuff}
However, I can't clear the lines after the last match. How could I do so?
You may use
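Find What: ^(?:.*?([\d/]+\h[\d:]+)\h-\h(stuff.*)|.*\R?)
Replace With: (?{1}$1;$2)
Details
^ - start of a line
(?:.*?([\d/]+\h[\d:]+)\h-\h(stuff.*)|.*\R?) - match either
.*? - any 0+ chars, as few as possible (with .+? the first digit of a date at the very start of a line would be consumed and lost from Group 1)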
([\d/]+\h[\d:]+) - Group 1: one or more digits or /, a horizontal whitespace, one or more digits or :
\h-\h - a horizontal whitespace, - and a hor. whitespace
(stuff.*) - Group 2: stuff and the rest of the line
| - or
.* - any 0+ chars other than linebreak chars
\R? - an optional line break sequence.
The (?{1}$1;$2) replacement pattern only replaces with $1;$2 if Group 1 matches.
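
If the log is processed with a script rather than in the editor, the same filtering can be sketched in Python. This is a minimal sketch under assumptions: the function name `keep_records` is made up, `[ \t]` stands in for `\h` (which Python's `re` does not support), and the file is handled line by line, so lines after the last match are dropped like any other non-matching line.

```python
import re

log_text = """27/11/2019 16:35 - i don't need this
28/11/2019 17:25 - don't need this either
30/11/2019 11:33 - stuff i'm looking for
01/12/2019 08:11 - stuff that i'm also looking for
03/11/2019 09:39 - don't need this"""

# Group 1: date and time; Group 2: "stuff" and the rest of the line.
RECORD = re.compile(r"([\d/]+[ \t][\d:]+)[ \t]-[ \t](stuff.*)")


def keep_records(text):
    """Return 'datetime;stuff...' rows for matching lines only."""
    rows = []
    for line in text.splitlines():
        match = RECORD.search(line)
        if match:
            rows.append(f"{match.group(1)};{match.group(2)}")
    return rows


for row in keep_records(log_text):
    print(row)
```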
See the Notepad++ demo:

regular expression to capture groups in selected lines

I have the multi-line string below (in Python) and I'm looking for a regex to extract src, dst and severity. So in the example below, group 1 would be '10.4.180.25', group 2 '34.23.21.10' and group 3 'critical'
src: 10.4.180.25
dst: 34.23.21.10
natsrc: 20.160.129.5
natdst: 34.33.21.10
... more lines
severity: critical
... more lines
If I try a regex like /src: (\b\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\b)\ndst: (\b\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\b)\n/ with gm flags, it finds src and dst but not severity, which is a few lines down (lines omitted for clarity). Is there a way to do it without including all of the lines between src, dst and severity?
You need to actually match any number of lines that do not start with severity after what your pattern matches. Besides, you may shorten the pattern by using the {3} limiting quantifier in order not to repeat \.\d{1,3} so many times. Note that between a whitespace and a digit, the word boundary is implicit; it is already there, so there is no need to use \b.
Use
src:\s*(\d{1,3}(?:\.\d{1,3}){3})\ndst:\s*(\d{1,3}(?:\.\d{1,3}){3})(?:\n(?!severity).+)*?\nseverity:\s*(.+)
See the regex demo
Details
src: - a literal substring
\s* - 0+ whitespaces
(\d{1,3}(?:\.\d{1,3}){3}) - Group 1: IP-like pattern
\n - a newline
dst:\s* - dst: with 0+ whitespaces after it
(\d{1,3}(?:\.\d{1,3}){3}) - Group 2: IP-like pattern
(?:\n(?!severity).+)*? - 0+ sequences (as few as possible) of
\n(?!severity) - a newline not followed with severity
.+ - the whole line
\nseverity:\s* - a newline, severity: substring and 0+ whitespaces
(.+) - Group 3: 1 or more chars up to the end of the line
Note you do not need any DOTALL modifier with this regex.
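
Since the question is about Python, here is a minimal sketch of how the pattern could be applied with the `re` module. The function name `extract_fields` is made up, and the `proto` and `action` lines are hypothetical filler standing in for the "... more lines" placeholders.

```python
import re

# "proto" and "action" are made-up filler lines, not from the question.
record = """src: 10.4.180.25
dst: 34.23.21.10
natsrc: 20.160.129.5
natdst: 34.33.21.10
proto: tcp
severity: critical
action: alert
"""

PATTERN = re.compile(
    r"src:\s*(\d{1,3}(?:\.\d{1,3}){3})\n"   # Group 1: src IP
    r"dst:\s*(\d{1,3}(?:\.\d{1,3}){3})"     # Group 2: dst IP
    r"(?:\n(?!severity).+)*?"               # skip lines not starting with severity
    r"\nseverity:\s*(.+)"                   # Group 3: severity value
)


def extract_fields(text):
    """Return (src, dst, severity) or None when the record does not match."""
    match = PATTERN.search(text)
    return match.groups() if match else None


print(extract_fields(record))
```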
You can use a lazy (non-greedy) quantifier in the regex to do this:
src: (\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})\ndst: (\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})[\s\S]*?severity: (.+)?\n
I have updated the regex so it actually works now!
It searches for the same bit you have, but then, as there are many lines between the dst: line and the severity line, we need to skip all these lines.
To match any number of lines up to the line beginning with severity:, we need to match any characters, including newlines. To do this, we can use a character class: [\s\S]. This matches any character that is whitespace or is not whitespace, i.e. all characters. We then give it a lazy quantifier so that it matches only as many characters as needed to reach the severity: line, so this bit is [\s\S]*?severity:.
Now that we are at the severity: line, we want to match and return the characters up to the end of that line (up to the newline \n character). This is done with the similar (.+)?\n syntax, but with a plus as we want to match one or more characters. Also, as we want to return this bit, we need to put it in parentheses.
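
The `*?` in `[\s\S]*?` is a lazy quantifier: it stops at the first `severity:` it can reach, whereas a plain `*` would run on to the last one. A minimal Python sketch of the difference, using made-up sample text with two severity lines:

```python
import re

# Hypothetical text with two "severity:" lines to show where each variant stops.
text = "dst: 1.1.1.1\nfoo\nseverity: low\nbar\nseverity: high\n"

lazy = re.search(r"dst: .+[\s\S]*?severity: (.+)", text)
greedy = re.search(r"dst: .+[\s\S]*severity: (.+)", text)

print(lazy.group(1))    # value from the first severity line
print(greedy.group(1))  # value from the last severity line
```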

Sublime Regex extract

<.*>|\n.*\s.*\sid="(\w*)".*\n+|.*>\n|\n.+
and replace $1
This regex extracts all the id values from a file, e.g. from
<a href="java" class="total" id="maker" placeholder="getTheResult('local6')">master6<a>
Result is maker
How can I extract the getTheResult key name instead,
so that my result will be local6?
I tried <.*>|\n.*\s.*\sgetTheResult('(\w*)').*\n+|.*>\n|\n.+ but it didn't help
I assume that:
you have files with text like getTheResult('local6')
you may have several values like that on a line
you'd like to keep those text only, one value per line.
I suggest
getTheResult\('([^']*)'\)|(?:(?!getTheResult\(')[\s\S])*
and replace with $1\n. The \n will insert a newline between the values. You can then use ^\n regex (to replace with empty string) to remove empty lines.
Pattern details:
getTheResult\(' - matches getTheResult(' as a literal string (note the ( is escaped)
([^']*) - Group 1 capturing 0+ chars other than '
'\) - a literal ')
| - or
(?:(?!getTheResult\(')[\s\S])* - 0+ chars that are not starting chars of the getTheResult(' character sequence (this is a tempered greedy token).
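
When the extraction is done in a script instead of an editor's find-and-replace, the "delete everything else" half of the pattern is not needed: collecting Group 1 is enough. A minimal Python sketch, where the function name `extract_keys` is made up for illustration:

```python
import re

html = """<a href="java" class="total" id="maker" placeholder="getTheResult('local6')">master6<a>"""

# Same first alternative as above: the literal getTheResult(' ... ') with the key in Group 1.
KEY = re.compile(r"getTheResult\('([^']*)'\)")


def extract_keys(text):
    """Return every key passed to getTheResult('...') in text."""
    return KEY.findall(text)


print(extract_keys(html))
```

Several values on one line are handled as well, since `findall` returns every non-overlapping match.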