How to select the first space only, not including other characters


I want to fix some formatted strings with Find and Replace in Visual Studio Code.
To do so, I need to select only the first space on each line, without the characters before it.
The format looks like this:
dc932a17 3919734822 5234dce7debe.mp4
e_f943 4961243553 03be639fa8b7.mp4
9cbcc2 4365389628 e741018829d6.mp4
543419d 4639618462 d0bd72c9b737.mp4
The desired output looks like this:
dc932a17-3919734822 5234dce7debe.mp4
e_f943-4961243553 03be639fa8b7.mp4
9cbcc2-4365389628 e741018829d6.mp4
543419d-4639618462 d0bd72c9b737.mp4
So what I want to select is:
dc932a17 3919734822 5234dce7debe.mp4
|------|^These spaces
^Not these characters
So I wrote a regex like this:
^(?:([a-zA-Z0-9_]+))\s
But this selects all the characters before the first space, as well as the space itself.
dc932a17 3919734822 5234dce7debe.mp4
|-------|
^Selected
Is there anything I got wrong?
Condition:
The number of characters before the first space varies, so I can't use Alt+Shift+drag (column) selection.

You can use
Find: ^(\S+)\s+
Replace: $1-
See the regex demo.
Note: If there are any leading whitespace chars on the qualifying line, you need to add \s* after ^(, i.e. ^(\s*\S+)\s+.
Details:
^ - start of a line
(\s*\S+) - Group 1 ($1): zero or more whitespace chars (but not line break chars here since the pattern does not contain \r or \n) and then one or more non-whitespace chars
\s+ - one or more whitespaces (except line break chars).
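If you need the same fix in a script rather than in the VS Code dialog, here is a minimal Python sketch. One assumption to flag: Python's \s also matches \n, so under re.MULTILINE the sketch uses [ \t] instead of \s to keep every match on its own line (VS Code gets that for free because the pattern contains no \n). The optional leading [ \t]* covers the leading-whitespace variant from the note above. The helper name dash_first_space is hypothetical.

```python
import re

# Group 1: optional leading blanks plus the first run of non-whitespace chars.
# [ \t] stands in for \s so that a match can never run across a line break.
FIRST_GAP = re.compile(r"^([ \t]*\S+)[ \t]+", re.MULTILINE)


def dash_first_space(text: str) -> str:
    """Replace the first run of spaces/tabs on every line with '-' (hypothetical helper)."""
    return FIRST_GAP.sub(r"\1-", text)


sample = """dc932a17 3919734822 5234dce7debe.mp4
e_f943 4961243553 03be639fa8b7.mp4
9cbcc2 4365389628 e741018829d6.mp4
543419d 4639618462 d0bd72c9b737.mp4"""

print(dash_first_space(sample))
```

Because ^ anchors each match to a line start, only the first gap on each line is touched; the second space stays as it is.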

Related

KVP extraction but allow the separator in the value

I have a key-value pair string in which the pairs are separated by a space character (I believe only a single space will ever occur), but the values may also contain spaces and other whitespace (e.g. newlines, tabs), e.g.
a=1 b=cat c=1 and 2 d=3
becomes:
a=1
b=cat
c=1 and 2
d=3
I want to extract each pair as a group.
I cannot figure out the regex. My sample doesn't include a newline, but one could also occur.
I've tried the basics like:
(.+?=.+?)
\s?([^\s]+)
but these fail with spaces and newlines. Since I'm processing this in code anyway, I can tidy up any leading/trailing characters where needed; I'd just rather do it with a regex than scan one character at a time.

You can use
([^\s=]+)=([\w\W]*?)(?=\s+[^\s=]+=|$)
See the regex demo. Details:
([^\s=]+) - Group 1: one or more chars other than whitespace and = char
= - a = char
([\w\W]*?) - Group 2: any zero or more chars, as few as possible
(?=\s+[^\s=]+=|$) - a positive lookahead that requires, immediately to the right of the current location, either one or more whitespace chars followed by one or more chars other than whitespace and = and then an = char, or the end of the string.
A better way to match any character than [\w\W] is to use . with the singleline/DOTALL modifier (if supported; see How do I match any character across multiple lines in a regular expression?), for example:
(?s)([^\s=]+)=(.*?)(?=\s+[^\s=]+=|$)
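A minimal Python sketch of the (?s) version, using re.finditer to collect every pair. It returns (key, value) tuples rather than a dict so that a repeated key is not silently overwritten; the name parse_pairs is hypothetical.

```python
import re

# (?s) lets . match newlines, so a value may span several lines.
# Group 1: the key (no whitespace, no '='); Group 2: the value, as short as possible,
# ending where whitespace + another "key=" begins, or at the end of the string.
KVP = re.compile(r"(?s)([^\s=]+)=(.*?)(?=\s+[^\s=]+=|$)")


def parse_pairs(text: str) -> list:
    """Return a list of (key, value) tuples, in order of appearance."""
    return [m.groups() for m in KVP.finditer(text)]


print(parse_pairs("a=1 b=cat c=1 and 2 d=3"))
print(parse_pairs("a=1 b=line one\nline two c=3"))
```

The lazy (.*?) keeps extending the value only until the lookahead sees whitespace followed by a new key=, which is why "1 and 2" stays together: "and" and "2" are not followed by =.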

RegEx string to find two strings and delete the rest of the text in the file including lines that don't contain the strings [duplicate]

I need to find some words in a text file with Notepad++ and delete the rest.
I want to use a regex to find variations of thban....., where at most 5 characters always follow thban (see the dots).
My search string hits the last line, but it matches the whole line. I just want the word preserved.
Once this works, I also want to keep the words containing C3.....
The rest of the text file can be deleted.
It should also be case-insensitive.
My current pattern:
(?!thban\w+).*\r?\n?
Sample input:
THBANES900 and C3950 bla bla
THBAN
..THBANES901.. C3850 bla bla
THBANMP900
**..thbanes900..**
This should result in:
THBANES900 C3950
THBAN
THBANES901 C3850
THBANMP900
thbanes900

Maybe just capture the words of interest instead of replacing everything else? In Notepad++, search for the pattern:
^.*\b(thban\S{0,5})(?:.*(\sC3\w+))?.*$|.+
See the Online Demo
^ - Start-of-line anchor.
.*\b - Any character other than newline zero or more times, up to a word boundary.
(- Open 1st capture group.
thban\S{0,5} - Match "thban" and zero to 5 non-whitespace chars.
) - Close 1st capture group.
(?: - Open non-capturing group.
.* - Any character other than newline zero or more times.
( - Open 2nd capture group.
\sC3\w+ - A whitespace character, then "C3" and one or more word characters.
) - Close 2nd capture group.
)? - Close non-capturing group and make it optional.
.* - Any character other than newline zero or more times.
$ - End-of-line anchor.
| - Alternation (OR).
.+ - Any character other than newline once or more.
Replace with:
$1$2
After this, you may end up with empty lines, which you can quickly remove with the built-in option (Edit > Line Operations > Remove Empty Lines).
Also make sure the Match case checkbox is not ticked, so the search is case-insensitive.
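Python's re module accepts this pattern unchanged, so the two steps (replace, then drop the empty lines) can be sketched as below. Assumptions: re.IGNORECASE stands in for an unticked Match case, re.MULTILINE gives ^ and $ their per-line meaning as in Notepad++, and Python 3.5+ is needed so the unmatched group 2 in \1\2 becomes an empty string. The helper name keep_thban_words is hypothetical.

```python
import re

PATTERN = re.compile(
    r"^.*\b(thban\S{0,5})(?:.*(\sC3\w+))?.*$|.+",
    re.IGNORECASE | re.MULTILINE,  # case-insensitive; ^ and $ work per line
)


def keep_thban_words(text: str) -> str:
    # Lines with a thban word collapse to "$1$2"; every other non-empty line
    # is matched by .+ and becomes empty (both groups unmatched -> "").
    replaced = PATTERN.sub(r"\1\2", text)
    # Second step: remove the empty lines left behind.
    return "\n".join(line for line in replaced.splitlines() if line)


sample = """THBANES900 and C3950 bla bla
THBAN
..THBANES901.. C3850 bla bla
THBANMP900
**..thbanes900..**"""

print(keep_thban_words(sample))
```

Group 2 keeps its leading \s, which is what puts the single space between THBANES900 and C3950 in the output.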

You may use
Find What: (?|\b(thban\S{0,5})|\s(C3\w+))|(?s:.)
Replace With: (?1$1\n:)
Details
(?| - start of a branch reset group:
\b(thban\S{0,5}) - Group 1: a word boundary, then thban and any 0 to 5 non-whitespace chars
| - or
\s(C3\w+) - a whitespace char, and then Group 1: C3 and one or more word chars
) - end of the branch reset group
| - or
(?s:.) - any one char (including line break chars)
The replacement is
(?1 - if Group 1 matched,
$1\n - Group 1 value with a newline
: - else, replace with empty string
) - end of the conditional replacement pattern
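Python's built-in re module has neither the branch reset group (?|...) nor conditional replacement strings, so a Python version of this idea has to emulate both. In the sketch below (an adaptation, not the answer's exact pattern) the two branches keep separate groups 1 and 2, (?s:.) becomes a plain . under re.DOTALL, and a replacement function plays the role of (?1$1\n:). re.IGNORECASE is assumed because the question asks for case-insensitive matching; the names replace_match and extract_words are hypothetical.

```python
import re

# Branch 1: thban + up to 5 non-whitespace chars; branch 2: a C3 word after whitespace;
# branch 3: any other single char, line breaks included (re.DOTALL).
PATTERN = re.compile(r"\b(thban\S{0,5})|\s(C3\w+)|.", re.IGNORECASE | re.DOTALL)


def replace_match(m):
    # Emulates (?1$1\n:): the matched word plus a newline, otherwise nothing.
    word = m.group(1) or m.group(2)
    return word + "\n" if word else ""


def extract_words(text: str) -> str:
    return PATTERN.sub(replace_match, text)


sample = """THBANES900 and C3950 bla bla
THBAN
..THBANES901.. C3850 bla bla
THBANMP900
**..thbanes900..**"""

print(extract_words(sample))
```

Unlike the line-by-line approach, this one writes every kept word on its own line, so C3950 ends up below THBANES900 instead of next to it.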

How to clear lines after the last regex match

I have a huge log of records that I need to turn into a table.
Each line has a record, preceded by date and time, something like this:
27/11/2019 16:35 - i don't need this
28/11/2019 17:25 - don't need this either
30/11/2019 11:33 - stuff i'm looking for
01/12/2019 08:11 - stuff that i'm also looking for
03/11/2019 09:39 - don't need this
I want to remove all the lines that I don't need from the file.
I'm able to remove most of the unwanted lines with the following regex and substitution patterns (in Notepad++, with the option that lets . match newlines enabled):
.+?(?<datetime>[\d\/]+\s[\d:]+)\s-\s(?<mystuff>stuff[^\n]+)
'${datetime};${mystuff}
However, I can't clear the lines after the last match. How could I do so?

You may use
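Find What: ^(?:.*?([\d/]+\h[\d:]+)\h-\h(stuff.*)|.*\R?)
Replace With: (?{1}$1;$2)
Make sure the . matches newline option is turned off, so that each match stays within one line.
Details
^ - start of a line
(?:.*?([\d/]+\h[\d:]+)\h-\h(stuff.*)|.*\R?) - match either
.*? - any 0+ chars other than line break chars, as few as possible, so a date at the very start of a line is captured whole
([\d/]+\h[\d:]+) - Group 1: one or more digits or /, a horizontal whitespace, one or more digits or :
\h-\h - a horizontal whitespace, - and a horizontal whitespace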
(stuff.*) - Group 2: stuff and the rest of the line
| - or
.* - any 0+ chars other than linebreak chars
\R? - an optional line break sequence.
The (?{1}$1;$2) replacement pattern only replaces with $1;$2 if Group 1 matches.
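Outside Notepad++, the same cleanup can be sketched in Python, with some translation: Python's re supports neither \h nor \R, so [ \t] and (?:\r?\n) stand in for them (an approximation, since \h also covers other horizontal spaces), and a replacement function emulates the conditional (?{1}$1;$2). The lazy .*? before Group 1 lets a date at the very start of a line be captured whole. The helper name keep_stuff is hypothetical.

```python
import re

# No re.DOTALL: . does not match line breaks, so each match stays on one line.
LOG_LINE = re.compile(
    r"^(?:.*?([\d/]+[ \t][\d:]+)[ \t]-[ \t](stuff.*)|.*(?:\r?\n)?)",
    re.MULTILINE,
)


def keep_stuff(log: str) -> str:
    # Emulates (?{1}$1;$2): "date time;stuff" when Group 1 matched; otherwise the
    # whole line, including its line break, is replaced with nothing.
    return LOG_LINE.sub(
        lambda m: f"{m.group(1)};{m.group(2)}" if m.group(1) is not None else "",
        log,
    )


log = """27/11/2019 16:35 - i don't need this
28/11/2019 17:25 - don't need this either
30/11/2019 11:33 - stuff i'm looking for
01/12/2019 08:11 - stuff that i'm also looking for
03/11/2019 09:39 - don't need this"""

print(keep_stuff(log))
```

Matching lines keep their trailing line break (the first alternative stops before it), while unwanted lines lose theirs, which is what also clears the lines after the last match.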

regular expression to capture groups in selected lines

I have the multi-line string below (in Python) and I'm looking for a regex to extract src, dst and severity. In the example below, group 1 should be '10.4.180.25', group 2 '34.23.21.10' and group 3 'critical'.
src: 10.4.180.25
dst: 34.23.21.10
natsrc: 20.160.129.5
natdst: 34.33.21.10
... more lines
severity: critical
... more lines
If I try a regex like /src: (\b\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\b)\ndst: (\b\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\b)\n/ with the gm flags, it finds src and dst but not severity, which is a few lines further down (lines omitted for clarity). Is there a way to do it without spelling out all the lines between src, dst and severity?

You need to actually match any number of lines that do not start with severity after the part your pattern already matches. Besides, you can shorten the pattern by using the {3} limiting quantifier so that you don't have to repeat \.\d{1,3} so many times. Note that between a whitespace and a digit the word boundary is implicit: it is already there, so there is no need to use \b.
Use
src:\s*(\d{1,3}(?:\.\d{1,3}){3})\ndst:\s*(\d{1,3}(?:\.\d{1,3}){3})(?:\n(?!severity).+)*?\nseverity:\s*(.+)
See the regex demo
Details
src: - a literal substring
\s* - 0+ whitespaces
(\d{1,3}(?:\.\d{1,3}){3}) - Group 1: IP-like pattern
\n - a newline
dst:\s* - dst: with 0+ whitespaces after it
(\d{1,3}(?:\.\d{1,3}){3}) - Group 2: IP-like pattern
(?:\n(?!severity).+)*? - 0+ sequences (as few as possible) of
\n(?!severity) - a newline not followed with severity
.+ - the whole line
\nseverity:\s* - a newline, severity: substring and 0+ whitespaces
(.+) - Group 3: 1 or more chars up to the end of the line
Note you do not need any DOTALL modifier with this regex.
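Since the question is about Python, here is a minimal sketch of this pattern with re.search. The IP sub-pattern is factored into a variable only for readability, and the action: and rule: lines are made-up filler standing in for the omitted lines.

```python
import re

IP = r"\d{1,3}(?:\.\d{1,3}){3}"  # IP-like: four 1-3 digit parts separated by dots
PATTERN = re.compile(
    rf"src:\s*({IP})\ndst:\s*({IP})"  # Groups 1 and 2
    r"(?:\n(?!severity).+)*?"          # skip whole lines that don't start with "severity"
    r"\nseverity:\s*(.+)"              # Group 3: the rest of the severity line
)

record = """src: 10.4.180.25
dst: 34.23.21.10
natsrc: 20.160.129.5
natdst: 34.33.21.10
action: allow
severity: critical
rule: example"""

m = PATTERN.search(record)
if m:
    print(m.groups())
```

No flags are needed: the explicit \n tokens carry the match from line to line, and every . stays within a single line.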

You can use a lazy (non-greedy) quantifier to skip the lines in between:
src: (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\ndst: (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})[\s\S]*?severity: (.+)?\n
It searches for the same part you have, but since there are many lines between the dst: line and the severity: line, it then has to skip all of them.
To match any number of lines up to the line beginning with severity:, we need to match any character, including newlines. For that we can use the character class [\s\S], which matches any character that is either whitespace or not whitespace, i.e. every character. We then apply the lazy quantifier *? to it, so it matches only as many characters as needed to reach the first severity: line; this part is [\s\S]*?severity:.
Once we are at the severity: line, we want to match and return the characters up to the end of that line (up to the newline character \n). This is done with (.+)?\n: .+ matches one or more characters, and the parentheses capture them so they are returned.
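Why the lazy *? matters is easy to check in Python. This sketch uses the pattern above (dots escaped) next to a greedy variant, on a made-up input that contains two severity: lines; the IP variable is only a readability helper.

```python
import re

IP = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"  # dots escaped so they match only "."

# [\s\S] matches any character, newlines included.
LAZY = re.compile(rf"src: ({IP})\ndst: ({IP})[\s\S]*?severity: (.+)?\n")
GREEDY = re.compile(rf"src: ({IP})\ndst: ({IP})[\s\S]*severity: (.+)?\n")

text = (
    "src: 10.4.180.25\n"
    "dst: 34.23.21.10\n"
    "natsrc: 20.160.129.5\n"
    "severity: critical\n"
    "src: 10.0.0.1\n"
    "severity: low\n"
)

print(LAZY.search(text).groups())    # stops at the first severity: line
print(GREEDY.search(text).group(3))  # runs on to the last severity: line
```

With *, the engine first consumes the rest of the input and then backtracks to the last severity: it can find; *? grows one character at a time and stops at the first one.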

Sublime Regex extract

This regex, with $1 as the replacement, extracts all id values from a file:
<.*>|\n.*\s.*\sid="(\w*)".*\n+|.*>\n|\n.+
<a href="java" class="total" id="maker" placeholder="getTheResult('local6')">master6<a>
The result is maker.
How can I extract the getTheResult key name, so that my result is local6?
I tried <.*>|\n.*\s.*\sgetTheResult('(\w*)').*\n+|.*>\n|\n.+ but it didn't help.

I assume that:
you have files with text like getTheResult('local6')
you may have several values like that on a line
you'd like to keep those text only, one value per line.
I suggest
getTheResult\('([^']*)'\)|(?:(?!getTheResult\(')[\s\S])*
and replace with $1\n. The \n inserts a newline after each value. You can then replace matches of the ^\n regex with an empty string to remove the empty lines.
Pattern details:
getTheResult\(' - matches getTheResult(' as a literal string (note the ( is escaped)
([^']*) - Group 1 capturing 0+ chars other than '
'\) - a literal ')
| - or
(?:(?!getTheResult\(')[\s\S])* - 0+ chars, each of which does not start a getTheResult(' sequence (this is a tempered greedy token).
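The same two-step approach (replace with $1\n, then drop empty lines) can be sketched in Python, which replaces an unmatched group 1 with an empty string from version 3.5 on. In ordinary code, collecting group 1 with re.findall is a simpler alternative, shown at the end. The helper name extract_keys is hypothetical.

```python
import re

# Branch 1 captures the argument; branch 2 is the tempered greedy token that
# consumes everything up to (but not including) the next "getTheResult('".
PATTERN = re.compile(r"getTheResult\('([^']*)'\)|(?:(?!getTheResult\(')[\s\S])*")


def extract_keys(text: str) -> list:
    # Step 1: replace every match with \1\n (unmatched group 1 -> "").
    replaced = PATTERN.sub(r"\1\n", text)
    # Step 2: drop the empty lines, like the follow-up ^\n replacement.
    return [line for line in replaced.splitlines() if line]


html = '<a href="java" class="total" id="maker" placeholder="getTheResult(\'local6\')">master6<a>'
print(extract_keys(html))

# Simpler in Python: just collect group 1 of the first branch.
print(re.findall(r"getTheResult\('([^']*)'\)", html))
```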
