How to extract the following 2-3 lines? - regex

Hi everyone, I would like to extract the following text. I have provided my regular expression below, but it does not produce the output I want.
Output I want:
Extract the title
Extract the 2nd line below the title, if there is one. If not, move on.
Extract the address (for the address only: regardless of whether it spans a new line or not)
Regular expression:
/(.+?)\s*(\d+.*Singapore\s+\d{6}\b|\d+.*S\d{6})\b/g

/(^.+\n)(^.+\n)?(^\d+.*\sSingapore,?\s\d{6})/gm
(^.+\n) - capture title
() - defines a capture group
^ - matches beginning of the line
.+ - matches 1 or more characters (other than a newline)
\n - matches new line
(^.+\n)? - capture 2nd line
? - matches the group 0 or 1 times (since this line is optional)
(^\d+.*\sSingapore,?\s\d{6}) - capture address
\d+ - matches 1 or more digits
.* - matches any character 0 or more times (change it to .+ if at least one character is required there)
\s - matches a whitespace
Singapore - matches the word Singapore
,? - matches a comma 0 or 1 times (remove ? if comma is required)
\s - matches a whitespace
\d{6} - matches 6 digits
gm
g - global flag, allows you to find multiple matches in the text. Only needed if your text contains more than one set of title/description/address.
m - multiline flag, makes ^ match at the start of every line instead of only at the start of the whole text.
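The pattern can also be tried outside the editor. Below is a minimal Python sketch (Python is an assumption; the question does not name a language): re.MULTILINE stands in for the m flag and re.finditer for the g flag. The sample text and the helper name extract_listings are hypothetical, since the question's original input is not shown.

```python
import re

# Answer's pattern; re.MULTILINE plays the role of the m flag (^ matches at
# every line start) and finditer plays the role of g (return every match).
LISTING = re.compile(r"(^.+\n)(^.+\n)?(^\d+.*\sSingapore,?\s\d{6})", re.MULTILINE)

def extract_listings(text):
    """Return (title, optional second line or None, address) tuples."""
    results = []
    for m in LISTING.finditer(text):
        title = m.group(1).strip()
        second = m.group(2).strip() if m.group(2) else None
        results.append((title, second, m.group(3)))
    return results

# Hypothetical sample input: one entry with a 2nd line, one without.
SAMPLE = (
    "Example Cafe\n"
    "Open daily\n"
    "12 Example Road Singapore 123456\n"
    "\n"
    "Another Shop\n"
    "34 Sample Street, Singapore, 654321\n"
)

for entry in extract_listings(SAMPLE):
    print(entry)
```

Because Group 2 is optional, the second entry still matches and its middle element is None. Note that .* does not cross line breaks, so an address split over two lines would need the pattern adjusted.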

Related

Extract numbers from a 1 line string using regex

I have a one line string that looks like this:
{"took":125,"timed_out":false,"_shards":{"total":10,"successful":10,"skipped":0,"failed":0}}{"took":365,"timed_out":false,"_shards":{"total":10,"successful":10,"skipped":0,"failed":0}}{"took":15,"timed_out":false,"_shards":{"total":10,"successful":10,"skipped":0,"failed":0}}
I would like to extract all the numbers after the "took" part, so in my case the output would look like this:
125
365
15
What I've tried so far is using took":(\d{1,6}),"(.*) as a regex. But since it's a one-line string, it only extracts the first occurrence and ignores the others.
You can use
Find What: took":(\d+)|(?s)(?:(?!took":\d).)*
Replace With: (?{1}$1\n)
Details:
took": - literal text
(\d+) - one or more digits captured into Group 1
| - or
(?s) - set the DOTALL mode on (. matches line break chars now)
(?:(?!took":\d).)* - any single char, zero or more times, as many as possible, that does not start a took": + digit char sequence.
The (?{1}$1\n) conditional replacement pattern replaces this way:
(?{1} - if Group 1 is matched
$1\n - replace the match with Group 1 and a newline
) - else, replace with an empty string.
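The same idea can be sketched in Python (an assumption; the answer targets Notepad++). Python's re has no conditional replacement syntax, so a replacement function does the job of (?{1}$1\n). Python 3.11+ also rejects a global (?s) in the middle of a pattern, so DOTALL is passed as a flag instead. In a script, re.findall alone is usually enough; the function names here are illustrative.

```python
import re

LINE = (
    '{"took":125,"timed_out":false,"_shards":{"total":10,"successful":10,"skipped":0,"failed":0}}'
    '{"took":365,"timed_out":false,"_shards":{"total":10,"successful":10,"skipped":0,"failed":0}}'
    '{"took":15,"timed_out":false,"_shards":{"total":10,"successful":10,"skipped":0,"failed":0}}'
)

# Same pattern as the answer, with (?s) moved into the DOTALL flag.
PATTERN = re.compile(r'took":(\d+)|(?:(?!took":\d).)*', re.DOTALL)

def took_as_lines(s):
    # Emulates (?{1}$1\n): keep Group 1 plus a newline, drop everything else.
    return PATTERN.sub(lambda m: m.group(1) + "\n" if m.group(1) else "", s)

def took_list(s):
    # Simpler in a script: findall returns Group 1 of every match, and
    # nothing consumes the rest of the line, so all occurrences are found.
    return re.findall(r'took":(\d+)', s)

print(took_as_lines(LINE), end="")
print(took_list(LINE))
```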

RegEx string to find two strings and delete the rest of the text in the file including lines that don't contain the strings [duplicate]

I need to find certain strings and delete the rest of a text file with Notepad++.
I want to use a regex to find variations of thban.....; the variable part always has at most 5 characters after it (see dots).
With my search string it hits the last line, but it matches the whole line. I just want the word preserved.
When this works, I also want to keep the words containing C3.....
The rest of the text file can be deleted.
It should also be case-insensitive.
(?!thban\w+).*\r?\n?
THBANES900 and C3950 bla bla
THBAN
..THBANES901.. C3850 bla bla
THBANMP900
**..thbanes900..**
This should result in
THBANES900 C3950
THBAN
THBANES901 C3850
THBANMP900
thbanes900
Maybe just capture the words of interest instead of replacing everything else? In Notepad++, search for the pattern:
^.*\b(thban\S{0,5})(?:.*(\sC3\w+))?.*$|.+
^ - Start-of-line anchor.
.*\b - Any character other than newline zero or more times, up to a word boundary.
(- Open 1st capture group.
thban\S{0,5} - Match "thban" and zero to 5 non-whitespace chars.
) - Close 1st capture group.
(?: - Open non-capturing group.
.* - Any character other than newline zero or more times.
( - Open 2nd capture group.
\sC3\w+ - A whitespace character, "C3" and one or more word characters.
) - Close 2nd capture group.
)? - Close non-capturing group and make it optional.
.* - Any character other than newline zero or more times.
$ - End-of-line anchor.
| - Alternation (OR).
.+ - Any character other than newline once or more.
Replace with:
$1$2
After this, you may end up with empty lines, which you can quickly remove with Notepad++'s built-in Edit > Line Operations > Remove Empty Lines command.
The English name of the relevant checkbox is "Match case"; make sure it is not ticked so the search is case-insensitive.
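For checking the line-based approach outside Notepad++, here is a minimal Python sketch (Python is an assumption, not part of the answer). re.IGNORECASE replaces the unticked "Match case" box, \1\2 is Python's spelling of $1$2 (unmatched groups become empty strings in Python 3.5+), and a final filter drops the empty lines. The line "nothing to keep here" is a hypothetical addition to show a line that is deleted.

```python
import re

# Same pattern as the answer: each line is matched either by the thban/C3
# branch or by .+, and then replaced with Group 1 + Group 2.
LINE = re.compile(
    r"^.*\b(thban\S{0,5})(?:.*(\sC3\w+))?.*$|.+",
    re.MULTILINE | re.IGNORECASE,
)

def keep_words(text):
    replaced = LINE.sub(r"\1\2", text)
    # Remove the empty lines left by lines without a thban word.
    return "\n".join(line for line in replaced.splitlines() if line)

SAMPLE = (
    "THBANES900 and C3950 bla bla\n"
    "THBAN\n"
    "..THBANES901.. C3850 bla bla\n"
    "nothing to keep here\n"
    "THBANMP900\n"
    "**..thbanes900..**"
)

print(keep_words(SAMPLE))
```

Since Group 2 includes the leading \s, the space between the two kept words comes for free.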
You may use
Find What: (?|\b(thban\S{0,5})|\s(C3\w+))|(?s:.)
Replace With: (?1$1\n:)
Details
(?| - start of a branch reset group:
\b(thban\S{0,5}) - Group 1: a word boundary, then thban and any 0 to 5 non-whitespace chars
| - or
\s(C3\w+) - a whitespace char, and then Group 1: C3 and one or more word chars
) - end of the branch reset group
| - or
(?s:.) - any one char (including line break chars)
The replacement is
(?1 - if Group 1 matched,
$1\n - Group 1 value with a newline
: - else, replace with empty string
) - end of the conditional replacement pattern
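Python's re module (used here as an assumption, for testing outside the editor) does not support the branch reset group (?|...). A minimal sketch of the same extraction therefore keeps the two alternatives in separate groups and uses whichever one matched, writing one token per line.

```python
import re

# No branch reset in Python's re: thban words land in Group 1,
# C3 words in Group 2. IGNORECASE makes the search case-insensitive.
TOKEN = re.compile(r"\b(thban\S{0,5})|\s(C3\w+)", re.IGNORECASE)

def extract_tokens(text):
    # Similar to replacing with (?1$1\n:): keep each token on its own line
    # and drop everything else.
    return "\n".join(m.group(1) or m.group(2) for m in TOKEN.finditer(text))

SAMPLE = (
    "THBANES900 and C3950 bla bla\n"
    "THBAN\n"
    "..THBANES901.. C3850 bla bla\n"
    "THBANMP900\n"
    "**..thbanes900..**"
)

print(extract_tokens(SAMPLE))
```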

How to clear lines after the last regex match

I have a huge log of records that I need to turn into a table.
Each line has a record, preceded by a date and time, something like this:
27/11/2019 16:35 - i don't need this
28/11/2019 17:25 - don't need this either
30/11/2019 11:33 - stuff i'm looking for
01/12/2019 08:11 - stuff that i'm also looking for
03/11/2019 09:39 - don't need this
I want to completely clear the file from all the lines that I don't need.
I'm able to clear most of the lines that I don't want with the following regex and substitution patterns (in Notepad++, with the ". matches newline" option enabled):
.+?(?<datetime>[\d\/]+\s[\d:]+)\s-\s(?<mystuff>stuff[^\n]+)
${datetime};${mystuff}
However, I can't clear the lines after the last match. How could I do so?
You may use
Find What: ^(?:.*?([\d/]+\h[\d:]+)\h-\h(stuff.*)|.*\R?)
Replace With: (?{1}$1;$2)
Details
^ - start of a line
(?:.*?([\d/]+\h[\d:]+)\h-\h(stuff.*)|.*\R?) - match either
.*? - any 0+ chars other than line break chars, as few as possible (with .+? the first digit of a date at the very start of a line would be consumed before Group 1)
([\d/]+\h[\d:]+) - Group 1: one or more digits or /, a horizontal whitespace, one or more digits or :
\h-\h - a horizontal whitespace, a hyphen, and another horizontal whitespace
(stuff.*) - Group 2: stuff and the rest of the line
| - or
.* - any 0+ chars other than linebreak chars
\R? - an optional line break sequence.
The (?{1}$1;$2) replacement pattern only replaces with $1;$2 if Group 1 matches.
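A minimal Python sketch of the same clean-up, assuming the log is processed by a script rather than in Notepad++. Python's re has no \h or \R, so they are spelled out as [ \t] and (?:\r\n|\r|\n)?, and a replacement function emulates (?{1}$1;$2). It starts with .*? rather than .+?: with .+? the first character of a line that begins with the date would be consumed before Group 1, so 30/11/2019 would come out as 0/11/2019. The function name keep_stuff is illustrative.

```python
import re

# \h -> [ \t] (horizontal whitespace), \R? -> (?:\r\n|\r|\n)? (optional line break).
RECORD = re.compile(
    r"^(?:.*?([\d/]+[ \t][\d:]+)[ \t]-[ \t](stuff.*)|.*(?:\r\n|\r|\n)?)",
    re.MULTILINE,
)

def keep_stuff(log_text):
    # Wanted records become "datetime;stuff"; every other line is removed
    # together with its line break.
    return RECORD.sub(
        lambda m: f"{m.group(1)};{m.group(2)}" if m.group(1) else "", log_text
    )

SAMPLE = (
    "27/11/2019 16:35 - i don't need this\n"
    "28/11/2019 17:25 - don't need this either\n"
    "30/11/2019 11:33 - stuff i'm looking for\n"
    "01/12/2019 08:11 - stuff that i'm also looking for\n"
    "03/11/2019 09:39 - don't need this"
)

print(keep_stuff(SAMPLE))
```

The last unwanted line is removed too, which is the part the question's original pattern could not handle.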

Regular expression for substitute a string with another

I have these two lines of text that I want to manipulate with a regular expression substitution:
Obj.FieldNameA = Reader.GetEnumFromInt32<ClassName>(QueryGenerator,nameof(Obj.));
Obj.FieldNameB=Reader.GetTrimmedStringOrNull(QueryGenerator,nameof(Obj.));
The first Obj. is followed by a field name; in this case they are FieldNameA and FieldNameB.
I want to append these values to the second Obj. on the same line, so the text should become:
Obj.FieldNameA = Reader.GetEnumFromInt32<ClassName>(QueryGenerator,nameof(Obj.FieldNameA));
Obj.FieldNameB=Reader.GetTrimmedStringOrNull(QueryGenerator,nameof(Obj.FieldNameB));
I have tested this very simple (and wrong) regex:
Obj\.(\w*).*\n
With substitution as $1
But I don't know how to use substitution...
Some Notes:
After FieldNameA there is always an equals sign, which could be preceded or followed by a space.
Before the second Obj. there could be any character, including < ( etc...
Could this be achieved?
You may use
Find: (Obj\.(\w+).*\(Obj\.)\)
Replace: $1$2)
You may also add ^ to the start of the regex to match only at the start of a line/string.
Details
^ - (optional, if you add it) start of a line/string
(Obj\.(\w+).*\(Obj\.) - Group 1 ($1 in the replacement):
Obj\. - Obj. text
(\w+) - Group 2 ($2): 1 or more word chars
.* - any 0+ chars other than line break chars, as many as possible (you may use .*? to stop at the first (Obj. after the field name; since your input has only two Obj. per line and the second one is near the end of the line, the greedy .* works better)
\(Obj\. - (Obj. text
\) - a ) char.
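The substitution can be checked with a short Python sketch (Python is an assumption; the question is about the regex itself). It uses the answer's pattern with the optional ^ added and re.MULTILINE, and writes the replacement $1$2) in Python's syntax as \1\2).

```python
import re

# ^ plus re.MULTILINE: the pattern is anchored at the start of every line.
NAMEOF = re.compile(r"^(Obj\.(\w+).*\(Obj\.)\)", re.MULTILINE)

def fill_nameof(code):
    # Group 1 (everything up to "(Obj."), then the field name, then ")".
    return NAMEOF.sub(r"\1\2)", code)

SOURCE = (
    "Obj.FieldNameA = Reader.GetEnumFromInt32<ClassName>(QueryGenerator,nameof(Obj.));\n"
    "Obj.FieldNameB=Reader.GetTrimmedStringOrNull(QueryGenerator,nameof(Obj.));"
)

print(fill_nameof(SOURCE))
```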

Puppet dynamic variable from hostname

I am trying to get a dynamic variable out of my EC2 instance's hostname. Hostnames follow this pattern:
us-east-1b-application-type-environment-138-10.domain.com
I would like my variable to end up looking like this:
application-type-environment
Using this:
$variable = regsubst($hostname, '/[a-z]{1}[0-9]{1}-([^-]+)-[0-9]{1,3}/', '')
I end up with this, though:
us-east-1b-application-type-environment-138-10
How can I get my expected outcome?
You do not need regex delimiters in regsubst. You need to match the whole string to be able to remove it and keep only what you need. The technique consists in matching what you do not want to keep, and matching and capturing what you want to have as a result.
You can use
regsubst($hostname, '^[^0-9]*[0-9][a-z]-(.*?)-[0-9]{1,3}.*$', '\1')
I think you are trying to get just what is between the first [digit][lowercase-letter] chunk and a chunk of 1 to 3 digits.
Breakdown of the expression:
^ - start of line (if start of string is meant, replace with \A)
[^0-9]* - 0 or more non-digit symbols (all but digits, this can be replaced with \D*)
[0-9][a-z]- - a digit followed by a lowercase letter followed by - (the same as \d[a-z]-)
(.*?) - match and capture any characters but a newline as few as possible before the closest...
-[0-9]{1,3} - a hyphen followed by 1 to 3 digits (the same as -\d{1,3})
.*$ - 0 or more any characters but a newline up to the end of line (if end of string is meant, replace with \z).
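To try the expression outside Puppet, here is a minimal Python sketch. Puppet's regsubst uses Ruby regular expressions; this pattern uses only basic constructs, so for a single-line hostname Python's re should behave the same, but that equivalence is an assumption rather than something the answer states. '\1' in regsubst corresponds to r"\1" in re.sub, and the helper name app_from_hostname is hypothetical.

```python
import re

# Same pattern as in the regsubst() call above.
PATTERN = r"^[^0-9]*[0-9][a-z]-(.*?)-[0-9]{1,3}.*$"

def app_from_hostname(hostname):
    # The whole hostname is matched and replaced by Group 1 only.
    return re.sub(PATTERN, r"\1", hostname)

print(app_from_hostname("us-east-1b-application-type-environment-138-10"))
print(app_from_hostname("us-east-1b-application-type-environment-138-10.domain.com"))
```

Both calls should print application-type-environment: the lazy (.*?) stops at the first hyphen that is followed by a digit, and .*$ swallows the rest, including any domain suffix.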