How to combine lines in regular expressions? - regex

So I am new to regular expressions and I am learning them using only a simple text editor. I have the following file:
84544484N
32343545M
32334546E
34456434M
I am trying to combine each pair of lines into one tab delimited line
The result should be :
84544484N 32343545M
32334546E 34456434M
I wrote the following :
Search: (.*?)\n(.*?)
Replace: \1\t\2
This did not work. Can someone please explain why and give me the correct solution? Thank you!!

The (.*?)\n(.*?) pattern will never work well because the (.*?) at the end of the pattern will always return an empty string: *? is a lazy matching quantifier, and if it can return zero characters (and here it can), it will. Use greedy matching and adjust the pattern like:
(.+)\r?\n *(.*)
or - since SublimeText uses Boost regex - you can match any newline sequence with \R:
(.+)\R *(.*)
and replace with \1\t\2. Note I replaced *? with + in the first capturing group because you need to match non-empty lines.
Regex breakdown:
(.+) - Group 1: one or more characters other than a newline (as many as possible), up to
\R - a newline sequence (\r\n, \r or just \n)
* - a literal space, zero or more occurrences
(.*) - Group 2: zero or more characters other than a newline (as many as possible)
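The difference between the lazy and the greedy pattern can be checked with a minimal Python sketch. It assumes Python's re module as a stand-in for the editor's regex engine (re has no \R, so \r?\n is used instead), and the variable names are illustrative only.

```python
import re

# Sample file content from the question.
text = "84544484N\n32343545M\n32334546E\n34456434M\n"

# Lazy version from the question: the trailing (.*?) always matches an
# empty string, so only the line breaks get replaced with tabs.
lazy = re.sub(r"(.*?)\n(.*?)", r"\1\t\2", text)

# Greedy version: (.+) takes a whole non-empty line, (.*) takes the next one.
joined = re.sub(r"(.+)\r?\n *(.*)", r"\1\t\2", text)

print(repr(lazy))
print(repr(joined))
```

With the lazy pattern every line ends up on one tab-separated row, while the greedy pattern joins the lines pair by pair.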

Related

Regex: delete everything between String and replace with other

So I've been scratching my head over this one, I have over a thousand files that have different values between the strings
<lodDistances content="float_array">
15.000000
25.000000
70.000000
140.000000
500.000000
500.000000
</lodDistances>
I need to replace those values with these
<lodDistances content="float_array">
120.000000
200.000000
300.000000
400.000000
500.000000
550.000000
</lodDistances>
I tried the following without any success
\ (?<=\<lodDistances content\=\"float_array\"\>)(.*)(?=\<\/lodDistances\>)
It seems to find it in regexr but not in Sublime Text: when I try to find it in files, I constantly get 0 results. Any idea why this is happening?
There are a couple of things that are wrong in your pattern:
\< matches a leading word boundary position (as \b(?=\w)) and \> matches the trailing word boundary position (same as \b(?<=\w)). You wanted to match literal < and > chars, thus, you must NOT escape them
There is no need matching a space before the first <
Since your text is multiline, use either the (?s) inline modifier or a (?s:...) modifier group to make . match across line breaks, or use a [\s\S] / [\w\W] / [\d\D] workaround
Use a lazy dot pattern to stop matching at first occurrence of the trailing delimiter.
You may use
(?s)(<lodDistances content="float_array">\s*).*?(?=\s*</lodDistances>)
And replace with ${1}<new values>. The curly braces are necessary as the new values are most likely numbers and without the braces, $1n (n stands for a digit here) will be parsed incorrectly (see this YT video for a demo of what it is fraught with).
Regex details:
(?s) - now, . matches line break chars, too
(<lodDistances content="float_array">\s*) - Group 1 capturing <lodDistances content="float_array"> text and then zero or more whitespaces
.*? - any zero or more chars, but as few as possible
(?=\s*</lodDistances>) - a positive lookahead that matches the location that is immediately followed with zero or more whitespaces and </lodDistances> text.
Note that / is not a special regex metacharacter, and since regex delimiter notation is not supported in Sublime Text, you do not have to ever escape it here.
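A minimal Python sketch of the same replacement follows. It assumes Python's re module rather than Sublime Text's engine, so the ${1} reference becomes \g<1>, which serves the same purpose: a plain \1 followed by the digits of the new values would not be read as "group 1, then 120...". The shortened sample input and the variable names are illustrative only.

```python
import re

# Shortened sample of the block from the question.
xml = (
    '<lodDistances content="float_array">\n'
    '15.000000\n'
    '25.000000\n'
    '</lodDistances>\n'
)
new_values = "120.000000\n200.000000"

# (?s) lets . cross line breaks; .*? stops at the first closing tag.
pattern = r'(?s)(<lodDistances content="float_array">\s*).*?(?=\s*</lodDistances>)'

# \g<1> is the unambiguous group reference, like ${1} in the answer above.
result = re.sub(pattern, r"\g<1>" + new_values, xml)
print(result)
```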

Match asterisk followed by space in PCRE

I'm just having trouble figuring out how to regex properly. What I need is to match an asterisk followed by a space followed by any amount of characters that aren't \n. (Similar to reddit list formatting)
Example:
* Test
* Test2
* Test3
The closest I got was this, but it wasn't working.
/^[*][ ](.*?)/s
Can anyone familiar with PCRE help me.
You should not use a lazy dot pattern at the end of the regex because it will never match a single char: a lazy quantifier is tried with zero repetitions first, and since there is nothing after it that still has to match, the regex engine accepts the empty string for .*?.
Use the greedy dot pattern:
^\* (.*)
See the regex demo
Other notes: you may use \h to match any horizontal whitespace instead of the regular space in the pattern. To match start of lines with ^ use m modifier. Only use s modifier if you need . to match any chars including a newline (and carriage return depending on PCRE verbs that are active).
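The effect of the trailing lazy quantifier can be reproduced with a short Python sketch. Python's re module is used here only as a stand-in for PCRE, with re.MULTILINE playing the role of the m modifier; the variable names are illustrative.

```python
import re

text = "* Test\n* Test2\n* Test3"

# Lazy .*? at the end of the pattern: every capture is an empty string.
lazy = re.findall(r"^[*][ ](.*?)", text, flags=re.MULTILINE)

# Greedy .* captures the rest of each line.
greedy = re.findall(r"^\* (.*)", text, flags=re.MULTILINE)

print(lazy)
print(greedy)
```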

Regular expression to remove syslog date in filebeat?

I would like to parse some syslog lines that look like
Oct 20 16:34:59 artguard TTN-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
I would like to turn them into
TTN-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
So I was wondering what the regular expression should look like to do this, since the first part changes every day because it is prepended by syslog.
EDIT: to avoid duplicates, I am trying to use regex with filebeat, where not all regex syntax is supported, as explained here
Regex101
(TTN-.*$)
Debuggex Demo
Explained
1st Capturing Group (TTN-.*$)
TTN- matches the characters TTN- literally (case sensitive)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of a line
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
The regular expression TTN-\S* is probably a way of doing what you're looking for. Here it is in a JavaScript example.
var value = "Oct 20 16:34:59 artguard TTN-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
// "g" returns every match, "i" makes the match case-insensitive
var matches = value.match(new RegExp("TTN-\\S*", "gi"));
document.writeln(matches);
It works in two main parts:
The TTN- matches TTN- (obviously)
The \S* matches any character that is not a white-space, this is done as many times as possible.
Currently it always expects at least a '-' after the TTN, but if you replace the '-' with '-{0,1}' (or '-?') in the regex, it will expect TTN, maybe a dash, followed by 0-n characters that are not white-space. You could also replace \S* with \w* to get only letters, digits and underscores, or with .* to get all characters up to the end of the line (the \n character). Note that TTN-\S*[^\s{2}] does not end the match at two spaces: [^\s{2}] is a character class that matches one character other than white-space, '{', '2' or '}'. Hope this was helpful.
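The variants discussed above can be compared side by side in a small Python sketch. It uses Python's re module, not filebeat's own regex engine, and the sample lines with trailing text and a missing dash are hypothetical, added only to make the differences visible.

```python
import re

# Hypothetical line with extra text after the TTN token.
line = "Oct 20 16:34:59 artguard TTN-abc123 trailing text"

# \S* stops at the first white-space character.
non_space = re.search(r"TTN-\S*", line).group(0)

# .*$ runs to the end of the line.
to_eol = re.search(r"TTN-.*$", line).group(0)

# -? makes the dash optional (hypothetical input without a dash).
optional_dash = re.search(r"TTN-?\S*", "artguard TTNabc123").group(0)

print(non_space, "|", to_eol, "|", optional_dash)
```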

Regex to match lines starting with a \t or - but only capture - on

I cannot figure out this regex for the life of me
I have example input such as:
- Line 1
- Line 2
- Line 3
- Line 4
I am trying to match each line starting at the - and going through the end of the line. I am using the Workflow app on iOS which uses ICU regex parsing
The pattern I am using is
(?m)^\t*(-.*)
This pattern will match all the lines, but it captures the tabs. What am I doing wrong?
You ask why your regex captures the tabs. It does not: your regex matches the tabs, and captures the - after those tabs together with the rest of the line. The point is that you are using a consuming pattern, one that returns the matched/captured strings.
Non-consuming patterns (lookarounds) can be used to check for the presence or absence of some text without putting it into the returned match.
In the ICU regex flavor, lookbehinds must be of constrained width, that is, a limiting quantifier inside them is OK. (The length of possible strings matched by the look-behind pattern must not be unbounded: no * or + operators.)
Thus, this will work in case there can be 100 and fewer tabs at the line start:
(?m)(?<=^\t{0,100})-.*
Here,
(?m) - makes ^ match the start of a line
(?<=^\t{0,100}) - a positive lookbehind requiring 0 to 100 tabs after the beginning of the line to appear before a
-.* - hyphen and the rest of the line.
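The distinction between what is matched and what is captured can be seen in a minimal Python sketch. Python's re module only accepts fixed-width lookbehinds, so the sketch uses the capturing-group pattern from the question instead of the ICU lookbehind; the tab indentation in the sample input is an assumption, since the original indentation is not visible here.

```python
import re

# Assumed input: some lines are indented with tabs.
text = "- Line 1\n\t- Line 2\n\t\t- Line 3\n- Line 4"

pattern = re.compile(r"(?m)^\t*(-.*)")

# group(0) is the whole match and includes the leading tabs.
whole = [m.group(0) for m in pattern.finditer(text)]

# group(1) is the capture and starts at the hyphen.
captured = [m.group(1) for m in pattern.finditer(text)]

print(whole)
print(captured)
```

Reading group 1 rather than the whole match is the usual way out when the regex flavor or the tool does not offer a suitable lookbehind.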
Try this:
(?m)^[ \t]*(-.*)
First, it appears that you have some spaces at the beginning of some of those lines, and \t will not match spaces. Replacing \t with [ \t] (or just \s, which also matches line breaks) will fix this. Also, (-*) is going to match and capture any number of -, not including what follows. Put a . before your * to match any number of characters following the -, like this: (-.*)
If you don't require leading spaces, you can use
(?m)(-.*)
If you don't care about capturing the match, you don't need the parentheses, giving you
(?m)-.*
As mentioned in the comments

How can I replace only the first 2 matches per line, using regex in Notepad++

I'm trying to parse a list of filenames to a CSV file by converting the first 2 - characters per line into a |. The problem is that the filenames themselves also contain the character I'm searching for.
My raw data looks something like this:
12055371-1-Florence - BW Letter of Intent HB Comments 9-4-14-2.DOCX
12057668-2-EB-DUE-M- SBuxbaum FHA Benefit Plans-2.DOCX
12058210-1-Redline Letter of Intent-2.PDF
12058029-3-Florence Hospital--Order Establishing Bid Procedures-HB 9-23-14-2.DOCX
12058020-10-Florence - BW Letter of Intent 10,10,14 Revisions-2.DOCX
I'm using Notepad++ to replace on the fly, but I'm not sure what regex will work to identify and replace these items.
Don't match -, match the beginning of the lines up to the second - :
match ^(.*?)-(.*?)-
replace by \1|\2|
Explanation :
^ matches the beginning of the line (0-width match).
(.*?) matches any character in a non-greedy way : if the next token of the regex can match, it will let it do so. The result is grouped so it can be referenced later.
\1 and \2 are back-references and refer to the two (.*?) groups.
Note : for efficiency you could replace the non-greedy matches with the negated class [^\-], which means any character but -; the - is escaped because it can denote a range inside a character class (the escape is optional when - is the only character in the class). The groups would then become ([^\-]*). Of course it really does not matter if it's a one-time operation.
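The replacement can be tried outside Notepad++ with a short Python sketch. It assumes Python's re module, where (?m) makes ^ match at the start of every line as it does in the editor; two of the sample file names from the question are used, and the variable names are illustrative.

```python
import re

text = (
    "12055371-1-Florence - BW Letter of Intent HB Comments 9-4-14-2.DOCX\n"
    "12058210-1-Redline Letter of Intent-2.PDF"
)

# Anchoring at ^ means at most one match per line, so only the first
# two hyphens of each line are turned into pipes.
result = re.sub(r"(?m)^(.*?)-(.*?)-", r"\1|\2|", text)
print(result)
```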