RegEx for simple pattern - regex

I have a logfile that contains lines like:
....
unit:
...
unit.integration:
....
I would like to run a RegEx search on the file using notepad++ that returns all lines that: start with a word with no blanks and ends with :\n. I have tried:
(.:\n)
but that gives 0 results. I have looked at:
http://www.aivosto.com/vbtips/regex.html
EDIT: Updated with more specific requirements to the starting word.

I would like to run a RegEx search ... that returns all lines that start with anything but ends with :\n.
The first part of the match is :, and because you're looking for line breaks, you'll want to match the end of the line, which is $.
You said you don't care about what comes before it, so don't even include it in the regex:
:$
Update to address edit:
I would like to run a RegEx search ... that returns all lines that: start with a word with no blanks and ends with :\n.
This differs from your original post: now you want to match a "word with no blanks", which implies a single word that contains no special characters.
It seems to me that you'd like to match unit.integration, which is two words and a separating . character.
If you want what you asked for ("a word with no blanks") then just prepend ^\w+ to the regex:
^\w+:$
(matches unit:, but not unit.integration:)
If, instead, you want to match lines that don't contain spaces and end in :, then you should prepend ^\S+ instead:
^\S+:$
(matches unit: and unit.integration:, but also matches ##*()$&*(&:)
The details matter, so avoid assumptions and be as explicit as possible in what you want matched.
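To see the difference between the two variants outside Notepad++, here is a minimal Python sketch. It assumes Python's re module with re.MULTILINE standing in for the editor's line-by-line anchors, and it keeps the colon in both patterns. The sample text and the helper name matching_lines are made up for illustration.

```python
import re

# Sample lines modelled on the question; "two words:" and the symbol run
# are added here only to show where the two patterns differ.
LOG = "\n".join([
    "....",
    "unit:",
    "...",
    "unit.integration:",
    "two words:",
    "##*()$&*(&:",
])


def matching_lines(pattern: str, text: str) -> list[str]:
    # re.MULTILINE lets ^ and $ anchor at every line, not just the whole text.
    return re.findall(pattern, text, flags=re.MULTILINE)


word_only = matching_lines(r"^\w+:$", LOG)   # single "word" followed by a colon
no_blanks = matching_lines(r"^\S+:$", LOG)   # anything without whitespace, then a colon

print(word_only)
print(no_blanks)
```

The \w version only accepts unit:, while the \S version also accepts unit.integration: and the run of symbols, which is the trade-off described above.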

To get all lines that have no spaces and end in :, use
^\S+:\R
\S matches non-whitespace symbols only, and \R means any line break.
If you plan to match the last line, too, replace \R with $ (end of line or whole file metacharacter).

Related

What is the regex to find lines WITHOUT a line break

I'm using SubtitleEdit and I'd like to locate all the lines that do not contain a line break.
Lines containing a line break are bilingual, which is what I want.
But those that do not have line breaks are mono-lingual, and I'd like to quickly locate them all and delete them. TIA!
Alternatively, if there is a regex expression that can find lines which do not contain any English characters, that would also work.
The confusion here was caused by 2 facts:
What SubtitleEdit calls a line is actually a multiline, containing newlines.
The newline displayed is not the one used internally (so it would never match <br>).
Solution 1:
Now that we have found out it uses either \r\n or just \n, we can write a regex:
(?-m)^(?!.*\r?\n)[\s\S]*$
Explanation:
(?-m) - turn off the multiline option (which is otherwise enabled).
^ - match from start of text
(?!.*\r?\n) - negative look ahead for zero or more of any characters followed by newline character(s) - (=Contains)
[\s\S]*$ - match zero or more of ANY character (including newline) - will match the rest of text.
In short: If we don't find newline characters, match everything.
Now replace with an empty string.
Solution 2:
If you want to match lines that don't have any English characters, you can use this:
(?-m)^(?![\s\S]*[a-zA-Z])[\s\S]*$
Explanation:
(?-m) - turn off the multiline option (which is otherwise enabled).
^ - match from start of text
(?![\s\S]*[a-zA-Z]) - negative look ahead for ANY characters followed by an English character.
[\s\S]*$ - match zero or more of ANY character (including newline) - will match the rest of text.
In short: If we don't find an English character, match everything.
Now replace with an empty string.
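Both solutions can be tried outside SubtitleEdit with a small Python sketch. This assumes Python's re module, where multiline mode is off unless re.MULTILINE is passed, so the inline (?-m) is left out. The function names and sample entries are invented for illustration.

```python
import re

# Same patterns as above, without (?-m): Python's re has multiline mode
# off by default, so ^ and $ already refer to the whole text.
NO_NEWLINE = re.compile(r"^(?!.*\r?\n)[\s\S]*$")
NO_ENGLISH = re.compile(r"^(?![\s\S]*[a-zA-Z])[\s\S]*$")


def is_single_line(entry: str) -> bool:
    # True when the subtitle entry contains no \n or \r\n at all.
    return NO_NEWLINE.match(entry) is not None


def has_no_english(entry: str) -> bool:
    # True when the entry contains no ASCII letter anywhere.
    return NO_ENGLISH.match(entry) is not None


# Hypothetical subtitle entries: two bilingual, two monolingual.
entries = ["Bonjour\nHello", "Bonjour", "你好\r\nHello", "你好"]

for entry in entries:
    print(repr(entry), is_single_line(entry), has_no_english(entry))
```

Entries for which is_single_line returns True are the monolingual ones that Solution 1 would replace with an empty string.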
You should use a regex assertion. Given test lines:
something_1
some<br>thing_2
something_3<br>
<br>something_4
something_5
This is an expression that will match lines 1 and 5
^(?!.*<br>).*$
In this regular expression the negative lookahead assertion (?!.*<br>) rejects any line that contains <br> anywhere, so only lines without it are matched.
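The same test can be reproduced with a short Python sketch, assuming Python's re module and re.MULTILINE so that ^ and $ work per line. The variable names are illustrative only.

```python
import re

# The five test lines from above, joined into one text.
TEXT = "\n".join([
    "something_1",
    "some<br>thing_2",
    "something_3<br>",
    "<br>something_4",
    "something_5",
])

# The lookahead fails as soon as <br> appears anywhere later on the line;
# "." does not cross a newline, so each line is judged on its own.
lines_without_br = re.findall(r"^(?!.*<br>).*$", TEXT, flags=re.MULTILINE)

print(lines_without_br)
```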

How to set regex for lines that don't start with '/' and end with ',' or '.' or '+'?

I have source code, and I would like to find all lines that end with a comma, dot or plus symbol: ',', '.', '+'.
So, I have this: (\,|\.|\+)$ - to show lines that end with any of these symbols.
But I want to ignore lines that start with // - commented lines, and they can be preceded with any number of spaces.
I tried this to ignore lines with / at the beginning: ^[^/].*
but this doesn't really work well. Then I also tried putting \s* at the beginning to skip leading spaces, but that still doesn't work properly.
So, my regex would be something like: \s*^[^/].*(\,|\.|\+)$ - skip all spaces at the beginning AND ignore lines with / as the first symbol AND show lines that end with , . or +
But this still finds lines that start with a space and /. What am I doing wrong?
EDIT:
Here are example lines:
...
// increase no of objects...
counter := 1+
2;
...
I need to ignore comment lines, starting with // but need all lines that end with .,+ as I need to reformat them.
The only way to do this with one single regex will be to use a negative lookbehind. However, you would end up with a regex that starts with ^(?<!\s*\/), which is invalid due to the quantifier in the lookbehind.
That means the only way that actually works is to use multiple regexes.
The lines you want will match this (which literally means "a comma, a dot or a plus sign at the end of the line"):
[,.+]$
but not this (which literally means "zero or more spaces followed by a forward slash at the beginning of the line"):
^\s*\/
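As a rough illustration of the two-regex approach, here is a Python sketch that keeps a line only if the first pattern matches and the second does not. The function name and the sample source are assumptions for the example; the indented comment line is added to cover the "preceded by spaces" case.

```python
import re

ENDS_WITH_MARK = re.compile(r"[,.+]$")  # comma, dot or plus at the end of the line
IS_COMMENT = re.compile(r"^\s*/")       # optional whitespace, then a forward slash


def lines_to_reformat(source: str) -> list[str]:
    # Keep lines that end with one of the marks and are not comment lines.
    return [
        line
        for line in source.splitlines()
        if ENDS_WITH_MARK.search(line) and not IS_COMMENT.match(line)
    ]


sample = (
    "// increase no of objects...\n"
    "counter := 1+\n"
    "2;\n"
    "    // indented comment,\n"
)

print(lines_to_reformat(sample))
```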
Maybe this can work:
^(?!/|\s*/)+.*[\,\.\+]$
This regular expression ignores all lines that start with /, or with zero or more spaces followed by /, and matches the remaining lines that end with ,, . or +.
I hope it works for you.
EDIT: It can be reduced:
^(?!\s*/)+.*[\,\.\+]$
I tested this regular expression: it ignores line 192 and matches line 210, but it doesn't ignore lines inside a block comment, because they don't start with /.
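For anyone who wants to check the lookahead idea without an editor, here is a Python sketch. It is a simplified variant, not the exact expression above: the + after the lookahead is dropped (a lookahead consumes nothing, so repeating it should not be needed) and the escapes inside the character class are removed. The helper name and test lines are made up.

```python
import re

# Simplified variant: no quantifier on the lookahead, no escapes in the class.
CODE_LINE = re.compile(r"^(?!\s*/).*[,.+]$")


def needs_reformat(line: str) -> bool:
    # True for a non-comment line that ends with a comma, dot or plus.
    return CODE_LINE.match(line) is not None


for line in ["counter := 1+", "// comment,", "   // indented comment.", "2;",
             " * inside a block comment,"]:
    print(repr(line), needs_reformat(line))
```

The last test line shows the limitation mentioned above: a line inside a block comment that doesn't start with / is still reported.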

What regular expression will select all lines that have more than one punctuation mark?

I have this regular expression:
\..*?\.
But it only selects between two periods, not every punctuation mark, and it also selects across multiple lines.
Would modifying this expression to only take in one line at a time work somehow, if there's also a way to group punctuation into where we have a period?
Just to make things simpler, at this time I only need the expression to recognize periods, exclamation points, and question marks. I don't need it to register commas.
Thanks to Nathan and Agumander below, I know to substitute [.!?] in place of \. now, but I'm still having trouble with the other half of my question.
Just to make sure I'm being more clear, using [.!?].*?[.!?]\s will highlight text between punctuation marks, but across multiple lines. So I can't use it to bookmark only the lines that have multiple punctuation marks.
Placing characters inside a pair of square brackets will match to any of the enclosed characters. In your case you'd want [.?!]
If you want to match any sentence that has two of these, then you'll be looking for a pair of [.!?] separated by zero or more of any character.
The regex that matches strings with more than one of the set [.?!] would then be [.!?].*[.!?]
To make . match newlines, you'd add the s modifier to your regex.
...so the full regex would be /[.!?].*[.!?]/s
Ok I figured it out. Thanks to Agumander and Nathan above I substituted [.!?] in for the two \. in my original regex:
\..*?\. became [.!?].*[.!?]
Putting \s at the end of the regex made it select the entire document in Notepad++.
The last issue I had was remembering to turn off "matches newline."
Agumander, I think you're asking for a regex that basically finds multiple punctuation marks on a single line. So here's one way to do it.
Here's the text I'm going to match. The regex will match the first line in its entirety, but will not match the second.
Here's a line with multiple punctuation. The entire line will match the regex!
This line does not have multiple punctuation.
Regex
^.*(?:[\.?!].*){2,}$
Explanation
^ -- Start matching at the beginning of a line
.* -- match any character 0 or more times
(?: -- start a new non-capturing group
[.?!] -- find a character matching a period, question mark, or exclamation point.
.* -- match any character 0 or more times
)
{2,} -- repeat the previous group 2 or more times. This is how we ensure there's at least two punctuation marks before considering it a match.
$ -- end of line anchor, basically stop matching at the end of a line
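The two sample lines can be checked with a small Python sketch, assuming Python's re module and testing each line separately so that nothing can match across line breaks. The names are illustrative.

```python
import re

# At least two of . ? ! somewhere on the same line.
MULTI_PUNCT = re.compile(r"^.*(?:[.?!].*){2,}$")

TEXT = (
    "Here's a line with multiple punctuation. The entire line will match the regex!\n"
    "This line does not have multiple punctuation."
)

# Test line by line; "." never matches a newline here, so each line stands alone.
matching = [line for line in TEXT.splitlines() if MULTI_PUNCT.match(line)]

print(matching)
```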

Convert a list of values to CSV values

I have a list of words and I want to convert them into a CSV.
a
b
c
d
to a,b,c,d
I replaced \n by , and it worked, but that was my second attempt.
I first tried the regex ^([A-Za-z ]+)$\n with the replacement \1, . This particular regex only joins adjacent lines, like this:
a,b
c,d
What can I change in it to get it to work?
I am doing it in Eclipse, so I guess it is Java, but I don't have to take the \ escape into consideration; it is the same as in edit+.
This regex:
^([A-Za-z ]+)$\n
matches the beginning of a line, letters and space, then the end of the line.
Once you perform your first replacement, the line contains a comma, so it would no longer match that pattern.
The regex is also a bit redundant. Because \n only comes at the end of a line anyway, you don't need both $ and \n in your pattern.
In order to fix it, you simply need to let your pattern match a comma:
^([A-Za-z ,]+)\n
Note: the specifics might vary based on your Eclipse version and/or file encoding. I needed \r\n to match a newline in mine.
From your example, you don't even need to use regular expressions. Simply replace each newline (\n) with a comma (,) and you're set.
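Outside Eclipse, the plain newline replacement could look like the following Python sketch. The function name is invented, and stripping the trailing line break first is an added assumption so that the result has no trailing comma. It accepts both \n and \r\n, in line with the note about \r\n above.

```python
import re


def newlines_to_csv(text: str) -> str:
    # Drop trailing line breaks first so the result does not end with a comma,
    # then turn every remaining \n or \r\n into a comma.
    return re.sub(r"\r?\n", ",", text.rstrip("\r\n"))


print(newlines_to_csv("a\nb\nc\nd\n"))
```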

RegEx \D matches start and end of line as well

I need to find lines that are 3 digits and 3 other characters. I thought I could use the following RegEx:
^\d{3}\D{3}$
But take the following sample text file and run the RegEx above (the text must have the empty lines in it):
1
12
123xxx
123y
aaabb
The problem is that there are two matches: 123xxx (which is fine), but also 123y is matched!
I suspect the reason is that "y" + the end-of-line + the beginning-of-next-line are also matched.
How can I tell the regex engine to ignore line beginnings and endings with \D and match characters only, not positions?
The behavior of $ in UltraEdit changes depending on whether you have "Match Whole Word Only" checked or not. To get the behavior you want you need to make sure that that option is checked. Your regular expression doesn't need to change.
Maybe:
/^\d{3}\D{3}$/m
The m means
Treat string as multiple lines. That is, change "^" and "$" from matching the start or end of the string to matching the start or end of any line anywhere within the string.
http://perldoc.perl.org/perlre.html
I don't know about UltraEdit exactly but I expect it will have something similar.
Try this:
^\d{3}[\S]{3}$
Match lines with 3 digits followed by three characters that are not blank characters.
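The effect can be reproduced in Python's re module, which may not behave exactly like UltraEdit's engine. In this sketch the sample text is an assumption: it uses \n line endings and has two empty lines after 123y, which is enough for \D to swallow line breaks. The \S version from this answer does not have that problem.

```python
import re

# Assumed sample: "\n" line endings and two empty lines after 123y.
TEXT = "123xxx\n123y\n\n\naaabb"

# \D means "not a digit", and a newline is not a digit, so \D{3} can
# consume "y" plus two line breaks.
with_non_digit = re.findall(r"^\d{3}\D{3}$", TEXT, flags=re.MULTILINE)

# \S means "not whitespace", which excludes line breaks.
with_non_space = re.findall(r"^\d{3}\S{3}$", TEXT, flags=re.MULTILINE)

print(with_non_digit)
print(with_non_space)
```

Note that \S also accepts digits, so this pattern matches a line of six digits as well; whether that matters depends on the data.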