How to get (or remove) all comment lines from a MATLAB file?
Lines may start with zero or more whitespace characters, followed by one or more %, followed by the comment.
Using
only_comments = regexp(raw_string, '(?m)^[ ]*[%].*?$', 'match');
fails. Also, how do I make sure tabs are caught?
As I understand it, the pattern means:
(?m) line mode
^ beginning of line
[ ]* none or any number of white spaces
[%].*?$ followed by a % and then any character until the line end is reached.
What's wrong?
Seems like you want something like this,
only_comments = regexp(raw_string, '(?m)^[ ]*[%]+.*?$', 'match');
OR
only_comments = regexp(raw_string, '(?m)^ *%+.*$', 'match');
Explanation:
^ Asserts that we are at the start of a line.
<space>* Matches zero or more spaces.
%+ Matches one or more %
.* Matches any characters except line breaks.
$ Asserts that we are at the end of a line.
(?m)^[ ]*%+.*$
I think you need this. Your regex (?m)^[ ]*[%].*?$ does not quantify the %, so [%] on its own matches only one %. Use %+ to match one or more of them.
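As a cross-check outside MATLAB, here is a minimal sketch using Python's re module. This is an assumption for illustration: the thread is about MATLAB's regexp, whose defaults may differ. It uses [ \t]* instead of [ ]* so that tabs are caught as well as spaces. The sample text and variable names are made up.

```python
import re

# Hypothetical sample; the thread does not show the real file contents.
raw_string = "x = 1;\n  % comment one\n\t%% section\ny = 2; % trailing\n%last"

# (?m) makes ^ and $ match at line boundaries.
# [ \t]* allows any mix of leading spaces and tabs; %+ needs at least one %.
COMMENT_LINE = re.compile(r"(?m)^[ \t]*%+.*$")

# Collect every line that consists only of a comment.
only_comments = COMMENT_LINE.findall(raw_string)

# To remove those lines instead, also consume the line break that follows.
without_comments = re.sub(r"(?m)^[ \t]*%+.*\n?", "", raw_string)

print(only_comments)
print(repr(without_comments))
```

Note that the line with a trailing comment after code is deliberately left alone, since it does not start with %.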
So, I know from this question how to find all the lines that don't contain a specific string. But it leaves a lot of empty newlines when I use it, for example, in a text editor substitution (Notepad++, Sublime, etc).
Is there a way to also remove the empty lines left behind by the substitution in the same regex or, as it's mentioned on the accepted answer, "this is not something regex ... should do"?
Example, based on the example from that question:
Input:
aahoho
bbhihi
cchaha
sshede
ddhudu
wwhada
hede
eehidi
Desired output:
sshede
hede
[edit-1]
Let's try this again: what I want is a way to use a regex replace to remove every line that does not contain hede in the text editor. If I try .*hede.* it finds all the lines containing hede.
But it does not remove the others. On a short file this is easy to do manually, but the idea here is to run the replacement on a larger file with over 1000 lines, of which only 20-50 contain the desired string.
If I use ^((?!hede).)*$ and replace it with nothing, I end up with empty lines.
I thought it was a simple question, for people with a better understanding of regex than me: can a single regex replace also remove those empty lines left behind?
An alternative to try:
Find what: ^(?!.*hede).*\s?
Replace with: nothing
Explanation:
^ # start of a line
(?!) # a Negative Lookahead
. # matches any character (except for line terminators)
* # matches the previous token between zero and unlimited times,
hede # matches the characters hede literally
\s # matches any whitespace character (equivalent to [\r\n\t\f\v ])
? # matches the previous token between zero and one times,
Using Notepad++.
Ctrl+H
Find what: ^((?!hede).)*(?:\R|\z)
Replace with: LEAVE EMPTY
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
((?!hede).)* # tempered greedy token, makes sure hede does not occur anywhere in the line
(?:\R|\z) # non capture group, any kind of line break OR end of file
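The same idea can be tried outside the editor. Below is a minimal sketch in Python's re module (an assumption: the answers target Notepad++, and Python has no \R, so a plain \n? stands in for the line break). It removes each line that lacks hede together with its line break, so no empty lines are left behind.

```python
import re

# The example input from the question, joined with newlines.
text = "aahoho\nbbhihi\ncchaha\nsshede\nddhudu\nwwhada\nhede\neehidi"

# ^(?!.*hede)  at a line start, assert that hede appears nowhere on the line
# .*           consume the whole line
# \n?          and its line break, if there is one (last line may lack it)
NO_HEDE_LINE = re.compile(r"(?m)^(?!.*hede).*\n?")

kept = NO_HEDE_LINE.sub("", text)
print(kept)
```

Consuming the line break inside the match is what distinguishes this from ^((?!hede).)*$, which stops before the break and therefore leaves an empty line.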
Have you tried:
.*hede.*
I don't know why you are doing an inverse search for this.
You can use sed like:
sed -e '/.*hede.*/!d' input.txt
I need to cut lines that have 6 or more characters, hyphen, then other characters or symbols. Hyphen and rest of line should be removed. Source text:
0402CS-2
0402CS-3
0402
7812-C
0603CS-1
0603CS-2
0603CS-3
As a result, I need this:
0402CS
0402CS
0402
7812-C
0603CS
0603CS
0603CS
To do that, I use Notepad++ regexp replace feature. Find pattern: ^([^\-]{6,})\-.+$ Replace pattern: \1
But there is no "multiline" option, so the symbols "^" and "$" don't match ONLY the beginning and end of a line, and the result I actually get is:
0402CS
0402CS
0402
7812 <-- that's wrong!
0603CS
0603CS
0603CS
Please advise me how to fix the find pattern. Or maybe there is another handy and powerful free text editor that can do this?
^([^\n\-]{6,})\-.+$
    ^^
Just add \n to the character class. Because [^-] also matches a line break, the regex can run on into the line below and use that line to complete a match.
See demo.
https://regex101.com/r/BHO93c/1
For the input
0402
7812-C
the regex treats both lines as one line and makes a match.
See demo if 0402 is not there.
https://regex101.com/r/BHO93c/2
That happens because the [^-] character class also matches a newline.
Add \n to it:
^([^\n-]{6,})-.+$
See the regex online demo (note the m multiline modifier (making ^ match the start of the line, and $ - the end of the line) and g modifier (enabling search for multiple occurrences) that is ON by default in Notepad++).
Note that escaping the hyphen is not necessary inside a character class when it is at the start/end of the class, and you never need to escape the hyphen outside the character class.
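The effect is easy to reproduce with a minimal sketch in Python's re module (an assumption: the thread is about Notepad++, but a negated character class matches a newline in Python as well). The two patterns differ only in the \n inside the class.

```python
import re

# The source text from the question.
text = "0402CS-2\n0402CS-3\n0402\n7812-C\n0603CS-1\n0603CS-2\n0603CS-3"

# Buggy: [^-] also matches "\n", so "0402\n7812" counts as 9 non-hyphen chars.
buggy = re.sub(r"(?m)^([^-]{6,})-.+$", r"\1", text)

# Fixed: excluding "\n" keeps the 6+ characters on a single line.
fixed = re.sub(r"(?m)^([^\n-]{6,})-.+$", r"\1", text)

print(buggy.splitlines())
print(fixed.splitlines())
```

With the fixed pattern, 7812-C is left untouched because only four characters precede its hyphen.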
I just started PySpark, here is the task:
I have an input of:
I need to use a regex to remove punctuation, all leading or trailing spaces, and underscores. The output should be all lowercase.
What I came up with is not complete:
sentence = regexp_replace(trim(lower(column)), '\\*\s\W\s*\\*_', '')
and the result is:
How do I fix the regex here? I need to use regexp_replace here.
Thank you very much.
You may use
^\W+|\W+$|[^\w\s]+|_
The ^ and $ anchors must match line start/end.
If the pattern must not overflow across lines, replace \W+$ with [^\w\n]+$ and the ^\W+ pattern with ^[^\w\n]+:
^[^\w\n]+|[^\w\n]+$|[^\w\s]+|_
See the regex demo.
Explanation:
^ - start of line (if the multiline option is on by default; otherwise, try adding (?m) at the pattern start)
[^\w\n]+ - 1 or more non-word chars (non-[a-zA-Z0-9_]) except a newline
| - or
[^\w\n]+$ - 1 or more non-word chars at the end of the line ($)
| - or
[^\w\s]+ - 1 or more non-word chars except any whitespace
| - or
_ - an underscore.
If you do not really care about Unicode (I used \w, \s that can be made Unicode aware), you may just use a shorter, more simple pattern:
^[^a-zA-Z\n]+|[^a-zA-Z\n]+$|[^a-zA-Z\s]+
See this regex demo.
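To see what the first pattern does to a string, here is a minimal sketch using Python's re module as a stand-in for regexp_replace (an assumption: Spark uses Java regex, which behaves the same for this pattern as far as I know, but that is not verified here). The sample sentence and the function name are made up.

```python
import re

# ^[^\w\n]+   leading non-word characters
# [^\w\n]+$   trailing non-word characters
# [^\w\s]+    punctuation anywhere (whitespace between words is kept)
# _           underscores, which \w would otherwise protect
CLEANUP = re.compile(r"(?m)^[^\w\n]+|[^\w\n]+$|[^\w\s]+|_")

def clean_sentence(sentence: str) -> str:
    """Lowercase the text, then strip punctuation, underscores and outer spaces."""
    return CLEANUP.sub("", sentence.lower())

# Hypothetical input; the question's real sample was not preserved.
print(clean_sentence("  *Hello, World_!*  "))
```

Lowercasing first mirrors the lower(column) call in the question; the order does not affect which characters the pattern removes.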
TL;DR: sentence = column.strip(' \t\n*+_')
If you want to remove characters only from the ends and don't care about unicode, then the basic string strip() function will let you pick characters to strip. It defaults to whitespace, but you can put in whatever you want.
If you want to remove within a string you are stuck with a regular expression or, if using byte strings or Python 2, maketrans.
You may like to look at this question as well.
I would like to add some custom text to the end of all lines in my document opened in Notepad++ that start with 10 and contain a specific word (for example "frog").
So far, I managed to solve the first part.
Search: ^(10)$
Replace: \1;Batteries (to add ;Batteries to the end of the line)
What I need now is to edit this regex pattern to recognize only those lines that also contain a specific word.
For example:
Before: 1050;There is this frog in the lake
After: 1050;There is this frog in the lake;Batteries
You can use the regex to match your wanted lines:
(^(10).*?(frog).*)
The .*? is a lazy quantifier that matches as little as possible up to frog.
Replace with:
$1;Batteries
Hope it helps,
You should allow any characters between the number and the end of line:
^10.*frog.*
And replacement will be $0;Batteries. You do not even need a $ anchor as .* matches till the end of a line since . matches any character but a line break char.
NOTE: There is no need to wrap the whole pattern with capturing parentheses, the $0 placeholder refers to the whole match value.
More details:
^ - start of a line
10 - a literal 10 text
.* - zero or more chars other than line break chars as many as possible
frog - a literal string
.* - zero or more chars other than line break chars as many as possible
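A minimal sketch of the same replacement in Python's re module (an assumption: the thread is about Notepad++, where the whole match is written $0; in Python it is \g<0>). The second and third sample lines are made up to show which lines are left alone.

```python
import re

text = (
    "1050;There is this frog in the lake\n"
    "1060;No amphibians here\n"
    "2050;A frog, but the wrong prefix"
)

# ^10      line must start with 10
# .*frog   and contain frog somewhere after that
# .*       run to the end of the line (. does not match a line break)
result = re.sub(r"(?m)^10.*frog.*", r"\g<0>;Batteries", text)

print(result)
```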
Try this:
find with: (^(10).*(frog).*)
replace with: $1;Batteries
Use ^(10.*frog.*)$ as regex. Replace it with something like $1;Batteries
I'm trying to run through some code files and find lines that don't end in a semicolon.
I currently have this: ^(?:(?!;).)*$ from a bunch of Googling, and it works just fine. But now I want to expand on it so it ignores all the whitespace at the start or specific keywords like package or opening and closing braces.
The end goal is to take something like this:
package example
{
public class Example
{
var i = 0
var j = 1;
// other functions and stuff
}
}
And for the pattern to show me that var i = 0 is missing a semicolon. That's just an example; the missing semicolon could be anywhere in the class.
Any ideas? I've been fiddling for over an hour but no luck.
Thanks.
If you want a line that doesn't end in a semicolon, you can ask for any amount of anything .* followed by one character that is neither a semicolon nor whitespace [^;\s], followed possibly by some whitespace \s*, then the end of the line $. (Excluding whitespace in the class matters: with a plain [^;], a line ending in a semicolon plus a trailing space would still match, because the space itself would count as the non-semicolon character.) So you have:
.*[^;\s]\s*$
Now if you don't want whitespace at the beginning, ask for the beginning of the line ^ followed by any character that isn't whitespace [^\s], followed by the regex from earlier:
^[^\s].*[^;\s]\s*$
If you don't want it to start with a keyword like package or, say, class, or with whitespace, you can assert that the line does not begin with any of those three things. The regex that matches any of them is (?:\s|package|class), and the negative lookahead that succeeds only where none of them follows is (?!\s|package|class). Note the !. So you now have:
^(?!\s|package|class).*[^;\s]\s*$
Try this:
^\s*(?!package|public|class|//|[{}]).*(?<!;\s*)$
When tested in PowerShell:
PS> (gc file.txt) -match '^\s*(?!package|public|class|//|[{}]).*(?<!;\s*)$'
var i = 0
PS>
The key to capturing this complicated concept in a regex is to first understand how your regular expression engine/interpreter handles the following concepts:
positive lookahead
negative lookahead
positive lookbehind
negative lookbehind
Then you can begin to understand how to capture what you want, but only in such cases where what's ahead and what's behind is exactly as you specify.
str.scan(/^\s*(?=\S)(?!package.+\n|public.+\n|\/\/|\{|\})(.+)(?<!;)\s*$/)
This is the regular expression line I'm using to highlight lines of Java code that don't end in semicolon and aren't one of the lines in java that aren't supposed to have a semicolon at the end... using vim's regular expression engine.
\(.\+[^; ]$\)\(^.*public.*\|.*//.*\|.*interface.*\|.*for.*\|.*class.*\|.*try.*\|^\s*if\s\+.*\|.*private.*\|.*new.*\|.*else.*\|.*while.*\|.*protected.*$\)\#<!
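Reading the expression from left to right:
1. \(.\+[^; ]$\) is a group of at least one character preceding a missing semicolon.
2. The second \( ... \) group lists the keywords; matches preceded by any of them are excluded.
3. \#<! is the negative lookbehind feature, applied to the keyword group.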
Mnemonics for deciphering glyphs:
^ beginning of line
.* Any amount of any char
+ at least one
[^ ... ] everything but
$ end of line
\( ... \) group
\| delimiter
\#<! negative lookbehind
Which roughly translates to:
Find me all lines that don't end in a semicolon and don't have any of the above keywords/expressions to the left of it. It's not perfect and probably doesn't hold up to obfuscated java, but for simple java programs it highlights the lines that should have semicolons at the end, but don't.
Helpful link that helped me get the concepts I needed:
https://jbodah.github.io/blog/2016/11/01/positivenegative-lookaheadlookbehind-vim/
For just lines that don't end in a semicolon, this is simpler:
.*[^;]$
If you don't want lines starting with whitespace and ending with semicolon:
^[^ ].*[^;]$
You are trying to match lines that possibly begin with whitespace ^\s*, then don't start with one of a particular set of words, for example (?!package|class), then have anything .* but then don't end in a semicolon (or a semicolon with whitespace after it) [^;\s]\s*.
^\s*(?!package|class).*?[^;\s]\s*$
Note that I added parentheses around a section of the regex.
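The lookahead approach from the answers above can be sketched in Python's re module (an assumption: none of the answers uses Python, and Python does not support the variable-length lookbehind (?<!;\s*), so the last-character class [^;\s] is used instead). The skip list and the function name are illustrative only. The lookahead is placed right after ^ and includes the leading whitespace itself, so the engine cannot backtrack into the indentation to dodge a keyword.

```python
import re

# Line starts that are allowed to lack a semicolon (illustrative, not complete).
SKIP = r"package|public|class|//|[{}]"

# ^(?!\s*(?:...))  the line must not begin, after indentation, with a skip token
# .*[^;\s]         its last non-whitespace character must not be a semicolon
# \s*$             trailing whitespace is tolerated
MISSING = re.compile(r"^(?!\s*(?:" + SKIP + r")).*[^;\s]\s*$")

def lines_missing_semicolon(source: str) -> list[str]:
    """Return the stripped lines that look like statements without a semicolon."""
    return [line.strip() for line in source.splitlines() if MISSING.match(line)]

sample = """package example
{
    public class Example
    {
        var i = 0
        var j = 1;
        // other functions and stuff
    }
}
"""

print(lines_missing_semicolon(sample))
```

Like the other patterns in this thread, this is a heuristic: multi-line statements, annotations, or control-flow lines such as if and for would need further entries in the skip list.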