searching for text that contain and not contain text and symbol - regex

This regex code works correctly for searching for lines that begins with an exclamation mark and does not contain colon : symbol
^!([^:\n]*)$
In addition to the regex code above, I need it to contain lines of text that has the word "spelling" in it, like this code below but does not work.
^!([^:\n]spelling*)$

You could do this:
^![^:\n]*spelling[^:\n]*$
If you are looping through a file line by line, as is typical, there is no need to exclude the newlines from the match:
^![^:]*spelling[^:]*$
Another option to consider when you have complex requirements is breaking the match down into mutiple steps. This makes for simpler, easier to understand code that is less error-prone:
if (/^!/ and /spelling/ and not /:/)

spelling*
matches
spellin
spelling
spellingg
spellinggg
etc. You were trying for
^([^:\n]*spelling[^\n]*)$
aka
^([^:\n]*spelling.*)$ # Assuming /s isn't used
But that would allow : after spelling, so you really want
^([^:\n]*spelling[^:\n]*)$

What about ^([^:\n]*spelling.*)$ ?
Adding .* allows any character (except newline) to be present after 'spelling'

Related

Search for entire word containing specific keyword in Notepad++ using regular expressions

I use Notepad++,
i need to search and replace entire word that contain a specific keyword.
Ex: someting HELP.blablabla.blabla someting
i would like to search entire text for words that contain the keyword "HELP" untill the first space OR the first comma.
In this case: HELP.blablabla.blabla
thanks a lot
Go to the search panel, check the regex checkbox on the bottom and try: (HELP)([^ ,]*)
Note: There are a space character after the ^
This regex means: Search for the entire word HELP (HELP) followed by anything that it isn't an space or an comma [^ ,] the ^ inside the brackets is a denial
Edit:
You can use just HELP[^ ,]* the parenthesis is just to create capturing groups if you need to use the specific groups to replace later. As pointed by #alphabravo
You say search and replace an entire word but if it were that simple then I wonder why a regular search and replace isn't sufficient. So I'm reading between the lines and assuming you want to match on full lines of text.
I think I've used npp enough to get the syntax right. I don't remember any eccentricities that would apply. Is the comma/space optional?
^[^, ]*HELP[^, ]*[, ]
I'm kinda thinking this one might be good enough:
^[^, ]*HELP

Regular expression to delete row from csv

I have a line from CSV
first decimal;;;first text;;second text with newlines, special symbols, including semicolons;second decimal, always present;first dot separated float, may not present;second dot separated float, may not present;third text that present only if present previous float
I need to delete second text (with new lines and special symbols).
As for now I have expression like:
(?<=;;)(.*?)(?=;\d+)
First part of it does not work, and I don't know how to make it select text preceded by only two semicolons (for now it selects text preceded by two or more semicolons and first decimal preceded by semicolons + newline if I turn on dotall). Besides, I do not know how to include newline symbol here (.*?).
If you have a CSV file that contains semicolons and newlines as part of quoted fields, then regex is not the right tool for this. Imagine what would happen if you had a field like "This is one field;;don't split this;42"...
If you're sure that you'll never have two semicolons before or within a quoted field, then you may give regex a try. But a dedicated CSV parser would definitely be a safer bet.
That said, let's see why your regex fails:
Imagine the line 1;;;2;3. Your regex will match ;2 because it fulfills all the requirements - there are two semicolons before it, and a semicolon plus digit after it. It's also the shortest possible match at this position in the string.
What can you do? You could use another lookbehind assertion to make sure that it's not possible to match three semicolons before the current position:
(?<=;;)(?<!;;;)(.*?)(?=;\d+)
Give it a try - but look into CSV libraries too, because they will solve your problem better.

Notepad++ Regex: Find all 1 and 2 letter words

I’m working with a text file with 200.000+ lines in Notepad++. Each line has only one word. I need to strip out and remove all words which only contains one letter (e.g.: I) and words which contains only two letters (e.g.: as).
I thought I could just pas in regular regex like this [a-zA-Z]{1,2} but I does not recognize anything (I’m trying to Mark them).
I’ve done manual search and I know that there do exists words of that length so therefor can it only be my regex code that’s wrong. Anyone knows how to do this in Notepad++ ???
Cheers,
- Mestika
If you want to remove only the words but leave the lines empty, this works:
^[a-zA-Z]{1,2}$
Replace this with an empty string. ^ and $ are anchors for the beginning and the end of a line (because Notepad++'s regexes work in multi-line mode).
If you want to remove the lines completely, search for this:
^[a-zA-Z]{1,2}\r\n
And replace with an empty string. However, this won't work before Notepad++ 6, so make sure yours is up-to-date.
Note that you will have to replace \r\n with the specific line-endings of your file!
As Tim Pietzker suggested, a platform independent solution that also removes empty lines would be:
^[a-zA-Z]{1,2}[\r\n]+
A platform-independent solution that does not remove empty lines but only those with one or two letters would be:
^[a-zA-Z]{1,2}(\r\n?|\n)
I don't use Notepad++ but my guess is it could be because you have too many matches - try including word boundaries (your exp will match every set of 2 letters)
\b[a-zA-Z]{1,2}\b
The regex you specified should find 1-or-2 characters (even in Notepad++'s Find-dialog), but not in the way you'd think. You want to have the regex make sure it starts at the beginning of the line and ends at the end with ^ and $, respecitevely:
^[a-zA-Z]{1,2}$
Notepad++ version 6.0 introduced the PCRE engine, so if this doesn't work in your current version try updating to the most recent.
You seem to use the version of Notepad++ that doesn't support explicit quantifiers: that's why there's no match at all (as { and } are treated as literals, not special symbols).
The solution is to use their somewhat more lengthy replacement:
\w\w?
... but that's only part of the story, as this regex will match any symbol, and not just short words. To do that, you need something like this:
^\w\w?$

How to read this command to remove all blanks at the end of a line

I happened across this page full of super useful and rather cryptic vim tips at http://rayninfo.co.uk/vimtips.html. I've tried a few of these and I understand what is happening enough to be able to parse it correctly in my head so that I can possibly recreate it later. One I'm having a hard time getting my head wrapped around though are the following two commands to remove all spaces from the end of every line
:%s= *$== : delete end of line blanks
:%s= \+$== : Same thing
I'm interpreting %s as string replacement on every line in the file, but after that I am getting lost in what looks like some gnarly variation of :s and regex. I'm used to seeing and using :s/regex/replacement. But the above is super confusing.
What do those above commands mean in english, step by step?
The regex delimiters don't have to be slashes, they can be other characters as well. This is handy if your search or replacement strings contain slashes. In this case I don't know why they use equal signs instead of slashes, but you can pretend that the equals are slashes:
:%s/ *$//
:%s/ \+$//
Does that make sense? The first one searches for a space followed by zero or more spaces, and the second one searches for one or more spaces. Each one is anchored at the end of the line with $. And then the replacement string is empty, so the spaces are deleted.
I understand your confusion, actually. If you look at :help :s you have to scroll down a few pages before you find this note:
*E146*
Instead of the '/' which surrounds the pattern and replacement string, you
can use any other character, but not an alphanumeric character, '\', '"' or
'|'. This is useful if you want to include a '/' in the search pattern or
replacement string. Example:
:s+/+//+
I do not know vim syntax, but it looks to me like these are sed-style substitution operators. In sed, the / (in s/REGEX/REPLACEMENT/) can be uniformly replaced with any other single character. Here it appears to be =. So if you mentally replace = with /, you'll get
:%s/ *$//
:%s/ \+$//
which should make more sense to you.

how to eliminate dots from filenames, except for the file extension

I have a bunch of files that look like this:
A.File.With.Dots.Instead.Of.Spaces.Extension
Which I want to transform via a regex into:
A File With Dots Instead Of Spaces.Extension
It has to be in one regex (because I want to use it with Total Commander's batch rename tool).
Help me, regex gurus, you're my only hope.
Edit
Several people suggested two-step solutions. Two steps really make this problem trivial, and I was really hoping to find a one-step solution that would work in TC. I did, BTW, manage to find a one-step solution that works as long as there's an even number of dots in the file name. So I'm still hoping for a silver bullet expression (or a proof/explanation of why one is strictly impossible).
It appears Total Commander's regex library does not support lookaround expressions, so you're probably going to have to replace a number of dots at a time, until there are no dots left. Replace:
([^.]*)\.([^.]*)\.([^.]*)\.([^.]*)$
with
$1 $2 $3.$4
(Repeat the sequence and the number of backreferences for more efficiency. You can go up to $9, which may or may not be enough.)
It doesn't appear there is any way to do it with a single, definitive expression in Total Commander, sorry.
Basically:
/\.(?=.*?\.)//
will do it in pure regex terms. This means, replace any period that is followed by a string of characters (non-greedy) and then a period with nothing. This is a positive lookahead.
In PHP this is done as:
$output = preg_replace('/\.(?=.*?\.)/', '', $input);
Other languages vary but the principle is the same.
Here's one based on your almost-solution:
/\.([^.]*(\.[^.]+$)?)/\1/
This is, roughly, "any dot stuff, minus the dot, and maybe plus another dot stuff at the end of the line." I couldn't quite tell if you wanted the dots removed or turned to spaces - if the latter, change the substitution to " \1" (minus the quotes, of course).
[Edited to change the + to a *, as Helen's below.]
Or substitute all dots with space, then substitute [space][Extension] with .[Extension]
A.File.With.Dots.Instead.Of.Spaces.Extension
to
A File With Dots Instead Of Spaces Extension
to
A File With Dots Instead Of Spaces.Extension
Another pattern to find all dots but the last in a (windows) filename that I've found works for me in Mass File Renamer is:
(?!\.\w*$)\.
I don't know how useful that is to other users, but this page was an early search result and if that had been on here it would have saved me some time.
It excludes the result if it's followed by an uninterrupted sequence of alphanumeric characters leading to the end of the input (filename) but otherwise finds all instances of the dot character.
You can do that with Lookahead. However I don't know which kind of regex support you have.
/\.(?=.*\.)//
Which roughly translates to Any dot /\./ that has something and a dot afterwards. Obviously the last dot is the only one not complying. I leave out the "optionality" of something between dots, because the data looks like something will always be in between and the "optionality" has a performance cost.
Check:
http://www.regular-expressions.info/lookaround.html