Delete one and two letter lines in VIM? - regex

I have a data source that contains a bunch of city names, but mixed in are quite a few state abbreviations that shouldn't be there.
Is there a way in Vim to delete each line that contains two characters or fewer?

Here it is:
:g/^\a\{1,2}$/d
Explanation
delete each line that ... → that calls for a :global command (which defaults to % range, the entire buffer); the executed command is :delete.
two characters or less → in a regular expression, any character is matched by ., to restrict this to 1 or 2 the \{n,m} multi is used. This still needs to be anchored via ^ and $ to the beginning and end of the line, so that additional characters don't make this match. Oh, and if you also want to remove completely empty lines, change this to .\{,2}. See :help /\{ for details.
more robust "characters": . will match any character, i.e. also whitespace. To avoid unwanted matches, it's best to restrict this as much as possible. If your state abbreviations are only alphabetic, you can use the \a atom instead of . The available character classes start in the help at :help /\i.
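For comparison outside the editor, the same filter can be sketched in Python. This is only an illustration and not part of the Vim answer: the function name drop_short_alpha_lines and the sample data are made up, and [A-Za-z] is assumed to be a fair stand-in for Vim's \a.

```python
import re

# Rough equivalent of Vim's ^\a\{1,2}$ : a whole line of one or two ASCII letters.
SHORT_ALPHA = re.compile(r"[A-Za-z]{1,2}")

def drop_short_alpha_lines(lines):
    """Return the lines that are NOT made up of just one or two letters."""
    return [line for line in lines if not SHORT_ALPHA.fullmatch(line)]

# Hypothetical sample: city names with state abbreviations mixed in.
sample = ["Austin", "TX", "Boise", "ID", "Ely", "", "A1"]
cleaned = drop_short_alpha_lines(sample)
print(cleaned)
```

As with the \a variant in Vim, empty lines and short lines containing non-letters (such as "A1") are left alone.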

Related

Extract specific string using regular expression

I want to extract a string only if it matches a specific pattern.
Example input:
13.10.0/
13.10.1/
13.10.2/
13.10.3/
13.10.4.2/
13.10.4.4/
13.10.4.5/
I'm using the regex [0-9]+.[0-9]+.[0-9] to extract only digit.digit.digit from a string if it matches,
but with that regex I get this wrong output:
13.10.0
13.10.1
13.10.2
13.10.3
13.10.4.2 (13.10.4 should not be matched here)
13.10.4.4 (13.10.4 should not be matched here)
13.10.4.5 (13.10.4 should not be matched here)
The correct output that I need:
13.10.0
13.10.1
13.10.2
13.10.3
It's hard to say without knowing how you're passing these strings in -- are they lines in a file? An array of strings in a programming language?
If you're searching a file using grep or a similar tool, it will give you all lines that match anywhere, even if only part of the line matches.
Normally, you'd deal with this using anchors to specify that the regex must start on the first character of the line and end on the last (e.g. ^[0-9]+\.[0-9]+\.[0-9]$, with the dots escaped so that they match only literal periods). ^ matches the start of the line, and $ matches at the end.
In your case, you've got slashes at the end of all the lines, so the easiest fix is to match that final slash, with ^[0-9]+\.[0-9]+\.[0-9]/.
You could also use lookahead or groups to match the slash without returning it -- but that depends a bit more on what tool you're running this regex in and how you're processing it.
If your strings are separated by whitespace (other than newlines), replacing ^ with (^|\s) (either the beginning of the string, or some whitespace character) may work -- but it will add a leading space to some of your results.
You may also need to set your regex tool to match multiple times in a line (e.g. the -o flag in grep). Again, it's hard to give useful advice about this without knowing what regular-expression tool you're using, or how you're processing the results.
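As one possible illustration of the "groups" idea, here is a minimal Python sketch. It assumes the strings are lines in a single block of text. The trailing + on the last number is an addition (the regexes above allow only one final digit), and the names VERSION_LINE and versions are made up for this example.

```python
import re

# Three dot-separated numbers filling the whole line, followed by the
# trailing slash; the parentheses capture the part without the slash.
VERSION_LINE = re.compile(r"^([0-9]+\.[0-9]+\.[0-9]+)/$", re.MULTILINE)

text = """13.10.0/
13.10.1/
13.10.2/
13.10.3/
13.10.4.2/
13.10.4.4/
13.10.4.5/"""

# findall returns the contents of the single capturing group per match.
versions = VERSION_LINE.findall(text)
print(versions)
```

The four-part lines are skipped because the slash must come directly after the third number.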
I think you want:
^\d+\.\d+\.\d+$
Which is exactly 3 groups of digit(s) separated by (literal) dots.
Some tools (like grep) match all lines that contain your regex, and may have additional characters before/after.
Use the $ character to match the end of the line after your regex. (Also note that . matches any character, not a literal dot.)
[0-9]+\.[0-9]+\.[0-9]$

How do I regex search in x and y for a, and only include the replacement of y if a was found in x?

I need to search through a larger text file.
This is an example of what I'm searching through.
https://pastebin.com/JFVy2TEt
recipes.addShaped("basemetals:adamantine_arrow", <basemetals:adamantine_arrow> * 4, [[<ore:nuggetAdamantine>], [<basemetals:adamantine_rod>], [<minecraft:feather>]]);
I need to look for lines that match a specific part in the first argument.
For example the "_arrow" part in the above line.
And erase everything that doesn't match on the "_arrow" in the first argument.
And the arguments differ across all of them.
And also with different names in the place where "basemetals:adamantine" is in the above line.
And since the further arguments are all different I can't wrap my head around on how to include the end only when the first thing matches.
Edit: The end goal is to make it easier to sort my 3k+ line text file.
basic, blacksmith, carpenter, chef, chemist, engineer, farmer, jeweler, mage, mason, scribe, tailor
I think what you're trying to do is filter your text file by removing lines that don't fit a set criteria. I've chosen the Atom text editor for this solution (because I'm running Windows OS and can't install gedit, and I want to ensure you have a working example).
To remove only lines that don't have a first argument ending in _arrow, one could do (?!recipes\.addShaped\("[^"]+_arrow")recipes.+\r?\n? and replace with nothing.
As a note: this task is made more difficult by Atom's limited regex support. In an environment with fuller support, my answer would probably be ^recipes\.addShaped\("[^"]+(?<!_arrow)".+\r?\n? (with multiline mode).
Also, please read "What should I do when someone answers my question?".
Regex explained:
(?! ) is a negative lookahead, which peeks at the succeeding text to ensure it doesn't contain "_arrow" at end of the first argument.
\. is an escaped literal period
[^"] is a character class that signifies a character that is not a ".
+ is a quantifier which tells the regex to match the preceding character or subexpression as many times as possible, with a minimum of one time.
. is a wildcard, representing any character
\r?\n? is used to match any kind of newline, with the ? quantifier making each character optional.
Everything else is literal characters; each one matches exactly itself.
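If a script is an option, the same filtering could be sketched in Python instead of an editor replace. This is a hedged sketch, not the answer's method: the helper keep_by_first_arg is hypothetical, and the second sample line (the "_axe" recipe) is invented purely to have something to filter out.

```python
import re

def keep_by_first_arg(lines, suffix="_arrow"):
    """Keep lines whose quoted first argument ends in the given suffix."""
    # recipes.addShaped(" ... <suffix>"  at the start of the line
    pattern = re.compile(
        r'^recipes\.addShaped\("[^"]+' + re.escape(suffix) + r'"'
    )
    return [line for line in lines if pattern.match(line)]

sample = [
    'recipes.addShaped("basemetals:adamantine_arrow", <basemetals:adamantine_arrow> * 4, '
    '[[<ore:nuggetAdamantine>], [<basemetals:adamantine_rod>], [<minecraft:feather>]]);',
    # Invented line, present only to show a non-matching recipe being dropped.
    'recipes.addShaped("basemetals:adamantine_axe", <basemetals:adamantine_axe>, [[<ore:ingotAdamantine>]]);',
]

kept = keep_by_first_arg(sample)
print(len(kept))
```

Selecting the lines to keep avoids the negative lookahead altogether, which may be easier to adapt when sorting by other suffixes.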

Find/Match every similar words in word list in notepad++

I have a word list in alphabetical order.
It is arranged as a single column.
I do not use any programming languages.
The list is in Notepad format.
I need to find all similar words and put them on the same line.
I use regex, but I can't get correct results.
First list is like:
accept
accepted
accepts
accepting
calculate
calculated
calculates
calculating
fix
fixed
A list I want:
accept accepted accepts accepting
calculate calculated calculates calculating
fix fixed
This seems to work, but you will have to do Replace All multiple times:
Find (^(.+?)\s*?.*?)\R\2 and replace with \1\t\2. The ". matches newline" option should be disabled.
How it works:
It finds some characters at the start of line ^(.+?), then any linebreak \R, and those same characters again \2.
\s*?.*? is used to skip unnecessary characters after multiple Replace All. \s*? skips the first whitespace, and .*? any remaining chars on the line.
Match is replaced with \1\t\2, where \1 is anything matched in (^(.+?)\s*?.*?), and \2 is anything matched with (.+?). \t is used to insert tab character to replace linebreak.
How it breaks:
Note that this will not work well with different words with similar prefix, like:
hand
hands
handle
handles
This will be hand hands handle handles after 2 replaces.
I can imagine doing this programmatically with limited success (take the first word as a root and, if a derived word with this root follows, place it on the same line; otherwise take the word as a new root and put it on a new line). This will still fail for irregular words where the root is not the same for all forms.
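A minimal sketch of that root-based idea, in Python, might look like the following. The function name group_by_root is made up, and "derived word" is assumed to mean simply "starts with the current root".

```python
def group_by_root(words):
    """Join each word onto the line of the most recent root it starts with."""
    groups = []
    for word in words:
        if groups and word.startswith(groups[-1][0]):
            groups[-1].append(word)   # derived form: same line as its root
        else:
            groups.append([word])     # otherwise this word becomes a new root
    return [" ".join(group) for group in groups]

words = ["accept", "accepted", "accepts", "accepting",
         "calculate", "calculated", "calculates", "calculating",
         "fix", "fixed"]
lines = group_by_root(words)
print("\n".join(lines))
```

On the question's list this already shows the limitation: "calculating" does not start with "calculate", so it ends up on a line of its own.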
Without programming, the only way is (manual) preprocessing: if there are fewer than 4 forms for a given word in the list, insert a blank line for each missing form, so there are always 4 lines for each word. Then you can use a regex to put each such quadruple on one line.

QRegExp match lines containing N words all at once, but regardless of order (i.e. Logical AND)

I have a file containing many lines of text, and I want to match only those lines that contain a number of words. All words must be present in the line, but they can come in any order.
So if we want to match one, two, three, the first 2 lines below would be matched:
three one four two <-- match
four two one three <-- match
one two four five
three three three
Can this be done using QRegExp (without splitting the text and testing each line separately for each word)?
Yes it is possible. Use a lookahead. That will check the following parts of the subject string, without actually consuming them. That means after the lookahead is finished the regex engine will jump back to where it started and you can run another lookahead (of course in this case, you use it from the beginning of the string). Try this:
^(?=[^\r\n]*one)(?=[^\r\n]*two)(?=[^\r\n]*three)[^\r\n]*$
The negated character classes [^\r\n] make sure that we can never look past the end of the line. Because the lookaheads don't actually consume anything for the match, we add the [^\r\n]* at the end (after the lookaheads) and $ for the end of the line. In fact, you could leave out the $, due to greediness of *, but I think it makes the meaning of the expression a bit more apparent.
Make sure to use this regex with multi-line mode (so that ^ and $ match the beginning of a line).
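To see the lookahead technique in action, here is a small sketch using Python's re module rather than QRegExp (an assumption made only so that the example is runnable; the pattern itself is unchanged). The names ALL_THREE and matches are illustrative.

```python
import re

# One lookahead per required word; each is confined to the current line.
ALL_THREE = re.compile(
    r"^(?=[^\r\n]*one)(?=[^\r\n]*two)(?=[^\r\n]*three)[^\r\n]*$",
    re.MULTILINE,  # lets ^ and $ match at the start and end of every line
)

text = "\n".join([
    "three one four two",
    "four two one three",
    "one two four five",
    "three three three",
])

matches = ALL_THREE.findall(text)
print(matches)
```

Note that the lookaheads test for substrings, so a line containing "someone" would also satisfy the "one" condition unless word boundaries are added.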
EDIT:
Sorry, QRegExp apparently does not support multi-line mode m:
QRegExp does not have an equivalent to Perl's /m option, but this can be emulated in various ways for example by splitting the input into lines or by looping with a regexp that searches for newlines.
It even recommends splitting the string into lines, which is what you want to avoid.
Since QRegExp also does not support lookbehinds (which would help emulating m), other solutions are a bit more tricky. You could go with
(?:^|\r|\n)(?=[^\r\n]*one)(?=[^\r\n]*two)(?=[^\r\n]*three)([^\r\n]*)
Then the line you want should be in capturing group 1. But I think splitting the string into lines might make for more readable code than this.
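The emulation can be checked with a short sketch, again in Python rather than QRegExp. Python's default mode is assumed here to behave like the situation described, with ^ matching only at the very start of the text; the names EMULATED and lines are made up.

```python
import re

# No MULTILINE flag: "^" only matches at the very start of the text,
# so a preceding \r or \n is accepted as an alternative line start.
EMULATED = re.compile(
    r"(?:^|\r|\n)(?=[^\r\n]*one)(?=[^\r\n]*two)(?=[^\r\n]*three)([^\r\n]*)"
)

text = "three one four two\nfour two one three\none two four five\nthree three three"

# The wanted line is in capturing group 1, without the leading newline.
lines = [m.group(1) for m in EMULATED.finditer(text)]
print(lines)
```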
You can use the MultilineOption PatternOption from the new Qt5 QRegularExpression like:
QRegularExpression("\\w+", QRegularExpression::MultilineOption)

Replace all characters in a regex match with the same character in Vim

I have a regex to replace a certain pattern with a certain string, where the string is built dynamically by repeating a certain character as many times as there are characters in the match.
For example, say I have the following substitution command:
%s/hello/-----/g
However, I would like to do something like this instead:
%s/hello/-{5}/g
where the non-existing notation -{5} would stand for the dash character repeated five times.
Is there a way to do this?
Ultimately, I'd like to achieve something like this:
%s/(hello)*/-{\=strlen(\0)}/g
which would replace any instance of a string of only hellos with the string consisting of the dash character repeated the number of times equal to the length of the matched string.
%s/\v(hello)*/\=repeat('-',strlen(submatch(0)))/g
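For readers more used to scripting languages, the idea of computing the replacement from the match can be sketched in Python with a replacement function. This is an analogy, not Vim code: the helper dash_out is hypothetical, and it uses + instead of * so that empty matches are never considered.

```python
import re

def dash_out(text, word="hello"):
    """Replace each run of `word` with dashes of the same total length."""
    run = re.compile("(?:" + re.escape(word) + ")+")
    # The callable plays the role of \=repeat('-', strlen(submatch(0))).
    return run.sub(lambda m: "-" * len(m.group(0)), text)

result = dash_out("say hello, then hellohello!")
print(result)
```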
As an alternative to using the :substitute command (the usage of
which is already covered in @Peter’s answer), I can suggest automating
the editing commands for performing the replacement by means of
a self-referring macro.
A straightforward way of overwriting occurrences of the search pattern
with a certain character by hand would be the following sequence of
Normal-mode commands.
1. Search for the start of the next occurrence.
   /\(hello\)\+
2. Select matching text till the end.
   v//e
3. Replace selected text.
   r-
4. Repeat from step 1.
Thus, to automate this routine, one can run the command
:let[@/,@s]=['\(hello\)\+',"//\rv//e\rr-@s"]
and execute the contents of that s register starting from the
beginning of the buffer (or another appropriate location) by
gg@s