Going through lines and doing deletion of a text on particular criteria - regex

Is it possible to delete everything after first word in a line using Notepad++? I will need this to go through each line.
Also I would like to know if it is possible to delete only first word in a line, leaving everything behind intact.

Use Replace with the "Regular expression" search mode. In "Find what" put
^(\S+)(.*)$
and in "Replace with" put
$1
This assumes the first word is separated from the rest of the line by a space, a tab or similar whitespace.
For the second task, put
$2
in "Replace with" instead. Note that this leaves the separator that followed the first word (the space or tab) at the start of the line, if that's what you want.
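For anyone doing the same thing outside Notepad++, here is a minimal Python sketch of both replacements. The function names are made up for illustration, and Python writes the group references as \1 and \2 rather than $1 and $2.

```python
import re

# Same pattern as in the answer; MULTILINE makes ^ and $ work per line.
FIRST_WORD = re.compile(r"^(\S+)(.*)$", re.MULTILINE)

def keep_first_word(text: str) -> str:
    """Delete everything after the first word on each line."""
    return FIRST_WORD.sub(r"\1", text)

def drop_first_word(text: str) -> str:
    """Delete only the first word; the separator after it is left in place."""
    return FIRST_WORD.sub(r"\2", text)

sample = "alpha beta gamma\none two"
print(keep_first_word(sample))
print(drop_first_word(sample))
```

Lines that begin with whitespace do not match the pattern at all, so they are left unchanged.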

Related

Regex to replace spaces with tabs at the start of the line

I'd like to be able to fix a Text File's tabs/spaces indentation.
Currently each line has spaces in random locations for some reason.
For example:
space tab if -> tab if
space tab space tab if -> tab tab if
tab tab space if -> tab tab if
etc.
It should not affect anything after the first word, so only the indentation is changed: tab space if space boolean should become tab if space boolean, not tab if tab boolean.
The regex command should keep the correct number of tabs and just remove the spaces.
If there are 4 spaces in a row it should be converted to a tab instead:
space space space space -> tab
Thank you for your help. If you could also explain how your regex works it would be very much appreciated as I'm trying to learn how to do my own regex instead of always asking others to do it.
If you need any more information or details please ask I'll respond as quickly as I can.
I can accomplish this for a single case at a time like so:
For spaces first:
Find: space*if
Replace: if
This only works for lines with no tabs and where the first word is if, so I would have to do this for each starting word of a line.
Then I would repeat with space*\tif.
Looks like I can match a word without capturing by doing (?:[A-Za-z]), so I can just swap out the if for this and it'll work better.
You could probably do this in one step, but I'm more partial to simple approaches.
Translate the 4 spaces to tabs first. First line is the match, second is the replace.
^(\s*)[ ]{4}(\s*)
$1\t$2
Then replace all remaining single spaces with nothing.
^(\t*)[ ]+
$1
You don't need the square brackets in this case; they are only there to make the space visible, since it's a little hard to be sure that there's a space, even with SO's code formatting.
The first pattern anchors at the start of the line with ^, then matches any amount of whitespace (including tabs) and captures it with (\s*) so it can be referenced later as $1. The middle part, [ ]{4}, matches exactly four spaces. The last part is a second capture group, $2, in case there are tabs or more spaces on that side, too.
The second pattern is meant to remove the remaining spaces: it matches zero or more leading tabs, captures them, and then matches the run of spaces that follows. Replacing with $1 keeps the tabs and drops the spaces. Because both patterns are anchored with ^, each one matches at most once per line per pass, so repeat Replace All until nothing changes.
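As a rough cross-check outside the editor, the same cleanup can be sketched in Python. This is not the two-pass approach above: it uses a replacement callback on the leading whitespace only, so one pass is enough. It also assumes indentation consists only of spaces and tabs, and the function name is hypothetical.

```python
import re

def fix_indent(text: str) -> str:
    """Normalize leading whitespace: 4 spaces -> tab, stray spaces removed."""
    def clean(match: re.Match) -> str:
        lead = match.group(0).replace("    ", "\t")  # four spaces become a tab
        return lead.replace(" ", "")                 # leftover spaces are dropped

    # [ \t]+ (not \s+) so the match can never run across a line break.
    return re.sub(r"^[ \t]+", clean, text, flags=re.MULTILINE)

print(repr(fix_indent(" \t \tif x")))
```

Only the run of spaces and tabs at the start of each line is touched, so spaces after the first word stay as they are.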

NOTEPAD ++ List: How to put each word on new line

If this can be done with Notepad++, I'm sure it's something simple I'm overlooking. If there is another way, I'm all ears.
I have a list of 10,000 - 20,000 words. Each word is a single word. No spaces in any one word but a single space between each and every word.
All the words run together in one long line that wraps around. I would like to put each word on a new line all the way down my txt file. I need this because I have to append something to the front and back of each word. That I can do. But I do not have the 24 hours it's going to take to break out each word manually. Any ideas? Thanks!
Use the Replace function: search for a space and replace it with \n. Remember to select the Extended search mode.
I tried the Extended mode but it didn't work for me, so I tried Regular expression mode and that worked. Here is the step-by-step process:
First press Ctrl+H on Windows to open the Replace dialog.
In "Find what" type: [ ]+ (there is a single space between the brackets)
In "Replace with" type: \n
Select the Regular expression search mode.
Finally, click Replace All.
This puts every word on its own line.
Hope it will work for you as well!
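If the list is too big to handle comfortably in an editor, the same replacement is a one-liner in Python. This is only a sketch, and the function name is made up for the example.

```python
import re

def one_word_per_line(text: str) -> str:
    """Replace each run of spaces with a single newline."""
    return re.sub(r"[ ]+", "\n", text)

print(one_word_per_line("apple banana  cherry"))
```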

How can I remove all the text between matches on a line?

I have this problem:
Input text:
this is my text text text and more text
this is my text myspace this is my text
this space is my text space this is my
this is my text this is my text
this space is my text space space myspace
Let's say I want to search for "space".
I would like to have this as output:
this is my text text text and more text
space
space space
this is my text this is my text
space space space space
Matches on the same line have to be separated by a space.
Lines without matches must remain as they are.
The same should work for any other search term.
I've been trying to get this working all afternoon, without success.
Can anyone help me?
Solution:
:g/space/s/\(.*space\).*$/\1/|s/.\{-}space/ space/g|s/^ //
Explanation:
This is tricky, but it can be done. It can't be done with a single regular expression, though.
The first thing we do is get rid of anything after the last match (we actually exploit the fact that regular expressions are greedy by default here):
s/\(.*space\).*$/\1/
Then we remove anything between all the internal matches (notice we use the lazy version of * here, \{-}):
s/.\{-}space/ space/g
The previous step will leave an initial space in the result, so we get rid of that:
s/^ //
Fortunately, in vim, we can chain replacements together with the | character. So, putting it all together:
:g/space/s/\(.*space\).*$/\1/|s/.\{-}space/ space/g|s/^ //
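The same three steps can be sketched in Python to check the logic against the sample input. This is an illustration, not a Vim feature: the function name is hypothetical, and it takes one line (without its newline) at a time.

```python
import re

def keep_only_word(line: str, word: str = "space") -> str:
    """Three-step reduction of one line to its occurrences of `word`."""
    if word not in line:
        return line                        # like :g/space/, skip other lines
    w = re.escape(word)
    # Step 1: greedy .* runs to the last match; drop everything after it.
    line = re.sub(rf"(.*{w}).*$", r"\1", line)
    # Step 2: lazy .*? (Vim's \{-}) collapses the text before each match.
    line = re.sub(rf".*?{w}", lambda m: " " + word, line)
    # Step 3: remove the leading space left behind by step 2.
    return re.sub(r"^ ", "", line)

text = """this is my text text text and more text
this is my text myspace this is my text
this space is my text space this is my
this is my text this is my text
this space is my text space space myspace"""
print("\n".join(keep_only_word(l) for l in text.splitlines()))
```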
Is this tricky line OK for you?
:g/space/s/space/^G/g|s/[^^G]//g|s/^G/space /g
To type the ^G above, press Ctrl-V Ctrl-G.
The output of the above command is the same as your example, except for the trailing whitespace after the last match (space in this case). That is easy to fix, e.g. by chaining another s/ $// after the :g line.
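The placeholder trick carries over to other languages. Below is a small Python sketch of it, assuming the marker character (BEL, the same ^G) never appears in the text; the function name is hypothetical.

```python
def keep_via_placeholder(line: str, word: str = "space", marker: str = "\x07") -> str:
    """Keep only the occurrences of a fixed string, using a one-char marker."""
    if word not in line:
        return line
    marked = line.replace(word, marker)                  # s/space/^G/g
    only_markers = "".join(c for c in marked if c == marker)  # s/[^^G]//g
    # s/^G/space /g, then drop the trailing space (the extra s/ $//)
    return only_markers.replace(marker, word + " ").rstrip(" ")

print(keep_via_placeholder("this space is my text space space myspace"))
```

As with the Vim version, this only works for fixed strings, not for arbitrary patterns.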
Kent's solution uses a nice trick that makes it work only for fixed strings, but it's clean and short. Ethan Brown's answer is more general, but also adds complexity with its three steps. I think the best solution can be developed based on the accepted answer in this very similar question.
Contrary to what Ethan Brown assumes, this can indeed be done with a single regular expression substitution. Here it is, in all its ugliness:
:g/space/s/\%(^\|\%(space \)*space\%( \%(.*space\)\#=\)\?\)\zs\%(\%(space \)*space\%( \%(.*space\)\#=\)\?\)\#!.\{-1,}\ze\%(\%(space \)*space\%( \%(.*space\)\#=\)\?\|$\)//g
It becomes somewhat more readable when you use the :DeleteExcept command from my PatternsOnText plugin:
:g/space/DeleteExcept/\%(space \)*space\%( \%(.*space\)\#=\)\?/
Explanation
This deletes everything except
potentially multiple sequential occurrences \%(space \)*
of the word space
including the trailing whitespace when it's not the last match in the line, i.e. there's a following match \%(.*space\)\#= so that the whitespace is not swallowed
or excluding (i.e. deleting) it \? after the last match in the line.
More practical alternative
Though it's a nice challenge to come up with the above solution, in practice, I would also favor a two-step approach, just because it's way simpler:
:g/space/DeleteExcept/space\%( \|$\)/
This leaves behind trailing whitespace that can be pruned with
:%s/ $//
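Outside Vim, the "delete everything except the matches" idea maps naturally onto collecting the matches and joining them. The Python sketch below illustrates that idea only; it is not how the plugin is implemented. The function name is made up, and the pattern is assumed to contain no capture groups.

```python
import re

def delete_except(line: str, pattern: str) -> str:
    """Keep only the matches of `pattern`, separated by single spaces."""
    matches = re.findall(pattern, line)   # pattern must not contain capture groups
    return " ".join(matches) if matches else line

print(delete_except("this space is my text space this is my", r"space"))
```

Because the matches are joined explicitly, there is no trailing whitespace to prune afterwards.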

Remove everything before and after variable=int

I'm terrible at regex and need to remove everything from a large portion of text except for a certain variable declaration that occurs numerous times. I'd like to remove everything except for instances of mc_gross=anyint.
Generally we'd need to use "negative lookarounds" to find everything but a specified string. But these are fairly inefficient (although that's probably of little concern to you in this instance), and lookaround is not supported by all regex engines (not sure about notepad++, and even then probably depends on the version you're using).
If you're interested in learning about that approach, refer to How to negate specific word in regex?
But regardless, since you are using notepad++, I'd recommend selecting your target, then inverting the selection.
This will select each instance, allowing for optional white space either side of the '=' sign.
mc_gross\s*=\s*\d+
The following answer on Super User explains how to use bookmarks in Notepad++ to achieve the "inverse selection":
https://superuser.com/questions/290247/how-to-delete-all-line-except-lines-containing-a-word-i-need
Substitute the regex above for the one they use there.
You could do a regular expression replace of ^.*\b(mc_gross\s*=\s*\d+)\b.*$ with \1. That will remove everything other than the wanted text on each line. Note that on lines where the wanted text occurs two or more times, only one occurrence will be retained. In the search the ^.*\b matches from start-of-line to a word boundary before the wanted text; the \b.*$ matches everything from a word boundary after the wanted text until end of line; the round brackets capture the wanted text for the replacement text. If text such as abcmc_gross=13def should be matched and retained as mc_gross=13 then delete the \bs from the search.
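To see the "only one occurrence is retained" behaviour concretely, here is a small Python sketch of the same replace. The function name and sample data are invented for the example. Because the leading .* is greedy, the occurrence that survives is the last one on the line.

```python
import re

PATTERN = re.compile(r"^.*\b(mc_gross\s*=\s*\d+)\b.*$", re.MULTILINE)

def keep_mc_gross(text: str) -> str:
    """On each line that contains the target, keep only the target text."""
    return PATTERN.sub(r"\1", text)

sample = "a=1&mc_gross=13&b=2\nmc_gross = 5 and mc_gross=27 end\nno match here"
print(keep_mc_gross(sample))
```

Lines without a match are left untouched, which is why the separate bookmark step is needed to remove them.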
To remove unwanted lines do a regular expression search for ^mc_gross\s*=\s*\d+$ from the Mark tab, tick Bookmark line and click Mark all. Then use Menu => Search => Bookmark => Remove unmarked lines.
Find what: [\s\S]*?(mc_gross=\d+|\Z)
Replace with: \1
Position the cursor at the start of the text then Replace All.
Add word boundaries \b around mc_gross=\d+ if you think it's necessary.
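Python's re module accepts the same pattern, so the effect can be tried on a small sample first. This is a sketch with an invented function name. The kept matches end up directly next to each other, because everything between them (line breaks included) is removed.

```python
import re

def strip_all_but_mc_gross(text: str) -> str:
    """Lazily consume text up to each mc_gross=<digits> (or the end) and keep only the match."""
    # The \Z alternative lets the final match swallow the tail after the last hit;
    # group 1 is then empty, so the tail is replaced with nothing.
    return re.sub(r"[\s\S]*?(mc_gross=\d+|\Z)", r"\1", text)

print(strip_all_but_mc_gross("foo mc_gross=12 bar\nmc_gross=34 baz"))
```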

Using Regex in Word to Mass Modify Entries

How can I find the expression '([anumber][anumber],' in Word?
I have [0-9][0-9], but this pattern occurs several times per option, so it removes every occurrence. How do I either remove ([0-9][0-9], including the left parenthesis, or remove [0-9][0-9], only for the first instance on each line?
Let's clarify your problem first:
You have something like (0123, (4567, etc., and
you want to turn them into (23, (67, respectively.
In the Replace dialog,
put ([(])[0-9][0-9] in the "Find what" box, and
put \1 in the "Replace with" box.
Actually, putting a plain ( in the "Replace with" box works just as well, but \1 is the more flexible option.
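The same idea can be checked quickly in Python. Note that this uses Python's regex syntax, which is not identical to Word's wildcard syntax, and the function name is invented for the example.

```python
import re

def drop_two_digits_after_paren(text: str) -> str:
    """Remove the two digits that directly follow a left parenthesis."""
    # Group 1 captures the "(" so it can be put back with \1.
    return re.sub(r"(\()[0-9][0-9]", r"\1", text)

print(drop_two_digits_after_paren("(0123, (4567,"))
```

Digit pairs that do not follow a left parenthesis are not matched, which is what stops the replacement from removing every pair on the line.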