Regex to replace spaces with tabs at the start of the line

I'd like to fix the tab/space indentation of a text file.
Currently the indentation of each line has spaces in random positions for some reason.
For example:
space tab if -> tab if
space tab space tab if -> tab tab if
tab tab space if -> tab tab if
etc.
It should not affect anything after the first word, so only the indentation is changed. For example, tab space if space boolean should become tab if space boolean, not tab if tab boolean.
The regex command should keep the correct number of tabs and just remove the spaces.
If there are 4 spaces in a row it should be converted to a tab instead:
space space space space -> tab
Thank you for your help. If you could also explain how your regex works it would be very much appreciated as I'm trying to learn how to do my own regex instead of always asking others to do it.
If you need any more information or details, please ask and I'll respond as quickly as I can.
I can accomplish this for a single case at a time like so:
For spaces first: Find: space*if, Replace: if. This only works for lines with no tabs and where the first word is if, so I would repeat it for every word that can start a line.
Then I would repeat with space*\tif.
It looks like I can match a word without capturing it by using (?:[A-Za-z]), so I can swap the if for this and it will work better.

You could probably do this in one step, but I'm more partial to simple approaches.
Translate the 4 spaces to tabs first. First line is the match, second is the replace.
^(\s*)[ ]{4}(\s*)
$1\t$2
Then replace all remaining single spaces with nothing.
^(\t*)[ ]+
$1
You don't need the square brackets in this case, but it's a little hard to be sure that there's a space, even with SO's code formatting.
The first pattern anchors at the start of the line with ^, then matches any amount of whitespace (including tabs) with (\s*) and puts it in a capture group that the replacement refers to as $1. The middle part, [ ]{4}, matches exactly four spaces. The last part is a second capture group, $2, in case there are tabs or more spaces on that side, too.
The second pattern handles the spaces that are left over: it matches zero or more tabs at the start of the line, puts them in a capture group, and then matches the run of spaces that follows. Replacing with $1 keeps the tabs and drops the spaces. Each pass changes only one place per line, so run Replace All repeatedly until nothing more is replaced.
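For anyone who wants to try the two steps outside an editor, here is a minimal Python sketch. It reuses the two patterns from this answer, applies them line by line, and repeats each substitution until the line stops changing. The function names (repeat_sub, fix_indent, fix_text) are made up for this illustration and are not part of any tool.

```python
import re

# Patterns taken from the answer above; Python uses \1 where the editor uses $1.
FOUR_SPACES = re.compile(r'^(\s*)[ ]{4}(\s*)')
STRAY_SPACES = re.compile(r'^(\t*)[ ]+')

def repeat_sub(pattern, repl, line):
    """Apply one substitution over and over until the line stops changing."""
    while True:
        new_line = pattern.sub(repl, line)
        if new_line == line:
            return line
        line = new_line

def fix_indent(line):
    """Normalise the leading whitespace of a single line (no newline inside)."""
    # Step 1: every run of four spaces in the indentation becomes one tab.
    line = repeat_sub(FOUR_SPACES, r'\1\t\2', line)
    # Step 2: drop any spaces still left in the indentation.
    return repeat_sub(STRAY_SPACES, r'\1', line)

def fix_text(text):
    """Apply fix_indent to each line of a multi-line string."""
    return '\n'.join(fix_indent(line) for line in text.split('\n'))

if __name__ == '__main__':
    for sample in [' \tif', ' \t \tif', '\t\t if', '\t if boolean', '    if']:
        print(repr(sample), '->', repr(fix_indent(sample)))
```

Working line by line keeps \s from matching across line breaks, which it would otherwise be allowed to do in Python.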

Related

How to use backreference to replace part of the string?

.*\t.*\t.*\t.*
I have a 4-column table with 3 tabs as above. How can I replace the 2nd and 3rd tabs as comma in vim? I was trying to use vim to do that, but failed.
Here's one way to do it:
:%s/\(\t.\{-}\)\@<=\t/,/g
It uses a look-behind match to require a previously occurring tab character on the line, so it matches every tab except the first and replaces the 2nd, 3rd, 4th, etc. tab characters with commas. See :help /\@<= for help on the look-behind operator.
Another way, matching only the second and third tab of a line (and therefore only lines with at least three tab characters), is to use a capture group and the backreference \1 to store and reuse the contents in between those two tabs.
:%s/\t.\{-\}\zs\t\(.\{-}\)\t/,\1,/
This also uses .\{-}, which matches 0 or more characters, but is non-greedy (so it tries to match the smallest sequence possible and stays close to the beginning of the line) and also the \zs marker to only start the replacement at that part of the match (just before the second tab of the line.) Again, see Vim's help docs on search patterns for more details on all those.
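The same backreference idea can be sketched in Python for comparison. Python's re module has no \zs, so this version captures the part before the second tab as well and puts it back in the replacement. The function name is hypothetical, and [^\t\n]* stands in for Vim's non-greedy .\{-} so that a match never runs past a tab or a line break.

```python
import re

# Group 1: everything up to (not including) the second tab.
# Group 2: the text between the second and third tab.
SECOND_AND_THIRD_TAB = re.compile(
    r'^([^\t\n]*\t[^\t\n]*)\t([^\t\n]*)\t', re.MULTILINE
)

def commas_for_tabs_2_and_3(text):
    """Replace the 2nd and 3rd tab of each line with commas.

    Lines with fewer than three tabs are left unchanged.
    """
    return SECOND_AND_THIRD_TAB.sub(r'\1,\2,', text)

if __name__ == '__main__':
    print(repr(commas_for_tabs_2_and_3('a\tb\tc\td\nx\ty')))
```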

Find and Partially Replace Notepad++ Regex

I have a file with lines containing a space, 9 digits, 6 spaces and 5C18. Finding it is easy; I'm using
\s\d{9}\s{6}5C18
The problem is that I need to replace the space at the beginning of the line with a letter, say F, so that everything else remains intact. Every time I try to do it, the entire match is replaced with the replacement expression. I know this is probably something stupidly basic, but any help would be appreciated.
Move the part that you do not wish to replace into a lookahead expression:
^\s(?=\d{9}\s{6}5C18)
Now the portion in (?= ... ) is not considered part of the match; only the initial space is. Hence, running a replace with this regex would let you replace the initial space with whatever characters that you want.
It's text on a single line. The F needs to go where that first space is at the beginning of the line.
Note the use of ^ anchor to ensure that the match of the initial space is tied to the beginning of the line.
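To see the lookahead at work outside Notepad++, here is a small Python sketch. The function name and the sample data are invented for the illustration. It also matches a literal space instead of \s for the first character, because in Python's multi-line mode \s could otherwise match a line break on an empty line.

```python
import re

# Only the leading space is part of the match; the lookahead just checks
# what follows it without consuming anything.
LEADING_SPACE = re.compile(r'^[ ](?=\d{9}\s{6}5C18)', re.MULTILINE)

def mark_lines(text, letter='F'):
    """Replace the leading space of matching lines with the given letter."""
    return LEADING_SPACE.sub(letter, text)

if __name__ == '__main__':
    sample = ' 123456789' + ' ' * 6 + '5C18 rest of line\n 12345 other line'
    print(mark_lines(sample))
```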

Find Tab at beginning of string and replace

I am using ^\t+ to find tabs at the beginning of the string and replace them with a space. The issue is that if the string starts with more than one tab, they are all replaced with a single space instead of one space per tab. How can I replace the leading tabs with the same number of spaces?
You may use
\G\t
The \G matches the start of string and the end of the previous successful match and \t will match 1 tab. With multiple search mode enabled (global mode), you will replace each tab at the start of the string with a space.
If you deal with tabs at the beginning of a line, you may use
(?:^|\G)\t
This expression was tested and works well in Notepad++.
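Python's built-in re module does not support \G, so the pattern above cannot be copied over as is. A different technique gives the same result there: match the whole run of leading tabs and compute the replacement in a callback. This is only a sketch, and the function name is made up.

```python
import re

LEADING_TABS = re.compile(r'^\t+', re.MULTILINE)

def leading_tabs_to_spaces(text):
    """Replace each tab at the start of a line with one space."""
    # The callback receives the match and returns as many spaces as
    # there were tabs in the matched run.
    return LEADING_TABS.sub(lambda m: ' ' * len(m.group()), text)

if __name__ == '__main__':
    print(repr(leading_tabs_to_spaces('\t\tfoo\tbar\n\tbaz')))
```

Tabs that are not at the start of a line are left alone.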

Find/Match every similar words in word list in notepad++

I have a word list in alphabetical order.
It is ranked as a column.
I do not use any programming languages.
The list in notepad format.
I need to match every similar words and take them on same line.
I use regex but I can't achieve correct results.
First list is like:
accept
accepted
accepts
accepting
calculate
calculated
calculates
calculating
fix
fixed
A list I want:
accept accepted accepts accepting
calculate calculated calculates calculating
fix fixed
This seems to work, but you will have to do Replace All multiple times:
Find (^(.+?)\s*?.*?)\R\2 and replace with \1\t\2. The ". matches newline" option should be disabled.
How it works:
It finds some characters at the start of line ^(.+?), then any linebreak \R, and those same characters again \2.
\s*?.*? is used to skip unnecessary characters after multiple Replace All. \s*? skips the first whitespace, and .*? any remaining chars on the line.
Match is replaced with \1\t\2, where \1 is anything matched in (^(.+?)\s*?.*?), and \2 is anything matched with (.+?). \t is used to insert tab character to replace linebreak.
How it breaks:
Note that this will not work well with different words with similar prefix, like:
hand
hands
handle
handles
This will be hand hands handle handles after 2 replaces.
I can imagine doing this programmatically with limited success (take the first word as a root and, if a derived word with this root follows, place it on the same line; otherwise take the word as a new root and put it on a new line). This will still fail for irregular words where the root is not the same for all forms.
Without programming, the only way is (manual) preprocessing: if there are fewer than 4 forms of a given word in the list, you insert a blank line for each missing form, so there are always 4 lines for each word. Then you can use a regex to join each such quadruple into one line.

Regex: remove lines not starting with a digit

I have been fighting this problem with the help of a RegEx cheat sheet, trying to figure out how to do this, but I give up... I have this lengthy file open in Notepad++ and would like to remove all lines that do not start with a digit (0..9). I would use the Find/Replace functionality of N++. I am only mentioning this as I am not sure which regex implementation N++ uses... Thank you
Example. From the following text:
1hello
foo
2world
bar
3!
I would like to extract
1hello
2world
3!
not the following with blank lines left behind where the removed lines were:
1hello
2world
3!
by doing a find/replace on a regular expression.
You can clear those lines with ^[^0-9].* but it will leave blank lines.
Notepad++ uses Scintilla, and also uses its regex engine for matching:
\r and \n are never matched because in
Scintilla, regular expression searches
are made line per line (stripped of
end-of-line chars).
http://www.scintilla.org/SciTERegEx.html
To clear those blank lines, the only way is to choose extended mode and replace \n\n with \n. If the file uses Windows line endings, replace \r\n\r\n with \r\n instead.
[^0-9] is a regular expression that matches pretty much anything, except digits. If you say ^[^0-9] you "anchor" it to the start of the line, in most regular expression systems. If you want to include the rest of the line, use ^[^0-9].+.
^[^\d].* marks a whole line whose first character is not a digit. Check if there are really no whitespaces in front of the digits. Otherwise you'd have to use a different expression.
UPDATE:
You will have to do it in two steps. First empty the lines that do not start with a digit. Then remove the empty lines in extended mode.
One could also use the technique of bookmarking in Notepad++. I started benefiting from this feature (long time present but only more recently made somewhat more visible in the UI) not very long ago.
Simply bring up the find dialogue, type regex for lines not starting with digit ^\D.*$ and select Mark All. This will place blue circles, like marbles, in the left gutter - these are line bookmarks. Then just select from main menu Search -> Bookmark -> Remove bookmarked lines.
Bookmarks are cool, you could extract these lines by simply selecting to copy bookmarked lines, opening new document and pasting lines there. I sometimes use this technique when reviewing log files.
I'm not sure what you are asking, but the regular expression for finding the lines with a digit at the beginning would be
^\d.*
You can keep all the lines that match the above, or alternatively remove all the lines that match this expression:
^[^\d].*
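If the file can be processed outside Notepad++, the "keep the lines that start with a digit" variant takes one pass and leaves no blank lines behind. A minimal Python sketch follows; the function name is hypothetical.

```python
import re

def keep_digit_lines(text):
    """Keep only the lines whose first character is a digit."""
    # re.match only matches at the start of the string, so no ^ is needed.
    kept = [line for line in text.splitlines() if re.match(r'\d', line)]
    return '\n'.join(kept)

if __name__ == '__main__':
    print(keep_digit_lines('1hello\nfoo\n2world\nbar\n3!'))
```

Lines with whitespace in front of the digit are dropped as well, so strip them first if that is not what you want.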