Regex to remove the first 2 lines of a text file

I am trying to delete only the first 2 lines of a text file.
I tried using \A.*, but this gets the first line and deletes the rest.
Is there a way to do the inverse?

It may not be the most convenient way, but it is possible with a regex:
^.*\n.*\n([\s\S]*)$
With default settings (neither the single-line nor the multi-line modifier), '.' matches any character except a newline. Therefore, .*\n matches one line, including its newline character. Repeat it twice, and we are at the beginning of the third line. Now capture all characters, including newlines ([\s\S] is a nice workaround for '.' not matching them), up to the end of the file, $.
Then replace the match with the first capturing group
\1
and you have everything but the first 2 lines.
The details depend on your regex engine, for example how it expects the replacement string to be written. Depending on the platform or the newline character used in the file, you might also need to replace \n with \r\n or \r, or with an alternation that matches all of them, (?:\r\n?|\n). Keep it non-capturing so that the rest of the file is still group \1.
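As a sketch of the same idea in Python (an assumption; the answer does not target a specific engine), `re.sub` applies the pattern and substitutes the first capturing group. The helper name `drop_first_two_lines` and the sample text are hypothetical. The mixed-line-ending variant uses `[^\r\n]*` instead of `.*`, because Python's `.` also matches `\r`, which would let one "line" swallow several `\r`-terminated lines.

```python
import re

# Same pattern as in the answer: skip two lines, capture everything after them.
# [\s\S]* matches any character, including newlines, without re.DOTALL.
FIRST_TWO_LINES = re.compile(r"^.*\n.*\n([\s\S]*)$")

# Variant for \r\n, \r or \n line endings. The line-ending groups are
# non-capturing, so the remainder is still group 1.
FIRST_TWO_LINES_ANY_EOL = re.compile(
    r"^[^\r\n]*(?:\r\n?|\n)[^\r\n]*(?:\r\n?|\n)([\s\S]*)$"
)


def drop_first_two_lines(text: str, any_eol: bool = False) -> str:
    """Return text without its first two lines (hypothetical helper)."""
    pattern = FIRST_TWO_LINES_ANY_EOL if any_eol else FIRST_TWO_LINES
    # r"\1" substitutes the first capturing group, i.e. the remainder.
    return pattern.sub(r"\1", text, count=1)


sample = "line 1\nline 2\nline 3\nline 4\n"
print(repr(drop_first_two_lines(sample)))  # 'line 3\nline 4\n'
```

If the text has fewer than three lines, the pattern does not match and `re.sub` returns the text unchanged.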

Related

How to use backreference to replace part of the string?

.*\t.*\t.*\t.*
I have a 4-column table with 3 tabs, as above. How can I replace the 2nd and 3rd tabs with commas in vim? I tried to do it in vim, but failed.
Here's one way to do it:
:%s/\(\t.\{-}\)\#<=\t/,/g
It uses a look-behind to require a previously occurring tab character on the line, so it matches every tab except the first and therefore replaces the 2nd, 3rd, 4th, etc. tab characters with commas. See :help /\#<= for help on the look-behind operator.
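Python's `re` module only accepts fixed-width look-behind, so Vim's `\(\t.\{-}\)\#<=` has no direct equivalent there. As a hedged sketch of the same result, the hypothetical helper below swaps the look-behind for a replacement callback: it leaves the first tab alone and turns every later tab into a comma.

```python
import re


def commas_after_first_tab(line: str) -> str:
    """Replace every tab except the first with a comma (hypothetical helper)."""
    seen_first = False

    def replace(match: re.Match) -> str:
        nonlocal seen_first
        if not seen_first:
            seen_first = True
            return match.group(0)  # keep the first tab unchanged
        return ","

    return re.sub(r"\t", replace, line)


print(repr(commas_after_first_tab("a\tb\tc\td")))  # 'a\tb,c,d'
```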
Another way, matching only the second and third tab of a line, and only lines with at least two tab characters, is to use a backreference \1 to store and refer to the contents in between the tabs.
:%s/\t.\{-}\zs\t\(.\{-}\)\t/,\1,/
This also uses .\{-}, which matches 0 or more characters but is non-greedy (it tries to match the shortest possible sequence, so it stays close to the beginning of the line). It also uses the \zs marker to start the replacement only at that point of the match, just before the second tab of the line. Again, see Vim's help on search patterns (:help pattern) for details on all of these.
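The backreference version translates more directly. In this Python sketch (an assumption, not part of the answer), `.*?` plays the role of Vim's non-greedy `.\{-}`. Python's `re` has no `\zs`, so the text before the second tab is captured as group 1 and written back in the replacement; `count=1` mirrors the missing `g` flag.

```python
import re

# (\t.*?)  first tab plus the shortest run of text after it (Vim: \t.\{-})
# \t       second tab
# (.*?)    shortest run of text between the second and third tab
# \t       third tab
SECOND_AND_THIRD_TAB = re.compile(r"(\t.*?)\t(.*?)\t")


def replace_second_and_third_tab(line: str) -> str:
    # Group 1 stands in for Vim's \zs: it is written back unchanged.
    return SECOND_AND_THIRD_TAB.sub(r"\1,\2,", line, count=1)


for row in ["c1\tc2\tc3\tc4", "only\tone tab"]:
    print(repr(replace_second_and_third_tab(row)))
# 'c1\tc2,c3,c4'
# 'only\tone tab'
```

As with the Vim command, lines with fewer than three tabs are left untouched.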

Regex: how to remove the last, empty line?

I want to remove the very last, empty line from a file using regex search-replace. Matching a new line with an end-of-line marker:
\n$
seems to be a step in the right direction, but it simply matches all empty lines (to be precise, a newline character followed by an empty line).
I'm using Sublime on Windows, in case the line-ending convention or the regex engine matters.
You can use \s*\Z, which selects all whitespace, including newlines, up to \Z (the end of the input), and replace it with an empty string.
This removes all newlines at the end of the text (one or more), even when those lines contain spaces (which are not easily visible). That is usually what you want, because extra empty lines at the end of a file are generally useless.
If you want to remove ONLY ONE line from the end of the file, use \n\Z instead of \s*\Z.
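Both patterns can be tried quickly in Python, where `\Z` matches only at the very end of the string. This is a sketch with made-up sample text; other engines (including the one in your editor) may treat a trailing newline differently around `\Z`.

```python
import re

text = "first line\nsecond line\n  \n\n"

# \s*\Z: every trailing whitespace character, newlines included.
all_trailing = re.sub(r"\s*\Z", "", text)

# \n\Z: only the single newline at the very end.
one_newline = re.sub(r"\n\Z", "", text)

print(repr(all_trailing))  # 'first line\nsecond line'
print(repr(one_newline))   # 'first line\nsecond line\n  \n'
```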
The following regex should do it:
\n\s*$(?!\n)
For example, if line 7 is the empty last line, the match begins at the end of line 6 and covers everything on line 7, so replacing it with nothing deletes that line.
Basically, it searches for an empty line that has no newline after it.
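A quick Python check of the same pattern (an assumption: Python's `$` without `re.MULTILINE` matches at the end of the string or just before a final newline, while editors usually run in multi-line mode; in both cases `(?!\n)` leaves only the very end of the text). The sample strings are made up.

```python
import re

# \n       the newline that ends the last non-empty line
# \s*      any whitespace after it (greedy, so it can span several blank lines)
# $(?!\n)  a line end not followed by another newline: the end of the text
LAST_EMPTY_LINE = re.compile(r"\n\s*$(?!\n)")

one_blank = "line 1\nline 2\n"    # an editor shows an empty line 3
two_blank = "line 1\nline 2\n\n"  # an editor shows empty lines 3 and 4

print(repr(LAST_EMPTY_LINE.sub("", one_blank)))  # 'line 1\nline 2'
print(repr(LAST_EMPTY_LINE.sub("", two_blank)))  # 'line 1\nline 2'
```

Because `\s*` is greedy, the second case removes both trailing blank lines, not only the last one.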

Find and Partially Replace Notepad++ Regex

I have a file with lines containing a space, 9 digits, 6 spaces and 5C18. Finding them is easy; I'm using
\s\d{9}\s{6}5C18
The problem is that I need to replace the space at the beginning of the line with a letter, say F, so that everything else remains intact. Every time I try, the entire line is replaced with the replacement expression. I know this is probably something stupidly basic, but any help would be appreciated.
Move the part that you do not wish to replace into a lookahead expression:
^\s(?=\d{9}\s{6}5C18)
Now the portion inside (?= ... ) is not part of the match; only the initial space is. Hence, a replace with this regex replaces only the initial space with whatever you want.
It's text on a single line. The F needs to go where that first space is at the beginning of the line.
Note the use of ^ anchor to ensure that the match of the initial space is tied to the beginning of the line.
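The same lookahead works with Python's `re.sub`; this sketch (an assumption, with made-up sample lines) uses `re.MULTILINE` so that `^` matches at the start of every line, as it does in Notepad++.

```python
import re

# ^                   start of a line (re.MULTILINE: after every newline too)
# \s                  the single leading whitespace character to replace
# (?=\d{9}\s{6}5C18)  must follow the space, but is not consumed
LEADING_SPACE = re.compile(r"^\s(?=\d{9}\s{6}5C18)", re.MULTILINE)

text = (
    " 123456789" + " " * 6 + "5C18 rest of line\n"
    " 987654321" + " " * 6 + "XXXX other line\n"
)

# Only the first line matches, so only its leading space becomes F.
print(repr(LEADING_SPACE.sub("F", text)))
```

If the file contains empty lines, a literal space or `[ \t]` is stricter than `\s`, which can also match the newline of an empty line.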

Language Syntax Highlight - Comment Line Starts With * may or may not have following words

I am creating a syntax highlight file for a language and I have everything mapped out and working with one exception.
I cannot come up with a regex that will match the following conditions for a specific line comment style.
If the first non white-space character is an asterisk (*) the line is considered a comment.
I have created many samples that work in regexr, but they never match in VS Code.
For example, regexr is cool with this:
^(?:\s*)\*+(?:.*)?\n
So I convert it into the proper format for the tmlanguage.json file:
^(?:\\s*)\\*+(?:.*)?\\n
But it does not match properly: if the first character of the line is an *, it does not match, but if the first character is a whitespace character followed by an *, it does work.
I suck at formatting on Stack Overflow, so <tab> represents a chr(9) tab character.
It should work in these cases:
*******************************
*****************************
<tab>*************************
* comment
* comment
<tab>* comment
But it shouldn't work in these cases:
string *******************************
string ***************************** string
<tab>string *************************
x *= 3
I am guessing that either the anchor ^ isn't working in my regex or I am escaping something incorrectly.
Any advice?
I don't know the regex engine you're using. I'm just going to give you some general tips on how it should be done.
First off, if you're reading a string with more than one newline in it, note that the anchor ^ in an engine's default state means Beginning of String (BOS).
What you want in this case is multi-line mode. This makes the anchor ^ match at the Beginning of Line (BOL) as well as at the BOS.
Second, you don't need those non capture groups (?:\s*) (?:.*), they encapsulate single constructs.
Third, it is redundant to make a group optional when its enclosed contents are optional (?:.*)?
Fourth, you don't need the newline construct \n at the end, since the newline should not be highlighted anyway, and it might not be present on the last line of text, in which case the regex would not match.
So, putting it all together, the modified regex would be (?m)^\s*\*.*
Explained
(?m) # Inline modifier: Multi-line mode
^ # Beginning of line
\s* # Zero or more whitespace characters
\* # A required asterisk (further asterisks are matched by .*)
.* # Optional rest of non-newline characters
Note that you could put a single capture group around the data if you need to reference it in a replace: (?m)^(\s*\*.*)
Also, the language you're using should have a way to specify options when compiling the regex. If the engine doesn't accept inline modifiers (?m) take it out and specify that option when compiling the regex.
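To check the regex logic against the question's samples, here is a Python sketch (an assumption: VS Code grammars use a different engine, so this only verifies the pattern itself, not the highlighter). `findall` returns each whole matching line.

```python
import re

# (?m) turns on multi-line mode, so ^ matches at the start of every line.
COMMENT_LINE = re.compile(r"(?m)^\s*\*.*")

should_match = ["*******", "*****", "\t*****", "* comment", "\t* comment"]
should_not_match = ["string *******", "string *** string", "\tstring ****", "x *= 3"]

text = "\n".join(should_match + should_not_match)

# Only the comment lines are returned.
found = COMMENT_LINE.findall(text)
print(found)
```

On text with blank lines, `\s*` can also consume a newline and start the match on the preceding blank line; `[ \t]*` avoids that if it matters.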
Apparently VS Code's syntax highlighter is single-line. No matter how hard I tried, regexes that match over several lines never worked.
Second, if you're designing a language, I suggest not using an arithmetic operator for comments.
Third, apparently you can match newlines in the begin and end attributes, so you can try it there.

reuse last matched character of regex in sed

Many of you with a certain leaning towards proper formatting will know the pain of finding a lot of space characters instead of a tab character at the beginning of indented lines after another person edited a file and added lines. I seem to be unable to teach my colleagues how to use vim's integrated line pasting function, so I'm searching for some simple ways to automatically correct lines beginning with a certain pattern. ;)
I'm using a regex to find the corresponding lines, but I can't work out how to "reuse" the last matched character in sed when using "find and replace". The regex matching the lines is
'^\ *[A-Z]'
I would like to replace those space characters, but keep the uppercase letter. My idea would be something like
sed 's|^\ *[A-Z]|\t$|g'
or so, but I guess that would replace the whole line with a single tab character since $ usually matches the line ending?
Is there a simple way to reuse parts of the matched regex in sed?
How about simply not including the first non-space character in the match in the first place?
This matches all spaces at the beginning of a line:
^ *
Edit (quote from the comments):
obviously I don't want to replace spaces in front of other characters than uppercase letters
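A look-ahead could do that, but unfortunately sed does not support them. You can use the next best thing, though: an address, i.e. an expression that determines which lines sed operates on. When an address uses a delimiter other than /, its opening delimiter must be escaped, as in \|...|. (\t in the replacement is a GNU sed extension.)
sed '\|^ *[A-Z]| s|^ *|\t|'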
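The "only touch matching lines" idea behind the sed address can be sketched in Python (an assumption, not part of the answer): test each line against `^ *[A-Z]` first, and only then replace its leading spaces with a tab. The helper name `retab` is hypothetical.

```python
import re

ADDRESS = re.compile(r"^ *[A-Z]")    # which lines to edit (the sed address)
LEADING_SPACES = re.compile(r"^ *")  # what to replace on those lines


def retab(text: str) -> str:
    out = []
    for line in text.splitlines(keepends=True):
        if ADDRESS.match(line):
            line = LEADING_SPACES.sub("\t", line, count=1)
        out.append(line)
    return "".join(out)


sample = "    Foo\n    bar\nBaz\n"
print(repr(retab(sample)))  # '\tFoo\n    bar\n\tBaz\n'
```

Like the sed command, this also prepends a tab to a line that starts with an uppercase letter and no spaces, because `^ *` matches the empty string; `^ +` would skip such lines.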
Of course a back-reference would do it as well:
sed 's|^ *\([A-Z]\)|\t\1|'
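Python's `re.sub` accepts the same kind of back-reference. In this sketch, `re.MULTILINE` stands in for sed's line-by-line processing (an assumption of the example), and `\t` in the replacement string is turned into a tab by `re.sub` itself.

```python
import re

# ^ *       leading spaces (discarded)
# ([A-Z])   the uppercase letter, captured so it can be written back
RETAB = re.compile(r"^ *([A-Z])", re.MULTILINE)

sample = "    Foo\n    bar\n  Baz\n"
print(repr(RETAB.sub(r"\t\1", sample)))  # '\tFoo\n    bar\n\tBaz\n'
```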