VS Code multiline regex for flat txt database

I am looking for a multiline search-and-replace regex for VS Code that will find the first \thisline in a text file within sets of lines that start with \begin and end with \end. The text file contains about 12000 sets of \begin ... \end.
One could say that these sets of \begin ... \end are records.
\thisline never occurs right after \begin or right before \end. There can, however, be 500 or more characters between \begin and the first \thisline. The regex should not look beyond \end into the next set. If \thisline is found a
Example of the text file:
\begin
line
line /* there can be 20 lines or more here with +500 chars */
\thisline
line /* there can be 20 lines or more here with +500 chars */
\thisline
line
line
line /* there can be 20 lines or more here with +500 chars */
line
line
\end
\begin
line
\thisline
line
line
\thisline
line
\end
I tried the following multiple line Find - replace:
(\\begin.*)
((.|\n|\r){1,100}?)
(\\this)(line.*)
((.|\n|\r){1,100}?)
(\\end.*)
Replace:
$1
$2
$4 changed $5
$6
$8
The problem is how to find the last \thisline in \begin .. \end sequences that have multiple \thisline(s)?
The second problem is how to keep the regex within the borders of the \begin .. \end records?
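One way to keep a pattern inside the record borders is to match each \begin ... \end record lazily first and then edit only inside that match. The Python sketch below illustrates the idea outside VS Code. The names RECORD, THISLINE, mark_first and mark_last and the sample text are hypothetical, and it assumes that \begin and \end each start a line.

```python
import re

# Two small sample records (hypothetical data in the question's layout).
text = r"""\begin
line
\thisline
line
\thisline
line
\end
\begin
line
\thisline
line
\end
"""

# Lazy .*? stops at the first \end, so a match never runs into the next record.
RECORD = re.compile(r"^\\begin.*?^\\end", re.DOTALL | re.MULTILINE)
# One whole \thisline line (no DOTALL here, so .* stays on that line).
THISLINE = re.compile(r"^\\thisline.*$", re.MULTILINE)


def mark_first(record_match):
    """Append ' changed' to the first \\thisline of one record."""
    record = record_match.group(0)
    return THISLINE.sub(lambda m: m.group(0) + " changed", record, count=1)


def mark_last(record_match):
    """Append ' changed' to the last \\thisline of one record."""
    record = record_match.group(0)
    hits = list(THISLINE.finditer(record))
    if not hits:
        return record  # record without any \thisline stays untouched
    end = hits[-1].end()
    return record[:end] + " changed" + record[end:]


first_done = RECORD.sub(mark_first, text)
last_done = RECORD.sub(mark_last, text)
print(first_done)
print(last_done)
```

Because each replacement function only ever sees the text of one record, neither the first nor the last \thisline lookup can cross an \end.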

Related

why grep '\s*' is not working, but grep '\S*' works

I am new to shell scripting.
I want to display the lines in a file that start with whitespace or with non-whitespace, but grep '\S*' works while grep '\s*' does not match any line.
And '\s' seems to work.
My grep version is 3.4. I am using WSL Ubuntu. The red color means it is matched. I tried [[:space:]]; the result is the same.
Can anyone help? Thanks.
test.fa contains:
ctatccagcaccagatagcatcattttactttcaagcctagaaattgcac
haha
ok
acttgtatataaaccaaccgaagatgaggattgagagttcatcttggtgg
running result
* means "zero or more repetitions of the preceding expression". So \S* matches zero or more non-spaces while \s* matches zero or more spaces, and putting a ^ in front means match those at the start of a line (when the string being compared is a line, as is the case with grep by default).
So in your input file:
Line 1: ctatccagcaccagatagcatcattttactttcaagcctagaaattgcac
Line 2: haha
Line 3:
Line 4: ok
Line 5: acttgtatataaaccaaccgaagatgaggattgagagttcatcttggtgg
^\S* matches the following on each line:
Line 1: ctatccagcaccagatagcatcattttactttcaagcctagaaattgcac
Line 2: the null string before the leading blank
Line 3: the null string that is the whole line
Line 4: the null string before the leading blanks
Line 5: acttgtatataaaccaaccgaagatgaggattgagagttcatcttggtgg
while ^\s* matches the following on each line:
Line 1: the null string before ctatccagcaccagatagcatcattttactttcaagcctagaaattgcac
Line 2: the leading blank
Line 3: the null string that is the whole line
Line 4: the leading blanks
Line 5: the null string before acttgtatataaaccaaccgaagatgaggattgagagttcatcttggtgg
So both regexps match something on every line, and what is colored as matching is the printable (i.e. non-null and non-blank) chars from each matching string.
To display the lines that start with whitespace would be:
grep '^\s'
and to display the lines that start with non-whitespace would be:
grep '^\S'
and to display empty lines would be:
grep -v '.'
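If your grep doesn't support \s/\S, then use [[:space:]]/[^[:space:]] instead if it's a POSIX grep, or a bracket expression containing a literal space and a literal tab character in any grep (inside brackets, \t is not an escape for tab).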
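The difference between the starred and the anchored patterns can be checked with a small Python sketch. It uses Python's re module rather than grep, the sample lines are shortened stand-ins for test.fa, and the leading blanks on lines 2 and 4 are an assumption taken from the explanation above.

```python
import re

# Shortened stand-ins for the lines of test.fa (leading blanks are assumed).
lines = ["ctatccagcacc", " haha", "", "  ok", "acttgtatataa"]

# Both starred patterns can match the empty string, so they succeed on every line.
matches_s_star = [re.search(r"\s*", line) is not None for line in lines]
matches_S_star = [re.search(r"\S*", line) is not None for line in lines]

# Anchoring and dropping the star requires a real first character of that kind.
starts_with_space = [line for line in lines if re.search(r"^\s", line)]
starts_with_nonspace = [line for line in lines if re.search(r"^\S", line)]

# Counterpart of grep -v '.': keep the lines that contain no character at all.
empty_lines = [line for line in lines if not re.search(r".", line)]

print(matches_s_star, matches_S_star)
print(starts_with_space, starts_with_nonspace, empty_lines)
```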

Remove first line if blank using RegEx in YML file

I'm running a script to grab text from websites, but the current code sometimes has the output start with a blank line.
data1 data2 data3
data4 data5 data6
data7 data8 data9
I also have other files that don't have a blank line to start.
Running this regex script on all the files at once, how can I remove the first line of the file only if the first line is blank, while keeping the blank lines in the middle of the files?
I am using regex in a yml config file.
You can use the following regex to check whether the file's first line is blank:
^\s*$
together with two regex flags: multiline (m) and anchored (A).
Explanation:
^ # line start
\s* # match between 0 and unlimited amount of whitespace chars
$ # end of line
The anchored flag restricts the match to the first line, rather than matching all blank lines.
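As a rough illustration of the same idea in Python: the re module has no anchored flag (re.A there means ASCII matching), so this sketch uses \A to pin the pattern to the very start of the text instead. The function name strip_leading_blank_line is hypothetical, and the sketch assumes the whole file is available as one string.

```python
import re

# \A only matches at the start of the whole text, never at later line starts.
# [ \t]* allows spaces or tabs on the blank line; \r?\n consumes its line ending.
LEADING_BLANK = re.compile(r"\A[ \t]*\r?\n")


def strip_leading_blank_line(text):
    """Drop the first line if it is blank; leave every other line alone."""
    return LEADING_BLANK.sub("", text, count=1)


with_blank = "\ndata1 data2 data3\n\ndata4 data5 data6\n"
without_blank = "data1 data2 data3\n\ndata4 data5 data6\n"

print(repr(strip_leading_blank_line(with_blank)))
print(repr(strip_leading_blank_line(without_blank)))
```

Blank lines in the middle of the file are untouched because the pattern can only match at offset zero.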

regex: match lines that don't end with '}' and contain one of three words

I have this text:
NBA:red this line has a tab and ends with a curly braces}
some random text qwertyuiop
NBA:green this line must match
NBA:red this line has a tab and must match
NBA:response this line has spaces and must match
NBA:blue this line has a tab and ends with a curly braces}
some random text qwertyuiop
NBA:blue this line has spaces at the begining and ends with curly braces}
random text qwertyuiop
this line must not match}
this line must not match }
I want to match the lines that contain 'NBA:' followed by the word 'red', 'green', or 'blue', and that don't end with a curly brace '}'. This command matches only 'NBA:' and one of the three words:
$ egrep 'NBA:(red|green|blue)' myfile.txt
NBA:red this line has a tab and ends with a curly braces}
NBA:green this line must match
NBA:red this line has a tab and must match
NBA:blue this line has a tab and ends with a curly braces}
NBA:blue this line has spaces at the begining and ends with curly braces}
But I don't know how to match the lines that don't end with '}'.
I tried this, but it doesn't work:
egrep 'NBA:(red|green|blue)*[^}]$' myfile.txt
But this works:
egrep 'NBA:(red|green|blue)' myfile.txt | egrep '[^}]$'
NBA:green this line must match
NBA:red this line has a tab and must match
I want to do it in just one command.
You were just one character off. This should work fine:
egrep 'NBA:(red|green|blue).*[^}]$'
#                          ^
#                          Note this bit.
* doesn't mean the same thing in regex that it does in glob patterns. It means zero-or-more of the preceding item (a preceding item in this answer being ., any character).
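The effect of the missing dot can be reproduced with a short Python sketch. It uses Python's re module instead of egrep, and the sample lines are a shortened, partly assumed subset of the question's text (the tabs in particular are reconstructed).

```python
import re

# Subset of the question's sample text; tab characters are assumed.
lines = [
    "NBA:red\tthis line has a tab and ends with a curly braces}",
    "some random text qwertyuiop",
    "NBA:green this line must match",
    "NBA:red\tthis line has a tab and must match",
    "NBA:blue   this line ends with curly braces}",
    "this line must not match}",
]

# Without the dot, * repeats the colour group, so only ONE more character
# may follow before the end of the line.
broken = re.compile(r"NBA:(red|green|blue)*[^}]$")
# With the dot, .* absorbs the rest of the line and [^}]$ checks its last character.
fixed = re.compile(r"NBA:(red|green|blue).*[^}]$")

broken_hits = [line for line in lines if broken.search(line)]
fixed_hits = [line for line in lines if fixed.search(line)]

print(broken_hits)
print(fixed_hits)
```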

grep not the beginning of a line

I want to find all lines in a file containing a number, but not at the beginning of a line. I tried the following:
grep -E '[^^][1-9]?[0-9]+' test.txt
However, it does not work: this expression matches the lines starting with numbers consisting of two (or more) digits. As I understand it, [^^] does not mean "any symbol except the beginning of a line". Why is that so, and how do I write this correctly?
Edited according to comment:
This regex should do it: it matches lines that start with one or more non-digit characters, followed by one or more digits.
^[^0-9]+?\d+
\d and the lazy +? are Perl-style syntax, so with GNU grep this needs -P rather than -E. You will need to set the 'multiline' option if you check multiple lines at one time.
Your issue is the [^^] part of your regex. That is a negated character class (a ^ at the start of [ ] negates what is inside the brackets), so [^^] matches any single character other than a literal ^.
Instead, I think you are looking for ^ outside of the brackets to state 'start of the line' and then a negated character class of [^0-9] for something other than a digit at the start of the line:
$ echo "1 line
line 2
3 line
line 4
no num" | grep '^[^0-9]'
line 2
line 4
no num
Then add .* for 'anything of any length' and [0-9] for at least one digit to filter for lines that have a digit in the line:
$ echo "1 line
line 2
3 line
line 4
no num" | grep '^[^0-9].*[0-9]'
line 2
line 4
Or, if you want to be locale aware, you can use POSIX character classes to the same result:
$ echo "1 line
line 2
3 line
line 4
no num" | grep '^[^[:digit:]].*[[:digit:]]'
line 2
line 4
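The same behaviour can be checked with a small Python sketch. It uses Python's re module rather than grep, and the extra sample string "12 line" is a hypothetical addition to show why the original attempt matched lines starting with a two-digit number.

```python
import re

lines = ["1 line", "line 2", "3 line", "line 4", "no num"]

# ^ outside the brackets anchors to the line start; [^0-9] is one non-digit;
# .*[0-9] then requires a digit somewhere later in the line.
pattern = re.compile(r"^[^0-9].*[0-9]")
hits = [line for line in lines if pattern.search(line)]
print(hits)

# The original attempt: [^^] is just "any character except a caret",
# so on "12 line" it consumes the '1' and [0-9]+ matches the '2'.
attempt = re.compile(r"[^^][1-9]?[0-9]+")
two_digit = attempt.search("12 line")
one_digit = attempt.search("1 line")
print(two_digit.group(0), one_digit)
```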

Emacs regex search for lines with fewer than 20 characters

If we have a text file with variable line lengths, how can we use an Emacs regex to search for lines with fewer than some number (say 20) of characters?
This should do the job: it matches any line with between 0 and 19 characters on it, that is, fewer than 20.
^.\{0,19\}$
^ Start of a line (or string)
. Any character except a newline
\{0,19\} The previous item between 0 and 19 times (Emacs regexps write the interval braces with backslashes)
$ End of a line (or string)
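For comparison, here is a minimal Python sketch of the same length filter. It is not Emacs syntax: Python writes the interval as {0,19} without backslashes, and the upper bound is 19 because "fewer than 20" excludes lines of exactly 20 characters. The sample text is hypothetical.

```python
import re

text = (
    "short line\n"
    "this line is definitely longer than twenty characters\n"
    "\n"
    "exactly 19 chars!!!\n"
    "exactly 20 chars!!!!\n"
)

# fullmatch requires the whole line to consist of at most 19 non-newline characters.
short_lines = [
    line for line in text.splitlines() if re.fullmatch(r".{0,19}", line)
]
print(short_lines)
```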