Notepad++ - Searching for elevated lines that don't end with "\r - regex

I am working in Notepad++ and I'm looking for a solution to an issue I'm having.
I have a few hundred thousand lines of code that I need to check with regex.
Basically, I need to check for any elevated (indented, 3-tabbed) lines that don't end with a quotation mark and newline. ("\r)
Here's an example of the code:
side
{
"id" "13"
"plane" "(-1152 256 1) (-1152 256 -0) (2048 256 -0)"
"material" "CONCRETE/CONCRETEFLOOR001A
"uaxis" "[1 0 0 0] 0.25"
"vaxis" "[0 0 -1 0] 0.25"
"rotation" "0"
"lightmapscale" "16"
"smoothing_groups" "0"
}
You can see that the "material" line is missing its closing quotation mark. I need a way to find lines like this one.
Thank you for your help!

^\s+".*(?<!")$\r\n
This expression finds lines which:
Start with whitespace, followed by a "
Do not end with a " before the newline characters (\r\n)
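For running the same check outside the editor, a minimal Python sketch might look like the following. It assumes the lines of interest are indented with whitespace and that a well-formed line ends with a double quote; the function name `find_unterminated` is made up for illustration.

```python
import re

# Indented line that starts with a quote but whose last character is not a quote.
UNTERMINATED = re.compile(r'^\s+".*(?<!")$')

def find_unterminated(text):
    """Return (line_number, line) pairs for indented quoted lines lacking a closing quote."""
    hits = []
    # splitlines() strips \n and \r\n alike, so the pattern needs no explicit \r\n.
    for number, line in enumerate(text.splitlines(), start=1):
        if UNTERMINATED.match(line):
            hits.append((number, line))
    return hits

sample = (
    'side\n'
    '{\n'
    '\t\t\t"id" "13"\n'
    '\t\t\t"material" "CONCRETE/CONCRETEFLOOR001A\n'
    '\t\t\t"rotation" "0"\n'
    '}\n'
)
print(find_unterminated(sample))
```

Reporting line numbers makes it easy to jump to each broken line in the editor afterwards.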

Related

Regex to match position from right to left

I hope you can help me, I'm studying how to make a regex and now I have this problem:
Write a regex that accepts strings with 0 and 1 and that has a 1 on position 5 from right to left.
e.g. 10000 is accepted because it has an 1 on the position 5 from right to left or 010000, 0010000 or 1110000 are accepted.
I was thinking with something like: (0+1)*+1(0+1)(0+1)(0+1)(0+1)(0+1)
You can use this regex:
1[01]{4}$
If you want to match full input then use:
^[01]*1[01]{4}$
Here 1[01]{4}$ ensures that exactly four digits (each 0 or 1) follow the matched 1, which puts that 1 at the fifth position from the right.
Well - think of it this way. It needs to be as many 1s and 0s as you please, followed by a 1, followed by 4 more ones or zeroes.
So:
my_regex =
"^[01]*" + // Starts with One or zero, zero or more times
"1" + // Followed by a one
"[01]{4}$" // Followed by four things, which could be either zero or one, before ending.
Your (0 + 1) syntax looks foreign to me. I'm using character classes to specify the [01] things but you could use (0|1) in their place, which is what your attempt looks more like.
The full thing, together, is ^[01]*1[01]{4}$
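As a quick sanity check, the full pattern can be tried against a few inputs. This is only a sketch; the helper name `has_one_at_fifth_from_right` is invented here, and `fullmatch` stands in for the `^` and `$` anchors.

```python
import re

# Any run of 0/1, then a 1, then exactly four more binary digits.
FIFTH_FROM_RIGHT = re.compile(r'[01]*1[01]{4}')

def has_one_at_fifth_from_right(s):
    # fullmatch requires the whole string to match, like ^...$ in the answers.
    return FIFTH_FROM_RIGHT.fullmatch(s) is not None

for candidate in ["10000", "010000", "0010000", "1110000", "0000", "100000", "12000"]:
    print(candidate, has_one_at_fifth_from_right(candidate))
```

The last three inputs are rejected: one is too short, one has a 0 in the fifth position from the right, and one contains a character other than 0 or 1.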

Find repeating gps using regular expression

I work with text files, and I need to be able to see when the gps (last 3 columns of csv) "hangs up" for more than a few lines.
So for example, usually, part of a text file looks like this:
5451,1667,180007,35.7397387,97.8161897,375.8
5448,1053z,180006,35.7397407,97.8161814,375.7
5444,1667,180005,35.7397445,97.8161674,375.6
5439,1668,180004,35.7397483,97.8161526,375.5
5435,1669,180003,35.7397518,97.8161379,375.5
5431,1669,180002,35.7397554,97.8161269,375.6
5426,1054z,180001,35.7397584,97.8161115,375.6
5420,1670,175959,35.7397649,97.8160931,375.9
But sometimes there is an error with the gps and it looks like this:
36859,1598,202603.00,35.8867316,99.2515545,555.700
36859,1598,202608.00,35.8867316,99.2515545,555.700
36859,1142z,202610.00,35.8867316,99.2515545,555.700
36859,1597,202612.00,35.8867316,99.2515545,555.700
36859,1597,202614.00,35.8867316,99.2515545,555.700
36859,1596,202616.00,35.8867316,99.2515545,555.700
36859,1595,202618.00,35.8867316,99.2515545,555.700
I need to be able to figure out a way to search for matching strings of 7 different numbers, (the decimal portion of the gps) but so far I've only been able to figure out how to search for repeating #s or consecutive numbers.
Any ideas?
If you were to find such repetitions in an editor (such as Notepad++), you could use the following regex to find 4 or more repeating lines:
([^,]+(?:,[^,]+){2})\v+(?:(?:[^,]+,){3}\1(?:\v+|$)){3,}
In a bit more detail:
([^,]+(?:,[^,]+){2})\v+ captures three comma-separated fields as group 1 (one or more non-commas, then twice a comma followed by one or more non-commas), followed by vertical whitespace (a line break) that is not part of the group (e.g. 1,1,1\n)
(?:[^,]+,){3} matches one or more non-commas followed by a comma, three times (the columns that should not be compared)
\1 is a backreference to group 1; it matches only text identical to what group 1 captured
(?:\v+|$) matches either more vertical whitespace or the end of the text
{3,} for 3 or more repetitions - increase it if you want more
However, if you are using a programming language to check this, I wouldn't go down the regex path, as checking for those repetitions can be done a lot more easily. Here is one example in Python; I hope you can adapt it to your needs:
oldcoords = None
repetitions = 0
with open(r'C:\temp\gps.csv') as f:
    for line in f:
        # columns 4 to 6 hold the gps fields
        gpscoords = line.rstrip('\n').split(',')[3:6]
        if gpscoords == oldcoords:
            repetitions += 1
        else:
            oldcoords = gpscoords
            repetitions = 0
        if repetitions == 4:  # or however you define "more than a few"
            print(', '.join(gpscoords) + ' is repeated')
If you can use Perl, and if I understood you correctly, this should do the trick (replace onlyreplacethiswithyourfilepath with the path to your file):
perl -ne 'm/^[^,]*,[^,]*,[^,]*,([^,]*,[^,]*,[^,]*$)/g; $current_line=$1; ++$line_number; if ($prev_line eq $current_line){$equals++} else {if ($equals>=6){ print "Last three fields in lines ".($line_number-$equals-1)." to ".($line_number-1)." are equals to:\n$prev_line" } ; $equals=0}; $prev_line=$current_line' < onlyreplacethiswithyourfilepath
Sample output:
Last three fields in lines 1 to 7 are equals to:
35.8867316,99.2515545,555.700
Last three fields in lines 16 to 22 are equals to:
37.8782116,99.7825545,572.810
Last three fields in lines 31 to 44 are equals to:
36.6868916,77.2594245,581.358
Last three fields in lines 57 to 63 are equals to:
35.5128764,71.2874545,575.631
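A similar line-range report can be sketched in Python with `itertools.groupby`, which groups consecutive rows sharing the same key. The function name `find_gps_hangs` and the `min_run` threshold are assumptions made for this illustration, not part of either answer above.

```python
import csv
import io
from itertools import groupby

def find_gps_hangs(rows, min_run=4):
    """Yield (first_line, last_line, coords) for runs of identical last-three fields."""
    numbered = enumerate(rows, start=1)
    # groupby only merges *consecutive* rows with an equal key, which is what we want.
    for coords, group in groupby(numbered, key=lambda item: tuple(item[1][-3:])):
        group = list(group)
        if len(group) >= min_run:
            yield group[0][0], group[-1][0], coords

sample = """5451,1667,180007,35.7397387,97.8161897,375.8
36859,1598,202603.00,35.8867316,99.2515545,555.700
36859,1598,202608.00,35.8867316,99.2515545,555.700
36859,1142z,202610.00,35.8867316,99.2515545,555.700
36859,1597,202612.00,35.8867316,99.2515545,555.700
5448,1053z,180006,35.7397407,97.8161814,375.7
"""

for first, last, coords in find_gps_hangs(csv.reader(io.StringIO(sample))):
    print(f"lines {first} to {last}: {','.join(coords)}")
```

For a real file, the `io.StringIO(sample)` argument would be replaced by an open file handle.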

Replace groups of text all together with gVim

Consider the following data:
Class Gender Condition Tenis
A Male Fail Fail 33
A Female Fail NotFail 23
S Male Yellow 14
BC Male Happy Elephant 44
I have data that I want to turn into comma-separated values, but its column separators are inconsistent (a mix of tabs and spaces).
One specific column contains compound words, and I would like to eliminate the space inside them. In the above example, I would like to replace "Fail " with "Fail_" and "Happy " with "Happy_".
The result would be the following:
Class Gender Condition Tenis
A Male Fail_Fail 33
A Female Fail_NotFail 23
S Male Yellow 14
BC Male Happy_Elephant 44
I already managed to do that in two steps:
:%s/Fail /Fail_/g
:%s/Happy /Happy_/g
Question: As I'm very new to gVim, I am trying to perform these replacements all together in one step, but I could not find out how to do that.
After this step, I will tabulate my data with the following:
:%s/\s\+/,/g
And get the final result:
Class,Gender,Condition,Tenis
A,Male,Fail_Fail,33
A,Female,Fail_NotFail,23
S,Male,Yellow,14
BC,Male,Happy_Elephant,44
On SO, I searched for [vim] :%s two is:question and some variations, but I could not find a related thread, so I guess I am lacking the correct terminology.
Edit: This is the actual data (with more than 1 million rows). The problem starts in the 12th column (e.g. "Fail Planting" should be "Fail_Planting").
SP1 51F001 3 1 1 2 3 2001 52 52 H Normal 17,20000076 23,39999962 NULL NULL
SP1 51F001 3 1 1 2 3 2001 53 53 F Fail Planting 0 0 NULL NULL
SP1 51F001 3 1 1 2 3 2001 54 54 N Normal 13,89999962 0 NULL NULL
You can use an expression on the right hand side of the substitution.
:%s/\(Fail\|Happy\) \|\s\+/\= submatch(0) =~# '^\s\+$' ? ',' : submatch(1).'_'/g
So this matches either Fail or Happy followed by a space, or a run of whitespace, and then checks whether the matched part is entirely whitespace. If it is, it is replaced by a comma; if it is not, the captured part is used with an underscore appended. submatch(0) is the whole match and submatch(1) is the first capture group.
Take a look at :h sub-replace-expression. If you want to do something very complex, you can define a function.
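The same idea, a single substitution whose replacement is computed per match, can be sketched in Python by passing a function to `re.sub`. Note that this sketch adds a word boundary and a lookahead that the Vim command above does not have: they are assumptions added here so that the end of "NotFail" or a "Fail" followed by a number is not treated as the first half of a compound word.

```python
import re

# Either a known first word followed by one space and a letter,
# or any run of whitespace (a column separator).
PATTERN = re.compile(r'\b(Fail|Happy) (?=[A-Za-z])|\s+')

def replace(match):
    # Group 1 is set only when the first alternative matched.
    if match.group(1) is not None:
        return match.group(1) + '_'
    return ','

for line in ["A Female Fail NotFail 23", "BC  Male\tHappy Elephant  44"]:
    print(PATTERN.sub(replace, line))
```

Because both alternatives are tried at each position in one pass, the compound words are joined before their inner space can be turned into a comma.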
Very magic version
:%s/\v(Fail|Happy) |\s+/\= submatch(0) =~# '^\v\s+$' ? ',' : submatch(1).'_'/g
You have all the parts you just need to combine them together with |. Example:
:%s/\>\s\</_/g|%s/\s\+/,/g
I am using \> and \< to find words that only have one space between them so we can replace it with _.
For more help see:
:h /\>
:h :range
:h :bar
You could perhaps try a macro if there are certain conditions that are true (or write a vimscript, but my vimscript is very rusty). I will show a sample macro you could use:
Go to first line in file after the headings
press q to begin recording a macro
press t to choose the register t for recording to (I use t for "temp")
press ^ to move to the beginning of the line
press 2w to move to the third word (move 2 words to the right)
press e to move to the end of the word
press l (letter l) to move right one character (to the space)
press r to enter replace single character mode
press _ to enter an underscore
press j to move down a line
press q to stop recording the macro
Now that you have the macro stored in register t you can run the macro on every line in the file. If there are 100 lines in the file, you have already done 1 and there is a header, so you would type the following to run it on the remaining 98 lines:
98@t
These two commands:
:%s/\(\a\) \(\a\)/\1_\2/g
:%s/\s\+/,/g
seem to work on your sample:
SP1,51F001,3,1,1,2,3,2001,52,52,H,Normal,17,20000076,23,39999962,NULL,NULL
SP1,51F001,3,1,1,2,3,2001,53,53,F,Fail_Planting,0,0,NULL,NULL
SP1,51F001,3,1,1,2,3,2001,54,54,N,Normal,13,89999962,0,NULL,NULL
but you have decimal numbers here with a comma as separator that will mess with the "comma-separated-ness" of your data. Changing those commas into periods beforehand might be a good idea:
:%s/,/./g
SP1,51F001,3,1,1,2,3,2001,52,52,H,Normal,17.20000076,23.39999962,NULL,NULL
SP1,51F001,3,1,1,2,3,2001,53,53,F,Fail_Planting,0,0,NULL,NULL
SP1,51F001,3,1,1,2,3,2001,54,54,N,Normal,13.89999962,0,NULL,NULL
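For a file with more than a million rows, the three substitutions could also be applied line by line in a script. The following is a rough Python sketch of that pipeline; it assumes the columns are separated by tabs (so a single space only occurs inside a compound word), and the function name `to_csv_line` is made up for illustration.

```python
import re

def to_csv_line(line):
    # 1. Decimal commas become periods so they cannot be mistaken for separators.
    line = line.replace(',', '.')
    # 2. A single space between two letters joins a compound word with "_".
    line = re.sub(r'([A-Za-z]) ([A-Za-z])', r'\1_\2', line)
    # 3. Every remaining run of whitespace is a column separator.
    return re.sub(r'\s+', ',', line.strip())

rows = [
    "SP1\t51F001\t3\t1\t1\t2\t3\t2001\t52\t52\tH\tNormal\t17,20000076\t23,39999962\tNULL\tNULL",
    "SP1\t51F001\t3\t1\t1\t2\t3\t2001\t53\t53\tF\tFail Planting\t0\t0\tNULL\tNULL",
]
for row in rows:
    print(to_csv_line(row))
```

If the real separators are single spaces rather than tabs, step 2 would also join neighbouring columns such as "F Fail", so the separator assumption matters.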

Match a number only in a certain range

I want to match a 3-digit number only from a webpage.
So for example if webpage has number 1 599-+ (white space between 1 and 5 and -+ signs after). I only want to capture/match numbers between 0 and 599-+ and nothing else.
My regex is: regex(?:^|(?:[^\d\s]\s*))([0-5]\d\d-+) but this one also matches "i 1599-+"
or regex(\^[0-5]?[0-9]?[0-9]-+$), which doesn't work either...
A solution would be to use this regular expression with a non capturing group matching either the start of the string or something that's not a digit (with a little more verbosity due to space handling) :
(?:^|(?:[^\d\s]\s*))([0-5]\d\d)
Examples (in javascript as you didn't specify a language) :
"1 599".match(/(?:^|(?:[^\d\s]\s*))([0-5]\d\d)/) => null
"a sentence with 1 599 inside".match(/(?:^|(?:[^\d\s]\s*))([0-5]\d\d)/) => null
"another with 599".match(/(?:^|(?:[^\d\s]\s*))([0-5]\d\d)/) => ["h 599", "599"]
"599 at the start".match(/(?:^|(?:[^\d\s]\s*))([0-5]\d\d)/) => ["599", "599"]
(desired group is at index 1)
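The same pattern behaves the same way in Python's `re` module, in case that is the language in use. A minimal sketch, with a hypothetical helper name `first_three_digit`:

```python
import re

# Start of string, or a character that is neither digit nor whitespace
# (plus optional whitespace), then the captured three-digit number 000-599.
PATTERN = re.compile(r'(?:^|[^\d\s]\s*)([0-5]\d\d)')

def first_three_digit(text):
    match = PATTERN.search(text)
    # Group 1 holds the number itself, without the leading context.
    return match.group(1) if match else None

for text in ["1 599", "a sentence with 1 599 inside", "another with 599", "599 at the start"]:
    print(repr(text), first_three_digit(text))
```

Returning group 1 rather than the whole match drops the context character ("h " in the third example) that the non-capturing group consumes.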
I hope this is what you need. Try it, and if it doesn't do what you want, please describe the problem in a little more detail.
/^[0-5]?[0-9]?[0-9]$/.test("599");
Building on the above, I developed this, which I think is what you need:
/^[0-5]?[0-9]?[0-9][+-]?$/.test("599");
In the above regex the sign is optional, and at most one + or - is accepted. (Inside a character class a | is a literal character, so it is left out.)
If you want the signs in the order -+, then try this:
/^[0-5]?[0-9]?[0-9][\-][\+]$/.test("99-+");
Okay @user3214294

Regex to match numbers and commas, but not numbers starting with 0 unless it's 0,

Well I tried to sum it up in the title.
I need a reg ex to match numbers and commas, but not numbers starting with 0 unless it's 0,number
My users enter hours in a field, so they have to be able to enter 0,3 hours, but they are not allowed to write 002 or 09.
I have this reg ex
^[0-9]*\,?[0-9]+$
How can I extend it so that a number is not allowed to start with 0 unless the 0 is followed by a comma?
Another one :)
^(0|[1-9]\d*(|,\d+)|0,\d+)$
This one should suit your needs:
^(?:0,\d*[1-9]|[1-9]\d*)$
The group is needed so that ^ and $ apply to both alternatives:
either 0,\d*[1-9]: a 0, followed by a comma, followed by zero or more digits, followed by one digit between 1 and 9
or [1-9]\d*: a digit between 1 and 9, followed by zero or more digits
Matches:
0,3
0,03
3
30
Doesn't match:
0
0,0
0,30
03
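The two lists above can be checked mechanically. The sketch below uses Python and groups the alternation with (?:...) so that ^ and $ apply to both branches; without the group, ^ binds only to the first branch and $ only to the second, so an input such as 03 would be accepted through its trailing 3. The names `HOURS` and `is_valid` are invented for this example.

```python
import re

# Grouped so that ^ and $ apply to both alternatives.
HOURS = re.compile(r'^(?:0,\d*[1-9]|[1-9]\d*)$')

def is_valid(value):
    return HOURS.match(value) is not None

should_match = ["0,3", "0,03", "3", "30"]
should_not_match = ["0", "0,0", "0,30", "03"]

for value in should_match + should_not_match:
    print(value, is_valid(value))
```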
You don't need to force everything into a single regex to do this.
It will be far clearer if you use multiple regexes, each one making a specific check.
if ( /^[0-9]+,[0-9]+$/ || /^[1-9][0-9]*$/ )
Here we are making two different checks. "Either this one matches, or the other one matches", and then you don't have to jam both conditions into one regex.
Let the expressive form of your host language be used, rather than trying to cram logic into a regex.
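In Python, the "several small checks" approach might be sketched as follows. The two patterns are one reading of the rules in the question (no leading zero unless the integer part is a lone 0 before the comma), not the exact patterns from the answer above, and `is_valid_hours` is a hypothetical name.

```python
import re

def is_valid_hours(value):
    # Check 1: whole hours without a leading zero, e.g. "3" or "30".
    if re.fullmatch(r'[1-9][0-9]*', value):
        return True
    # Check 2: a fraction written with a comma, e.g. "0,3" or "12,5";
    # the integer part is either a lone 0 or has no leading zero.
    if re.fullmatch(r'(?:0|[1-9][0-9]*),[0-9]+', value):
        return True
    return False

for value in ["0,3", "12,5", "3", "002", "09", "09,5"]:
    print(value, is_valid_hours(value))
```

Each check can be read, tested, and changed on its own, which is the point of splitting them.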