I have an "old school" import file. The first character contains a 1, 2, or 3 to indicate the type of line. The second through seventh characters hold a vendor number. I wish to find a 1 in the first character and then a variable vendor in positions 2 through 7. Then, for records that meet those criteria, I want to change positions 92 through 99 to a supplied value, regardless of what those positions currently contain.
My file:
1894 004dzxjvugin PCard11012019 10031910031912611 0
I am looking for a 1 in character position 1, then supplying 894 as the variable vendor to match beginning in character position 2. I then wish to update this record to contain the supplied value V9952164911/12/19 beginning in position 92.
It seems the syntax of Notepad++ should do the job but I am inexperienced using it for this purpose.
Ctrl+H
Find what: ^1894.{86}\K.{17}
Replace with: V9952164911/12/19
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
1894 # literally 1894
.{86} # exactly 86 characters (any character but newline)
\K # forget all we have seen until this position
.{17} # 17 characters
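For readers who want to script the same edit outside Notepad++, here is a minimal Python sketch of the idea. Python's `re` module has no `\K`, so the part that `\K` would "forget" is captured and written back instead. The function name `overwrite_field` and its parameters are hypothetical. The counts are parameters on purpose: with the pattern above, `\K` lands after 4 + 86 = 90 characters, and the sample line shown here has lost its original spacing, so the exact offset should be checked against the real file.

```python
import re

def overwrite_field(text, prefix, skip, width, value):
    """Overwrite `width` characters located `skip` characters after `prefix`.

    Only lines that start with `prefix` are touched. The prefix and the
    skipped characters are captured in group 1 and re-emitted, which
    emulates what \\K does in the Notepad++ pattern.
    """
    pattern = re.compile(
        r'^(' + re.escape(prefix) + r'.{' + str(skip) + r'})'
        + r'.{' + str(width) + r'}',
        re.MULTILINE,
    )
    # A function is used as the replacement so `value` is inserted literally.
    return pattern.sub(lambda m: m.group(1) + value, text)

# Synthetic fixed-width lines (assumed layout, not the real import file).
line1 = '1894' + 'x' * 86 + 'y' * 17 + 'END'
line2 = '2894' + 'x' * 86 + 'y' * 17 + 'END'
result = overwrite_field(line1 + '\n' + line2, '1894', 86, 17, 'V9952164911/12/19')
```

Only the first synthetic line changes; the second starts with 2 and is left alone.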
I am very new to the world of regular expressions. I am trying to use Notepad++ using Regex for the following:
Input file is something like this and there are multiple such files:
Code:
abc
17
015
0 7
4.3
5/1
***END***
abc
6
71
8/3
9 0
***END***
abc
10.1
11
9
***END***
I need to be able to edit the text in all of these files so that all the files look like this:
Code:
abc
1,2,3,4,5
***END***
abc
6,7,8,9
***END***
abc
10,11,12
***END***
Also:
In some files the number of * around the word END varies. Is there a way to generalize the number of * so I don't have to worry about it?
There is some additional data before the abc lines that does not need to be transposed. How do I keep that data as it is while transposing the data between abc and ***END***?
Kindly help me. Your help is much appreciated!
Try the following find and replace, in regex mode:
Find: ^(\d+)\R(?!\*{1,}END\*{1,})
Replace: $1,
Here is an explanation of the regex pattern:
^ from the start of the line
(\d+) match AND capture a number
\R followed by a platform independent newline, which
(?!\*{1,}END\*{1,}) is NOT followed by ***END***
Note carefully the negative lookahead at the end of the pattern, which makes sure that we don't do the replacement on the final number in each section. Without this, the last number would bring the END marker onto the same line.
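The same find and replace can be tried in Python as a rough sketch. `\R` is not available in Python's `re` module, so `\r?\n` stands in for it here, and `\*+` is used as a shorter spelling of `\*{1,}`. The name `join_numbers` and the sample text are made up for illustration.

```python
import re

# ^(\d+)  : a line made up only of digits (captured)
# \r?\n   : the line break that follows it
# (?!...) : skip the replacement when the next line is the END marker
JOIN = re.compile(r'^(\d+)\r?\n(?!\*+END\*+)', re.MULTILINE)

def join_numbers(text):
    # Replace "number + line break" with "number,".
    return JOIN.sub(r'\1,', text)

sample = "abc\n1\n2\n3\n***END***\nabc\n4\n5\n*END*\n"
joined = join_numbers(sample)
```

The last number of each section keeps its line break because of the lookahead, so the END marker stays on its own line.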
This will replace only between "abc" and "***END***", with any number of asterisks.
Ctrl+H
Find what: (?:(?<=^abc)\R|\G(?!^)).+\K\R(?!\*+END\*+)
Replace with: ,
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
(?: # non capture group
(?<=^abc) # positive look behind, make sure we have "abc" at the beginning of line before
\R # any kind of linebreak
| # OR
\G # restart from last match position
(?!^) # negative look ahead, make sure we are not at the beginning of line
) # end group
.+ # 1 or more any character but newline
\K # forget all we have seen until this position
\R # any kind of linebreak
(?!\*+END\*+) # negative lookahead, make sure we haven't ***END*** after
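Python's `re` module supports neither `\G` nor `\K`, so the pattern above cannot be ported directly. A different technique that gives the same result is to match each whole abc ... END block and join its inner lines in a replacement function. This is a sketch under that assumption; `join_block_lines` and the sample text are hypothetical.

```python
import re

# Group 1: the "abc" line, group 2: the lines in between (lazy),
# group 3: the END marker with any number of asterisks.
BLOCK = re.compile(
    r'^(abc\r?\n)(.*?)(\r?\n\*+END\*+)$',
    re.MULTILINE | re.DOTALL,
)

def join_block_lines(text):
    def repl(match):
        # Join the inner lines with commas; text outside blocks is untouched.
        body = ','.join(match.group(2).splitlines())
        return match.group(1) + body + match.group(3)
    return BLOCK.sub(repl, text)

sample = "header 1\nheader 2\nabc\n1\n2\n3\n***END***\nabc\n4\n5\n*END*\n"
joined_blocks = join_block_lines(sample)
```

Anything before the first abc line, such as the two header lines in the sample, is left exactly as it was.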
I have a text file where almost all the lines start with the letter N followed by 3 or 4 numbers as below
N970 G2 X-1.0591 Y-1.7454 I0. J-.04
N980 G1 Y-1.7554
N990 X-1.0594 Y-1.7666
N1000 Z-.2187
N1010 Y-1.7566
How can I remove the N followed by the 3 or 4 numbers in Notepad++ so that it looks like this? If I need to search twice (once for N### and then again for N####), that is fine too.
G2 X-1.0591 Y-1.7454 I0. J-.04
G1 Y-1.7554
X-1.0594 Y-1.7666
Z-.2187
Y-1.7566
The numbers go from 100 to 9990 in increments of 10, if that helps.
You can use the following regex that should work for your case:
^N[0-9]+\s*(.*)
It will match every line that starts with a capital letter N immediately followed by one or more digits. Matched results will include a single group which will contain the text you are looking for.
Note that whitespace between the N tag and the actual text is matched by \s* but is not part of the captured group, so it is dropped together with the tag.
Breakdown
^ # Assert position at the start of the line
N # Matches capital letter 'N' literally
[0-9]+ # Matches any digit between 1 and unlimited times
\s* # Matches whitespace between 0 and unlimited times
(.*) # The rest of the text you are looking for
Find/Replace
The regex will match each individual line so you can either select Find Next and then Replace and process your file one line at a time or you can choose Replace All to process the whole file at once.
The substitution (Replace with:) should contain only the first group ($1), which represents the rest of your text with the N-prefix tag trimmed.
Make sure that the Search Mode is set to Regular expression.
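As a cross-check outside Notepad++, the same substitution can be sketched in Python. One deliberate change, labeled here as an assumption of the sketch: `[ \t]*` replaces `\s*`, because `\s` also matches line breaks and a line holding only a tag could otherwise swallow the following newline. The name `strip_line_numbers` is hypothetical.

```python
import re

# ^N[0-9]+ : the N tag at the start of a line
# [ \t]*   : spaces or tabs after the tag (not line breaks)
# (.*)     : the rest of the line, kept via \1
TAG = re.compile(r'^N[0-9]+[ \t]*(.*)', re.MULTILINE)

def strip_line_numbers(text):
    return TAG.sub(r'\1', text)

sample = (
    "N970 G2 X-1.0591 Y-1.7454 I0. J-.04\n"
    "N980 G1 Y-1.7554\n"
    "N1000 Z-.2187\n"
)
stripped = strip_line_numbers(sample)
```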
I've got a document that looks something like this:
# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6
(..and so on..)
In other words, it starts off with a few comment lines, then the rest of the document is just a bunch of numbers separated by spaces.
I'm trying to write a regex that gets all digits on all lines that don't start with #, but I can't seem to get it.
I've read over answers such as
Regular Expressions: Is there an AND operator?
Regex: Find a character anywhere in a document but only on lines that begin with a specific word
and pawed through sites such as http://regular-expressions.info, but I still can't get an expression that works (the best I can get is a lengthy version of ^[^#].*).
So how can I match digits (or text, or whatever) in a string, but only on lines that don't start with a certain character?
Your regex ^[^#].* uses a negated character class, which matches any single character other than # at the start of the string ^, and after that matches any character zero or more times.
This would, for example, also match t test
What you might do is use an alternation that either matches a whole line starting with a # (^#.*$) or captures one or more digits in a group (\d+).
Your digits are captured in group 1. You could change the (\d+) to, for example, a character class ([\w+.]+) to match more than only digits.
(?:^#.*$|(\d+))
Details
(?: Non capturing group
^#.*$ Match from the start of the line ^ a # followed by any character zero or more times .* until the end of the line $
| Or
(\d+) capture one or more digits in a group
) Close non capturing group
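To see how the alternation behaves in practice, here is a small Python sketch. Comment lines are consumed by the first alternative and leave group 1 empty, so only matches with a filled group 1 are kept. The function name `digits_outside_comments` and the sample text are assumptions for illustration.

```python
import re

# Either consume a whole comment line, or capture a run of digits.
PATTERN = re.compile(r'^#.*$|(\d+)', re.MULTILINE)

def digits_outside_comments(text):
    # group(1) is None when the comment-line alternative matched.
    return [m.group(1) for m in PATTERN.finditer(text)
            if m.group(1) is not None]

sample = "# Document ID 8934\n# Last updated 2018-05-06\n52 84 12\n4 2 7\n"
found = digits_outside_comments(sample)
```

The digits inside the two comment lines (8934, 2018 and so on) never reach group 1, because the whole line is eaten by the first alternative before `\d+` gets a chance.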
I think a much simpler method would be to first remove the comment lines, replacing them with an empty string, using this regex:
^#.*
And then you can just match all the numbers with this:
-?\d+ (-? is for negative)
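The two-step approach might look like this in Python; it is a sketch, and `numbers_skipping_comments` is a hypothetical name. The optional `\n?` in the first pattern is an addition that also removes the line break of each comment line.

```python
import re

def numbers_skipping_comments(text):
    # Step 1: drop every line that starts with '#'.
    kept = re.sub(r'^#.*\n?', '', text, flags=re.MULTILINE)
    # Step 2: collect the remaining (optionally negative) integers.
    return [int(n) for n in re.findall(r'-?\d+', kept)]

sample = "# Document ID 8934\n# Last updated 2018-05-06\n52 84 12\n4 2 -7\n"
numbers = numbers_skipping_comments(sample)
```

Removing the comments first matters here: run directly on the sample, `-?\d+` would also pick up 8934, 2018, -05 and -06 from the header lines.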
I need to take a string of concatenated keyword commands and numbers, and put the commands and the numbers into lists.
Pattern:
{command words} by {number} {command words} by {number} etc...
Input string:
"turn right by 1 turn left by 99 up by 11 left by 28"
I thought I might split on the word " by " but that causes the second group to have the number and the next command (eg. 1 turn left).
Regex:
\sby\s
Desired Output:
turn right by 1
turn left by 99
up by 11
left by 28
Desired Lists:
turn right,turn left,up,left
1,99,11,28
How can I split a long string of commands that follow that pattern?
The text is one big long string with no punctuation. The word by is always followed by a number and the pattern is consistent. The first part may contain one or two keyword commands.
Brief
It seems your strings all share the same structure: word or words by 111 (one or more words, followed by by literally, followed by at least one digit)
Code
(\w[\w ]*?)\s+by\s+(\d+)
Results
Input
turn right by 1 turn left by 99 up by 11 left by 28
Output
Full Match: turn right by 1
Group 1: turn right
Group 2: 1
Full Match: turn left by 99
Group 1: turn left
Group 2: 99
Full Match: up by 11
Group 1: up
Group 2: 11
Full Match: left by 28
Group 1: left
Group 2: 28
Explanation
(\w[\w ]*?) Capture the following into capture group 1
\w[\w ]*? Any word character, followed by anything in the set [\w ] (any word character or space) any number of times, but as few as possible
\s+by\s+ One or more whitespace characters, followed by by literally, followed by one or more whitespace characters.
(\d+) Capture one or more digits into capture group 2
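Since the question asks for two lists, here is a short Python sketch that applies the pattern and separates the groups. The function name `split_commands` is hypothetical; the regex is the one shown above.

```python
import re

PATTERN = re.compile(r'(\w[\w ]*?)\s+by\s+(\d+)')

def split_commands(text):
    # findall returns one (command, number) tuple per match.
    pairs = PATTERN.findall(text)
    commands = [command for command, _ in pairs]
    numbers = [int(number) for _, number in pairs]
    return commands, numbers

commands, numbers = split_commands(
    "turn right by 1 turn left by 99 up by 11 left by 28"
)
```

Joining either list with `','.join(...)` (after converting the numbers back to strings) gives the comma-separated form shown in the question.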
I am trying to parse a GEDCOM file using regular expressions and am almost there, but the expression grabs the next line of the text for lines where there is optional text at the end of line. Each record should be a single line.
This is an extract from the file:
0 HEAD
1 CHAR UTF-8
1 SOUR Ancestry.com Family Trees
2 VERS (2010.3)
2 NAME Ancestry.com Family Trees
2 CORP Ancestry.com
1 GEDC
2 VERS 5.5
2 FORM LINEAGE-LINKED
0 #P6# INDI
1 BIRT
And this is the regular expression I am using:
(\d+)\s+(#\S+#)?\s*(\S+)\s+(.*)
This works for all lines except those that do not contain any text at the end, such as the first one. For instance, the last capture group for the first record contains the '1 CHAR UTF-8'.
Here's a screenshot from regex101.com, showing how the purple capture group bleeds onto the next line:
I have tried using the $ qualifier to limit the .* to just line ends, but this fails as the second line is also a line end.
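The \s pattern matches newline symbols. Replace it with a regular space, or [^\S\r\n], or \h if it is PCRE, or [\p{Zs}\t]. The separator before the last group also has to become optional ( * rather than +); otherwise lines with no trailing text, such as 0 HEAD, no longer match at all.
(\d+) +(#\S+#)? *(\S+) *(.*)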
If you need to match whole lines, you may add the multiline option and anchor the pattern on both sides (^ at the start and $ at the end).
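A line-anchored version can be sketched in Python as follows. Two things are assumptions of this sketch rather than part of the answer: the name `parse_gedcom`, and the final separator being written as ` *` so that lines with no trailing text (0 HEAD, 1 BIRT) still match.

```python
import re

# Literal spaces are used instead of \s so a match never crosses a line break.
LINE = re.compile(r'^(\d+) +(#\S+#)? *(\S+) *(.*)$', re.MULTILINE)

def parse_gedcom(text):
    # One tuple per line: (level, optional #id#, tag, trailing text).
    return [m.groups() for m in LINE.finditer(text)]

sample = "0 HEAD\n1 CHAR UTF-8\n1 SOUR Ancestry.com Family Trees\n0 #P6# INDI\n1 BIRT"
records = parse_gedcom(sample)
```

The optional id group comes back as `None` when absent, and the trailing text is an empty string for lines such as 0 HEAD.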