Add a character to a string in Notepad++ - regex

I have a big .txt file that I need to modify in Notepad++: find each line that starts with "1H" and add the number "2" at position 10 of that line. For example:
1A 3333333333333
1B 4444444444444
1H 5555555555555
1A 6666666666666
1B 7777777777777
1H 8888888888888
I want each line starting with 1H to be modified by adding 2 at position 10. How can I do that in Notepad++?
I don't know how to combine ^(1H) and ^(.{10}) in the search part.

Find what: ^(1H.{7})(.)
Replace with: \12
This pattern requires a line that starts with 1H followed by 7 more characters. The first pair of parentheses stores this 9-character string as group 1. The next character, the one in the tenth position, is stored as group 2.
The full match is then replaced by group 1 (\1) followed by the character '2', so the '2' takes the place of the tenth character and gives the desired result:
1A 3333333333333
1B 4444444444444
1H 5555552555555
1A 6666666666666
1B 7777777777777
1H 8888882888888
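The same find-and-replace can be sketched in Python for anyone who wants to check the pattern outside Notepad++. This is only an illustration: the variable names are made up, and Python needs re.MULTILINE for ^ to match at every line start and \g<1> to separate the group reference from the literal 2.

```python
import re

text = """1A 3333333333333
1B 4444444444444
1H 5555555555555
1A 6666666666666
1B 7777777777777
1H 8888888888888"""

# ^ anchors at each line start thanks to re.MULTILINE.
# Group 1 holds "1H" plus the next 7 characters; the final "." is the
# tenth character, which the replacement overwrites with "2".
pattern = re.compile(r"^(1H.{7}).", re.MULTILINE)

# \g<1>2 means "group 1, then a literal 2"; in Python a plain \12 would be
# read as a reference to group 12.
result = pattern.sub(r"\g<1>2", text)
print(result)
```

Lines that do not start with 1H are left untouched because the pattern never matches them.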

Related

Extracting multi values with regex ( Only values, Not Fieldname )

Can someone help me with this regex?
I would like to extract either 1. or 2.
1.
(2624594000) 303 days, 18:32:20.00 <-- Timeticks
.1.3.6.1.4.1.14179.2.6.3.39. <-- OID
Hex-STRING: 54 4A 00 C8 73 70 <-- Hex-STRING (need "Hex-STRING" itself too)
0 <--INTEGER
"NJTHAP027" <- STRING
OR
2.
Timeticks: (2624594000) 303 days, 18:32:20.00
OID: .1.3.6.1.4.1.14179.2.6.3.39
Hex-STRING: 54 4A 00 C8 73 70
INTEGER: 0
STRING: "NJTHAP027"
The field names and values will be different each time. (The data is variable.)
I don't need the field names; I only want to get the values in order from the top (multiple values).
(?s)[^=]+\s=\s(?<value_v2c>([^=]+)-)
https://regex101.com/r/lsKeEM/2
-> I can't extract the last STRING: "NJTHAP027" at all!
The named group value_v2c is already a group, so you can omit the inner capture group.
Currently the pattern always requires a - character after the value, which is why the last value is never matched. Instead, you can either match the - or assert the end of the string.
As you are using the negated character class [^=]+ and \s, you can omit the inline modifier (?s), as both already match newlines.
To match the 2. variation, you can update the pattern to:
[^=]+\s=\s(?<value_v2c>[^=]+)(?:-|$)
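The difference between requiring the - and also accepting the end of the string can be sketched in Python. The input line below is hypothetical (the real data behind the regex101 link is not shown here), the helper name values is made up, and Python spells named groups (?P<name>...) rather than (?<name>...).

```python
import re

# Hypothetical input: "name = value" pairs separated by " - ".
# This layout is an assumption, not the asker's actual data.
line = 'uptime = Timeticks: (2624594000) 303 days, 18:32:20.00 - name = STRING: "NJTHAP027"'

# Every value must be followed by "-", so the last value is lost.
requires_dash = re.compile(r"[^=]+\s=\s(?P<value_v2c>[^=]+)-")

# The value may be followed by "-" or by the end of the string.
dash_or_end = re.compile(r"[^=]+\s=\s(?P<value_v2c>[^=]+)(?:-|$)")

def values(pattern, text):
    """Return the captured values with surrounding whitespace trimmed."""
    return [m.group("value_v2c").strip() for m in pattern.finditer(text)]

print(values(requires_dash, line))
print(values(dash_or_end, line))
```

On this sample the first pattern yields only the Timeticks value, while the second also returns the trailing STRING value.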
To get the 1. version, you can match everything before the colon as long as it is not Hex-STRING.
Then, inside the group, optionally match Hex-STRING: itself.
[^=]+\s=\s(?:(?!Hex-STRING:)[^:])*:?\s*(?<value_v2c>(?:Hex-STRING: )?[^=]+?)(?: -|$)

Find a String from a varying number block to the end

I have nearly 8000 lines of the following text:
DIL 2 M 006 SC SCHÜTZ 083 1 Stck
25215-1 BIN-SORT 2152310251724-1 BIN-SORT getestet 048 133 Stck
RBBE60-T3dsg 21S003 SEALING 6X8.9X2.4 MM 082 3 Stck
I am only interested in the 3-digit block at the end and the number after it.
So this should be the output:
083 1
048 133
082 3
The same number, e.g. 048, may also appear at the beginning of the line. That shouldn't count as a hit.
Unfortunately I have no idea how to extract these strings with Notepad++.
This expression,
.*(\d{3}\s+\d+).*
with a replacement of $1 is likely to work here.
The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.
You may try the following find and replace, in regex mode:
Find: ^.*?(\d+ \d+) \S*$
Replace: $1
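The logic here is to use the lazy .*? to consume everything up to the last two consecutive numbers in the line, which must be followed by a space and one final token (\S*) before the end of the line. Then we replace the whole line with only the two captured numbers.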
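Both suggested expressions can be checked against the three sample lines with a short Python sketch. The variable names are illustrative, and Python writes the replacement as \1 where Notepad++ accepts $1.

```python
import re

lines = [
    "DIL 2 M 006 SC SCHÜTZ 083 1 Stck",
    "25215-1 BIN-SORT 2152310251724-1 BIN-SORT getestet 048 133 Stck",
    "RBBE60-T3dsg 21S003 SEALING 6X8.9X2.4 MM 082 3 Stck",
]

# First answer: the greedy .* pushes the captured group as far right as possible.
greedy = re.compile(r".*(\d{3}\s+\d+).*")

# Second answer: anchor the two numbers to the end of the line,
# allowing exactly one trailing token after them.
anchored = re.compile(r"^.*?(\d+ \d+) \S*$")

# Each line is replaced by the captured "NNN N" part only.
greedy_result = [greedy.sub(r"\1", line) for line in lines]
anchored_result = [anchored.sub(r"\1", line) for line in lines]

print(greedy_result)
print(anchored_result)
```

On these samples both expressions give the same result; they can differ on lines that do not end in a single unit token such as Stck.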

Grouping lines with a header using regex

I'm trying to write a regex query that groups lines which start with a type of key as a header.
For example, the key is a line containing an 'A' followed by a number. In the sample below, the first 4 lines are one group, the next 2 are another group, and so on:
dd A3
This line is arbitrary
This line is also arbitrary
1234 Arbitrary
A9
This line is arbitrary
ff A3 d
A5ff
Hi there
Hello
This is what I ended up with that worked: .*A[0-9].*\n((?!A[0-9]).|\n)*
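When the grouping is done in a script rather than in a single regex, a line-by-line approach is often easier to reason about. The following Python sketch is an alternative technique, not the asker's regex: it starts a new group whenever a line contains 'A' followed by a digit. The function name group_by_header is made up for illustration.

```python
import re

# A header is any line that contains "A" followed by a digit.
HEADER = re.compile(r"A[0-9]")

def group_by_header(lines):
    """Split lines into groups, each starting at a header line."""
    groups = []
    for line in lines:
        # Start a new group at a header (or at the very first line).
        if HEADER.search(line) or not groups:
            groups.append([line])
        else:
            groups[-1].append(line)
    return groups

sample = [
    "dd A3",
    "This line is arbitrary",
    "This line is also arbitrary",
    "1234 Arbitrary",
    "A9",
    "This line is arbitrary",
    "ff A3 d",
    "A5ff",
    "Hi there",
    "Hello",
]

groups = group_by_header(sample)
for group in groups:
    print(group)
```

Note that "1234 Arbitrary" is not a header, because its 'A' is followed by a letter rather than a digit.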

RegEx replaceAll but ignore newlines

Looking for some regex help! If this can be done in another way / using another tool - please let me know.
Here's a snippet from my data set (there are ~10million rows in total). Every new sequence starts with a '>'.
Note: The line numbers are not in the actual textfile
01 >M00707:15:000000000-AEN4L:1:1101:13198:1037_PairEnd_SUB_SUB merged_sample={14.3: 1}; count=1; 2:N:0:1
02 ctcccggaaaaatttgagcctccagagtagcatataaccgacacgttgccgcctgaaaat
03 acattttccaggtcttnnnnnaaannnggaagcgcgcaccgacgagctttnnannacaag
04 tgtggctctagtgctcggtatttgcaactttttaagtannatgnnngtcgnnnnngaggn
05 nnnnnnnnntaaccnnncaccttcaagcaagtctaagttctcgactaatcaaactataaa
06 tccgctacacggacccagatctcccgccncgtgcannttaaagcaagtctacgttattga
07 agatagaaactattatatcgctaaacgtagctctganncacgctcgccttgactccgact
08 ctgtcaatgtctacgaccaattgaggtggaacatgtgcacatgtgtttcagancattgga
09 ggaattccgggaaaataaattgaggcacaancgaacggtgatctnnnnnnnttagattct
10 gccatgttttttggcacgaacacaattgggcaaatactgttgggatgtggatggat
11 >M00707:15:000000000-AEN4L:1:1101:10949:1045_PairEnd_SUB_SUB_CMP merged_sample={13.3: 1}; count=1; 2:N:0:1
12 atgacatattaatgattcagcccacattccttaatataccacatatgacttacttttcta
13 tatcaacnnnnnnntactttccacaggtatatacatactatgtttaatactcattaattt
14 acttgncactatattattacattatatgattaatccacatttctataacatattagactt
15 tcctcaactagatattat(first)tttcgt(first)aattattatgcagttgtatgacatattactgaatca
16 gccaacattccttaataaaccncatacgactactctgttatcgtatgtgttttatggtct
17 tgattcttagtaatgggtatgacatattattgattcagccnnnattgttnannannnnac
18 atnnancttactnntcttnttcaactctaatatactttccacaggtatatacatactatg
19 ttnaat(last)actcattaat(last)ttacttgccaatatatcattnnnntatatgattaatccacattt
20 ctataacatattagactttcctcaactagatattattttcgtaattattatgcag
I want to cut out everything between the order of characters "tttcgt" and "actcattaat" (but only in that specific order), then replace it with nothing and preserve everything else in its format (with the line breaks etc).
A big challenge is that I also need to find tttcgt and actcattaat even if either of them has a line break in the middle, i.e. it starts at the end of one line (line break plus line number plus space) and continues on the next line. (Thanks to #CBroe for pointing that out.)
I wrapped "(first)" around the tttcgt chars - see line number 15
I wrapped "(last)" around the actcattaat chars - see line number 19
So far I've mustered up this thinggy (?<=tttcgt).*?(?=actcattaat) - but how can I make my expression ignore newlines?
To make the dot in .* match newlines as well, you need the s (single-line) modifier. How you set it depends on the regex implementation.
In Python it's the DOTALL flag.
A single capture group can't skip characters in the middle of its match, but you can capture the two ends separately and concatenate them afterwards, or simply replace the sequence to be removed with an empty string.
Example:
import re
data = """>M00707:15:000000000-AEN4L:1:1101:13198:1037_PairEnd_SUB_SUB merged_sample={14.3: 1}; count=1; 2:N:0:1
ctcccggaaaaatttgagcctccagagtagcatataaccgacacgttgccgcctgaaaat
acattttccaggtcttnnnnnaaannnggaagcgcgcaccgacgagctttnnannacaag
tgtggctctagtgctcggtatttgcaactttttaagtannatgnnngtcgnnnnngaggn
nnnnnnnnntaaccnnncaccttcaagcaagtctaagttctcgactaatcaaactataaa
tccgctacacggacccagatctcccgccncgtgcannttaaagcaagtctacgttattga
agatagaaactattatatcgctaaacgtagctctganncacgctcgccttgactccgact
ctgtcaatgtctacgaccaattgaggtggaacatgtgcacatgtgtttcagancattgga
ggaattccgggaaaataaattgaggcacaancgaacggtgatctnnnnnnnttagattct
gccatgttttttggcacgaacacaattgggcaaatactgttgggatgtggatggat
>M00707:15:000000000-AEN4L:1:1101:10949:1045_PairEnd_SUB_SUB_CMP merged_sample={13.3: 1}; count=1; 2:N:0:1
atgacatattaatgattcagcccacattccttaatataccacatatgacttacttttcta
tatcaacnnnnnnntactttccacaggtatatacatactatgtttaatactcattaattt
acttgncactatattattacattatatgattaatccacatttctataacatattagactt
tcctcaactagatattat(first)tttcgt(first)aattattatgcagttgtatgacatattactgaatca
gccaacattccttaataaaccncatacgactactctgttatcgtatgtgttttatggtct
tgattcttagtaatgggtatgacatattattgattcagccnnnattgttnannannnnac
atnnancttactnntcttnttcaactctaatatactttccacaggtatatacatactatg
ttnaat(last)actcattaat(last)ttacttgccaatatatcattnnnntatatgattaatccacattt
ctataacatattagactttcctcaactagatattattttcgtaattattatgcag"""
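# Lazy .*? stops at the nearest "actcattaat"; a greedy .* would run to the last one in the data.
# re.DOTALL lets the dot cross line breaks; \1\2 keeps both marker sequences.
output = re.sub(r'(tttcgt).*?(actcattaat)', r'\1\2', data, flags=re.DOTALL)
print(output)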
EDIT: made the code preserve the starting and ending sequences instead of removing them from output.
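The question also asks for the two marker sequences to be found when a line break falls inside them, which the code above does not handle. One possible way, sketched here under the assumption that lines end in a plain "\n" and that no line numbers are present in the file, is to allow an optional newline between every pair of characters in each marker. The helper names and the tiny sample string are made up for illustration.

```python
import re

def newline_tolerant(motif):
    """Build a pattern for motif that allows a line break between any two characters."""
    return r"\n?".join(re.escape(ch) for ch in motif)

def cut_between(text, first, last):
    """Remove everything between first and last, keeping both markers."""
    pattern = re.compile(
        "(" + newline_tolerant(first) + ")"
        + r".*?"  # lazy, so the match stops at the nearest closing marker
        + "(" + newline_tolerant(last) + ")",
        re.DOTALL,  # let the dot cross line breaks
    )
    return pattern.sub(r"\1\2", text)

# Hypothetical sample: both markers are split across a line break.
sample = "aaatttc\ngtccccc\nggggactca\nttaataaa"
cleaned = cut_between(sample, "tttcgt", "actcattaat")
print(cleaned)
```

The line breaks inside the two markers are preserved, but any line breaks in the removed stretch are deleted along with it, so line lengths around the cut will change.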

Find all substrings with at least one group

I am trying to find all substrings in a string that meet a condition.
Let's say we've got string:
s = 'some text 1a 2a 3 xx sometext 1b yyy some text 2b.'
I need to apply the search pattern {(one (group of words), two (another group of words), three (another group of words)), word}. The first three positions are optional, but at least one of them must be present. If so, I need the word after them.
Output should be:
1a 2a 3 xx
1b yyy
2b
I wrote this expression:
find_it = re.compile(r"((?P<one>\b1a\s|\b1b\s)|" +
r"(?P<two>\b2a\s|\b2b\s)|" +
r"(?P<three>\b3\s|\b3b\s))+" +
r"(?P<word>\w+)?")
Each group contains a set of different words (not just 1a, 1b), and I can't merge them into one group. A group should be None if it is empty. Obviously the result is wrong.
find_it.findall(s)
> 2a 1a 2a 3 xx
> 1b 1b yyy
I am grateful for your help!
You can use following regex :
>>> reg=re.compile(r'((?:(?:[12][ab]|3b?)\s?)+(?:\w+|\.))')
>>> reg.findall(s)
['1a 2a 3 xx', '1b yyy', '2b.']
Here I just condensed your regex by using character classes and the ? quantifier. The following part of the regex has two alternatives:
[12][ab]|3b?
[12][ab] will match 1a,1b,2a,2b and 3b? will match 3b and 3.
And if you don't want the dot at the end of 2b, you can use the following regex with a positive lookahead. It is more general than the preceding regex, because making \s optional in the first group is not a good idea:
>>> reg=re.compile(r'((?:(?:[12][ab]|3b?)\s)+\w+|(?:(?:[12][ab]|3b?))+(?=\.|$))')
>>> reg.findall(s)
['1a 2a 3 xx', '1b yyy', '2b']
Also, if your numbers and example substrings are just instances, you can use [0-9][a-z] as a more general regex (the output below comes from a test string that also contains '5h 9 7y examole'):
>>> reg=re.compile(r'((?:[0-9][a-z]?\s)+\w+|(?:[0-9][a-z]?)+(?=\.|$))')
>>> reg.findall(s)
['1a 2a 3 xx', '1b yyy', '5h 9 7y examole', '2b']
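To illustrate why making \s optional is risky, the following Python sketch compares the two variants from the answer on a made-up string in which a token runs straight into a word ("3better"). The names loose and strict and the test string are assumptions for illustration only.

```python
import re

# Variant with an optional \s after each token.
loose = re.compile(r'((?:(?:[12][ab]|3b?)\s?)+(?:\w+|\.))')

# Variant that requires \s between tokens, or a token followed by "." or the end.
strict = re.compile(r'((?:(?:[12][ab]|3b?)\s)+\w+|(?:(?:[12][ab]|3b?))+(?=\.|$))')

s = 'some text 1a 2a 3 xx sometext 1b yyy some text 2b.'
tricky = 'x 3better 1b yyy'  # hypothetical input

print(loose.findall(s))
print(strict.findall(s))
print(loose.findall(tricky))
print(strict.findall(tricky))
```

On the tricky string the loose variant reports "3better" as a hit, while the strict variant only returns "1b yyy".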