What regular expression deletes all the text inside the "" quotation marks in vim?
I am having trouble using a capture group in this case.
INPUT:
16
17
18
19
OUTPUT:
16
17
18
19
this should work for your example:
%s/"\zs[^"]*//
If you like, you can also record a macro to achieve that (using fewer keystrokes):
(assume your cursor is at line 1)
qq0di"j@qq
Then type @q to replay the macro for all lines in your buffer.
Note that the recursive macro just saves you from typing 999@q.
Try this:
:%s/HREF="[^"]*"/HREF=""/
By using "[^"]*" instead of just ".*" you avoid matching all the way past the first closing quote to the later closing quote of the second attribute.
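For anyone who wants to see the difference outside vim, here is a minimal Python sketch. The sample line is made up for illustration, and Python's re syntax is not identical to vim's, so take it only as a demonstration of greedy matching versus a negated character class.

```python
import re

# Hypothetical line with two quoted attributes (not from the question)
line = '<A HREF="page.html" TITLE="Home">'

# Greedy .* runs on to the LAST quote on the line, swallowing TITLE too
greedy = re.sub(r'HREF=".*"', 'HREF=""', line)

# [^"]* cannot cross a quote, so it stops at the first closing quote
negated = re.sub(r'HREF="[^"]*"', 'HREF=""', line)

print(greedy)   # <A HREF="">
print(negated)  # <A HREF="" TITLE="Home">
```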
I quickly found a working multi-line regular expression for my needs, but I am having trouble converting it to work on a single line.
So, consider this input with regex /^[2-9]\d{1}(?:\s){0}/gm applied:
4126-54D429-001,
5149-A42102-002,
9251-Z48910-003
...
However, when I put the input on one line, I get only the first two digits in the output:
4126-54D429-001, 5149-A42102-002, 9251-Z48910-003 ...
How can this regexp be written to get this capture:
4126-54D429-001, 5149-A42102-002, 9251-Z48910-003 ... ?
This Should Work.
REGEXP
\b\d{2}(?=\d{2})
INPUT
4126-54D429-001, 5149-A42102-002, 9251-Z48910-003, 7851-Z48910-003
OUTPUT
41
51
92
78
The comma is not essential.
If this helped, please accept the answer and vote it up.
This will capture the first two digits of each in groups:
(\d{2})[^,]*
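Both suggestions can be checked quickly in Python. This is only a sketch using the sample input from the answer above; other regex engines may treat \b or lookaheads slightly differently.

```python
import re

line = "4126-54D429-001, 5149-A42102-002, 9251-Z48910-003, 7851-Z48910-003"

# Lookahead version: two digits at a word boundary, followed by two more digits
lookahead = re.findall(r'\b\d{2}(?=\d{2})', line)

# Group version: capture two digits, then consume the rest of the item up to the comma
grouped = re.findall(r'(\d{2})[^,]*', line)

print(lookahead)  # ['41', '51', '92', '78']
print(grouped)    # ['41', '51', '92', '78']
```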
Looking for some regex help! If this can be done in another way / using another tool - please let me know.
Here's a snippet from my data set (there are ~10million rows in total). Every new sequence starts with a '>'.
Note: The line numbers are not in the actual textfile
01 >M00707:15:000000000-AEN4L:1:1101:13198:1037_PairEnd_SUB_SUB merged_sample={14.3: 1}; count=1; 2:N:0:1
02 ctcccggaaaaatttgagcctccagagtagcatataaccgacacgttgccgcctgaaaat
03 acattttccaggtcttnnnnnaaannnggaagcgcgcaccgacgagctttnnannacaag
04 tgtggctctagtgctcggtatttgcaactttttaagtannatgnnngtcgnnnnngaggn
05 nnnnnnnnntaaccnnncaccttcaagcaagtctaagttctcgactaatcaaactataaa
06 tccgctacacggacccagatctcccgccncgtgcannttaaagcaagtctacgttattga
07 agatagaaactattatatcgctaaacgtagctctganncacgctcgccttgactccgact
08 ctgtcaatgtctacgaccaattgaggtggaacatgtgcacatgtgtttcagancattgga
09 ggaattccgggaaaataaattgaggcacaancgaacggtgatctnnnnnnnttagattct
10 gccatgttttttggcacgaacacaattgggcaaatactgttgggatgtggatggat
11 >M00707:15:000000000-AEN4L:1:1101:10949:1045_PairEnd_SUB_SUB_CMP merged_sample={13.3: 1}; count=1; 2:N:0:1
12 atgacatattaatgattcagcccacattccttaatataccacatatgacttacttttcta
13 tatcaacnnnnnnntactttccacaggtatatacatactatgtttaatactcattaattt
14 acttgncactatattattacattatatgattaatccacatttctataacatattagactt
15 tcctcaactagatattat(first)tttcgt(first)aattattatgcagttgtatgacatattactgaatca
16 gccaacattccttaataaaccncatacgactactctgttatcgtatgtgttttatggtct
17 tgattcttagtaatgggtatgacatattattgattcagccnnnattgttnannannnnac
18 atnnancttactnntcttnttcaactctaatatactttccacaggtatatacatactatg
19 ttnaat(last)actcattaat(last)ttacttgccaatatatcattnnnntatatgattaatccacattt
20 ctataacatattagactttcctcaactagatattattttcgtaattattatgcag
I want to cut out everything between the order of characters "tttcgt" and "actcattaat" (but only in that specific order), then replace it with nothing and preserve everything else in its format (with the line breaks etc).
A big challenge here is that I also need to find tttcgt and actcattaat even if either of them has a line break in the middle, i.e. it runs to the end of one line, then line break plus line number plus space, and continues on the next line. (Thanks to @CBroe for pointing that out)
I wrapped "(first)" around the tttcgt chars - see line number 15
I wrapped "(last)" around the actcattaat chars - see line number 19
So far I've come up with this: (?<=tttcgt).*?(?=actcattaat) - but how can I make my expression ignore newlines?
To make the dot in your regex (.*) match newlines as well, you need to specify the s modifier. How you set it depends on the regex implementation.
In Python it's the DOTALL flag.
A single capture group can't hold non-consecutive text (with characters missing from the middle of the input), but you can concatenate two capture groups afterwards, or simply replace the sequence to be removed with an empty string.
Example:
import re
data = """>M00707:15:000000000-AEN4L:1:1101:13198:1037_PairEnd_SUB_SUB merged_sample={14.3: 1}; count=1; 2:N:0:1
ctcccggaaaaatttgagcctccagagtagcatataaccgacacgttgccgcctgaaaat
acattttccaggtcttnnnnnaaannnggaagcgcgcaccgacgagctttnnannacaag
tgtggctctagtgctcggtatttgcaactttttaagtannatgnnngtcgnnnnngaggn
nnnnnnnnntaaccnnncaccttcaagcaagtctaagttctcgactaatcaaactataaa
tccgctacacggacccagatctcccgccncgtgcannttaaagcaagtctacgttattga
agatagaaactattatatcgctaaacgtagctctganncacgctcgccttgactccgact
ctgtcaatgtctacgaccaattgaggtggaacatgtgcacatgtgtttcagancattgga
ggaattccgggaaaataaattgaggcacaancgaacggtgatctnnnnnnnttagattct
gccatgttttttggcacgaacacaattgggcaaatactgttgggatgtggatggat
>M00707:15:000000000-AEN4L:1:1101:10949:1045_PairEnd_SUB_SUB_CMP merged_sample={13.3: 1}; count=1; 2:N:0:1
atgacatattaatgattcagcccacattccttaatataccacatatgacttacttttcta
tatcaacnnnnnnntactttccacaggtatatacatactatgtttaatactcattaattt
acttgncactatattattacattatatgattaatccacatttctataacatattagactt
tcctcaactagatattat(first)tttcgt(first)aattattatgcagttgtatgacatattactgaatca
gccaacattccttaataaaccncatacgactactctgttatcgtatgtgttttatggtct
tgattcttagtaatgggtatgacatattattgattcagccnnnattgttnannannnnac
atnnancttactnntcttnttcaactctaatatactttccacaggtatatacatactatg
ttnaat(last)actcattaat(last)ttacttgccaatatatcattnnnntatatgattaatccacattt
ctataacatattagactttcctcaactagatattattttcgtaattattatgcag"""
# .*? is non-greedy, so each match stops at the first "actcattaat" after "tttcgt"
output = re.sub(r'(tttcgt).*?(actcattaat)', r'\1\2', data, flags=re.DOTALL)
print(output)
EDIT: made the code preserve the starting and ending sequences instead of removing them from output.
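The question also asks about markers that are themselves broken across a line break. One possible way to handle that, sketched below, is to allow an optional newline between every pair of characters in each marker. The helper names spanning and cut_between are made up for this example, the sample string is a toy input, and the sketch assumes the real file has no line numbers between lines.

```python
import re

def spanning(marker):
    # Hypothetical helper: allow an optional line break between any two characters
    return r'\n?'.join(re.escape(ch) for ch in marker)

def cut_between(text, first, last):
    # Keep both markers (groups 1 and 2) and drop whatever lies between them
    pattern = '(' + spanning(first) + ').*?(' + spanning(last) + ')'
    return re.sub(pattern, r'\1\2', text, flags=re.DOTALL)

# Toy input: both markers are split by a newline
sample = "aaattt\ncgtXXXX\nYYYYactcat\ntaatbbb"
result = cut_between(sample, "tttcgt", "actcattaat")
print(repr(result))  # 'aaattt\ncgtactcat\ntaatbbb'
```

Note that removing text this way leaves lines of uneven length, so the file may need re-wrapping afterwards if fixed-width lines matter.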
I have received a very long file. It has 1000+ lines of SQL code. Each line start with line number.
14 PROCEDURE sp_processRuleset(pop_id IN NUMBER);
15
16 -- clear procedure for preview mode to clean crom_population_member_temp table and global variables
17 PROCEDURE sp_commit; -- 28-Oct-09 J.Luo
18
19 -- The rule Set string for the Derived Population Member Preview
20 -- The preview mode will set gv_context_ruleSet by setContext_ruleSet,
21 -- sp_processRuleset uses gv_context_ruleSet to build derived population instead of getting rules from crom_rule_set table
22 gv_context_ruleSet VARCHAR2(32767) := NULL; -- 27-Oct-09 J.Luo
23 -- The population Role Id for the Derived Population Member Preview
I want to remove only line numbers using NotePad++ Find+Replace functionality. Is there any regex available to get this done ?
Using a regex is the easiest way.
Another handy way (scrolling through 1K lines is not much, IMO) is block selection:
hold the ALT key and drag the mouse over the line-number column, then delete the selection.
You can use this regex:
^\d+
Open Replace window with CTRL+H and run Replace All with these settings:
Find what: ^\s*\d+
Replace with: (empty)
Search mode: Regular expression
Notes:
\s can also be [[:space:]] or [ \t]
\d can also be [[:digit:]] or [0-9]
If the new edit is correct, the pattern \s* that matches the leading space may not be needed.
You can use this one if there is a colon after the numbers:
^\d+:
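If the file is too big for an editor, the same idea can be scripted. Below is a rough Python sketch, not the Notepad++ behaviour itself: it uses [ \t]* rather than \s* so that the pattern cannot run across blank lines, and it also drops one space after the number, which is an assumption about the desired output. The sample text is shortened from the question.

```python
import re

numbered = (
    "14 PROCEDURE sp_processRuleset(pop_id IN NUMBER);\n"
    "15\n"
    "16 -- clear procedure\n"
    "17 PROCEDURE sp_commit;"
)

# With re.MULTILINE, ^ matches at the start of every line, not just the start of the text
stripped = re.sub(r'^[ \t]*\d+ ?', '', numbered, flags=re.MULTILINE)
print(stripped)
```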
How can I write this as a regular expression?
tabspaceSTRINGtabspace
My data looks like this:
12345 adsadasdasdasd 30
34562 adsadasdasdasd asdadaads<adasdad 30
12313 adsadasdasdasd asdadas dsaads 313123<font="TNR">adsada 30
1232131 adsadasdasdasd asdadaads<adasdad"asdja <div>asdjaıda 30
I want to get
12345 30
34562 30
12313 30
1232131 30
\t*\t doesn't work.
Try the following regular expression:
\t.+\t
The problem there is your definition of STRING.
Because .+ is greedy, something like the pattern suggested above will also match
tabspaceSTRINGtabspacetabspace
You get the picture. This might be acceptable, if not, you need to limit your "STRING" definition, like:
\t\w+\t
or:
\t[a-zA-Z]+\t
What characters are allowed in your string?
\t\w+\t
\w would allow letters, digits and the underscore (depending on your regex engine ASCII or Unicode)
See it here on Regexr, a good platform to test regular expressions.
Your "regex" \t*\t would match 0 or more tabs and then one tab. The * is a quantifier meaning 0 or more and is referring to the character or group before (here to your \t)
If your whitespace are not tabs, try this
\s+.+\s+30
\s is a whitespace character (space, tab, newline (not important for Notepad++)).
If you are not sure about the strings you are looking for, except that they are separated by tabs, a good approach is to describe such a string as everything but a tab: [^\t]*
[^\t]*\t([^\t]*)\t[^\t]*
You can test it on regexpad.com.
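To compare the greedy pattern with the negated-class one, here is a small Python sketch. It assumes the columns really are separated by single tabs, which the question does not confirm; the rows are shortened from the sample data.

```python
import re

rows = [
    "12345\tadsadasdasdasd\t30",
    "12313\tadsadasdasdasd\tasdadas dsaads\t30",
]

# Greedy: .+ stretches from the first tab to the last tab on the line
greedy = [re.sub(r'\t.+\t', '\t', row) for row in rows]

# Negated class: [^\t]* covers exactly one tab-free field
single = [re.sub(r'\t[^\t]*\t', '\t', row) for row in rows]

print(greedy)  # ['12345\t30', '12313\t30']
print(single)  # ['12345\t30', '12313\tasdadas dsaads\t30']
```

For the output asked for in the question (first and last column only), the greedy version happens to be the one that fits; the negated class is the better choice when exactly one field should be removed.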
I'd like to be able to match this entire line (to highlight this sort of thing in vim): Fri Mar 18 14:10:23 ICT 2011. I'm trying to do it by finding a line that contains ICT 20 (the first two digits of the year), like this: syntax match myDate /^*ICT 20*$/, but I can't get it working. I'm very new to regex. Basically what I want to say is: find a line that contains "ICT 20", with anything on either side of it, and match that whole line. Is there an easy way to do this?
.*ICT 20.*
should do the trick. . is a wildcard that matches any character, and * means you can have 0 or more of the pattern it follows. (i.e. ba(na)* will match ba, banana, bananananana and so on)
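The same two ideas can be tried out in Python as a rough check. This is only a sketch: Python's re is not vim's regex engine, and in vim's default mode the group would be written \(na\)* instead of (na)*.

```python
import re

line = "Fri Mar 18 14:10:23 ICT 2011"

# .* on both sides lets the literal "ICT 20" sit anywhere in the line
whole_line = re.fullmatch(r'.*ICT 20.*', line)
print(whole_line.group(0))

# (na)* repeats the group zero or more times
results = {word: bool(re.fullmatch(r'ba(na)*', word))
           for word in ["ba", "banana", "bananananana", "ban"]}
print(results)
```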