Regular Expression to surround date with quotation marks - regex

I have a list of data all in the same format which I need to analyse in Weka.
I need to surround the date/time values with quotation marks ("") but can't work out a regular expression to do it.
I need to change a row from this:
1028,NULL,1,21,7,AD9,06A,60136859,NULL,1,4,3,2012-02-21 10:05:00.100,2012-02-21 10:05:23.170
to a row like this:
1027,NULL,1,21,7,AD9,06A,60136859,NULL,1,5,4,"2012-02-21 10:03:53.643","2012-02-21 10:04:29.787"
where the date/time values are surrounded by quotation marks.

This will work in Notepad++ as long as your datetime values are always fully formatted.
Find what: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})
Replace with: "\1"
This works because of backreferences. Everything that is captured within parentheses is stored as a backreference. You access backreferences by typing \number, where number corresponds to the position of the parentheses in the regex. Since we are only using one pair of parentheses, we want backreference 1, so we use \1.
So the regex finds the entire date, which gets stored in \1 because of the parentheses. Then you replace the entire date with "entire date", i.e. "\1".
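Outside Notepad++, the same find-and-replace can be sketched in Python. This is only an illustration, assuming every timestamp has exactly three fractional digits; the function name quote_timestamps is made up for the example.

```python
import re

# Matches a fully formatted timestamp such as 2012-02-21 10:05:00.100.
# The period before the milliseconds is escaped so it matches only a literal dot.
TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")

def quote_timestamps(row):
    # \1 in the replacement refers to the text captured by the first group.
    return TIMESTAMP.sub(r'"\1"', row)

row = "1028,NULL,1,21,7,AD9,06A,60136859,NULL,1,4,3,2012-02-21 10:05:00.100,2012-02-21 10:05:23.170"
quoted = quote_timestamps(row)
print(quoted)
```

Python's re module accepts \1 in the replacement just as Notepad++ does, so the pattern and replacement carry over unchanged.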

Related

Regular expression to replace 2023-02-06T07:43:51.9732381Z with blanks

I have a date stamp at the beginning of my log files, in the format 2023-02-06T07:43:51.9732381Z
There are various other dates as well. I have tried to use Notepad++ to write a regular expression that replaces all the date formats with blanks, without success. I would like a regular expression that replaces the dates.
Secondly, what if I wanted the dates to be in the format 2023-02-06T07:43:51 ?
To remove the leading date on every line do this:
Find what: ^2023[^ ]* (with trailing space)
Replace with: (empty string)
Check the "Regular expression" radio button.
To remove just the fractional seconds on every line do this:
Find what: ^(2023[^\.]*)[^ ]*
Replace with: $1
Check the "Regular expression" radio button.
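For anyone scripting this instead of using the editor, here is a rough Python equivalent of both replacements. The two log lines are invented sample data, and it assumes, as the patterns above do, that every line starts with a 2023 timestamp followed by a space.

```python
import re

# Invented sample log lines in the format from the question.
log = (
    "2023-02-06T07:43:51.9732381Z Starting job\n"
    "2023-02-06T07:43:52.1000000Z Job finished"
)

# re.MULTILINE makes ^ match at the start of every line, not just the first.
without_dates = re.sub(r"^2023[^ ]* ", "", log, flags=re.MULTILINE)

# Keep everything up to the first period, drop the rest of the timestamp.
# Python writes the group reference as \1 where Notepad++ accepts $1.
without_fraction = re.sub(r"^(2023[^.]*)[^ ]*", r"\1", log, flags=re.MULTILINE)

print(without_dates)
print(without_fraction)
```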
The manual indicates that Notepad++ uses "Boost" for regular expressions, with its search syntax and replacement syntax.
So:
2023-02-06T07:43:51.9732381Z
could be matched at the start of lines by:
^\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d[.]\d{7}Z
or more loosely by:
^\d+-\d+-\d+T\d+:\d+:\d+[.]\d+Z
To retain the part before the period, "mark" the sub-expression you want to keep:
^(\d+-\d+-\d+T\d+:\d+:\d+)[.]\d+Z
and refer to it in the replacement:
$1
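The digit-based pattern is not tied to a particular year. A small Python sketch of the same idea follows; the helper name trim_fraction and the sample lines are assumptions made for illustration.

```python
import re

# The group keeps the date and time up to the seconds; the fractional part and Z are dropped.
STAMP = re.compile(r"^(\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d)[.]\d+Z", flags=re.MULTILINE)

def trim_fraction(text):
    # \1 re-inserts only the captured part of each leading timestamp.
    return STAMP.sub(r"\1", text)

sample = "2023-02-06T07:43:51.9732381Z Service started"
trimmed = trim_fraction(sample)
print(trimmed)
```

Because the pattern is anchored with ^, timestamps that appear later in a line are left untouched.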

Regex: How to find a string, then get characters on either side up to a delimiter?

I have a string like so:
foobar_something_alt=\"Brownfields1.png#asset:919\" /><p>MSG participat
I wish to find all occurrences via the substring #asset: and then select the characters around the match up to the quote marks.
I am trying to extract specific ALT tags from a SQL dump. Is this possible with a regular expression?
Put [^"]* before and after the string you want to match. This will match any sequence of characters that aren't ".
[^"]*#asset:[^"]*
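A quick way to try this is Python's re.findall. Note one wrinkle with the sample string: the quotes in the dump are backslash-escaped, so the pattern as written also picks up the backslash before the closing quote. The second pattern below, which additionally excludes backslashes, is an adaptation for that case and not part of the original answer.

```python
import re

# The sample from the question; the raw string keeps the backslashes literal.
dump = r'foobar_something_alt=\"Brownfields1.png#asset:919\" /><p>MSG participat'

# Pattern from the answer: everything around #asset: that is not a double quote.
as_given = re.findall(r'[^"]*#asset:[^"]*', dump)

# Adapted pattern (an assumption for escaped dumps): also stop at backslashes.
adapted = re.findall(r'[^"\\]*#asset:[^"\\]*', dump)

print(as_given)
print(adapted)
```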

Regular expression to find and replace wrong quotation marks

I have a document which has been copy/pasted from MS Word. All the quotations are copied as ''something'' which basically is creating a mess in my LaTeX document, hence they have to be ``something''.
Is it possible to make a regular expression that finds all these ''something'' where something can be anything (including symbols, numbers etc.), and a regular expression that replaces it with the correct quotation? I am using Sublime Text which is able to use RegEX directly in the editor.
The regex below matches each string wrapped in pairs of single quotes and captures everything after the opening two quotes, including the closing pair. Replacing each match with two backticks followed by the contents of group 1 gives the desired result.
Regex:
''(.*?'')
Replacement string:
``$1
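To check the behaviour outside Sublime Text, the same substitution can be tried in Python. The sentence below is made-up test input, and Python spells the group reference \1 where Sublime Text uses $1.

```python
import re

# Made-up sample with two wrongly quoted phrases.
text = "He said ''hello'' and then ''good-bye, 42!'' again."

# The lazy .*? stops at the first closing pair of single quotes,
# so each quoted phrase is handled separately.
fixed = re.sub(r"''(.*?'')", r"``\1", text)

print(fixed)
```

The lazy quantifier matters here: with a greedy .* the match would run from the first opening quotes to the last closing quotes on the line.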

Regex to find strings separated by comma, and then add quotes?

I'm using Sublime Text 3 and have a CSV file with 1200 tax rates in this format:
Code,Country,State,Zip/Post Code,Rate,default
US-NY-10001-Rate 1,US,NY,10001,0.08875,
....
I need a regex to find each value separated by a comma so I can then wrap quotes around it.
Is this possible?
Find the following regular expression (using capturing group and backreference):
([^,\n]+)
and replace it with:
"\1"
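The effect can be checked on the sample row with a short Python sketch. One thing it makes visible: because + needs at least one character, empty fields (such as the trailing one in the sample) are left unquoted.

```python
import re

# Data row taken from the question.
line = "US-NY-10001-Rate 1,US,NY,10001,0.08875,"

# Each run of characters that is neither a comma nor a newline is one value.
quoted = re.sub(r"([^,\n]+)", r'"\1"', line)

print(quoted)
```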

unintended replacement in Regular Expression

I have hundreds of lines of text like this:
gi|393925858|gb|AGTA02071966.1| 0000000739 . G A 121.20 PASS NS=74:AN=2:DP=8448 GT:DP:GQ:EC:SG 0/1:262:144:116:R
I wanted to ONLY replace the colon with semicolon in this portion "NS=74:AN=2:DP=8448" of the line. Here is how I matched and replaced it:
if re.match(r'.*NS=\d+(:)AN=\d(:)DP=\d+.*', line):
    print line.replace(':', ";")
I thought I had replaced just the matched pattern in all lines, but it replaced EVERY colon with a semicolon in all lines. Is there a way to replace just the matched ones, or was my regular expression incorrect? Thanks.
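Your code replaces every colon because re.match only tests whether the line matches; line.replace(':', ";") then works on the whole line. The way to do this is to use the full regex in a search-and-replace (re.sub in Python), using capture groups (parentheses) to capture the digits you want to keep.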
So your search term is this:
NS=(\d+):AN=(\d+):DP=(\d+)
And your replace term is this:
NS=\1;AN=\2;DP=\3
Note that in the replacement, the \1 will be filled in with what the first capture group (parens) captured from the original text.
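Put together in Python, the search and replace terms go into re.sub. This is a minimal sketch using the line from the question; the variable names are arbitrary.

```python
import re

# Search term: the three groups capture the digits that must be kept.
PATTERN = re.compile(r"NS=(\d+):AN=(\d+):DP=(\d+)")

line = ("gi|393925858|gb|AGTA02071966.1| 0000000739 . G A 121.20 PASS "
        "NS=74:AN=2:DP=8448 GT:DP:GQ:EC:SG 0/1:262:144:116:R")

# Only the matched NS/AN/DP portion is rewritten; other colons stay as they are.
fixed = PATTERN.sub(r"NS=\1;AN=\2;DP=\3", line)

print(fixed)
```

Lines that do not contain the pattern pass through re.sub unchanged, so no separate re.match check is needed.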