Regex - how to get time and date and get ISO8601 timestamp - regex

I have this text
2014-01-30 10:15 some text here
2014-01-30 10:20 some other text here
I need a regex that matches a timestamp group in ISO 8601 format.
Required output:
2014-01-30T10:15Z
2014-01-30T10:20Z
With this regex I can't get what I want: replace the space with 'T' and append a 'Z' at the end.
^(?<timestamp>\S+ \S+)
Does anyone know how to solve this problem?
--- UPDATE ---
BTW, I'm using http://rubular.com/ to test my regex

You could perhaps modify your current regex a bit to:
^(\S+) (\S+).*
And replace with $1T$2Z

\d{4}-\d{2}-\d{2} \d{2}:\d{2} will match the required format – validation is another story though (if you need it).
You can do something like if (regex match) { replace " " with "T"; append "Z" }
If this doesn't help you or it is unclear it is because your question was vague.
Edit: you didn't specify what language you're writing this in. That is how you would do your replacements.
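As a rough sketch of the match-then-rewrite idea above, here is how it might look in Python. The language is an assumption, since the question does not name one, and to_iso8601 is a hypothetical helper name. It uses datetime.strptime for the validation step and assumes the times are already in UTC, which is what the trailing 'Z' claims.

```python
import re
from datetime import datetime

# Date and time in the shape used by the sample lines: "YYYY-MM-DD HH:MM".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}")


def to_iso8601(line):
    """Return the leading timestamp of `line` as 'YYYY-MM-DDTHH:MMZ', or None."""
    m = TIMESTAMP.match(line)  # match() only looks at the start of the line
    if m is None:
        return None
    try:
        # The regex checks the shape only; strptime rejects things like month 13.
        datetime.strptime(m.group(0), "%Y-%m-%d %H:%M")
    except ValueError:
        return None
    # Replace " " with "T" and append "Z" (assumes the time is already UTC).
    return m.group(0).replace(" ", "T") + "Z"


if __name__ == "__main__":
    for line in ("2014-01-30 10:15 some text here",
                 "2014-01-30 10:20 some other text here"):
        print(to_iso8601(line))
```

Lines that do not start with a valid timestamp yield None, so the caller can decide whether to skip or report them.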

In PHP:
preg_replace('/^(\S+) (\S+).*/', "$1T$2Z", $str);
In Perl:
$str =~ s/^(\S+) (\S+).*/$1T$2Z/;
In Notepad++:
Find what: ^(\S+) (\S+).*
Replace with: $1T$2Z

With:
(\d{4}-\d{2}-\d{2})( \d{2}:\d{2} )(?:.*)
This captures the date (2014-01-30) in group 1 and the time (10:15, together with the spaces around it) in group 2; the non-capturing group (?:.*) swallows the rest of the line.
Then trim group 2 and join the parts as group 1 + 'T' + group 2 + 'Z'.
See demo at:
http://rubular.com/r/4icGfcIixa

Regex syntax differs a bit from language to language, so it would help if you told us which language you are using.
For example, in JavaScript, you can do something like this:
"2014-01-30 10:15 some text here".replace(/(\d{4}-\d{2}-\d{2})\s(\d{2}:\d{2})\s?.*/,"$1T$2Z")
Where the string can be a variable.
If you have multi-line text, then you should add the g flag at the end of the regex:
"2014-01-30 10:15 some text here\n2014-01-30 10:20 some other text here".replace(/.*(\d{4}-\d{2}-\d{2})\s(\d{2}:\d{2})\s?.*/g,"$1T$2Z")
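For comparison, the multi-line replacement could be sketched in Python like this. Python is only an assumed target language, and convert_all is a hypothetical name. re.sub replaces every match by default, so the re.MULTILINE flag does the job of anchoring each line instead of a g flag.

```python
import re

# ^ and $ match at every line boundary because of re.MULTILINE.
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2})\s(\d{2}:\d{2}).*$", re.MULTILINE)


def convert_all(text):
    """Rewrite each line that starts with 'YYYY-MM-DD HH:MM' to 'YYYY-MM-DDTHH:MMZ'."""
    # \1 is the date, \2 is the time; the rest of the line is dropped.
    return LINE.sub(r"\1T\2Z", text)


if __name__ == "__main__":
    sample = ("2014-01-30 10:15 some text here\n"
              "2014-01-30 10:20 some other text here")
    print(convert_all(sample))
```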

Related

Matching from a starting delimiter to an end delimiter (Regex/pattern matching)

I am trying to match for a certain block of text.
The format of the text I want to match is
<pevz:url>https://some.server.com/arbitraryFoo.jpeg</pevz:url>
where only <pevz:url> and </pevz:url> are known.
My naive try was to match with
<pevz:url>*([0-9a-zA-Z:/._-])<\/pevz:url>
but that didn't work. I am using gedit to match with the default search and replace (no advanced-find).
How can I match for the whole string?
Best regards,
Joe Cocker
You can try:
<pevz:url>(.*?)<\/pevz:url>
or
<pevz:url>([^>]+)<\/pevz:url>
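To see what the lazy (.*?) version captures, here is a small Python sketch. Python is not what the question uses (that is gedit), so treat it purely as an illustration; extract_urls is a hypothetical helper name.

```python
import re

# Lazy .*? stops at the first closing tag instead of the last one.
URL_TAG = re.compile(r"<pevz:url>(.*?)</pevz:url>")


def extract_urls(text):
    """Return the contents of every <pevz:url>...</pevz:url> element in text."""
    return URL_TAG.findall(text)


if __name__ == "__main__":
    doc = ("<pevz:url>https://some.server.com/arbitraryFoo.jpeg</pevz:url>"
           "<pevz:url>https://some.server.com/other.jpeg</pevz:url>")
    print(extract_urls(doc))
```

With a greedy .* the two elements on one line would be captured as a single match, which is why the lazy quantifier matters here.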

regex to select only the zipcode

,Ray Balwierczak,4/11/2017,,895 Forest Hill Rd,Apalachin,NY,13732,y,,
I want to select only 13732 from the line. I came up with this regex:
(\d)(\s*\d+)*(\,y,,)
But it's also selecting the ,y,, part. If I remove that part from the regex, the regex also matches the date. Please help me with this.
Generally, if you want to require that something follows (or precedes) your match without including it in the match, use a zero-length lookaround (lookahead or lookbehind). In your case, you can use a lookahead:
(\d)(\s*\d+)*(?=\,y,,)
The syntax (?=<stuff>) means "followed by <stuff>, without consuming it".
More information on lookarounds can be found in this tutorial.
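A minimal Python sketch of the lookahead idea, assuming the zip code is always five digits and is always followed by ,y,, as in the sample line (find_zip is a hypothetical name):

```python
import re

# Five digits that are followed by ",y,,"; the lookahead is not part of the match.
ZIP = re.compile(r"\d{5}(?=,y,,)")


def find_zip(line):
    """Return the zip code that precedes ',y,,' in line, or None."""
    m = ZIP.search(line)
    return m.group(0) if m else None


if __name__ == "__main__":
    line = ",Ray Balwierczak,4/11/2017,,895 Forest Hill Rd,Apalachin,NY,13732,y,,"
    print(find_zip(line))
```

Because the lookahead consumes nothing, group(0) contains only the digits, and the year in the date is skipped since it is not followed by ,y,,.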
Regex: \D*(\d{5})\D*
Explanation: match 5 digits surrounded by zero or more non-digits on both sides. Then you can extract the group containing the match.
Here's the code in Python:
import re
string = ",Ray Balwierczak,4/11/2017,,895 Forest Hill Rd,Apalachin,NY,13732,y,,"
search = re.search(r"\D*(\d{5})\D*", string)
print(search.group(1))
Output:
13732

Positive lookbehind doesn't work with plus

I'm trying to select the place after the pattern <Art><dot><digits><dot>
Code:
Art. 83.
xxx xxx xxx
Art. 3.
xxx xxx xxx
So far I tried this pattern; however, if I add + to \d, the selection fails. Why?
(?<=Art..\d\d.).
How can I select the text after a match with a variable number of digits?
Edit 1
OK, I need to add a new line after every occurrence of the pattern Art. <digits, length unknown>.
Input
Art. 3.
xxx xxx xxx
Output
Art. 3.
xxx xxx xxx
Edit 2
I am looking for a solution for Java / Android, or for the parser in Notepad++.
You are using a lookbehind (not a lookahead), which has limitations in most implementations. From http://www.regular-expressions.info/lookaround.html:
The bad news is that most regex flavors do not allow you to use just any regex inside a lookbehind, because they cannot apply a regular expression backwards. The regular expression engine needs to be able to figure out how many characters to step back before checking the lookbehind. When evaluating the lookbehind, the regex engine determines the length of the regex inside the lookbehind, steps back that many characters in the subject string, and then applies the regex inside the lookbehind from left to right just as it would with a normal regex.
In your case, maybe you can use an expression that matches text and uses the match in the replacement. For example, in Java:
String original = "Art. 3.\nxxx xxx xxx";
String replaced = original.replaceAll("Art\\. \\d+\\.", "$0\n");
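Python's built-in re module has the same fixed-width restriction, which makes it handy for a quick illustration. Python is used here only as an example, and compiles and break_after_article are hypothetical helper names.

```python
import re


def compiles(pattern):
    """Return True if the re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def break_after_article(text):
    """Append a newline after every 'Art. <digits>.' without any lookbehind."""
    # m.group(0) is the whole match, like $0 in the Java replacement.
    return re.sub(r"Art\. \d+\.", lambda m: m.group(0) + "\n", text)


if __name__ == "__main__":
    # Fixed-width lookbehind (exactly two digits): accepted.
    print(compiles(r"(?<=Art\. \d\d\.)."))
    # Variable-width lookbehind (\d+): rejected by the re module.
    print(compiles(r"(?<=Art\. \d+\.)."))
    print(break_after_article("Art. 83. xxx xxx xxx"))
```

Matching the marker itself and re-emitting it in the replacement sidesteps the restriction, whatever the number of digits.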
Most regex implementations do not support variable-length lookbehinds. However, you should be able to solve this without the lookbehind:
# Match your string in a group
/(Art\.\s\d+\.)/g
# Replace and append a new line to $1 match group
$1\n
Example: http://regex101.com/r/fW5jO7
We don't know what language you are using, but here is a PHP implementation:
preg_replace('/(Art\.\s\d+\.)/', "$1\n", $text);
In perl:
for (<DATA>) {
print;
print("\n") if (/Art\. \d+\./);
}
__DATA__
whatver
stuff
Art. 83.
123 456 789
Art. 3.
987 654 321
more
stuff
Use grouping instead of a lookbehind:
(Art\.\s\d+\.)(.)
and then take group 2.
A Notepad++ solution:
Find what: ^(Art\.\s*\d+\.)
Replace with: $1\n
Maybe you want CRLF line endings; if so, use Replace with: $1\r\n

Regex expression to find file extension in a file with multiple periods

How would you write a regular expression to find the file extension of the following files, keeping in mind that what I am looking for is the ".pdf" or ".xls" portion of the string?
REPORTPDF.20130810.pdf.pgp
REPORTXLS.20130810.xls.pgp
EDIT:
The resulting filenames I want to end up with are the following:
REPORT20130810.PDF
REPORT20130810.XLS
I am on a Windows platform. I've played around with this a bit at http://regexpal.com/ but so far I can only figure out how to match the date:
([0-9]{4}[0-9]{2}[0-9]{2})
Using sed:
sed 's/^\(.*[^.]*\)\.[^.]*$/\1/' <<< "REPORTPDF.20130810.pdf.pgp"
REPORTPDF.20130810.pdf
Using grep -P (PCRE regex):
grep -oP '^.+[^.]+(?=\.[^.]+$)' <<< "REPORTPDF.20130810.pdf.pgp"
REPORTPDF.20130810.pdf
.+\.(\w+)\.\w+$ would deliver the last-but-one extension as group 1; how you access that group depends on the host language you run the regex in.
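As one example of accessing that group, here is a sketch in Python. The host language is an assumption, since the question only mentions Windows, and inner_extension is a hypothetical name.

```python
import re

# Group 1 is the extension just before the final one (e.g. "pdf" in "x.pdf.pgp").
PENULTIMATE_EXT = re.compile(r".+\.(\w+)\.\w+$")


def inner_extension(filename):
    """Return the last-but-one extension of filename, or None if there is none."""
    m = PENULTIMATE_EXT.match(filename)
    return m.group(1) if m else None


if __name__ == "__main__":
    for name in ("REPORTPDF.20130810.pdf.pgp", "REPORTXLS.20130810.xls.pgp"):
        print(inner_extension(name))
```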
If you don't need the file extension to be capitalized, this should work
([a-zA-Z]+)\.([0-9]{4}[0-9]{2}[0-9]{2})\.(xls|pdf)\.pgp
Matches:
REPORTXLS.20130810.xls.pgp
Then the groups you'd use in the replacement are two and three:
REPORT\2.\3
Result:
REPORT20130810.xls
Problem is that you don't provide much context for how you're going about changing these file names.
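If the renaming were done from a script, a sketch in Python might look like the following. This assumes names always have the shape <letters>.<8 digits>.<pdf|xls>.pgp and that the type tag at the end of the first part duplicates the extension; new_name is a hypothetical helper, and no files are touched.

```python
import re

# Same shape as the pattern above: letters, 8-digit date, pdf/xls, then .pgp.
ENCRYPTED_REPORT = re.compile(r"([A-Za-z]+)\.(\d{8})\.(xls|pdf)\.pgp")


def new_name(filename):
    """Map 'REPORTPDF.20130810.pdf.pgp' to 'REPORT20130810.PDF'; None if no match."""
    m = ENCRYPTED_REPORT.fullmatch(filename)
    if m is None:
        return None
    stem, date, ext = m.groups()
    # Drop the trailing type tag ("PDF"/"XLS") when it repeats the extension.
    if stem.upper().endswith(ext.upper()):
        stem = stem[:-len(ext)]
    return f"{stem}{date}.{ext.upper()}"


if __name__ == "__main__":
    for name in ("REPORTPDF.20130810.pdf.pgp", "REPORTXLS.20130810.xls.pgp"):
        print(name, "->", new_name(name))
```

Upper-casing the extension in code covers the capitalization that a plain regex replacement cannot do in most flavors.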
You don't say what language/library you're using, but this Perl one-liner does the trick:
perl -lpe "s/^([^.]*)(...)\.(\d+)(\.\2)\.pgp/\1\3\4/i; $_=uc"
I think this will work for you :)
^(([A-Z a-z]*)(?:XLS\.|PDF\.)(\d{8})(\.pdf|\.xls))
^ starts at the beginning of the string
([A-Z a-z]*) any letters or spaces before the type tag
\d any digit 0-9
{8} exactly 8 repetitions of the preceding element (in this case, 8 digits 0-9)
?: marks a non-capturing group
I wrapped the capture groups into one large one, so the whole match will be in the first capture group :).
This can probably be replaced:
([A-Z a-z]*)
with
(REPORT)
The pattern (.*?(?:\..*)?)(\..*) splits off the last extension and will hold things like:
'hello.1a.2bb.3' ---> group(1) == 'hello.1a.2bb', group(2) == '.3'
'yep.1' ---> group(1) == 'yep', group(2) == '.1'
If the format is pretty much fixed you could use
(REPORT)([^.]++)[.]([^.]++)[.]([^.]++)[.](pgp)
and cherry-pick the groups for the replacement based on what you want.
I used Java here, but the regex would be the same in any flavor that supports possessive quantifiers (++).
String a = "REPORTPDF.20130810.pdf.pgp".replaceAll(
    "(REPORT)([^.]++)[.]([^.]++)[.]([^.]++)[.](pgp)",
    "$1--$2--$3--$4--$5");
String b = "REPORTXLS.20130810.xls.pgp".replaceAll(
    "(REPORT)([^.]++)[.]([^.]++)[.]([^.]++)[.](pgp)",
    "$1--$2--$3--$4--$5");
System.out.println(a);
System.out.println(b);
REPORT--PDF--20130810--pdf--pgp
REPORT--XLS--20130810--xls--pgp
In your case, use "$1$3.$2" as the replacement:
String b = "REPORTXLS.20130810.xls.pgp".replaceAll(
    "(REPORT)([^.]++)[.]([^.]++)[.]([^.]++)[.](pgp)",
    "$1$3.$2");
which produces the intended result:
REPORT20130810.XLS

Sublime Text 2 - Regular Expression Find and Replace

I am looking for a solution to find and replace the formatting of many prices within one of my documents.
I currently have prices that are formatted like so: $60 and would like to change the formatting to: 60 $
The following 'Find What' pattern finds the first format: \$\d{0,2}, but I'm not sure what to put in 'Replace With'.
Is there a way to preserve the number?
Thank you.
Try this:
find: \$(\d{0,2})
replace: \1 $
Option+Cmd+F:
Place into the find field:
\$([0-9]{0,2})
Place into the replace field:
\1 \$
The backslash + number indicates which capture group to place there.
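The same capture-and-reorder idea can be tried outside the editor. Here is a small Python sketch; move_dollar_sign is a hypothetical name, and \d+ is used instead of \d{0,2} as an assumption, so that a bare $ is left alone and prices of three or more digits stay intact.

```python
import re

# Capture the digits after the dollar sign so they can be reused in the replacement.
PRICE = re.compile(r"\$(\d+)")


def move_dollar_sign(text):
    """Rewrite '$60' as '60 $' everywhere in text."""
    # \1 inserts the captured number; the trailing " $" is literal text.
    return PRICE.sub(r"\1 $", text)


if __name__ == "__main__":
    print(move_dollar_sign("Was $60, now $45."))
```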