WKT: regex to extract only the first two float values


I have the input below:
LINESTRING(-111.928130305897 33.4490602213529,-111.928130305897 33.4490602213529)
and I need a regex that generates this:
-111.928130305897 33.4490602213529
It's essentially the first two floats.

You can use the following regex:
(?<=\()-?(?:[1-9]\d*|\d)(?:\.\d*)?\s+-?(?:[1-9]\d*|\d)(?:\.\d*)?(?=,)
DEMO: https://regex101.com/r/Q2HreC/3
Explanation and assumptions:
(?<=\() positive lookbehind: the floats must be preceded by an opening parenthesis
-?(?:[1-9]\d*|\d)(?:\.\d*)? match the first float: an optional -, then either a multi-digit number starting with 1-9 or a single digit, optionally followed by a . and some decimals
\s+ one or more whitespace characters between the two numbers
followed by the second float, matched the same way
(?=,) positive lookahead: the second float must be followed by a ,
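To check the pattern outside regex101, here is a minimal sketch that uses Python's re module as a stand-in engine (the answer itself targets a PCRE-style engine; the names WKT, FIRST_PAIR and first_pair are illustrative). It should print the first coordinate pair.

```python
import re

# Sample input from the question.
WKT = "LINESTRING(-111.928130305897 33.4490602213529,-111.928130305897 33.4490602213529)"

FIRST_PAIR = re.compile(
    r"(?<=\()"                      # preceded by an opening parenthesis
    r"-?(?:[1-9]\d*|\d)(?:\.\d*)?"  # first float
    r"\s+"                          # whitespace between the coordinates
    r"-?(?:[1-9]\d*|\d)(?:\.\d*)?"  # second float
    r"(?=,)"                        # followed by a comma
)

match = FIRST_PAIR.search(WKT)
first_pair = match.group(0) if match else None
print(first_pair)
```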

To match the first 2 floats for your example, you might use:
^LINESTRING\(([-+]?\d*\.?\d+) ([-+]?\d*\.?\d+)
That would match:
^LINESTRING from the beginning of the string
\( an opening parenthesis
followed by a float ([-+]?\d*\.?\d+), matched twice with a space in between, each in its own capturing group
The float regex:
( # Capturing group
[-+]? # Optional + or -
\d* # Match a digit zero or more times
\.? # Optional dot
\d+ # Match a digit one or more times
) # Close capturing group
Or, to match -111.928130305897 33.4490602213529 in your example without capturing groups, you could use:
(?<=^LINESTRING\()[-+]?\d*\.?\d+ [-+]?\d*\.?\d+
or
(?<=^LINESTRING\()[^,]+
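Both the capturing-group version and the lookbehind version can be exercised the same way. The sketch below uses Python's re, which accepts the ^ inside the lookbehind because the lookbehind stays fixed-width; converting the captured strings with float() is an illustrative extra step, not part of the answer.

```python
import re

WKT = "LINESTRING(-111.928130305897 33.4490602213529,-111.928130305897 33.4490602213529)"
FLOAT = r"[-+]?\d*\.?\d+"  # optional sign, optional integer part, optional dot, digits

# Variant 1: anchored at the start, one capturing group per coordinate.
m = re.match(r"^LINESTRING\((" + FLOAT + r") (" + FLOAT + r")", WKT)
x, y = float(m.group(1)), float(m.group(2))

# Variant 2: fixed-width lookbehind, no groups; everything up to the first comma.
first_point = re.search(r"(?<=^LINESTRING\()[^,]+", WKT).group(0)

print(x, y)
print(first_point)
```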

What about using the right tool for the job? This is a Perl module that properly parses WKT:
Code:
#!/usr/bin/env perl
use strict; use warnings;
use Geo::WKT::Simple;
my $arr = [];
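# Parse the LINESTRING; each point comes back as an array reference [x, y]
push @{ $arr }, Geo::WKT::Simple::wkt_parse_linestring("LINESTRING(-111.928130305897 33.4490602213529,-111.928130305897 33.4490602213529)");
# Print the two coordinates of the first point, one per line
print join "\n", @{ $arr->[0] };
Output: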
-111.928130305897
33.4490602213529
Doc:
https://metacpan.org/pod/distribution/Geo-WKT/lib/Geo/WKT.pod

Related

Regex to match substrings containing n non-repeated characters

I am facing a (naive) problem with regular expressions.
I need to find any substrings composed of a fixed number (n) of different characters.
So, for "aaabcddd", if n=3 the substrings that I expect to find are: "abc" and "bcd".
My idea is to use n-1 capture groups and '[^' to exclude characters already matched. Thus, I wrote the following Perl-compatible regex (in Julia):
r"(([[:alpha:]])[^\2])[^\1]"
But it is not working.
Do you have any tips?

You cannot use a backreference inside a negated character class such as [^\1]: inside a character class, \1 is read as an octal escape (the character with code 1), not as a reference to group 1.
Instead, use a negative lookahead to assert that what is directly to the right of the current position is not what a previous group has already captured.
If the assertion succeeds, capture a single letter in a new group.
The matches abc and bcd are in capture group 1.
(?=(([[:alpha:]])(?!\2)([[:alpha:]])(?!\3|\2)[[:alpha:]]))
(?= Positive lookahead
( Capture group 1
([[:alpha:]]) Capture the first char in group 2
(?!\2)([[:alpha:]]) If what is directly to the right is not what group 2 captured, capture the second char in group 3
(?!\3|\2) If what is directly to the right is not what group 2 or group 3 captured
[[:alpha:]] Match the 3rd char
) Close group 1
) Close the lookahead
Or, a bit shorter, using a case-insensitive match:
(?=(([a-z])(?!\2)([a-z])(?!\3|\2)[a-z]))
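A minimal sketch of the shorter variant, run through Python's re (an assumption: Python's engine has no POSIX classes such as [[:alpha:]], so only the [a-z] plus case-insensitive form carries over). Because the whole pattern sits inside a zero-width lookahead, finditer reports every overlapping window; the function name distinct_triples is hypothetical.

```python
import re

# Group 1 holds the three letters; groups 2 and 3 hold the first and second letter.
THREE_DISTINCT = re.compile(r"(?=(([a-z])(?!\2)([a-z])(?!\3|\2)[a-z]))", re.IGNORECASE)

def distinct_triples(s):
    # The lookahead consumes nothing, so the scan advances one position at a time.
    return [m.group(1) for m in THREE_DISTINCT.finditer(s)]

print(distinct_triples("aaabcddd"))
```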

Here is a solution for an arbitrary number n of characters:
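#!/usr/local/bin/perl
use strict; use warnings; use feature ':5.10';
my $s = "aaabcded";
my $n = 3;
# The zero-width lookahead lets /g find every overlapping run of $n letters
while ($s =~ /(?=([[:alpha:]]{$n}))/g) {
    my $hit = $1;
    my @chars = split //, $hit;
    # Use the letters as hash keys; repeated letters collapse into one key
    my %uniq;
    @uniq{@chars} = ();
    say $hit if (scalar keys %uniq) == $n;
}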
Running with $n=3 prints:
abc
bcd
cde
Running with $n=4 prints:
abcd
bcde
And $n=5:
abcde
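The same two-step idea (a lookahead to collect overlapping windows, then a uniqueness check) can be sketched in Python; the set plays the role of the Perl hash slice, and [A-Za-z] stands in for [[:alpha:]], which Python's re does not support. The function name distinct_windows is hypothetical.

```python
import re

def distinct_windows(s, n):
    # Zero-width lookahead: capture every run of n letters, overlapping ones included.
    window = re.compile(r"(?=([A-Za-z]{%d}))" % n)
    hits = [m.group(1) for m in window.finditer(s)]
    # Keep only runs whose letters are all different.
    return [hit for hit in hits if len(set(hit)) == n]

for n in (3, 4, 5):
    print(n, distinct_windows("aaabcded", n))
```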

regexp to match only one occurrence followed by two digits

I want to replace the , with a . if both of the following conditions are true:
the , appears only once in the string
the , is followed by at most two digits
These are OK: 1 000 000,51, 1.000,9
These are not: 9,523,036.11, 1,000
My progress so far: https://regex101.com/r/njuKtb/1

You may use this regex for search:
^([^,]*),(?=\d{1,2}(?!\d))(?!.*,)
And use this replacement:
$1.
RegEx Details:
^([^,]*): Match 0 or more non-comma characters at the start
,: Match literal comma
(?=\d{1,2}(?!\d)): Match 1 or 2 digits not followed by another digit
(?!.*,): Make sure we don't have comma ahead
Alternatively use this for search:
^([^,]*),(?=\d{1,2}(?!\d))([^,\n]*)$
and replace by:
$1.$2
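A quick sketch of both search/replace pairs using Python's re.sub (one difference from the answer: Python writes the group reference in the replacement as \1 rather than $1). The function names are hypothetical; the inputs are the examples from the question.

```python
import re

# First pattern: keep the prefix in group 1, then assert no other comma follows.
SINGLE_COMMA = re.compile(r"^([^,]*),(?=\d{1,2}(?!\d))(?!.*,)")
# Alternative: capture the rest of the line in group 2 instead of using (?!.*,).
SINGLE_COMMA_ALT = re.compile(r"^([^,]*),(?=\d{1,2}(?!\d))([^,\n]*)$")

def fix_decimal(s):
    return SINGLE_COMMA.sub(r"\1.", s)

def fix_decimal_alt(s):
    return SINGLE_COMMA_ALT.sub(r"\1.\2", s)

for s in ["1 000 000,51", "1.000,9", "9,523,036.11", "1,000"]:
    print(s, "->", fix_decimal(s), "/", fix_decimal_alt(s))
```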

You can do:
/^(?!^[^,\n]*,[^,\n]*,[^,\n]*)(?:[^,\n]*),(?=\d{1,2}\D*$)/m
Which is:
^ Start of string or line
(?!^[^,\n]*,[^,\n]*,[^,\n]*) Only matches lines with a single ','
(?:[^,\n]*) Consume everything on the left-hand side before the ,
, The ,
(?=\d{1,2}\D*$) One or two digits, then no further digits before the end of the line
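This answer gives only the search pattern. Since the match runs from the start of the line up to and including the single comma, one way to do the replacement (an assumption, not from the answer) is a callback that swaps just the last matched character for a dot. A sketch in Python with re.MULTILINE, where the function name is hypothetical:

```python
import re

PATTERN = re.compile(
    r"^(?!^[^,\n]*,[^,\n]*,[^,\n]*)(?:[^,\n]*),(?=\d{1,2}\D*$)", re.MULTILINE
)

def fix_decimal_multiline(text):
    # m.group(0) always ends with the comma, so replace only that character.
    return PATTERN.sub(lambda m: m.group(0)[:-1] + ".", text)

SAMPLE = "1 000 000,51\n1.000,9\n9,523,036.11\n1,000"
print(fix_decimal_multiline(SAMPLE))
```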

Regex: find occurrence of a digit in a string

Problem: I want to match strings that contain the same digit twice. The positions of the two digits are arbitrary.
Example for better understanding my question:
3abc3
a22de
b7abc7a
For these strings it must match. If a string contains two digits but they are different, then it shouldn't match.
Example:
3abcd2 not supposed to match
3abc3 -> supposed to match
I tried using {n}, but that doesn't help, because it requires the two digits to be next to each other.

You can use this grep:
grep -E '([0-9]).*\1' file
3abc3
a22de
b7abc7a
About this regex (back-references such as \1 in grep -E are a GNU extension; POSIX EREs do not define them):
([0-9]) # match and capture any digit in group #1
.* # match 0 or more of any character in between
\1 # using back-reference \1, make sure we have same digit as in group #1
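The same back-reference idea works in most engines. A minimal Python sketch, checking the examples from the question (the function name has_repeated_digit is hypothetical):

```python
import re

# Capture one digit, allow anything in between, then require that same digit again.
REPEATED_DIGIT = re.compile(r"([0-9]).*\1")

def has_repeated_digit(s):
    return REPEATED_DIGIT.search(s) is not None

for s in ["3abc3", "a22de", "b7abc7a", "3abcd2"]:
    print(s, has_repeated_digit(s))
```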

capture alphanumeric string using regex

I have a file that contains any of the following strings:
"155"
>555.123.4567
>555-123-4567
>(555).123.4567
>(555)123-4567
>(555)-123-4567
I would like to use a regex to capture every string except the first, with the output below:
555.123.4567
555-123-4567
(555).123.4567
(555)123-4567
(555)-123-4567
So far I have only come up with the regex below, but it works only for the last three strings:
/(\([\d]+\).\-?(-|)[\d]+.-?[\d]+)/g

You can use this regex with optional delimiters to match your inputs:
/[(]?\d{3}[)]?[.-]?\d{3}[.-]?\d{4}\b/
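A sketch that runs this pattern over the sample lines with Python's re, keeping the first match on each line (the variable names are illustrative):

```python
import re

# Optional parentheses around the area code, optional . or - delimiters.
PHONE = re.compile(r"[(]?\d{3}[)]?[.-]?\d{3}[.-]?\d{4}\b")

lines = ['"155"', ">555.123.4567", ">555-123-4567",
         ">(555).123.4567", ">(555)123-4567", ">(555)-123-4567"]

found = [m.group(0) for m in map(PHONE.search, lines) if m]
print("\n".join(found))
```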

It looks like you want to search for:
Three digits in parentheses, or just three digits
A dot or a hyphen, or nothing
Three more digits
Exactly one dot or hyphen
Four more digits
This regex is a transliteration of that. Note that it's always sensible to add the /x modifier to complex patterns, so that you can add insignificant spaces and newlines to make your program more readable.
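use strict;
use warnings 'all';
use feature 'say';
while ( <DATA> ) {
    next unless / (
        (?: \( \d{3} \) | \d{3} )   # three digits, optionally in parentheses
        ([.-]|)                     # a dot, a hyphen, or nothing
        \d{3}                       # three more digits
        [.-]                        # exactly one dot or hyphen
        \d{4}                       # four more digits
    ) /x;
    say $1;
}
__DATA__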
"155"
>555.123.4567
>555-123-4567
>(555).123.4567
>(555)123-4567
>(555)-123-4567
Output:
555.123.4567
555-123-4567
(555).123.4567
(555)123-4567
(555)-123-4567
But that's very specific, and it looks like you're validating manually entered phone numbers. You may be better off with a looser pattern that matches:
Some digits and parentheses
A dot or a hyphen
More digits, dots, or hyphens
which looks like this:
/ ( [()\d]+ [.-] [\d.-]+ ) /x
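To compare the strict and loose patterns side by side, here is a sketch in Python, where re.VERBOSE plays the role of Perl's /x modifier and the extract function mirrors the Perl loop (skip lines that don't match, keep group 1 otherwise). The names are illustrative.

```python
import re

LINES = ['"155"', ">555.123.4567", ">555-123-4567",
         ">(555).123.4567", ">(555)123-4567", ">(555)-123-4567"]

STRICT = re.compile(r"""
    (
      (?: \( \d{3} \) | \d{3} )   # three digits, optionally in parentheses
      ([.-]|)                     # a dot, a hyphen, or nothing
      \d{3}                       # three more digits
      [.-]                        # exactly one dot or hyphen
      \d{4}                       # four more digits
    )
""", re.VERBOSE)

LOOSE = re.compile(r" ( [()\d]+ [.-] [\d.-]+ ) ", re.VERBOSE)

def extract(pattern, lines):
    return [m.group(1) for m in map(pattern.search, lines) if m]

print(extract(STRICT, LINES))
print(extract(LOOSE, LINES))
```

On these sample lines both patterns give the same result; the loose one would also accept malformed numbers such as 1-2-3.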

Selecting if there is no delimiter, and selecting nothing if there is

I have strings like "smth 2sg. smth", and sometimes "smth 2sg.| smth.".
What pattern should I use to select "2sg." if the string does not contain "|", and select nothing if it does?

I have 2 methods. They both use a negative lookahead, which is written like this:
(?!data)
It asserts that data does not appear directly to the right of the current position; if data is there, the regex does not match at that position.
Method 1 (shorter)
Just capture 2sg.
Try this RegEx:
(\dsg\.)(?!\|)
Use (\d+... if the number could be longer than 1 digit
How it works:
( # To capture (2sg.)
\d # Digit (2)
sg # (sg)
\. # . (Dot)
)
(?!\|) # Do not match if followed by |
Method 2 (longer but safer)
Match the whole string and capture 2sg.
Try this RegEx:
^\w+\s*(\dsg\.)(?!\|)\s*\w+\.?$
Use (\d+sg... if the number could be longer than 1 digit
How it works:
^ # String starts with ...
\w+\s* # Word characters, then optional whitespace (smth )
( # To capture (2sg.)
\d # Digit (2)
sg # (sg)
\. # . (Dot)
)
(?!\|) # Do not match if followed by |
\s* # Optional Whitespace
\w+ # Letters (smth)
\.? # Optional . (Dot)
$ # ... string ends here
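Both methods can be tried on the two example strings with Python's re; a minimal sketch, where first_group is a hypothetical helper that returns group 1 or None when nothing is selected:

```python
import re

METHOD_1 = re.compile(r"(\dsg\.)(?!\|)")
METHOD_2 = re.compile(r"^\w+\s*(\dsg\.)(?!\|)\s*\w+\.?$")

def first_group(pattern, s):
    m = pattern.search(s)
    return m.group(1) if m else None

for s in ["smth 2sg. smth", "smth 2sg.| smth."]:
    print(repr(s), first_group(METHOD_1, s), first_group(METHOD_2, s))
```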

Something like this might work for you:
(\d*sg\.)(?!\|)
It assumes an optional number followed by sg., which must not be followed by |.

^.*(\dsg\.)[^\|]*$
Explanation:
^ : starts from the beginning of the string
.* : accepts any number of initial characters (even nothing)
(\dsg\.) : looks for the group of digit + "sg."
[^\|]* : considers any number of following characters except for |
$ : stops at the end of the string
You can then select your string by reading the first capture group.
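A short sketch of reading that first group with Python's re (select_2sg is a hypothetical name). Note that [^\|]*$ only rules out a | that appears after "2sg."; a | earlier in the string would not prevent the match.

```python
import re

PATTERN = re.compile(r"^.*(\dsg\.)[^\|]*$")

def select_2sg(s):
    m = PATTERN.search(s)
    # Group 1 holds e.g. "2sg." when no "|" follows it.
    return m.group(1) if m else None

print(select_2sg("smth 2sg. smth"), select_2sg("smth 2sg.| smth."))
```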

Try:
(\d+sg\.(?!\|))
Depending on your programming environment, the syntax may differ slightly, but this will get you the result.
For more information see Negative Lookahead
