REGEX: Extract latitude and longitude from Google Maps static image URL

I want to extract the latitude and longitude from this string, which represents a URL (latitude and longitude are the numbers between ?center= and &zoom):
http://maps.google.com/maps/api/staticmap?center=40.390400788244364,-3.689793032995914&zoom=16&size=710x440&maptype=roadmap&sensor=false&markers=color:red%7C40.390400788244364,-3.689793032995915
I'm using this regex:
http:\/\/maps\.google\.com\/maps\/api\/staticmap\?center=(\d),(\d)&zoom=[[:ascii:]]+
But not getting any results. My regex skills are rusty...
Any clues?
Many thanks in advance!

Try this:
center=(-?[\d]*\.[\d]*),(-?[\d]*\.[\d]*)&

This simpler version worked for me (I used it in JavaScript), including when the map link is shorter, like this:
https://maps.google.com/maps?q=19.340742111206055%2C-99.21727752685547&z=17&hl=es
I'm using this regex to extract the coordinates set.
[-]?[\d]+[.][\d]*
I end up with this:
19.340742,-99.217278

I would use this to exclude center and the & symbol.
(?<=\=)([\-]?[\d]*\.[\d]*),([\-]?[\d]*\.[\d]*)(?=&)
I should mention in a few languages it's actually this
(?=)([\-]?[\d]*\.[\d]*),([\-]?[\d]*\.[\d]*)(?=&)
This will leave you with
40.390400788244364,-3.689793032995914
without "center=" and without "&" on the end
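As a rough check of the lookaround version above, here is a minimal Python sketch. It assumes Python's re module, which supports the fixed-width lookbehind used here; the variable names are only illustrative.

```python
import re

# URL copied from the question.
url = ("http://maps.google.com/maps/api/staticmap?center=40.390400788244364,"
       "-3.689793032995914&zoom=16&size=710x440&maptype=roadmap&sensor=false"
       "&markers=color:red%7C40.390400788244364,-3.689793032995915")

# (?<==) requires a preceding "=", (?=&) requires a following "&";
# neither character becomes part of the match.
pattern = re.compile(r"(?<==)(-?\d*\.\d*),(-?\d*\.\d*)(?=&)")

m = pattern.search(url)
whole = m.group(0)             # both numbers, without "center=" or "&"
lat, lon = m.group(1), m.group(2)
print(whole)
print(lat, lon)
```

Because the lookbehind only checks for "=", this pattern would also match a coordinate pair after any other parameter that is followed by "&"; anchoring on "center=" is safer if the URL layout can vary.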

Your regex didn't match because of (\d),(\d). Each group captures a single digit.
http:\/\/maps\.google\.com\/maps\/api\/staticmap\?center=(\d),(\d)&zoom=[[:ascii:]]+
                                                          ↑    ↑
But you want to match one or more out of minus, period and digit. Use ([-\d.]+) instead of (\d)
See demo with your updated regex at regex101.
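A minimal Python sketch of the corrected groups, in case it helps. The function name extract_center and the shortened pattern (anchored on the center parameter rather than the full URL) are my own assumptions, not part of the answer above.

```python
import re

# URL copied from the question.
url = ("http://maps.google.com/maps/api/staticmap?center=40.390400788244364,"
       "-3.689793032995914&zoom=16&size=710x440&maptype=roadmap&sensor=false"
       "&markers=color:red%7C40.390400788244364,-3.689793032995915")

# ([-\d.]+) accepts one or more of minus, digit and period.
CENTER_RE = re.compile(r"[?&]center=([-\d.]+),([-\d.]+)")

def extract_center(u):
    """Return (lat, lon) as floats, or None if there is no center parameter."""
    m = CENTER_RE.search(u)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))

print(extract_center(url))
```

Note that [-\d.]+ is deliberately loose: it would also accept strings such as "--" that float() rejects, so stricter input checking may be needed for untrusted URLs.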

An alternative to Regex in this case would be to simply split the string until you get what you want, since the URL structure stays the same before the latitude value.
In Python, for example, you can do:
my_url = 'http://maps.google.com/maps/api/staticmap?center=40.390400788244364,-3.689793032995914&zoom=16&size=710x440&maptype=roadmap&sensor=false&markers=color:red%7C40.390400788244364,-3.689793032995915'
lat_long = my_url.split('http://maps.google.com/maps/api/staticmap?center=')[1].split('&zoom')[0].split(',')
print(lat_long)
Out: ['40.390400788244364', '-3.689793032995914']
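If the parameter order cannot be relied on, a different non-regex route is to let the standard library parse the query string. This is a sketch of that alternative using urllib.parse, not the split approach shown above.

```python
from urllib.parse import urlsplit, parse_qs

my_url = ('http://maps.google.com/maps/api/staticmap?center=40.390400788244364,'
          '-3.689793032995914&zoom=16&size=710x440&maptype=roadmap&sensor=false'
          '&markers=color:red%7C40.390400788244364,-3.689793032995915')

# parse_qs maps each parameter name to a list of its values.
query = parse_qs(urlsplit(my_url).query)

# The center value is a single "lat,lon" string.
lat_long = query['center'][0].split(',')
print(lat_long)
```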

Related

RegEx. Get the value from the quotes and check for the attribute name [duplicate]

What would be a quick way to extract the value of the title attributes for an HTML table:
...
<li>Proclo</li>
<li>Proclus</li>
<li>Ptolemy</li>
<li>Pythagoras</li></ul><h3>S</h3>
...
so it would return Proclo, Proclus, Ptolemy, Pythagoras,.... in strings for each line. I'm reading the file using a StreamReader. I'm using C#.
Thank you.
This C# regex will find all title values:
(?<=\btitle=")[^"]*
The C# code is like this:
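// Requires: using System.Text.RegularExpressions;
Regex regex = new Regex(@"(?<=\btitle="")[^""]*");
// Match returns only the first hit; use regex.Matches(input) to iterate over every title.
Match match = regex.Match(input);
string title = match.Value;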
The regex uses positive lookbehind to find the position where the title value starts. It then matches everything up to the ending double quote.
Use the regexp below
title="([^"]+)"
and then use Groups to browse through matched elements.
EDIT: I have modified the regexp to cover the examples provided in a comment by @Staffan Nöteberg
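For comparison, the capture-group variant can be sketched in Python. The input below is hypothetical: the anchors and title attributes are assumed for illustration, since the markup in the question does not show them.

```python
import re

# Hypothetical input; the <a> tags and title attributes are assumptions.
html = '''<li><a href="/wiki/Proclus" title="Proclus">Proclus</a></li>
<li><a href="/wiki/Ptolemy" title="Ptolemy">Ptolemy</a></li>'''

# With exactly one capture group, findall returns the group contents for every match.
titles = re.findall(r'\btitle="([^"]*)"', html)
print(titles)
```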

Regex ignore first 12 characters from string

I'm trying to create a custom filter in Google Analytics to remove the query parts of the URL which I don't want to see. The URL has the following structure:
[domain]/?p=899:2000:15018702722302::NO:::
I would like to create a regex which skips the first 12 characters (that is, up to and including /?p=899:2000) and replaces whatever comes after that with nothing.
So I made this one: https://regex101.com/r/Xgbfqz/1 (which could be simplified to .{0,12}), but I would actually like to skip those characters and have the regex match only whatever comes after them, so that I can tell Google Analytics to replace it with "".
The part in the url that is always the same is
?p=[3numbers]:[0-4numbers]
Thank you
Your regular expression:
\/\?p=\d{3}\:\d{0,4}(.*)
Tested in Golang RegEx 2 and RegEx101
It searches for /?p=###: followed by up to four optional digits, and captures the rest of the string to the right.
(extra) JavaScript:
var paragraf = '[domain]/?p=899:2000:15018702722302::NO:::';
var regex= /\/\?p=\d{3}\:\d{0,4}(.*)/;
var match = regex.exec(paragraf);
alert('The rest of the right side of the string: ' + match[1]);
Easily use "[domain]/?p=899:2000:15018702722302::NO:::".substr(12)
You can try this:
/\?p\=\d{3}:\d{0,4}
Which matches just this: ?p=[3numbers]:[0-4numbers]
Not sure about replacing though.
https://regex101.com/r/Xgbfqz/1
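On the replacing part, one possible approach is to capture the fixed prefix and substitute the whole match with that capture. The sketch below shows the idea in Python only; whether a Google Analytics filter accepts the same back-reference syntax is an assumption that would need checking.

```python
import re

url = "[domain]/?p=899:2000:15018702722302::NO:::"

# Group 1 is the part to keep; the trailing .* swallows everything after it.
PREFIX_RE = re.compile(r"(/\?p=\d{3}:\d{0,4}).*")

# Replace the whole match with group 1, which drops the unwanted tail.
cleaned = PREFIX_RE.sub(r"\1", url)
print(cleaned)
```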

Perform Regex on value returned by Regex

This is probably straightforward but I'm not even sure which phrase I should google to find the answer. Forgive my noobiness.
I've got strings (filenames) that look like this:
site12345678_date20160912_23001_to_23100_of_25871.txt
What this naming convention means is "Records 23001 through 23100 out of 25871 for site 12345678 for September 12th 2016 (20160912)"
What I want to do is extract the date part (those digits between _date and the following _)
The Regex: .*(_date[0-9]{8}).* will return the string _date20160912. But what I'm actually looking for is just 20160912. Obviously, [0-9]{8} on its own doesn't give me what I want in this case, because that could be confused with the site or, potentially, the record counts.
How can I responsibly accomplish this sort of 'substringing' with a single regular expression?
You just need to shift your parentheses so that the capture group no longer includes '_date'. Then read capture group #1:
If done in python, for example, it would look something like:
import re

regex = r'.*_date([0-9]{8}).*'
filename = 'site12345678_date20160912_23001_to_23100_of_25871.txt'  # renamed from str to avoid shadowing the built-in
m = re.match(regex, filename)
print(m.group(0))  # the whole string
print(m.group(1))  # the string you are looking for: '20160912'
See it in action here: https://eval.in/641446
The Regex: .*(_date[0-9]{8}).* will return the string _date20160912.
That means you are using the regex in a method that requires a full string match, and you can access Group 1 value. The only thing you need to change in the regex is the capturing group placement:
.*_date([0-9]{8}).*
       ^^^^^^^^^^
See the regex demo.
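If the tool in use can only return the whole match rather than a group, a different technique is to wrap the digits in lookarounds so that the match itself is just the date. This is a sketch in Python and assumes the engine supports lookbehind, which not every tool does.

```python
import re

filename = "site12345678_date20160912_23001_to_23100_of_25871.txt"

# (?<=_date) and (?=_) check the surrounding text without consuming it.
m = re.search(r"(?<=_date)[0-9]{8}(?=_)", filename)
date_part = m.group(0)
print(date_part)
```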

regex for repeating values

I am trying to find the correct regex (for use with Java and JavaScript) to validate an array of day-of-week and 24-hour time formats. I figured out the time format but am struggling to come up with the full solution.
The regex needs to validate patterns which include one or more of the following, separated by a comma.
{two-character day} HH:MM-HH:MM
Three examples of valid strings would be:
M 5:30-7:00
M 5:30-7:00, T 5:30-7:00, W 18:00-19:30
F 12:00-14:30, Sa 6:45-8:15, Su 6:45-8:15
This should validate a 24-hour time:
/^((M|T|W|Th|F|Sa|Su) ([01]?[0-9]|2[0-3]):[0-5][0-9]-([01]?[0-9]|2[0-3]):[0-5][0-9](, )?)+$/
Credit for the time bit goes to mkyong: http://www.mkyong.com/regular-expressions/how-to-validate-time-in-24-hours-format-with-regular-expression/
you can try this
[A-Za-z]{1,2}[ ]\d+:\d+-\d+:\d+
You could try this: ([MTWFS][ouehra]?) ([0-9]|[1-2][0-9]):([0-6][0-9])-([0-9]|[1-2][0-9]):([0-6][0-9])
I'd go with this:
((M|T(u|h)|W|F|S(a|u)) ((1?\d)|(2[0-3])):[0-5]\d-((1?\d)|(2[0-3])):[0-5]\d(, )?)+
This should do the trick:
^(M|Tu|W|Th|F|Sa|Su) \d{1,2}:\d{2}-\d{1,2}:\d{2}(, (M|Tu|W|Th|F|Sa|Su) \d{1,2}:\d{2}-\d{1,2}:\d{2})*$
Note that you show T in your example above which is ambiguous. You might want to enforce Tu and Th as shown in my regex.
This will capture all sets in an array. The T in the short day-of-week list is debatable (Tuesday or Thursday?).
^((?:[MTWFS]|Tu|Th|Sa|Su)\s(?:[0-9]{1,2}:[0-9]{2})-(?:[0-9]{1,2}:[0-9]{2})(?:,\s)?)+$
The (?:) are non-capturing groups, so your actual matches will be (for example):
M 5:30-7:00
T 5:30-7:00
W 18:00-19:30
But the entire line will validate.
Added ^ and $ for line boundaries and an explicit time-time match because some regular expression parsers may not work with the previous way that I had it.
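To tie the answers together, here is a minimal Python sketch that validates the whole string and then lists the individual entries. It assumes the unambiguous Tu/Th day codes suggested above and the stricter 24-hour time check; the names DAY, TIME, SLOT and parse_schedule are illustrative only. In Python a repeated capture group keeps only its last repetition, so the entries are collected with findall rather than from the group.

```python
import re

DAY = r"(?:M|Tu|W|Th|F|Sa|Su)"           # assumed day codes (Tu/Th, no bare T)
TIME = r"(?:[01]?\d|2[0-3]):[0-5]\d"      # 0:00 through 23:59
SLOT = rf"{DAY} {TIME}-{TIME}"            # e.g. "M 5:30-7:00"
SCHEDULE_RE = re.compile(rf"{SLOT}(?:, {SLOT})*")

def parse_schedule(text):
    """Return the list of slots if the whole string is valid, else None."""
    if SCHEDULE_RE.fullmatch(text) is None:
        return None
    return re.findall(SLOT, text)

print(parse_schedule("M 5:30-7:00, Tu 5:30-7:00, W 18:00-19:30"))
print(parse_schedule("M 25:00-7:00"))
```

Validating first and extracting second keeps the two concerns separate, at the cost of scanning the string twice.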

MATLAB 2012 regular expression

I have a set of strings that I'd like to parse in MATLAB 2012 that all have the following format:
string-int-int-int-int-string
I'd like to pluck out the third integer (the rest are 'don't cares'), but I haven't used MATLAB in ages and need to refresh on regular expressions. I tried using the regular expression '(.*)-(.*)-(.*)-\d-(.*)' but no dice. I did check out the MATLAB regexp page, but wasn't able to figure out how to apply that information to this case.
Anyone know how I might get the desired result? If so, could you explain what the expression you're using is doing to get that result so that others might be able to apply the answer to their unique situation?
Thanks in advance!
str = 'XyzStr-1-2-1000-56789-ILoveStackExchange.txt';
[tok] = regexp(str, '^.+?-.+?-.+?-(\d+?)-.+?-.+?', 'tokens');
tok{:}
ans =
'1000'
Update
Explanation, upon request.
^ - "Anchor", or match beginning of string.
.+? - Wildcard match, one or more, non-greedy.
- - Literal dash/hyphen.
(\d+?) - Digits match, one or more, non-greedy, captured into a token.
^.*?-.*?-.*?-(\d+)-.*?-.*?$
OR
^(?:[^-]*?-){3}(\d+)(?:.*?)$
Group1 now contains your required data
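The same idea, translated to Python rather than MATLAB as a quick sanity check: skip three hyphen-terminated fields, then capture the digits that follow. The variable names are illustrative.

```python
import re

s = "XyzStr-1-2-1000-56789-ILoveStackExchange.txt"

# (?:[^-]*-){3} consumes "XyzStr-", "1-" and "2-"; (\d+) then captures the third integer.
m = re.match(r"^(?:[^-]*-){3}(\d+)-", s)
third_int = int(m.group(1))
print(third_int)
```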