RegEx. Get the value from the quotes and check for the attribute name [duplicate] - regex

What would be a quick way to extract the value of the title attributes for an HTML table:
...
<li>Proclo</li>
<li>Proclus</li>
<li>Ptolemy</li>
<li>Pythagoras</li></ul><h3>S</h3>
...
so it would return Proclo, Proclus, Ptolemy, Pythagoras,.... in strings for each line. I'm reading the file using a StreamReader. I'm using C#.
Thank you.

This C# regex will find all title values:
(?<=\btitle=")[^"]*
The C# code is like this:
Regex regex = new Regex(@"(?<=\btitle="")[^""]*");
// Match returns only the first hit; use regex.Matches(input) to iterate over all of them.
Match match = regex.Match(input);
string title = match.Value;
The regex uses positive lookbehind to find the position where the title value starts. It then matches everything up to the ending double quote.
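For comparison, here is a minimal Python sketch of the same lookbehind pattern, collecting every match rather than only the first. The sample markup is an assumption: the title attributes are not visible in the question's excerpt, so the `<a ... title="...">` lines below are hypothetical.

```python
import re

# Hypothetical markup: the question's excerpt does not show the title attributes.
html = """
<li><a href="#" title="Proclo">Proclo</a></li>
<li><a href="#" title="Proclus">Proclus</a></li>
<li><a href="#" title="Ptolemy">Ptolemy</a></li>
"""

# Same pattern as above: the lookbehind anchors the match right after title="
# and [^"]* consumes everything up to (not including) the closing quote.
title_pattern = re.compile(r'(?<=\btitle=")[^"]*')

titles = title_pattern.findall(html)
print(titles)
```

Because the lookbehind is zero-width, each match is the bare attribute value, so no group handling is needed.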

Use the regexp below
title="([^"]+)"
and then use Groups to browse through matched elements.
EDIT: I have modified the regexp to cover the examples provided in the comment by @Staffan Nöteberg
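A rough Python sketch of the capture-group approach, for readers who want to see how the group is read back. The input line is a hypothetical example, not taken from the question.

```python
import re

# Hypothetical input line; the attribute value is what we want to capture.
line = '<li><a href="#" title="Pythagoras">Pythagoras</a></li>'

# group(0) would be the whole match (title="Pythagoras"); group(1) is only
# the parenthesised part, i.e. the text between the quotes.
matches = [m.group(1) for m in re.finditer(r'title="([^"]+)"', line)]
print(matches)
```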

Related

RegEx remove part of string and replace another part

I have a challenge getting the desired result with RegEx (using C#) and I hope that the community can help.
I have a URL in the following format:
https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1
I want to make two modifications, specifically:
1) Remove everything after 'value' e.g. '&ida=0&idb=1'
2) Replace 'category' with e.g. 'newcategory'
So the result is:
https://somedomain.com/subfolder/newcategory/?abc=text:value
I can handle 1) with e.g. ^[^&]+ on the URL above, but I have been unable to figure out how to replace the 'category' substring.
Any help or guidance would be much appreciated.
Thank you in advance.
Use the following:
Find: /(category/.+?value)&.+
Replace: /new$1 or /new\1 depending on your regex flavor
Demo & explanation
Update according to comment.
If the new name is completely_different_name, use the following:
Find: /category(/.+?value)&.+
Replace: /completely_different_name$1
Demo & explanation
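As a sanity check, both find/replace pairs can be tried in Python, where the backreference is written `\1`. This is only a sketch using the URL from the question; the variable names are illustrative.

```python
import re

url = "https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1"

# Keep "category/...value" in group 1, drop the "&..." tail that follows it,
# and prefix the kept part with "new".
renamed = re.sub(r"/(category/.+?value)&.+", r"/new\1", url)

# Variant for an unrelated new name: capture only what follows "category".
replaced = re.sub(r"/category(/.+?value)&.+", r"/completely_different_name\1", url)

print(renamed)
print(replaced)
```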
You haven't specified a language here; I mainly work in Python, so the solution is in Python.
import re
url = re.sub('category', 'newcategory', re.search('^https.*value', value).group(0))
Explanation
re.sub(a, b, c) replaces every match of a with b in c.
re.search finds the first match of a pattern in a string, and group(0) returns the whole matched text. In the code above, value is the variable holding the input URL; re.search matches everything from "https" up to the last "value" (.* is greedy), and group(0) returns that part.
Using Python and only built-in string methods (there is no need for regular expressions here):
url = r"https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1"
new_url = (url.split('value')[0] + "value").replace("category", 'newcategory')
print(new_url)
Outputs:
https://somedomain.com/subfolder/newcategory/?abc=text:value

Regex to replace text between slash and colon in xpath

I have XPaths such as "/name:ABC/dep:HR/eid:123". The input strings are in this format, and I expect the output to be "/ABC/HR/123".
Please share your thoughts on how to do this with a regex pattern in Scala or Java.
See regex in use here
(?<=/)[^:]*:
See code in use here
object Main extends App {
val xpath = "/name:ABC/dep:HR/eid:123"
val regex = "(?<=/)[^:]*:".r
println(regex.replaceAllIn(xpath, ""))
}
Results in /ABC/HR/123
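The same pattern carries over unchanged to other regex flavours. Below is a small Python sketch (not Scala or Java, so treat it as an illustration of the pattern only) with comments on what each part does.

```python
import re

xpath = "/name:ABC/dep:HR/eid:123"

# (?<=/) requires a slash immediately before the match without consuming it;
# [^:]*: then matches the label and its colon, which are replaced by nothing.
stripped = re.sub(r"(?<=/)[^:]*:", "", xpath)
print(stripped)
```

Note that `[^:]*` can also cross a slash, so a segment without a colon would be swallowed together with the following label; the sample input has a colon in every segment.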

regular expression to find a repeated code

I am trying to write a regular expression to match strings / codes in a database.
Here are some samples of the codes / strings which I need to remove using the regular expression.
[b:1wkvatkt]
[/b:1wkvatkt]
[b:3qo0q63v]
[/b:3qo0q63v]
[b:2r2hso9d]
[/b:2r2hso9d]
Anything that matches [b:********] and [/b:********]
Anybody please help me out. Thanks in advance.
You can use the following pattern (as stated by LukStorms in the comments):
\[\/?b:[a-z0-9]+\]
If you want to replace [b:********] with <b> (and also the closing one), you can use the following snippet (here in JavaScript, other languages are similar):
var regex = /\[(\/)?b:[a-z0-9]+\]/g;
var testText = "There was once a guy called [b:12a345]Peter[/b:12a345]. He was very old.";
var result = testText.replace(regex, "<$1b>");
console.log(result);
It matches an optional / and puts it into the first group ($1). This group can then be used in the replacement string. If the slash is not found, it won't be added, but if it is found, it will be added to <b>.
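A comparable sketch in Python, covering both the plain removal asked for in the question and the conversion to <b> tags. It relies on Python 3.5 or later, where a group that did not take part in the match is substituted as an empty string.

```python
import re

text = "There was once a guy called [b:12a345]Peter[/b:12a345]. He was very old."

# Removal, as asked in the question: drop both opening and closing tags.
removed = re.sub(r"\[/?b:[a-z0-9]+\]", "", text)

# Conversion: group 1 is the optional slash; when it did not participate in
# the match it is substituted as an empty string (Python 3.5 and later).
converted = re.sub(r"\[(/)?b:[a-z0-9]+\]", r"<\1b>", text)

print(removed)
print(converted)
```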

REGEX: Extract latitude and longitude from Google Maps static image url

I want the latitude and longitude from this string, which represents a URL (latitude and longitude are the numbers between ?center= and &zoom):
http://maps.google.com/maps/api/staticmap?center=40.390400788244364,-3.689793032995914&zoom=16&size=710x440&maptype=roadmap&sensor=false&markers=color:red%7C40.390400788244364,-3.689793032995915
I'm using this regex:
http:\/\/maps\.google\.com\/maps\/api\/staticmap\?center=(\d),(\d)&zoom=[[:ascii:]]+
But not getting any results. My regex skills are rusty...
Any clues?
Many thanks in advance!
Try this:
center=(-?[\d]*\.[\d]*),(-?[\d]*\.[\d]*)&
Code sample
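A minimal Python sketch of this pattern applied to the URL from the question. The conversion to float is an addition for illustration and is not part of the answer.

```python
import re

url = ("http://maps.google.com/maps/api/staticmap?center=40.390400788244364,"
       "-3.689793032995914&zoom=16&size=710x440&maptype=roadmap&sensor=false"
       "&markers=color:red%7C40.390400788244364,-3.689793032995915")

# Group 1 is the latitude, group 2 the longitude; each allows a leading minus.
match = re.search(r"center=(-?\d*\.\d*),(-?\d*\.\d*)&", url)
if match:
    latitude, longitude = float(match.group(1)), float(match.group(2))
    print(latitude, longitude)
```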
This simpler version worked for me (I used it in JavaScript), including when the map link is shorter, like this:
https://maps.google.com/maps?q=19.340742111206055%2C-99.21727752685547&z=17&hl=es
I'm using this regex to extract the coordinates set.
[-]?[\d]+[.][\d]*
I end up with this:
19.340742,-99.217278
I would use this to exclude center and the & symbol.
(?<=\=)([\-]?[\d]*\.[\d]*),([\-]?[\d]*\.[\d]*)(?=&)
I should mention in a few languages it's actually this
(?=)([\-]?[\d]*\.[\d]*),([\-]?[\d]*\.[\d]*)(?=&)
This will leave you with
40.390400788244364,-3.689793032995914
without "center=" and without "&" on the end
Your regex didn't match because of (\d),(\d). Each group captures a single digit.
http:\/\/maps\.google\.com\/maps\/api\/staticmap\?center=(\d),(\d)&zoom=[[:ascii:]]+
                                                          ↑    ↑
But you want to match one or more out of minus, period and digit. Use ([-\d.]+) instead of (\d)
See demo with your updated regex at regex101.
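The difference can be reproduced with a short Python sketch. Two simplifications are assumptions made here: the URL is shortened, and the trailing `[[:ascii:]]+` is dropped because Python's `re` module has no POSIX character classes.

```python
import re

url = ("http://maps.google.com/maps/api/staticmap?center=40.390400788244364,"
       "-3.689793032995914&zoom=16&size=710x440")

prefix = r"http://maps\.google\.com/maps/api/staticmap\?center="

# (\d) matches exactly one digit, so after "4" the pattern expects a comma
# but finds "0", and the search fails.
single_digit = re.search(prefix + r"(\d),(\d)&zoom=", url)

# [-\d.]+ accepts a run of digits, minus signs and periods.
fixed = re.search(prefix + r"([-\d.]+),([-\d.]+)&zoom=", url)

print(single_digit)
print(fixed.groups())
```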
An alternative to Regex in this case would be to simply split the string until you get what you want, since the URL structure stays the same before the latitude value.
In Python, for example, you can do:
my_url = 'http://maps.google.com/maps/api/staticmap?center=40.390400788244364,-3.689793032995914&zoom=16&size=710x440&maptype=roadmap&sensor=false&markers=color:red%7C40.390400788244364,-3.689793032995915'
lat_long = my_url.split('http://maps.google.com/maps/api/staticmap?center=')[1].split('&zoom')[0].split(',')
print(lat_long)
Out: ['40.390400788244364', '-3.689793032995914']

How to exclude a character in Regex

I have this Regex expression
UriPatternToMatch = new Regex(@"(href|src)=""[\d\w\/:#@%;$\(\)~_\?\+\-=\\\.&]*",
RegexOptions.Compiled | RegexOptions.IgnoreCase);
This works fine for picking up all URLs, including http, ftp and others, but it also picks up text within "&lt" special characters as a URL.
For example, it will wrongly pick up the text below as a URL too (adding a photo instead of the text below).
I believe something like ^&lt is what is needed, but where do I add it?
Thanks
You need to use negative lookahead like this:
(?!.*?<)
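One possible way to apply a negative lookahead here, sketched in Python rather than C#. This is a different placement from the snippet above: the lookahead is checked before every character, so the match stops where an "&lt" entity begins. It assumes that stopping at "&lt" is what the asker wants, and the input string is hypothetical, since the question's sample text is not shown.

```python
import re

# Character class adapted from the question; (?!&lt) is tested before each
# character, so the URL match ends as soon as "&lt" would come next.
uri_pattern = re.compile(
    r'(href|src)="((?:(?!&lt)[\d\w/:#@%;$()~_?+\-=\\.&])*)',
    re.IGNORECASE,
)

# Hypothetical input, since the question's sample text is not shown.
sample = 'href="http://example.com/a?b=1&lt;img"'
match = uri_pattern.search(sample)
print(match.group(2))
```

An ordinary "&" that starts a query parameter is still accepted, because the lookahead only rejects the literal sequence "&lt".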