Regular Expression help - regex

I'm pretty hopeless with regular expressions and I'm struggling with what is probably an incredibly simple one!
I have a string which contains many instances of something like this:
<li>STEAK</li>
I know the value of 'STEAK' and I'm looking for the value of the href attribute.
This value can be anything.
I'm using C# on .NET 4.0.
Thanks for any help

Use HTML Agility Pack instead (don't regex HTML).
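To illustrate the parser approach, here is a minimal sketch. It is written in Python with the standard library's html.parser rather than C# with HTML Agility Pack, so treat it as an outline of the idea and not as that library's API. The sample markup (an <a href="..."> wrapped around STEAK inside each <li>) and the names HrefByText and find_hrefs are assumptions made up for the example.

```python
from html.parser import HTMLParser


class HrefByText(HTMLParser):
    """Collect href values of <a> elements whose text equals `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.hrefs = []
        self._in_a = False   # True while we are inside an <a> element
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text chunks seen inside the current <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_a = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_a:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_a:
            text = "".join(self._text).strip()
            if text == self.wanted and self._href is not None:
                self.hrefs.append(self._href)
            self._in_a = False


def find_hrefs(html, wanted):
    """Return the href of every link whose visible text is `wanted`."""
    parser = HrefByText(wanted)
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Hypothetical markup; the real page structure is not shown in the question.
sample = (
    '<ul><li><a href="/menu/steak">STEAK</a></li>'
    '<li><a href="/menu/fish">FISH</a></li></ul>'
)
print(find_hrefs(sample, "STEAK"))  # ['/menu/steak']
```

The parser copes with attribute order, quoting and extra attributes, which is where a hand-written regex for href usually breaks.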

Related

Is there a function to create a regex pattern from a string input?

I'm lousy at regular expressions but occasionally they're the only thing that's the right solution for a problem.
Is there something in the .NET framework that allows you to input an unencoded string and get a pattern from it? Which you could then modify as required?
e.g. I want to remove a CDATA section that contains a file from some XML but I can't work out what the right pattern is for <![CDATA[hugepileofrandombinarydataherethatalsoneedstogo]]> and I don't want to ask for help each time I'm stuck on a regex pattern.
Such tools exist; search for "regex generator".
But, as suggested in the comments, it is better to learn regex. Simple patterns are easy. Something like <!\[.*?]]> in your case.
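A rough sketch of that pattern in use, in Python rather than .NET. Two things are assumptions on my part: the pattern is tightened to name CDATA explicitly, and the DOTALL flag is added because binary data in the section will probably contain newlines, which a plain dot does not match. The second half shows re.escape, which turns a literal string into a pattern you can then edit (.NET has a comparable Regex.Escape method). The sample document and the name strip_cdata are made up.

```python
import re

# Lazy match from the CDATA opener to the first "]]>".
# DOTALL lets "." match newlines inside the section.
CDATA = re.compile(r"<!\[CDATA\[.*?\]\]>", re.DOTALL)


def strip_cdata(xml):
    """Remove every CDATA section from the given XML text."""
    return CDATA.sub("", xml)


# Hypothetical document; the real XML is not shown in the question.
doc = "<file><name>a.bin</name><data><![CDATA[line1\nline2]]></data></file>"
print(strip_cdata(doc))  # <file><name>a.bin</name><data></data></file>

# Starting point for a pattern: escape a literal so it matches itself,
# then hand-edit the parts that should vary.
literal = "<![CDATA["
pattern = re.escape(literal)
print(pattern)
```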
There are regex design tools like Expresso:
http://www.ultrapico.com/expresso.htm
There is no suitable .NET component, but the text-to-regex page at txt2re.com is the best I've seen. It's not perfect, but it suits people who occasionally need to build a regex to match a string and don't have the time to relearn regex each time.

How can I match a tag ignoring its attributes

So say I have some HTML that looks like the following
This text
This Text
This Text
What regex would grab 'this text'? I'm really new to the concept of regex, so I have tried things like "(.*)" but have not had much success. Can anyone offer advice? :D
You can use the following regular expression
<a\b[^>]*>(.*?)</a>
You can find more examples on this site
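A quick check of that pattern, sketched in Python. The sample links and their attributes are assumptions, since the tags in the question did not survive; the IGNORECASE and DOTALL flags are also my additions, so that <A> and link text spanning lines still match.

```python
import re

# <a, a word boundary (so <abbr> is not matched), any attributes, then
# the shortest possible text up to the closing </a>.
ANCHOR_TEXT = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)

# Hypothetical markup with different attribute sets on each link.
html = (
    '<a href="/one">This text</a> '
    '<a class="x" href="/two" target="_blank">This Text</a> '
    "<a>This Text</a> "
    "<abbr>not a link</abbr>"
)

print(ANCHOR_TEXT.findall(html))  # ['This text', 'This Text', 'This Text']
```

The [^>]* part is what skips the attributes: it consumes everything up to the end of the opening tag without caring what is there.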

Quick regex help: grab text from html

I have the following html snippet:
<h1 class="header" itemprop="name">Some text here<span class="nobr">
I would like to get the text between the HTML tags. I've been struggling with this for hours now, please help me! What regex would solve my problem?
You should not use a regex for that; use an HTML parser. As you didn't specify a language, it is hard to help, but you will find one by googling...
If you need it just for this one case, you can use regex />(.*?)</
In JavaScript you can access that info via:
document.getElementsByTagName("h1").item(0).textContent
or
document.getElementsByClassName("header").item(0).textContent
Like others have said, you shouldn't be using regular expressions for parsing HTML. But with that aside, the following will grab that text for you:
(?<=\>).+(?=\<)
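For comparison, here is a small Python sketch that runs both regexes from this thread against the snippet in the question. On that one line they agree, but the greedy lookaround version overshoots as soon as the input contains a second element; the two-paragraph string is a made-up input to show that.

```python
import re

snippet = '<h1 class="header" itemprop="name">Some text here<span class="nobr">'

LAZY = re.compile(r">(.*?)<")          # capture group, shortest match
LOOKAROUND = re.compile(r"(?<=>).+(?=<)")  # greedy, no capture group

print(LAZY.search(snippet).group(1))      # Some text here
print(LOOKAROUND.search(snippet).group())  # Some text here

# Hypothetical input with two elements: the greedy ".+" runs to the last "<".
two = "<p>a</p><p>b</p>"
print(LAZY.search(two).group(1))       # a
print(LOOKAROUND.search(two).group())  # a</p><p>b
```

So the lookaround form is fine for this single case, as the answer says, but it is not something to reuse on a whole page.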

What is a better way to write this regular expression?

I am converting XML children into element attributes and have a dirty regex script I used in TextMate. I know that dot (.) doesn't match newlines, so this is how I got it to work.
Search
language="(.*)"
(.*)<education>(.*)(\n)?(.*)?(\n)?(.*)?(\n)?(.*)?</education>
(.*)<years>(.*)</years>
(.*)<grade>(.*)</grade>
Replace
grade="$13" language="$1" years="$11">
<education>$3$4$5$6$7$8$9</education>
I know there's a better way to do this. Please help me build my regex skills further.
Use an XML parser; don't use regex to parse XML.
If there are no other tags inside the <education> element, I would change that part to:
<education>([^<>]*)</education>
If possible, I would use the same technique everywhere else you're using .*. In the case of the language attribute, it would take this form:
language="([^"]*)"

A Regex builder for CSS queries

I've got a problem I need solved using regular expressions; it involves taking a CSS selector and compiling a regex that matches the string representation of the nodes inside an HTML document. The point is to avoid parsing the HTML as XML and then making either XPath or DOM queries to apply style attributes.
Does anyone know of a project that already implements something like this in any language? The target platform would be .NET 3.5.
Html Agility Pack
Regular expressions seem like an amazingly bad way of matching those nodes. I'm not sure I follow your problem - why not just use something like jQuery to pick out those nodes? E.g. given a CSS selector 'div>span.red:first-child',
$('div>span.red:first-child')
would return an array-like jQuery collection of the matching nodes.
EDIT: Oh, wait - are you trying to do this 'offline', as it were - not in a user's browser? Yeah, ignore my advice. (Even so, I'd still suggest that regular expressions aren't going to help you. Why are you against generating an XML document representation of the page?)
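For what the document-tree route looks like offline, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It is not a CSS engine: the function below hand-codes the one selector from the answer above ('div>span.red:first-child'), and it assumes the page is well-formed XML, which real HTML often is not. The sample page and the function name are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page. Only the first span of the first div
# satisfies div>span.red:first-child.
page = (
    "<html><body>"
    "<div><span class='red big'>first</span><span class='red'>second</span></div>"
    "<div><p>intro</p><span class='red'>not first</span></div>"
    "<div><span class='blue'>wrong class</span></div>"
    "</body></html>"
)


def first_child_red_spans(root):
    """Hand-written equivalent of the selector div>span.red:first-child."""
    matches = []
    for div in root.iter("div"):
        children = list(div)
        if not children:
            continue
        first = children[0]
        classes = first.get("class", "").split()
        if first.tag == "span" and "red" in classes:
            matches.append(first)
    return matches


root = ET.fromstring(page)
print([span.text for span in first_child_red_spans(root)])  # ['first']
```

Checks like "is the first child" or "has this class among several" are a few lines on a tree, while expressing them as a regex over the raw markup gets fragile very quickly.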