How can I use RegExp to grab data from this site? - regex

I want to grab data from this site using a regexp:
http://aymanalrefai.wordpress.com/2010/03/15/596/
I tried this:
/<div class="entry">.*?<p class="postinfo">/is
but it didn't give me the correct result.
Any help?

Try this (the / in </div> has to be escaped, because / is also the pattern delimiter):
/<div class="entry">[\s\S]+?<\/div>/is
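A minimal Python sketch of the same non-greedy idea, assuming the page HTML has already been downloaded into a string (the sample markup below is made up for illustration). Because the match is non-greedy, it ends at the first `</div>` after the opening tag, so it will cut the entry short if the entry itself contains nested `<div>` elements; an HTML parser is more robust in that case.

```python
import re

# Hypothetical sample markup standing in for the downloaded page.
html = """
<div class="post">
  <div class="entry">
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </div>
  <p class="postinfo">Posted in Uncategorized</p>
</div>
"""

# [\s\S]+? matches any character, newlines included, as few times as possible,
# so the match stops at the first </div> after the opening tag.
# re.IGNORECASE mirrors the /i flag of the PHP pattern.
ENTRY_RE = re.compile(r'<div class="entry">([\s\S]+?)</div>', re.IGNORECASE)

match = ENTRY_RE.search(html)
entry_html = match.group(1).strip() if match else None
print(entry_html)
```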

Related

Regex for this URL, http://www.chip.de and this domain chip.de

I am trying to create a regex that matches URLs and domains like the ones below:
*chip.de
http://www.chip.de*
I tried this regex:
http?:\/\/([\w\.-]+)([\/\w \.-]*)
It did not capture the URL.
I tested it at https://www.regextester.com/99497 and it failed.
What am I missing?
Please create two rules, one for the domain and one for the URL.
Thank you
If you are simply looking for a regex that matches URLs containing chip.de, please try this and let me know whether it is sufficient:
https?\:\/\/www\.chip\.de.*
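Note that in the question's attempt, `http?` makes only the `p` optional (it matches `htt` or `http`), so `https?` is what accepts both schemes. Below is a minimal Python sketch of the two rules the question asks for. The exact patterns are assumptions of this sketch, not the answerer's: any subdomain (such as `www.`) is accepted, and the URL rule allows an optional path or query after the host.

```python
import re

# Rule 1 (domain): chip.de itself or any subdomain of it, nothing else.
DOMAIN_RE = re.compile(r'^(?:[\w-]+\.)*chip\.de$', re.IGNORECASE)

# Rule 2 (URL): http or https, a host ending in chip.de,
# then optionally a path, query or fragment.
URL_RE = re.compile(r'^https?://(?:[\w-]+\.)*chip\.de(?:[/?#]\S*)?$', re.IGNORECASE)

for candidate in ["chip.de", "www.chip.de", "notchip.de",
                  "http://www.chip.de", "https://www.chip.de/news/",
                  "http://www.chip.de.evil.com/"]:
    print(candidate, bool(DOMAIN_RE.match(candidate)), bool(URL_RE.match(candidate)))
```

Anchoring the host with `chip\.de` right before the end (or before `/`, `?`, `#`) is what keeps look-alikes such as `notchip.de` or `www.chip.de.evil.com` from matching.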

Remove repeated matches with regex

I want to redirect a URL:
/XX/YY/ZZ%3E%3E%3E%3E%3E%3E%3E%3E%3E => /XX/YY/ZZ
I can't find the right regex to remove the repeated "%3E" at the end of the URL.
Can you help me, please?
This should work in Ruby (for actual URLs with the indicated kind of suffix):
s = "https://www.test.com/XX/YY/ZZ%3E%3E%3E%3E%3E%3E%3E%3E%3E"
s.gsub(/(%3E)+$/, "")
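For comparison, the same substitution sketched in Python: `(?:%3E)+` matches one or more consecutive `%3E` sequences, and `\Z` anchors the match to the very end of the string, so a `%3E` elsewhere in the path is left alone. Matching case-insensitively (so that `%3e` is also removed) is a choice of this sketch, and the helper name is illustrative.

```python
import re

# One or more "%3E" (URL-encoded ">") at the very end of the string.
TRAILING_GT_RE = re.compile(r'(?:%3E)+\Z', re.IGNORECASE)

def strip_trailing_gt(url: str) -> str:
    """Remove any run of %3E from the end of a URL (illustrative helper)."""
    return TRAILING_GT_RE.sub('', url)

print(strip_trailing_gt("https://www.test.com/XX/YY/ZZ%3E%3E%3E%3E%3E%3E%3E%3E%3E"))
# https://www.test.com/XX/YY/ZZ
```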
Try this pattern:
\/[\w]{2}
You can test it in an online regex tester.
Thank you, Human and Drux!
You really helped me find the solution:
r301 %r{^/XX/([\w\/]*)(%3E)+$}, '/XX/$1'
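To check what that rule's pattern and replacement produce before deploying it, here is a rough Python equivalent. Python writes the back-reference as `\1` instead of `$1`; the `redirect_target` helper is hypothetical and only reproduces the path rewrite, not the 301 redirect itself.

```python
import re

# Same pattern as the r301 rule: capture the path after /XX/ (word characters
# and slashes only), then require one or more trailing %3E sequences.
RULE_RE = re.compile(r'^/XX/([\w/]*)(%3E)+$')

def redirect_target(path: str):
    """Return the rewritten path if the rule applies, else None (hypothetical helper)."""
    if RULE_RE.match(path):
        return RULE_RE.sub(r'/XX/\1', path)
    return None

print(redirect_target("/XX/YY/ZZ%3E%3E%3E%3E%3E%3E%3E%3E%3E"))  # /XX/YY/ZZ
print(redirect_target("/XX/YY/ZZ"))  # None
```

Because `[\w/]*` cannot match `%`, the capture group stops exactly where the `%3E` run begins.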

Regular expression to filter URLs?

I need to filter some specific URLs using a regex in Google Analytics.
Out of all the recorded URLs, it should match only URLs in the format below:
/job/41-content-verification?action=register
/job/62-data-verification?action=register
/job/33-data-entry?action=register
That is, they start with '/job/', continue with some string, and end with '?action=register'.
I need a regex to put in a Google Analytics filter. Please help.
Try this:
^/job/.+?\?action=register$
^\/job\/.+?action=register$
Try this. See the demo:
http://regex101.com/r/sU3fA2/24
^\/job\/[a-z0-9-]+\?action=register$
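One way to check the last pattern against the question's examples before pasting it into the filter is a quick Python script; the rejected sample paths below are made up. The slashes do not need escaping in Python, so `\/` is written as `/`. Note that `[a-z0-9-]+` only allows lowercase letters, digits and hyphens in the job slug, whereas `.+?` in the first answer accepts any characters.

```python
import re

JOB_RE = re.compile(r'^/job/[a-z0-9-]+\?action=register$')

should_match = [
    "/job/41-content-verification?action=register",
    "/job/62-data-verification?action=register",
    "/job/33-data-entry?action=register",
]
# Hypothetical paths that the filter should reject.
should_not_match = [
    "/job/33-data-entry",
    "/job/33-data-entry?action=apply",
    "/jobs/33-data-entry?action=register",
]

for path in should_match + should_not_match:
    print(path, bool(JOB_RE.match(path)))
```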

Regex to check a Valid URL

Problem: I need a regex that checks whether a given author URL is valid.
Requirement: an author URL is a URL from a social networking site, blog, etc. that includes an author ID (profile ID).
For example:
www.facebook.com/RyanMathews
www.mouthshut.com/zobo.786
As I understand it, the regex would have to accept any string (any combination of characters) after the site's complete address followed by a "/".
I tried this regex, but it doesn't support author IDs:
var urlregex = /^((https?:\/\/)?((([a-z\d])+(\-)?([a-z\d])+)+)(\.([a-z\d])+(\-)?([az\d])+)?)(\.[a-z]{2,4}?){1,2}$/i;
PS: Please explain the regex and the logic too. :D
These should help, but I recommend doing a little background reading first:
What is the best regular expression to check if a string is a valid URL?
Getting parts of a URL (Regex)
Please spend some time reading these links and understanding them. Hope this helps, cheers!
^(http:\/\/){0,1}(www\.[^\W]+\.com)(\/[\w.]+)+
Maybe this would work.
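Since the question also asks for the logic, here is that answer's idea written out in Python with `re.VERBOSE`, so each part can carry its own comment. This is a sketch, not the answerer's exact code: the literal dots are escaped (an unescaped `.` matches any character), `.` is allowed inside the profile segment so that `zobo.786` is accepted, and `https` plus an optional trailing slash are also accepted. Like the original, it only accepts `www.<name>.com` hosts.

```python
import re

AUTHOR_URL_RE = re.compile(r"""
    ^
    (?:https?://)?        # optional scheme, http or https
    www\.\w+\.com         # host: www.<name>.com, with the dots escaped
    (?:/[\w.]+)+          # one or more /segments; the profile id may contain dots
    /?                    # optional trailing slash
    $
""", re.VERBOSE | re.IGNORECASE)

for url in ["www.facebook.com/RyanMathews",
            "www.mouthshut.com/zobo.786",
            "http://www.facebook.com/RyanMathews/",
            "www.facebook.com"]:
    print(url, bool(AUTHOR_URL_RE.match(url)))
```

The `+` after the segment group is what enforces the requirement that an author ID follow the site address: a bare `www.facebook.com` is rejected.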

Regexp to simplify Yahoo Answers Feed Title

I am trying to parse the Yahoo Answers feed - http://answers.yahoo.com/rss/allq
The issue is that every title contains
[ Category ] : Open Question :
which I do not want. I want to write a regexp to remove it.
Anything that removes all the characters from the opening [ through the first : should do it.
There is also a space after the :, which needs to be removed too.
Thanks in advance; I will also try to find a solution myself.
Have you considered using Yahoo's YQL service to parse this feed (or other web pages)?
Querying html using Yahoo YQL
Yahoo! Query Language
YQL Console
They already have sample queries for you to get at Yahoo Answers data:
answers.getbycategory:
http://developer.yahoo.com/yql/console/#h=select%20*%20from%20answers.getbycategory%20where%20category_id%3D2115500137%20and%20type%3D%22resolved%22
answers.getbyuser:
http://developer.yahoo.com/yql/console/#h=select%20*%20from%20answers.getbyuser%20where%20user_id%3D%22YbaMGtHFaa%22
answers.getquestion:
http://developer.yahoo.com/yql/console/#h=select%20*%20from%20answers.getquestion%20where%20question_id%3D%2220090526102023AAkRbch%22
answers.search:
http://developer.yahoo.com/yql/console/#h=select%20*%20from%20answers.search%20where%20query%3D%22cars%22%20and%20category_id%3D2115500137%20and%20type%3D%22resolved%22
(Just an FYI in case you weren't aware of this convenient service. I use it instead of screen scraping with regexes.)
The following regex should do the job:
^\[.*?:
Usage sample in C#:
// requires: using System.Text.RegularExpressions;
string resultString = Regex.Replace(subjectString, @"^\[.*?: ", "");
It starts at an opening [ bracket, takes any characters (non-greedily) up to the first :, and also consumes the following space.
Hope this helps,
Tom.
Thanks @cmptrgeekken for pointing out the non-greedy thing!
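The same replacement sketched in Python, with a made-up sample title. As the answer describes, `^\[.*?: ` removes everything up to and including the first `: `, which leaves the `Open Question : ` part in place. If the goal is to drop both prefix segments, a second non-greedy `.*?: ` can be appended; that variant assumes every title has exactly two such segments before the actual question, which has not been checked against the feed. The greedy version is included only to show why the non-greedy `?` matters.

```python
import re

title = "[ Sports ] : Open Question : Score: 2-1?"

# The answer's pattern: from the opening [ up to the first ": " (non-greedy).
first_only = re.sub(r'^\[.*?: ', '', title, count=1)

# Assumed variant: also drop the next ": "-terminated segment ("Open Question : ").
both_segments = re.sub(r'^\[.*?: .*?: ', '', title, count=1)

# For contrast, a greedy .* runs to the LAST ": ", eating part of the question.
greedy = re.sub(r'^\[.*: ', '', title, count=1)

print(first_only)     # Open Question : Score: 2-1?
print(both_segments)  # Score: 2-1?
print(greedy)         # 2-1?
```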