Regex to select 3 pages + any sub-pages

I need a regex for a Google Analytics goal that matches only 3 pages and their child pages.
The pages are:
https://test.com/employers/
https://test.com/employers/any-page-in-employers-folder
https://test.com/partners/
https://test.com/partners/any-page-in-partners-folder
https://test.com/brokers/
https://test.com/brokers/any-page-in-brokers-folder
I appreciate your help.

In Analytics you can use this general pattern:
\/employers\/|\/partners\/|\/brokers\/
If you want it to be more robust and the domain does not appear in your reports, you can use this:
^\/employers\/|^\/partners\/|^\/brokers\/
If your reports also include the protocol and domain, you can use this:
https:\/\/test\.com(\/employers\/|\/partners\/|\/brokers\/)
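A quick way to check these patterns before pasting them into a goal is to run them against sample values. The sketch below uses Python's re module and assumes that Analytics matches a goal regex anywhere in the value unless it is anchored with ^, which is what re.search does. The paths "/about-us/" and "/blog/employers/some-post" are hypothetical, added only to show what the ^ anchor rules out.

```python
import re

# The three patterns from the answer above.
UNANCHORED = re.compile(r"\/employers\/|\/partners\/|\/brokers\/")
ANCHORED = re.compile(r"^\/employers\/|^\/partners\/|^\/brokers\/")
WITH_DOMAIN = re.compile(r"https:\/\/test\.com(\/employers\/|\/partners\/|\/brokers\/)")

PATHS = [
    "/employers/",
    "/employers/any-page-in-employers-folder",
    "/partners/",
    "/brokers/any-page-in-brokers-folder",
    "/about-us/",                 # hypothetical page that should never match
    "/blog/employers/some-post",  # hypothetical page that merely contains /employers/
]

def matching(pattern, values):
    """Return the values in which the pattern finds a match (search semantics)."""
    return [v for v in values if pattern.search(v)]

print(matching(UNANCHORED, PATHS))  # includes /blog/employers/some-post
print(matching(ANCHORED, PATHS))    # only paths that start with one of the folders
```

The unanchored version also picks up any URL that merely contains one of the folder names, which is why the ^ version is the more robust choice when the report shows paths only.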

Related

Regular expression to get specific pages out of a list of landing pages in Google Analytics

In Google Analytics, I need to select landing pages for each hotel my client operates. Hotel pages are identified by the string /hotels-in-XYZ/.
I need to exclude all other pages.
I need to exclude subpages such as /hotels-in-XYZ/offer-page/ too.
Sample list of landing pages:
/XXX-one/login/
/hotels-in-ranthambhore/
/hotels-in-jaipur-resort/
/hotels-in-morocco-marrakech/
/about-us/
/hotels-in-mumbai/
/hotels-in-bengaluru/
/hotels-in-agra-resort/special-offers/extended-stay-offer/
/hotels-in-shimla/amp/
/hotels-in-udaipur-resort/amp/
I'm not that familiar with regex, and I've been googling to find a solution. The closest I have is .*?\/hotels(.*)\/.* but it does not exclude pages like /hotels-in-shimla/amp/
Your help would be appreciated. Let me know if I need to post any additional information to explain the question better.
Does ^\/hotels-in-[\w\-]+\/$ work for you?
I tested this at https://regex101.com/r/9c2IRC/1/
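To see the difference between the two patterns on the sample list from the question, here is a small Python check (a sketch using Python's re module rather than Analytics itself). The answer's pattern allows only letters, digits, underscores and hyphens between "/hotels-in-" and a single closing slash, and $ forbids anything after it, so subpages such as /amp/ are excluded.

```python
import re

ANSWER = re.compile(r"^\/hotels-in-[\w\-]+\/$")
ASKER = re.compile(r".*?\/hotels(.*)\/.*")

LANDING_PAGES = [
    "/XXX-one/login/",
    "/hotels-in-ranthambhore/",
    "/hotels-in-jaipur-resort/",
    "/hotels-in-morocco-marrakech/",
    "/about-us/",
    "/hotels-in-mumbai/",
    "/hotels-in-bengaluru/",
    "/hotels-in-agra-resort/special-offers/extended-stay-offer/",
    "/hotels-in-shimla/amp/",
    "/hotels-in-udaipur-resort/amp/",
]

def matching(pattern, values):
    """Return the values in which the pattern finds a match."""
    return [v for v in values if pattern.search(v)]

print(matching(ANSWER, LANDING_PAGES))  # only the five top-level hotel pages
print(matching(ASKER, LANDING_PAGES))   # every /hotels... page, subpages included
```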

How can I use a regex to validate slideshare slideshow URLs?

I am using www.slideshare.net to allow my users to display embedded slideshows on their profiles.
I'm using SlideShare's API to get the slideshow's id from the slideshow link, which users get by clicking 'Share' on the slideshow and copying/pasting the URL:
What I need is to thoroughly validate that URL.
To explain my process further: once I have the slideshow's id, I build the embed code like so:
"<iframe src='https://www.slideshare.net/slideshow/embed_code/" + json.slideshow_id + "' frameborder='0' allowfullscreen webkitallowfullscreen mozillaallowfullscreen></iframe>"
where json is the object returned by SlideShare's API.
A basic regex to answer my question would be:
^http\://www\.slideshare\.net/[a-zA-Z0-9\-]+/[a-zA-Z0-9\-]+$
But it feels a little weak to me:
I don't want my users to just copy/paste the URL from the browser's address bar.
I'm not sure this regex works for all SlideShare slideshows, as I'm not a SlideShare specialist (does that even exist?).
Ideally I would like to exclude all other URLs on www.slideshare.net that don't point to a slideshow.
EDIT 7/12/2014: rewrite
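One way to make the basic check a little stricter is to parse the URL instead of matching it as one string. The sketch below is a hypothetical pre-check in Python: it assumes, as the regex above does, that slideshow URLs have the form /<user>/<slug> using the same character class, and it accepts both http and https. It cannot tell a slideshow from any other two-segment page on the site; only the API lookup returning a slideshow id can confirm that.

```python
import re
from urllib.parse import urlparse

# Same character class as the regex above, applied to each path segment.
SLUG = re.compile(r"[a-zA-Z0-9\-]+")

def looks_like_slideshow_url(url: str) -> bool:
    """Hypothetical pre-check before calling the API: http(s), host
    www.slideshare.net, exactly /<user>/<slug>, no query or fragment."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or parts.netloc != "www.slideshare.net":
        return False
    if parts.params or parts.query or parts.fragment:
        return False
    # "/user/deck".split("/") gives ["", "user", "deck"]
    segments = parts.path.split("/")
    return (len(segments) == 3 and segments[0] == ""
            and all(SLUG.fullmatch(s) for s in segments[1:]))
```

Usage: looks_like_slideshow_url("https://www.slideshare.net/someuser/some-deck") is True, while the same path on another host, or a URL with only the user segment, is rejected (all example URLs here are hypothetical).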
You can use something like this:
(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?
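The pattern in this answer is a general URL matcher. A quick check in Python (using re.fullmatch so the whole string must match) shows that it accepts a slideshow URL and an unrelated URL alike, so on its own it does not restrict input to SlideShare; the example URLs are hypothetical.

```python
import re

GENERIC_URL = re.compile(
    r"(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?"
)

for url in ["https://www.slideshare.net/someuser/some-deck", "https://example.com/any/page"]:
    # Both print True: the host is not checked at all.
    print(url, bool(GENERIC_URL.fullmatch(url)))
```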
More examples from this website

Yahoo Pipes and Website Name

How do I fetch the page name with Yahoo Pipes?
I'm making a news / blog aggregator, and need to know the name of the site where the info is coming from (bbc, cnn, fox, etc).
Do I need to do this with regex?
Can anyone help?
You can fetch the page using the XPath Fetch Page or Fetch Feed modules in the Sources menu, and possibly with others too.
After that you can extract the page name itself using the various operators, possibly Regex, or others, depending on the source page you are using and the output you want to get.
In general your question is too broad and difficult to answer. To get you started, I created an example pipe that extracts the title of your question from this post, which is basically the "page name" of the current page.
http://pipes.yahoo.com/pipes/pipe.info?_id=668acf3f807c30d7b75f12459edd3252
I used the XPath Fetch Page with parameters:
URL = this page
Extract using XPath = //div[@id="question-header"]
I got that div path by inspecting the source code of this page, where I saw that div#question-header is the container of a question. I could have selected a deeper inner container or a higher-level one; it all depends on how much other information you need. The more information you want to use from the page, the higher-level the container you should select.
Next, I used the Create RSS operator to create a proper RSS feed, with parameters:
Title = h1.a
Link = h1.a.href
I chose these elements because in the container I extracted with xpath, the page name is inside h1 a. In Yahoo Pipes you use a dot as the path separator.
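Yahoo Pipes itself is no longer available, but the same two steps (select the container by XPath, then read h1/a inside it) can be sketched in Python with the standard-library ElementTree, whose limited XPath supports [@attr='value'] predicates. This is a swapped-in technique, not the pipe above, and it only works on well-formed XHTML; the page fragment and its link are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML fragment standing in for the fetched page.
PAGE = """<html><body>
<div id="question-header">
  <h1><a href="/questions/example">Yahoo Pipes and Website Name</a></h1>
</div>
</body></html>"""

def page_name(xhtml):
    """Return (title, href) of the h1 link inside div#question-header, or None."""
    root = ET.fromstring(xhtml)
    header = root.find(".//div[@id='question-header']")  # same path as the pipe
    if header is None:
        return None
    link = header.find("h1/a")  # equivalent of h1.a in Yahoo Pipes
    if link is None:
        return None
    return link.text, link.get("href")

print(page_name(PAGE))
```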
I found this sample pipe: http://pipes.yahoo.com/pipes/pipe.info?_id=69b5dce1c59501a0c64a660c1cfdb856. Its page title included the name of the site too. I am not sure if this is what you are looking for.

How to Write Regex for Google Analytics Funnel Containing Dynamic URLs

I'm trying to build a funnel for pages with dynamic URLs. My regex-fu is terrible. I'm trying to see how users do on one of our wizards. The URLs I care about all have each project's name in them.
/projects/<PROJECT_NAME>/wizard_steps/1
/projects/<PROJECT_NAME>/wizard_steps/2
/projects/<PROJECT_NAME>/wizard_steps/3
So I think I need to do something like this in order to allow for these dynamic URLs.
/projects/?.*$/wizard_steps/1
/projects/?.*$/wizard_steps/2
/projects/?.*$/wizard_steps/3
Does this look correct? Any guidance would be deeply appreciated.
Try this regular expression:
/projects/([^/]+)/wizard_steps/.*
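Two details are worth checking in Python. In the question's patterns, the $ sits in the middle, and $ means end of string, so nothing can follow it and those patterns never match. The answer's pattern, with its trailing .*, matches every wizard step at once; a funnel needs one URL per step, so the sketch adds a hypothetical step_pattern helper that puts the step number in place of .* and anchors with ^ and $ (so step 1 does not also match step 10). If your funnel URLs carry query strings, the $ would need relaxing. The project name "acme" is made up.

```python
import re

ANSWER = re.compile(r"/projects/([^/]+)/wizard_steps/.*")
ASKER_STEP_1 = re.compile(r"/projects/?.*$/wizard_steps/1")

def step_pattern(step):
    """Hypothetical per-step variant: any project name, one exact step number."""
    return re.compile(rf"^/projects/[^/]+/wizard_steps/{step}$")

url = "/projects/acme/wizard_steps/1"
print(bool(ANSWER.search(url)), ANSWER.search(url).group(1))  # True acme
print(ASKER_STEP_1.search(url))                                # None: "$/" cannot match
print(bool(step_pattern(1).search(url)), bool(step_pattern(2).search(url)))  # True False
```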

Parse exported bookmarks file with ColdFusion

I need to parse a list of bookmarks exported from browsers such as Chrome, Firefox and IE, and maybe even Google etc.
I played around and did something like this: looping over reMatchNoCase("(<h3)(.*?)(</dl>)",myfile1), then using reMatchNoCase("(<dt[>])(.*?)(</a>)",i) within the h3/dl tags, followed by a lot of cleanup, but it's really not reliable.
The thing is that the categories are marked with h3 tags inside dl tags, with the bookmarks nested within them. I can't just parse all the URLs, since I want to keep the categories as they appear in the browser.
Thanks.
If it is XHTML, use XPath.
If it is not, it won't be easy. Search https://stackoverflow.com/search?q=parse+html
Could you consider a hybrid approach: parse with jQuery on the client side first, then post the result to CF?
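To illustrate parsing with a real HTML parser instead of regex, here is a sketch in Python (not ColdFusion) using the standard-library html.parser. It assumes the export follows the common Netscape bookmark layout, where a DT/H3 names a folder, the DL right after it holds that folder's contents, and each bookmark is a DT/A. It keeps a stack of open folders so every link is reported with its category path; the sample markup is hypothetical.

```python
from html.parser import HTMLParser

class BookmarkParser(HTMLParser):
    """Collect (folder path, title, href) tuples from a Netscape-style export."""

    def __init__(self):
        super().__init__()
        self.folders = []           # folder name for each open <dl> (None for the root)
        self.pending_folder = None  # name from the last </h3>, attached to the next <dl>
        self.current = None         # "h3" or "a" while collecting their text
        self.text = []
        self.href = None
        self.bookmarks = []

    def handle_starttag(self, tag, attrs):  # tag names arrive lowercased
        if tag == "dl":
            self.folders.append(self.pending_folder)
            self.pending_folder = None
        elif tag in ("h3", "a"):
            self.current, self.text = tag, []
            if tag == "a":
                self.href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "dl" and self.folders:
            self.folders.pop()
        elif tag == self.current:
            text = "".join(self.text).strip()
            if tag == "h3":
                self.pending_folder = text
            else:
                path = "/".join(f for f in self.folders if f)
                self.bookmarks.append((path, text, self.href))
            self.current = None

    def handle_data(self, data):
        if self.current:
            self.text.append(data)

SAMPLE = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<DL><p>
    <DT><H3>News</H3>
    <DL><p>
        <DT><A HREF="https://example.com/">Example</A>
    </DL><p>
    <DT><A HREF="https://example.org/">Top</A>
</DL><p>"""

parser = BookmarkParser()
parser.feed(SAMPLE)
print(parser.bookmarks)
```

The same stack idea carries over to the jQuery approach suggested above: walk the DL elements recursively and post the resulting list to CF.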