I'm trying to filter some URLs using gapi.client.analytics. What I want to achieve is a regex filter that covers a lot of options. The regex should keep only URLs that have this structure:
subdomain1.domain.com/some-post/
My problem is that I have some other urls that I don't know how to exclude, like:
subdomain1.domain.com/p/code/
subdomain1.domain.com/
subdomain1.domain.com/some-author/some-name/
subdomain2.domain.com/some-post/
subdomain2.domain.com/p/code/
I tried ga:hostname=#subdomain1.domain.com to get links that contain only subdomain1.
I also tried ga:hostname=~^[^/]+/?[^/]+/?$ to keep only URLs that have two / characters.
Unfortunately, I couldn't manage to do what I want.
The following regex should match URLs with exactly one trailing directory:
^[a-zA-Z0-9_-]+\.domain\.com\/[a-zA-Z0-9_-]+\/$
or
^[a-zA-Z0-9_\-\.]+\/[a-zA-Z0-9_-]+\/$
to match every domain.
You can test Google Analytics regexes on analyticsmarket.com
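As a quick sanity check, here is a minimal Python sketch that runs both patterns from the answer against the URLs listed in the question. Python's `re` module is only a stand-in for the Google Analytics regex engine, and `SUBDOMAIN1_ONLY` is a hypothetical variant that is not part of the answer: it is added because the first pattern accepts any subdomain, so subdomain2.domain.com/some-post/ still gets through.

```python
import re

# Patterns copied from the answer above.
SAME_DOMAIN = re.compile(r"^[a-zA-Z0-9_-]+\.domain\.com\/[a-zA-Z0-9_-]+\/$")
ANY_DOMAIN = re.compile(r"^[a-zA-Z0-9_\-\.]+\/[a-zA-Z0-9_-]+\/$")

# Hypothetical variant (an assumption, not from the answer):
# pin the host to subdomain1 so other subdomains are excluded too.
SUBDOMAIN1_ONLY = re.compile(r"^subdomain1\.domain\.com\/[a-zA-Z0-9_-]+\/$")

# URLs taken from the question.
URLS = [
    "subdomain1.domain.com/some-post/",
    "subdomain1.domain.com/p/code/",
    "subdomain1.domain.com/",
    "subdomain1.domain.com/some-author/some-name/",
    "subdomain2.domain.com/some-post/",
    "subdomain2.domain.com/p/code/",
]


def kept(pattern, urls):
    """Return the URLs that the given compiled pattern matches."""
    return [u for u in urls if pattern.search(u)]


if __name__ == "__main__":
    for name, pattern in [
        ("SAME_DOMAIN", SAME_DOMAIN),
        ("ANY_DOMAIN", ANY_DOMAIN),
        ("SUBDOMAIN1_ONLY", SUBDOMAIN1_ONLY),
    ]:
        print(name, kept(pattern, URLS))
```

Both patterns from the answer keep the two /some-post/ URLs; only the pinned variant narrows the result to subdomain1.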
I'm using a web crawler tool to compare two different website crawls before and after migration and need to map paginated URLs that have changed format.
e.g
Old: https://example.com/page/2/ OR: https://example.com/directory/page/16/
New: https://example.com/?page=2 OR: https://example.com/directory/?page=16
The tool has a regex replace feature for URL mapping.
However, I cannot get the regex right, and the result has an extra forward slash at the end:
https://example.com/?page=2/
What is the correct regex here to get the result I'm looking for?
Regex: /page/([0-9]+)/
Replace: /?page=$1
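One possible cause of the stray slash is that the pattern actually entered in the tool does not consume the trailing `/`. The sketch below reproduces the mapping in Python under that assumption: the pattern makes the trailing slash optional and anchors the match at the end of the URL. The crawler tool's own regex flavor is unknown here; Python writes the back-reference as `\1` where the tool uses `$1`, and `map_old_to_new` is a hypothetical helper name.

```python
import re

# Assumed pattern: optional trailing slash, anchored at the end of the URL,
# so the slash after the page number is always consumed by the match.
PAGINATION = re.compile(r"/page/([0-9]+)/?$")


def map_old_to_new(url: str) -> str:
    """Rewrite /page/N/ at the end of a URL to /?page=N."""
    return PAGINATION.sub(r"/?page=\1", url)


if __name__ == "__main__":
    for old in [
        "https://example.com/page/2/",
        "https://example.com/directory/page/16/",
    ]:
        print(old, "->", map_old_to_new(old))
```

URLs without a pagination segment pass through unchanged, since `sub` only rewrites what the pattern matches.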
I want to see data only for URLs that contain collection + category in Google Analytics, that is, URLs of the form /collections/category, for example: https://baileynelson.com.au/collections/glasses
However, I don't want to see data for products, for example: https://baileynelson.com.au/collections/glasses/products/adler
The regex I created is ^/collections/(.*?)$, but it seems to include product URLs.
Any ideas on how to write a regex so that collection pages like https://baileynelson.com.au/collections/glasses and https://baileynelson.com.au/collections/sunglasses are included but product URLs are excluded?
Cheers!
Try using this regex:
https:\/\/baileynelson\.com\.au\/collections\/[\w]+
The first part: https:\/\/baileynelson\.com\.au\/collections\/ matches the domain and the path collections. The / and . characters are escaped.
The second part: [\w]+ matches word characters (letters, digits and underscore), and the + makes it match one or more of them.
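One caveat worth checking: without an end anchor, a partial-match engine can still accept product URLs, because the pattern is satisfied by the leading part of the string. The Python sketch below illustrates this with `re.search` as a stand-in for the Google Analytics matcher. `ANCHORED` is an assumed variant that is not part of the answer. If the report dimension holds only the page path rather than the full URL, the same idea would presumably apply to a path-only pattern such as `^/collections/[\w]+$`.

```python
import re

# Pattern copied from the answer above (no anchors).
ANSWER = re.compile(r"https:\/\/baileynelson\.com\.au\/collections\/[\w]+")

# Assumed variant (not from the answer): anchored at both ends,
# so nothing may follow the category name.
ANCHORED = re.compile(r"^https:\/\/baileynelson\.com\.au\/collections\/[\w]+$")

COLLECTION = "https://baileynelson.com.au/collections/glasses"
PRODUCT = "https://baileynelson.com.au/collections/glasses/products/adler"

if __name__ == "__main__":
    for name, pattern in [("ANSWER", ANSWER), ("ANCHORED", ANCHORED)]:
        print(name, "collection:", bool(pattern.search(COLLECTION)),
              "product:", bool(pattern.search(PRODUCT)))
```

Note that `\w` does not include `-`, so a hyphenated category name would need the character class extended, for example to `[\w-]+`.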
Basically I want to filter for pages in this format:
/page
but I don't want pages like this:
/dir/page/page
dir/page
Is there any way to accomplish this using the regex filter in Google Analytics?
I tried the following:
/*/ but it's not working at all.
Try this (to match pages like /home):
^\/[0-9A-Za-z]+$
If you need to allow other characters, add them inside the brackets (e.g. - and . to match pages like /home, /store.html or /page-path):
^\/[0-9A-Za-z\-\.]+$
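To see how the two patterns behave, here is a small Python sketch that tries them on the paths mentioned in the question and in this answer. Python's `re` is used only as an approximation of the Google Analytics filter, and the constant names are illustrative.

```python
import re

# Patterns copied from the answer above.
LETTERS_DIGITS = re.compile(r"^\/[0-9A-Za-z]+$")
WITH_DASH_DOT = re.compile(r"^\/[0-9A-Za-z\-\.]+$")

# Paths from the question and the answer.
PATHS = ["/page", "/home", "/store.html", "/page-path", "/dir/page/page", "dir/page"]


def matching(pattern, paths):
    """Return the paths accepted by the given compiled pattern."""
    return [p for p in paths if pattern.search(p)]


if __name__ == "__main__":
    print("letters/digits only:", matching(LETTERS_DIGITS, PATHS))
    print("with - and . allowed:", matching(WITH_DASH_DOT, PATHS))
```

Because `/` is not in either character class and both patterns are anchored, any path with a second slash, or without a leading one, is rejected.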
I'm trying to exclude (in a Goal) a character in a regex in Google Analytics.
Basically, I have two pages with the following URL:
/signup/done/b
/signup/done/bp
Note that both might also have UTM parameters appended in some results.
I am trying to measure only /done/b
The Regex I had was the following, but it includes both strings:
(/signup/done/plan/b)
When I changed it (and verified it in an external regex tester) I got 0 results, so the /b/ was also not included.
(/signup/done/plan/b[^p])
This regex would handle the case where the URL ends with /b or if there are query parameters:
/signup/done/b($|\?.*)
So examples of converting URLs would be:
/signup/done/b
/signup/done/b?utm_campaign=test&utm_medium=display
/signup/done/b?query=value
Examples of non-converting URLs would be:
/signup/done/bd
/signup/done/b/something
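The examples above can be checked with a short Python sketch. It uses `re.search` as a stand-in for the Google Analytics goal matcher, and `converts` is a hypothetical helper name.

```python
import re

# Pattern copied from the answer: /b must be followed by the end of the
# string or by a query string starting with "?".
GOAL = re.compile(r"/signup/done/b($|\?.*)")


def converts(path: str) -> bool:
    """Return True if the path would count towards the goal."""
    return GOAL.search(path) is not None


if __name__ == "__main__":
    for path in [
        "/signup/done/b",
        "/signup/done/b?utm_campaign=test&utm_medium=display",
        "/signup/done/bp",
        "/signup/done/b/something",
    ]:
        print(path, converts(path))
```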
I want to crawl the TechCrunch pages published after 1 Jan 2013. The website follows the pattern
http://www.techcrunch.com/YYYY/MM/DD
So my question is how to set up the regex in the Nutch urlfilter so that I crawl only the pages I want.
+^http://www.techcrunch.com/2013/dd/dd/([a-z0-9\-A-Z]*\/)*
I don't know Nutch, but did you try:
+^http://www.techcrunch.com/2013/[0-9]{2}/[0-9]{2}.*$
or
+^http://www.techcrunch.com/2013/[0-9]+/[0-9]+.*$
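As a rough check outside Nutch, the two patterns can be tried in Python. This sketch assumes the leading `+` is the Nutch include marker rather than part of the regex, so it is dropped here; the sample URLs are made up for illustration.

```python
import re

# Patterns from the answer, without the leading "+" (assumed to be the
# Nutch include marker, not regex syntax).
TWO_DIGITS = re.compile(r"^http://www.techcrunch.com/2013/[0-9]{2}/[0-9]{2}.*$")
ANY_DIGITS = re.compile(r"^http://www.techcrunch.com/2013/[0-9]+/[0-9]+.*$")

# Hypothetical sample URLs.
SAMPLES = [
    "http://www.techcrunch.com/2013/05/17/",
    "http://www.techcrunch.com/2013/05/17/some-post/",
    "http://www.techcrunch.com/2012/12/31/",
    "http://www.techcrunch.com/about/",
]


def accepted(pattern, urls):
    """Return the URLs that the pattern would let through."""
    return [u for u in urls if pattern.search(u)]


if __name__ == "__main__":
    print(accepted(TWO_DIGITS, SAMPLES))
    print(accepted(ANY_DIGITS, SAMPLES))
```

Both patterns only cover the year 2013; later years would need the year segment widened. The unescaped `.` characters in the host also match any character, which is harmless here but looser than strictly needed.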
The following expressions will match the URLs you need:
Without groups
http:\/\/www.techcrunch.com\/\d{4}\/\d{2}\/\d{2}\/\w+
With groups
http:\/\/www.techcrunch.com\/(\d{4})\/(\d{2})\/(\d{2})\/(\w+)
I didn't put anchors (^$), but you can put them if you need them for the filtering.
Try them to see if any of them work.
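The grouped version makes it possible to compare the captured date against a cutoff, which a single year-specific pattern cannot do. The sketch below is a post-filter in plain Python, not a Nutch configuration. It reads "after 1 Jan 2013" as "on or after" that date, which is an assumption, and `published_on_or_after_cutoff` and the sample URLs are hypothetical.

```python
import re
from datetime import date

# Grouped pattern copied from the answer above.
WITH_GROUPS = re.compile(
    r"http:\/\/www.techcrunch.com\/(\d{4})\/(\d{2})\/(\d{2})\/(\w+)"
)

# Assumed cutoff, taken from the question's "1 Jan 2013".
CUTOFF = date(2013, 1, 1)


def published_on_or_after_cutoff(url: str) -> bool:
    """Return True if the URL carries a valid date on or after CUTOFF."""
    match = WITH_GROUPS.search(url)
    if match is None:
        return False
    year, month, day = (int(match.group(i)) for i in (1, 2, 3))
    try:
        return date(year, month, day) >= CUTOFF
    except ValueError:
        # Digits that do not form a real calendar date, e.g. month 13.
        return False


if __name__ == "__main__":
    for url in [
        "http://www.techcrunch.com/2013/01/15/example_post",
        "http://www.techcrunch.com/2012/12/31/example_post",
    ]:
        print(url, published_on_or_after_cutoff(url))
```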
I don't know how nutch works, but a couple of suggestions about your regex that may apply: the / in the regexp should be escaped; the dd parts should be \d\d so they match two digits.
About setting up the regex, check out this answer to see if it helps you.