Regex to exclude folders in Google Analytics

I have a website with multiple subfolders (e.g. /uk/, /australia/, /canada/, etc.), as well as the root website.
In Google Analytics, I would like to use regex to filter out the subfolders so I can see the stats for the main domain pages only (without the pages from the /uk/, /australia/, and /canada/ subfolders). How can I do this?

Use the following regexp for the pagePath to exclude all subfolders:
^/[^/]+$
To exclude specific subfolders, use this:
^(?!/(uk|australia|canada|etc...))/.*$
(?!RE) is a negative lookahead.
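As a quick sanity check outside Google Analytics, here is a minimal sketch using Python's re module (the subfolder names come from the question; the other sample page paths are made up for illustration):
import re
top_level_only = re.compile(r"^/[^/]+$")                      # keeps only pages with no subfolder segment
not_country = re.compile(r"^(?!/(uk|australia|canada))/.*$")  # keeps everything except the listed subfolders
for path in ["/about", "/pricing", "/uk/", "/uk/contact", "/australia/pricing"]:
    print(path, bool(top_level_only.match(path)), bool(not_country.match(path)))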

Related

Excluding a URL With Google Analytics Regex

I am tracking several URLs on my website and I want to count only the ones beginning with /espace-debat.
Examples:
/espace-debat/debat
/espace-debat/user/random-number
/espace-debat/debats/random-number
I am creating a goal in Analytics to exclude all the other URLs.
I am thinking about this regex:
^/(?espace-debat)
I don't know how to test it.
Have you tried escaping your expression?
^\/(espace-debat)\/?
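One quick way to test it is a minimal sketch with Python's re module (the matching paths are taken from the question; /autre-page is a made-up non-matching example):
import re
pattern = re.compile(r"^\/(espace-debat)\/?")
for path in ["/espace-debat/debat",
             "/espace-debat/user/random-number",
             "/espace-debat/debats/random-number",
             "/autre-page"]:
    print(path, bool(pattern.match(path)))  # only the /espace-debat paths match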

Google Analytics - Track Multiple Subdirectories (regex)

I am trying to create a view in Google Analytics to filter out the analytics from multiple subdirectories and all pages in them.
www.example.com/mysite
www.example.com/anothersite
www.example.com/lastsite
This is the regex I have written, but when I run it, no results are returned:
^/(mysite|anothersite|lastsite)?/*
Any ideas what I am doing wrong?
I was able to find a solution by first adding a trailing slash to the URLs (see: https://www.getelevar.com/how-to/fix-duplicate-url-google-analytics/) and then using a Request URI regex pattern of:
^/(mysite|anothersite|lastsite)/
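To compare the original pattern with the trailing-slash fix, here is a minimal sketch in Python (the request URIs are the question's directories with the trailing slash added, plus a made-up /otherpage):
import re
original = re.compile(r"^/(mysite|anothersite|lastsite)?/*")  # group and slashes are optional, so any path starting with / matches
fixed = re.compile(r"^/(mysite|anothersite|lastsite)/")       # requires one of the subdirectories followed by a slash
for uri in ["/mysite/", "/anothersite/", "/lastsite/", "/otherpage"]:
    print(uri, bool(original.match(uri)), bool(fixed.match(uri)))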

JMeter URL patterns to exclude under WorkBench are not excluding the patterns given there

The URL patterns I give under "URL Patterns to Exclude" in the WorkBench are not being excluded from the recording.
Can we give direct URLs? I have a list of URLs that need to be excluded from the recorded script.
Example:
safebrowsing.google.com
safebrowsing-cache.google.com
self-repair.mozilla.org
I'm giving these directly under Patterns to Exclude. Or do I need to give them as regular expressions only?
Can someone clarify whether regular expressions must be used, or whether direct URLs can be provided under Requests Filtering in the WorkBench?
JMeter uses Perl5-style regular expressions for defining URL patterns to include/exclude, so you can filter out all the requests to/from the Google and Mozilla domains by adding the following regular expression to the "URL Patterns to Exclude" input of the HTTP(S) Test Script Recorder:
^((?!google|mozilla).)*$
See the Excluding Domains From The Load Test article for more details.
If you want certain patterns to be excluded from the recorded scripts, add them under "URL Patterns to Exclude" following the examples below:
1. For .html: .*\.html.*
2. For .gif: .*\.gif.*
and so on for other extensions.
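As a quick check that those patterns behave as intended, here is a minimal sketch in Python (the sample URLs are made up for illustration):
import re
html_pattern = re.compile(r".*\.html.*")
gif_pattern = re.compile(r".*\.gif.*")
for url in ["http://example.com/index.html",
            "http://example.com/images/logo.gif?v=2",
            "http://example.com/api/data"]:
    print(url, bool(html_pattern.match(url)), bool(gif_pattern.match(url)))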

Regarding crawling urls for Google search appliance

We have a requirement where we need to crawl one particular set of URLs.
Say, for example, we have the site abc.com. We need to crawl abc.com/test/needed -- all URLs matching this pattern under the "needed" folder. But we don't want to crawl the rest of the URLs under abc.com/test/.
I guess this will be done using regex. Can anyone help me with the regex?
Going from what you said in the comment, here is a pattern to match things of the form /xyz but not things of the form /xyz/imp:
/xyz(/[^i][^m][^p].*)?|/xyz/.{0,2}
The pattern that can be added to the GSA can be:
abc.com/test/needed
or
contains:abc.com/test/needed
The thing to consider is how the GSA will get to these documents. If it can't spider to the folder, it won't find the documents.
There are three specifications that you are allowed to make in the GSA:
Start Crawl URLs -- these tell the GSA where to start looking for links.
Follow and crawl only URL patterns -- these tell the GSA which URLs from among those found starting with the "Start Crawl URLs", need to be followed and indexed.
Do not crawl URLs -- these are URL patterns that match the above two specifications but should not be crawled.
From what has been specified in the question, I think all you'll need to do is put in a "Start Crawl URLs" entry of abc.com/ and a "Follow and crawl only" specification of abc.com/test/needed/, assuming you need no other path/folder on the site crawled.
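Putting that together, the crawl configuration sketched above would look roughly like this (a sketch using only the values from the question; adjust to your actual site):
Start Crawl URLs:
    abc.com/
Follow and crawl only URL patterns:
    abc.com/test/needed/
Do not crawl URLs:
    (nothing needed in this case)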

Nutch - why are my url exclusions not excluding those urls?

Surprise! I have another Apache Nutch v1.5 question. So in crawling and indexing our site to Solr via Nutch, we need to be able to exclude any content that falls under a certain path.
So say we have our site: http://oursite.com/ and we have a path that we don't want to index at http://oursite.com/private/
I have http://oursite.com/ in the seed.txt file and +^http://www.oursite.com/([a-z0-9\-A-Z]*\/)* in the regex-urlfilter.txt file
I thought that putting: -.*/private/.* also in the regex-urlfilter.txt file would exclude that path and anything under it, but the crawler is still fetching and indexing content under the /private/ path.
Is there some kind of restart I need to do on the server, like Solr? Or is my regex not actually the right way to do this?
thanks
My guess is that the URL is accepted by the first regex, so the second one is never checked. If you want to deny URLs, put their regexes first in the list.
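A regex-urlfilter.txt ordered like this (a sketch reusing the patterns already quoted in the question) should therefore reject the /private/ URLs before the broader accept rule is ever consulted:
# reject anything under /private/ -- this must come before the accept rule,
# because the first matching rule wins and later rules are not checked
-.*/private/.*
# accept everything else on the site
+^http://www.oursite.com/([a-z0-9\-A-Z]*\/)*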