Google Data Studio - Custom Field REGEXP_EXTRACT - regex

I am trying to use the REGEXP_EXTRACT custom field to pull a portion of my URL using the page dimension in Google Data Studio and cannot figure it out. The page url structure is similar to this -
website.forum.com/webforms/great_practiceinfo_part2.aspx?function=greatcoverage
I'd like to only extract the middle section "great_practiceinfo_part2". I've tried many different formulas, but nothing seems to work. Does the page dimension work in this scenario? Any help would be much appreciated.
Thanks

It seemed to work fine in Google Sheets when I used =REGEXEXTRACT(A3,B3) with your string, website.forum.com/webforms/great_practiceinfo_part2.aspx?function=greatcoverage, in A3 and the regex \/([^\/]*?)\.aspx\? in B3. I'm guessing you just need to adjust how you build the regex pattern string.
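The pattern can also be checked outside Sheets. Below is a minimal Python sketch, assuming Python's `re` module treats this particular pattern the same way the Sheets and Data Studio engines do (that equivalence is an assumption, not something tested in Data Studio). `extract_page_name` is a hypothetical helper name used only for illustration.

```python
import re

# Same idea as the B3 pattern above: capture the slash-free segment
# that sits immediately before ".aspx?".
PATTERN = re.compile(r"/([^/]*?)\.aspx\?")

def extract_page_name(url):
    """Return the captured page name, or None when the pattern does not match."""
    match = PATTERN.search(url)
    return match.group(1) if match else None

url = "website.forum.com/webforms/great_practiceinfo_part2.aspx?function=greatcoverage"
print(extract_page_name(url))  # great_practiceinfo_part2
```

The parentheses matter: the capturing group is what isolates the middle section from the surrounding slash and extension. The same pattern would presumably go in as the second argument of REGEXP_EXTRACT, though escaping rules there may differ.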

Related

Regular expression to get specific pages out of a list of landing pages in Google Analytics

In Google Analytics, I need to select landing pages for each hotel my client operates. Hotel pages are identified by the string /hotels-in-XYZ/.
I need to exclude all other pages.
I need to exclude sub pages like /hotels-in-XYZ/offer-page/ too.
Sample list of hotels:
/XXX-one/login/
/hotels-in-ranthambhore/
/hotels-in-jaipur-resort/
/hotels-in-morocco-marrakech/
/about-us/
/hotels-in-mumbai/
/hotels-in-bengaluru/
/hotels-in-agra-resort/special-offers/extended-stay-offer/
/hotels-in-shimla/amp/
/hotels-in-udaipur-resort/amp/
I'm not that familiar with regex and I've been googling to find a solution. The closest I have is .*?\/hotels(.*)\/.* but it does not exclude pages like /hotels-in-shimla/amp/
Your help would be appreciated. Let me know if I need to post any additional information to explain the question better.
Does ^\/hotels-in-[\w\-]+\/$ work for you?
I tested this at https://regex101.com/r/9c2IRC/1/
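To see what the suggested pattern keeps and drops, here is a small Python sketch that runs it over the sample list from the question. It assumes Python's `re` module is close enough to the Analytics regex engine for this pattern; the variable names are illustrative only.

```python
import re

# Anchored pattern: "/hotels-in-<name>/" and nothing after the closing slash.
HOTEL_LANDING = re.compile(r"^/hotels-in-[\w\-]+/$")

pages = [
    "/XXX-one/login/",
    "/hotels-in-ranthambhore/",
    "/hotels-in-jaipur-resort/",
    "/hotels-in-morocco-marrakech/",
    "/about-us/",
    "/hotels-in-mumbai/",
    "/hotels-in-bengaluru/",
    "/hotels-in-agra-resort/special-offers/extended-stay-offer/",
    "/hotels-in-shimla/amp/",
    "/hotels-in-udaipur-resort/amp/",
]

# Keep only the pages that match the whole pattern.
landing_pages = [page for page in pages if HOTEL_LANDING.match(page)]
for page in landing_pages:
    print(page)
```

The `^` and `$` anchors do the excluding: because `[\w\-]+` cannot cross a slash, any path with a further segment such as /amp/ fails to reach the end anchor.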

Analytics Goal Funnel Regex doesn't recognize "example.html?p=2"

I have my goal funnel set up and this is the regex for one of the stages: ^/shop/(.*)
This will match pages such as /shop/collections/art.html but when I look at the goal funnel, it says people are dropping out by going to pages like /shop/collections/art.html?p=2. Notice the ?p=2 is the only difference here.
I tried to do it as ^/shop/((.|\?)*) but I'm not sure that's fixing it.
How do I fix this?
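As a sanity check, the short Python sketch below tests both patterns against both paths. It assumes Analytics evaluates the pattern like a standard regex engine, which is not confirmed here. Under that assumption `.` already matches `?`, so both patterns match the query-string variant, which would point to something other than the pattern itself.

```python
import re

paths = ["/shop/collections/art.html", "/shop/collections/art.html?p=2"]
patterns = [r"^/shop/(.*)", r"^/shop/((.|\?)*)"]

# For each pattern, record whether it matches each path.
results = {
    pattern: [bool(re.search(pattern, path)) for path in paths]
    for pattern in patterns
}

for pattern, matched in results.items():
    print(pattern, matched)
```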

SharePoint-Search 2013 Query Transform keeps appending SPSPeople exclusion

I'm trying to get FQL working in an out-of-the-box Enterprise Search Site Collection in on-premise SharePoint 2013, with no success.
Intended query behavior is to:
- Accept and query search terms
- Limit results to the current subdomain (https://teams.domain.com/...)
- Exclude People from results
Our functioning KQL Query Transform is
{?{searchTerms} {?path:{QueryString.p}} -ContentClass=urn:content-class:SPSPeople}
As instructed on MSDN, I copied the current Result Source (in Site Collection Administration) and modified the Query Transform to:
andnot((and({?{searchTerms}},{?path:{QueryString.p}})),(filter(contentclass:"urn:content-class:SPSPeople*")))
I tried other variations as well but none work.
Even more puzzling to me, when I go from the "Basics" tab to the "Test" tab and click "Show more", the Query text box ALWAYS has this appended:
-ContentClass=urn:content-class:SPSPeople
Since it's not FQL formatted I figure that's why my template won't work. I've been at this all day now... Any suggestions what to do next? How do I get rid of that KQL suffix?
Figured it out... I trusted that the FQL Query Transformation was correct and bypassed the "Launch Query Builder" button altogether, typing the FQL directly into the Query Transform text box.

Solr search suggestion and results on wrong spelling

I am using Solr 4.8.1 with django-haystack and indexing across multiple fields. I am seeing a problem with some misspelled search queries: they come up with matches and are also put forward as spelling suggestions.
Example: I have indexed documents that contain the word 'Berkeley'. If I use the Solr admin UI and search for 'berkele', it comes up with the spelling suggestion 'berkelei', and if I then query 'berkelei' it returns 429 results (the same number as when I query 'berkeley').
I am using the example solrconfig.xml that came with Solr and just generating the schema.xml using django-haystack. Does anyone have an idea why this would happen?
Basically, I would like it to give the correct spelling suggestion when I query something like 'berkele', rather than another misspelt word.
I managed to resolve this issue by removing from the schema.xml file generated by django-haystack.

Cleansing string / input in Coldfusion 9

I have been working with Coldfusion 9 lately (background in PHP primarily) and I am scratching my head trying to figure out how to 'clean/sanitize' input / string that is user submitted.
I want to make it HTMLSAFE, eliminate any javascript, or SQL query injection, the usual.
I am hoping I've overlooked some kind of function that already comes with CF9.
Can someone point me in the proper direction?
Well, for SQL injection, you want to use CFQUERYPARAM.
As for sanitizing the input for XSS and the like, you can use the ScriptProtect attribute in CFAPPLICATION, though I've heard that doesn't work flawlessly. You could look at Portcullis or similar 3rd-party CFCs for better script protection if you prefer.
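The bind-parameter idea behind CFQUERYPARAM is not specific to ColdFusion. As a rough illustration only, here is a Python sketch using the standard `sqlite3` module; the table, column, and input values are made up for the example, and this is an analogue of the technique rather than ColdFusion code.

```python
import sqlite3

# In-memory database with a single hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

# A classic injection attempt. Passed as a bound parameter, it is treated
# as a plain string value and never parsed as SQL.
user_input = "alice' OR '1'='1"
injected_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

# A legitimate value goes through the same parameterized query.
normal_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", ("alice",)
).fetchall()

print(injected_rows)  # []
print(normal_rows)    # [('alice',)]
```

The point is that the query text and the user-supplied value travel separately, so there is nothing to escape by hand.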
This is an addition to Kyle's suggestions, not an alternative answer, but the comments panel is a bit rubbish for links.
Take a look at the ColdFusion string functions. You've got HTMLCodeFormat, HTMLEditFormat, JSStringFormat and URLEncodedFormat, all of which can help you work with content posted from a form.
You can also try to use the regex functions to remove HTML tags, but it's never a precise science. This ColdFusion based regex/html question should help there a bit.
You can also try to protect yourself from bots and known spammers using something like cfformprotect, which integrates Project Honeypot and Akismet protection amongst other tools into your forms.
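To show what output-encoding functions of this kind do, here is a Python sketch using rough standard-library analogues: `html.escape` for HTML escaping and `urllib.parse.quote` for URL encoding. These are not the ColdFusion functions and their exact output may differ from HTMLEditFormat and URLEncodedFormat; the sample strings are invented.

```python
import html
from urllib.parse import quote

# Text a user might submit through a form.
comment = "<script>alert('x')</script>"
search_term = "great coverage&x=1"

# HTML-escape before writing the value into a page, so the browser
# renders it as text instead of executing it.
escaped_comment = html.escape(comment)

# Percent-encode before placing the value inside a URL.
encoded_term = quote(search_term)

print(escaped_comment)  # &lt;script&gt;alert(&#x27;x&#x27;)&lt;/script&gt;
print(encoded_term)     # great%20coverage%26x%3D1
```

Which encoder to use depends on where the value ends up: HTML body, JavaScript string, or URL each need their own.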
You've got several options:
- The "Global Script Protection" Administrator setting, which applies a regular expression against post and get (i.e. FORM and URL) variables to strip out <script/>, <img/> and several other tags.
- Use isValid() to validate variables' data types (see my in depth answer on this one).
- <cfqueryparam/>, which serves to create SQL bind parameters and validate the datatype passed to it.
That noted, if you are really trying to sanitize HTML, use Java, which ColdFusion can access natively. In particular use the OWASP AntiSamy Project, which takes an HTML fragment and whitelists what values can be part of it. This is the same approach that sites like SO and slashdot.org use to protect submissions and is a more secure approach to accepting markup content.
Sanitizing strings in ColdFusion, as in nearly any language, is very important, and how you do it depends on what you want to do with the string. Most mitigations are for:
- saving content to a database (e.g. <cfqueryparam ...>)
- using content to show on the next page (e.g. putting a URL parameter in a link or showing a URL parameter in text)
- saving files and using upload filenames and content
There is always a risk if you start by allowing basically everything and then try to sanitize malicious code "away" by deleting or replacing characters (blacklist approach).
The better solution, whenever it is possible, is to use rereplace(...) with regular expressions that explicitly allow only the characters needed for your scenario (whitelist approach). Use cases are inputs for numbers, lists, email addresses, URLs, names, zip codes, cities, etc.
For example, if you want to ask for an email address, you could use
<cfif reFindNoCase("^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.(?:[A-Z]{2,})$", stringtosanitize)>...ok, clean...<cfelse>...not ok...</cfif>
(or your own regex).
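The whitelist idea can be sketched in Python as well. The example below is only an illustration under assumptions: the email pattern is a simple plausibility check, not a full address validator, and `is_plausible_email` and `keep_digits` are hypothetical helper names.

```python
import re

# Whitelist for a plausible email address: only the listed characters
# are allowed, and the whole string must fit the pattern.
EMAIL = re.compile(r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}", re.IGNORECASE)

# Whitelist for a numeric field: anything that is not a digit is dropped.
NON_DIGITS = re.compile(r"[^0-9]")

def is_plausible_email(value):
    """Accept the value only if every character fits the allowed pattern."""
    return EMAIL.fullmatch(value) is not None

def keep_digits(value):
    """Remove every character that is not explicitly allowed (digits only)."""
    return NON_DIGITS.sub("", value)

print(is_plausible_email("user@example.com"))          # True
print(is_plausible_email("user@example.com<script>"))  # False
print(keep_digits("12a3<script>"))                     # 123
```

Rejecting input that fails the check, as in the email case, is usually safer than silently rewriting it; stripping characters suits fields like numbers where the cleaned value is still meaningful.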
For HTML input or CSS input, I would also recommend the OWASP Java HTML Sanitizer Project.