Sitecore Forward Search generates wrong URLs

I'm using Sitecore Forward Search. I've added a replacement to the web.config file so that spaces in item names are replaced with a different character. Now item URLs use the new character, but Forward Search still generates links with the old one. How can I fix this? I've tried re-indexing the search index, but the result is still the same. I appreciate any help you can provide.

First, I would recommend going back to a standard hyphen rather than an em dash. Em dash is a non-standard character. Very few users would even realize the difference between that and a hyphen and even if they did, they wouldn't know how to type it.
Disclaimer aside, if your links are being generated properly when you browse the site, your search engine should be picking them up. Did you just run the indexer or did you do a "Clean re-index"? If it is still not getting the correct URLs after a clean re-index, I would double (or triple) check your configs and then contact Forward Search support.

You might want to try this module I created: SEO-friendly URL module
It implements a custom LinkProvider and ItemResolver that replace special characters in your URL; for example, spaces are replaced with hyphens (-).
What you're doing now simply replaces spaces with hyphens, but Sitecore is then no longer able to resolve the items.

Ben is right, a "clean reindex" might do the trick, but it depends on the version.
Manual approach:
Delete the Index folder (its location might depend on configuration)
Delete the Storage subfolder (by default a subfolder of ForwardData)
Trigger a full update (using the Trigger tool or the admin client)
At this point Forward Search should index the site in the context of a normal visitor, and links should be resolved as such.
Sometimes certain navigation elements, such as breadcrumbs, mistakenly generate other or duplicate links, for example links with "Sitecore/content" as a prefix or with a language specification.
This can be circumvented by using canonical URLs or by adjusting the exclude pattern.
Best regards
Thomas Jensen, Part of the Forward Search team

Related

Regex for multiple URLs without clear pattern

I'm quite new to using regex so I hope there's someone who can help me out. I want to set up an event in Google Tag Manager through RegEx that fires whenever someone views a page. I'm trying to do this using the Page URL as a parameter so that the event fires when that URL is visited. It's for around 1400 URLs that are in the same sub-folder but have a different page name. For example: https://www.example.com/products/product-name-1, https://www.example.com/products/product-name-2
What would be the best way to group these into one RegEx formula?
I've tried to separate all urls by using the '|' sign without any result. I've also tried this format, without any luck: (^/page-url-1/$|^/page-url-1/$|^/page-url-1/$|^/page-url-1/$)

A couple things are happening with your attempt. First, you aren't escaping the '/'. This is a reserved or special character and you will need to precede it with a \ to tell the engine that you want that specific character. It would look like this:
\/products\/page-url-1
I am assuming you are using a {{Page Path}} so the above would match for any paths that contain /products/page-url-1.
If you want the event to fire on all pages within the /products directory, there is an easier way of doing this.
\/products\/.*
This will match any page within your /products directory. If you have a landing page at /products itself, the event will not fire there. The '.' matches any character after the / and the '*' means it can repeat any number of times.
EDIT:
Since you aren't looking for all the products pages, you can use a matching group and list them all. I suspect that the product names are all different enough that they don't share any common path elements, so you will have to list out the ones you want.
\/products\/(product-url-1|product-url-2|product-url-3).*
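With around 1400 pages, typing the group by hand is error-prone, so it may help to generate it. Below is a minimal sketch in Python, used here only to build and try out the pattern; the slug list and the `is_tracked` helper are hypothetical. Unlike the pattern above, this variant anchors both ends, on the assumption that product-name-1 should not also match product-name-10.

```python
import re

# Hypothetical slugs; in practice this would be the full list of page names.
product_slugs = ["product-name-1", "product-name-2", "product-name-3"]

# re.escape guards against slugs that contain regex metacharacters such as '.'.
alternation = "|".join(re.escape(slug) for slug in product_slugs)

# Anchored at both ends, with an optional trailing slash.
pattern = re.compile(r"^/products/(" + alternation + r")/?$")


def is_tracked(page_path):
    """Return True when the page path is one of the listed product pages."""
    return pattern.search(page_path) is not None
```

`pattern.pattern` holds the generated expression as a string, which can be checked against a few real paths before it is used anywhere else.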

How to use regex to insert dynamic urls in the Analytics conversion funnel?

I have a client who has an ecommerce website running on Wix. She asked me to set up the conversion funnel so she can identify at which step visitors leave, that is, those who don't reach the order placed page. On Wix, we have 3 steps/URLs, as below:
Cart: https://www.easyhomedesign.com.br/cart?appSectionParams=%7B%22origin%22%3A%22cart-popup%22%7D
Checkout: https://www.easyhomedesign.com.br/checkout?appSectionParams=%7B%22a11y%22%3Afalse%2C%22cartId%22%3A%2283476f86-4ac9-44ac-8779-4479dde12cc2%22%2C%22storeUrl%22%3A%22https%3A%2F%2Fwww.easyhomedesign.com.br%2F%22%2C%22isFastFlow%22%3Afalse%2C%22isPickupFlow%22%3Afalse%7D
and the Thank you page: https://www.easyhomedesign.com.br/thank-you-page/d28bc342-0afe-40cb-8d98-a5784b6b2f17
After each of those urls, we have dynamic string values, so I need to put into the funnel step, on the url field, those same urls but using a regex that matches to the config "starts with", since we can't know what the values on the end of the urls are and, on the funnel setup section, we don't have that combobox "Starts with". At least, that's the only solution I could think about.
So my idea is to use something like https://www.easyhomedesign.com.br/thank-you-page/$. I don't know regex; that's only an example of what I had in mind, since that part of the URL is the fixed one.
Could someone help me? tks.

Although the final goal setup depends heavily on your Analytics implementation, it is possible to cover such a funnel with RegEx in goal funnels.
Google Analytics tracks page visits by path, by default. So https://www.easyhomedesign.com.br/thank-you-page/d28bc342-0afe-40cb-8d98-a5784b6b2f17 will become /thank-you-page/d28bc342-0afe-40cb-8d98-a5784b6b2f17 in your reports. Reports can be set up to include the domain, and the goal flow can still be created, but then you need to check how the RegEx has to be written.
It also matters whether the query parameters mentioned above affect the step, that is, whether they are part of the flow or not.
Assuming that the path part is relevant, you can set up something like this.
Cart URL in reports: /cart?appSectionParams=%7B%22origin%22%3A%22cart-popup%22%7D
Cart step RegEx in goal flow: ^\/cart
^ stands for beginning of the string, but you might need to adjust, if host is present in your reports. You can also extend it to ^\/cart\?, if you expect any parameters to be present, to qualify for cart visit.
Checkout URL in reports: /checkout?appSectionParams=%7B%22a11y%22%3Afalse%2C%22cartId%22%3A%2283476f86-4ac9-44ac-8779-4479dde12cc2%22%2C%22storeUrl%22%3A%22https%3A%2F%2Fwww.easyhomedesign.com.br%2F%22%2C%22isFastFlow%22%3Afalse%2C%22isPickupFlow%22%3Afalse%7D
Checkout step RegEx in goal flow: ^\/checkout
The same applies for checkout step for beginning of the string, or for any parameters required.
Thank you URL in reports: /thank-you-page/d28bc342-0afe-40cb-8d98-a5784b6b2f17
Thank you RegEx, which is essentially the goal's RegEx: ^\/thank-you-page\/[\w-]+
Here, the [\w-]+ part expects alphanumerical characters, underscore, or hyphen to be present. More precisely, one or more must exist there.
The $ sign, mentioned in the OP, could not be used here, as it indicates the end of the string, and therefore the id at the end of the URL would make it non-matching.
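To sanity-check the three expressions before entering them in the goal setup, they can be run against the paths as they appear in the reports. This is a minimal sketch in Python, not a reproduction of how Google Analytics evaluates goals; `FUNNEL_STEPS` and `funnel_step` are hypothetical names, and the escaped slashes are kept as in the answer (they are harmless in Python).

```python
import re

# Patterns copied from the answer above, checked in funnel order.
FUNNEL_STEPS = [
    ("cart", re.compile(r"^\/cart")),
    ("checkout", re.compile(r"^\/checkout")),
    ("thank-you", re.compile(r"^\/thank-you-page\/[\w-]+")),
]


def funnel_step(page_path):
    """Return the name of the first funnel step whose pattern matches, else None."""
    for name, pattern in FUNNEL_STEPS:
        if pattern.search(page_path):
            return name
    return None
```

Note that ^\/cart is a pure "starts with" test, so a path such as /cartoon would also count as the cart step; the stricter ^\/cart\? mentioned above avoids that when parameters are always present.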

Chrome dev tools: any way to exclude requests whose URL matches a regex?

Unfortunately, in the latest versions of Chrome the negative network filter doesn't work anymore. I used this filter to exclude every HTTP call containing a particular string. I asked for a solution in the Chrome dev tools forum, but so far nobody has answered.
So I would like to know if there is a way to resolve this problem (and exclude for example each call containing the string 'loadMess') with regex syntax.

Update (2018):
This is an update to my old answer to clarify that both bugs have been fixed for some time now.
Negate or exclude filtering is working as expected now. That means you can filter request paths with my.com/path (show requests matching this), or -my.com/path (show requests not matching this).
The regex solution also works after my PR fix made it in production. That means you can also filter with /my.com.path/ and /^((?!my.com/path).)*$/, which will achieve the same result.
I have left the old answer here for reference, and it also explains the negative lookahead solution.
The pre-defined negative filters do work, but it doesn't currently allow you to do NOT filters on the names in Chrome stable, only CONTAINS. This is a bug that has been fixed in Chrome Canary.
Once the change has been pushed to Chrome stable, you should be able to do loadMess to filter only for that name, and -loadMess to filter out that name and leave the rest, as it was previously.
Workaround: Regex for matching a string not containing a string
^((?!YOUR_STRING).)*$
Example:
^((?!loadMess).)*$
Explanation:
^ - Start of string
(?!loadMess) - Negative lookahead (at this cursor, do not match the next bit, without capturing)
. - Match any character (except line breaks)
()* - 0 or more of the preceding group
$ - End of string
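The behaviour of the pattern itself can be checked outside DevTools. Here is a minimal sketch in Python; the request names are made up for illustration.

```python
import re

# Matches only strings that do not contain "loadMess" anywhere.
not_loadmess = re.compile(r"^((?!loadMess).)*$")

# Hypothetical request names.
requests = ["api/loadMessages", "api/users", "static/app.js", "loadMess"]

# Keep only the names the pattern accepts.
kept = [name for name in requests if not_loadmess.search(name)]
```

The lookahead is re-checked before every single character, which is why the forbidden string is rejected wherever it starts.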
Update (2016):
I discovered that there is actually a bug with how DevTools deals with Regex in the Network panel. This means the workaround above doesn't work, despite it being valid.
The Network panel filters on Name and Path (as discovered from the source code), but it does two tests that are OR'ed. In the case above, if you have loadMess in the Name, but not in the Path (e.g. not the domain or directory), it's going to match on either. To clarify, true || false === true, which means it will only filter out loadMess if it's found in both the Name and Path.
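The effect of OR'ing the two tests can be illustrated with a simplified model. This is not the DevTools source, only a sketch in Python of the logic described above; both functions and the example request are hypothetical, and the `and` variant is just one way the intended behaviour could be expressed.

```python
import re

exclude = re.compile(r"^((?!loadMess).)*$")


def shown_with_or(name, path):
    """Model of the bug: the request stays visible if either field passes."""
    return bool(exclude.search(name) or exclude.search(path))


def shown_with_and(name, path):
    """Intended behaviour in this model: both fields must pass the filter."""
    return bool(exclude.search(name) and exclude.search(path))


# Hypothetical request: the string is in the Name but not in the Path.
name, path = "loadMess.json", "example.com/api/"
still_visible = shown_with_or(name, path)
hidden_as_intended = not shown_with_and(name, path)
```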
I have created an issue in Chromium and pushed a fix for review, which has since been merged.

This is answered here - for latest Chrome 58.0.3029.110 (Official Build) (64-bit)
https://stackoverflow.com/a/27770139/4772631
E.g.: If I want to exclude all gifs then just type -gif

Negative lookahead is recommended everywhere, but it does not work.
Instead, "-myregex" does work for me. Like this: -/(Violation|HMR)/.

Chrome browser dev tools do not support regex filters very well.
When I want to hide some requests, it does not work as shown above. But you can use -hide1 -hide2 to hide the requests you want.
Just leave a space between the conditions. This is not regex syntax, so I guess it uses plain string matching rather than regex under the hood.

Filtering multiple different urls
You can use the negation symbol (-) to filter network calls.
E.g.: -lab.com would filter out lab.com URLs.
But to filter out multiple URLs you can use the | symbol in a regex.
E.g.: -/lab.com|mini.com/ will filter out both lab.com and mini.com. You can use it to filter out many different websites or URLs.

You can use "Invert" option to exclude the APIs matching a string in the Filter text box.

On latest chrome version (62) you have to use :
-mime-type:image/gif

Regex for URL routing - match alphanumeric and dashes except words in this list

I'm using CodeIgniter to write an app where a user will be allowed to register an account and is assigned a URL (URL slug) of their choosing (ex. domain.com/user-name). CodeIgniter has a URL routing feature that allows the utilization of regular expressions (link).
Users are only allowed to register URLs that contain alphanumeric characters, dashes (-), and underscores (_). This is the regex I'm using to verify the validity of the URL slug: ^[A-Za-z0-9][A-Za-z0-9_-]{2,254}$
I am using the url routing feature to route a few url's to features on my site (ex. /home -> /pages/index, /activity -> /user/activity) so those particular URL's obviously cannot be registered by a user.
I'm largely inexperienced with regular expressions but have attempted to write an expression that would match any URL slugs with alphanumerics/dash/underscore except if they are any of the following:
default_controller
404_override
home
activity
Here is the code I'm using to try to match the words with that specific criteria:
$route['(?!default_controller|404_override|home|activity)[A-Za-z0-9][A-Za-z0-9_-]{2,254}'] = 'view/slug/$1';
but it isn't routing properly. Can someone help? (side question: is it necessary to have ^ or $ in the regex when trying to match with URL's?)

Alright, let's pick this apart.
Ignore CodeIgniter's reserved routes.
The default_controller and 404_override portions of your route are unnecessary. Routes are compared to the requested URI to see if there's a match. It is highly unlikely that those two items will ever be in your URI, since they are special reserved routes for CodeIgniter. So let's forget about them.
$route['(?!home|activity)[A-Za-z0-9][A-Za-z0-9_-]{2,254}'] = 'view/slug/$1';
Capture everything!
With regular expressions, a group is created using parentheses (). This group can then be retrieved with a back reference - in our case, the $1, $2, etc. located in the second part of the route. You only had a group around the first set of items you were trying to exclude, so it would not properly capture the entire wild card. You found this out yourself already, and added a group around the entire item (good!).
$route['((?!home|activity)[A-Za-z0-9][A-Za-z0-9_-]{2,254})'] = 'view/slug/$1';
Look-ahead?!
On that subject, the first group around home|activity is not actually a traditional group, due to the use of ?! at the beginning. This is called a negative look-ahead, and it's a complicated regular expression feature. And it's being used incorrectly:
Negative lookahead is indispensable if you want to match something not followed by something else.
There's a LOT more I could go into with this, but basically we don't really want or need it in the first place, so I'll let you explore if you'd like.
In order to make your life easier, I'd suggest separating the home, activity, and other existing controllers in the routes. CodeIgniter will look through the list of routes from top to bottom, and once something matches, it stops checking. So if you specify your existing controllers before the wild card, they will match, and your wild card regular expression can be greatly simplified.
$route['home'] = 'pages';
$route['activity'] = 'user/activity';
$route['([A-Za-z0-9][A-Za-z0-9_-]{2,254})'] = 'view/slug/$1';
Remember to list your routes in order from most specific to least. Wild card matches are less specific than exact matches (like home and activity), so they should come after (below).
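The top-to-bottom, first-match behaviour can be modelled in a few lines. This is only a sketch in Python, not CodeIgniter's router: `ROUTES` and `resolve` are hypothetical, `re.fullmatch` stands in for the anchoring that CodeIgniter applies to each route, and `\1` plays the role of `$1`.

```python
import re

# Routes in the order suggested above: exact matches first, wild card last.
ROUTES = [
    (r"home", "pages"),
    (r"activity", "user/activity"),
    (r"([A-Za-z0-9][A-Za-z0-9_-]{2,254})", r"view/slug/\1"),
]


def resolve(uri):
    """Return the target of the first route that matches the whole URI."""
    for pattern, target in ROUTES:
        match = re.fullmatch(pattern, uri)
        if match:
            # Substitute captured groups into the target, like $1 in the route.
            return match.expand(target)
    return None
```

Because home and activity are listed first, the wild card never sees them, so no look-ahead is needed to exclude them.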
Now, that's all the complicated stuff. A little more FYI.
Remember that dashes - have a special meaning when in between [] brackets. You should escape them if you want to match a literal dash.
$route['([A-Za-z0-9][A-Za-z0-9_\-]{2,254})'] = 'view/slug/$1';
Note that your character repetition min/max {2,254} only applies to the second set of characters, so your user names must be 3 characters at minimum, and 255 at maximum. Just an FYI if you didn't realize that already.
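The length bounds are easy to confirm with a quick check. A minimal sketch in Python, using the same character classes as the route above; `is_valid_slug` is a hypothetical helper.

```python
import re

# One leading alphanumeric character, then 2 to 254 further allowed characters.
SLUG = re.compile(r"[A-Za-z0-9][A-Za-z0-9_\-]{2,254}")


def is_valid_slug(candidate):
    """Return True when the whole candidate matches the slug pattern."""
    return SLUG.fullmatch(candidate) is not None
```

So "ab" is rejected, "abc" is accepted, and the longest accepted slug has 255 characters.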
I saw your own answer to this problem, and it's just ugly. Sorry. The ^ and $ symbols are used improperly throughout the lookahead (which still shouldn't be there in the first place). It may "work" for a few use cases that you're testing it with, but it will just give you problems and headaches in the future.
Hopefully now you know more about regular expressions and how they're matched in the routing process.
And to answer your question, no, you should not use ^ and $ at the beginning and end of your regex -- CodeIgniter will add that for you.
Use the 404, Luke...
At this point your routes are improved and should be functional. I will throw it out there, though, that you might want to consider using the controller/method defined as the 404_override to handle your wild cards. The main benefit of this is that you don't need ANY routes to direct a wild card, or to prevent your wild card from goofing up existing controllers. You only need:
$route['404_override'] = 'view/slug';
Then, your View::slug() method would check the URI, and see if it's a valid pattern, then check if it exists as a user (same as your slug method does now, no doubt). If it does, then you're good to go. If it doesn't, then you throw a 404 error.
It may not seem that graceful, but it works great. Give it a shot if it sounds better for you.

I'm not familiar with CodeIgniter specifically, but routing in most frameworks operates on precedence. In other words, the default controller, 404, etc. routes should be defined first. Then you can simplify your regex to only match the slugs.

Ok answering my own question
I seem to have come up with a different expression that works:
$route['(^(?!default_controller$|404_override$|home$|activity$)[A-Za-z0-9][A-Za-z0-9_-]{2,254}$)'] = 'view/slug/$1';
I added parentheses around the whole expression (I think that's what CodeIgniter matches with $1 on the right) and added a start-of-line anchor, ^, and several end-of-line anchors, $.
Hope this helps someone who may run into this problem later.

Writing Regular Expression for URL in Google Analytics

I have a huge list of URLs, in the format:
http://www.example.com/dest/uk/bath/
http://www.example.com/dest/aus/sydney/
http://www.example.com/dest/aus/
http://www.example.com/dest/uk/
http://www.example.com/dest/nor/
What RegEx could I use to get the last three URLs, but miss the first two, so that every URL without a city attached is matched, but the ones with cities are rejected?
Note: I am using Google Analytics, so I need to use regular expressions to monitor my URLs with its advanced feature. As of right now Google is rejecting each regular expression.

Generally, the best suggestion I can make for parsing URLs with a Regex is don't.
Your time is much, much better spent finding a library that exists for your language dedicated to the task of processing URLs.
It will have worked out all the edge cases, be fully RFC compliant, be bug free, secure, and have a great user interface so you can just suck out the bits you really want.
In your case, the suggested way to process it would be to use your URL library to extract the elements and then work explicitly on them.
That way, at most you'll have to deal with the path on its own, and not have to worry so much whether it's
http://site.com/
https://site.com/
http://site.com:80/
http://www.site.com/
Unless you really want to.
For the "Path" you might even wish to use a splitter ( or a dedicated path parser ) to tokenise the path into elements first just to be sure.
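As an illustration of that approach, here is a minimal sketch using Python's standard `urllib.parse`; the choice of language and the two helper functions are assumptions, since the question does not say where the URLs would be processed.

```python
from urllib.parse import urlsplit


def path_segments(url):
    """Split the path of a URL into its non-empty elements."""
    return [segment for segment in urlsplit(url).path.split("/") if segment]


def is_country_page(url):
    """True for /dest/<country>/ pages, False when a city follows the country."""
    segments = path_segments(url)
    return len(segments) == 2 and segments[0] == "dest"
```

Scheme, port, and host variations no longer matter here, because only the tokenised path is inspected.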

tj111's current solution doesn't work - it matches all your urls.
Here's one that works (and I checked with your values). It also matches, no matter if there is a trailing slash or not:
http:\/\/.*dest\/\w+/?$

/http:\/\/www\.site\.com\/dest\/\w+\/?$/i
matches if they're all the same site with the "dest" there. you could also do this:
/\w+:\/\/[^/]+\/dest\/\w+\/?$/i
which will match any site with any protocol (http, ftp) and any site with /dest/country at the end, plus an optional trailing /
Note, that this will only work with a subset of what the urls could legitimately be.

Try this regular expression:
^http://www\.example\.com/dest/[^/]+/$
This would only match the last three URLs.
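A quick check against the five URLs from the question, as a minimal sketch in Python. The path-only variant at the end is an assumption, for reports that record only the path and not the host.

```python
import re

country_only = re.compile(r"^http://www\.example\.com/dest/[^/]+/$")

urls = [
    "http://www.example.com/dest/uk/bath/",
    "http://www.example.com/dest/aus/sydney/",
    "http://www.example.com/dest/aus/",
    "http://www.example.com/dest/uk/",
    "http://www.example.com/dest/nor/",
]

# [^/]+ cannot cross a slash, so URLs with a city segment are rejected.
matches = [url for url in urls if country_only.search(url)]

# Assumed variant for path-only data such as "/dest/uk/".
country_only_path = re.compile(r"^/dest/[^/]+/$")
```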
