I'm trying to make my site more SEO friendly, and I've noticed that whenever I reach a product through a tag or through a paginated listing (page 2, 3, 4, etc.), the page parameter gets added to the product URL.
For example:
www.domain.com/Guardian-Survival-kit/culinary-can-of-preparedness-seeds.html?page=2
I would like to remove ?page=2 from the URL.
OpenCart 1.5.4
Any help would be greatly appreciated.
You could manually edit the code in each controller file that links to products (search pages, category pages, etc.), or you could use something like this modification, which makes all of your product URLs consistent throughout your installation. As an example, I think the page you're getting those links from is the product search page, so you would open /catalog/controller/product/search.php
Find this code
'href' => $this->url->link('product/product', $url . '&product_id=' . $result['product_id'])
And change it to
'href' => $this->url->link('product/product', 'product_id=' . $result['product_id'])
Then save. Be sure to make a backup of this file before attempting the change. This will strip the extra parameters and leave just the plain product URL.
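For reference, a quick sketch of what the change does to the generated links (the product_id value here is illustrative, and the clean path assumes SEO URLs are enabled):

// Before: listing parameters are carried into the product link
// $this->url->link('product/product', $url . '&product_id=42')
// => www.domain.com/culinary-can-of-preparedness-seeds.html?page=2

// After: only the product_id is passed
// $this->url->link('product/product', 'product_id=42')
// => www.domain.com/culinary-can-of-preparedness-seeds.html

The same pattern appears in the other listing controllers (category, special, manufacturer, and so on), so if those pages also leak parameters, the identical find-and-replace should apply there too.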
I want to build a web app with SvelteKit with one page listing all items (with potential search query parameters), and then one page for each individual item. If I had to build this the old-school way, with everything generated in the backend, my paths would be /items/ for the listing of items, /items/123 for item 123, etc. That is, to go to the page of item 123, a link with href="123" would work whether you are currently at the index (/items/) or at the page of one particular item (/items/[id]).
With SvelteKit, if I create files routes/items/index.svelte and routes/items/[id].svelte, then routes/items/index.svelte will have path /items, without a trailing slash, and as a result a link with href="123" will lead to /123, resulting in a "not found" error.
The same link will work, however, from the page of an individual item, say /items/456.
This is radically different from what you would have in the traditional HTML model, where a link from /items/ (or /items/index.html) would work the same as a link from /items/[id].html.
Now in svelte.config.js there is a trailingSlash option you can set to always so that routes/items/index.svelte corresponds to path /items/, but then routes/items/[id].svelte has path /items/[id]/ and we have the same problem again: one href value cannot work from both the index and the page of an individual item.
The only way I see right now is to use absolute paths, but that's not very composable. My guess is that there is something I am doing wrong.
You're not missing anything - it's not currently possible in SvelteKit to have a trailing slash for some pages but not for others. There is an open GitHub issue you may be interested in that proposes adding additional trailingSlash options. This issue cites the exact problem you described:
The trailingSlash options introduced in #1404 don't make it straightforward to add trailing slashes to index pages but not to leaf pages. For example you might want to have /blog/ and /blog/some-post rather than /blog and /blog/some-post, since that allows the index page to contain relative links.
Until that feature is added, you'll need to use absolute paths.
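For example, a minimal sketch of an index page using absolute paths (assuming an items array whose entries have id and name fields):

<!-- routes/items/index.svelte -->
{#each items as item}
  <!-- an absolute href resolves the same way from /items and /items/ -->
  <a href="/items/{item.id}">{item.name}</a>
{/each}

Since the href starts with /, it resolves identically no matter what the current path is, at the cost of hard-coding the /items prefix.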
I am using www.slideshare.net to allow my users to display embedded slideshows on their profiles.
I'm using SlideShare's API to get the slideshow's ID, given the slideshow link that users get by clicking 'share' on the slideshow and copy/pasting the URL:
What I need is to thoroughly validate the latter URL.
Just to further explain my process: once I have the slideshow's ID, I compute the embed code like so:
"<iframe src='https://www.slideshare.net/slideshow/embed_code/" + json.slideshow_id + "' frameborder='0' allowfullscreen webkitallowfullscreen mozillaallowfullscreen></iframe>"
where json is the object returned by SlideShare's API.
A basic regex to answer my question would be:
^http\://www\.slideshare\.net/[a-zA-Z0-9\-]+/[a-zA-Z0-9\-]+$
But it feels a little weak to me:
I don't want my users to be able to just copy/paste the URL from the browser's address bar
I'm not sure this regex works for all of SlideShare's slideshows, as I'm not a SlideShare specialist (does that even exist?)
Ideally I would like to exclude all other regular URLs from www.slideshare.net that don't point to a slideshow.
EDIT 7/12/2014: rewrite
You can use something like this:
(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?
There are more examples on this website.
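As a sketch of how to apply a stricter, SlideShare-specific check (this reuses the anchored pattern from the question rather than the general one above, and the example URLs are made up):

// Anchored so the whole string must be a user/slideshow path
var slideshareUrl = /^https?:\/\/www\.slideshare\.net\/[a-zA-Z0-9-]+\/[a-zA-Z0-9-]+$/;

function isSlideshareUrl(url) {
  return slideshareUrl.test(url);
}

isSlideshareUrl('http://www.slideshare.net/someuser/some-deck'); // true
isSlideshareUrl('http://www.slideshare.net/about');              // false (no slideshow segment)

Note that a pattern alone can't guarantee the slideshow exists; the API call itself, which returns the slideshow_id, is the real validation step.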
For example, I have an "Add comment" form on my Django-powered website.
This form has a text field with TinyMCE.
I want users to be able to use only the p, strong, i, ul, ol, and li tags. Because the result is HTML code, I can't use strip_tags in my AddCommentForm.clean_text method. Also, I need to be sure the result doesn't contain any vulnerabilities (JS, iframes, etc.).
I believe you can advise me on a good solution for this.
This can be done on the TinyMCE side via some configuration parameters. While it's not 100% secure against someone POSTing directly, it's better than nothing.
It should just be a matter of tweaking your valid_elements config in your TinyMCE setup to only allow what you want:
...
valid_elements : "p,strong/b,i/em,ul,ol,li",
...
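Since the TinyMCE whitelist can be bypassed by POSTing directly, it's worth also sanitizing server-side in clean_text. A minimal sketch using the bleach library (an assumption; any HTML sanitizer with a tag whitelist would work):

import bleach
from django import forms

ALLOWED_TAGS = {"p", "strong", "i", "ul", "ol", "li"}

class AddCommentForm(forms.Form):
    text = forms.CharField(widget=forms.Textarea)

    def clean_text(self):
        # Drop every tag and attribute outside the whitelist, which also
        # removes script/iframe payloads instead of just escaping them.
        return bleach.clean(self.cleaned_data["text"],
                            tags=ALLOWED_TAGS,
                            attributes={},
                            strip=True)

This way the client-side config handles usability and the server-side whitelist handles security.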
I'm currently using Nutch 1.7 to crawl my domain. My issue is specific to URLs being indexed as www vs. non-www.
Specifically, after running the crawl and indexing into Solr 4.5, then validating the results on the front end with AJAX Solr, the search results page lists pages under both the 'www' and bare-domain URLs, such as:
www.mywebsite.com
mywebsite.com
www.mywebsite.com/page1.html
mywebsite.com/page1.html
My understanding is that the URL filtering (i.e., regex-urlfilter.txt) needs modification. Are there any regex/Nutch experts who could suggest a solution?
Here is the code on pastebin.
There are at least a couple of solutions.
1.) urlfilter-regex plugin
If you don't want to crawl the non-www pages at all, or want to filter them out at a later stage such as index time, the urlfilter-regex plugin is what you need. It lets you mark any URLs matching regex patterns prefixed with "+" as eligible for crawling; anything that does not match a "+" pattern will not be crawled. Additionally, if you want to specify a general pattern but exclude certain URLs within it, you can use a "-" prefix to specify URLs to subsequently exclude.
In your case you would use a rule like:
+^(https?://)?www\.
This will match anything that starts with:
https://www.
http://www.
www.
and therefore will only allow such URLs to be crawled.
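A minimal regex-urlfilter.txt sketch built around that rule (the final "-." line explicitly rejects anything not accepted above):

# accept only URLs on the www host
+^(https?://)?www\.

# reject everything else
-.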
Given that the URLs listed were not being excluded by your regex-urlfilter, either the plugin wasn't turned on in your nutch-site.xml, or it is not pointed at that file.
In nutch-site.xml you have to specify regex-urlfilter in the list of plugins, e.g.:
<property>
  <name>plugin.includes</name>
  <value>protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-basic|query-(basic|site|url)|response-(json|xml)|urlnormalizer-(pass|regex|basic)</value>
</property>
Additionally check that the property specifying which file to use is not over-written in nutch-site.xml and is correct in nutch-default.xml. It should be:
<property>
  <name>urlfilter.regex.file</name>
  <value>regex-urlfilter.txt</value>
  <description>Name of file on CLASSPATH containing regular expressions
  used by urlfilter-regex (RegexURLFilter) plugin.</description>
</property>
and regex-urlfilter.txt should be in the conf directory for nutch.
There is also the option to perform the filtering only at particular steps, e.g., at index time, if you only want to filter then.
2.) solrdedup command
If the URLs point to the exact same page, which I am guessing is the case here, they can be removed by running the nutch command to delete duplicates after crawling:
http://wiki.apache.org/nutch/bin/nutch%20solrdedup
This will use the digest values computed from the text of each indexed page to find any pages that were the same and delete all but one.
However, you would have to modify the plugin to change which duplicate is kept if you specifically want to keep the "www" ones.
3.) Write a custom indexing filter plugin
You can write a plugin that reads the URL field of a Nutch document and converts it any way you want before indexing. This gives you more flexibility than using an existing plugin like urlnormalizer-regex.
It is actually very easy to write plugins and add them to Nutch, which is one of the great things about it. As a starting point you can copy and study one of the other plugins included with Nutch that implement IndexingFilter, such as the index-basic plugin.
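As a rough sketch of what such a filter class could look like (the package, class name, and regex are illustrative; the method signatures follow the Nutch 1.x IndexingFilter interface, so double-check them against your exact version, and you'd still need the usual plugin.xml and build wiring):

package org.example.nutch;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.crawl.Inlinks;
import org.apache.nutch.indexer.IndexingException;
import org.apache.nutch.indexer.IndexingFilter;
import org.apache.nutch.indexer.NutchDocument;
import org.apache.nutch.parse.Parse;

public class WwwUrlIndexingFilter implements IndexingFilter {
  private Configuration conf;

  @Override
  public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
      CrawlDatum datum, Inlinks inlinks) throws IndexingException {
    // Insert "www." after the scheme when the host doesn't already start with it.
    String normalized = url.toString().replaceFirst("^(https?://)(?!www\\.)", "$1www.");
    doc.removeField("url");
    doc.add("url", normalized);
    return doc;
  }

  public void setConf(Configuration conf) { this.conf = conf; }

  public Configuration getConf() { return conf; }
}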
You can also find a lot of examples:
http://wiki.apache.org/nutch/WritingPluginExample
http://sujitpal.blogspot.com/2009/07/nutch-custom-plugin-to-parse-and-add.html
I saw this guy's approach to organizing PHP projects (http://net.tutsplus.com/tutorials/php/organize-your-next-php-project-the-right-way/comment-page-1/#comments) and I liked it, but since head.php will be the same for all pages, how can I include JS scripts only on the pages that need them?
Well, first off, putting JavaScript at the bottom of a page tends to yield the best results. But to answer your question: if you have scripts that are only relevant to one page, you could save each script under the same name as its PHP page (with a .js extension instead) and then inject the file name into the script reference. I'd also add a flag so that you only look for a JS file when the flag is true:
<?php
// Set per page: whether this page has a matching JS file, and its base name.
$usesJS = true;
$filename = "somerandomname";

if ($usesJS) {
    echo '<script src="/js/' . $filename . '.js"></script>';
}
?>
This would print out something like <script src="/js/somerandomname.js"></script>
Another option is to create an include file for each page, code your script tags as normal inside the include, and then reference the include. I did something like that for a simple site where each page had a different jQuery setup and only one page needed a plugin.
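A quick sketch of that include approach (file and variable names are illustrative): each page sets a variable pointing at its own script include before pulling in the shared footer, and pages with no extra scripts just leave it unset.

<?php
// footer.php: pull in a page-specific script include when one is defined.
if (isset($scriptInclude) && file_exists($scriptInclude)) {
    include $scriptInclude;
}
?>

Then a page that needs extra scripts would do something like $scriptInclude = 'includes/contact-scripts.php'; before including footer.php, and that include file just contains the normal <script> tags.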