How can I find Drupal pages not listed in XML Sitemap module? - drupal-8

I recently inherited a Drupal 8/9 site. I've got the XML Sitemap module deployed. Is there a way for me to find pages that are NOT included in the sitemap? I'm concerned there's a lot of ROT that I can't see.
Thanks!

I can't think of a way to find pages that are not listed in sitemap.xml.
What you could do instead is enable the sitemap settings on every content type, vocabulary, and so on, then queue and regenerate the sitemap.
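One possible workaround, offered as a sketch and not as a feature of the XML Sitemap module: compare the URLs in the generated sitemap.xml against a list of URLs you already know about (for example from a crawl or an export of path aliases). The function names, the sample sitemap, and the example.com URLs below are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol for <urlset>, <url> and <loc>.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text):
    """Return the set of <loc> values found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text}


def missing_from_sitemap(known_urls, xml_text):
    """Return the known URLs that the sitemap does not list, sorted."""
    return sorted(set(known_urls) - sitemap_urls(xml_text))


# Hypothetical sample data; in practice the sitemap would be downloaded
# from the site and known_urls would come from a crawl or a database export.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

known_urls = [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/old-page",
]

missing = missing_from_sitemap(known_urls, sample_sitemap)
print(missing)
```

The comparison is only as complete as the list of known URLs, so pages that are neither linked nor exported will still go unnoticed.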

Related

Is it possible to list sitemaps for different domains in the same robots.txt file?

We have multiple websites served from the same Sitecore instance and same production web server. Each website has its own primary and Google-news sitemap, and up to now we have included a sitemap specification for each in the .NET site's single robots.txt file.
Our SEO expert has raised the presence of different domains in the same robots.txt as a possible issue, and I can't find any documentation definitely stating one way or the other. Thank you.
This should be OK for Google at least. It may not work for other search engines such as Bing, however.
According to https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt:
sitemap: [absoluteURL]
[absoluteURL] points to a Sitemap, Sitemap Index file or equivalent URL. The URL does not have to be on the same host as the robots.txt file. Multiple sitemap entries may exist. As non-group-member records, these are not tied to any specific user-agents and may be followed by all crawlers, provided it is not disallowed.
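To illustrate what "multiple sitemap entries" on different hosts looks like in practice, here is a minimal sketch that pulls the sitemap lines out of a robots.txt body. The function name and the site-a.example / site-b.example hosts are made up for the example, and the parsing is deliberately simplified compared with what a real crawler does.

```python
def sitemap_entries(robots_txt):
    """Return the URLs from every 'sitemap:' line, in file order."""
    entries = []
    for line in robots_txt.splitlines():
        # Drop trailing comments. Simplification: a '#' inside a URL
        # would also be cut off here.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        # The field name is matched case-insensitively.
        if sep and field.strip().lower() == "sitemap":
            entries.append(value.strip())
    return entries


# Hypothetical robots.txt shared by two sites on the same server.
sample_robots = """User-agent: *
Disallow: /admin/

Sitemap: https://www.site-a.example/sitemap.xml
sitemap: https://www.site-b.example/sitemap.xml  # second domain
"""

print(sitemap_entries(sample_robots))
```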
The best way to achieve this is to handle robots.txt from the Sitecore content tree.
We have a similar structure, delivering multiple websites from a single Sitecore instance.
I have written a blog post about this; please find it below. It is exactly what you want.
http://darjimaulik.wordpress.com/2013/03/06/how-to-create-handler-in-sitecore/

App with multiple categories in Django CMS

We are trying to migrate a project written in Drupal to Django CMS and have run into a problem with the article module. Our site is divided into sections, and a news module is installed in every section with its own category. The URL structure looks like this:
/section1
/news-category1
/section2
/news-category2
/etc..
This is the same news module, just split into categories (some news articles can appear in multiple sections, in which case one section is chosen as the base to form a unique article URL). The only method I found produces this structure:
/news
/category1
/category2
/etc...
This is not good for us, as we would prefer to keep the current URL structure for SEO purposes. Is there a correct way to implement this in Django CMS besides creating each section as a module and plugging it into a page? Or can I somehow install the same module on multiple pages and pass the section information to it?
One way I found myself would be to plug the same module into every page it will appear on and then have it parse the path of the page to figure out its category. Not super efficient, but it might work. I'm not sure if there is any other way.
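The path-parsing idea could look roughly like the sketch below. It assumes that the section is the first path segment and the news category the second, as in /section1/news-category1/...; that layout and the function name are assumptions for illustration, not something Django CMS prescribes.

```python
def section_and_category(path):
    """Split a request path into (section, category).

    Assumes URLs shaped like /section1/news-category1/...; either value
    is None when the path is too short to contain it.
    """
    parts = [segment for segment in path.split("/") if segment]
    section = parts[0] if parts else None
    category = parts[1] if len(parts) > 1 else None
    return section, category


print(section_and_category("/section1/news-category1/some-article/"))
print(section_and_category("/section2/"))
```

Inside a view or plugin, the path would typically come from request.path.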

How to build a sitemap using Sitecore

I need to create a Sitemap for a Sitecore website. How can I do this?
If you're after a sitemap page to list the pages on your site you should try the Shared Source module 'Sitemap'.
http://marketplace.sitecore.net/en/Modules/Sitemap.aspx
However, if you're after a sitemap for search engine optimization, use Sitemap XML.
http://marketplace.sitecore.net/en/Modules/Sitemap_XML.aspx
It depends on what you need as a Sitemap. Do you want an XML sitemap that goes to Google?
Or do you want a sitemap that shows the structure of your website?
I'd suggest looking at the Sitecore marketplace (http://marketplace.sitecore.net/SearchResults#query=sitemap), possibly downloading the source code to see how it's done. I think there's an example for both.
Otherwise, you can also create your own, but we'll need some more information - do you want to write it in XSLT or using codebehind such as C#?

creating user sitemap

I have a website and need to create a sitemap for users. Does that just mean creating a page, say sitemap.jsp, with a text link to each page?
Is there a useful tool to collect all the links, instead of having to add them manually?
Thanks
Two options are A1 Sitemap Generator (commercial) or GSiteCrawler (free) - you could also search on the net for more alternatives (there are many!)
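If a dedicated tool is more than you need, a small script can also collect links. The sketch below is not how either of the tools above works; it only extracts the href values from one page's HTML and prints them as a plain list that could be pasted into a sitemap page. The class name, function name, and sample markup are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def sitemap_list(html_text):
    """Return an HTML <ul> with one link per unique href, sorted."""
    collector = LinkCollector()
    collector.feed(html_text)
    unique = sorted(set(collector.links))
    items = "\n".join(
        '<li><a href="{0}">{0}</a></li>'.format(escape(url)) for url in unique
    )
    return "<ul>\n" + items + "\n</ul>"


# Hypothetical page fragment with one duplicated link.
sample_page = """<p><a href="/about.jsp">About</a>
<a href="/contact.jsp">Contact</a>
<a href="/about.jsp">About us</a></p>"""

print(sitemap_list(sample_page))
```

Covering a whole site would mean fetching each discovered page and repeating the extraction, which is what the crawler tools automate.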

Some basic questions about Django, Pyjamas and Clean URLs

I am fairly new to the topic, but I am trying to combine Django and Pyjamas. What would be a smart way to combine the two? I am not asking about communication, but rather about the logical part.
Should I just put all the Pyjamas generated JS in the base of the domain, say http://www.mysite.com/something and setup Django on a subdirectory, or even subdomain, so all the JSON calls will go for http://something.mysite.com/something ?
As far as I understand it now, in such a combination there's not much point in creating views in Django?
Is there a solution for clean URLs in Pyjamas, or should that be solved at some other level? How? Is there a standard way to pass arguments as GET parameters in a clean URL when calling Pyjamas-generated JS?
You should take a look at the good Django With Pyjamas Howto.
I've managed to get the following to work, but it's not ideal. Full disclosure: I haven't figured out how to use Django's template system to get stuff into the pyjamas UI elements, and I have not confirmed that this setup works with Django's authentication system. The only thing I've confirmed is that this gets the pyjamas-generated page to show up. Here's what I did.
Put the main .html file generated by pyjamas in django's "templates" directory and serve it from your project the way you'd serve any other template.
Put everything else in django's "static" files directory.
Make the following changes to the main .html file generated by pyjamas: in the head section find the meta element with name="pygwt:module" and change the content="..." attribute to content="/static/..." where "/static/" is the static page URL path you've configured in django; in the body section find the script element with src="bootstrap.js" and replace the attribute with src="/static/bootstrap.js".
You need to make these edits manually each time you regenerate the files with pyjamas. There appears to be no way to tell pyjamas to use a specific URL prefix when generating its output. Oh well, pyjamas' coolness makes up for a lot.
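The two manual edits could be scripted so they are easy to reapply after each regeneration. This is only a sketch: it assumes the meta element is written with name before content and with double-quoted attributes, which may not match the actual pyjamas output, and the function name and sample HTML are made up.

```python
import re


def add_static_prefix(html_text, prefix="/static/"):
    """Prefix the pygwt:module content and the bootstrap.js src."""
    # Assumed attribute order and quoting: name="..." then content="...".
    # The (?!/) lookahead skips values that already start with a slash,
    # so running the function twice does not add the prefix twice.
    html_text = re.sub(
        r'(<meta\s+name="pygwt:module"\s+content=")(?!/)([^"]*")',
        lambda m: m.group(1) + prefix + m.group(2),
        html_text,
    )
    # The bare src only matches before the prefix has been applied.
    return html_text.replace(
        'src="bootstrap.js"', 'src="' + prefix + 'bootstrap.js"'
    )


# Hypothetical, heavily trimmed stand-in for the generated main .html file.
sample_html = (
    '<html><head><meta name="pygwt:module" content="myapp"></head>'
    '<body><script src="bootstrap.js"></script></body></html>'
)

fixed_html = add_static_prefix(sample_html)
print(fixed_html)
```

The prefix argument should match the static URL path configured in Django.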
acid, I'm not sure this is as much an answer as you would hope but I've been looking for the same answers as you have.
As far as I can see the most practical way to do it is with an Apache server serving Pyjamas output and Django being used as simply a service API for JSONrpc calls and such.
On a side note, I am starting to wonder if Django is even the best option for this, considering that using it only for this purpose does not utilize most of its functionality.
The issue I have found so far with using Django to serve Pyjamas output as Django views/templates is that Pyjamas loads as follows:
Main html page loads "bootstrap.js" and depending on the browser used bootstrap.js will load the appropriate app page. Even if you appropriately setup the static file links using the Django templating language to reference and load "bootstrap.js", I can't seem to do the same for bootstrap.js referencing each individual app page.
This leaves me sad since I do so love the "cruftless URLS" feature of Django.