Rewrite href links in strings - regex

I have a bespoke CMS for a website that stores any uploaded files in the /Assets/ folder.
I'm preparing to move the website to the Azure platform and need some way of rewriting the links within the web pages. Here is what a current link looks like:
link to file
What do you suggest would be the best way to change those links to something like:
link to file
There are hundreds of pages with tons of links. Also, to throw a spanner in the works, not all links are in subfolders within the Assets folder.
Some links are like:
link to file
Suggestions are welcome; I'm open to anything: regex, HtmlAgilityPack, or plain old string.Replace, but I can't seem to get my head around how to do it.

You should take a look at the IIS URL Rewrite module, which is installed on Azure web roles.
http://www.iis.net/learn/extensions/url-rewrite-module/using-the-url-rewrite-module
It basically allows you to define patterns using regex and output URLs however you like. In your case it would be quite simple to rewrite your embedded links to their locations in blob storage.
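For instance, a rough sketch of an outbound rule in web.config that rewrites anchor hrefs as the response is served; the storage account name myaccount is a placeholder for your own:
<system.webServer>
  <rewrite>
    <outboundRules>
      <preConditions>
        <preCondition name="IsHTML">
          <add input="{RESPONSE_CONTENT_TYPE}" pattern="^text/html" />
        </preCondition>
      </preConditions>
      <!-- "myaccount" below is a placeholder storage account name -->
      <rule name="RewriteAssetLinks" preCondition="IsHTML">
        <match filterByTags="A" pattern="^/Assets/(.*)" />
        <action type="Rewrite" value="https://myaccount.blob.core.windows.net/assets/{R:1}" />
      </rule>
    </outboundRules>
  </rewrite>
</system.webServer>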

It's OK, I have sorted it; thanks for your comments. If anyone is interested, I did a search-and-replace SQL query on the database.
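For anyone attempting the same, a minimal sketch of that kind of query; the Pages table, Content column, and storage URL are assumptions, so adapt them to your schema (and back up the database first):
-- "Pages", "Content" and the blob URL below are placeholders for your own schema
UPDATE Pages
SET Content = REPLACE(Content, 'href="/Assets/', 'href="https://myaccount.blob.core.windows.net/assets/')
WHERE Content LIKE '%/Assets/%';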

Embedding a functional website inside a Squarespace webpage

First of all, thank you for everything that you do. Without this community, I would hate web design and be reliant on my teacher's outdated, static methods. Much love <3
So, this is a tricky one (maybe).
I want to have, essentially, an iframe on a webpage that contains a website I coded previously. It was a project for school that never went live, but I'd like to include it as part of my portfolio. Problem is, an iframe needs a URL for a source, but I just have the folder with more folders full of code, fonts, and images. How can I tell the browser to populate this box with everything from "name" folder? And then how will it know to run the code instead of just showing a file tree or something?
In the end, I want a page describing a previous web project that lets the client experience that project within the one page. And I don't want to get a domain for every project I do.
Maybe there's an easier way I'm not thinking of?
To make it interesting, my new portfolio site is being made in Squarespace...maybe. I bought a domain from them because I had a promo code and wanted to try the platform, but I kind of hate it. I can't change any of the code and it won't maintain a connection to Typekit. So all I can do is change the basic appearance of preexisting elements. It's like WordPress all over again....LAME! Sadly, I already bought the domain.
Can Squarespace just be a host? Is there a way to download the raw code of these templates, edit it, and upload it again?
Thanks for all your help!
I want to have, essentially, an iframe on a webpage that contains a website I coded previously.
Squarespace's file upload mechanism is very limited. Without using the Developers Platform, there is no effective way to upload many files at once. Furthermore, there is no way to create folders. Therefore, even if you were willing to upload each .html file and each asset one-by-one, there'd be no way to organize the files into folders (assuming that the "tree" you mentioned includes additional sub-folders).
Initially, in order to get the files to be accessible by Squarespace, you'd have to do one of the following:
Use the Squarespace Developer Platform (a.k.a. "Developer Mode") and upload your to-be-iframed (TBI) website files to the "assets" folder using SFTP or Git.
Host your TBI website files somewhere else (a different host environment, for example) which will maintain your file/folder structure.
How can I tell the browser to populate this box with everything from "name" folder? And then how will it know to run the code instead of just showing a file tree or something?
Assuming that the TBI website has an index.html file or home.html file or similar, and assuming you were to use the Squarespace Developer Platform, you'd insert the iframe either in a Code Block or within a template/.region file directly using something like
<iframe src="/assets/tbiwebsitefolder/index.html"></iframe>
while setting your other iframe attributes (such as height and width) as needed.
Is there a way to download the raw code of these templates, edit it, and upload it again?
Yes. You select a template and then enable Developer Mode on that template. From there, you use SFTP or Git to download the template files, edit, and reupload.
You may benefit by reviewing some considerations of enabling Developer Mode on a Squarespace Template.
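If you go the Git route, the workflow is roughly the following sketch; the repository URL format is an assumption, so check your site's Developer settings for the exact address:
# clone the template repository (URL format is an assumption; check your site's settings)
git clone https://yoursite.squarespace.com/template.git
cd template
# ...edit the template files, then push the changes back
git add -A
git commit -m "Edit template"
git push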
One other idea, to avoid the iframe and Developer Mode entirely, would be to capture images of the TBI website rendered in a browser, and then simply add those images to a gallery block or gallery page. This could allow you to convey the general idea of the project but would of course not capture the full "experience" of it.

Joomla Search issue ".com/index.php/component/users/?view=reset"

I am using Joomla for my website. When I search in Google for SAPBuddy, I always get this search result:
sapbuddy.com/index.php/component/users/?view=reset
Can someone help me? I tried to add my site in Google Webmaster Tools, but the result is the same.
Check you've followed these steps.
Create a sitemap. You can use an online tool for this, or a Joomla extension.
If you used an online tool to create your sitemap, upload the XML file to your server. If you're using an extension, follow its directions. When finished, you should be able to open it with your web browser, e.g. www.domain.com/sitemap.xml. Check that the pages listed look good and copy this URL (a minimal example of the format is shown after these steps).
In Webmaster Tools, add your domain, then register this sitemap by pasting in your URL.
After a short period, check back on Webmaster tools. It will show which pages have been indexed and if there were any errors.
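For reference, a minimal sitemap.xml follows the standard sitemaps.org format; the single URL below is just an example entry:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- one <url> entry per page you want indexed -->
  <url>
    <loc>http://www.sapbuddy.com/</loc>
  </url>
</urlset>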
Good luck!

Using fossil embedded documents

I'm using fossil to manage some home projects and keeping notes in the wiki. After running like this for a few months, I'd like at least to try using embedded documentation, mainly so that I can easily go back to previous versions.
I've studied the website page about managing project documentation which confirms that this is a technique I want to follow up, but I can't make out how to do it.
I've cut-and-pasted one of my wiki pages and added it to my fossil repo, but I can't work out where it should go in the directory structure to be accessible as described on the above page.
I've tried it in a few places, none of which worked. The document is currently %fossil-root%\doc\foo.wiki (I'm on Windows), where %fossil-root% is the directory holding the _FOSSIL_ file, but having started a server with fossil ui, when I point my browser at http://localhost:8080/doc/foo.wiki, Fossil presents me with a nicely formatted page saying it can't find index.html. I created /doc/index.html to see what would happen, but it made no difference.
Please can someone help me out, and/or point me to an example repository containing embedded documentation or another "how-to" document.
If your document is located in %fossil-root%\doc\foo.wiki, you can access it at the following URL:
http://localhost:8080/doc/trunk/doc/foo.wiki
This URL breaks down as follows:
http://localhost:8080 is the root URL to access Fossil when you run fossil ui
/doc signals that you want to access embedded documentation
/trunk indicates the checkin containing the documentation you wish to access
/doc/foo.wiki is the path of the document inside the repository
Instead of trunk, you can also specify a tag, or a branch name, or even a hexadecimal checkin identifier.
In the URL you were using, http://localhost:8080/doc/foo.wiki, foo.wiki is interpreted as the checkin name, and no document path is specified, which logically means Fossil won't find anything.
As for an example repository containing embedded documentation, the homepage of the Fossil website itself is a prime example:
https://www.fossil-scm.org/index.html/doc/trunk/www/index.wiki
where
https://www.fossil-scm.org/index.html is Fossil's root URL
/doc indicates a request for embedded documentation
/trunk indicates we want to fetch files from the trunk
/www/ is the path to the requested file inside the repository
index.wiki is the name of the file inside the repository.
So, in the 'trunk' branch of the repository, the file www/index.wiki contains the home page of the Fossil website.
You simply need to put the documentation under the %fossil-root%\www\ directory (or any other directory under version control) in your repository and then you can, for example, add the following line to your header's mainmenu section to link to it:
html "<a href='$home/doc/trunk/www/foo.wiki'>Documentation</a>\n"
As I said, it can be any directory under version control. To test this, pick any file in the repository, let's say a README file at the top level, and go to http://localhost:8080/doc/trunk/README. You should see the README file load up in your browser in a raw text format. By putting wiki or html files under a particular directory such as www you make it easy to organize the files that you specifically want rendered as documentation, which makes it easier to link to them.
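For completeness, getting a new documentation file under version control is just the usual add-and-commit cycle; a minimal sketch, assuming foo.wiki goes under www:
fossil add www/foo.wiki
fossil commit -m "Add embedded documentation page"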
http://www.fossil-scm.org/index.html/doc/trunk/www/embeddeddoc.wiki
Since Fossil 1.33, you can just prepare your document in the repository.
If the wiki file is put at
/doc/index.wiki
then go to Admin -> Configuration in the web UI.
There is an "Index Page" field; fill in the path to your main index page.
For example:
/doc/trunk/doc/index.wiki
Or if you just want the released version:
/doc/<version>/doc/index.wiki

How to crawl a website to get all the cookies which are set and the pages set them?

I am looking for a program which can crawl through a site and get a list of pages which set cookies. Normal site crawls do not parse JavaScript, so they won't pick up cookies set in this way.
It seems there is no ready-made tool for this. The best way is to write your own script for Greasemonkey (or another user-script engine for your browser) that searches for the mask "*cookie*", and then analyse the results manually. You could also automatically download and parse the external JS files referenced by each HTML page.
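As a rough sketch of that idea, a user script can wrap the document.cookie setter and log every page that sets a cookie from JavaScript; the metadata values are placeholders, so adapt the @match pattern to the site you are crawling:
// ==UserScript==
// @name     cookie-setter-logger
// @match    *://*/*
// @run-at   document-start
// ==/UserScript==
(function () {
    // Keep the native cookie accessor so normal reads/writes still work.
    var native = Object.getOwnPropertyDescriptor(Document.prototype, 'cookie');
    Object.defineProperty(document, 'cookie', {
        get: function () { return native.get.call(document); },
        set: function (value) {
            // Log which page set which cookie; harvest these logs per URL.
            console.log('cookie set on ' + location.href + ': ' + value);
            native.set.call(document, value);
        }
    });
})();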

Data mining to gather a website's details and put in CSV or SQL

I don't know if it's called data mining or something else.
Let's say I have a world business listing site that lists all the shops. And I saw this website ABC that also lists shops, but only in Australia. Their listings run page by page, with no ID.
How do I start writing a program that will crawl their pages and put the selected information from each page into CSV format, which I can then import into my website?
At least, where can I learn this? Thank you.
What you are attempting to do is known as "web scraping"; here's a good starting point for information, including the legal issues:
http://en.wikipedia.org/wiki/Web_scraping
One common framework for writing crawlers like this is Scrapy - http://scrapy.org/
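A minimal Scrapy spider for a paginated listing might look like the following sketch; the start URL and CSS selectors are placeholders you'd adapt to the actual markup of the target site:
import scrapy

class ShopSpider(scrapy.Spider):
    name = "shops"
    # Placeholder listing URL; point this at the real first page.
    start_urls = ["http://www.example.com/shops?page=1"]

    def parse(self, response):
        # Placeholder selectors; inspect the target pages for the real ones.
        for shop in response.css("div.shop"):
            yield {
                "name": shop.css("h2::text").get(),
                "address": shop.css(".address::text").get(),
            }
        # Follow the "next page" link, if any, and repeat.
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)

Saved as shops.py and run with scrapy runspider shops.py -o shops.csv, it writes the extracted items straight to a CSV file.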
Yes, this process is called web scraping. If you are familiar with Java, the most useful tools here are HtmlUnit and WebDriver. You should use a headless browser to go through your pages and extract the important information using selectors (mostly XPath, or regular expressions over the HTML).