Website downloader library for offline browsing

I need to put a little project together for myself, and I need some functionality to download a page for offline viewing. Is there a library that will download a given page and its embedded images, and edit the img tags to reflect the local locations of the images?
I know there are a lot of website downloaders out there, but I can't find something that I can use directly in my code.
I have some basic scripts written in Python, so Python is very welcome, but pretty much any language will do.

Yes: BeautifulSoup plus Python's urllib module.

You're looking for BeautifulSoup.
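A minimal sketch of that approach (Python 3; assumes bs4 is installed, skips error handling, and the output filenames are naive):

import os
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen, urlretrieve
from bs4 import BeautifulSoup

def save_page(url, out_dir="offline"):
    # Fetch and parse the page
    os.makedirs(out_dir, exist_ok=True)
    soup = BeautifulSoup(urlopen(url).read(), "html.parser")
    # Download each image and point its tag at the local copy
    for img in soup.find_all("img", src=True):
        src = urljoin(url, img["src"])
        local = os.path.join(out_dir, os.path.basename(urlparse(src).path) or "image")
        urlretrieve(src, local)
        img["src"] = os.path.basename(local)
    with open(os.path.join(out_dir, "index.html"), "w", encoding="utf-8") as f:
        f.write(str(soup))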

How about pywebcrawler?
http://code.google.com/p/pywebcrawler/
Or Anemone (Ruby)?
http://anemone.rubyforge.org/

The simplest solution I can think of:
wget -p example.com
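If you also want the links in the saved page rewritten to point at the local copies, add -k (--convert-links):
wget -p -k example.com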

Related

How to use a Swagger UI plugin?

I am working on a project that, among other things, automatically generates Swagger APIs for Python projects. One thing I have noticed is that the curl text exposes passwords if the API requires them. Since there's no way to mask the passwords as of now (https://github.com/swagger-api/swagger-ui/issues/5025), it seems the easiest thing to do is simply disable the curl text so that I can screen-share my Swagger API without exposing my password.
In another issue (https://github.com/swagger-api/swagger-ui/issues/5020), a plugin is shown that can disable the curl text.
However, I'm totally stumped on how to actually import and use this plugin. There's lots of documentation about how to write plugins, and none on how to import them. I can see that I can load plugins using the plugins option in https://swagger.io/docs/open-source-tools/swagger-ui/usage/configuration/, but I don't know where to put the code.
As Helen alludes to, the answer to your question depends on your setup. However, I would hazard a guess that you will need to configure the SwaggerUI object by running an "unbundled" version of the app. You might think of this as creating a custom entrypoint to a Docker container, say.
For example, the link you provide shows suggestions for running a customised version of SwaggerUI. Those customisations are written in JavaScript, so any HTML page with the necessary dependencies in place that loads the script you write to configure SwaggerUI would answer the question of "where to put that code".
The details depend on which frameworks you are, or are not, using.

Download all the files in different formats from a website using Python

How can I download all the datasets in .csv, .xlsx, and .json format from a website using Python? I need to find and then download thousands of files to my computer. Could you please help me automate the process? The data is organized by city, so the approach can be generalized for further use.
How about using mechanize?
I would recommend using Selenium WebDriver; it is one of the most flexible ways to scrape the web. But you need to be more specific about what you are asking. If all the download links are on one URL, it is easy. But if the download links are spread across different URLs, it will take a little more time, I guess.
from selenium import webdriver

website = "http://example.com/downloads"  # placeholder for the page listing the files
browser = webdriver.Firefox()
browser.get(website)
browser.find_element_by_name("file_name_to_download").click()  # placeholder element name
That's it. Of course, you need to create a simple loop to click all download links one by one.
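If the links to the .csv/.xlsx/.json files are plain anchors on a single page, you may not need a browser at all. A rough sketch with requests and BeautifulSoup (both assumed installed; the listing URL is a placeholder):

import os
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup

page = "http://example.com/datasets"  # placeholder listing page
soup = BeautifulSoup(requests.get(page).text, "html.parser")
for a in soup.find_all("a", href=True):
    href = urljoin(page, a["href"])
    if href.endswith((".csv", ".xlsx", ".json")):
        # Save each file to disk under its original name
        with open(os.path.basename(href), "wb") as f:
            f.write(requests.get(href).content)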

Script for deploying a web app

I was wondering why there are no FTP clients with the following options, which would be useful for deploying:
minify .js, .css and .html files
gzip .js, .css and .html
I tried Transmit for Mac OS X as well as Cyberduck, but neither has this functionality. Even Springloops doesn't offer it (see https://twitter.com/springloops/status/469396427660345344).
So my idea was to make a shell script to:
minify
gzip
transfer via SFTP to the server
But I can't imagine no one has made this already. My problem is that I don't know exactly what this would be called, so it's difficult to search for it.
Does anyone know of such a script? Or why is this functionality not more common?
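To illustrate, here is a minimal Python sketch of the pipeline I have in mind (gzip from the standard library; paramiko, an assumption, for the SFTP step; the minify step is left to whatever minifier library you prefer):

import gzip
import shutil
from pathlib import Path
import paramiko  # assumed installed, for the SFTP transfer

def gzip_file(src):
    # Compress src to src.gz, keeping the original
    dst = src.with_suffix(src.suffix + ".gz")
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst

def deploy(files, host, user, password, remote_dir):
    # A minify step would go here, via a library of your choice
    transport = paramiko.Transport((host, 22))
    transport.connect(username=user, password=password)
    sftp = paramiko.SFTPClient.from_transport(transport)
    try:
        for name in files:
            gz = gzip_file(Path(name))
            sftp.put(str(gz), remote_dir + "/" + gz.name)
    finally:
        sftp.close()
        transport.close()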
After a lot of research and trying things out, I found Grunt.js: http://gruntjs.com/getting-started
It looks very promising. There are plugins for minifying, gzipping, and SFTP (https://www.npmjs.org/package/grunt-sftp-deploy).
The Grunt examples are clear, easy to understand, and easy to adapt.

Django PDF image generation

I am migrating a site from php to Django (a framework I am still learning). In the php version I was using ImageMagick to pull the first page (the cover) of a pdf file and display it as an img. I have done a little searching and not turned up anything similar with Django. Does anyone have any suggestions as to how this could be accomplished?
Any help much appreciated.
ImageMagick has Python bindings, I think: http://www.imagemagick.org/download/python/
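For example, with Wand (a different ImageMagick binding; assumed installed, along with Ghostscript for PDF support), grabbing the first page looks roughly like this:

from wand.image import Image

# "[0]" is ImageMagick's syntax for selecting the first page of the PDF
with Image(filename="cover.pdf[0]") as img:
    img.format = "png"
    img.save(filename="cover.png")

You could then serve cover.png like any other static image from Django.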

Offline wiki-like authoring tool

Does anyone know if there is a help authoring tool out there that can produce help documentation for a software product that looks like a wiki? We are currently using the Confluence wiki engine, which is absolutely brilliant and we were wondering if there is anything like that but without the need for an Apache server. Something stand-alone that can give our users the help documentation they need. We have used help authoring tools and they all seem so clunky compared to a wiki.
Use Wiki on a Stick.
It's a single .html file written in JavaScript/HTML that saves changes onto itself.
You don't even need Apache. Awesome tool!
How about Juli? It generates static HTML, so you can browse the documents with nothing but a browser.
It is used for:
the Juli documentation itself
the Edgar project documentation (another of my OSS projects)
my personal wiki/blog (I'll post the link later, since new users can only post two links, a Stack Overflow limitation)