beautifulsoup for getting all links from base URL in a website - python-2.7

I want to get all links/HTML pages from a website's base URL. From the documentation I learned that this can be done with Python and BeautifulSoup. Can you please tell me how to install BeautifulSoup and what other prerequisite steps are needed?
Once BeautifulSoup is installed, how can I use it in Python code? It would help if someone could share Python code that achieves this.

The snippet in this question might help you in retrieving all the links from a base url.

Your question gives no information about your OS or specific goal. I'm assuming you already have Python installed.
pip is a command-line tool for managing Python packages. You can run the following command in your terminal:
pip install beautifulsoup4
The beautifulsoup documentation is a good place to get started on learning more.
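Once the package is installed, collecting links could look roughly like the sketch below. It assumes Python 3 (the question is tagged python-2.7, where `urljoin` lives in the `urlparse` module instead) and parses an inline sample page rather than a live site. The function name `extract_links`, the `SAMPLE_HTML` string, and the example.com URLs are made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag that has an href, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        # urljoin resolves relative hrefs against the base URL and
        # leaves absolute hrefs unchanged.
        links.append(urljoin(base_url, anchor["href"]))
    return links


# Hypothetical sample page; a real script would download the HTML first.
SAMPLE_HTML = """
<html><body>
  <a href="/about.html">About</a>
  <a href="https://example.org/contact">Contact</a>
  <a name="no-href">skipped</a>
</body></html>
"""

if __name__ == "__main__":
    for link in extract_links(SAMPLE_HTML, "https://example.com/"):
        print(link)
```

For a live site, the HTML string would come from an HTTP client of your choice. Following the collected links to crawl the whole site is a separate step that this sketch does not attempt.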

Related

No module named 'nltk.lm' in Google colaboratory

I'm trying to import the NLTK language modeling module (nltk.lm) in a Google colaboratory notebook without success. I've tried by installing everything from nltk, still without success.
What mistake or omission could I be making?
Thanks in advance.

Google Colab has nltk v3.2.5 installed, but nltk.lm (Language Modeling package) was added in v3.4.
In your Google Colab run:
!pip install -U nltk
In the output you will see it downloads a new version, and uninstalls the old one:
...
Downloading nltk-3.6.5-py3-none-any.whl (1.5 MB)
...
Successfully uninstalled nltk-3.2.5
...
You must restart the runtime in order to use newly installed versions.
Click the Restart runtime button shown at the end of the output.
Now it should work!
You can double check the nltk version using this code:
import nltk
print('The nltk version is {}.'.format(nltk.__version__))
You need v3.4 or later to use nltk.lm.
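If you would rather have the notebook check the requirement itself, a small helper along these lines could do it. This is only a sketch: `supports_nltk_lm` is a hypothetical name, and it compares just the major and minor numbers of the version string against the 3.4 threshold mentioned above.

```python
import re


def supports_nltk_lm(version):
    """Return True if a version string such as '3.6.5' is 3.4 or later."""
    # Keep only the first two numeric components (major, minor).
    numbers = [int(n) for n in re.findall(r"\d+", version)[:2]]
    while len(numbers) < 2:
        numbers.append(0)
    # Tuple comparison avoids the string-comparison trap where '3.10' < '3.4'.
    return tuple(numbers) >= (3, 4)


if __name__ == "__main__":
    try:
        import nltk
    except ImportError:
        print("nltk is not installed")
    else:
        print(nltk.__version__, supports_nltk_lm(nltk.__version__))
```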

Amazon Lambda unable to import [python windows .pyd pip]

I am trying to write to my PostgreSQL database with AWS Lambda using the python2.7 runtime. I care very little about how I do this, so if anyone has a different way that I can understand that works, I'd love to hear it.
The method I'm currently trying is to use psycopg2, as this is the only way I know. In order to do this, I need to upload the psycopg2 module to my environment on AWS Lambda. As per instructions, I've created a directory with my source and psycopg2 using pip install psycopg2 -t ..\my-project, zipped my-project, and uploaded it.
My error message is this from within the AWS Lambda console: Unable to import module 'lambda_function': No module named _psycopg
The code runs on my windows machine. I think the issue is that when I import psycopg2 from my local windows machine, the _psycopg module is being imported from _psycopg.pyd, and .pyd files are windows specific. I may be wrong about this.
I'm really just looking for any way to achieve the desired result described in my first paragraph, but here's a more specific question: How do I tell windows to pip install and compile psycopg2 without using .pyd files? Is this possible? Do I have something completely wrong?
I know the formatting of this question is a little unorthodox, I think I've given all the necessary information, let me know if there's anything else I can provide.

I solved the problem by opening an ubuntu instance on VirtualBox, pip installing the package there, pulling the relevant folders out, and placing them in my-project before zipping and uploading to AWS Lambda.
See these instructions.
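Before zipping, it can be worth checking that no Windows-only binaries are left in the project folder. The sketch below is a hypothetical helper, not part of any AWS tooling: it walks a directory and lists files ending in .pyd or .dll, which a Linux-based Lambda runtime cannot load.

```python
import os


def find_windows_binaries(package_dir):
    """Return sorted relative paths of .pyd/.dll files under package_dir."""
    found = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            if name.lower().endswith((".pyd", ".dll")):
                full_path = os.path.join(root, name)
                found.append(os.path.relpath(full_path, package_dir))
    return sorted(found)


if __name__ == "__main__":
    # "my-project" is the folder name used in the question.
    for path in find_windows_binaries("my-project"):
        print("Windows-only binary:", path)
```

An empty result does not prove that the package will import on Lambda, only that this particular problem is absent.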

Installing Python 2.7 for all users on SLES 11

I installed Python 2.7 on a SLES 11 box that was previously running Python 2.6. To do so I used a script described in this post and ran it as root. Everything went well, but when it was done I discovered a few issues:
1. No symbolic links were created and the path was not updated, so I had to manually update the path to point to the new installation's bin directory, /opt/python2.7/bin.
2. Everything runs fine until I switch from root to a normal user, at which point the Python shell runs but some modules I installed, such as PyYAML, are missing. Again, these are fine when I run Python as root.
3. As a regular user I'm not able to run pip, easy_install and wheel. For pip I get ImportError: No module named pkg_resources
P.S. Following a user's suggestion, I added the following path, taken from the root user's sys.path, to .bashrc, but it did not fix the problem:
export PYTHONPATH=$PYTHONPATH:/opt/python2.7/lib/python27.zip:/opt/python2.7/lib/python2.7:/opt/python2.7/lib/python2.7/plat-linux2:/opt/python2.7/lib/python2.7/lib-tk:/opt/python2.7/lib/python2.7/lib-old:/opt/python2.7/lib/python2.7/lib-dynload:/opt/python2.7/lib/python2.7/site-packages:/opt/python2.7/lib/python2.7/site-packages/PyYAML-3.11-py2.7-linux-x86_64.egg:/opt/python2.7/lib/python2.7/site-packages/pexpect-4.2.0-py2.7.egg:/opt/python2.7/lib/python2.7/site-packages/ptyprocess-0.5.1-py2.7.egg

Credible / official sources: no reply from the official forum. Apart from the SO link you mentioned, there is also https://unix.stackexchange.com/questions/7644/how-to-do-a-binary-install-of-python-2-7-on-suse-linux-enterprise-server-11, which sketches the approach described in Installing Python 2.7 on SLES 11 (SO is not official, is it? ;-)).
Concerning your problem: both 2. and 3. might be caused by entries missing from sys.path.
To test this, type
import sys; sys.path
in both the user's and root's Python and check for differences. The two lists need to be merged. Try PYTHONPATH first to test this, but be aware that there are several other ways to adjust sys.path.
If you just need to fix this for normal (non-daemon) users, adjusting the system-wide bash profile would be an easy solution.
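The comparison of the two sys.path lists can also be scripted. The following is a minimal sketch: `missing_entries` is a hypothetical helper, and the two example lists stand in for the output you would copy from the root and user sessions.

```python
def missing_entries(reference_path, current_path):
    """Return entries of reference_path that are absent from current_path, in order."""
    current = set(current_path)
    return [entry for entry in reference_path if entry not in current]


if __name__ == "__main__":
    # Placeholder lists; paste the real sys.path output of each account here.
    root_path = [
        "/opt/python2.7/lib/python2.7",
        "/opt/python2.7/lib/python2.7/site-packages",
    ]
    user_path = [
        "/opt/python2.7/lib/python2.7",
    ]
    for entry in missing_entries(root_path, user_path):
        print("missing for user:", entry)
```

If both lists turn out to be identical, the cause is more likely elsewhere, for example file permissions on the site-packages directory, so treat the diff as a first diagnostic step only.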
(Any questions/feedback is welcome... :-)

NLTK Wordnet Download Out of Date

New to Python, trying to get started with NLTK. After a rough time installing Python on my Windows 7 64-bit system, I am now having a rough time downloading Wordnet and other NLTK data packages located here:
http://nltk.org/nltk_data/
Some packages download, some say "Out of Date"
import nltk
nltk.download()
When I use the above to download, the program doesn't let me cancel if I hit the cancel button.
So, I just shut it down and go directly to the link above to try and download it manually. When I try to download Wordnet for example, the download starts in my browser but stops mid-way through download!
This is very frustrating for me as a beginner. Is there an alternative way to download Wordnet for nltk?

I was facing the same issue. The issue in my case was that when the NLTK downloader started it had the server index as
http://nltk.github.com/nltk_data/
This needs to be changed to
http://nltk.org/nltk_data/
You can change this in the NLTK Downloader window via File->Change Server Index.
Regards,
Bonson

Use of Mechanize Library with Python 2.7 and Django

How do I use the mechanize library with Django?
I read online that I could put it in a directory (e.g. /lib/) and include as needed.
The problem is, the source I found didn't show how to use it, from configuration to initial use. I also searched high and low on Google and found nothing, and a book I have on Django has no information on it either.
Can anyone help me out?
I'm on a local install of django with python 2.7.
Thank you

As per comments, the answer is:
pip install mechanize
then just open a python interpreter and import mechanize to confirm.
That should be it, you can start using mechanize in your Django project.
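The same check can be done without an interactive session. The sketch below assumes Python 3.4 or later and uses the standard library's `importlib.util.find_spec`. The helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("mechanize available:", is_installed("mechanize"))
```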
