Import data to dataframe using SODA API - python-2.7

I'm trying to import data from the link below using a SODA API and load it into a dataframe. I've never worked with a SODA API before; can anyone suggest a good module or explain how to do this?
https://health.data.ny.gov/Health/Medicaid-Potentially-Preventable-Emergency-Visit-P/cr7a-34ka

The code below did the trick:
Code:
import pandas as pd
from sodapy import Socrata
# Unauthenticated client only works with public data sets. Note 'None'
# in place of application token, and no username or password:
client = Socrata("health.data.ny.gov", None)
# First 2000 results, returned as JSON from API / converted to Python list of
# dictionaries by sodapy.
Results = client.get("cr7a-34ka", limit=2000)
# Convert to pandas DataFrame
df = pd.DataFrame.from_records(Results)

For Python, the unofficial sodapy library is a great place to start!
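One thing to note: Socrata returns JSON in which numeric fields often come through as strings, so the DataFrame columns may end up with object dtype. A small, hedged follow-up (the column name below is a placeholder, not an actual field from this dataset):
# 'visit_count' is a hypothetical column name - substitute a real numeric
# field from the dataset before running this.
df["visit_count"] = pd.to_numeric(df["visit_count"], errors="coerce")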

Related

Sentiment Analysis using NLTK and beautifulsoup

I'm working on a personal project where I'm thinking of doing sentiment analysis using NLTK and Vader to compare presidential speeches.
I was able to use Beautiful Soup to find one of George Washington's speeches, and I managed to put the speech in a list. But after that, I'm not really sure of the best way to go further. It seems typical for the text to be read from a file, but writing the list to a file leaves the brackets in, which makes it awkward. I'm not sure if I should store the scraped speech in a file or just work from the list. Or maybe I should put the speech into a dataframe already? I'm not too sure.
from bs4 import BeautifulSoup
import requests
import spacy
import pandas as pd

page_link = 'https://www.ourdocuments.gov/doc.php?flash=false&doc=11&page=transcript'
page_response = requests.get(page_link, timeout=5)
page_content = BeautifulSoup(page_response.content, "html.parser")

textContent = []
for i in range(0, 7):
    paragraphs = page_content.find_all("p")[i].text
    textContent.append(paragraphs)

toWrite = open('washington.txt', 'w')
line = textContent
toWrite.write(str(line))
toWrite.close()
Any help or pointers would be greatly appreciated.
This article may help as a starting point; do check it out:
https://towardsdatascience.com/basic-binary-sentiment-analysis-using-nltk-c94ba17ae386
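If you want to get going with NLTK's VADER on the scraped text, a minimal sketch could look like this (it assumes the textContent list from the question above and downloads the vader_lexicon resource on first use):
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')  # one-time download of the VADER lexicon

# Join the scraped paragraphs into a single string instead of writing
# str(list) to a file, which is what leaves the brackets in.
speech = "\n".join(textContent)

sia = SentimentIntensityAnalyzer()
scores = sia.polarity_scores(speech)  # dict with 'neg', 'neu', 'pos', 'compound'
print(scores)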

How to Read the Barcode in image using Python 3.6 without using Zbar

I am new to barcode reading. I found some tutorials about ZBar, but it does not seem to be supported in my setup. I would like to read the barcode in an image and extract the data in it.
This is what I actually tried:
def detect_barcode(request):
    try:
        import pyqrcode
        import zbarlight
        # Generate a sample QR code image
        qr = pyqrcode.create("HORN O.K. PLEASE.")
        qr.png("download.png", scale=6)
        import qrtools
        from qrtools.qrtools import QR
        from qrtools.qrtools import Image, BOM_UTF8
        # Decode the image again
        qr = qrtools.QR()
        qr.decode("download.png")  # returns True on success
        print(qr.data)
    except Exception as e:
        test = str(e)
I need to decode the barcode and extract the data. I don't want to use ZBar.
If it helps, we have a web browser that reads barcodes; no changes to your page are needed.
You can embed barcode scanning in your page and handle the result with JavaScript if you need more control.
Scan to Web
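If the image is a QR code (as in the snippet above) and OpenCV is an acceptable alternative, its built-in detector can decode QR codes with no ZBar dependency. A minimal sketch, assuming the download.png file generated in the question:
import cv2

# Decode a QR code with OpenCV's built-in detector (OpenCV >= 3.4.4, no ZBar).
img = cv2.imread("download.png")
detector = cv2.QRCodeDetector()
data, points, _ = detector.detectAndDecode(img)
if data:
    print("Decoded:", data)
else:
    print("No QR code found")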

How to crawl multiple domains using single crawler?

How can I crawl data from multiple domains using a single crawler? I have crawled single sites using Beautiful Soup, but couldn't figure out how to create a generic one.
Well, this question is flawed as stated: the sites you want to scrape have to have something in common. For instance:
from bs4 import BeautifulSoup
from urllib import request

for counter in range(0, 10):
    site = input("Type the name of your website")  # use raw_input() on Python 2
    # Takes the website you typed and stores it in the > site < variable
    make_request_to_site = request.urlopen(site).read()
    # Makes a request to the site that we stored in a var
    soup = BeautifulSoup(make_request_to_site, "html.parser")
    # We pass it through the BeautifulSoup parser, in this case html.parser
    # Next we loop over all links in the page we fetched
    for link in soup.findAll('a'):
        print(link['href'])
As mentioned, each site has its own distinct markup and selectors. A single generic crawler won't be able to go to a URL and intuitively understand what to scrape.
BeautifulSoup might not be the best choice for this type of task. Scrapy is another web crawling library that's a bit more robust than BS4.
Similar question here on stackoverflow: Scrapy approach to scraping multiple URLs
Scrapy Documentation:
https://doc.scrapy.org/en/latest/intro/tutorial.html
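If you go the Scrapy route, a minimal multi-domain spider might look like the sketch below (the spider name, start URLs, and the "yield every link" logic are all placeholders for whatever you actually need to scrape):
import scrapy

class MultiDomainSpider(scrapy.Spider):
    name = "multi_domain"  # placeholder spider name
    # Placeholder start URLs - replace with the sites you actually want to crawl
    start_urls = [
        "https://example.com/",
        "https://example.org/",
    ]

    def parse(self, response):
        # Generic behaviour: yield every link on the page. A real spider
        # would extract site-specific fields here instead.
        for href in response.css("a::attr(href)").getall():
            yield {"url": response.urljoin(href)}
You can run it without a full project with something like: scrapy runspider multi_domain_spider.py -o links.json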

Read multilanguage strings from html via Python 2.7

I am new to Python 2.7 and I am trying to extract some info from HTML files. More specifically, I want to read some text that contains multilanguage information. I give my script below, hoping to make things clearer.
import urllib2
import BeautifulSoup
url = 'http://www.bbc.co.uk/zhongwen/simp/'
page = urllib2.urlopen(url).read().decode("utf-8")
dom = BeautifulSoup.BeautifulSoup(page)
data = dom.findAll('meta', {'name' : 'keywords'})
print data[0]['content'].encode("utf-8")
The result I am getting is:
BBCϊ╕φόΨΘύ╜ΣΎ╝Νϊ╕╗ώκ╡Ύ╝Νbbcchinese.com, email news, newsletter, subscription, full text
The problem is in the first string. Is there any way to print exactly what I am reading? Also, is there any way to find the exact encoding of the language of each script?
PS: I would like to mention that the site was selected totally at random, as it is representative of the problem I am encountering.
Thank you in advance!
You have a problem with the terminal where you are outputting the result. The script works fine, and if you write the data to a file you will get it correctly.
Example:
import urllib2
from bs4 import BeautifulSoup

url = 'http://www.bbc.co.uk/zhongwen/simp/'
page = urllib2.urlopen(url).read().decode("utf-8")
dom = BeautifulSoup(page)
data = dom.findAll('meta', {'name' : 'keywords'})
with open("test.txt", "w") as myfile:
    myfile.write(data[0]['content'].encode("utf-8"))
test.txt:
BBC中文网,主页,bbcchinese.com, email news, newsletter, subscription, full text
Which OS and terminal are you using?
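A quick, hedged diagnostic you can run on the affected machine is to check what encoding Python thinks the terminal uses:
import sys

# If this prints something other than UTF-8 (for example cp437 or cp1252 on a
# Windows console), the garbled output comes from the terminal, not the parsing.
print sys.stdout.encoding
Setting the PYTHONIOENCODING=utf-8 environment variable, or switching the console to a UTF-8 code page, is a common workaround.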

Python library to access a CalDAV server

I run ownCloud on my webspace for a shared calendar. Now I'm looking for a suitable Python library to get read-only access to the calendar. I want to put some information from the calendar on an intranet website.
I have tried http://trac.calendarserver.org/wiki/CalDAVClientLibrary but it always returns a NotImplementedError with the query command, so my guess is that the query command doesn't work well with the given library.
What library could I use instead?
I recommend the caldav library.
Read-only access works really well with this library and looks straightforward to me. It will do the whole job of getting calendars and reading events, returning them in the iCalendar format. More information about the caldav library can also be found in its documentation.
import caldav

client = caldav.DAVClient(<caldav-url>, username=<username>,
                          password=<password>)
principal = client.principal()
for calendar in principal.calendars():
    for event in calendar.events():
        ical_text = event.data
From this on you can use the icalendar library to read specific fields such as the type (e. g. event, todo, alarm), name, times, etc. - a good starting point may be this question.
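As a minimal sketch of that step (assuming the ical_text variable from the loop above), the icalendar library can be used roughly like this:
from icalendar import Calendar

# Parse the raw iCalendar text and read a few fields from each VEVENT
cal = Calendar.from_ical(ical_text)
for component in cal.walk('VEVENT'):
    summary = component.get('summary')
    start = component.decoded('dtstart')  # datetime/date object
    end = component.decoded('dtend')
    print(summary, start, end)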
I wrote this code a few months ago to fetch data from CalDAV and present it on my website.
I have converted the data into JSON format, but you can do whatever you want with it.
I have added some print statements so you can see the output; you can remove them in production.
from datetime import datetime
import json
from pytz import UTC  # timezone
import caldav
from icalendar import Calendar, Event

# CalDAV info
url = "YOUR CALDAV URL"
userN = "YOUR CALDAV USERNAME"
passW = "YOUR CALDAV PASSWORD"

client = caldav.DAVClient(url=url, username=userN, password=passW)
principal = client.principal()
calendars = principal.calendars()

if len(calendars) > 0:
    calendar = calendars[0]
    print ("Using calendar", calendar)
    results = calendar.events()
    eventSummary = []
    eventDescription = []
    eventDateStart = []
    eventdateEnd = []
    eventTimeStart = []
    eventTimeEnd = []
    for eventraw in results:
        event = Calendar.from_ical(eventraw._data)
        for component in event.walk():
            if component.name == "VEVENT":
                print (component.get('summary'))
                eventSummary.append(component.get('summary'))
                print (component.get('description'))
                eventDescription.append(component.get('description'))
                startDate = component.get('dtstart')
                print (startDate.dt.strftime('%m/%d/%Y %H:%M'))
                eventDateStart.append(startDate.dt.strftime('%m/%d/%Y'))
                eventTimeStart.append(startDate.dt.strftime('%H:%M'))
                endDate = component.get('dtend')
                print (endDate.dt.strftime('%m/%d/%Y %H:%M'))
                eventdateEnd.append(endDate.dt.strftime('%m/%d/%Y'))
                eventTimeEnd.append(endDate.dt.strftime('%H:%M'))
                dateStamp = component.get('dtstamp')
                print (dateStamp.dt.strftime('%m/%d/%Y %H:%M'))
                print ('')

    # Modify or change these values based on your CalDAV
    # Converting to JSON
    data = [{'Events Summary': eventSummary[0],
             'Event Description': eventDescription[0],
             'Event Start date': eventDateStart[0],
             'Event End date': eventdateEnd[0],
             'At:': eventTimeStart[0],
             'Until': eventTimeEnd[0]}]
    data_string = json.dumps(data)
    print ('JSON:', data_string)
pyOwnCloud could be the right thing for you. I haven't tried it, but it should provide a command-line interface/API for reading the calendars.
You probably want to provide more details about how you are actually making use of the API, but in case the query command is indeed not implemented, there is a list of other Python libraries at the CalConnect website (archived version; the original link is dead now).