How can I load historical stock indices data in my script without yahoo or google finance using pandas_datareader? - python-2.7

Sorry for my noobish question as I'm trying to use Python for my finance class in grad school.
I am currently stuck trying to load historical stock indices of DOW JONES, S&P 500 and NASDAQ. Sadly, Google Finance is useless with stock indices so I need help circumventing this obstacle.
Here is the part of my code that handles the loading process:
import pandas as pd
from pandas_datareader import data as web
import matplotlib.pyplot as plt
ticker = ['^DJI', '^INX', '^IXIC']
ind_data = pd.DataFrame()
for i in ticker:
    ind_data[i] = web.DataReader(i, data_source='google', start='2000-1-1')['Close']
Thanks in advance.

You can use quandl. It is a third party package you will need to install.
pip install quandl
Go to their website, make an account and get an API key. Then search the site to find the right code to use for each query.
You can pass the quandl.get function a variety of parameters to get exactly what you need. The call below retrieves the Dow Jones Industrial Average and returns it as a pandas DataFrame.
quandl.get("BCB/UDJIAD1")
Output
...
2016-04-06 17716.05
2016-04-07 17541.96
2016-04-08 17576.96
2016-04-11 17556.41
2016-04-12 17721.25
2016-04-13 17908.28
2016-04-14 17926.43
2016-04-15 17897.46
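For example, here is a minimal sketch of pulling that series into the question's loop-style DataFrame (assuming the quandl package is installed and you substitute your own API key; the Quandl codes for the S&P 500 and NASDAQ have to be looked up on their site, and the name of the value column can differ per dataset, which is why the first column is selected by position):
import pandas as pd
import quandl
# Assumption: replace with the API key from your Quandl account.
quandl.ApiConfig.api_key = 'YOUR_API_KEY'
# Hypothetical mapping of index names to Quandl codes; only the Dow Jones
# code is taken from the answer above, the others must be searched for.
codes = {'DJIA': 'BCB/UDJIAD1'}
ind_data = pd.DataFrame()
for name, code in codes.items():
    # start_date restricts the returned date range.
    series = quandl.get(code, start_date='2000-01-01')
    ind_data[name] = series.iloc[:, 0]
print(ind_data.tail())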

Related

Power BI Run Python Script Error - We Couldn't Convert to Number - Why does this happen?

This is more of a general question than one about a single specific piece of code, because I keep getting this error with different scripts.
What happens is that I will test a simple script in Jupyter Notebook and it works, only for it to fail with the 'We couldn't convert to Number' error in Power BI. What are the underlying reasons this happens, so I know what to look out for?
Here is some code just in case it helps. Again, it works in Jupyter Notebook but fails in Power BI.
import pandas as pd
import datetime
from time import strptime
date=datetime.date.today()
year=date.strftime('%Y')
month=pd.to_numeric(date.strftime('%m'))
dataset['Year']=dataset['Year'].astype(str).str.strip()
dataset['Month']=dataset['Month'].astype(str).str.strip()
dataset['MonthNumber'] = [strptime(str(x), '%B').tm_mon for x in dataset['Month']]
dataset['MonthNumber']=pd.to_numeric(dataset['MonthNumber'])
dataset['Year']=dataset['Year'].astype(str).str.strip()
dataset['ThisMonth']=dataset['MonthNumber']==(month-1)
dataset['ThisYear']=dataset['Year']==year

Choropleth in Folium - having issues

I am quite new to coding and need to create some type of map to display my data.
I have data in an Excel sheet showing a list of countries and a count of certain crimes. I want to show this in a choropleth map.
I have seen lots of different ways to code this but can't seem to get any to correctly read in the data. Will I need to import country codes into my df?
I have my world map from GitHub, downloaded in its raw format to my computer and loaded into Jupyter Notebook.
My dataframe, from the Excel sheet, is also loaded in Jupyter Notebook.
What are the first steps I need to take to load this into a map?
This is the code I have had the most success with:
import pandas as pd
import folium
df = pd.read_excel('UK - Nationality and Type.xlsx')
state_geo = 'countries.json'
m1 = folium.Map(location=[55, 4], zoom_start=3)
m1.choropleth(
    geo_data=state_geo,
    data=df,
    columns=['Claimed Nationality', 'Labour Exploitation'],
    key_on='feature.id',
    fill_color='YlGn',
    fill_opacity=0.5,
    line_opacity=0.2,
    legend_name='h',
    highlight=True
)
m1.save("my_map.html")
But I just get a big world map, all in the same shade of grey.
This is what the countries.json looks like:

PyTrends Historical Hourly Data

I am new to Python and coding in general. Thank you for your patience.
Using PyTrends I am trying to get hourly Google Trends results for a single search term for an entire year. I see that the pytrends page on PyPI (https://pypi.org/project/pytrends/) states that 'now 1-H' "seems to only work for 1, 4 hours only". I have tried some examples of people trying to get custom hourly searches but none work for me. I am wondering: is it no longer possible to get Google Trends historical hourly data, and should I just stop looking?
Maybe a little bit late already, but in case anybody finds this question in the future, you can find the most recent way to get historical hourly Google Trends data on the GitHub page of the pytrends developer:
https://github.com/GeneralMills/pytrends#historical-hourly-interest
A short snippet with the respective function applied:
# Since pytrends is returning a DataFrame object, we need pandas:
import pandas as pd
# Import of pytrends (needs to be pip installed first):
from pytrends.request import TrendReq
pytrends = TrendReq(hl='en-US', tz=360)
kw_list = ['search_term_1', 'search_term_2']
search_df = pytrends.get_historical_interest(kw_list, year_start=2019,
                                             month_start=5, day_start=1,
                                             hour_start=0, year_end=2019,
                                             month_end=7, day_end=31, hour_end=0,
                                             cat=0, geo='', gprop='', sleep=60)
Replace the respective parameters with the desired ones and there you go - hourly historical Google Trends data.
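As a quick sanity check (assuming the call above succeeded; the exact columns returned can vary), you can inspect and save the result:
# Each keyword becomes a column, indexed by hourly timestamps.
print(search_df.head())
search_df.to_csv('hourly_trends.csv')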
Hope it helped!
Best,
yawicz

bokeh - plotting shapefile map using datashader

Initially, I created an interactive map of the UK postcode areas in which each area is coloured based on its value (e.g. the population in that postcode area), as follows.
from bokeh.plotting import figure
from bokeh.palettes import Viridis256 as palette
from bokeh.models import LinearColorMapper
from bokeh.models import ColumnDataSource
import geopandas as gpd
import pandas as pd
shp = 'file_path_to_the_downloaded_shapefile'
#read shape file into dataframe using geopandas
df = gpd.read_file(shp)
def expandMultiPolygons(row, geometry):
    if row[geometry].type == 'MultiPolygon':
        row[geometry] = [p for p in row[geometry]]
    return row
#Some rows were in MultiPolygons instead of Polygons.
#Expand MultiPolygons to multiple rows of Polygons.
df = df.apply(expandMultiPolygons, geometry='geometry', axis=1)
df = df.set_index('Area')['geometry'].apply(pd.Series).stack().reset_index()
#Visualize the polygons. To visualize different colors for different post areas, I added another column called 'value' which has some random integer value.
p = figure()
color_mapper = LinearColorMapper(palette=palette)
source = ColumnDataSource(df)
p.patches('x', 'y', source=source,
          fill_color={'field': 'value', 'transform': color_mapper},
          fill_alpha=1.0, line_color="black", line_width=0.05)
where df is a dataframe with four columns: post code area, x-coordinate, y-coordinate, and value (i.e. population).
The above code creates an interactive map in a web browser, which is great, but I noticed the interactivity is not very smooth. If I zoom in or move the map, it renders slowly. The size of the dataframe is only 1106 rows, so I'm quite confused why it is so slow.
As one of the possible solutions, I came across datashader (https://datashader.readthedocs.io/en/latest/), but I find the example scripts quite complicated, and most of them use the HoloViews package in Jupyter notebooks, whereas I want to create a dashboard using bokeh.
Can anyone advise me on incorporating datashader into the above bokeh script? Do I need a different function within datashader to create the shape map instead of using bokeh's patches function?
Any suggestion would be highly appreciated!!!
Without the data file involved, I can't answer your question directly, but can offer some observations:
Datashader is unlikely to be of value for this purpose, because datashader does not currently have any support for rendering polygons. As a rule of thumb, Datashader is designed to aggregate your data, and if it's already aggregated, Datashader won't normally be of help. Here your data is aggregated by postcode, which datashader can't process, but if you had the original data per person it would be happy to render it.
If you prefer working with Bokeh directly rather than via the higher-level HoloViews/GeoViews interface, I'd recommend following Matt Rocklin's work on accelerating geopandas; his approach should be very fast for your purpose.
All that said, HoloViews, and GeoViews should be a convenient way to work with Bokeh in general, whether or not you want to create a dashboard. E.g. the 2017 JupyterCon tutorial shows how to make a simple Bokeh dashboard using both libraries. It doesn't cover shape files, but those are covered in other GeoViews examples.
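For orientation, here is a minimal GeoViews sketch of the same polygons-coloured-by-value idea (my own illustration, not taken from the tutorial; it assumes a reasonably recent holoviews/geoviews and the GeoDataFrame as read by geopandas in the question, before the expansion into x/y columns, with its 'value' column added):
import geoviews as gv
gv.extension('bokeh')
# Build a polygons element from the GeoDataFrame and colour it by 'value'.
polys = gv.Polygons(df, vdims=['value'])
polys = polys.opts(color='value', cmap='Viridis', tools=['hover'],
                   line_color='black', line_width=0.05)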
As mentioned in my comment, I believe that the complexity of your polygons might be causing your problem. The file you linked to contains several shapefiles of different sizes and complexities. You can simplify those, i.e. reduce the number of points in each polygon. This can change how they look: the effect ranges from almost no visible difference, over a bit more "edginess", to an angular appearance, depending on the level of simplification. Depending on your needs you can choose different levels of simplification.
I know of three easy options to get this done:
GUI: Try QGIS. It is a great open-source tool for geospatial data processing. Load your shapefile as a new layer, then use the "Simplify Geometries" tool under the Vector menu.
Command line: GDAL is an open-source library. It comes with a useful command-line tool. You can use it like this: ogr2ogr outfile.shp infile.shp -simplify 0.000001
Online: Visit mapshaper. Import your file, select simplify and choose your level, then export the result. What I really like here is that your file is rendered instantly, so you can immediately see the result of your simplification.
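Alternatively, here is a programmatic sketch of the same idea using geopandas' simplify method (this assumes you are loading the shapefile with geopandas, as in the question; the tolerance value is only a placeholder you will have to tune):
import geopandas as gpd
df = gpd.read_file('file_path_to_the_downloaded_shapefile')
# The tolerance is in the units of the file's CRS (degrees for lat/lon data);
# larger values remove more points and make the shapes more angular.
df['geometry'] = df['geometry'].simplify(tolerance=0.001, preserve_topology=True)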
Other than that, you should also update your bokeh version. It gets updated regularly and there have been some performance improvements since.
Using HoloViews or GeoViews will not positively affect your performance; thus, it is not related to your issues. I guess @James A. Bednar was just giving some side advice there.
I found a way to speed up the interactive visualization of the UK map as I move the slider.
I first created an individual 2D image for each value of the slider and then updated the map using those images instead of bokeh's patches function.
Since the images are in array format, it is much faster to update the image while changing the values in the slider. One downside of this method is that I can no longer use the hover function on the UK map.
I referred to the following url to convert polygon information into arrays: https://gist.github.com/brendancol/db030013e981c46acb2886060dde607e#file-rasterio_datashader_polygons-py-L35
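For reference, here is a rough sketch of how such a polygon-to-array rasterization could look with rasterio (my own approximation, not the exact code from the gist; the image size and column names are placeholders):
import rasterio.features
from rasterio.transform import from_bounds
# Assumption: df is the GeoDataFrame read by geopandas, with one 'value' per area.
width, height = 800, 600
xmin, ymin, xmax, ymax = df.total_bounds
transform = from_bounds(xmin, ymin, xmax, ymax, width, height)
# Burn each polygon's value into a 2D float array that a bokeh image glyph
# (or a stack of such arrays, one per slider value) can display.
img = rasterio.features.rasterize(
    zip(df.geometry, df['value'].astype(float)),
    out_shape=(height, width),
    transform=transform,
    fill=0.0,
    dtype='float64',
)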

Parallel excel sheet read from dask

Hello, all of the examples that I have come across for using dask so far involve multiple CSV files in a folder being read with dask's read_csv call.
If I am given an xlsx file with multiple tabs, can I use anything in dask to read them in parallel?
P.S. I am using pandas 0.19.2 with python 2.7
For those using Python 3.6:
#reading the file using dask
import pandas as pd
import dask
import dask.dataframe as dd
from dask.delayed import delayed
parts = dask.delayed(pd.read_excel)(excel_file, sheet_name=0, usecols = [1, 2, 7])
df = dd.from_delayed(parts)
print(df.head())
I'm seeing a 50% speed increase on load on an i7, 16GB 5th Gen machine.
A simple example
fn = 'my_file.xlsx'
parts = [dask.delayed(pd.read_excel)(fn, i, **other_options)
         for i in range(number_of_sheets)]
df = dd.from_delayed(parts, meta=parts[0].compute())
This assumes you provide the "other options" needed to extract the data (which are uniform across sheets) and that you want to make a single master dataframe out of the set.
Note that I don't know the internals of the excel reader, so how parallel the reading/parsing part would be is uncertain, but subsequent computations once the data are in memory would definitely be.
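For instance, once the sheets are combined, subsequent dask operations run across all of the per-sheet partitions (the column name below is purely hypothetical):
# Hypothetical numeric column; the reduction runs across the partitions
# created from the individual sheets.
total = df['amount'].sum().compute()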