Power BI Run Python Script Error - We Couldn't Convert to Number - Why does this happen?

This is more of a general question than one about a specific piece of code, because I keep getting this error with different scripts.
What happens is that I test a simple script in Jupyter Notebook and it works, only for it to fail with the 'We couldn't convert to Number' error in Power BI. What are the underlying reasons this happens, so I know what to look out for?
Here is some code just in case it helps. Again, it works in Jupyter Notebook but fails in Power BI.
import pandas as pd
import datetime
from time import strptime

# Current year (as a string) and current month (as a number)
date = datetime.date.today()
year = date.strftime('%Y')
month = pd.to_numeric(date.strftime('%m'))

# `dataset` is the DataFrame Power BI passes into the script
dataset['Year'] = dataset['Year'].astype(str).str.strip()
dataset['Month'] = dataset['Month'].astype(str).str.strip()

# Convert month names (e.g. 'January') to month numbers
dataset['MonthNumber'] = [strptime(str(x), '%B').tm_mon for x in dataset['Month']]
dataset['MonthNumber'] = pd.to_numeric(dataset['MonthNumber'])

# Flag rows from last month and from the current year
dataset['ThisMonth'] = dataset['MonthNumber'] == (month - 1)
dataset['ThisYear'] = dataset['Year'] == year
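One plausible cause (an assumption based on the columns above, not a confirmed diagnosis): Power BI re-infers column types when the script returns, so columns that come back with object or bool dtype, or that contain NaN, can fail its numeric conversion even though pandas handled them fine in Jupyter. A minimal defensive sketch, continuing the script above, is to normalize the output dtypes explicitly before the script ends:
# Hedged sketch: give Power BI's type inference nothing ambiguous to convert.
# The boolean flag columns become plain 0/1 integers.
dataset['MonthNumber'] = pd.to_numeric(dataset['MonthNumber'], errors='coerce')
dataset['ThisMonth'] = dataset['ThisMonth'].astype(int)
dataset['ThisYear'] = dataset['ThisYear'].astype(int)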

Related

Choropleth in Folium - having issues

I am quite new to coding and need to create some type of map to display my data.
I have data in an Excel sheet showing a list of countries and a number showing the amount of certain crimes. I want to show this in a choropleth map.
I have seen lots of different ways to code this but can't seem to get any of them to correctly read in the data. Will I need to import country codes into my df?
I have my world map from GitHub and downloaded it in its raw format to my computer and into Jupyter Notebook.
My dataframe, from the Excel sheet, is also loaded in Jupyter Notebook.
What are the first steps I need to take to load this into a map?
This is the code I have had the most success with:
import pandas as pd
import folium

# Load the data and the GeoJSON country boundaries
df = pd.read_excel('UK - Nationality and Type.xlsx')
state_geo = 'countries.json'

m1 = folium.Map(location=[55, 4], zoom_start=3)
m1.choropleth(
    geo_data=state_geo,
    data=df,
    columns=['Claimed Nationality', 'Labour Exploitation'],
    key_on='feature.id',
    fill_color='YlGn',
    fill_opacity=0.5,
    line_opacity=0.2,
    legend_name='h',
    highlight=True,
)
m1.save("my_map.html")
But I just get a big world map, all in the same shade of grey.
This is what the countries.json looks like:
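An all-grey map usually means the join between the data and the GeoJSON failed: with key_on='feature.id', folium matches the values of the first column in columns ('Claimed Nationality') against each feature's id in countries.json, and the world GeoJSON files commonly used in folium examples key features on ISO-3166 alpha-3 codes. A hedged sketch of that fix, assuming such ids (the name-to-code mapping below is hypothetical and would need to cover your data):
import pandas as pd
import folium

df = pd.read_excel('UK - Nationality and Type.xlsx')

# Hypothetical mapping from country names to whatever scheme countries.json
# uses for feature.id (often ISO-3166 alpha-3 codes) -- extend as needed.
name_to_iso3 = {'France': 'FRA', 'Germany': 'DEU', 'Poland': 'POL'}
df['Code'] = df['Claimed Nationality'].map(name_to_iso3)

m1 = folium.Map(location=[55, 4], zoom_start=3)
folium.Choropleth(
    geo_data='countries.json',
    data=df,
    columns=['Code', 'Labour Exploitation'],  # join column first, value column second
    key_on='feature.id',
    fill_color='YlGn',
    fill_opacity=0.5,
    line_opacity=0.2,
    legend_name='Labour Exploitation',
    highlight=True,
).add_to(m1)
m1.save('my_map.html')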

PyTrends Historical Hourly Data

I am new to Python and coding in general. Thank you for your patience.
Using PyTrends, I am trying to get hourly Google Trends results for a single search term for an entire year. I see that the project page (https://pypi.org/project/pytrends/) states that 'now 1-H' "seems to only work for 1, 4 hours only". I have tried some examples of people trying to get custom hourly searches, but none work for me. Is it no longer possible to get historical hourly Google Trends data, and should I just stop looking?
Maybe a little bit late already, but in case anybody finds this question in the future: you can find the most recent way to get historical hourly Google Trends data on the GitHub page of the PyTrends developer:
https://github.com/GeneralMills/pytrends#historical-hourly-interest
A short snippet with the respective function applied:
# Since pytrends returns a DataFrame object, we need pandas:
import pandas as pd
# Import pytrends (needs to be pip installed first):
from pytrends.request import TrendReq

pytrends = TrendReq(hl='en-US', tz=360)
kw_list = ['search_term_1', 'search_term_2']
search_df = pytrends.get_historical_interest(kw_list,
                                             year_start=2019, month_start=5,
                                             day_start=1, hour_start=0,
                                             year_end=2019, month_end=7,
                                             day_end=31, hour_end=0,
                                             cat=0, geo='', gprop='',
                                             sleep=60)
Replace the respective parameters with the desired ones and there you go - hourly historical Google Trends data.
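As a quick sanity check on what comes back (in my experience, get_historical_interest returns a DataFrame indexed by hourly timestamps, with one column per keyword plus an isPartial flag):
# Assumes search_df from the snippet above
print(search_df.index.min(), search_df.index.max())  # hourly timestamp range
print(search_df.head())                              # one column per keyword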
Hope it helped!
Best,
yawicz

How can I load historical stock indices data in my script without yahoo or google finance using pandas_datareader?

Sorry for my noobish question as I'm trying to use Python for my finance class in grad school.
I am currently stuck trying to load historical stock indices of DOW JONES, S&P 500 and NASDAQ. Sadly, Google Finance is useless with stock indices so I need help circumventing this obstacle.
Here is the part of my code that handles the loading process:
import pandas as pd
from pandas_datareader import data as web
import matplotlib.pyplot as plt

ticker = ['^DJI', '^INX', '^IXIC']
ind_data = pd.DataFrame()
for i in ticker:
    ind_data[i] = web.DataReader(i, data_source='google', start='2000-1-1')['Close']
Thanks in advance.
You can use quandl. It is a third party package you will need to install.
pip install quandl
Go to their website, make an account, and get an API key. Then search the site to find the right dataset code for each query.
You can pass the quandl.get function a variety of parameters to get exactly what you need. The snippet below retrieves the Dow Jones Industrial Average and outputs it as a pandas DataFrame.
import quandl

quandl.ApiConfig.api_key = 'YOUR_API_KEY'  # the key from your quandl account

# Daily Dow Jones Industrial Average
quandl.get("BCB/UDJIAD1")
Output
...
2016-04-06 17716.05
2016-04-07 17541.96
2016-04-08 17576.96
2016-04-11 17556.41
2016-04-12 17721.25
2016-04-13 17908.28
2016-04-14 17926.43
2016-04-15 17897.46
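To mirror the loop from the question across all three indices, a minimal sketch of the same idea (only the Dow dataset code comes from the answer above; the S&P 500 and NASDAQ codes below are hypothetical placeholders you would look up on the quandl site):
import pandas as pd
import quandl

quandl.ApiConfig.api_key = 'YOUR_API_KEY'

codes = {
    '^DJI': 'BCB/UDJIAD1',          # Dow Jones, as used above
    '^INX': 'VENDOR/SP500_CODE',    # hypothetical placeholder
    '^IXIC': 'VENDOR/NASDAQ_CODE',  # hypothetical placeholder
}

ind_data = pd.DataFrame()
for ticker, code in codes.items():
    # Each quandl dataset has a single value column; take it by position
    ind_data[ticker] = quandl.get(code, start_date='2000-01-01').iloc[:, 0]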

Parallel Excel sheet reading with dask

Hello. All the examples I have come across for using dask so far have involved multiple CSV files in a folder being read with dask's read_csv call.
If I am given an xlsx file with multiple tabs, can I use anything in dask to read them in parallel?
P.S. I am using pandas 0.19.2 with Python 2.7.
For those using Python 3.6:
# Reading the file using dask
import pandas as pd
import dask
import dask.dataframe as dd
from dask.delayed import delayed

parts = dask.delayed(pd.read_excel)(excel_file, sheet_name=0, usecols=[1, 2, 7])
df = dd.from_delayed(parts)
print(df.head())
I'm seeing a 50% speed increase on load on an i7, 16GB 5th-gen machine.
A simple example:
fn = 'my_file.xlsx'
parts = [dask.delayed(pd.read_excel)(fn, i, **other_options)
         for i in range(number_of_sheets)]
df = dd.from_delayed(parts, meta=parts[0].compute())
Assuming you provide the "other options" to extract the data (which is uniform across sheets) and you want to make a single master data-frame out of the set.
Note that I don't know the internals of the excel reader, so how parallel the reading/parsing part would be is uncertain, but subsequent computations once the data are in memory would definitely be.
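A hedged variant of the pattern above: instead of hard-coding number_of_sheets, you can discover the sheet names up front with pandas and build one delayed read per sheet (assuming, as above, uniform columns across sheets):
import pandas as pd
import dask
import dask.dataframe as dd

fn = 'my_file.xlsx'

# One delayed read per sheet, keyed by the actual sheet names
sheet_names = pd.ExcelFile(fn).sheet_names
parts = [dask.delayed(pd.read_excel)(fn, sheet_name=name)
         for name in sheet_names]
df = dd.from_delayed(parts, meta=parts[0].compute())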

What is the syntax for HiveThriftContext's get_partitions_by_filter command?

Goal
I'm trying to check if a partially-specified partition exists in a hive table.
Details
I have a table with two partition keys, source and date. Before a task can execute, I need to check and see if any partition exists for a certain date (source is not specified).
Attempts
I can do this easily with luigi's built-in hive partition target and the default client:
>>> import luigi.hive as hive
>>> c = hive.HivePartitionTarget('data',{"date":"2016-03-31"})
>>> c.exists()
True
>>> c = hive.HivePartitionTarget('data',{"date":"2016-03-32"})
>>> c.exists()
False
But the default client is really, really slow, because it spins up a command-line instance of hive and runs a query. So I tried to swap the default client out for the thrift one, and this happened:
>>> d = hive.HivePartitionTarget('data',{"date":"2016-03-31"}, client=hive.MetastoreClient())
>>> d.exists()
False
It appears that the two clients interpret partially-specified partitions differently.
I have already written my own client that inherits from MetastoreClient and adds some additional functions I needed in the past, so I don't mind adding a partially-specified partition check of my own design. And it looks like the client has the functions I need:
>>> from pprint import pprint
>>> import luigi.hive as hive
>>> client = hive.HiveThriftContext().__enter__()
>>> pprint([command for command in dir(client) if 'partition' in command])
[ # Note: I deleted the irrelevant commands, this was a really long list
'get_partition',
'get_partition_by_name',
'get_partition_names',
'get_partition_names_ps',
'get_partition_with_auth',
'get_partitions',
'get_partitions_by_filter',
'get_partitions_by_names',
'get_partitions_ps',
'get_partitions_ps_with_auth',
'get_partitions_with_auth',
# Even more commands snipped here
]
It looks like the command get_partitions_by_filter might do exactly what I want, but I can't find any documentation for it anywhere aside from auto-generated lists of the types it expects. And I've run into similar problems with the simpler functions: when I fully specify partitions that I know exist, I can't get get_partition or get_partition_by_name to find them. I am sure this is because I am not providing the arguments in the right format, but I don't know what the correct format is, and my patience has run out with regard to guessing.
What is the syntax for HiveThriftContext's get_partitions_by_filter command?
Follow up question: How did you figure this out?
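For what it's worth, a hedged sketch of the call at the Thrift level: in the Hive metastore Thrift interface, get_partitions_by_filter takes (db_name, tbl_name, filter, max_parts), where filter is a SQL-like expression over string-typed partition columns and a max_parts of -1 means no limit. The database name 'default' below is an assumption:
import luigi.hive as hive

with hive.HiveThriftContext() as client:
    # Partially-specified check: any partition with this date, any source.
    # The filter syntax only supports string partition columns.
    parts = client.get_partitions_by_filter('default', 'data',
                                            'date = "2016-03-31"', -1)
    print(len(parts) > 0)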