PyTrends Historical Hourly Data - google-trends

I am new to Python and coding in general. Thank you for your patience.
Using PyTrends, I am trying to get hourly Google Trends results for a single search term over an entire year. The PyTrends page on PyPI (https://pypi.org/project/pytrends/) says of 'now 1-H': "Seems to only work for 1, 4 hours only". I have tried some examples from people attempting custom hourly searches, but none work for me. Is it no longer possible to get historical hourly Google Trends data, and should I just stop looking?

Maybe a little bit late already, but in case anybody finds this question in the future, you can find the most recent way to get historical hourly Google Trends data on the PyTrends developers' GitHub:
https://github.com/GeneralMills/pytrends#historical-hourly-interest
A short snippet with the respective function applied:
# pytrends returns a pandas DataFrame, so pandas must be installed:
import pandas as pd
# Import pytrends (install it with pip first):
from pytrends.request import TrendReq

pytrends = TrendReq(hl='en-US', tz=360)
kw_list = ['search_term_1', 'search_term_2']
search_df = pytrends.get_historical_interest(
    kw_list,
    year_start=2019, month_start=5, day_start=1, hour_start=0,
    year_end=2019, month_end=7, day_end=31, hour_end=0,
    cat=0, geo='', gprop='', sleep=60)
Replace the respective parameters with the desired ones and there you go - hourly historical Google Trends data.
Hope it helped!
Best,
yawicz
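For context on what a helper like this has to do: Google Trends appears to serve hourly resolution only for short windows, so a long range has to be requested in pieces. The sketch below is not pytrends code. It only illustrates splitting a range into one-week windows in the 'YYYY-MM-DDTHH YYYY-MM-DDTHH' timeframe format that pytrends accepts. The function name `hourly_windows` and the one-week window size are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def hourly_windows(start, end, days=7):
    """Yield timeframe strings covering start..end in windows of at most `days` days."""
    step = timedelta(days=days)
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        # Format used by pytrends for specific datetimes (UTC): 'YYYY-MM-DDTHH YYYY-MM-DDTHH'
        yield f"{cursor:%Y-%m-%dT%H} {window_end:%Y-%m-%dT%H}"
        cursor = window_end


windows = list(hourly_windows(datetime(2019, 5, 1), datetime(2019, 7, 31)))
print(len(windows))
print(windows[0])
```

Each string could then be passed as the `timeframe` argument of `pytrends.build_payload`. Keep in mind that Google Trends scales every request to 0-100 on its own, so values from different windows are not directly comparable without some rescaling.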

Related

Anomaly detection in production

I am trying to search for suggestions and solutions, but I am unable to find any.
After reading blogs, I am able to build a time series anomaly detection using BigQuery ML (Arima Plus).
My question is: how do I put such a model in production?
Probably I need to:
program the re-training of the model every X days
check whether there are new anomalies on the object table every X hours
record those anomalies in another table
But I am also open to other suggestions on how to proceed.
Can anyone give me a hint?
Thank you!
The best way I found is to create "scheduled queries":
schedule a query for re-training of the model every X days:
CREATE OR REPLACE MODEL `my_model`
OPTIONS(
  model_type = 'arima_plus',
  TIME_SERIES_DATA_COL = 'events',
  TIME_SERIES_TIMESTAMP_COL = 'approx_hour',
  HOLIDAY_REGION = 'GLOBAL',
  CLEAN_SPIKES_AND_DIPS = FALSE,
  DECOMPOSE_TIME_SERIES = TRUE)
AS (
  SELECT
    TIMESTAMP_TRUNC(PARSE_TIMESTAMP('%Y-%m-%dT%H:%M:%E*SZ', start_time), HOUR) AS approx_hour,
    COUNT(1) AS events
  FROM `my_table`
  GROUP BY approx_hour);
 
schedule a query to perform anomaly detection on the latest events and write any anomalies found to a table:
INSERT INTO `events_anomalies_table`
SELECT
  approx_hour AS hour,
  CAST(events AS INT64) AS actual_events,
  CAST(lower_bound AS INT64) AS expected_min_events,
  CAST(upper_bound AS INT64) AS expected_max_events,
  CURRENT_TIMESTAMP() AS execution_timestamp
FROM ML.DETECT_ANOMALIES(
  MODEL `my_model`,
  STRUCT(0.98 AS anomaly_prob_threshold),
  (
    SELECT
      TIMESTAMP_TRUNC(PARSE_TIMESTAMP('%Y-%m-%dT%H:%M:%E*SZ', start_time), HOUR) AS approx_hour,
      COUNT(1) AS events
    FROM `my_table`
    WHERE PARSE_TIMESTAMP('%Y-%m-%dT%H:%M:%E*SZ', start_time) > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
    GROUP BY approx_hour
    LIMIT 1))
WHERE is_anomaly = TRUE
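
If scheduled queries are not an option, the same two steps could also be driven from a script started by cron or another scheduler. The sketch below is an illustration under assumptions, not part of the approach above: `run_cycle` is a hypothetical name, the two SQL strings are expected to be the retraining and detection statements, and `client` is assumed to be any object whose `query(sql)` method returns a job with a `result()` method, as `google.cloud.bigquery.Client` does.

```python
from datetime import timedelta


def run_cycle(client, retrain_sql, detect_sql, last_trained, now,
              retrain_every=timedelta(days=7)):
    """Retrain the model if it is stale, then run detection.

    Returns the time of the most recent training so the caller can persist it.
    """
    if last_trained is None or now - last_trained >= retrain_every:
        client.query(retrain_sql).result()  # block until retraining finishes
        last_trained = now
    client.query(detect_sql).result()       # insert any new anomalies
    return last_trained
```

The caller is responsible for storing the returned timestamp between runs, for example in a small state table or file.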

Power BI Run Python Script Error - We Couldn't Convert to Number - Why does this happen?

More of a general question than a singular specific code example, because I keep getting this error with different scripts.
What happens is I will test a simple script in Jupyter Notebook and it works, only for it to fail with the 'We couldn't convert to Number' error in Power BI. What are the underlying reasons this happens so I know what I need to look out for?
Here is some code just in case it helps. Again, it works in Jupyter Notebook but fails in Power BI.
import pandas as pd
import datetime
from time import strptime

# `dataset` is the DataFrame that Power BI passes to the script.
date = datetime.date.today()
year = date.strftime('%Y')
month = pd.to_numeric(date.strftime('%m'))

dataset['Year'] = dataset['Year'].astype(str).str.strip()
dataset['Month'] = dataset['Month'].astype(str).str.strip()
# Convert full month names (e.g. 'March') to month numbers.
dataset['MonthNumber'] = [strptime(str(x), '%B').tm_mon for x in dataset['Month']]
dataset['MonthNumber'] = pd.to_numeric(dataset['MonthNumber'])
dataset['ThisMonth'] = dataset['MonthNumber'] == (month - 1)
dataset['ThisYear'] = dataset['Year'] == year

Why Amazon Forecast cannot train the predictor?

While training my predictor I came across this error, and I am stuck on how to fix it.
I have two datasets: a "Target time-series data" set with 9234 rows and a single "item_id", and a "Related time-series data" set with the same number of rows, since I only have a single id.
I'm setting up the data with a window of 180 days, which is exactly the difference between the two numbers that appear in the error: 9414 - 9234 = 180.
We were unable to train your predictor.
Please ensure there are no missing values for any items in the related time series, All items need data until 2020-03-15 00:00:00.0. For example, following items have missing data: item: brl only has 9234/9414 required datapoints starting 1994-06-07 00:00:00.0, please refer to documentation for additional details.
Since my data has no missing values and is on a daily basis, why is it returning this error?
My data starts on 1994-06-07 and ends on 2019-09-17. Why should I have 9414 data points rather than 9234?
Should I remove 180 days from my "Target time-series data"?
The future values of the related time-series data must be known.
Example of a good related-time series: You know past and future days in which marketing has or will send email newsletters promoting the product you're forecasting. You can use this data as a related-time series.
Example of a bad related time series: You notice that Google searches for your brand correlate with the sales of your product, so you want to use them as a related time series. Since you don't know how many searches will occur in the future, you can't use this as a related time series.
In your case, you have TARGET_TIME_SERIES data for 9234 days and you want to predict demand for the next 180 days. That means your RELATED_TIME_SERIES data should cover 9414 days, through 2020-03-15, which is what the error message asks for.
Edit: I have not tested this with Amazon's forecasting product. I'm basing my answer on working with Facebook Prophet (which is one of the models Amazon Forecast uses). Please let me know if my solution worked.
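
The arithmetic behind the error message can be checked with a few lines of Python. This is only a worked calculation using the dates and the 180-day window from the question; the variable names are illustrative.

```python
from datetime import date, timedelta

target_start = date(1994, 6, 7)   # first day of the target time series
target_end = date(2019, 9, 17)    # last day of the target time series
horizon_days = 180                # forecast window from the question

# Daily data, counting both the first and the last day.
target_rows = (target_end - target_start).days + 1
# The related time series must extend through the forecast window.
related_end = target_end + timedelta(days=horizon_days)
related_rows = target_rows + horizon_days

print(target_rows, related_end, related_rows)
```

This gives 9234 target rows, a required end date of 2020-03-15, and 9414 related rows, matching the numbers in the error message.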

How can I load historical stock indices data in my script without yahoo or google finance using pandas_datareader?

Sorry for my noobish question as I'm trying to use Python for my finance class in grad school.
I am currently stuck trying to load historical stock indices of DOW JONES, S&P 500 and NASDAQ. Sadly, Google Finance is useless with stock indices so I need help circumventing this obstacle.
Here is the part of my code that handles the loading process:
import pandas as pd
from pandas_datareader import data as web
import matplotlib.pyplot as plt

ticker = ['^DJI', '^INX', '^IXIC']
ind_data = pd.DataFrame()
for i in ticker:
    ind_data[i] = web.DataReader(i, data_source='google', start='2000-1-1')['Close']
Thanks in advance.
You can use quandl. It is a third party package you will need to install.
pip install quandl
Go to their website, make an account and get an API key. Then search the site to find the right dataset code for each query.
You can pass the quandl.get function a variety of parameters to get exactly what you need. The code below retrieves the Dow Jones Industrial Average and returns it as a pandas DataFrame.
import quandl
quandl.get("BCB/UDJIAD1")
Output
...
2016-04-06 17716.05
2016-04-07 17541.96
2016-04-08 17576.96
2016-04-11 17556.41
2016-04-12 17721.25
2016-04-13 17908.28
2016-04-14 17926.43
2016-04-15 17897.46

Yahoo! Finance API DOW [closed]

Until now, I've been using the INDU ticker to follow the DOW with the Yahoo! API. For whatever reason, it was not possible to follow ^dji, ^djia, or any other reasonable combination directly. Up until yesterday, INDU was working fine. However, I now receive no data when requesting INDU.
What other ticker can I use with the Yahoo! finance API that will return the DJIA?
This index is not available under any other name.
However, this problem was just a temporary glitch, now resolved by Yahoo. Unfortunately, their financial data availability is very flaky lately. E.g. data available on the web page, but CSV downloads give "N/A" for all fields, etc. There were similar incidents in recent months, with stock prices for random stocks given wrong values, and more.
So, if you're building a new service around these Yahoo services, be aware that:
These services are not reliable.
You're breaking Yahoo's ToS, so there's nothing you can do if they are broken or not working; you cannot even complain to Yahoo in good faith.
According to Yahoo (post by Yahoo Developer Network Community Manager Robyn Tippins on Yahoo developer forums):
The reason for the lack of documentation is that we don't have a Finance API. It appears some have reverse engineered an API that they use to pull Finance data, but they are breaking our Terms of Service (no redistribution of Finance data) in doing this so I would encourage you to avoid using these webservices.
The formula for the DJIA isn't very complicated. If you are still able to pull quotes from individual stocks, you could use your code to pull the prices of the existing 30 components of the DJIA, add them up and divide by the current divisor. Of course, this has several disadvantages.
You need to make 30 requests instead of one.
You will have to adjust the divisor if there is a stock-split.
You will have to change the queries when the components change.
The components of the DJIA are
AA AXP BA BAC CAT CSCO CVX DD DIS GE HD
HPQ IBM INTC JNJ JPM KFT KO MCD MMM MRK
MSFT PFE PG T TRV UTX VZ WMT XOM
The current divisor is 0.132129493.
The divisor changes whenever there is a stock split in one of the components. The components of the DOW changed 48 times from 1896 to 2009.
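The calculation described above is just a sum divided by the divisor. Here is a minimal sketch, assuming you already have the latest price for each component. The function name `price_weighted_index` and the three sample prices are made up for illustration; a real calculation needs all 30 components and the divisor that is current at the time.

```python
def price_weighted_index(prices, divisor):
    """Sum of the component prices divided by the index divisor."""
    return sum(prices.values()) / divisor


DIVISOR = 0.132129493  # divisor quoted above; it changes over time

# Hypothetical prices for three tickers only, to keep the example short.
sample_prices = {"IBM": 180.0, "KO": 65.0, "MMM": 80.0}
print(round(price_weighted_index(sample_prices, DIVISOR), 2))
```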
It seems like Yahoo Finance does not support the web service to query ^DJI or INDU.
Check out this discussion:
http://developer.yahoo.com/forum/General-Discussion-at-YDN/Dow-Jones-Industrial-Average-Quote-Error/1317052217631-f9173931-04fd-4519-b1b3-efb65d7ff8fa/1317065435082
Assuming that your application does not need to be real time market data (to the second), you can use the RAW data that is provided to build the interactive graph on yahoo. This data is comma separated and updates about once every minute. The downside: it will include all the data from the trading day. The time given is in Unix time so a conversion would be needed. I tried this out for the ticker symbols you listed and the only one I was able to get data with was ^dji. Hopefully this is what you are looking for!
You can mess with the link and see what happens to the data. For example you can change the amount of days.
http://chartapi.finance.yahoo.com/instrument/1.0/%5Edji/chartdata;type=quote;range=1d/csv/
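The Unix-time conversion mentioned above could look like the following sketch. The sample rows and their column order (timestamp first, then a price) are assumptions for illustration, not the documented layout of the Yahoo response, and `parse_rows` is a hypothetical helper name.

```python
import csv
from datetime import datetime, timezone
from io import StringIO


def parse_rows(text):
    """Return (UTC datetime, price) pairs from rows whose first field is a Unix timestamp."""
    rows = []
    for fields in csv.reader(StringIO(text)):
        if not fields or not fields[0].isdigit():
            continue  # skip header or metadata lines
        when = datetime.fromtimestamp(int(fields[0]), tz=timezone.utc)
        rows.append((when, float(fields[1])))
    return rows


# Made-up sample data in the assumed layout.
sample = "Timestamp,close\n1317130200,11065.42\n1317130260,11071.10\n"
for when, close in parse_rows(sample):
    print(when.isoformat(), close)
```

The timestamps are converted to UTC here; shifting them to exchange time is a separate step if you need it.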
I think Yahoo Finance All Currencies quote API Documentation will help you.
I found a Yahoo forum answer that says we cannot download CSV data for ^DJI.
Check also YQL console. This console will fetch values in JSON format.
The DIA ticker (SPDR Dow Jones Industrial Average) closely imitates the Dow.