WEKA Forecasting Plugin - weka

My question is about the Weka forecasting plugin. I'm trying to predict the value of sin(x) over a prediction horizon of 6 using WekaForecaster with SMOreg as the base algorithm, and I'm trying different maximum lags (1 to 24) in a loop. The program runs without errors, but the results it produces differ from the ones I get when using the Weka GUI directly, and I don't know why. Any ideas?
I made sure to use the same algorithm options, but the results still differ.
Here's the code:
import java.io.*;
import java.util.*;
import java.util.List;
import weka.core.Instances;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.*;
import weka.classifiers.rules.*;
import weka.classifiers.meta.*;
import weka.classifiers.trees.*;
import weka.classifiers.evaluation.NumericPrediction;
import weka.classifiers.timeseries.WekaForecaster;
import weka.classifiers.timeseries.core.TSLagMaker;
import weka.classifiers.timeseries.core.TSLagMaker.Periodicity;
import weka.classifiers.timeseries.eval.TSEvaluation;
import weka.classifiers.timeseries.eval.ErrorModule;
import weka.core.converters.ConverterUtils.DataSource;

public class TimeSeriesSin {
    static Scanner console = new Scanner(System.in);

    public static void main(String[] args) throws FileNotFoundException {
        try {
            String pathToSinData = "C:/Users/khouloud/Desktop/AUSmaster/thesis/summer/chapter1/sin_10.arff";
            PrintWriter out = new PrintWriter("C:/Users/khouloud/Desktop/AUSmaster/thesis/summer/chapter1/try_sin_10.txt");
            DataSource source = new DataSource(pathToSinData);
            Instances diab = source.getDataSet();
            for (int i = 1; i <= 24; i++) {
                WekaForecaster forecaster = new WekaForecaster();
                forecaster.setFieldsToForecast("sin");
                SMOreg tr = new SMOreg();
                String options = "weka.classifiers.functions.SMOreg -C 1.0 -N 0 -I \"weka.classifiers.functions.supportVector.RegSMOImproved -L 0.001 -W 1 -P 1.0E-12 -T 0.001 -V\" -K \"weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 1.0\"";
                tr.setOptions(weka.core.Utils.splitOptions(options));
                forecaster.setBaseForecaster(tr);
                forecaster.getTSLagMaker().setMinLag(1);
                forecaster.getTSLagMaker().setMaxLag(i);
                forecaster.buildForecaster(diab, System.out);
                forecaster.primeForecaster(diab);
                List<List<NumericPrediction>> forecast = forecaster.forecast(6, System.out);
                TSEvaluation eval = new TSEvaluation(diab, 0.3);
                eval.setHorizon(6);
                eval.setEvaluateOnTestData(true);
                eval.setEvaluateOnTrainingData(false);
                eval.evaluateForecaster(forecaster, System.out);
                out.println("lag= " + i);
                out.println(eval.toSummaryString());
            }
            out.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
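For readers unfamiliar with the lag settings varied in the loop, here is a rough Python sketch of what a lag window means for a sin(x) series. It is not Weka's TSLagMaker implementation; the function name make_lagged_rows and the sampling step of 0.1 are assumptions made purely for illustration, since the contents of sin_10.arff are not shown.

```python
import math


def make_lagged_rows(series, min_lag, max_lag):
    """Pair each target value with the values min_lag..max_lag steps before it."""
    rows = []
    for t in range(max_lag, len(series)):
        lags = [series[t - k] for k in range(min_lag, max_lag + 1)]
        rows.append((lags, series[t]))
    return rows


# Hypothetical sampling of sin(x); not the data from the question.
series = [math.sin(0.1 * n) for n in range(100)]

for max_lag in (1, 2, 24):
    rows = make_lagged_rows(series, 1, max_lag)
    # Print the max lag, the number of usable rows, and the number of lag inputs per row.
    print(max_lag, len(rows), len(rows[0][0]))
```

A larger maximum lag gives the learner more inputs per row but fewer rows to train on, which is one reason results can shift noticeably between lag settings.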

I had the same problem when forecasting three different variables with MLPRegressor and SMOreg.
I checked the version of the pdm-timeseriesforecasting package shown in the Package Manager of the WEKA GUI and used the same package version in my code (I was pulling the dependency from a Maven repository). That solved the problem.

I faced the same problem.
I think it is related to the periodicity configuration (the TSLagMaker object).
If you run the code from here, it will generate the same results as the GUI. If you remove the following lines, the results will be different:
// add a month of the year indicator field
forecaster.getTSLagMaker().setAddMonthOfYear(true);
// add a quarter of the year indicator field
forecaster.getTSLagMaker().setAddQuarterOfYear(true);
I think the GUI detects this automatically somehow.
Hope that helps,
Honza

Failed to find data source: delta in Python environment

Following: https://docs.delta.io/latest/quick-start.html#python
I have installed delta-spark and run:
import pyspark
from delta import *

builder = pyspark.sql.SparkSession.builder.appName("MyApp") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")

spark = configure_spark_with_delta_pip(builder).getOrCreate()
However when I run:
data = spark.range(0, 5)
data.write.format("delta").save("/tmp/delta-table")
the error states that delta is not recognised, and if I run
DeltaTable.isDeltaTable(spark, "packages/tests/streaming/data")
it fails with: TypeError: 'JavaPackage' object is not callable
It seemed that I should be able to run these commands locally (for example in unit tests) without Maven and without running them in a pyspark shell. Am I missing a dependency?
You can just install the delta-spark PyPI package with pip install delta-spark (it will pull in pyspark as well), and then refer to it.
Or you can add a configuration option that fetches the Delta package: .config("spark.jars.packages", "io.delta:delta-core_2.12:<delta-version>"). For Spark 3.1 the Delta version is 1.0.0 (see the releases mapping docs for more information).
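A minimal sketch of that second option, assuming Spark 3.1 with Scala 2.12 and Delta 1.0.0 as stated above. The helper names delta_coordinate and build_session are hypothetical; pyspark is imported inside the function so that the coordinate helper can be used on its own.

```python
def delta_coordinate(delta_version, scala_version="2.12"):
    """Build the Maven coordinate string that spark.jars.packages expects."""
    return "io.delta:delta-core_{}:{}".format(scala_version, delta_version)


def build_session(delta_version="1.0.0"):
    # Imported lazily; calling this function requires pyspark to be installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder.appName("MyApp")
        .config("spark.jars.packages", delta_coordinate(delta_version))
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )


print(delta_coordinate("1.0.0"))
```

With this approach Spark downloads the Delta jar when the session starts, so the first run needs network access.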
I have an example of using Delta tables in unit tests (note that the import statement is inside the function body because the Delta package is loaded dynamically):
import pyspark
import pyspark.sql
import pytest
import shutil
from pyspark.sql import SparkSession

delta_dir_name = "/tmp/delta-table"

@pytest.fixture
def delta_setup(spark_session):
    # Write a small Delta table, hand it to the test, then clean up.
    data = spark_session.range(0, 5)
    data.write.format("delta").save(delta_dir_name)
    yield data
    shutil.rmtree(delta_dir_name, ignore_errors=True)

def test_delta(spark_session, delta_setup):
    from delta.tables import DeltaTable
    deltaTable = DeltaTable.forPath(spark_session, delta_dir_name)
    hist = deltaTable.history()
    assert hist.count() == 1
The environment is initialized via pytest-spark:
[pytest]
filterwarnings =
    ignore::DeprecationWarning
spark_options =
    spark.sql.extensions: io.delta.sql.DeltaSparkSessionExtension
    spark.sql.catalog.spark_catalog: org.apache.spark.sql.delta.catalog.DeltaCatalog
    spark.jars.packages: io.delta:delta-core_2.12:1.0.0
    spark.sql.catalogImplementation: in-memory

Wolframalpha in python with GTTS

I am trying to make a Friday-like virtual assistant using this code:
import os
from gtts import gTTS
import time
import playsound
import speech_recognition as sr

def speak(text):
    # Convert the text to speech, save it, and play it back.
    tts = gTTS(text=text, lang="en")
    filename = "voice.mp3"
    tts.save(filename)
    playsound.playsound(filename)

def get_audio():
    # Listen on the microphone and return the recognised text ("" on failure).
    r = sr.Recognizer()
    with sr.Microphone() as source:
        audio = r.listen(source)
        said = ""
        try:
            said = r.recognize_google(audio)
            print(said)
        except Exception as e:
            print("Exception: " + str(e))
    return said

while True:
    text = get_audio()
    if "who are you" in text:
        speak(" I am Monday the virtual assistant")
I was wondering how to integrate Wolfram Alpha so that I could say "search for ..." and it would speak the answer from Wolfram Alpha.
Any help would be amazing :)
Install the wolframalpha package (pip install wolframalpha).
Then add the following to your code:
import wolframalpha

if 'search for ' in text:
    text = text.replace("search for ", "")
    client = wolframalpha.Client(app_id)  # app_id is your own Wolfram Alpha app id
    res = client.query(text)
    answer = next(res.results).text  # take the first result once and reuse it
    print(answer)
    speak(answer)
To use the API, you have to go to the homepage, sign up for an account, create an app and get an app id.
To avoid getting any errors, keep the indentation in your 'speak' function uniform.
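One possible refinement, sketched under the assumption that the recognised text may vary in case or contain the phrase mid-sentence: pull the query out with a small helper before sending it to Wolfram Alpha. The name extract_query is made up for this example.

```python
def extract_query(text, trigger="search for"):
    """Return the words after the trigger phrase, or None if it is absent."""
    lowered = text.lower()
    start = lowered.find(trigger)
    if start == -1:
        return None
    query = text[start + len(trigger):].strip()
    # An empty query (trigger with nothing after it) is treated as no query.
    return query or None


print(extract_query("Search for the population of France"))
print(extract_query("who are you"))
```

The returned string could then be passed to client.query(...) instead of the result of text.replace(...), and a None result skipped.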

cx_Freeze fails to freeze script due to mpl_toolkits

I am trying to freeze this program using the following setup script:
import cx_Freeze
import sys
import os

base = None
if sys.platform == 'win32':
    base = "Win32GUI"

executables = [cx_Freeze.Executable("Electric Field API.py", base=base, icon=os.getcwd()+"\\bin\\EFAPIicon.ico")]

cx_Freeze.setup(
    name = "Electric Field API",
    options = {"build_exe": {'includes': ['numpy.core._methods','numpy.lib.format','tkFileDialog','FileDialog'], 'packages': ["matplotlib",'Tkinter','FileDialog','tkFileDialog'], "include_files":[os.getcwd()+"\\bin\\EFAPIicon.ico"]}},
    version = "1.3",
    description = "Electric Field Visualization",
    executables = executables
)
Unfortunately, when running this, I receive the following error:
When these imports are listed in the setup.py file, I receive the following error from powershell:
If anyone has a way to solve this issue, it would be greatly appreciated.
Apparently mpl_toolkits is a namespace package (it has no __init__.py), so it has to be treated differently. (I read a little about this on Bitbucket; thanks, D. Reaver.)
Try adding the following to the build_exe options:
'namespace_packages': ['mpl_toolkits']
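For illustration, here is roughly how the options dictionary from the question might look with that entry added. This is only a sketch: whether namespace_packages is accepted depends on the cx_Freeze version in use, and the variable names build_exe_options and icon_path are assumptions.

```python
import os

icon_path = os.path.join(os.getcwd(), "bin", "EFAPIicon.ico")

# Options copied from the question's setup script, plus the namespace package entry.
build_exe_options = {
    "includes": ["numpy.core._methods", "numpy.lib.format", "tkFileDialog", "FileDialog"],
    "packages": ["matplotlib", "Tkinter", "FileDialog", "tkFileDialog"],
    "include_files": [icon_path],
    "namespace_packages": ["mpl_toolkits"],
}

# Would be passed as: cx_Freeze.setup(..., options={"build_exe": build_exe_options}, ...)
print(build_exe_options["namespace_packages"])
```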

Saving data from traceplot in PyMC3

Below is the code for a simple Bayesian Linear regression. After I obtain the trace and the plots for the parameters, is there any way in which I can save the data that created the plots in a file so that if I need to plot it again I can simply plot it from the data in the file rather than running the whole simulation again?
import pymc3 as pm
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0,9,5)
y = 2*x + 5
yerr=np.random.rand(len(x))

def soln(x, p1, p2):
    return p1+p2*x

with pm.Model() as model:
    # Define priors
    intercept = pm.Normal('Intercept', 15, sd=5)
    slope = pm.Normal('Slope', 20, sd=5)
    # Model solution
    sol = soln(x, intercept, slope)
    # Define likelihood
    likelihood = pm.Normal('Y', mu=sol,
                           sd=yerr, observed=y)
    # Sampling
    trace = pm.sample(1000, nchains = 1)

pm.traceplot(trace)
print pm.summary(trace, ['Slope'])
print pm.summary(trace, ['Intercept'])
plt.show()
There are two easy ways of doing this:
Use a version after 3.4.1 (currently this means installing from master, with pip install git+https://github.com/pymc-devs/pymc3). There is a new feature that allows saving and loading traces efficiently. Note that you need access to the model that created the trace:
...
pm.save_trace(trace, 'linreg.trace')
# later
with model:
    trace = pm.load_trace('linreg.trace')
Use cPickle (or pickle in Python 3). Note that pickle is not secure against malicious data, so don't unpickle data from untrusted sources:
import cPickle as pickle # just `import pickle` on python 3
...
with open('trace.pkl', 'wb') as buff:
    pickle.dump(trace, buff)
#later
with open('trace.pkl', 'rb') as buff:
    trace = pickle.load(buff)
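A self-contained sketch of the pickle round trip, written for Python 3. A plain dictionary stands in for the trace here (an assumption made so the example runs without PyMC3; a real trace would be dumped the same way), and the file goes into a temporary directory.

```python
import os
import pickle
import tempfile

# Stand-in for a sampled trace; the values are made up for the example.
fake_trace = {"Intercept": [5.1, 4.9, 5.0], "Slope": [2.0, 2.1, 1.9]}

path = os.path.join(tempfile.mkdtemp(), "trace.pkl")

# Save once, after the expensive sampling step.
with open(path, "wb") as buff:
    pickle.dump(fake_trace, buff)

# Later, possibly in another session: reload instead of re-running the simulation.
with open(path, "rb") as buff:
    restored = pickle.load(buff)

print(restored == fake_trace)
```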
An update for anyone who, like me, still lands on this question:
The load_trace and save_trace functions have been removed. As of version 4.0, even the deprecation warnings for these functions are gone.
The way to do it now is to use arviz:
with model:
    trace = pymc.sample(return_inferencedata=True)
trace.to_netcdf("filename.nc")
And it can be loaded with:
trace = arviz.from_netcdf("filename.nc")
This way works for me:
# saving trace
pm.save_trace(trace=trace_nb, directory=r"c:\Users\xxx\Documents\xxx\traces\trace_nb")
# loading saved traces
with model_nb:
    t_nb = pm.load_trace(directory=r"c:\Users\xxx\Documents\xxx\traces\trace_nb")

Importing CSV to Django and settings not recognised

So I'm getting to grips with Django, or trying to. I have some code that doesn't depend on being called by a web page; it's designed to populate the database with information. Eventually it will be set up as a cron job to run overnight. This is the first crack at it, which does an initial population (once I have that working, I'll move to an add structure, where only new records are pushed). I'm using Python 2.7, Django 1.5 and SQLite3. When I run this code, I get:
Requested setting DATABASES, but settings are not configured. You must either define the environment variable DJANGO_SETTINGS_MODULE or call settings.configure() before accessing settings.
That seems fairly obvious, but I've spent a couple of hours now trying to work out how to adjust that setting. How do I call / open a connection / whatever the right terminology is here? I have a number of functions like this that will be scheduled jobs, and this has been frustrating me all afternoon.
import urllib2
import csv
import requests
from django.db import models
from gmbl.models import Match

master_data_file = urllib2.urlopen("http://www.football-data.co.uk/mmz4281/1213/E0.csv", "GET")
data = list(tuple(rec) for rec in csv.reader(master_data_file, delimiter=','))
for row in data:
    current_match = Match(matchdate=row[1],
                          hometeam=row[2],
                          awayteam=row[3],
                          homegoals=row[4],
                          awaygoals=row[5],
                          homeshots=row[10],
                          awayshots=row[11],
                          homeshotsontarget=row[12],
                          awayshotsontarget=row[13],
                          homecorners=row[16],
                          awaycorners=row[17])
    current_match.save()
I had originally started out with http://django-csv-importer.readthedocs.org/en/latest/ but I had the same error, and the documentation doesn't make much sense trying to debug it. When I tried calling settings.configure in the function, it said it didn't exist; presumably I had to import it, but couldn't make that work.
Make sure Django and your project are on PYTHONPATH; then you can do:
import urllib2
import csv
import requests
from django.core.management import setup_environ
from django.db import models
from yoursite import settings
setup_environ(settings)
from gmbl.models import Match
master_data_file = urllib2.urlopen("http://www.football-data.co.uk/mmz4281/1213/E0.csv", "GET")
data = list(tuple(rec) for rec in csv.reader(master_data_file, delimiter=','))
# ... your code ...
Reference: http://www.b-list.org/weblog/2007/sep/22/standalone-django-scripts/
Hope it helps!
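An alternative sketch, following the hint in the error message itself: set DJANGO_SETTINGS_MODULE before importing any models. The module path yoursite.settings matches the placeholder used above, and the helper names use_settings and bootstrap_django are hypothetical. django.setup() only exists in Django 1.7 and later, so the call is guarded.

```python
import os


def use_settings(module_path="yoursite.settings"):
    """Point Django at a settings module unless one is already configured."""
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", module_path)
    return os.environ["DJANGO_SETTINGS_MODULE"]


def bootstrap_django():
    # Requires Django to be installed and the project to be importable.
    use_settings()
    import django
    if hasattr(django, "setup"):  # present in Django 1.7 and later
        django.setup()


print(use_settings())
```

In a standalone script, bootstrap_django() would be called at the very top, before the line from gmbl.models import Match.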