Determining labels for timeseries - dl4j

I have CSV data from various weather sensors, like the following (the real data is far more complex, with many sensors):
TimePeriod, Temp, Pressure, WindSpeed
1, 16, 100, 57
2, 18, 96, 71
There is also a "result" for each time period: Snow, Rain, Sunny, Wind Shear, etc.
I want to train a Neural Network to learn from the data to predict the "result". I have sufficient data to create both training and test datasets.
I am stuck at "go" on the vectorization! I think I simply have one large CSV file covering a period of several months as training data.
Would that limit me to a single label?
For time-series predictions such as this, I haven't been able to find a good example of how to set up the training data.

Isn't the Iris dataset somewhat similar to what you have?
https://github.com/deeplearning4j/dl4j-examples/blob/7b4d76c9ff8de7697a1ff97ee917a10f5f3873e3/dl4j-examples/src/main/java/org/deeplearning4j/examples/dataexamples/CSVExample.java
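The Iris example reads static rows with one label each, so it is close but not quite a time series. A more direct route for sequence data is DataVec's CSVSequenceRecordReader together with SequenceRecordReaderDataSetIterator, as in DL4J's sequence-classification examples. Below is a minimal, untested sketch assuming you split the one large CSV into a features file and a parallel labels file per sequence (one row per time period, label encoded as a class index); the file names, file count, and class count are placeholder assumptions.

```java
import org.datavec.api.records.reader.SequenceRecordReader;
import org.datavec.api.records.reader.impl.csv.CSVSequenceRecordReader;
import org.datavec.api.split.NumberedFileInputSplit;
import org.deeplearning4j.datasets.datavec.SequenceRecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class WeatherSequenceLoader {
    public static void main(String[] args) throws Exception {
        int miniBatchSize = 32;
        int numLabelClasses = 4;  // e.g. Snow, Rain, Sunny, Wind Shear (placeholder count)

        // Features: one CSV per sequence, one row per time period (Temp, Pressure, WindSpeed, ...).
        // File names like weather_features_0.csv ... weather_features_99.csv are hypothetical.
        SequenceRecordReader featureReader = new CSVSequenceRecordReader(0, ",");
        featureReader.initialize(new NumberedFileInputSplit("weather_features_%d.csv", 0, 99));

        // Labels: a parallel CSV per sequence, one class index per time period.
        SequenceRecordReader labelReader = new CSVSequenceRecordReader(0, ",");
        labelReader.initialize(new NumberedFileInputSplit("weather_labels_%d.csv", 0, 99));

        // Pairs each feature time step with its label time step, so one file per sequence
        // does not limit you to a single label.
        DataSetIterator trainIter = new SequenceRecordReaderDataSetIterator(
                featureReader, labelReader, miniBatchSize, numLabelClasses,
                false);  // false = classification, not regression

        // net.fit(trainIter);  // then train your network with this iterator
    }
}
```

If instead you have only one "result" per whole sequence (rather than per time period), the same iterator has a constructor taking an AlignmentMode (e.g. ALIGN_END) to pair a single label with the final time step.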

Related

How to forecast in Power BI using DAX

I have a chart in Excel that I wish to replicate in Power BI.
My Excel chart is a bar chart with a dated axis and 2 series, running up to March, but with data only up until now.
Each bar series has a trendline, which forecasts a trend up to the end of the financial year.
In Power BI I have tried to replicate this, but I cannot seem to add a 'forecast' from the Analytics tab unless my chart is a line graph.
So I now have 2 line series in a chart like so:
I have added a trendline to both, but there is no option to add the forecast line unless I have only 1 data series.
Once I've removed a series, I can toggle on the forecast line in the Analytics tab.
So I now have 1 line series in my line graph like so:
I actually need to have the data and the corresponding forecast for both data series, so I would have series 1 & 2 plotted like above, on the same graph together.
Is there a way to do this from the Analytics tab?
If not, how should I go about it? I was thinking I could instead use DAX to forecast until March 2023, then drag the line into the line graph and format it as dashed?
Thanks
Here is some sample data, step by step, for what I'm looking for (a very simple thing to do in Excel!):
This is the same data, pivoted with the 2 series mapped out on a line graph, and trendline added for each:
I just want to be able to show a trendline extending and forecasting forward, past the months I already have, like I have done in Excel.
Yeah, I think your best solution here would be to use a "what if" parameter in DAX to get the result you need.
For a forecast line to be added to a line chart, certain requirements must be met; otherwise you can't see this option on the Analytics tab: the chart must contain a single data series, and the axis must be a continuous date axis with regularly spaced values.
You need to change your datasets and data types accordingly.
The last point is especially relevant for you, because there is not a one-day period between your data values; they are organized monthly.

Google AutoML tables

I'm new to Google AutoML Tables and have a basic question about which data is worth including when training my model.
I have a dataset of golfers and will be looking at the averages of scores over different periods. For example, average over the past 3 months, 6 months, 1 year etc.
My question is: is it worthwhile also including the sample size for each date range for each player? For example, over the past 3 months, some players will have a sample size of 28 while some will have only 2. The players with 28 rounds will have more accurate averages than those with 2. However, I don't know whether Google AutoML Tables would pick up this link automatically, whether I could create a separate weighting/reliability variable, whether there's a way to specify a link between columns, or whether this automated type of AutoML isn't really suitable for this.
Thanks in advance

Amazon Forecast takes a long time to forecast

Like the title says, generating the forecast takes a long time. I am updating the data (target and related data) and I already have a pretrained predictor. The target dataset is relatively small and has a granularity of 1 hour.
On each forecast generation, it seems like the predictor retrains before it runs inference. I am trying to forecast 1 month ahead, and the only solution I have found is to re-upload the data and make new forecasts with a 24-hour forecast horizon (the horizon used for training the predictor).
The upload, import jobs, forecast generation and export take close to 1 hour altogether.
Is there any way I can update the data and generate forecasts faster, without letting the predictor retrain on the newly added data?
CreateForecast does not retrain the forecasting model. It just creates the forecast.
https://docs.aws.amazon.com/forecast/latest/dg/howitworks-forecast.html
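To make that concrete, here is a rough sketch of the refresh-then-forecast flow using the AWS SDK for Java v2, assuming the dataset group and a trained predictor already exist; all names, ARNs, and S3 paths are placeholders. Neither call retrains the predictor, and the forecast horizon stays whatever the predictor was trained with (24 hours in your case).

```java
import software.amazon.awssdk.services.forecast.ForecastClient;
import software.amazon.awssdk.services.forecast.model.CreateDatasetImportJobRequest;
import software.amazon.awssdk.services.forecast.model.CreateForecastRequest;
import software.amazon.awssdk.services.forecast.model.DataSource;
import software.amazon.awssdk.services.forecast.model.S3Config;

public class UpdateAndForecast {
    public static void main(String[] args) {
        ForecastClient forecast = ForecastClient.create();

        // 1. Import the refreshed target/related data (ARNs and paths are placeholders).
        forecast.createDatasetImportJob(CreateDatasetImportJobRequest.builder()
                .datasetImportJobName("hourly-refresh")
                .datasetArn("arn:aws:forecast:...:dataset/my-target-dataset")
                .dataSource(DataSource.builder()
                        .s3Config(S3Config.builder()
                                .path("s3://my-bucket/target.csv")
                                .roleArn("arn:aws:iam::...:role/ForecastS3Role")
                                .build())
                        .build())
                .build());

        // In practice, poll the import job until it reaches ACTIVE before creating the forecast,
        // otherwise the forecast will be generated from the previously imported data.

        // 2. Generate a forecast from the already-trained predictor.
        //    This step does not retrain the model; it only runs inference.
        String forecastArn = forecast.createForecast(CreateForecastRequest.builder()
                .forecastName("hourly-refresh-forecast")
                .predictorArn("arn:aws:forecast:...:predictor/my-predictor")
                .build())
                .forecastArn();

        System.out.println("Forecast ARN: " + forecastArn);
    }
}
```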

Import 500 GB of Data into Power BI

I want to import a 500 GB dataset into Power BI, but Power BI is limited to 1 GB. How can I get the data into Power BI?
Thanks.
For 500 GB I'd definitely recommend Direct Query mode (as Joe recommends) or a live connection to an SSAS cube. In these scenarios, the data model is hosted in a separate location (such as a database server) and Power BI sends its queries to that location and displays the returned results.
However, I'll add that the 1GB limit is the limit after compression. (Meaning you can fit more than 1GB of uncompressed data into the advertised 1GB dataset limit.)
While it would be incredibly difficult to reduce a 500GB dataset to 1GB (even with compression), there are things you can do once you understand how the compression works in Power BI.
In Power BI, compression is done by columns, not rows. So a column that has 800 million rows with identical values can see significant compression. Conversely, a column with a different value in every row cannot be compressed much at all.
Therefore:
Do not import columns you do not absolutely need for analysis (particularly identity columns, GUIDs, free-form text fields, or binary data such as images)
Look at columns with a high degree of variability and see if you can also eliminate them.
Reduce the variability of a column where possible. E.g. if you only need a date and not a time, do not import the time. If you only need the whole number, do not import 7 decimal places.
Bring in fewer rows. If you cannot eliminate high-variability columns, then importing 1 year of data instead of 17 (for example) will also reduce the data model size.
Marco Russo & the SQLBI team have a number of good resources for further optimizing the size of a data model (SSAS tabular, Power Pivot & Power BI all use the same underlying modelling engine). For example: Optimizing Multi-Billion Row Tables in Tabular
If possible given your source data, you could use Direct Query mode. The 1 GB limit does not apply to Direct Query. There are some limitations to Direct Query mode, so check the documentation to make sure that it will meet your needs.
Some documentation can be found here.
1) Aggregate the data on the SQL side to reduce the size.
2) Import only the useful columns to reduce the size.

Doing predictions on Test data in SAS

I am doing multiple linear regression using SAS. I have divided the data into train and test in a 70%/30% ratio. I have used proc reg to build a model on the training data. I want to use this model to get predicted values on the test data. How would I do that?