Hi, my dataset contains only quantitative (numerical) data; it doesn't have any class attributes. The dataset contains sales figures from different years. I need to analyze the data in different ways. Can I use WEKA for this analysis? I tried the WEKA tool, but it seems I cannot proceed with WEKA unless the dataset has a class variable. Please kindly give me a hint.
I'm new to PowerBI, and am working on a large database. I am attempting to prepare the data in the PowerQuery Editor.
I would like to code as many steps as possible, as analysing each column manually is extremely time-consuming.
My coding goals (in order of priority):
For each query, I would like to get its column quality.
Ideally, I would like to export the header names along with the column quality, so that I can determine which columns are relevant. I can also use the column names to determine which column relationships might be relevant. The database is huge, so simply importing all the data and trying to work with it from there is not feasible; in fact, PowerBI comes up with the error that I don't have enough free memory.
I have VBA and some SQL experience.
I know I have a lot to learn w.r.t. PowerBI, and I am working on it, but I need some guidance and direction, including on what is possible/feasible.
Any constructive hints, advice, or feedback would be appreciated - thank you!
Use Table.Profile() on each table and load the results to the data model.
https://learn.microsoft.com/en-us/powerquery-m/table-profile
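As a minimal sketch (the query name Sales is a placeholder for one of your queries), a profiling query could look like the following; the Column field in Table.Profile's output holds the header names, which covers the goal of exporting headers together with their quality stats:

// Profile one query ("Sales" is a made-up name; substitute each of your queries)
let
    Source = Table.Profile(Sales),
    // Tag each profile row with the source query so several profiles can be appended later
    Tagged = Table.AddColumn(Source, "SourceQuery", each "Sales", type text)
in
    Tagged

Profiles from several queries can then be appended with Table.Combine, so only the small profile tables, rather than the full source tables, are loaded into the data model.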
Basically, I have a big Excel dataset, about 500x500, with economic information from various companies.
Each row represents a different company and the columns hold the information. A little of it is qualitative, like ZIP code, type, etc., but most of it is quantitative. For each quantitative item we have figures for 5 years, so there is one column per year per item, i.e. Debt 2019, Debt 2020, etc.
So my question is: what is the best way to preprocess this data to work with it, and how should it be done? Options would be doing the preprocessing in Excel, running a script in PowerBI, using Power Query, SQL, ...
The objective is to have a report which will be accessible online; the user will type the name of a company and be shown the dashboard with the information of that company (only that one), so they can navigate through it.
The structure and the information shown are the same for each company; the only thing that changes is the "numbers" each company has. So it has to be possible to change which data is shown (to use the data from the company they want).
It also needs to be able to show comparisons to other groups of companies or to the total.
I want to get it right from the start, because changes get complicated later.
I thought about doing a sort of "relational model", with one "table" per company holding the quantitative data (one row per year, one column per info point) and then a general table with the qualitative data (one row per company, one column per info point). But I am not really sure.
I know how to use Power BI, but I have never used it for something this big. I would like to know which way of organizing this data is better, and some pointers on how to do it.
Many thanks to everyone.
I thought about doing a sort of "relational model", with one "table" per company holding the quantitative data (one row per year, one column per info point) and then a general table with the qualitative data (one row per company, one column per info point).
Yes, do that.
General guidance is to use Power Query in PowerBI to transform the data into a star schema model. See "Understand star schema and the importance for Power BI".
So that would typically result in one table that has the "dimension" data for each company, a date table, and a "fact" table at the grain of (CompanyId, Date) with the quantitative data.
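As a rough sketch of the Power Query side (the query and column names - Companies, CompanyId, ZIP, Type - are assumptions based on the description), unpivoting the year columns gets you to that fact grain:

let
    Source = Companies,
    // Keep the qualitative columns fixed and unpivot all the "Debt 2019"-style columns
    Unpivoted = Table.UnpivotOtherColumns(Source, {"CompanyId", "ZIP", "Type"}, "Measure", "Value"),
    // Split e.g. "Debt 2019" at the last space into the measure name and the year
    Split = Table.SplitColumn(Unpivoted, "Measure",
        Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.None, true), {"Measure", "Year"}),
    Typed = Table.TransformColumnTypes(Split, {{"Year", Int64.Type}, {"Value", type number}})
in
    Typed

The qualitative columns then go to the company dimension table, and the fact table keys on CompanyId plus year, which is the shape the company slicer and the group comparisons in the report would need.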
I'd like to build a tree model with cross-validation, and I'd like to assign observations to the cross-validation groups myself instead of by random sampling. I am not sure how to do this in SAS. It would be helpful if anybody could share some example SAS code.
Many thanks in advance.
Guys, this is a bit of a newbie question. I've tried to Google it and understand how they work, but I'm not having much luck. I have a dataset, created by a colleague, that connects to one of our systems.
I want to look at using it and try to make some changes. I can see it created a .pbix file when I saved a copy of the dataset. I want to look at the model section and see if I can pull some further fields (columns) into a table in the dataset that already links corresponding data from two other tables. I'd like to add more fields (columns) that are not currently in that table.
However, I don't want to affect other datasets or the data on the system it is communicating with.
Can anyone advise me whether that's the case?
I really only want to test things for now and not make any changes that might affect other people.
You can create a duplicate of the original table (query) and then play with that duplicate.
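Duplicating is a right-click option on the query in the Power Query editor; the closely related "Reference" option produces a new query whose only step is the original query, something like this minimal sketch (SalesData is a made-up query name):

let
    // New query that points at the output of the original query ("SalesData");
    // steps added here leave the original query untouched
    Source = SalesData
in
    Source

Either way, nothing is written back to the source system - Power Query only reads from it - so experimenting in your saved copy of the .pbix is safe.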
Amazon Machine Learning works with CSV files of data. It doesn't appear to have any ability to work with relational data to represent one-to-many relationships.
How should I transform a relational dataset so that it can be used for machine learning?
Would it be best to denormalize the dataset or am I thinking about this the wrong way?
Your best bet would be to denormalize the dataset, so that each input observation has all the attributes (columns) needed to make a prediction. If you can provide a few example data rows, even with made-up values, I'd be happy to help more.
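For illustration only, here is what such a flattening join could look like in Power Query M (the Customers and Orders tables, the CustomerId key, and the Region/Segment attributes are all made up; any tool that can join tables works just as well before exporting to CSV):

let
    // Attach each order's matching customer row as a nested table
    Joined = Table.NestedJoin(Orders, {"CustomerId"}, Customers, {"CustomerId"}, "Customer", JoinKind.LeftOuter),
    // Copy the customer attributes onto every order row, yielding one flat table
    Flattened = Table.ExpandTableColumn(Joined, "Customer", {"Region", "Segment"}, {"CustomerRegion", "CustomerSegment"})
in
    Flattened

The one-to-many relationship disappears by repeating the "one"-side attributes on every "many"-side row, which is exactly the denormalized shape a single CSV needs.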