I have just started using WEKA and am having problems converting files.
Below is a test table in .csv format.
The problem is that I need 4 attributes (4 columns): arr/number/airport/code. But WEKA recognises the table as a single column, so it is impossible to analyze.
Arr number airport code
departure 221 SVO VQBOQ
departure 222 SLY VQBOE
departure 223 AER VQBOT
arrival 224 SLY VQBOT
departure 225 DME VQBOU
departure 226 SLY VQBOM
How do I adjust the headers to get separate columns in WEKA?
Convert the .csv file back into Excel (.xlsx) or OpenOffice Calc (.ods). Then, in the spreadsheet, make sure the first row names the different attributes (no two attributes may have the same name) and that the very last column holds the class labels. After this, save the sheet as .csv again and make sure the delimiters are set correctly in the CSV export dialog (WEKA's CSV loader expects a comma as the field delimiter by default).
Everything should work now.
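If you would rather skip the spreadsheet round-trip, the same fix can be scripted. This is an alternative technique, not part of the answer above: it assumes the columns in the original file are separated by whitespace (as in the sample table) or by a known single character such as a semicolon, and that no field contains that separator. The function name convert_to_comma_csv is hypothetical. It rewrites each row as comma-separated values, which is what WEKA's CSV loader expects by default.

```python
import csv
import io


def convert_to_comma_csv(text, delimiter=None):
    """Rewrite a delimited table as comma-separated CSV text.

    Hypothetical helper. delimiter=None splits each line on runs of whitespace
    (as in the sample table); otherwise the given single character is used.
    """
    if delimiter is None:
        rows = (line.split() for line in text.splitlines())
    else:
        rows = csv.reader(io.StringIO(text), delimiter=delimiter)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        if row:  # skip blank lines
            writer.writerow(row)
    return out.getvalue()


sample = "Arr number airport code\ndeparture 221 SVO VQBOQ\narrival 224 SLY VQBOT\n"
print(convert_to_comma_csv(sample), end="")
# Arr,number,airport,code
# departure,221,SVO,VQBOQ
# arrival,224,SLY,VQBOT
```

For a real file, read its text, pass it through the function, write the result to a new .csv, and open that file in WEKA.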
I am trying to display the outcome scores from one Excel sheet on another sheet, based on the outcome name and the course.
If the text in Sheet1!C2=communication and Sheet1!E2=Comm 2010, then display Sheet1!D2 on Sheet2!B3.
If the text in Sheet1!C4=information* and Sheet1!E4=Comm 3000, then display Sheet1!D4 on Sheet2!C5.
I need to be able to use a wildcard when checking the text.
If the text in Sheet1!C6=communication and Sheet1!E6=Comm 2010, but there is no number in Sheet1!D6, leave Sheet2!B5 blank.
I have played around with a few different IF AND formulas, but I can't get the data displayed correctly.
Right now, I am building a pivot table from the data in Sheet1, formatting it to match the table on Sheet1, and then using =IF(Pivot!C7="","",Pivot!C7). This works, but building a pivot table for each student and then formatting it to match Sheet1 is a time drain.
I'm really hoping there is a better way to do this.
Thank you!
Since you are compiling outcomes per student rather than in total, it is safe to use the SUMPRODUCT() function.
The formula below is used in B3:
=SUMPRODUCT((Sheet1!$E$2:$E$6=Sheet2!B$1)*(Sheet1!$C$2:$C$6=Sheet2!$A3)*(Sheet1!$D$2:$D$6))
and can be copied across and down throughout B3:C4.
The formula used in B5 is different because of the wildcard criterion:
=SUMPRODUCT((Sheet1!$E$2:$E$6=Sheet2!B$1)*(LEFT(Sheet1!$C$2:$C$6,11)="Information")*(Sheet1!$D$2:$D$6))
(Unless you are using Microsoft 365, making the formula itself suppress 0 values roughly doubles its length. Given the small output range, a custom number format was applied instead, which simply doesn't display 0 in a cell where that is the formula's result; for example, the format 0;-0;;@ hides zeros.)
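To make the formula's logic explicit, here is a Python sketch of what each SUMPRODUCT cell computes: it multiplies the per-row conditions (1 or 0) by the score and sums the results. It is an illustration only, not something Excel runs. The function name and the rows and scores in the example are hypothetical; each row stands for one Sheet1 record as (outcome in column C, score in column D, course in column E). Matching is case-insensitive, as Excel's = comparison is, a blank score counts as 0, and the prefix argument plays the role of LEFT(...,11)="Information".

```python
def outcome_score(rows, course, outcome=None, prefix=None):
    """Sum the scores of rows whose course matches and whose outcome matches.

    rows holds (outcome, score, course) tuples, i.e. columns C, D and E.
    Pass outcome for an exact match or prefix for a LEFT(...)-style match.
    """
    if (outcome is None) == (prefix is None):
        raise ValueError("pass exactly one of outcome or prefix")
    total = 0
    for name, score, row_course in rows:
        # Each condition is True/False (1/0), as in the SUMPRODUCT terms.
        course_ok = row_course.casefold() == course.casefold()
        if outcome is not None:
            outcome_ok = name.casefold() == outcome.casefold()
        else:
            outcome_ok = name.casefold().startswith(prefix.casefold())
        total += course_ok * outcome_ok * (score or 0)  # blank score counts as 0
    return total


# Hypothetical Sheet1 rows for one student: (outcome, score, course).
rows = [
    ("communication", 3, "Comm 2010"),
    ("information literacy", 4, "Comm 3000"),
    ("communication", None, "Comm 2010"),
]
print(outcome_score(rows, "Comm 2010", outcome="communication"))  # 3
print(outcome_score(rows, "Comm 3000", prefix="Information"))     # 4
print(outcome_score(rows, "Comm 3000", outcome="communication"))  # 0
```

Because each student has at most one score per outcome and course, the sum equals that single score, which is why SUMPRODUCT is safe here; a result of 0 is what the custom number format hides.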
Background:
I have a monitoring script that runs three times a day and outputs a .csv file to a SharePoint folder. Each time the script runs, the new CSV contains an update on the various processes. I can currently get all of the CSV files back as a series of rows in the transformation.
Question:
Is there a way to limit the rows for each day to just the top 1 row, so that the dashboard shows the most up-to-date information for each day? I would like to do this at the Transform stage so that I don't load any unnecessary data.
Example data in the transformation:
Filename  Extension  Date created         Keep in Transformation?
file9     .csv       29/04/2021 07:52:41  KEEP
file8     .csv       28/04/2021 16:52:14  KEEP
file7     .csv       28/04/2021 11:52:20  [redundant]
file6     .csv       28/04/2021 07:52:49  [redundant]
file5     .csv       27/04/2021 16:51:41  KEEP
file4     .csv       27/04/2021 11:52:21  [redundant]
file3     .csv       27/04/2021 07:52:03  [redundant]
file2     .csv       26/04/2021 16:52:43  KEEP
file1     .csv       26/04/2021 11:52:20  [redundant]
It feels odd to answer my own question, but I thought I would post this in case someone else has the same question.
The steps to get the latest row for each day are:
Ensure that the dataset is ordered by the Date created column in descending order.
Duplicate the Date created column to perform transformations on. It might create a new column called Date created - Copy.
Highlight the Date created - Copy column, and then select Split Column by Delimiter. As it's a Date/Time column, I split the column by the Space delimiter. This will create 2 new columns, Date created - Copy.1 and Date created - Copy.2.
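Highlight the new date column, Date created - Copy.1, and then select Remove Rows - Remove Duplicates. (Remove Duplicates is not guaranteed to keep the first row of a sorted table; if older rows are being kept, wrapping the sort step in Table.Buffer is a common fix.)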
At this point you should only see the latest row of data for each day.
Remove the 2 split columns to tidy up the dataset.
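The same "sort newest first, then keep the first row for each date" logic can be sketched in Python to check which files should survive. This only illustrates the steps above outside Power Query; the (filename, timestamp) tuple layout and the function name latest_per_day are assumptions. The timestamps are the ones from the example table, parsed as day/month/year.

```python
from datetime import datetime


def latest_per_day(rows):
    """Keep only the newest (filename, created) row for each calendar day.

    Mirrors the Power Query steps: sort by creation time descending, then
    remove duplicates on the date part, keeping the first (newest) row.
    """
    newest = {}
    for filename, created in sorted(rows, key=lambda r: r[1], reverse=True):
        newest.setdefault(created.date(), (filename, created))
    return list(newest.values())  # dicts keep insertion (newest-first) order


raw = [
    ("file9", "29/04/2021 07:52:41"),
    ("file8", "28/04/2021 16:52:14"),
    ("file7", "28/04/2021 11:52:20"),
    ("file6", "28/04/2021 07:52:49"),
    ("file5", "27/04/2021 16:51:41"),
    ("file4", "27/04/2021 11:52:21"),
    ("file3", "27/04/2021 07:52:03"),
    ("file2", "26/04/2021 16:52:43"),
    ("file1", "26/04/2021 11:52:20"),
]
rows = [(name, datetime.strptime(ts, "%d/%m/%Y %H:%M:%S")) for name, ts in raw]
print([name for name, _ in latest_per_day(rows)])
# ['file9', 'file8', 'file5', 'file2']
```

The surviving files match the KEEP rows in the example table.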
I have a dataset of about 36,000 rows in which one column contains both plain numbers and numbers mixed with text (e.g. 100567563, WT1632366, 3275-2422). I need it to show the data as it is. The values are not errors, and I have tried changing the data type (Text, Number, General) in both Excel and Power BI, with no success. Any tips?
In Power BI, make sure the data type of that column is set to Text; it works fine for me.
Power Query previews the dataset and infers the data type from the first 1000 rows. From your question, I'm going to assume that the first rows in the preview are all numeric.
You can reorder your Excel file so that the first rows contain values like 'WT1632366'. Then, when the data loads, Power Query will detect the column as text and load the numeric values as text as well.
If you look in the Query Editor, you will see a 'Changed Type' step that lists each column name and its type. For example, for a column called 'Data', you can change it from:
"Data", Int64.Type
to
"Data", type text
And it should load.
Note: if you instead insert a new step after it that does this, the load may still fail with an error, because the earlier step has already tried to convert the column to a number.
For example, if the first 1200 rows are the number 1 and row 1201 is the text 'ABC', the load fails unless you change the column's type to text. To do this, click the '123' icon in the column header and choose Text from the list. If it asks whether to replace the current step, choose that option.
Once the data types are set, they are not re-evaluated on later loads, so you don't have to worry about them changing.
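To show why the 'ABC' row only fails at load time, here is a deliberately simplified Python model of sampling-based type detection. It is not Power Query's actual algorithm; it only assumes, as described above, that the type is guessed from the first 1000 values and then applied to every row. The function names are hypothetical.

```python
import re

# Plain ASCII integers with an optional leading minus sign.
INT_PATTERN = re.compile(r"-?[0-9]+")


def looks_like_int(text):
    return INT_PATTERN.fullmatch(text.strip()) is not None


def infer_type(values, sample_size=1000):
    """Simplified model: guess a column type from the first sample_size values only."""
    sample = values[:sample_size]
    if sample and all(looks_like_int(v) for v in sample):
        return "int"
    return "text"


def load_column(values, column_type):
    """Apply the chosen type to every value; return (values, number of errors)."""
    if column_type == "text":
        return list(values), 0
    loaded, errors = [], 0
    for v in values:
        if looks_like_int(v):
            loaded.append(int(v))
        else:
            loaded.append(None)  # stands in for an Error cell
            errors += 1
    return loaded, errors


column = ["1"] * 1200 + ["ABC"]
guessed = infer_type(column)
print(guessed)                          # int  ('ABC' lies outside the sample)
print(load_column(column, guessed)[1])  # 1    (one row fails to convert)
print(load_column(column, "text")[1])   # 0    (text keeps every value)
print(infer_type(["100567563", "WT1632366", "3275-2422"]))  # text
```

Setting the type to text explicitly, as in the answer, is the equivalent of calling load_column with "text": nothing depends on what the sample happened to contain.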
I am currently trying to read an .xls file in pandas that has multiple worksheets (each month's workbook has one worksheet per day). I don't need all the worksheets, just the sheets named 1 to 31, depending on the month. How would I go about combining only those sheets into one DataFrame?
I tried hard-coding the sheet names but got errors for months with only 28 or 30 days.
Is there a way to read a sheet only if its name is an integer? The sheets I don't need are usually named 'Sheet1', etc.
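One possible approach, sketched under assumptions: the day sheets are named with plain digits ('1' to '31'), every other sheet has a non-numeric name such as 'Sheet1', and pandas can open the workbook (reading legacy .xls files needs the xlrd package installed). Rather than hard-coding names, it reads the workbook's actual sheet names and keeps only the numeric ones, so months with 28, 30 or 31 days all work. The helper names and the added day column are hypothetical.

```python
import pandas as pd


def day_sheet_names(sheet_names):
    """Return the sheet names that are whole numbers from 1 to 31, in day order."""
    days = [name for name in sheet_names
            if str(name).strip().isdecimal() and 1 <= int(str(name).strip()) <= 31]
    return sorted(days, key=lambda name: int(str(name).strip()))


def combine_day_sheets(sheets):
    """Concatenate the day sheets from a {sheet name: DataFrame} mapping.

    A hypothetical 'day' column records which sheet each row came from.
    """
    names = day_sheet_names(sheets)
    if not names:
        return pd.DataFrame()
    frames = [sheets[name].assign(day=int(str(name).strip())) for name in names]
    return pd.concat(frames, ignore_index=True)


def read_month(path):
    """Read only the day sheets of one workbook and stack them into one DataFrame."""
    with pd.ExcelFile(path) as workbook:
        names = day_sheet_names(workbook.sheet_names)
        sheets = {name: workbook.parse(name) for name in names}
    return combine_day_sheets(sheets)
```

Usage would be along the lines of df = read_month("march.xls"), where the file name is only an example. Only the numeric sheets are parsed, so sheets like 'Sheet1' are never loaded.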
We read SAS XPT files to load data in .NET. Everything works fine, but recently we encountered a problem: the customer stored dates as numeric values in a column and provided a format in the file header. The SAS viewer displays that data correctly using the given format, but we need to load the data in our own .NET program, and we don't require SAS.
I recently found out that you can use the SAS LocalProvider with OLE DB, but it turns out that it does not support numeric formatting. So we end up with the wrong data in columns where data is stored as a numeric value with a format provided for it.
Can anyone help me understand and resolve this, ideally with a code sample? I have looked for .NET code samples but have had no luck so far with this issue.
Thanks in advance.
Regards,
Nasir
SAS Date values are stored as the number of days since Jan 1, 1960.
data _null_;
x=today();
put x=;
run;
x=19410
For example, today (2/21/2013) is 19410 days after 1/1/1960. As long as you know how your own software represents dates (probably as some number of days since a different date), you can perform the conversion yourself.
If it's relevant, SAS datetime values are the number of seconds since 1/1/1960 00:00:00.
data _null_;
x=datetime();
put x=;
run;
x=1677052885.5
Again, that's the time as of about 08:00 on 2/21/2013.
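The conversion itself is just an offset from the 1960-01-01 epoch. The sketch below is in Python for illustration (in .NET the equivalent is new DateTime(1960, 1, 1).AddDays(value) or .AddSeconds(value)), and it assumes the numeric value has already been read from the XPT file. Which function applies depends on the format recorded in the header, for example a date format such as DATE9. versus a datetime format such as DATETIME20.; that mapping is left to the caller. The function names are hypothetical.

```python
from datetime import date, datetime, timedelta

SAS_EPOCH_DATE = date(1960, 1, 1)          # day 0 for SAS date values
SAS_EPOCH_DATETIME = datetime(1960, 1, 1)  # second 0 for SAS datetime values


def sas_date_to_date(days):
    """Convert a SAS date value (days since 1960-01-01) to a date."""
    return SAS_EPOCH_DATE + timedelta(days=days)


def sas_datetime_to_datetime(seconds):
    """Convert a SAS datetime value (seconds since 1960-01-01 00:00:00) to a datetime."""
    return SAS_EPOCH_DATETIME + timedelta(seconds=seconds)


print(sas_date_to_date(19410))                 # 2013-02-21
print(sas_datetime_to_datetime(1677052885.5))  # 2013-02-21 08:01:25.500000
```

Both outputs agree with the SAS log values above: 19410 days is 2/21/2013, and 1677052885.5 seconds is a little after 08:00 that morning. Negative values (dates before 1960) work the same way.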