Weka continually throws error "Unable to determine structure as arff"

I'm trying to import an ARFF file into Weka, and I keep getting the following error:
Unable to determine structure as arff (Reason: java.io.IOException: } expected at end of enumeration, read Token[EOL], line 20
The closing brace } is present, and I can't find any other errors on line 20. In fact, the error reappears even after I delete line 20. I've attached a link to the ARFF file with a couple of lines of data: Link

I see three issues here. FIRST: in the file you linked to, the last attribute, los_category, is declared NUMERIC:
@ATTRIBUTE los_category NUMERIC
But the last value in your data line (Low) is clearly not numeric:
21,22,165315,4/9/2196__12:26:00_PM,4/10/2196__3:54:00_PM,?,EMERGENCY,EMERGENCY_ROOM_ADMIT,DISC-TRAN_CANCER/CHLDRN_H,Private,?,UNOBTAINABLE,MARRIED,WHITE,4/9/2196__10:06:00_AM,4/9/2196__1:24:00_PM,BENZODIAZEPINE_OVERDOSE,0,1,1.144444444,Low
You've declared 20 variables with @ATTRIBUTE statements (lines 3-22), but your data lines actually have 21 values.
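A quick way to catch this kind of mismatch before Weka does is to count the attribute declarations and compare that count with the number of comma-separated values in each data row. The sketch below is a rough Python check I'm assuming would help here, not anything from Weka; `check_arff_widths` is a hypothetical name, and it splits rows naively on commas, so rows with quoted values containing commas (or sparse-format rows) will be miscounted.

```python
def check_arff_widths(text):
    """Count @attribute declarations and report data rows with a different width.

    Returns (number_of_attributes, [(line_number, field_count), ...]).
    Naive: splits each data row on every comma.
    """
    n_attrs = 0
    mismatches = []
    in_data = False
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and ARFF comments
            continue
        keyword = line.lower()                # ARFF keywords are case-insensitive
        if not in_data:
            if keyword.startswith("@attribute"):
                n_attrs += 1
            elif keyword.startswith("@data"):
                in_data = True
            continue
        n_fields = len(line.split(","))
        if n_fields != n_attrs:
            mismatches.append((lineno, n_fields))
    return n_attrs, mismatches


# Made-up two-attribute file whose second data row has one value too many.
sample = """@relation demo
@attribute los numeric
@attribute los_category numeric
@data
1.14,0.5
1.144444444,0.5,Low
"""
print(check_arff_widths(sample))  # (2, [(6, 3)])
```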
SECOND, you have declared time variables (e.g. admittime) as NUMERIC, but their values clearly contain non-numeric characters. I know ARFF expects date/time values in a specific format, but I'm not an expert on it and can't be definitive about a fix. This is definitely a problem, though: when I create a file with just your first 3 variables, it loads fine; when I add the fourth (@ATTRIBUTE admittime NUMERIC), I get the same error you report.
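One possible fix for the timestamps, sketched under the assumption that every timestamp follows the pattern of the sample row (e.g. 4/9/2196__12:26:00_PM): rewrite the values as ISO-8601 timestamps, which to my knowledge is the format an ARFF date attribute uses when no explicit format is given, and then declare the column as @ATTRIBUTE admittime DATE instead of NUMERIC. The Python helper below does the rewriting; `SOURCE_FORMAT`, `ARFF_FORMAT`, and `to_arff_date` are my own hypothetical names, not part of Weka.

```python
from datetime import datetime

# Assumed layout of the timestamps in the data rows, inferred from sample
# values such as "4/9/2196__12:26:00_PM" (month/day/year, 12-hour clock).
SOURCE_FORMAT = "%m/%d/%Y__%I:%M:%S_%p"
# ISO-8601 layout (yyyy-MM-dd'T'HH:mm:ss), the ARFF default for date attributes.
ARFF_FORMAT = "%Y-%m-%dT%H:%M:%S"


def to_arff_date(value):
    """Convert one timestamp to ISO-8601, leaving ARFF missing values ('?') alone."""
    if value == "?":
        return value
    return datetime.strptime(value, SOURCE_FORMAT).strftime(ARFF_FORMAT)


for raw in ["4/9/2196__12:26:00_PM", "4/10/2196__3:54:00_PM", "?"]:
    print(raw, "->", to_arff_date(raw))
```

Converting the values rather than describing the original layout keeps the header simple; the alternative of declaring a custom date format in the header should also be possible, but I haven't verified the exact pattern Weka would need for the underscores.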
THIRD, line 19 (@ATTRIBUTE diagnosis) is hundreds of characters long. You might want to declare it as a STRING attribute for now, just to make sure you aren't overloading the read buffer with that huge line.

Related

Reading in a file from Fortran

I am reading data from a file with lines like this: "01/06/2009,Tom Sanders,,264,220,73,260". I want to skip to the first comma and then start reading at the name. However, when I read the line in, I only get the dates. I have used format(T9,'(A)'), but the result comes out in columns, which is not what I want. How should I approach this problem?

Index a text file (lines of different sizes) in C++

I have to extract information from a text file.
In the text file there is a list of strings.
This is an example of a string: AAA101;2015-01-01 00:00:00;0.784
The value after the last ; is a non-integer value that changes from line to line, so every line has a different length.
I want to map all of these lines into a structured vector so that I can access a specific line whenever I need to, without scanning the whole file again.
I did some research and found some threads about a command that lets me jump to a specific line of a text file, but I read that it only works if every line has the same length as the others.
I was thinking about converting all the lines in the file to a uniform format so that I can map the file as I want, but I hope there is a better and quicker way.
You can try a TStringList* (C++Builder's VCL string list). It holds a list of AnsiStrings, and each AnsiString (one line of the file) can then be accessed via ->operator [](numberOfTheLine).
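A different approach from the TStringList one, which keeps every line in memory, is to scan the file once, record the byte offset where each line starts, and later seek straight to the line you want; this works regardless of line length. The sketch below shows the idea in Python rather than C++ (in C++ the same thing could be built on std::ifstream's tellg/seekg); `build_line_index` and `read_line` are hypothetical helper names, and the sample lines are made up in the question's format.

```python
import io


def build_line_index(f):
    """Scan a binary file object once and return the byte offset of each line start."""
    f.seek(0)
    offsets = []
    pos = 0
    for line in f:          # each line includes its trailing b"\n"
        offsets.append(pos)
        pos += len(line)
    return offsets


def read_line(f, offsets, n):
    """Jump straight to line n (0-based) using the saved offsets."""
    f.seek(offsets[n])
    return f.readline().decode("utf-8").rstrip("\r\n")


# In real use f would be open("data.txt", "rb"); an in-memory file keeps this self-contained.
data = io.BytesIO(
    b"AAA101;2015-01-01 00:00:00;0.784\n"
    b"AAA102;2015-01-01 00:10:00;12.5\n"
    b"AAA103;2015-01-01 00:20:00;3.14159\n"
)
index = build_line_index(data)
print(index)                    # [0, 33, 65]
print(read_line(data, index, 2))
```

Only the list of offsets stays in memory, so this scales to files too large to hold as a list of strings.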

Fortran 90: reading a generic string with enclosed some "/" characters

Hi everybody, I've run into problems reading unformatted character strings from a simple file: when the first / is encountered, everything after it is lost.
This is an example of the text I would like to read: after the first 18 fixed character blocks (from #Mod to Flow[kW]), there is a list of chemical species names, whose number is a variable (in this case 5) within the program I'm writing.
#Mod ID Mod Name Type C. #Coll MF[kg/s] Pres.[Pa] Pres.[bar] Temp.[K] Temp.[C] Ent[kJ/kg K] Power[kW] RPM[rad/s] Heat Flow[kW] METHANE ETHANE PROPANE NITROGEN H2O
After some formal checks, I would like to skip the first 18 blocks and then read the chemical species. To do the former, I created a character array of dimension 18, with each element 20 characters long:
character(20), dimension(18) :: chapp
Then I read the 18 blocks into the character array:
read(1,*) (chapp(i),i=1,18)
...but this is the result: chapp(1) to chapp(7) hold the correct first 7 strings, but chapp(8) comes out as
chapp(8) = 'MF[kg '
and from there on, everything is left blank!
How could I overcome this reading problem?
The problem is due to your using list-directed input (the * as the format). List-directed input is useful for quick and dirty input, but it has its limitations and quirks.
You stumbled across a quirk: A slash (/) in the input terminates assignment of values to the input list for the READ statement. This is exactly the behavior that you described above.
This is not the compiler writer's choice; it is mandated by all relevant Fortran standards.
The solution is to use formatted input. There are several options for this:
If you know that your labels will always be in the same columns, you can use a format string like '(1X,A4,2X,A2,1X,A3,2X)' (this is not complete) to read in the individual labels. This is error-prone, and it also breaks if the program that writes out the data changes its format for some reason, or if the labels are edited by hand.
If you can control the program that writes the labels, you can use tab characters to separate the individual labels (and, later, the data values). Read in the whole line, split it into tab-separated substrings using INDEX, and read in the individual fields using an (A) format. Don't use list-directed format, or you will get hit by the / quirk mentioned above. This has the advantage that your labels can also include spaces, and that the data can be imported from/to Excel rather easily. This is what I usually do in such cases.
Otherwise, you can read in the whole line and split it on runs of spaces. This is a bit more complicated than splitting on single tab characters, but it may be the best option if you cannot control the data source. In that case, however, your labels cannot contain spaces.
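The splitting logic of the last option can be sketched in a few lines. This is Python rather than Fortran, purely to show the logic (in Fortran you would read the line with an (A) format and walk it with INDEX); `species_names` and `N_FIXED` are hypothetical names. Because the line is split on runs of whitespace instead of being read list-directed, the slashes in labels such as MF[kg/s] are just ordinary characters. Labels containing spaces, such as Heat Flow[kW], count as two blocks, which is why the 18 blocks end at Flow[kW].

```python
# Header line from the question; blocks are separated by one or more spaces.
HEADER = (
    "#Mod ID Mod Name Type C. #Coll MF[kg/s] Pres.[Pa] Pres.[bar] "
    "Temp.[K] Temp.[C] Ent[kJ/kg K] Power[kW] RPM[rad/s] Heat Flow[kW] "
    "METHANE ETHANE PROPANE NITROGEN H2O"
)
N_FIXED = 18  # fixed blocks from "#Mod" to "Flow[kW]", counted as whitespace-separated tokens


def species_names(line, n_fixed=N_FIXED):
    """Split on runs of whitespace, skip the fixed blocks, return the rest."""
    tokens = line.split()  # no argument: any run of spaces/tabs is one separator
    if len(tokens) < n_fixed:
        raise ValueError(f"expected at least {n_fixed} blocks, got {len(tokens)}")
    return tokens[n_fixed:]


print(species_names(HEADER))  # ['METHANE', 'ETHANE', 'PROPANE', 'NITROGEN', 'H2O']
```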

MergeCOM tags not in order (MC_OUT_OF_ORDER_TAG) issue

While using the MC_Open_File API of MergeCOM,
MC_Open_File( applID, msgID, &cbInfo, MediaToFileObj );
the following error occurred. How can I resolve or work around this issue?
(5124) 03-09 15:01:10.39 MC3 E: Tags not in ascending order: (0010,0010) found after (696c,6e6f)
(5124) 03-09 15:01:10.39 MC3 W: Error with tag (0010,0010) at byte offset 704 when parsing file
The same file works fine with MC_Stream_To_Message_With_Offset and MC_Stream_To_Message, but since I don't know the MC_ATT_TRANSFER_SYNTAX_UID, I'm not able to use those two APIs.
Kindly help me to overcome this.
MC_Open_File expects the file you're reading to be a DICOM file with a 128-byte preamble and a 'DICM' prefix, followed by the group 0x0002 elements and then the dataset itself.
The error you are seeing looks suspiciously like a parsing error when reading the file. The tag number (696c,6e6f) is obviously made of ASCII characters, which the parser apparently tried to interpret as a DICOM tag.
So it looks like you either have an invalidly formatted file, or you're trying to read a file that is not in the DICOM File Format. Note that the MergeCOM-3 APIs do not attempt to parse and determine the format of the file (whether it is a DICOM file or a stream); they just assume the format for the function being used. I'd suggest looking a little deeper at the binary content of the file to determine its format and whether you're using the right function to read it.
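Both checks can be done with a few lines of code before choosing an API. The Python sketch below uses hypothetical helper names and is not part of MergeCOM-3: it tests for the 128-byte preamble followed by the 'DICM' prefix, and shows which four bytes produce the tag (696c,6e6f) when read as two little-endian 16-bit numbers, assuming a little-endian transfer syntax.

```python
import struct


def looks_like_dicom_part10(data):
    """True if the bytes have a 128-byte preamble followed by the 'DICM' prefix."""
    return len(data) >= 132 and data[128:132] == b"DICM"


def tag_bytes(group, element):
    """Bytes that a little-endian parser would read as the tag (group,element)."""
    return struct.pack("<HH", group, element)


print(tag_bytes(0x696C, 0x6E6F))  # b'lion'
# In practice: looks_like_dicom_part10(open(path, "rb").read(132))
```

The decoded bytes are plain lowercase text, which supports the idea that the parser had drifted into non-DICOM content by byte offset 704.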

Weka: Convert Nominal to Numeric

When I imported a CSV file into Weka, it read some numeric variables as nominal. I would like to convert them to numeric, but I don't see any option for that in Weka.
I opened the .arff file in Notepad and Notepad++, removed the list of values, and changed the type to numeric.
For example, I changed
@attribute thours {' ',18,4,48,42,56,35,40,30,14,54,24,36,20,77,25,70,0,16,34,60,64,21,32,6,84,23,31,52,28,50,66,45,12,10,33,11,22,98,8,3,65,72,9,26,15,63,5,27,51,39,105,7,2,58,43,90,68,46,44,47,112,49,91,37,1,41,104,78,96,75,74,62,71,76,89,13,38,19,29,59,92,81,55,57,53,67,80,102,100,17}
to
@attribute thours numeric
and saved the file. When I imported the file again, I got this error:
"...not recognized as an 'Arff data files' file. reason: number expected, read Token ], line 78"
Any help is greatly appreciated. Thanks.
Dixi
I believe the reason for your error is that one or more entries of the variable "thours" are missing. In the attribute declaration, these missing entries show up as the quoted blank value (' '). If those values really are supposed to be missing, you should change them to the form Weka expects in an .arff file, which is a question mark (?).
This link provides a very detailed description of ".arff" files, and what is expected in them.
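If there are many rows, the replacement can be scripted. A minimal Python sketch, assuming the blank entries appear in the data rows as ' ' (matching the quoted blank in the attribute declaration) or as an empty field, and that no value contains a quoted comma; `blanks_to_missing`, `BLANKS`, and the column index are hypothetical. After running it over the @data section, the thours declaration can be changed to numeric.

```python
BLANKS = {"", "' '", "''"}  # assumed spellings of an empty value in the data rows


def blanks_to_missing(row, column):
    """Return the comma-separated row with a blank value in `column` replaced by '?'."""
    fields = row.split(",")  # naive: assumes no quoted values containing commas
    if fields[column].strip() in BLANKS:
        fields[column] = "?"
    return ",".join(fields)


rows = ["a,' ',x", "b,18,y", "c,,z"]
print([blanks_to_missing(r, 1) for r in rows])  # ['a,?,x', 'b,18,y', 'c,?,z']
```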