Split attribute labels with delimiter for processing - weka

I opened a CSV file in Weka 3.8 and selected an attribute/column (picture below). The labels are delimited by a pipe character, and a row can carry more than one label: Action is one label, Adventure is another, and so on. There should be 23 distinct labels, but Weka displays 914, presumably because it treats every distinct combination as its own value. With that many values, Weka cannot visualize the attribute.
For processing (e.g. classification), how can I separate those values so that Weka can read them?
This question is similar to an existing one, but that question asks about a date attribute (e.g. "dd-MM-yyyy HH:mm"), whereas this one asks about a character-separated value (e.g. "Action|Adventure|Drama").
Edit:
The data is taken from Kaggle.

Ah, I had run into this problem too.
First, ensure that the Genres attribute is recognised as a String type. If you are only using the GUI, go to Open File... and open the file. I presume it's a .dat file; if you've renamed it to .csv, tick the check box labelled "Invoke options dialog".
In the Generic Object Editor window, enter the index of the Genres attribute (here it's last) in the stringAttributes field.
Doing that will cause the attribute to look like this in the GUI.
Now choose the filter called StringToWordVector (weka.filters.unsupervised.attribute.StringToWordVector). In its editor window, find the tokenizer entry, click on its field, and under delimiters remove the defaults and add the pipe character. You may optionally edit the attribute prefix field as well.
Hit Apply and you will find the genres added as numeric attributes, set to 1 where the genre was present in the original string and 0 otherwise.
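To make the effect of the filter concrete, here is a minimal Python sketch of the same transformation done outside Weka. It is not Weka code: the function name split_labels, the genre_ prefix, and the sample rows are assumptions made up for illustration.

```python
def split_labels(rows, delimiter="|", prefix="genre_"):
    """Turn delimiter-separated label strings into 0/1 indicator columns."""
    # Collect every distinct label across all rows, in sorted order.
    labels = sorted(
        {label for row in rows for label in row.split(delimiter) if label}
    )
    columns = [prefix + label for label in labels]
    matrix = []
    for row in rows:
        present = set(row.split(delimiter))
        # 1 if the label occurs in this row's string, 0 otherwise.
        matrix.append([1 if label in present else 0 for label in labels])
    return columns, matrix


# Hypothetical sample rows in the same shape as the Genres column.
rows = ["Action|Adventure|Drama", "Drama", "Action|Comedy"]
columns, matrix = split_labels(rows)
```

Each distinct label becomes one column, so 914 combined values would collapse back to the 23 individual labels.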
StringToWordVector is a pretty useful filter, and there's much more to it in the docs: http://weka.sourceforge.net/doc.dev/weka/filters/unsupervised/attribute/StringToWordVector.html.

Related

How to save the result of feature selection in Weka?

I'm trying to use InfoGainAttributeEval in Weka for feature selection. How do I save the result? When I try to save it, Weka seems to save only my input data, not the result of the feature selection.
Welcome to SO. As far as I understand, you want to get the ranked values of the attributes. To do this, right-click the "Ranker + InfoGainAttributeEval" entry in the "Result list" section and select "Save result buffer". You can open the saved results in a program such as Notepad, or import them into Excel and chart them. I assume you selected "Ranker" as the search method, so your screen should look like the figure below.
After you select and run "InfoGainAttributeEval" with "Ranker" (using the full training set), Weka gives you a ranked list. Right-click it, select "Save Reduced Data", and save the file. You can open the file in Notepad or in Weka. In Weka, select the attributes whose rank value is 0 and delete them with "Remove", leaving only those with a non-zero rank value. Save the result in .arff format; you now have the reduced data.
If "Save Reduced Data" is not working for you, here is another approach.
Attribute selection can be accomplished in the Preprocess tab. There is a bar near the top for filtering the data. Click the "Choose" button. Under Filters->Supervised->Attribute you will find AttributeSelection. Select that.
Once it says "AttributeSelection" in the Filter bar, you can click on the bar to pick a selection method and a search method as well as set the parameters for those choices.
Once you have made your choices for the feature selection algorithm, click Apply to the right of the filter bar so that the filter is actually applied to the data. The data should now have the reduced feature set. So all you need to do is save it by clicking on the Save button at the top right. This should save the reduced data set.
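For readers who want to see what "rank by information gain, then drop the zero-ranked attributes" amounts to, here is a rough Python sketch. It is not Weka's implementation: it handles nominal values only, and the function names, the min_gain threshold, and the toy data are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(values)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(values).values()
    )


def info_gain(attribute_values, class_values):
    """Class entropy minus the entropy remaining after splitting on the attribute."""
    total = len(class_values)
    groups = {}
    for value, label in zip(attribute_values, class_values):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(class_values) - remainder


def rank_and_reduce(data, class_name, min_gain=1e-12):
    """Rank attributes by information gain and keep those above min_gain."""
    classes = data[class_name]
    gains = {
        name: info_gain(column, classes)
        for name, column in data.items()
        if name != class_name
    }
    ranked = sorted(gains.items(), key=lambda item: item[1], reverse=True)
    kept = [name for name, gain in ranked if gain > min_gain]
    reduced = {name: data[name] for name in kept + [class_name]}
    return ranked, reduced


# Hypothetical toy data set: columns stored as lists keyed by attribute name.
data = {
    "outlook": ["sunny", "sunny", "rainy", "rainy"],
    "windy": ["yes", "no", "yes", "no"],
    "play": ["no", "no", "yes", "yes"],
}
ranked, reduced = rank_and_reduce(data, "play")
```

Here ranked plays the role of the saved result buffer and reduced plays the role of the reduced data set.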

Sales list force change of column in lines

I'm using the page below as a POS sales list. The user scans an article with a barcode scanner, and the code is translated into the item no.
The problem is that when they finish scanning one item and want to move on to the next, the cursor on the new line automatically goes to the first column (Item type). My goal is to force it into the second column (Item no.), because the item type defaults to "product".
Simply swapping the order of the Item no. and Item type columns is not enough in this case.
Since ACTIVATE is not supported for controls in RTC, there are not many good options here:
Try using the QuickEntry property. Set it to false for all controls on the subpage except No.
Create a custom page with as few fields as possible, use it as a buffer to scan all items, and create the sales lines when this new page is closed. You can implement the desired behavior on this page and keep the original page almost unmodified.
Create an add-in that will intercept the scanner output somehow.

Exporting from pgadmin reads line breaks in field cells and creates unreadable Excel

I'm new to this, so I am sure it is a silly question, but I have read through every question related on the site and can't find anything!
I am exporting from pgadmin. A few of the columns have line breaks within the cells, so the exported data is very choppy. Does anyone know how to fix this? Is there a way to make it so the line breaks within cells are not read?
I believe I am using the right export settings, but what happens is that the header names are there, along with one row of content for each column, and then column A has 20 more rows beneath it because of the line breaks in the first cell of column E.
Any help would be much appreciated!
I assume that you're referring to the Query --> Execute to file command in the Query window. I don't think it's a bug that pgAdmin doesn't escape line breaks within strings in its csv output, but Excel can read it correctly anyway.
In the export options, please make sure that you use commas as column separators and double quotes as quote chars. Here are my settings:
Additionally, when you load your CSV into Excel, please don't use Data -> From Text. This one doesn't parse CSV with line breaks correctly. Just open the file directly in Excel (via Open within Excel, or by right clicking it in Windows Explorer and choosing Open With -> Microsoft Excel).
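If you want to check the exported file before opening it in Excel, a minimal Python sketch like the following shows how a CSV-aware parser keeps a quoted line break inside its cell. The sample text stands in for a hypothetical export with commas as separators and double quotes as quote chars; the last step, which flattens the line breaks, is an optional extra and not something pgAdmin does.

```python
import csv
import io

# Hypothetical export: the second field of the first record contains a line break.
exported = 'id,notes\r\n1,"first line\nsecond line"\r\n2,"single line"\r\n'

# newline="" leaves line endings untouched so the csv module can tell a
# quoted line break inside a field from the end of a record.
rows = list(csv.reader(io.StringIO(exported, newline="")))

# Optional: replace line breaks inside cells with spaces.
cleaned = [[" ".join(cell.splitlines()) for cell in row] for row in rows]
```

With a real file you would pass open(path, newline="") to csv.reader instead of the io.StringIO object.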

SP 2013 - Quick edit with Managed Meta Data columns, copy and paste from excel

I'm trying to migrate metadata from an Excel spreadsheet to a SP 2013 document library. The columns are managed metadata columns with predefined terms matching the data in the Excel spreadsheet.
However, I cannot copy and paste data from Excel via Quick Edit in the document library without getting the following error: "The data returned from the tagging UI was not formatted correctly".
This happens even when I remove all formatting or paste to notepad first.
Are there any simple solutions to this issue?
http://i.imgur.com/1bqpMPA.jpg
Thanks,
Any metadata fields are in fact foreign keys, as it were, to a dynamic, hidden table (or 'list', whatever you want to call it) within SharePoint. To paste a value into a metadata column, you need to know your element's guid (as in, within the term set) and then append that to each metadata element you're pasting in as a <name>|<guid> pair.
Getting the GUID for an element within your term set
Browse to [site-root]/TaxonomyHiddenList/AllItems.aspx and create a new view (or edit the default one) to display the field 'IdForTerm'.
Where you have a term 'apple', your IdForTerm may look like '1288beaf-82e0-4d81-b9de-ad5ad8382938'. Take a note of the guid for each term which appears within your input data.
Edit your input to correctly reference each term
Let's say you're importing your data from an Excel spreadsheet. Or from a CSV. It doesn't really matter. What you need to do is, basically, a find and replace down each managed metadata column, replacing 'term' with 'term|guid'. So our example from earlier, with the apple, would become 'apple|1288beaf-82e0-4d81-b9de-ad5ad8382938'.
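The find and replace can also be scripted. Below is a minimal Python sketch, assuming you have already copied the term-to-GUID pairs out of the IdForTerm view into a dictionary; the function name tag_terms and the sample column are made up for illustration.

```python
# Hypothetical mapping copied from the IdForTerm column of TaxonomyHiddenList.
term_guids = {"apple": "1288beaf-82e0-4d81-b9de-ad5ad8382938"}


def tag_terms(values, guids):
    """Rewrite each known term as 'term|guid'; leave unknown terms unchanged."""
    tagged = []
    for value in values:
        guid = guids.get(value)
        # Unknown terms are kept as-is so they are easy to spot and fix by hand.
        tagged.append(f"{value}|{guid}" if guid else value)
    return tagged


# One managed metadata column from the input data.
column = ["apple", "pear"]
tagged = tag_terms(column, term_guids)
```

The tagged values can then be pasted back into the spreadsheet column before copying the rows into SharePoint.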
Finally, assuming your view is set up in exactly the same order as your input data, you should be able to 'edit list' from within the browser, hit the leftmost side of your first input row (to select the entire row) and CTRL+V all of your data at the same time.
Note there appears to be a limit to the number of entries you can make at the same time. It appears to sit at around 5,000 elements.
Adding on to @rmacd's answer, you can also get the GUID for a given MMS term by first manually entering the value(s) you need in a Quick Edit cell, then copy and paste the same value(s) from SharePoint to Excel. The pasted value will appear with the full term|guid that you need to complete the bulk copy/paste.

How to determine if a given word or phrase from a list is within an anchor tag?

We have a ColdFusion based site that involves a large number of 'document authors' that have little or no knowledge of HTML. The 'documents' they create are comprised of HTML stored in a table in the database. They use a CKEDITOR interface. The content that they create is output into specific area of the page. The document frequently has tons of technical terms that readers may not be familiar with that we would like to have tooltips automatically show up for.
I and the other programmer want to have some code insert 'tooltip' code into the page based on a list of words in a table on our SQL server. The 'dictionary' table in our database has a unique ID, the word/phrase we will look for and a corresponding definition that would be displayed in the tooltip.
For instance, one of the words/phrases we will be looking for is 'Scrum Master'. If it occurs in the document area, we need to insert code around the words to create a tooltip. To do that, we need to see if certain conditions exist. Are the words within an anchor tag? If yes, is there already a title value for the tag (title is used to contain the info to be displayed in a tooltip)? If a title attribute already exists, don't do anything. If the words are not in an anchor tag, then we would put anchor tags around the words along with the title that will contain the definition.
The tooltip code we use is via jQuery (http://jqueryui.com/tooltip/). It is quick and simple to use. We just need to figure out how to use it dynamically based on our dictionary table.
Do you have any suggestions of how to go about this?
I was hoping that jSoup might have a function that I could use, but that doesn't seem to be the right technology for what I want to do, but I could be wrong and I am happy to be corrected!
We have a large number of these documents and so manually inserting and maintaining the tooltip code is just not an option.
Update your content with something like:
strOut = ReplaceList(strIn, ValueList(qryTT.find), ValueList(qryTT.replace));
Since words are delimited by spaces, the values in qryTT.find need to include the surrounding spaces. The replace column will need to include some of the original content. You will also have to be careful with words followed by a comma or a period.
I would cache the results because I would expect it to be memory intensive.
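As an alternative illustration of the anchor check from the question, here is a rough Python sketch (not ColdFusion, and not the ReplaceList approach above). It wraps dictionary terms found in plain text and skips anything inside an existing anchor or inside a tag. It simplifies the stated rules: anchors are left alone whether or not they have a title, matching is case-sensitive, and a regex split is only an approximation of real HTML parsing. The function name and sample data are assumptions.

```python
import html
import re

# Whole <a>...</a> elements, or any other single tag. These are never modified.
SKIP = re.compile(r"(<a\b[^>]*>.*?</a>|<[^>]+>)", re.IGNORECASE | re.DOTALL)


def add_tooltips(document, dictionary):
    """Wrap dictionary terms in <a title="..."> unless already inside an anchor."""
    if not dictionary:
        return document
    # Longest terms first so longer phrases win over shorter ones they contain.
    terms = sorted(dictionary, key=len, reverse=True)
    # \b keeps a term followed by a comma or period matchable without spaces.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b"
    )

    def wrap(match):
        title = html.escape(dictionary[match.group(1)], quote=True)
        return f'<a title="{title}">{match.group(1)}</a>'

    # With one capturing group, re.split returns plain text at even indexes
    # and the matched markup at odd indexes.
    parts = SKIP.split(document)
    for i in range(0, len(parts), 2):
        parts[i] = pattern.sub(wrap, parts[i])
    return "".join(parts)


# Hypothetical dictionary row and document fragment.
dictionary = {"Scrum Master": "Facilitates the Scrum process"}
document = '<p>Ask the Scrum Master. See <a href="/sm">Scrum Master</a>.</p>'
result = add_tooltips(document, dictionary)
```

As noted above, the output is worth caching per document rather than recomputing on every request.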