How to create effective forest models (RF-PCT) using Clus - weka
I am trying to use Clus with my training data to build a Random Forest of Predictive Clustering Trees (RF-PCT) model. Why does it always return a forest with 0 models, no matter which training set I use?
The model is built with Clus.
Here is the resulting model:
Default Model
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] [77037.0,74642.0,72382.0,75705.0,78171.0,59793.0,78443.0,75270.0,80386.0,75289.0,80333.0,77827.0,72847.0,79762.0,77135.0,58949.0,80452.0,51354.0,72968.0,69297.0,78558.0,78889.0,75170.0,80212.0,77237.0,80300.0,79275.0,79068.0]: 80653
Original Model
Forest with 0 models
Here are parts of my data:
#attribute Sci-Fi {0,1}
#attribute Crime {0,1}
#attribute Romance {0,1}
#attribute Animation {0,1}
#attribute Music {0,1}
#attribute Comedy {0,1}
#attribute War {0,1}
#attribute Horror {0,1}
#attribute Film-Noir {0,1}
#attribute Adventure {0,1}
#attribute News {0,1}
#attribute Western {0,1}
#attribute Thriller {0,1}
#attribute Adult {0,1}
#attribute Mystery {0,1}
#attribute Short {0,1}
#attribute Talk-Show {0,1}
#attribute Drama {0,1}
#attribute Action {0,1}
#attribute Documentary {0,1}
#attribute Musical {0,1}
#attribute History {0,1}
#attribute Family {0,1}
#attribute Reality-TV {0,1}
#attribute Fantasy {0,1}
#attribute Game-Show {0,1}
#attribute Sport {0,1}
#attribute Biography {0,1}
#attribute V1 numeric
#attribute V2 numeric
#attribute V3 numeric
#attribute V4 numeric
#attribute V5 numeric
...
#attribute V501 numeric
#data
0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,4,0,0,0,0,0,0,14,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,3,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,4,0,0,0,0,0,0,5,0,0,0,0,0,0,15,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,1,0,5,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,2,0,0,0,0,0,0,0,0,1,0,0,0,2,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,3,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,27,0,0,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,2,0,3,1,0,0,0,0,0,0,1,1,0,1,0,1,0,0,2,0,3,0,0,0,0,0,0,0,0,1,0,0,0,0,4,0,0,0,0,0,0,2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,4,0,0,0,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,6,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,7,2,0,0,2,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,2,0,0,0,0,1,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,3,0,0,0,0,0,0,11,7,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,10,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,4,0,0,0,0,0,0,0,0,2,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,5,0,0,0,0,0,4,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,2,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,7,5,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,3,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
Related
RegEx to get the alphabets or alphabets including the last digit with space
I have the following strings:

Hyderabad RTC K1 1991-1998
Hyderabad RTC KK 1876-1897
Al Test K5 1876-9876

I want to get only the first part of each string, without the trailing numbers:

Hyderabad RTC K1
Hyderabad RTC KK
Al Test K5

I tried the regex ^[a-z]*([0-9] )?$, thinking that it would match any number of characters optionally followed by a digit and a space.
You could use a capturing group to match two or more space-separated words, each made of letters optionally followed by digits, and then match the digits-digits part after the group:

^([A-Za-z]+\d*(?: [A-Za-z]+\d*)+) \d+-\d+$

If you do not need to match the hyphen-separated digits at the end, you can omit the capture group and the last part:

^[A-Za-z]+\d*(?: [A-Za-z]+\d*)*
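As a quick check, the first pattern can be run against the three strings from the question. This is only a sketch in JavaScript; the helper name extractName is made up for illustration and is not part of the answer.

```javascript
// Pattern from the answer: words made of letters plus optional digits,
// followed by a "digits-digits" range that stays outside the capture group.
const NAME_BEFORE_RANGE = /^([A-Za-z]+\d*(?: [A-Za-z]+\d*)+) \d+-\d+$/;

// extractName is a hypothetical helper name used only for this sketch.
function extractName(line) {
  const m = NAME_BEFORE_RANGE.exec(line);
  return m ? m[1] : null; // null when the line does not have the expected shape
}

const samples = [
  "Hyderabad RTC K1 1991-1998",
  "Hyderabad RTC KK 1876-1897",
  "Al Test K5 1876-9876",
];

for (const s of samples) {
  console.log(extractName(s));
}
```

Returning null for non-matching lines keeps the caller in control of how to treat input that lacks the trailing range.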
RegEx for passing punctuation
I am using:

(.*) CO\s?[\(.*\)|\[.*\]|\{.*\}|''.*''|".*"](.*)

to match 3M CO 'A'(MINNESOTA MINING AND MANUFACTURING COMPANY). However, the first single quotation mark is not captured by the regex. Could you please tell me why?

s/(.*) CO\s?[\(.*\)|\[.*\]|\{.*\}|''.*''|".*"](.*)/$1 CO $2

I expect to get:

3M CO 'A'(MINNESOTA MINING AND MANUFACTURING COMPANY)

but I get:

3M CO A'(MINNESOTA MINING AND MANUFACTURING COMPANY)
I'm guessing that here we wish to design an expression that matches the input part by part, such as:

(.+?)\s+CO\s+(['"].+?['"])([(\[{]).+?([)\]}])

Extra boundaries have been added, which can be reduced if not desired. The expression has three main parts:

(.+?)                 # anything before CO
(['"].+?['"])         # the quoted part
([(\[{]).+?([)\]}])   # the part inside the various brackets, including the brackets, which can be escaped if required

Demo

This snippet shows how the capturing groups work:

    const regex = /(.+?)\s+CO\s+(['"].+?['"])([(\[{]).+?([)\]}])/mg;
    const str = `3M CO 'A'(MINNESOTA MINING AND MANUFACTURING COMPANY)
    3M CO 'A'[MINNESOTA MINING AND MANUFACTURING COMPANY]
    3M CO 'A'{MINNESOTA MINING AND MANUFACTURING COMPANY}
    3M CO "A"{MINNESOTA MINING AND MANUFACTURING COMPANY}`;
    let m;

    while ((m = regex.exec(str)) !== null) {
        // This is necessary to avoid infinite loops with zero-width matches
        if (m.index === regex.lastIndex) {
            regex.lastIndex++;
        }

        // The result can be accessed through the `m`-variable.
        m.forEach((match, groupIndex) => {
            console.log(`Found match, group ${groupIndex}: ${match}`);
        });
    }

If this expression isn't what you need, it can be modified or changed on regex101.com.
Your regex should be written as:

/(.*)\sCO\s?(\(.+\).*|".+".*|'.+'.*|{.+}.*|\[.+\].*)/

(.*)            The first capture group captures the leading text ("3M" in your example).
\sCO\s?         Then it looks for a whitespace character, followed by CO, followed by an optional whitespace character.
(".+".* etc.)   The second capture group looks for an opening quote or bracket, followed by at least one character, followed by the closing quote or bracket, followed by any number of any characters.

Why the original regex didn't work

In the original regex, [\(.*\)|\[.*\]|\{.*\}|''.*''|".*"] can be simplified to [''.*''] for the string you provided. I realize that for other strings you might want to look for (.*) or [.*] or {.*} or ".*", but for the "3M" string only [''.*''] is relevant, so we'll just look at this.

[''.*''] just means: match any single character from the list inside [], in any order. In this case, there are three unique characters in the list: ', . and * (although ' is repeated). So it matched the first '. But since this match is outside your capture groups, the first ' is not included in either group. The following (.*) then matches everything that comes after the first ' and puts it in the second group, i.e. A'(MINNESOTA MINING AND MANUFACTURING COMPANY), without the ' in front. Does that make sense?
Demo

If you want to ensure that the format includes 'A', [A], "A", {A}, or (A), then this is what you want:

    let regex = /(.*)\sCO\s?(\(.+\)|".+".*|'.+'.*|{.+}.*|\[.+\].*)/;
    // Declare the destructuring targets once so the snippet also runs in strict mode
    let pattern, match1, match2;

    [pattern, match1, match2] = "3M CO 'A'(MINNESOTA MINING AND MANUFACTURING COMPANY)".match(regex);
    console.log(match1 + " CO " + match2); // 3M CO 'A'(MINNESOTA MINING AND MANUFACTURING COMPANY)

    [pattern, match1, match2] = '3M CO (A)(MINNESOTA MINING AND MANUFACTURING COMPANY)'.match(regex);
    console.log(match1 + " CO " + match2); // 3M CO (A)(MINNESOTA MINING AND MANUFACTURING COMPANY)

    [pattern, match1, match2] = '3M CO "A"(MINNESOTA MINING AND MANUFACTURING COMPANY)'.match(regex);
    console.log(match1 + " CO " + match2); // 3M CO "A"(MINNESOTA MINING AND MANUFACTURING COMPANY)

    [pattern, match1, match2] = "3M CO [A](MINNESOTA MINING AND MANUFACTURING COMPANY)".match(regex);
    console.log(match1 + " CO " + match2); // 3M CO [A](MINNESOTA MINING AND MANUFACTURING COMPANY)

    [pattern, match1, match2] = "3M CO {A}(MINNESOTA MINING AND MANUFACTURING COMPANY)".match(regex);
    console.log(match1 + " CO " + match2); // 3M CO {A}(MINNESOTA MINING AND MANUFACTURING COMPANY)
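To see the character-class behaviour described in this answer, the asker's original pattern can be run unchanged. The sketch below is illustrative only; the function name splitWithOriginalPattern is hypothetical.

```javascript
// The asker's original pattern, copied as-is. Everything between the outer
// [ and ] is ONE character class, so it consumes exactly one character.
const ORIGINAL = /(.*) CO\s?[\(.*\)|\[.*\]|\{.*\}|''.*''|".*"](.*)/;

// Hypothetical helper: returns the two capture groups, or null on no match.
function splitWithOriginalPattern(input) {
  const m = ORIGINAL.exec(input);
  return m ? { before: m[1], after: m[2] } : null;
}

const result = splitWithOriginalPattern(
  "3M CO 'A'(MINNESOTA MINING AND MANUFACTURING COMPANY)"
);

// The opening quote was consumed by the character class, outside both groups,
// so it is missing from the rebuilt string.
console.log(result.before + " CO " + result.after);
```

The single character eaten by the class sits between the two groups, which is why the substitution in the question loses the leading quote.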
The ' is not captured in the second group because you use a character class, which can be written as CO\s?[(.*)|[\]{}'"] and which then matches CO '

So your pattern actually looks like:

(.*) CO\s?[.*()|[\]{}'"](.*)
^         ^             ^
group 1   Char class    group 2

What you might do to get those matches in 2 groups is to use:

(.*?)CO\s?((?:(['"]).*?\3|\(.*?\)|\[.*?\]|\{.*?\}).*)

Explanation

(.*?)              Capturing group 1, match any char except newline, non-greedy
CO\s?              Match CO and an optional whitespace char
(                  Capturing group 2
  (?:              Non-capturing group, match any of the options
    (['"]).*?\3    Match ' or " and use a backreference to what is captured
    |              Or
    \(.*?\)        Match (....)
    |              Or
    \[.*?\]        Match [....]
    |              Or
    \{.*?\}        Match {....}
  )                Close non-capturing group
  .*               Match any char until the end of the string
)                  Close group 2

Note that .*? is non-greedy to prevent unnecessary backtracking and overmatching.
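The backreference pattern from this answer can be tried in JavaScript as follows. This is a minimal sketch; the function name rebuild is hypothetical, and trimming group 1 is an assumption made here because the lazy (.*?) keeps the space before CO.

```javascript
// Pattern from the answer: group 2 starts with a quoted part (closed by the
// same quote via \3) or a bracketed part, then takes the rest of the line.
const QUOTED_OR_BRACKETED =
  /(.*?)CO\s?((?:(['"]).*?\3|\(.*?\)|\[.*?\]|\{.*?\}).*)/;

// Hypothetical helper: rebuilds "<prefix> CO <rest>" or returns null.
function rebuild(input) {
  const m = QUOTED_OR_BRACKETED.exec(input);
  if (!m) return null;
  // Group 1 ends with the space that precedes CO, so trim it before joining.
  return `${m[1].trim()} CO ${m[2]}`;
}

console.log(rebuild("3M CO 'A'(MINNESOTA MINING AND MANUFACTURING COMPANY)"));
console.log(rebuild('3M CO "A"{MINNESOTA MINING AND MANUFACTURING COMPANY}'));
```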
How to select the last 3 words from a string in PL/SQL?
I have a column with data like this:

Professor Dr. Eigen Foster Criminalist
Student Natalie Portman Journalist
Victor Morgan Dentist
Swiss Based Dr. M. Muriel Bayes Jorunalist

What I want to see is:

Eigen Foster Criminalist
Natalie Portman Journalist
Victor Morgan Dentist
Muriel Bayes Jorunalist

I know I should do it somehow with regexp_substr, but I don't know how to use $ for the starting position. There can be a different number of words in each string, but I always need the last 3 words.
Try this regex: [^ ]+ [^ ]+ [^ ]+$

SELECT regexp_substr('Professor Dr. Eigen Foster Criminalist', '[^ ]+ [^ ]+ [^ ]+$') FROM dual;

$ means the end of the string.
[^ ]+ means at least one character that is not a space.
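The same pattern can be checked outside the database. The sketch below uses JavaScript only to illustrate how the pattern behaves on the sample rows; it is not Oracle code, and the helper name lastThreeWords is made up for this example.

```javascript
// Same pattern as in the answer: three space-free runs separated by single
// spaces, anchored to the end of the string with $.
const LAST_THREE_WORDS = /[^ ]+ [^ ]+ [^ ]+$/;

// Hypothetical helper: returns the last three words, or null if there are fewer.
function lastThreeWords(s) {
  const m = s.match(LAST_THREE_WORDS);
  return m ? m[0] : null;
}

const rows = [
  "Professor Dr. Eigen Foster Criminalist",
  "Student Natalie Portman Journalist",
  "Victor Morgan Dentist",
  "Swiss Based Dr. M. Muriel Bayes Jorunalist",
];

for (const row of rows) {
  console.log(lastThreeWords(row));
}
```

Because the pattern is anchored only at the end, the regex engine skips forward from the left until exactly three words remain.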
Find and replace spaces by commas in multiple text files after a certain keyword
I have a folder with text files that have this structure:

#ATTRIBUTE dynamics_rms_Mean NUMERIC
#ATTRIBUTE dynamics_rms_Std NUMERIC
#ATTRIBUTE dynamics_rms_Slope NUMERIC
#ATTRIBUTE dynamics_rms_PeriodFreq NUMERIC
#ATTRIBUTE dynamics_rms_PeriodAmp NUMERIC
#ATTRIBUTE dynamics_rms_PeriodEntropy NUMERIC
#DATA
9.749956e-01 2.257056e-01 1.667380e-01 NaN NaN 9.706193e-01

I want to use Sublime Text to find and replace, in all of these text files, the space characters BELOW #DATA, while preserving the spaces in the lines starting with #ATTRIBUTE. I have found a way to select all the spaces in a text file, but I don't know how to select only the ones after #DATA and replace them in every file (using ctrl+shift+f). I would appreciate any suggestions on how to solve this. Thanks in advance.
You can use this search/replace:

search: (?:\G(?!^)|^#DATA\R)[^ ]+\K[ ]
replace: ,

Pattern details:

(?:          # non-capturing group
    \G       # anchor for the position of the last match
    (?!^)    # not at the start of a line
  |          # OR
    ^        # start of a line
    #DATA
    \R       # any newline sequence
)
[^ ]+        # one or more characters that are not a space
\K           # discards everything on the left from the match result
[ ]          # a space

To deal with several consecutive lines of data and to stop at the first blank line, you need to change the pattern to:

(?:\G(?!^)|^#DATA\R)[^\n ]+(?:\R[^\n ]+)?\K[ ]
You can use the following regex replacement (please note there is a literal space in the pattern, just before (?!$)):

(?:^#DATA|\G(?!^))\K([\s\S]*?) (?!$)

And replace with:

,$1
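Outside Sublime Text, the same transformation can be scripted. JavaScript regular expressions support neither \G nor \K, so the sketch below swaps in a different technique, a plain line-by-line scan, rather than reusing the patterns above. The function name commaSeparateData is hypothetical, and the sketch assumes the marker line is exactly #DATA and that every single space after it should become a comma.

```javascript
// Replace spaces with commas only on the lines that follow the #DATA marker.
// Lines up to and including #DATA (e.g. the #ATTRIBUTE lines) are left as-is.
function commaSeparateData(text) {
  let inData = false; // becomes true once the #DATA line has been seen
  return text
    .split(/\r?\n/)
    .map((line) => {
      if (inData) {
        return line.replace(/ /g, ","); // assumption: one comma per space
      }
      if (line.trim() === "#DATA") {
        inData = true;
      }
      return line;
    })
    .join("\n");
}

const example = [
  "#ATTRIBUTE dynamics_rms_Mean NUMERIC",
  "#ATTRIBUTE dynamics_rms_Std NUMERIC",
  "#DATA",
  "9.749956e-01 2.257056e-01 1.667380e-01 NaN NaN 9.706193e-01",
].join("\n");

console.log(commaSeparateData(example));
```

To process a whole folder, the function could be applied to each file's contents in turn; reading and writing the files is left out of this sketch.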
Regex for number up to 16 digits with optional decimal any place within number
I'm trying to find the correct regex pattern to match any number with up to 16 digits, with an optional decimal point anywhere in the number. Here are some examples.

Valid:
9999999999999999
0.000000000000001
3.24
1.2
0.00003

Invalid:
12345678910111213
59.492.5
Based on the comment above that .0, 0000000000 and 0000003 are not valid, use this pattern:

^(?!0\d|\.|.*?\..*?\.)(?=(?:\.?\d){1,16}$)(.*)$

I assumed that you don't want to allow .number (i.e., the regex below won't match numbers like .67 or .08):

^(?:(?=.{3,17}$)\d+\.\d+|\d{1,16})$
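Both patterns can be compared against the examples from the question. The sketch below is only a sanity check in JavaScript; the names WITH_LOOKAHEADS and WITH_ALTERNATION are labels chosen here, not names used by the answerers.

```javascript
// First answer: lookaheads reject leading zeros, a leading dot, and a second
// dot, then require 1 to 16 digits with at most one dot in front of a digit.
const WITH_LOOKAHEADS = /^(?!0\d|\.|.*?\..*?\.)(?=(?:\.?\d){1,16}$)(.*)$/;

// Second answer: either digits.digits within 3 to 17 characters in total,
// or a plain run of 1 to 16 digits.
const WITH_ALTERNATION = /^(?:(?=.{3,17}$)\d+\.\d+|\d{1,16})$/;

const valid = ["9999999999999999", "0.000000000000001", "3.24", "1.2", "0.00003"];
const invalid = ["12345678910111213", "59.492.5"];

for (const s of [...valid, ...invalid, "0000003", ".67"]) {
  console.log(s, WITH_LOOKAHEADS.test(s), WITH_ALTERNATION.test(s));
}
```

The two patterns agree on the question's examples but differ on leading zeros: an input such as 0000003 is rejected by the first pattern and accepted by the second.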