I'm working to programmatically clean up a field in my dataset by using a helper column that I will later filter on to remove the 'junk' records. The junk records are IDs, and the valid records are full names (in the format "Tom Jones"). Almost all junk records contain no space, whereas valid names do, with one exception: "University" is a valid value that has no space. The pseudocode would read
Set Helper_IsName? = True
WHERE ValueField CONTAINS " " unless ValueField = "University"
ELSE False
Here is the M code excerpt that is getting me 95% of the way there:
Helper_IsName? = Text.Contains([OldValue]," ")
All results are good, except when the formula reads "University", it sets the value as FALSE, when I need it to equal TRUE.
I think you can just add that condition with an or:
Helper_IsName? = Text.Contains([OldValue]," ") or [OldValue] = "University"
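To sanity-check the rule outside Power Query, the same condition can be sketched in Python. This is only an illustration of the logic, not M code; the function name is_name and the sample values are made up for the example.

```python
def is_name(value: str) -> bool:
    # A value counts as a name if it contains a space,
    # or if it is the one known single-word exception.
    return " " in value or value == "University"

# Hypothetical sample values: a full name, an ID, and the exception.
samples = ["Tom Jones", "A12345", "University"]
flags = [is_name(v) for v in samples]
print(flags)
```

Note that the comparison is exact, so a value such as "university" or "University " (with a trailing space) would need separate handling.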
Goal: I have a bunch of keywords I'd like to categorise automatically based on topic parameters I set. Categories that match must be in the same column so the keyword data can be filtered.
e.g. If I have "Puppies" as a first topic, it shouldn't appear as a secondary or third topic otherwise the data cannot be filtered as needed.
Example Data: https://docs.google.com/spreadsheets/d/1TWYepApOtWDlwoTP8zkaflD7AoxD_LZ4PxssSpFlrWQ/edit?usp=sharing
Video: https://drive.google.com/file/d/11T5hhyestKRY4GpuwC7RF6tx-xQudNok/view?usp=sharing
Parameters Tab: I will add words in columns D-F that change based on the keyword data set and there will often be hundreds, if not thousands, of options for larger data sets.
Categories Tab: I'd like to have a formula or script that goes down the columns D-F in Parameters and fills in a corresponding value (in Categories! columns D-F respectively) based on partial match with column B or C (makes no difference to me if there's a delimiter like a space or not. Final data sheet should only have one of these columns though).
Things I've Tried:
I've tried a bunch of things. Nested IF formula with regexmatch works but seems clunky.
e.g. this formula in Categories! column D
=IF(REGEXMATCH($B2,LOWER(Parameters!$D$3)),Parameters!$D$3,IF(REGEXMATCH($B2,LOWER(Parameters!$D$4)),Parameters!$D$4,""))
I nested more statements, swapping in the next cell of the Parameters!D column each time (that is, manually adding $D$5, $D$6, etc.), but this seems inefficient for a list thousands of words long. For example, the third topic will get very long once all dog breed types are added.
Any tips?
Functionality I haven't worked out:
if a string in Categories B or C contains more than one topic in the parameters I set out, is there a way I can have the first 2 to show instead of just the first one?
e.g. Cell A14 in Categories, how can I get a formula/automation to add both "Akita" & "German Shepherd" into the third topic? Concatenation with a CHAR(10) to add to new line is ideal format here. There will be other keywords that won't have both in there in which case these values will just show up individually.
Since this data set has a bunch of mixed breeds and all breeds are added as a third topic, it would be great to differentiate interest in mixes vs pure breeds without confusion.
Any ideas will be greatly appreciated! Also, I'm open to variations in layout and functionality of the spreadsheet in case you have a more creative solution. I just care about efficiently automating a tedious task!!
Try using custom function:
To create custom function:
1. Create or open a spreadsheet in Google Sheets.
2. Select the menu item Tools > Script editor.
3. Delete any code in the script editor and copy and paste the code below into the script editor.
4. At the top, click Save.
To use custom function:
1. Click the cell where you want to use the function.
2. Type an equals sign (=) followed by the function name and any input value — for example, =DOUBLE(A1) — and press Enter.
3. The cell will momentarily display Loading..., then return the result.
Code:
function matchTopic(p, str) {
  // Convert the 2D range into a 1D array and drop blank cells;
  // a blank cell would become an empty group "()" that matches everywhere
  var params = p.flat().filter(i => i !== "");
  if (params.length === 0) {
    return ""; // nothing to match against
  }
  // Convert the array into a series of capturing groups. Example: (Dog)|(Puppies)
  var buildRegex = params.map(i => '(' + i + ')').join('|');
  var regex = new RegExp(buildRegex, "gi");
  var results = str.match(regex);
  if (results) {
    // The loops below convert the first character of each word to uppercase
    for (var i = 0; i < results.length; i++) {
      var words = results[i].split(" ");
      for (let j = 0; j < words.length; j++) {
        // charAt(0) returns "" for an empty word instead of throwing
        words[j] = words[j].charAt(0).toUpperCase() + words[j].substr(1);
      }
      results[i] = words.join(" ");
    }
    return results.join(","); // return with comma separator
  } else {
    return ""; // return blank if there is no match
  }
}
Example Usage:
Parameters:
First Topic:
Second Topic:
Third Topic:
Reference:
Custom Functions
I've added a new sheet ("Erik Help") with separate formulas (highlighted in green currently) for each of your keyword columns. They are each essentially the same except for specific column references, so I'll include only the "First Topic" formula here:
=ArrayFormula({"First Topic";IF(A2:A="",,IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))) & IFERROR(CHAR(10)&REGEXEXTRACT(REGEXREPLACE(LOWER(B2:B&C2:C),IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))),""),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))))})
This formula first creates the header (which can be changed within the formula itself as you like).
The opening IF condition leaves any row in the results column blank if the corresponding cell in Column A of that row is also blank.
JOIN is used to form a concatenated string of all keywords separated by the pipe symbol, which REGEXEXTRACT interprets as OR.
IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))) will attempt to extract any of the keywords from each concatenated string in Columns B and C. If none is found, IFERROR will return null.
Then a second-round attempt is made:
& IFERROR(CHAR(10)&REGEXEXTRACT(REGEXREPLACE(LOWER(B2:B&C2:C),IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))),""),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>"")))))
Only this time, REGEXREPLACE is used to replace the results of the first round with null, thus eliminating them from being found in round two. This will cause any second listing from the JOIN clause to be found, if one exists. Otherwise, IFERROR again returns null for round two.
CHAR(10) is the new-line character.
I've written each of the three formulas to return up to two results for each keyword column. If that is not your intention for "First Topic" and "Second Topic" (i.e., if you only wanted a maximum of one result for each of those columns), just select and delete the entire round-two portion of the formula shown above from the formula in each of those columns.
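The two-round idea (extract one keyword, blank it out, then extract again) can be sketched in Python for anyone who wants to test it outside the sheet. This is a rough model of the approach, not the formula itself: the function name first_two_matches and the sample keywords are hypothetical, and the escaping of regex metacharacters is an addition that the sheet formula does not perform.

```python
import re

def first_two_matches(text, keywords):
    # Build one alternation pattern from the non-blank keywords.
    # re.escape is an extra safeguard that the sheet formula lacks.
    pattern = "|".join(re.escape(k.lower()) for k in keywords if k)
    if not pattern:
        return ""
    haystack = text.lower()
    # Round one: the first keyword found in the text.
    first = re.search(pattern, haystack)
    if first is None:
        return ""
    # Round two: blank out the round-one result, then search again.
    remainder = haystack.replace(first.group(0), "")
    second = re.search(pattern, remainder)
    found = [first.group(0)]
    if second:
        found.append(second.group(0))
    # Join with a newline, mirroring CHAR(10) in the formula.
    return "\n".join(found)

breeds = ["German Shepherd", "Akita", "Poodle"]  # hypothetical parameter list
both = first_two_matches("Akita German Shepherd mix", breeds)
single = first_two_matches("poodle puppies", breeds)
none = first_two_matches("cat food", breeds)
```

As in the formula, the results come back in lower case and in the order they appear in the text, not in the order of the parameter list.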
Background
I have a list of "bad words" in a file called bad_words.conf, which reads as follows
(I've changed it so that it's clean for the sake of this post but in real-life they are expletives);
wrote (some )?rubbish
swore
I have a user input field which is cleaned and stripped of dangerous characters before being passed as data to the following script, score.py
(for the sake of this example I've just typed in the value for data)
import re
data = 'I wrote some rubbish and swore too'
# Get list of bad words
bad_words = open("bad_words.conf", 'r')
lines = bad_words.read().split('\n')
combine = "(" + ")|(".join(lines) + ")"
# Set score in case there are no results
score = 0
# Search for bad words
if re.search(combine, data):
    # Add one for a hit
    score += 1
# Show me the score
print(str(score))
bad_words.close()
Now this finds a result and adds a score of 1, as expected, without a loop.
Question
I need to adapt this script so that I can add 1 to the score every time a line of "bad_words.conf" is found within text.
So in the instance above, data = 'I wrote some rubbish and swore too' I would like to actually score a total of 2.
1 for "wrote some rubbish" and +1 for "swore".
Thanks for the help!
Changing combine to just:
combine = "|".join(lines)
And using re.findall():
In [33]: re.findall(combine,data)
Out[33]: ['rubbish', 'swore']
The problem with having the multiple capturing groups as you originally were doing is that re.findall() will return each additional one of those as an empty string when one of the words is matched.
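As a minimal sketch of turning this into a score: when a pattern contains exactly one capturing group, such as the "(some )?" in the sample file, re.findall() returns that group's contents instead of the whole match. Counting the match objects from re.finditer() avoids depending on that. The list below is a stand-in for the lines read from bad_words.conf, and skipping blank lines is an added assumption.

```python
import re

data = "I wrote some rubbish and swore too"
# Stand-in for the lines read from bad_words.conf (note the blank last line)
lines = ["wrote (some )?rubbish", "swore", ""]

# Skip blank lines so an empty alternative cannot match everywhere.
patterns = [line for line in lines if line.strip()]
combine = "|".join(patterns)

# Option 1: count every occurrence of any bad phrase in the text.
score = sum(1 for _ in re.finditer(combine, data))

# Option 2: count each line of the file at most once.
score_per_line = sum(1 for p in patterns if re.search(p, data))

print(score, score_per_line)
```

The two counts differ only when the same phrase appears more than once in the text, so which one to use depends on whether repeats should add to the score.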
Can I select a certain row/column combination in coldfusion without doing a query of queries? For example:
Some Query:
ValueToFind | ValueToReturn
String 1 | false
String 2 | false
String 3 | true
Can I somehow do #SomeQuery["ValueToFind=String 3"][ValueToReturn]# = true without doing a query of queries ? I know there's code out there to get a certain row by id, but I'm not sure how or if I can do it when I need a string as the ID
If this can't be done, is there a short hand way to set up a coldfusion function so I can use something like FindValue(Query, "String 3") and not have to use ?
You can treat a query column as an array.
yourRow = ArrayFind(queryName['columnName'], "'the value you seek'");
If you get a zero, the value you seek is not there.
Edit starts here:
For values of other columns in that row, simply use that variable.
yourOtherValue = queryName.otherColumnName[yourRow];
With a small modification to Dan's code, you can find the column value in a single expression:
yourValue = SomeQuery["ValueToReturn"][ArrayFind(SomeQuery['ValueToFind'], "String 3")]
The UniVerse dynamic array relative operation (using -1) works in a strange way.
The operation below does not add a new element at position <1,1,5> as I expected; instead it adds '1,1,5' to DYNAMIC.ARRAY<1,1,1>.
DYNAMIC.ARRAY = ' '
DYNAMIC.ARRAY<1,-1,5> = '1,1,5' ; *Adds to 1,1,1 not 1,1,5 when DYNAMIC.ARRAY contains only whitespaces before this operation
However, the same operation works as expected if the dynamic array contains a non-empty value. The final result after executing the code below is DYNAMIC.ARRAY<1,1,1> = '1,1,1' and DYNAMIC.ARRAY<1,2,5> = '1,2,5'.
DYNAMIC.ARRAY = ' '
DYNAMIC.ARRAY<-1> = '1,1,1'
DYNAMIC.ARRAY<1,-1,5> = '1,2,5' ; *Adds to right position 1,2,5 when DYNAMIC.ARRAY is initialised to non empty value before this operation
Is this expected behaviour in UniVerse?
When you use -1 it should be on the deepest level of nesting value.
The way multivalued fields work, what you want to do doesn't really make sense.
Say your record is a reflection of things that a customer bought; your dictionary might be something like
D1: CustomerName
D2: OrderNumber
D3: PartNumber
#ID 1234
0001:John Doe
0002:72832#VM83782#VM84783
0003:232-A#SVM2394-R#SVM3321-B#VM232-F#VM2342
CustomerName is a Single valued field. This is associated with entire record.
OrderNumber is a Value delimited list of Orders associated with that customer. In the SQL world this would be a child table.
PartNumber is a SubValue delimited list of Parts that is associated with each order. In the SQL world this would be a Child table of the Order Child Table.
Framing the logic like this, it really doesn't make any sense to say that you want to assign the 5th item on the next order the customer buys to be part "12345678" because you haven't got an order to associate with a part yet.
I believe there are some dictionary directives that you might be able to use to bypass this, but generally just know that it is bad form to create a sub-valued field without establishing an associated value first. When you start ignoring this you have to start validating for empty strings at every turn. Down this road lies madness.
Hope that helps.
To summarize, you can't add a specific Sub Value to an unknown value. You have to first determine which value you want the subvalue mark to be in and then specify the subvalue.
From your code snippet
DYNAMIC.ARRAY = ' '
DYNAMIC.ARRAY<1,-1,5> = '1,1,5' ;* Adds to 1,1,1 not 1,1,5 when DYNAMIC.ARRAY contains only whitespaces before this operation
There are a number of things to be aware of
1) The white spaces have no bearing on what happens, as long as there are no reserved characters in the ASCII string (#FM,#AM,#VM,#SVM characters) the result will be the same.
2) The '-1' option should always be the last option and putting it in the second last parameter position will not work.
What you are trying to achieve can be done in several different ways
DYNAMIC.ARRAY<1,-1> = #SVM:#SVM:#SVM:#SVM:"1,1,5" ;* appends subvalued string as last value
or
DYNAMIC.ARRAY<1,-1> = STR(#SVM,4):"1,1,5" ;* appends subvalued string as last value
or
TEMP = "" ;* needs to be initialised
TEMP<1,1,5> = "1,1,5" ;* puts string in 5th subvalue position
DYNAMIC.ARRAY<1,-1> = TEMP ;* appends TEMP string as last value
or
VAL.POS = DCOUNT(DYNAMIC.ARRAY<1>,#VM) + 1 ;* find next value position (count of existing values in attribute 1, plus one)
DYNAMIC.ARRAY<1,VAL.POS,5> = "1,1,5" ;* insert string into subvalue 5 of value
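To make the delimiter layout behind these alternatives concrete, here is a rough Python model of the STR(#SVM,4):"1,1,5" approach. It treats a single attribute as a plain string delimited by value marks (character 253) and subvalue marks (character 252). The helper names are hypothetical, and this is a model of the string layout only, not of UniVerse itself.

```python
VM, SVM = chr(253), chr(252)  # value mark and subvalue mark

def value_with_subvalue(position, text):
    # Equivalent in spirit to STR(SVM, position - 1):text, i.e. enough
    # subvalue marks to push the text into the requested subvalue slot.
    return SVM * (position - 1) + text

def append_value(attribute, new_value):
    # Model of ATTRIBUTE<1,-1> = new_value: add after the last value,
    # without a leading mark when the attribute is still empty.
    if attribute == "":
        return new_value
    return attribute + VM + new_value

attribute = "1,1,1"
attribute = append_value(attribute, value_with_subvalue(5, "1,2,5"))

values = attribute.split(VM)
subvalues = values[1].split(SVM)
print(len(values), subvalues)
```

Seen this way, the value has to exist before a subvalue position inside it means anything, which is why each alternative above either builds the subvalued string first or works out the value position first.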
The '-1' use in inserting data into a dynamic array is a special notation, with its own rules. Using -1 essentially means "insert after last attribute, value or sub-value" (depending on where you have the -1 in your expression).
In your first example:
DYNAMIC.ARRAY = ' '
DYNAMIC.ARRAY<1,-1,5> = '1,1,5'
You are saying put the string '1,1,5' in the first attribute, AFTER THE LAST MULTIVALUE, as the 5th sub-value.
I would expect this to place the string '1,1,5' in position <1,2,5> because the -1 in the 'value' position says "put after last value" and because your initial array value was a single string of spaces, you already have something in array location <1,1,1> so the -1 causes a new value position to be added, and the 5 defines the subvalue position. So, result is a value placed into <1,2,5>
In your second example:
DYNAMIC.ARRAY = ' '
DYNAMIC.ARRAY<-1> = '1,1,1'
DYNAMIC.ARRAY<1,-1,5> = '1,2,5'
You start with the first line setting the array to a single attribute containing a string of spaces. The next line (with the <-1>) is saying "add a new attribute with the value '1,1,1'", which means you now have an array with 2 attributes. The third line (with the <1,-1,5>) means insert the string '1,2,5' in the first attribute, AFTER THE LAST VALUE, as the 5th sub-value, so I would again expect the string '1,2,5' to end up in <1,2,5>.
My comments are based on what I'd expect to see using R83 Pick. You do not say what version or 'account flavour' of UniVerse you are using, so perhaps that is part of the issue here.
It may be that the initial array of whitespace is being seen as an 'empty/null attribute' by UniVerse. I assume if you change the whitespace value in your first example to say 'ABC' then it all works as expected?
I'm having some trouble with displaying numbers in apex, but only when i fill them in through code. When numbers are fetched through an automated row fetch, they're fine!
Leading Zero
For example, i have a report where a user can click a link, which runs a javascript function. There i get detailed values for that record through an application process. The returned values are in JSON. Several fields are number fields.
My response looks as follows (for example):
{"AVAILABLE_STOCK": "15818", "WEIGHT": ".001", "VOLUME": ".00009", "BASIC_PRICE": ".06", "COST_PRICE": ".01"}
Already the numbers here are 'not correct': values less than one do not have a zero before the decimal point.
I kind of hoped that the format mask on the items would catch this. If i specify FM999G990D000 for the item weight, i'd expect it to show '0.001' .
But okay, i suppose it only works that way when it comes through session state, and not when you set an item value through $("#").val() ?
Where do i go wrong? Is my only option to change my select in the app process?
Now:
SELECT '"AVAILABLE_STOCK": "' || AVAILABLE_STOCK ||'", '||
'"WEIGHT": "' || WEIGHT ||'", '||
'"VOLUME": "' || VOLUME ||'", '||
'"BASIC_PRICE": "' || BASIC_PRICE ||'", '||
Do i need to provide my numberfields a to_char with the format mask here (to_char(available_stock, 'FM999G990D000')) ?
Right now i need to put my numbers between quotes of course, or i get invalid json when i parse it.
Trailing Zero
I have an application process on a page on the after header point, right after an automated row fetch. Several fields are calculated here (totals). The variables used are all specified as number(10, 2). All values are correct and rounded to 2 values after the comma. My format masks on the items are also specified as FM999G999G990D00.
However, when one of the calculated values has only one meaningful digit after the comma, the trailing zeros get dropped. Instead of '987.50', it is displayed as '987.5'.
So, i have a number variable, and assign it like this: :P12_NDB_TOTAL_INCL := v_totI;
Would i need to convert my numbers here too, with format mask?
What am i doing wrong, or what am i missing?
If you aren't doing math on it and are more concerned with formatting, I suggest treating it as a varchar/string instead of as a number wherever you can.
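As a small illustration of the "format it as a string" idea, here is a Python sketch. It is not APEX or PL/SQL code; in the application process the equivalent step would presumably be the to_char call with a format mask that the question already mentions. The helper name fmt and the TOTAL key are made up for the example.

```python
import json
from decimal import Decimal

def fmt(value, places):
    # Render the number as text with a fixed number of decimals, so the
    # leading zero and any trailing zeros are part of the string itself.
    return f"{Decimal(value):.{places}f}"

row = {
    "WEIGHT": fmt(".001", 3),
    "BASIC_PRICE": fmt(".06", 2),
    "TOTAL": fmt("987.5", 2),  # hypothetical calculated total
}
payload = json.dumps(row)
print(payload)
```

Once the value is a string, whatever displays it (a JSON consumer or an item set through val()) no longer has to apply a number format, so nothing can drop the zeros along the way.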