Regex validation on Account Address field - regex

We sync our Salesforce accounts and opportunities to QuickBooks (QB), but QB has character limits on its fields. Street lines are limited to 41 characters per line, and I'm trying to enforce this with a regex validation rule, but it isn't working on the Address field type. I am using this very simple condition formula:
REGEX(BillingStreet, '.{42,}')
which matches any run of 42 or more non-line-break characters and should trigger the validation. The problem is that the rule is ignored. I know the formula works because if I apply it to another text field, it behaves as expected. Here's an example of how it should work: https://www.regexpal.com/99217. If there's a match anywhere, it should throw the validation error.
Any ideas?
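One possible explanation, offered as an assumption rather than something confirmed in this thread: as far as I know, Salesforce's REGEX function tests the entire field value against the pattern instead of searching for a match anywhere in it (which is what the regexpal demo does), and `.` does not match a line feed. The Python sketch below only illustrates that difference with `re.search` and `re.fullmatch`; the sample addresses are made up.

```python
import re

PATTERN = r".{42,}"  # 42 or more characters, none of which is a line feed


def search_anywhere(text):
    """True if the pattern matches some part of the text (regexpal-style)."""
    return re.search(PATTERN, text) is not None


def match_whole(text):
    """True only if the pattern matches the text from start to end."""
    return re.fullmatch(PATTERN, text) is not None


# Hypothetical street values: a long first line with and without a second line.
one_line = "A" * 45
two_lines = "A" * 45 + "\r\n" + "Suite 100"

print(search_anywhere(two_lines))  # the long first line is found
print(match_whole(two_lines))      # the line break stops a whole-string match
print(match_whole(one_line))
```

If the whole-string assumption holds, a multi-line address can never match `.{42,}` because the pattern would also have to consume the line break.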

I ended up not using Regex because it doesn't seem to work well. Instead I used formula functions to enforce the limit. Since we have a limit of two lines, it wasn't too bad to do this the long way.
AND(
    IF(
        /* Look for a line break. If there is one, split and compare lengths separately. */
        FIND(MID($Setup.Global__c.CLRF__c, 3, 1), BillingStreet) > 0,
        IF(
            /* If the first line is over the limit, return true to trigger validation. */
            LEN(LEFT(BillingStreet, FIND(MID($Setup.Global__c.CLRF__c, 3, 1), BillingStreet) - 2)) > 41,
            TRUE,
            /* If the first line is fine, check the second line. The comparison itself returns true/false. */
            LEN(MID(BillingStreet, FIND(MID($Setup.Global__c.CLRF__c, 3, 1), BillingStreet) + 1, LEN(BillingStreet))) > 41
        ),
        /* If there is no line break (one line), check the total length. */
        LEN(BillingStreet) > 41
    ),
    /* A separate validation rule rejects more than 2 lines. Without this check, lines 2 and above would be combined and measured together, which confuses users when the result is > 41. */
    NOT(REGEX(BillingStreet, '(.*\r?\n.*){2,}')),
    /* Skip this rule if the user has the bypass flag active. Useful for bulk updates, so imports don't fail on this rule. */
    $User.Bypass_Validation__c = False
)
The expression MID( $Setup.Global__c.CLRF__c ,3,1) represents a line break in Salesforce. I found out how it works from the question "find line break in formula field". I would have liked to use Regex, but in my experience it just doesn't work here, except for checking for 2+ lines as in the code above.
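For readers who want to check the same limits outside Salesforce (for example before a bulk import), the split-and-measure idea can be sketched in Python. This is an illustration, not a translation of the formula: the function name and messages are made up, and it reports the line-count problem and the length problem together instead of in separate rules.

```python
MAX_LINE_LENGTH = 41  # QuickBooks limit per street line, from the question
MAX_LINES = 2         # limit on the number of street lines, from the answer


def street_violations(street):
    """Return a list of human-readable problems found in a street value."""
    lines = street.splitlines()  # handles both \n and \r\n line breaks
    problems = []
    if len(lines) > MAX_LINES:
        problems.append("too many lines")
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append(f"line {number} is {len(line)} characters")
    return problems


print(street_violations("123 Main St\r\nSuite 4"))
print(street_violations("A" * 42))
```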

Related

compare two dictionary, one with list of float value per key, the other one a value per key (python)

I have a query sequence that I BLASTed online using NCBIWWW.qblast. In my XML BLAST result file I obtained, for a query sequence, a list of hits (i.e. gi|). Each hit (gi|) has multiple HSPs. I made a dictionary my_dict1 where I used each gi| as the key and appended its bit scores as the value, so there are multiple values for each key.
my_dict1 = {
    "gi|1002819492|": [437.702, 384.47, 380.86, 380.86, 362.83],
    "gi|675820360|": [2617.97, 2614.37, 122.112],
    "gi|953764029|": [414.258, 318.66, 122.112, 86.158],
    "gi|675820410|": [450.653, 388.08, 386.27],
}
Then I looked for the max value for each key using:
for key, value in my_dict1.items():
    max_value = max(value)
And made a second dictionary my_dict2:
my_dict2 = {
    "gi|1002819492|": 437.702,
    "gi|675820360|": 2617.97,
    "gi|953764029|": 414.258,
    "gi|675820410|": 450.653,
}
I want to compare both dictionaries so that I can extract the HSP with the highest bit score. I am also including other parameters like query coverage and identity percentage (not shown here). The end goal is to get the best gi| with the highest bit score, coverage and identity percentage.
I tried many things to compare both dictionaries, like this:
First code :
matches[]
if my_dict1.keys() not in my_dict2.keys():
    matches[hit_id] = bit_score
else:
    matches = matches[hit_id], bit_score
Second code:
if hit_id not in matches.keys():
    matches[hit_id] = bit_score
else:
    matches = matches[hit_id], bit_score
Third code:
intersection = set(set(my_dict1.items()) & set(my_dict2.items()))
However, I always end up with 2 types of errors:
1 ) TypeError: list indices must be integers, not unicode
2 ) ... float not iterable...
Please I need some help and guidance. Thank you very much in advance for your time. Best regards.
It's not clear what you're trying to do. What is hit_id? What is bit_score? It looks like your second dict is always going to have the same keys as your first if you're creating it by pulling the max value for each key of the first dict.
You say you're trying to compare them, but don't really state what you're actually trying to do. Find those with values under a certain max? Find those with the highest max?
Your first code doesn't work because I'm assuming you're trying to use a dict key value as an index to matches, which you define as a list. That's probably where your first error is coming from, though you haven't given the lines where the error is actually occurring.
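Both error messages can be reproduced in isolation. The sketch below is a guess at the causes, since the question does not show which lines raised them: indexing a list with a string key, and calling max() on a single float. The hit_id and bit_score values are taken from the question's data purely for illustration.

```python
hit_id = "gi|675820360|"
bit_score = 2617.97

# A list can only be indexed by integers, so a string key raises TypeError.
matches_list = []
try:
    matches_list[hit_id] = bit_score
except TypeError as error:
    list_error = str(error)

# A dict accepts the string key without complaint.
matches_dict = {}
matches_dict[hit_id] = bit_score

# max() needs an iterable; a bare float raises "... is not iterable".
try:
    max(bit_score)
except TypeError as error:
    float_error = str(error)

print(list_error)
print(float_error)
```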
See in-code comments below:
# First off, this needs to be a dict.
matches = {}
# This will never happen if you've created these dicts as you stated.
if my_dict1.keys() not in my_dict2.keys():
    matches[hit_id] = bit_score  # Not clear what bit_score is?
else:
    # Also not sure what you're trying to do here. This will assign a tuple
    # to matches with whatever the value of matches[hit_id] is and bit_score.
    matches = matches[hit_id], bit_score
Regardless, we really need more information and the full code to figure out your actual goal and what's going wrong.
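If the goal is simply to know the best bit score per hit, which HSP it came from, and which hit scores highest overall, no comparison between the two dictionaries is needed. The following is a minimal sketch under that assumption; the names best_per_hit, best_hsp_index and best_hit are made up for the example.

```python
my_dict1 = {
    "gi|1002819492|": [437.702, 384.47, 380.86, 380.86, 362.83],
    "gi|675820360|": [2617.97, 2614.37, 122.112],
    "gi|953764029|": [414.258, 318.66, 122.112, 86.158],
    "gi|675820410|": [450.653, 388.08, 386.27],
}

# Highest bit score for each hit (the same content as my_dict2 in the question).
best_per_hit = {hit_id: max(scores) for hit_id, scores in my_dict1.items()}

# Position of that score in each list, i.e. which HSP produced it.
best_hsp_index = {
    hit_id: scores.index(max(scores)) for hit_id, scores in my_dict1.items()
}

# The hit whose best HSP has the highest bit score overall.
best_hit = max(best_per_hit, key=best_per_hit.get)

print(best_hit, best_per_hit[best_hit])
```

Coverage and identity could be handled the same way by storing a tuple per HSP instead of a bare score, but that part is not shown in the question.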

Rows not being caught by regular expression

I write my PL/SQL code in TOAD v10.6, and it is then run on Oracle servers, which I believe are 11g.
Because I am working with client address information, I can't actually post any results.
The goal of my program is to parse address data into its correct fields. Thankfully, it's not the whole address. The pieces of information it does contain are building number, street name, street type, direction, and sub-unit. The information is not always presented in the same way, and I have worked around that through the order in which I parse the pieces out.
The way I go about parsing the address field:
1. I load the address data into a new table
2. I delete duplicate rows
3. I mark key adrs patterns as errors (such as not enough fields, since an address needs at least 3 to be valid)
4. I extract the sub-unit, which can appear anywhere in the adrs
5. I extract the direction, which can appear anywhere in the adrs
6. I extract the building number and make sure it's only numbers
7. I check to see if an apartment was hyphenated onto the building number
8. I check to make sure there is still enough information for a valid address, as I still need a street type and name
9. I extract the street type
10. Whatever remains is considered the street name
I have 27,000 rows which are being correctly parsed, about 3,000 which contain errors and are excluded, and 2,200 which are not handled correctly but do not trigger any errors. This is the second-to-last step:
UPDATE TEMP_PARSE_EXIST
SET V_STREET_TYPE = REGEXP_SUBSTR(ADRS, '\w+.$')
WHERE ADT_ACT IS NULL;
UPDATE TEMP_PARSE_EXIST
SET ADT_ACT = 'EMPTY STREET TYPE'
WHERE V_STREET_TYPE IS NULL AND ADT_ACT IS NULL;
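To see what the pattern '\w+.$' picks out, it can be tried on a few of the sample addresses. The sketch below uses Python's re module as a stand-in, on the assumption that it treats this particular pattern the same way Oracle's REGEXP_SUBSTR does; the helper name street_type is made up.

```python
import re

# One or more word characters, then any single character, then end of string.
STREET_TYPE = re.compile(r"\w+.$")


def street_type(adrs):
    """Return the trailing token matched by the pattern, or None."""
    match = STREET_TYPE.search(adrs)
    return match.group(0) if match else None


# Addresses as they might look after sub-unit, direction and number are removed.
for adrs in ["Street1 Dr", "1st Ave", "Street1 Road", "1st Avenue"]:
    print(adrs, "->", street_type(adrs))
```

On these inputs the pattern returns the last word for abbreviated and spelled-out types alike, which suggests the extraction itself is not what separates the two groups of rows.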
I had an almost identical issue before during the parsing of the sub-units. I never figured out what caused it or why moving the regular expression from the where clause to a different part corrected it.
UPDATE TEMP_PARSE_EXIST
SET ADT_ACT = 'PARSE ERROR: TOO MANY S_COM_RES_TYPE '
WHERE ADT_ACT IS NULL AND V_SECOND_LINE IS NULL
AND REGEXP_COUNT(ADRS, '\s' || S_COM_RES_TYPE || '.+\s.+' || S_COM_RES_TYPE , 1, 'i') > 1;
--this looks for a space before and after the sub-unit, then anything between another example
--the space before and after are to prevent STE and FL from being matched with valid street names
--the second one is less strict about that
--if there starts to be an issue then a space before can be added
--however, adding a space after would have it miss cases where there is no space after for the unit number
--the block of code below is suspected of being where the error is happening
--the error in question is where suite is not being noticed and extracted from the adrs line
--however there are many more similar examples being correctly handled
UPDATE TEMP_PARSE_EXIST
SET V_SECOND_LINE = REGEXP_SUBSTR(ADRS, S_COM_RES_TYPE || '(\s?\w+|$)', 1, 1, 'i')
--'(\s\w+|$)' was the original expression, but the ? was added in to account for there not being a space
--so the pattern grabs the sub-unit, and allows for a possible space between it and the number, or allows the end of string as there are some cases of that
WHERE ADT_ACT IS NULL AND V_SECOND_LINE IS NULL AND REGEXP_COUNT(ADRS, S_COM_RES_TYPE, 1, 'i') = 1;
--this removes v_second_line from the adrs
UPDATE TEMP_PARSE_EXIST
SET ADRS = TRIMMER(REPLACE(ADRS, V_SECOND_LINE))
WHERE V_SECOND_LINE IS NOT NULL;
The following code doesn't have the same error as above:
UPDATE TEMP_PARSE_EXIST
SET ADT_ACT = 'PARSE ERROR: TOO MANY S_COM_RES_TYPE '
WHERE REGEXP_like(adrs, '\s' || S_COM_RES_TYPE || '\s(|.+)' || S_COM_RES_TYPE , 'i');
--this looks for a space before and after the sub-unit, then anything between another example
--the space before and after are to prevent STE and FL from being matched with valid street names
--which is a common issue if I am not so strict about it
UPDATE TEMP_PARSE_EXIST
SET V_SECOND_LINE = trimmer(REGEXP_substr(adrs, '\s' || S_COM_RES_TYPE || '\s\w+',1,1 ,'i'))
WHERE ADT_ACT IS NULL AND V_SECOND_LINE IS NULL;
--this removes v_second_line from the adrs, this is done for both parts
UPDATE TEMP_PARSE_EXIST
SET ADRS = TRIMMER(REPLACE(ADRS, V_SECOND_LINE))
WHERE V_SECOND_LINE IS NOT NULL;
I haven't been able to figure out why this is happening.
I am on an irregular project in my area, and the people I work with do not need to use regular expressions and have been unable to help me.
So the question is, why are there valid addresses making it past the regular expression?
Update:
Here are examples of adrs which are correctly handled and all pieces are successfully parsed
Full example adrs       | Dirn | Sub-unit  | number | type | name
100 Street1 Dr E        | E    |           | 100    | Dr   | Street1
1000 1st Ave Suite 501  |      | Suite 501 | 1000   | Ave  | 1st
1000 100th St           |      |           | 1000   | St   | 100th
1000 1st Ave N Unit 7   | N    | Unit 7    | 1000   | Ave  | 1st
Here are examples which are getting past the expression
Full example adrs Dirn Sub-unit number type name
1000 1st Avenue E E 1000 1st Avenue
1000 Street1 Road 1000 Street1 Road
1000 Street2 Street 1000 Street2 Street
1000 Street3 Drive 1000 Street3 Drive
100 1st Avenue S Unit 100 S Unit 100 100 1st Avenue
All the example address listed above were real (I changed the building numbers and names) and come from the same data set. There are no extra characters missing such as whitespace or special characters.
Jorge Campos is kind of correct that this was an XY problem.
The problem ended up being a piece of code that I had not included, because it was so simple I didn't think it could be the cause. I have a CASE statement correcting the abbreviations of the street types to full names, with no ELSE clause. So when a correct full name was already there, it got nulled out, because there were only correction branches.
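The effect is easy to reproduce: in standard SQL, a CASE expression with no ELSE returns NULL when no WHEN branch matches. The sketch below shows this with Python's built-in sqlite3 module rather than Oracle, and the table and column names are hypothetical, not the ones from the original code.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE parsed (street_type TEXT)")
connection.executemany(
    "INSERT INTO parsed VALUES (?)",
    [("Dr",), ("Ave",), ("Avenue",), ("Road",)],
)

# Only correction branches: values that are already correct fall through to NULL.
WITHOUT_ELSE = """
    SELECT street_type,
           CASE street_type WHEN 'Dr' THEN 'Drive' WHEN 'Ave' THEN 'Avenue' END
    FROM parsed ORDER BY rowid
"""

# Adding ELSE keeps values that need no correction.
WITH_ELSE = """
    SELECT street_type,
           CASE street_type WHEN 'Dr' THEN 'Drive' WHEN 'Ave' THEN 'Avenue'
                ELSE street_type END
    FROM parsed ORDER BY rowid
"""

without_else = connection.execute(WITHOUT_ELSE).fetchall()
with_else = connection.execute(WITH_ELSE).fetchall()
print(without_else)
print(with_else)
```

This matches the symptom in the question: rows with abbreviations such as Dr and Ave parse correctly, while rows that already say Avenue, Road or Drive lose their street type.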

IndexError: list index out of range for list of lists in for loop

I've looked at the other questions posted on the site about index errors, but I'm still not understanding how to fix my own code. I'm a beginner when it comes to Python. Based on the user's input, I want to check whether that input appears in the fourth position of each line in the list of lists.
Here's the code:
#create a list of lists from the missionPlan.txt
from __future__ import with_statement
listoflists = []
with open("missionPlan.txt", "r") as f:
    results = [elem for elem in f.read().split('\n') if elem]
    for result in results:
        listoflists.append(result.split())
#print(listoflists)
#print(listoflists[2][3])
choice = int(input('Which command would you like to alter: '))
i = 0
for rows in listoflists:
    while i < len(listoflists):
        if listoflists[i][3]==choice:
            print (listoflists[i][0])
        i += 1
This is the error I keep getting:
not getting inside the if statement
So, I think this is what you're trying to do - find any line in your "missionPlan.txt" where the 4th word (after splitting on whitespace) matches the number that was input, and print the first word of such lines.
If that is indeed accurate, then perhaps something along this line would be a better approach.
choice = int(input('Which command would you like to alter: '))
allrecords = []
with open("missionPlan.txt", "r") as f:
    for line in f:
        words = line.split()
        allrecords.append(words)
        try:
            # Compare as integers, and only when a fourth word exists.
            if len(words) > 3 and int(words[3]) == choice:
                print(words[0])
        except ValueError:
            # The fourth word is not a number; skip this line.
            pass
Also, if, as your tags suggest, you are using Python 3.x, I'm fairly certain the from __future__ import with_statement isn't particularly necessary...
EDIT: added a couple of lines based on comments below. In addition to examining every line as it's read, and printing the first field from every line that has a fourth field matching the input, the code now gathers each line into the allrecords list, split into separate words as a list, corresponding to the original question's listoflists. This will enable further processing on the file later on in the code. Also fixed one glaring mistake: the code needs to split line into words, not f...
Also, to answer your "I cant seem to get inside that if statement" observation - that's because you're comparing a string (listoflists[i][3]) with an integer (choice). The code above addresses both that comparison mismatch and the check for there actually being enough words in a line to do the comparison meaningfully...
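The type mismatch can be checked without the file. Below is a small self-contained sketch in Python 3; the sample lines stand in for missionPlan.txt, whose real contents are not shown in the question, and the function name is made up.

```python
def first_words_matching(rows, choice):
    """Return the first word of each row whose fourth word equals choice."""
    found = []
    for words in rows:
        if len(words) > 3:  # guards against IndexError on short lines
            try:
                if int(words[3]) == choice:
                    found.append(words[0])
            except ValueError:
                continue  # fourth word is not a number
    return found


# Hypothetical file contents, already split into words.
sample_lines = ["takeoff 0 0 10", "waypoint 5 5 20", "land 0 0",
                "hover 1 1 abc", "return 2 2 10"]
rows = [line.split() for line in sample_lines]

print("10" == 10)                      # a string never equals an integer
print(first_words_matching(rows, 10))
```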

Coldfusion - Checking for all lowercase or uppercase

I have been given the daunting task of sifting through a database of over 30,000 registrants and correcting the letter casing of names and addresses where needed. I am trying to write a program that will search for names and addresses in our database that are either all lowercase or all uppercase and output these mishaps in a webpage for me to review and correct more efficiently. I was informed that I could utilize Regular Expressions to find fields that adhere to my criteria, only I am new to programming and I am unfamiliar with the syntax of RegEx.
If anyone could provide me with some pointers as how to use RegEx to query for these inconsistencies, it would be greatly appreciated.
Thank you.
strComp should work
SELECT col
FROM table
WHERE strComp(col, lcase(col), 0) = 0 --all lower case
OR strComp(col, ucase(col), 0) = 0 --all upper case
The first two arguments are the columns to compare. The 3rd argument says to do a binary comparison. If the two strings are equal 0 is returned.
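The same test (a value is all lowercase if it is byte-for-byte equal to its lowercased form, and likewise for uppercase) can be sketched in Python for records that have already been pulled out of the database. The function name is made up, and unlike the query above this version skips values that contain no letters at all, which would otherwise be reported as both.

```python
def casing_problem(value):
    """Classify a value as 'all lowercase', 'all uppercase', or None."""
    if not any(character.isalpha() for character in value):
        return None  # e.g. "12345" has no casing to fix
    if value == value.lower():
        return "all lowercase"
    if value == value.upper():
        return "all uppercase"
    return None


for name in ["JOHN SMITH", "john smith", "John Smith", "12345"]:
    print(repr(name), "->", casing_problem(name))
```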
How will you accurately correct the data? If you see a last name of "MACGUYVER" should it change to Macguyver or MacGuyver? If you see a last name of "DE LA HOYA" will it become de la Hoya, De La Hoya, or something else? This task seems a bit dangerous.
If your plan is basically to just do initial capitalization then I suggest that you run an update first before doing any manual review.
You could run something like this to change your name fields to initial capital letters:
update yourTable
set lname = StrConv(lname,3)
where StrComp(lname, StrConv(lname,3), 0) <> 0
and StrComp(mid(lname,2,len(lname)), lcase(mid(lname,2,len(lname))), 0) = 0;
Where "lname" above is your last name column, for example.
The above would have to be run for each name field.
Note that this will not update names that legitimately have multiple capital letters, like MacGuyver or O'Connor, which need manual review.
Also note that it will update last names that start with van, von, de la, and others that may intentionally be lowercase.
You could then query for just the names that need manual review, which I assume will be a much smaller subset:
select *
from yourTable
where StrComp(lname, StrConv(lname,3), 0) <> 0;
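The limits of automatic proper-casing are easy to see on a handful of names. The sketch below uses Python's str.title() as a rough stand-in for StrConv(lname, 3); the two functions may differ in details (for example around apostrophes), so treat the output as an illustration only.

```python
def needs_review(name):
    """True if the name differs from its automatically proper-cased form."""
    return name != name.title()


names = ["MACGUYVER", "DE LA HOYA", "MacGuyver", "van Gogh", "Smith"]
proper = [name.title() for name in names]
flagged = [name for name in names if needs_review(name)]

print(proper)
print(flagged)
```

Names such as MacGuyver and van Gogh are flagged even though they may already be correct, which is why the manual review step remains necessary.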
Addresses are tougher. To find just those that are either all lowercase or all uppercase you can do this:
select *
from yourTable
where strComp(address1, lcase(address1), 0) = 0;
select *
from yourTable
where strComp(address1, ucase(address1), 0) = 0;
Obviously this won't catch address lines like "123 New YORK AveNUE".
Consider asking for permission to just set all address values to uppercase.
You'll save yourself a lot of trouble.

Format mask for number field items: trailing and 'leading' zero

I'm having some trouble displaying numbers in APEX, but only when I fill them in through code. When numbers are fetched through an automated row fetch, they're fine!
Leading Zero
For example, I have a report where a user can click a link, which runs a JavaScript function. There I get detailed values for that record through an application process. The returned values are in JSON. Several fields are number fields.
My response looks as follows (for example):
{"AVAILABLE_STOCK": "15818", "WEIGHT": ".001", "VOLUME": ".00009", "BASIC_PRICE": ".06", "COST_PRICE": ".01"}
The numbers here are already 'not correct': values less than one do not have a zero before the decimal point.
I had hoped that the format mask on the items would catch this. If I specify FM999G990D000 for the item weight, I'd expect it to show '0.001'.
But okay, I suppose it only works that way when the value comes through session state, and not when you set an item value through $("#").val()?
Where do I go wrong? Is my only option to change my select in the app process?
Now:
SELECT '"AVAILABLE_STOCK": "' || AVAILABLE_STOCK ||'", '||
'"WEIGHT": "' || WEIGHT ||'", '||
'"VOLUME": "' || VOLUME ||'", '||
'"BASIC_PRICE": "' || BASIC_PRICE ||'", '||
Do I need to wrap my number fields in a to_char with the format mask here (to_char(available_stock, 'FM999G990D000'))?
Right now I need to put my numbers between quotes, of course, or I get invalid JSON when I parse it.
Trailing Zero
I have an application process on a page at the After Header point, right after an automated row fetch. Several fields are calculated here (totals). The variables used are all declared as number(10, 2). All values are correct and rounded to 2 digits after the decimal separator. The format masks on the items are also specified as FM999G999G990D00.
However, when one of the calculated values has only one significant digit after the decimal separator, the trailing zero gets dropped. Instead of '987.50', it is displayed as '987.5'.
So, I have a number variable, and assign it like this: :P12_NDB_TOTAL_INCL := v_totI;
Would I need to convert my numbers here too, with a format mask?
What am I doing wrong, or what am I missing?
If you aren't doing math on it and are more concerned with formatting, I suggest treating it as a varchar/string instead of as a number wherever you can.
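The underlying idea, that a number carries no formatting and the leading or trailing zeros only exist once it is turned into a string with an explicit mask, can be illustrated outside APEX. The sketch below uses Python's Decimal and format specifiers as a stand-in for to_char with a format mask; the function name is made up, and the group and decimal separators are fixed here, whereas Oracle's G and D depend on NLS settings.

```python
from decimal import Decimal


def format_number(raw, decimals):
    """Format a numeric string with grouping and a fixed number of decimals."""
    return format(Decimal(raw), f",.{decimals}f")


print(format_number(".001", 3))    # leading zero restored
print(format_number("987.5", 2))   # trailing zero kept
print(format_number("15818", 0))   # thousands grouping
```

Applied to the question, this corresponds to formatting the value as text at the point where it is produced (in the SELECT or in the assignment to the item) rather than relying on the item's format mask.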