regexp_similar '^.$' issues in Teradata

For data scrubbing I have a lot of hard-coded values in my program, and I am trying to move those values into a table. One of the scrubbing conditions checks for single-character values and is coded as (character_length(name) = 1).
But when I try to emulate this with the regular expression ^.$, it does not catch values like ¿, ¥, Ã.
Please let me know if I am doing something wrong.
When I run the code below, it returns these 3 values: ¿, ¥, Ã
select name from email_table
where character_length(name) = 1
and name not in
(select name from email_table
where regexp_similar(translate(name USING LATIN_TO_UNICODE WITH ERROR),'^.$', 'i') = 1)

It seems the issue depends on the Teradata version.
We have TD 14 and TD 15 on different servers, and I ran the following query on both:
select case when regexp_similar('¥','^.$', 'i')=1
then 'Y'
else 'N'
end as output;
On TD 14 the output is 'N'; on TD 15 it is 'Y'.

Related

Truncation when using CASE in SQL statement in SAS (Enterprise Guide)

I am trying to manipulate some text files in SAS Enterprise Guide and load them line by line in a character variable "text" which gets the length 1677 characters.
I can use the Tranwrd() function to create a new variable text21 on this variable and get the desired result as shown below.
But if I try to put some conditions on the execution of exactly the same Tranwrd() to form the variable text2 (as shown below) it goes wrong as the text in the variable is now truncated to around 200 characters, even though the text2 variable has the length 1800 characters:
PROC SQL;
CREATE TABLE WORK.Area_Z_Added AS
SELECT t1.Area,
t1.pedArea,
t1.Text,
/* text21 */
( tranwrd(t1.Text,'zOffset="0"',compress('zOffset="'||put(t2.Z,8.2)||'"'))) LENGTH=1800 AS text21,
/* text2 */
(case when t1.type='Area' then
tranwrd(t1.Text,'zOffset="0"',compress('zOffset="'||put(t2.Z,8.2)||'"'))
else
t1.Text
end) LENGTH=1800 AS text2,
t1.Type,
t1.id,
t1.x,
t1.y,
t2.Z
FROM WORK.VISSIM_IND t1
LEFT JOIN WORK.AREA_Z t2 ON (t1.Type = t2.Type) AND (t1.Area = t2.Area)
ORDER BY t1.id;
QUIT;
Anybody got a clue?
This is a known problem with using character functions inside a CASE statement. See this thread on SAS Communities https://communities.sas.com/t5/SAS-Programming/Truncation-when-using-CASE-in-SQL-statement/m-p/852137#M336855
Instead, compute the result once in the other variable and reuse it inside the CASE by referencing it with the CALCULATED keyword.
PROC SQL;
CREATE TABLE WORK.Area_Z_Added AS
SELECT
t1.Area
,t1.pedArea
,t1.Text
/* text21: do the replacement once, outside any CASE */
,(tranwrd(t1.Text,'zOffset="0"',cats('zOffset="',put(t2.Z,8.2),'"')))
AS text21 length=1800
/* text2: reuse the computed column instead of calling tranwrd inside CASE */
,(case when t1.type='Area'
then calculated text21
else t1.Text
end) AS text2 LENGTH=1800
,t1.Type
,t1.id
,t1.x
,t1.y
,t2.Z
FROM WORK.VISSIM_IND t1
LEFT JOIN WORK.AREA_Z t2
ON (t1.Type = t2.Type)
AND (t1.Area = t2.Area)
ORDER BY t1.id
;
QUIT;
If you don't need the extra TEXT21 variable then use the DROP= dataset option to remove it.
CREATE TABLE WORK.Area_Z_Added(drop=text21) AS ....

Giving a string variable values conditional on another variable

I am using Stata 14. I have US states and their corresponding regions coded as integers.
I want to create a string variable that represents the region for each observation.
Currently my code is
gen div_name = "A"
replace div_name = "New England" if div_no == 1
replace div_name = "Middle Atlantic" if div_no == 2
.
.
replace div_name = "Pacific" if div_no == 9
...so the code is really long.
I was wondering if there is a shorter way to do this, where I can automate assigning the values rather than hard-coding them manually.
You can define value labels in one line with label define and then use decode to create the string variable. See the help for those commands.
If the correspondence was defined in a separate dataset you could use merge. See e.g. this FAQ
There can't be a short-cut here other than typing all the names at some point or exploiting the fact that someone else typed them earlier into a file.
With nine or so labels, typing them yourself is quickest.
Note that you type one statement more than you need, even doing it the long way, as you could start
gen div_name = "New England" if div_no == 1

ColdFusion - Checking for all lowercase or uppercase

I have been given the daunting task of sifting through a database of over 30,000 registrants and correcting the letter casing of names and addresses where needed. I am trying to write a program that will search for names and addresses in our database that are either all lowercase or all uppercase and output these mishaps in a webpage for me to review and correct more efficiently. I was informed that I could utilize Regular Expressions to find fields that adhere to my criteria, only I am new to programming and I am unfamiliar with the syntax of RegEx.
If anyone could provide me with some pointers on how to use RegEx to query for these inconsistencies, it would be greatly appreciated.
Thank you.
strComp should work
SELECT col
FROM table
WHERE strComp(col, lcase(col), 0) = 0 --all lower case
OR strComp(col, ucase(col), 0) = 0 --all upper case
The first two arguments are the strings to compare. The third argument, 0, requests a binary (case-sensitive) comparison. StrComp returns 0 when the two strings are equal.
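The same check can be sketched outside the database. The following is a minimal Python illustration, not ColdFusion or SQL; the function name casing_problem and the sample names are hypothetical. It mirrors the two StrComp tests: a value is flagged when it equals its own lowercase or uppercase form under a case-sensitive comparison. One deliberate difference, labeled in the code, is that values containing no letters are skipped, whereas the SQL above would return them.

```python
from typing import Optional


def casing_problem(value: str) -> Optional[str]:
    """Classify a value as all lowercase, all uppercase, or neither."""
    # Assumption: values with no letters (e.g. "123") equal both their
    # lowercase and uppercase forms, so they are skipped here. The SQL
    # query with StrComp would return such rows.
    if not any(ch.isalpha() for ch in value):
        return None
    if value == value.lower():   # same idea as StrComp(col, lcase(col), 0) = 0
        return "all lowercase"
    if value == value.upper():   # same idea as StrComp(col, ucase(col), 0) = 0
        return "all uppercase"
    return None


# Hypothetical sample data
names = ["SMITH", "jones", "MacGuyver", "123"]
flagged = {n: casing_problem(n) for n in names if casing_problem(n)}
print(flagged)
```

Mixed-case values such as "MacGuyver" are not flagged, which matches the behaviour of the query.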
How will you accurately correct the data? If you see a last name of "MACGUYVER" should it change to Macguyver or MacGuyver? If you see a last name of "DE LA HOYA" will it become de la Hoya, De La Hoya, or something else? This task seems a bit dangerous.
If your plan is basically to just do initial capitalization then I suggest that you run an update first before doing any manual review.
You could run something like this to change your name fields to initial capital letters:
update yourTable
set lname = StrConv(lname,3)
where StrComp(lname, StrConv(lname,3), 0) <> 0
and StrComp(mid(lname,2,len(lname)), lcase(mid(lname,2,len(lname))), 0) = 0;
Where "lname" above is your last name column, for example.
The above would have to be run for each name field.
Note that this will not update names that legitimately have multiple capital letters, like MacGuyver or O'Connor, which need manual review.
Also note that it will update last names that start with van, von, de la, and others that may intentionally be lowercase.
You could then query for just the names that need manual review, which I assume will be a much smaller subset:
select *
from yourTable
where StrComp(lname, StrConv(lname,3), 0) <> 0;
Addresses are tougher. To find just those that are either all lowercase or all uppercase you can do this:
select *
from yourTable
where strComp(address1, lcase(address1), 0) = 0;
select *
from yourTable
where strComp(address1, ucase(address1), 0) = 0;
Obviously this won't catch address lines like "123 New YORK AveNUE".
Consider asking for permission to just set all address values to uppercase.
You'll save yourself a lot of trouble.

Access 2010 Query add text to end of existing text if condition is met

I have a column of data, diagnosis codes to be exact. The problem is that when the data is imported, 111.0 turns into 111 (the same happens to any whole number). I am wondering if there is an update query I can run that will add ".0" to the end of any value that is 3 characters long. I had a similar problem with values being stripped from 008.45 to 8.45, but I figured that part out using:
UPDATE Master SET DIAGNOSIS01 = LEFT("00", 3-LEN(DIAGNOSIS01)) + DIAGNOSIS01
WHERE LEN(DIAGNOSIS01)<3 AND Len(DIAGNOSIS01)>0;
I got that from here on Stack Overflow. Is there a variation of this update query I can use to append to the right if the value is only 3 digits long?
Additional info... formats of the values in this column include xxx.x or xxx.xx with x being a number
When it comes to sql I am very new so please treat me like I'm 3... ;)
UPDATE Master
SET Master.DIAGNOSIS01 = IIf(Len([Master].[DIAGNOSIS01])=3,[Master].[DIAGNOSIS01] & ".0",[Master].[DIAGNOSIS01]);
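To make the rule in that IIf explicit, here is a minimal Python sketch of the same logic applied to a few sample values. It is only an illustration of the condition, not Access code; the function name add_decimal and the sample codes are hypothetical.

```python
def add_decimal(code: str) -> str:
    """Append ".0" to a code that is exactly 3 characters long."""
    # Mirrors IIf(Len(x) = 3, x & ".0", x): anything else is left untouched.
    return code + ".0" if len(code) == 3 else code


# Hypothetical sample values
codes = ["111", "008.45", "111.0", "250"]
fixed = [add_decimal(c) for c in codes]
print(fixed)
```

Values that already contain a decimal part are longer than 3 characters, so they pass through unchanged.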

Stata: Efficient way to replace numerical values with string values

I have code that currently looks like this:
replace fname = "JACK" if id==103
replace lname = "MARTIN" if id==103
replace fname = "MICHAEL" if id==104
replace lname = "JOHNSON" if id==104
And it goes on for multiple pages like this, replacing an ID name with a first and last name string. I was wondering if there is a more efficient way to do this en masse, perhaps by using the recode command?
I will echo the other answers that suggest a merge is the best way to do this.
But if you absolutely must code the lines item-wise (again, messy) you can generate a long list ("pages") of replace commands by using MS Excel to "help" you write the code. Here is a picture of your Excel sheet with one example, showing the MS Excel formula:
         A        B       C     D
row 1    last     first   id    code
row 2    MARTIN   JACK    103   ="replace fname=^"&B2&"^ if id=="&C2
You type that in, make sure it looks like Stata code when the formula calculates (aside from the carets), and copy the formula in column D down to the end of your list. Then copy the whole block of Stata code in column D generated by the formulas into your do-file, and do a find and replace (be careful here if you are using the caret elsewhere for mathematical uses!!) for all ^ to be replaced with ", which will end up generating proper Stata syntax.
(This is truly a brute force way of doing this, and is less dynamic in the case that there are subsequent changes to your generation list. All--apologies in advance for answering a question here advocating use of Excel :) )
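The same brute-force idea can be scripted instead of done in a spreadsheet. The sketch below is a hedged Python alternative, not part of the Excel approach: it assumes the names are available as (last, first, id) rows, and the function name stata_replace_lines is hypothetical. It writes the quotation marks directly, so no caret placeholder or find-and-replace step is needed.

```python
# Hypothetical input rows: (last name, first name, id)
rows = [
    ("MARTIN", "JACK", 103),
    ("JOHNSON", "MICHAEL", 104),
]


def stata_replace_lines(rows):
    """Build one pair of Stata replace commands per (last, first, id) row."""
    lines = []
    for last, first, id_ in rows:
        lines.append(f'replace fname = "{first}" if id=={id_}')
        lines.append(f'replace lname = "{last}" if id=={id_}')
    return lines


# Print the commands so they can be pasted into a do-file
for line in stata_replace_lines(rows):
    print(line)
```

As with the Excel version, the generated commands still have to be regenerated whenever the list of names changes.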
You don't explain where the strings you want to add come from, but what is generally the best technique is explained at
http://www.stata.com/support/faqs/data-management/group-characteristics-for-subsets/index.html
Create an associative array of ids vs Fname,Lname
103 => JACK,MARTIN
104 => MICHAEL,JOHNSON
...
Replace
id => hash{id} ( fname & lname )
The efficiency of the lookup will be taken care of by the programming language used.
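As one possible reading of that pseudocode, here is a minimal Python sketch using a dict as the associative array. The record layout (a list of dicts with id, fname and lname keys) and the function name fill_names are assumptions made for illustration only.

```python
# Associative array: id -> (first name, last name)
names = {
    103: ("JACK", "MARTIN"),
    104: ("MICHAEL", "JOHNSON"),
}

# Hypothetical records standing in for the rows of the dataset
records = [
    {"id": 103, "fname": "", "lname": ""},
    {"id": 104, "fname": "", "lname": ""},
    {"id": 999, "fname": "", "lname": ""},
]


def fill_names(records, names):
    """Fill fname and lname for every record whose id is in the lookup."""
    for rec in records:
        if rec["id"] in names:
            rec["fname"], rec["lname"] = names[rec["id"]]
    return records


fill_names(records, names)
print(records)
```

Records whose id is missing from the lookup are left unchanged, which corresponds to having no replace statement for that id.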