Name of day padded with spaces prevent proper behavior - if-statement

I'm trying to go in a if condition in PLSQL when certaint dates are on a specific day in the week.
DECLARE
v_gebDatum CONSTANT DATE := to_date('21-01-1995', 'dd-MM-yyyy');
v_mathDatum DATE := v_gebDatum;
v_huidigeDag VARCHAR2(20);
v_teller number := 0;
BEGIN
WHILE to_char(v_mathDatum, 'yyyy') < to_char(sysdate, 'yyyy')
LOOP
v_mathDatum := add_months(v_mathDatum, 12);
v_huidigeDag := to_char(v_mathDatum, 'DAY');
dbms_output.put_line(v_huidigeDag);
IF v_huidigeDag IN('ZATERDAG', 'ZONDAG')
THEN
dbms_output.put_line(v_mathDatum);
END IF;
END LOOP;
END;
Problem is that I can't get this to work. When printing the v_huidigeDag it clearly has the values 'ZATERDAG and 'ZONDAG' in some of the printed lines.
It appears that, for some reason when I printed the values for daynames the program added spaces in. I guess it takes the char length for the longest day name ('DONDERSDAG') in my Locale. And everything with less characters than that it adds spaces. So zaterdag becomes = 'ZATERDAG ' and zondag becomes = 'ZONDAG '. Now it's working but does somebody know an easy way to prevent this?

The problem is with the DAY format. According to the date format documentation:
DAY
Name of day, padded with blanks to display width of the widest name of day in the date language used for this element.
To disable that behavior, you have to use the FM modifier:
FM
In a datetime format element of a TO_CHAR function, this modifier suppresses blanks in subsequent character elements (such as MONTH) and suppresses leading zeroes for subsequent number elements (such as MI) in a date format model. Without FM, the result of a character element is always right padded with blanks to a fixed length, and leading zeroes are always returned for a number element. With FM, which suppresses blank padding, the length of the return value may vary.
In your particular case:
v_huidigeDag := to_char(v_mathDatum, 'fmDAY');

Related

SAS treatment of blanks interferes with regex rules

I have a dataset which I need to clean using regex rules. These rules come from a file regex_rules.csv with columns string_pattern and string_replace and are applied using a combination of prxparse and prxchange as follows:
array a_rules{1:&NOBS} $200. _temporary_;
array a_rules_parsed{1:&num_rules} _temporary_;
if _n_ = 1 then
do i = 1 to &num_rules;
a_rules{i} = cat("'s/",string_pattern,"/",string_replace,"/'");
a_rules_parsed{i} = prxparse(cats('s/',string_pattern,'/',string_replace,'/','i'));
end
set work.dirty_strings;
clean_string = dirty_string;
do i = 1 to &num_rules;
debug_string = cats("Executing prxchange(",a_rules{i},",",-1,",","'",clean_string,"'",")");
put debug_string;
clean_string = PRXCHANGE(a_rules_parsed{i},-1,clean_string);
end
Some rules specify replacing certain patterns with a single blank space, so the corresponding string_replace value in the file is a single blank space.
The issue I'm facing is that SAS never respects the single space, and instead replaces the matched string_pattern for these records with an empty string (the other rules are applied as expected).
To troubleshoot I executed the following:
proc sql;
create table work.single_blanks as
select
string_pattern,
string_replace,
from work.regex_rules
where string_replace = " ";
quit;
which yielded the expected records. I was confused to find that changing the where clause to
where string_replace = "" or
where string_replace = " " gave identical results! (I've been using sas for a while but I guess this behavior has gone unnoticed until now). Consequently, I could not determine whether SAS is neglecting to properly read in the file and retain the single blank, or whether one of the prx functions is failing to properly handle the single blank.
I can think of "hacky" work-arounds, but I'd rather understand what I'm doing wrong here and what the correct solution should be.
EDIT 1:
Here is a rule from the file and how I'd expect it to act on an example input value:
string_pattern, string_replace
"(#|,|/|')", " "
running the code above on the input string dirty_string = "10,120 DIRTY DRIVE"; does not produce the expected output of "10 120 DIRTY DRIVE" but rather "10120 DIRTY DRIVE".
EDIT 2
In addition to not respecting single spaces, leading and trailing spaces do not seem to be respected. For example, for a file with the rules
string_pattern, string_replace
"\\bDR(\\.|\\b)", "DRIVE "
"\\bS(\\.|\\b)?W(\\.|\\b)", " SOUTH WEST"
running the code above on the input string dirty_string = "10120 DIRTY DR.SW."; does not produce the expected output of "10120 DIRTY DRIVE SOUTH WEST" but rather "10120 DIRTY DRIVESW.". This is because the space at the end of the first string_replace value gets lost, meaning there is no word boundary at the beginning of the second string_pattern to be matched.
SAS stores character variables as fixed length strings that are padded with spaces. As a consequence string comparisons ignore trailing spaces. So x=' ' and x=' ' are the same test.
The CATS() will remove all of the leading and trailing spaces, so empty strings will generate nothing at all. It sounds like you want to treat an empty string as a single space. The TRIM() function will return a single space for an empty string. So perhaps you just want to change this:
cats('s/',string_pattern,'/',string_replace,'/','i')
into
cat('s/',trim(string_pattern),'/',trim(string_replace),'/','i')
Here is a working code (with a fixed string_pattern) of your example data:
data test;
length string_pattern string_replace dirty_string expect
clean_string regex $200
;
infile cards dsd truncover;
input string_pattern string_replace dirty_string expect;
regex= cat('s/',trim(string_pattern),'/',trim(string_replace),'/i') ;
regex_id = prxparse(trim(regex));
clean_string = prxchange(regex_id,-1,trim(dirty_string));
if clean_string=expect then put 'GOOD'; else put 'BAD';
*put (_character_) (=$quote./);
cards4;
"(#|,|\/|')", " ","10,120 DIRTY DRIVE","10 120 DIRTY DRIVE"
;;;;
If any of your values have significant trailing spaces then you will need to store the data differently. You could for example quote the values:
string_replace = "'DRIVE '";
...
cat('s/',dequote(string_pattern),'/',dequote(string_replace),'/','i')
If you only add quotes around values that need them then you will need to include the TRIM() function calls.
cat('s/',dequote(trim(string_pattern)),'/',dequote(trim(string_replace)),'/','i')
Or store the string lengths into separate numeric fields.
cat('s/',substrn(string_pattern,1,len1),'/',substrn(string_replace,1,len2),'/','i')
And note that if any of your original character strings had either significant leading or trailing spaces they would have been eliminated by reading the data from a CSV file.

SAS trim function/ Length statement

Why does this code not need two trim statements, one for first and one for last name? Does the length statement remove blanks?
data work.maillist; set cert.maillist;
length FullName $ 40;
fullname=trim(firstname)||' '||lastname;
run;
length is a declarative statement and introduces a variable to the Program Data Vector (PDV) with the specific length you specify. When an undeclared variable is used in a formula SAS will assign it a default length depending on the formula or usage context.
Character variables in SAS have a fixed length and are padded with spaces on the right. That is why the trim(firstname) is needed when || lastname concatenation occurs. If it wasn't, the right padding of firstname would be part of the value in the concatenation operations, and might likely exceed the length of the variable receiving the result.
There are concatenation functions that can simplify string operations
CAT same as using <var>|| operator
CATT same as using trim(<var>)||
CATS same as using trim(left(<var>))||
CATX same as using CATS with a delimiter.
STRIP same as trim(left(<var>))
Your expression could be re-coded as:
fullname = catx(' ', firstname, lastname);
Is there a reason you think it should? Can you see trailing spaces in the surname, have you tried a length() function?
I could be wrong here but sometimes when you apply a function (put especially) or import data you can inadvertently store leading or trailing spaces. Trailing spaces are a mystery because you don't realise they are there until you try to do something else with the data.
A length statement should allow you to store exactly the data you give it providing you use a number/character variable correctly with truncation only occurring if the length value is too short.
I've found the
compress() function to be the most convenient for dealing with white space and punctuation particularly if you are concatenating variables.
https://www.geeksforgeeks.org/sas-compress-function-with-examples/
All the best,
Phil
Because SAS will truncate the value when it is too long to fit into FULLNAME. And when it is too short it will fill in the rest of FULLNAME with spaces anyway so there is no need to remove them.
It would only be an issue if the length of FULLNAME is smaller than the sum of the lengths of FIRSTNAME and LASTNAME plus one. Otherwise the result cannot be too long to fit into FULLNAME, even if there are no trailing spaces in either FIRSTNAME or LASTNAME.
Try it yourself with non-blank values so it is easier to see what is happening.
1865 data test;
1866 length one $1 two $2 three $3 ;
1867 one = 'ABCD';
1868 two = 'ABCD';
1869 three='ABCD';
1870 put (_all_) (=);
1871 run;
one=A two=AB three=ABC
NOTE: The data set WORK.TEST has 1 observations and 3 variables.

Is there a function for removing dashes from a string of numbers, without removing leading zeros?

I would like to remove dashes from 3 to 9 digit numbers. A certain percentage of those numbers have leading zeros in them. I tried using the Compress function, but this stripped the zeros as well. What would be the best function to use?
I understand your "numbers" are actually codes with digits and dashes and you want to keep only the digits, so what you need is string processing.
The compress function in SAS has a second (optional) parameter. If you don't specify it, the function will remove all white space characters. If you do, it will remove the characters specified. So try
no_dash = compress(with_dash, '-');
Alternatively you could remove all non digit characters, using a third (also optional) parameter
no_dash = compress(with_dash, '0123456789', 'k');
The k specifies to keep instead of remove the characters specified. You can shorten this by adding the d to the third parameter, telling SAS to add all digits to the second:
no_dash = compress(with_dash, '', 'dk');
If you have stored the compressed result (with implicit conversion) in a numeric variable, that variable may need a format to get the result you want.
data _null_;
my_dashed_text = '000-90-123';
my_compressed_text = compress(my_dashed_text, '-');
attrib my_num_var
length = 8
format = z9.
;
my_num_var = compress(my_dashed_text, '-');
put (_all_) (=/);
run;
------ LOG -----
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
36:16
my_dashed_text=000-90-123
my_compressed_text=00090123
my_num_var=000090123
The Z numeric format tells SAS to add leading zeros that fill out to the specified width when displaying the number. The format is a fixed width, so a my_num_var from both "123-456" and "0-1-2-3-45-6" will display a Z9 formatted value of 000123456. SAS formatting can't make a number value look like 123456 or 0123456 when rendered through a single format specification (such a Z9)

Extract numbers from a field in PostgreSQL

I have a table with a column po_number of type varchar in Postgres 8.4. It stores alphanumeric values with some special characters. I want to ignore the characters [/alpha/?/$/encoding/.] and check if the column contains a number or not. If its a number then it needs to typecast as number or else pass null, as my output field po_number_new is a number field.
Below is the example:
SQL Fiddle.
I tired this statement:
select
(case when regexp_replace(po_number,'[^\w],.-+\?/','') then po_number::numeric
else null
end) as po_number_new from test
But I got an error for explicit cast:
Simply:
SELECT NULLIF(regexp_replace(po_number, '\D','','g'), '')::numeric AS result
FROM tbl;
\D being the class shorthand for "not a digit".
And you need the 4th parameter 'g' (for "globally") to replace all occurrences.
Details in the manual.
For a known, limited set of characters to replace, plain string manipulation functions like replace() or translate() are substantially cheaper. Regular expressions are just more versatile, and we want to eliminate everything but digits in this case. Related:
Regex remove all occurrences of multiple characters in a string
PostgreSQL SELECT only alpha characters on a row
Is there a regexp_replace equivalent for postgresql 7.4?
But why Postgres 8.4? Consider upgrading to a modern version.
Consider pitfalls for outdated versions:
Order varchar string as numeric
WARNING: nonstandard use of escape in a string literal
I think you want something like this:
select (case when regexp_replace(po_number, '[^\w],.-+\?/', '') ~ '^[0-9]+$'
then regexp_replace(po_number, '[^\w],.-+\?/', '')::numeric
end) as po_number_new
from test;
That is, you need to do the conversion on the string after replacement.
Note: This assumes that the "number" is just a string of digits.
The logic I would use to determine if the po_number field contains numeric digits is that its length should decrease when attempting to remove numeric digits.
If so, then all non numeric digits ([^\d]) should be removed from the po_number column. Otherwise, NULL should be returned.
select case when char_length(regexp_replace(po_number, '\d', '', 'g')) < char_length(po_number)
then regexp_replace(po_number, '[^0-9]', '', 'g')
else null
end as po_number_new
from test
If you want to extract floating numbers try to use this:
SELECT NULLIF(regexp_replace(po_number, '[^\.\d]','','g'), '')::numeric AS result FROM tbl;
It's the same as Erwin Brandstetter answer but with different expression:
[^...] - match any character except a list of excluded characters, put the excluded charaters instead of ...
\. - point character (also you can change it to , char)
\d - digit character
Since version 12 - that's 2 years + 4 months ago at the time of writing (but after the last edit that I can see on the accepted answer), you could use a GENERATED FIELD to do this quite easily on a one-time basis rather than having to calculate it each time you wish to SELECT a new po_number.
Furthermore, you can use the TRANSLATE function to extract your digits which is less expensive than the REGEXP_REPLACE solution proposed by #ErwinBrandstetter!
I would do this as follows (all of the code below is available on the fiddle here):
CREATE TABLE s
(
num TEXT,
new_num INTEGER GENERATED ALWAYS AS
(NULLIF(TRANSLATE(num, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ. ', ''), '')::INTEGER) STORED
);
You can add to the 'ABCDEFG... string in the TRANSLATE function as appropriate - I have decimal point (.) and a space ( ) at the end - you may wish to have more characters there depending on your input!
And checking:
INSERT INTO s VALUES ('2'), (''), (NULL), (' ');
INSERT INTO t VALUES ('2'), (''), (NULL), (' ');
SELECT * FROM s;
SELECT * FROM t;
Result (same for both):
num new_num
2 2
NULL
NULL
NULL
So, I wanted to check how efficient my solution was, so I ran the following test inserting 10,000 records into both tables s and t as follows (from here):
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
INSERT INTO t
with symbols(characters) as
(
VALUES ('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
)
select string_agg(substr(characters, (random() * length(characters) + 1) :: INTEGER, 1), '')
from symbols
join generate_series(1,10) as word(chr_idx) on 1 = 1 -- word length
join generate_series(1,10000) as words(idx) on 1 = 1 -- # of words
group by idx;
The differences weren't that huge but the regex solution was consistently slower by about 25% - even changing the order of the tables undergoing the INSERTs.
However, where the TRANSLATE solution really shines is when doing a "raw" SELECT as follows:
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT
NULLIF(TRANSLATE(num, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ. ', ''), '')::INTEGER
FROM s;
and the same for the REGEXP_REPLACE solution.
The differences were very marked, the TRANSLATE taking approx. 25% of the time of the other function. Finally, in the interests of fairness, I also did this for both tables:
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT
num, new_num
FROM t;
Both extremely quick and identical!

Regular expressions, what a trouble!

I need your kind help to resolve this question.
I state that I am not able to use regolar expressions with Oracle PL/SQL, but I promise that I'll study them ASAP!
Please suppose you have a table with a column called MY_COLUMN of type VARCHAR2(4000).
This colums is populated as follows:
Description of first no.;00123457;Description of 2nd number;91399399119;Third Descr.;13456
You can see that the strings are composed by couple of numbers (which may begin with zero), and strings (containing all alphanumeric characters, and also dot, ', /, \, and so on):
Description1;Number1;Description2;Number2;Description3;Number3;......;DescriptionN;NumberN
Of course, N is not known, this means that the number of couples for every record can vary from record to record.
In every couple the first element is always the number (which may begin with zero, I repeat), and the second element is the string.
The field separator is ALWAYS semicolon (;).
I would like to transform the numbers as follows:
00123457 ===> 001-23457
91399399119 ===> 913-99399119
13456 ===> 134-56
This means, after the first three digits of the number, I need to put a dash "-"
How can I achieve this using regular expressions?
Thank you in advance for your kind cooperation!
I don't know Oracle/PL/SQL, but I can provide a regex:
([[:digit:]]{3})([[:digit:]]+)
matches a number of at least four digits and remembers the first three separately from the rest.
RegexBuddy constructs the following code snippet from this:
DECLARE
result VARCHAR2(255);
BEGIN
result := REGEXP_REPLACE(subject, '([[:digit:]]{3})([[:digit:]]+)', '\1-\2', 1, 0, 'c');
END;
If you need to make sure that those numbers are always directly surrounded by ;, you can alter this slightly:
(^|;)([[:digit:]]{3})([[:digit:]]+)(;|$)
However, this will not work if two numbers can directly follow each other (12345;67890 will only match the first number). If that's not a problem, use
result := REGEXP_REPLACE(subject, '(^|;)([[:digit:]]{3})([[:digit:]]+)(;|$)', '\1\2-\3\4', 1, 0, 'c');