I want to find strings in an Oracle query that contain symbols above chr(127).
I see a lot of suggestions that '['||chr(128)||'-'||chr(255)||']' should work, but it doesn't.
So the following should return OK, but it doesn't:
select 'OK' as result from dual where regexp_like('why Ä ?', '['||chr(128)||'-'||chr(255)||']')
And the following should not return OK, but it does:
select 'OK' as result from dual where regexp_like('why - ?', '['||chr(128)||'-'||chr(255)||']')
UPD: Sorry, the capital A umlaut in my case is \xC4 (ISO 8859 Latin 1), but here it turns into Unicode chr(50052).
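To check what the database actually stores for that character, a quick diagnostic (just a sketch; the output depends on your NLS_CHARACTERSET) is:
select ascii('Ä') as code, dump('Ä', 1016) as bytes from dual;
In an AL32UTF8 database 'Ä' is stored as the two bytes c3,84, and 0xC384 = 50052, which is where the chr(50052) above comes from.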
How about a different approach? Split the string into characters and check whether the maximum character value is higher than 127.
For example:
SQL> with test (col) as
2 (select 'why Ä ?' from dual)
3 select substr(col, level, 1) one_character,
4 ascii(substr(col, level, 1)) ascii_of_one_character
5 from test
6 connect by level <= length(col);
ONE_ ASCII_OF_ONE_CHARACTER
---- ----------------------
w 119
h 104
y 121
32
Ä 50621 --> here it is!
32
? 63
7 rows selected.
SQL>
Now, move it into a subquery and fetch the result:
SQL> with test (col) as
2 (select 'why Ä ?' from dual)
3 select case when max(ascii_of_one_character) > 127 then 'OK'
4 else 'Not OK'
5 end result
6 from (select substr(col, level, 1) one_character,
7 ascii(substr(col, level, 1)) ascii_of_one_character
8 from test
9 connect by level <= length(col)
10 );
RESULT
------
OK
Or:
SQL> with test (col) as
2 (select 'why - ?' from dual)
3 select case when max(ascii_of_one_character) > 127 then 'OK'
4 else 'Not OK'
5 end result
6 from (select substr(col, level, 1) one_character,
7 ascii(substr(col, level, 1)) ascii_of_one_character
8 from test
9 connect by level <= length(col)
10 );
RESULT
------
Not OK
Millions of rows? Well, even for two rows the queries I posted wouldn't work properly. Switch to:
SQL> with test (col) as
2 (select 'why - ?' from dual union all
3 select 'why Ä ?' from dual
4 )
5 select col,
6 case when max(ascii_of_one_character) > 127 then 'OK'
7 else 'Not OK'
8 end result
9 from (select col,
10 substr(col, column_value, 1) one_character,
11 ascii(substr(col, column_value, 1)) ascii_of_one_character
12 from test cross join table(cast(multiset(select level from dual
13 connect by level <= length(col)
14 ) as sys.odcinumberlist))
15 )
16 group by col;
COL RESULT
-------- ------
why - ? Not OK
why Ä ? OK
SQL>
How will it behave? I don't know; try it and tell us. Note that for large data sets regular expressions might actually be slower than a simple substr option.
Yet another option: how about TRANSLATE? You don't have to split anything in that case. For example:
SQL> with test (col) as
2 (select 'why - ?' from dual union all
3 select 'why Ä ?' from dual
4 )
5 select col,
6 case when nvl(length(res), 0) > 0 then 'OK'
7 else 'Not OK'
8 end result
9 from (select col,
10 translate
11 (col,
12 '!"#$%&''()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ',
13 '!') res
14 from test
15 );
COL RESULT
-------- ------
why - ? Not OK
why Ä ? OK
SQL>
There is also another approach:
with t(str) as (
select 'why Ä ?' from dual union all
select 'why - ?' from dual union all
select 'why - ? Ä' from dual union all
select 'why' from dual
)
select
str,
case
when regexp_like(str, '[^'||chr(1)||'-'||chr(127)||']')
then 'Ok'
else 'Not ok'
end as res,
xmlcast(
xmlquery(
'count(string-to-codepoints(.)[. > 127])'
passing t.str
returning content)
as int) cnt_over_127
from t;
Results:
STR RES CNT_OVER_127
---------- ------ ------------
why Ä ? Ok 1
why - ? Not ok 0
why - ? Ä Ok 1
why Not ok 0
As you can see, I've used xmlquery() with the string-to-codepoints XPath function, then filtered out the codepoints > 127 and returned their count().
You can also use the dump() or utl_raw.cast_to_raw() functions, but that is a bit more complex and I'm a bit lazy to write full solutions using them.
Here is just a small draft:
with t(str) as (
select 'why Ä ?' from dual union all
select 'why - ?' from dual union all
select 'why - ? Ä' from dual union all
select 'why' from dual
)
select
str,
case
when regexp_like(str, '[^'||chr(1)||'-'||chr(127)||']')
then 'Ok'
else 'Not ok'
end as res,
dump(str,1016) dmp,
dump(str,1015) dmp,
utl_raw.cast_to_raw(str) as_row,
regexp_count(dump(str,1016)||',', '[89a-f][0-9a-f],') xs
from t;
Results:
STR RES DMP DMP AS_ROW XS
---------- ------ ------------------------------------------------------------------- ----------------------------------------------------------------------- -------------------- --
why Ä ? Ok Typ=1 Len=8 CharacterSet=AL32UTF8: 77,68,79,20,c3,84,20,3f Typ=1 Len=8 CharacterSet=AL32UTF8: 119,104,121,32,195,132,32,63 77687920C384203F 2
why - ? Not ok Typ=1 Len=7 CharacterSet=AL32UTF8: 77,68,79,20,2d,20,3f Typ=1 Len=7 CharacterSet=AL32UTF8: 119,104,121,32,45,32,63 776879202D203F 0
why - ? Ä Ok Typ=1 Len=10 CharacterSet=AL32UTF8: 77,68,79,20,2d,20,3f,20,c3,84 Typ=1 Len=10 CharacterSet=AL32UTF8: 119,104,121,32,45,32,63,32,195,132 776879202D203F20C384 2
why Not ok Typ=1 Len=3 CharacterSet=AL32UTF8: 77,68,79 Typ=1 Len=3 CharacterSet=AL32UTF8: 119,104,121 776879 0
Note: since this is Unicode, a byte value > 127 indicates a multibyte character, so 'Ä' is counted twice (c3,84): both bytes are higher than 127.
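If you need to count characters rather than bytes, a small sketch (not part of the draft above) that reuses the inverted character-class idea from the regexp_like() calls counts 'Ä' only once:
select str,
       regexp_count(str, '[^'||chr(1)||'-'||chr(127)||']') chars_over_127
  from t;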
I don't know why you want to use codepoints instead of character classes, but you can invert the logic and match anything outside chr(1)-chr(127), i.e. [^chr(1)-chr(127)]:
DBFiddle
select 'OK' as result
from dual
where regexp_like('why Ä ?', '[^'||chr(1)||'-'||chr(127)||']');
select regexp_substr('why Ä ?', '[^'||chr(1)||'-'||chr(127)||']') x from dual;
And do not forget that some characters are special inside a regex character class (like ]), or even non-printable.
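To actually filter rows of a table this way (which is what the question asks for), the same pattern can go straight into a WHERE clause. A minimal sketch, assuming a hypothetical table your_table with a VARCHAR2 column your_col:
select *
  from your_table
 where regexp_like(your_col, '[^'||chr(1)||'-'||chr(127)||']');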
I have a Redshift database with the following entries:
table name = subscribers
time_at              calc_subscribers  calc_unsubscribers  current_subscribers
2021-07-02 07:30:00  0                 0                   0
2021-07-02 07:45:00  39                8                   0
2021-07-02 08:00:00  69                17                  0
2021-07-02 08:15:00  67                21                  0
2021-07-02 08:30:00  48                23                  0
The goal is to calculate current_subscribers using the previous row's value:
current_subscribers = calc_subscribers - calc_unsubscribers + previous_current_subscribers
I do the following:
UPDATE subscribers sa
SET current_subscribers = COALESCE( sa.calc_subscribers - sa.calc_unsubscribers + sub.previous_current_subscribers,0)
FROM (
SELECT
time_at,
LAG(current_subscribers, 1) OVER
(ORDER BY time_at desc) previous_current_subscribers
FROM subscribers
) sub
WHERE sa.time_at = sub.time_at
The problem is that the subquery "sub" is built from the current values in the table rather than being evaluated row by row, so previous_current_subscribers is always 0. So the result is: current_subscribers = calc_subscribers - calc_unsubscribers + 0. I have also already tried it with a CTE, unfortunately without success.
The result should look like this:
time_at              calc_subscribers  calc_unsubscribers  current_subscribers
2021-07-02 07:30:00  0                 0                   0
2021-07-02 07:45:00  39                8                   31
2021-07-02 08:00:00  69                17                  83
2021-07-02 08:15:00  67                21                  129
2021-07-02 08:30:00  48                95                  82
I am grateful for any ideas.
The problem you are running into is that you want to use the result of one row in the calculation of the next row. This is recursive, which I think you could do in this case, but it is expensive.
The result you are looking for is the sum of all calc_subscribers for this row and the previous rows, minus the sum of all calc_unsubscribers for this row and the previous rows. This is the difference between two window functions (SUM ... OVER):
sum(calc_subscribers) over (order by time_at rows unbounded preceding) - sum(calc_unsubscribers) over (order by time_at rows unbounded preceding) as current_subscribers
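Put into a complete statement against the question's table (a sketch; the table and column names are taken from the question), that looks like:
SELECT time_at,
       calc_subscribers,
       calc_unsubscribers,
       SUM(calc_subscribers) OVER (ORDER BY time_at ROWS UNBOUNDED PRECEDING)
       - SUM(calc_unsubscribers) OVER (ORDER BY time_at ROWS UNBOUNDED PRECEDING) AS current_subscribers
FROM subscribers
ORDER BY time_at;
If you still need to persist the value, the same expression can be used in the subquery of your UPDATE ... FROM statement instead of the LAG() version.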
I wanted to see if this was doable in SAS. I have a dataset of the members of Congress and want to split the full name into first and last name. However, occasionally they seem to list their middle initial or name. It is from a .txt file.
Norton, Eleanor Holmes [D-DC] 16 0 440 288 0
Cohen, Steve [D-TN] 15 0 320 209 0
Schakowsky, Janice D. [D-IL] 6 0 289 186 0
McGovern, James P. [D-MA] 8 1 252 139 0
Clarke, Yvette D. [D-NY] 7 0 248 166 0
Moore, Gwen [D-WI] 2 3 244 157 1
Hastings, Alcee L. [D-FL] 13 1 235 146 0
Raskin, Jamie [D-MD] 8 1 232 136 0
Grijalva, Raul M. [D-AZ] 9 1 228 143 0
Khanna, Ro [D-CA] 4 0 223 150 0
Good day,
SAS is a bit clunky when it comes to strings, but it can be done. As others have mentioned, defining the logic is the really hard part.
Begin with some raw data...
data begin;
input raw_str $ 1-100;
cards;
Norton, Eleanor Holmes [D-DC] 16 0 440 288 0
Cohen, Steve [D-TN] 15 0 320 209 0
Schakowsky, Janice D. [D-IL] 6 0 289 186 0
McGovern, James P. [D-MA] 8 1 252 139 0
Clarke, Yvette D. [D-NY] 7 0 248 166 0
Moore, Gwen [D-WI] 2 3 244 157 1
Hastings, Alcee L. [D-FL] 13 1 235 146 0
Raskin, Jamie [D-MD] 8 1 232 136 0
Grijalva, Raul M. [D-AZ] 9 1 228 143 0
Khanna, Ro [D-CA] 4 0 223 150 0
; run;
First, I select the leading names up to the first bracket and count the number of strings:
data names;
set begin;
names_only = scan(raw_str,1,'[');
Nr_of_str = countw(names_only,' ');
run;
Assumption: the first string is the last name.
If there are only 2 strings, the first and last names are pretty easy to get with scan and substr:
data names2;
set names;
if Nr_of_str = 2 then do;
last_name = scan(names_only, 1, ' ');
_FirstBlank = find(names_only, ' ');
first_name = strip(substr(names_only, _FirstBlank));
end;
run;
Assumption: there are only 3 strings.
Approach 1: the middle name has a dot in it; filter it out.
Approach 2: the middle name is shorter than the real first name:
data names3;
set names2;
if Nr_of_str > 2 then do;
last_name = scan(names_only, 1, ' '); /*this should still hold*/
_FirstBlank = find(names_only, ' '); /*Substring approach */
first_name = strip(substr(names_only, _FirstBlank));
second_str = scan(names_only, 2, ' ');
third_str = scan(names_only, 3, ' ');
if find(second_str,'.') = 0 then /*1st approach */
first_name = scan(names_only, 2, ' ');
else
first_name = scan(names_only, 3, ' ');
if length(second_str) > length(third_str) then /*2nd approach */
first_name = second_str;
else
first_name = third_str;
end;
run;
For more, see the documentation about substr and scan.
I have a text file with 26 columns and have to filter rows ONLY if column 22 has the value of '0', '1', '2' or '3' (out of 0-5).
If column 22 has the value '0', '1', '2' or '3' (highlighted below), then remove anything less than 10 and anything greater than 100 (based on column 5), AND remove anything less than 5.0 (this column has decimals, based on column 13).
I am not sure how to apply these conditions only when column 22 has the values 0 to 3, while retaining the other rows (i.e. those with values 4 and 5) as they are in the output file.
awk -F "\t" 'NR==1; NR>1 {if ($5 > 10 && $5<100 && $13>5.0) print $0}' input.txt > output.txt
My input file is as follows
Column1 ... Column5 ... Column13 .. Column22
ID1 a1 5 0 5
ID2 a2 10 1.2 0
ID3 a3 4 5.6 1
ID4 a4 300 2.6 2
ID5 a5 40 32 0
ID6 a6 200 4.6 3
ID7 a7 200 4.5 5
ID8 a8 3456 4.9 4
and my desired output is
Column1 Column5 Column13 Column22
ID1 a1 5 0 5
ID5 a5 40 32 0
ID7 a7 200 4.5 5
ID8 a8 3456 4.9 4
Any help is appreciated. Thank you
One alternative would be:
awk -F'\t' '{p=0<=$22 && $22<=3; q=10<$5 && $5<100 && 5<$13} NR==1 || q || !p ' file
You want the other conditions (q) applied only if $22 is between 0 and 3 (p).
If you had provided a testable input/output, it could have been verified.
I have an MS Access 2010 application which writes back to a (backend) SQL Server. The table has student ID, test score and rank as columns. The application has a form which takes input from users. When a new student enters his/her ID, score and rank, the rest of the ranks must be updated based on the inserted rank.
For example, if a new student has a score of 79 and rank 5, the current student at rank 5 must be changed to 6, the sixth rank to seventh, and so on, in the SQL table.
Before:
Student_ID Score Rank
1 89 1
16 88 2
25 84 3
3 81 4
7 78 5
15 75 6
12 72 7
17 70 8
56 65 9
9 64 10
After:
Student_ID Score Rank
1 89 1
16 88 2
25 84 3
3 81 4
7 78 6
15 75 7
12 72 8
17 70 9
56 65 10
9 64 11
10 75 5
Remove the rank field and create a query that calculates the rank (row number) on the fly. To speed this up, use a collection as shown here:
Public Function RowCounter( _
ByVal strKey As String, _
ByVal booReset As Boolean, _
Optional ByVal strGroupKey As String) _
As Long
' Builds consecutive RowIDs in select, append or create query
' with the possibility of automatic reset.
' Optionally a grouping key can be passed to reset the row count
' for every group key.
'
' Usage (typical select query):
' SELECT RowCounter(CStr([ID]),False) AS RowID, *
' FROM tblSomeTable
' WHERE (RowCounter(CStr([ID]),False) <> RowCounter("",True));
'
' Usage (with group key):
' SELECT RowCounter(CStr([ID]),False,CStr([GroupID])) AS RowID, *
' FROM tblSomeTable
' WHERE (RowCounter(CStr([ID]),False) <> RowCounter("",True));
'
' The Where statement resets the counter when the query is run
' and is needed for browsing a select query.
'
' Usage (typical append query, manual reset):
' 1. Reset counter manually:
' Call RowCounter(vbNullString, False)
' 2. Run query:
' INSERT INTO tblTemp ( RowID )
' SELECT RowCounter(CStr([ID]),False) AS RowID, *
' FROM tblSomeTable;
'
' Usage (typical append query, automatic reset):
' INSERT INTO tblTemp ( RowID )
' SELECT RowCounter(CStr([ID]),False) AS RowID, *
' FROM tblSomeTable
' WHERE (RowCounter("",True)=0);
'
' 2002-04-13. Cactus Data ApS. CPH
' 2002-09-09. Str() sometimes fails. Replaced with CStr().
' 2005-10-21. Str(col.Count + 1) reduced to col.Count + 1.
' 2008-02-27. Optional group parameter added.
' 2010-08-04. Corrected that group key missed first row in group.
    Static col As New Collection
    Static strGroup As String

    On Error GoTo Err_RowCounter

    If booReset = True Then
        Set col = Nothing
    ElseIf strGroup <> strGroupKey Then
        Set col = Nothing
        strGroup = strGroupKey
        col.Add 1, strKey
    Else
        col.Add col.Count + 1, strKey
    End If

    RowCounter = col(strKey)

Exit_RowCounter:
    Exit Function

Err_RowCounter:
    Select Case Err
        Case 457
            ' Key is present.
            Resume Next
        Case Else
            ' Some other error.
            Resume Exit_RowCounter
    End Select

End Function
Study the in-line comments for typical usage.
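Applied to the question's table, a typical select query would follow the usage pattern from those comments. A sketch only: Students, Student_ID and Score are assumed names, and the counter numbers rows in the order Access feeds them to the function, so in practice you would base it on a query that already returns the rows sorted by Score descending:
SELECT RowCounter(CStr([Student_ID]), False) AS RowID, Student_ID, Score
FROM Students
WHERE RowCounter(CStr([Student_ID]), False) <> RowCounter("", True);
The RowID column then serves as the rank, and the stored Rank field (and the manual renumbering) becomes unnecessary.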