Searching for unknown characters in an Oracle database

Searching for unknown characters in an Oracle database - regex

Is there a way to search an Oracle database (some sort of regex I suspect) to find unknown characters (which often appear as □ □)?

There is no standard way to search an entire Oracle database. You would need a tedious script that walks over the various object types in dba_objects and then descends into each one. As a trivial example, if an object is a table you need to go through its columns and apply REGEXP_LIKE to every column that holds character data. But there are more object types than tables: for a package, do you want to search its literals too? I would instead write, by hand, an explicit list of queries over the tables and columns that matter.
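If the explicit list gets long, the queries can be generated rather than typed. This is only a sketch in Python, not PL/SQL: the table and column names are placeholders, and it assumes Oracle's REGEXP_LIKE with the POSIX class [[:cntrl:]] (control characters such as tab and newline) is a good enough definition of "unknown character" for your data. It only builds the query text; running it is left to whatever client you use.

```python
# Hypothetical (table, column) pairs; replace with your own character columns.
TARGETS = [("tab1", "col1"), ("tab1", "col2"), ("tab2", "descr")]

def control_char_query(table, column):
    """Build an Oracle query returning rows whose column holds a control character."""
    return (
        f"SELECT rowid, {column} FROM {table} "
        f"WHERE REGEXP_LIKE({column}, '[[:cntrl:]]')"
    )

queries = [control_char_query(t, c) for t, c in TARGETS]
for q in queries:
    print(q)
```

Keeping the list of targets in one place makes it easy to review which columns are covered and which are not.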

Try something like this:
select col1, ...
from tab1
where col1 like '%'||chr(9)||'%' -- ASCII code for tab
or col1 like '%'||chr(10)||'%' -- ASCII code for newline (line feed)
--...
;
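To decide which chr() codes belong in that list, it can help to look at the code points in a few affected values after fetching them. A minimal sketch in Python, assuming the values arrive as ordinary strings; the function name and the choice of categories to flag are my own, not anything Oracle-specific.

```python
import unicodedata

def suspicious_chars(text):
    """Return (index, code point, Unicode category) for characters that tend to render as boxes."""
    found = []
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        # Cc = control, Cn = unassigned, Co = private use;
        # U+FFFD is the Unicode replacement character.
        if cat in ("Cc", "Cn", "Co") or ch == "\ufffd":
            found.append((i, f"U+{ord(ch):04X}", cat))
    return found

for hit in suspicious_chars("ab\tc\ufffd"):
    print(hit)
```

The code points it reports (for example U+0009 for tab) map directly onto the numbers you would pass to chr() in the query.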

Related

Specify the number of characters that should match a LIKE REGEX in T-SQL

I've done a ton of Googling on this and can't find the answer. Or, at least, not the answer I am hoping to find. I am attempting to convert a REGEXP_SUBSTR search from Teradata into T-SQL on SQL Server 2016.
This is the way it is written in Teradata:
REGEXP_SUBSTR(cn.CONTRACT_PD_AOR,'\b([a-zA-Z]{2})-([[:digit:]]{2})-([[:digit:]]{3})(-([a-zA-Z]{2}))?\b')
The numbers in the curly brackets specify how many characters must match the preceding expression. So, this is looking for a contract number that looks like this format: XX-99-999-XX
Is this not possible in T-SQL? Specifying the number of characters to match? So I would have to write something like this:
where CONTRACT_PD_AOR like '[a-zA-Z][a-zA-Z]-[0-9][0-9]-[0-9][0-9][0-9]-[a-zA-Z][a-zA-Z]%'
Is there not a simpler way to go about it?

While not an answer, this method makes things a little less painful. It lets you define a format once and reuse it wherever you need it, while keeping the code clean and readable.
Set a format variable at the top, then do the needed replaces to build the pattern, and use the variable in your queries. It saves a little typing, makes your code less fugly, and keeps the format reusable across multiple queries.
Declare @fmt_CONTRACT_PD_AOR nvarchar(max) = 'XX-99-999-XX';
Set @fmt_CONTRACT_PD_AOR = REPLACE(@fmt_CONTRACT_PD_AOR, '9', '[0-9]');
Set @fmt_CONTRACT_PD_AOR = REPLACE(@fmt_CONTRACT_PD_AOR, 'X', '[a-zA-Z]');
with tbl(str) as (
select 'AA-23-234-ZZ' union all
select 'db-32-123-dd' union all
select 'ab-123-88-kk'
)
select str from tbl
where str like @fmt_CONTRACT_PD_AOR;
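The same template expansion can be tried outside the database. The sketch below is Python, not T-SQL, and assumes the template contains only X, 9 and literal hyphens; the names PLACEHOLDERS and expand_format are made up for the illustration. Because the LIKE above has no leading or trailing %, the check uses a full-string match.

```python
import re

# Placeholder -> character class, mirroring the two REPLACE calls above.
PLACEHOLDERS = {"X": "[a-zA-Z]", "9": "[0-9]"}

def expand_format(fmt):
    """Expand a template such as 'XX-99-999-XX' into a bracket pattern."""
    return "".join(PLACEHOLDERS.get(ch, ch) for ch in fmt)

pattern = expand_format("XX-99-999-XX")
samples = ["AA-23-234-ZZ", "db-32-123-dd", "ab-123-88-kk"]
# fullmatch plays the role of LIKE without wildcards at either end.
matches = [s for s in samples if re.fullmatch(pattern, s)]
print(pattern)
print(matches)
```

The expanded pattern is the same string the T-SQL variable ends up holding, so it can be pasted into a LIKE for a quick check.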

Wildcard character

I've a dataframe, and I'm trying to select columns with certain properties in the name.
One example (of many) is I want to select columns called "t*_b**" where * would be a wildcard character. This would select columns with names t1_b2, t2_b2, t3_b2 and t4_b2 (as well as several others like t1_b13, t2_b13 etc.).
If there is such a wildcard character I could use, I know that I could just use the following command:
grep("t*_b", names(df))
As opposed to doing:
c(grep("t1_b", names(df)), grep("t2_b", names(df)), grep("t3_b", names(df)), grep("t4_b", names(df)))
which is messier and harder to read.
Update: the first comment has resolved my issue. I don't have any real need for any further input, thanks for the help!

The wildcard 'character' in regular expressions is a dot (.), which matches exactly one character. As such, you could do
grep("t._b", names(df))
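To see how the dot differs from the star, here is the same idea sketched in Python rather than R (this part of the regex syntax is shared). The column names are invented for the example, and the anchored pattern at the end is my own suggestion for "t, digits, _b, digits", not something from the question.

```python
import re

names = ["t1_b2", "t2_b2", "t3_b13", "tt_b", "x1_b2", "t10_b2"]

# "." is exactly one character, so "t._b" needs one character between t and _b.
one_char = [n for n in names if re.search(r"t._b", n)]

# "t*_b" means "zero or more t, then _b", so any name containing "_b" matches.
star = [n for n in names if re.search(r"t*_b", n)]

# Anchored and explicit: t, one or more digits, _b, one or more digits.
strict = [n for n in names if re.search(r"^t[0-9]+_b[0-9]+$", n)]

print(one_char)
print(star)
print(strict)
```

Note that "t._b" skips t10_b2 because two characters sit between t and _b, while the unanchored star version lets x1_b2 through.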

regular expression to extract insert sql statement from a text file and to check for hardcoded parameters

I have a bunch of SQL statements updated by the developers on my team.
I intend to run a check before these statements are run against a database.
For example, I want to check whether a certain column is hardcoded instead of being fetched from the respective table (foreign key).
For instance:
INSERT INTO [Term1] ([CreatedBy]
,[CreateUser]) values(1,'asdadad')
where 1 is a hardcoded value.
Is there a regular expression that can extract all insert statements from the file so that they can be parsed?
I tried this expression, http://regexlib.com/REDetails.aspx?regexp_id=1750, but it did not work.

You may need to run a multi-level regex on this. First extract the entire parameter string from the whole query, then parse each individual field out of that parameter string, ignoring any other characters that may come up.
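A rough sketch of that two-level approach, in Python. It assumes simple single-row INSERT statements with an explicit column list, no commas inside string literals and no nested parentheses; anything more complex needs a real SQL parser. The function names are made up for the illustration.

```python
import re

# Level 1: pull the table, column list and value list out of each simple INSERT.
INSERT_RE = re.compile(
    r"INSERT\s+INTO\s+\[?(\w+)\]?\s*\((.*?)\)\s*VALUES\s*\((.*?)\)",
    re.IGNORECASE | re.DOTALL,
)

def parse_inserts(sql_text):
    """Return a list of (table, {column: raw value}) for each INSERT found."""
    parsed = []
    for table, cols, vals in INSERT_RE.findall(sql_text):
        # Level 2: split the two captured lists into individual fields.
        columns = [c.strip().strip("[]") for c in cols.split(",")]
        values = [v.strip() for v in vals.split(",")]
        parsed.append((table, dict(zip(columns, values))))
    return parsed

def hardcoded_numbers(sql_text, column):
    """List (table, value) pairs where the given column is a bare integer literal."""
    return [
        (table, row[column])
        for table, row in parse_inserts(sql_text)
        if re.fullmatch(r"-?[0-9]+", row.get(column, ""))
    ]

sample = """INSERT INTO [Term1] ([CreatedBy]
,[CreateUser]) values(1,'asdadad')"""
print(hardcoded_numbers(sample, "CreatedBy"))
```

Pairing columns with values by position is what makes the second level useful: the check can target one named column instead of flagging every number in the file.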

Using a RegEx in a SQL Query

Here's the situation I'm in: We have a field in our database that contains a 3 digit number, surrounded by some text. This number is actually a PK in another table, and I need to extract this out so I can implement a proper FK relationship. Here's an example of what would currently reside in the column:
Some Text Goes Here - (305) Followed By Some More Text
So, what I'm looking to do is extract the '305' from the column, and hopefully end up with a result that looks something like this (pseudo code)
SELECT
<My Extracted Value>,
Original Column Text,
Id
FROM dbo.MyTable
It seems to me that using a Regex match in my query is the most effective way to do this. Can anybody point me in the right direction?
EDIT: We're using SQL Server 2005

Regular expressions in SQL are defined by the SQL standard, but most databases implement their own syntax. You should tell us which RDBMS you are using ;)

This is based on Pranay's first answer that has since been changed.
DECLARE @NumStr varchar(1000);
SET @NumStr = 'Some Text Goes Here - (305) Followed By Some More Text';
SELECT SUBSTRING(@NumStr, PATINDEX('%[0-9][0-9][0-9]%', @NumStr), 3);
Returns 305
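One caveat: that pattern finds the first run of three digits anywhere in the string, so a value like 'Order 12345 - (305)' would give 123. If a regex engine is available (for example in a one-off migration script), requiring the parentheses is safer. A minimal sketch in Python; the function name is made up.

```python
import re

def extract_code(text):
    """Return the three digits found inside parentheses, or None if absent."""
    m = re.search(r"\(([0-9]{3})\)", text)
    return m.group(1) if m else None

print(extract_code("Some Text Goes Here - (305) Followed By Some More Text"))
print(extract_code("Order 12345 - (305) Followed By Some More Text"))
```

Rows where the function returns None are worth listing separately before adding the foreign key, since they have no usable reference.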

Microsoft seems to suggest using a CLR assembly to do Regex pattern matching in SQL Server 2005.
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
Apart from LIKE (which is not going to solve your problem) I don't know of a built-in pattern matching functionality in SQL Server 2005 (that is, more advanced than simple string searches).

Just after I implemented a solution in Postgres, I saw that you are using SQL Server... Just for the record, then, here it is with a regex that extracts the data in parentheses.
PostgreSQL solution:
create table main(id text not null);
insert into main values('some text (44) other text');
insert into main values('and more text (78) and even more');
-- With standard_conforming_strings = on (the default since PostgreSQL 9.1),
-- use single backslashes instead: '\(([^\(]+)\)'
select substring(id from '\\(([^\\(]+)\\)') from main;

The only way to access RegEx-type functions in SQL Server 2005 (and probably 2008) is by writing (or downloading) and using CLR functions.
If all the strings are always formatted in such a way that you can identify the specific numbers you want, you can do something like the following. This is based on the (big) assumption that the first set of parentheses found in the string contains the number that you want.
/*
CREATE TABLE MyTable
(
MyText varchar(500) not null
)
INSERT MyTable values ('Some Text Goes Here - (305) Followed By Some More Text')
*/
SELECT
MyText -- String
,charindex('(', MyText) -- Where's the open parenthesis
,charindex(')', MyText) -- Where's the closed parenthesis
,substring(MyText
,charindex('(', MyText) + 1
,charindex(')', MyText) - charindex('(', MyText) - 1) -- Glom it all together
from MyTable
Awkward as heck (because SQL has a pathetically limited set of string manipulation functions), but it works.
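For checking the position arithmetic, the same logic can be mirrored in Python, where str.find plays the role of charindex. This is a sketch with two deliberate differences from the query above, both my own choices: it looks for the closing parenthesis only after the opening one, and it returns None instead of failing when either is missing.

```python
def between_parentheses(text):
    """Return the text inside the first '(' ... ')' pair, or None if there is none."""
    start = text.find("(")            # like charindex('(', MyText), but 0-based
    end = text.find(")", start + 1)   # search only after the opening parenthesis
    if start == -1 or end == -1:
        return None
    return text[start + 1:end]

print(between_parentheses("Some Text Goes Here - (305) Followed By Some More Text"))
```

Running it over a sample of the column first shows how many rows break the "first set of parentheses" assumption.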

regular expression or replace function in where clause of a mysql query

I wrote a MySQL query
select * from table where name like '%salil%'
which works fine, but it will not return records with names such as 'sal-il' or 'sa#lil'.
So I want a query something like the one below
select * from table where remove_special_character_from(name) like '%salil%'
where remove_special_character_from(name) is a MySQL function or a regular expression that removes all the special characters from name before the LIKE is executed.

No, MySQL doesn't support regexp-based replace.
I'd suggest using normalized versions of the searchable values, stored in separate fields.
So, at insert time you strip all non-alpha characters from the data and store the result in a data_norm field for future searches.
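A minimal sketch of the normalized-field idea, using Python with an in-memory SQLite database as a stand-in for MySQL. The table name, the name_norm column and the normalize function are all made up for the example, and it keeps digits as well as letters, which is my assumption rather than part of the suggestion above.

```python
import re
import sqlite3

def normalize(name):
    """Lower-case the name and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, name_norm TEXT)")
for name in ["salil", "sal-il", "sa#lil", "maria"]:
    # The normalized value is computed once, at insert time.
    conn.execute("INSERT INTO people VALUES (?, ?)", (name, normalize(name)))

# Normalize the search term the same way before matching.
term = normalize("Salil")
rows = conn.execute(
    "SELECT name FROM people WHERE name_norm LIKE ?", (f"%{term}%",)
).fetchall()
found = sorted(r[0] for r in rows)
print(found)
```

The important part is that the stored value and the search term go through the same function; otherwise the two sides drift apart.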

Since I know of no way to do this directly, I'd use a "calculated column", i.e. a column whose value is derived from name but without the special characters. This way, the cost of the transformation is paid only once and you can even create an index on the new column.
See this answer for how to do this.

I agree with Aaron and Col. Shrapnel that you should use an extra column on the table e.g. search_name to store a normalised version of the name.
I noticed that this question was originally tagged ruby-on-rails. If this is part of a Rails application then you can use a before_save callback to set the value of this field.

In MySQL 5.1 you can use REGEXP to do regular expression matching, like this:
SELECT * FROM foo WHERE bar REGEXP "baz"
See http://dev.mysql.com/doc/refman/5.1/en/regexp.html
However, note that it will be slow, and you should do what the other posters suggested and store the clean value in a separate field.
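For the 'sal-il' case specifically, the pattern passed to REGEXP has to allow stray characters between the letters. A sketch of how such a pattern could be built, in Python; the function name is made up, it assumes the search term contains only letters and digits, and the check below uses Python's re module as a stand-in, so try the generated pattern against your MySQL version before relying on it.

```python
import re

def tolerant_pattern(term):
    """Allow any run of non-alphanumeric characters between the characters of term."""
    return "[^a-z0-9]*".join(term.lower())

pattern = tolerant_pattern("salil")
print(pattern)

names = ["salil", "sal-il", "sa#lil", "sailing", "maria"]
hits = [n for n in names if re.search(pattern, n, re.IGNORECASE)]
print(hits)
```

The resulting string is what would go on the right-hand side of REGEXP in place of "baz".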
