Using a RegEx in a SQL Query - regex

Here's the situation I'm in: We have a field in our database that contains a 3 digit number, surrounded by some text. This number is actually a PK in another table, and I need to extract this out so I can implement a proper FK relationship. Here's an example of what would currently reside in the column:
Some Text Goes Here - (305) Followed By Some More Text
So, what I'm looking to do is extract the '305' from the column, and hopefully end up with a result that looks something like this (pseudo code)
SELECT
<My Extracted Value>,
Original Column Text,
Id
FROM dbo.MyTable
It seems to me that using a Regex match in my query is the most effective way to do this. Can anybody point me in the right direction?
EDIT: We're using SQL Server 2005

RegExp in SQL is defined by a SQL-Standard but most databases implemented their own syntax, you should tell us the product name of your RDBMS ;)

This is based on Pranay's first answer that has since been changed.
DECLARE #NumStr varchar(1000)
SET #NumStr = 'Some Text Goes Here - (305) Followed By Some More Text';
SELECT SUBSTRING(#NumStr,PATINDEX('%[0-9][0-9][0-9]%',#NumStr),3)
Returns 305

Microsoft seems to suggest using a CLR assembly to do Regex pattern matching in SQL Server 2005.
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
Apart from LIKE (which is not going to solve your problem) I don't know of a built-in pattern matching functionality in SQL Server 2005 (that is, more advanced than simple string searches).

Just after I implemented a solution in Postgres, I see you are using SqlServer... Just for the records, then, with a regex that extracts data in parenthesis.
Postgresql solution:
create table main(id text not null)
insert into main values('some text (44) other text');
insert into main values('and more text (78) and even more');
select substring(id from '\\(([^\\(]+)\\)') from main

The only way to access RegEx-type functions in SQL 2005 (and probably 2008) is by writing (or downloading) and using CLR functions.
If all the strings are always formatted in such a way as you can identify the specific numbers you want, you can do something like the following. This is based on the (big) assumption that the first set of parenthesis found in the string contains the number that you want.
/*
CREATE TABLE MyTable
(
MyText varchar(500) not null
)
INSERT MyTable values ('Some Text Goes Here - (305) Followed By Some More Text')
*/
SELECT
MyText -- String
,charindex('(', MyText) -- Where's the open parenthesis
,charindex(')', MyText) -- Where's the closed parenthesis
,substring(MyText
,charindex('(', MyText) + 1, charindex(')'
,MyText) - charindex('(', MyText) - 1) -- Glom it all together
from MyTable
Awkward as heck (because SQL has a pathetically limited set of string manipulation functions), but it works.

Related

Specify the number of characters that should match a LIKE REGEX in T-SQL

I've done a ton of Googling on this and can't find the answer. Or, at least, not the answer I am hoping to find. I am attempting to convert a REGEXP_SUBSTR search from Teradata into T-SQL on SQL Server 2016.
This is the way it is written in Teradata:
REGEXP_SUBSTR(cn.CONTRACT_PD_AOR,'\b([a-zA-Z]{2})-([[:digit:]]{2})-([[:digit:]]{3})(-([a-zA-Z]{2}))?\b')
The numbers in the curly brackets specify the number of characters that can match the specific REGEXP. So, this is looking for a contract number that look like this format: XX-99-999-XX
Is this not possible in T-SQL? Specifying the amount of characters to look at? So I would have to write something like this:
where CONTRACT_PD_AOR like '[a-zA-Z][a-zA-Z]-[0-9][0-9]-[0-9][0-9][0-9]-[a-zA-Z][a-zA-Z]%'
Is there not a simpler way to go about it?
While not an answer, with this method it makes things a little less panful. This is a way to set a format and reuse it if you'll need it multiple times in your code while keeping it clean and readable.
Set a format variable at the top, then do the needed replaces to build it. Then use the format name in the code. Saves a little typing, makes your code less fugly, and has the benefit of making that format variable reusable should you need it in multiple queries without all that typing.
Declare #fmt_CONTRACT_PD_AOR nvarchar(max) = 'XX-99-999-XX';
Set #fmt_CONTRACT_PD_AOR = REPLACE(#fmt_CONTRACT_PD_AOR, '9', '[0-9]');
Set #fmt_CONTRACT_PD_AOR = REPLACE(#fmt_CONTRACT_PD_AOR, 'X', '[a-zA-Z]');
with tbl(str) as (
select 'AA-23-234-ZZ' union all
select 'db-32-123-dd' union all
select 'ab-123-88-kk'
)
select str from tbl
where str like #fmt_CONTRACT_PD_AOR;

Searching for unknown characters in an Oracle database

Is there a way to search an Oracle database (some sort of regex I suspect) to find unknown characters (which often appear as □ □)?
There is no standard way to search over entire Oracle database. You would need a tedious script that walks over various types of Oracle objects in dba_objects, and then descends into each (for a trivial example if an object is a table you need to parse the columns, and if a column contains a character data, REGEXP_LIKE; but there are more types of objects, for example a package - do you want to search package's literals too?). I would instead make manually an explicit list of queries over tables and columns.
Try something like this:
select co11, ...
from tab1
where col1 like '%'||chr(9)||'%' -- ascii code for tab
or col1 like '%'||chr(20)||'%' -- ascii code for newline
--...
;

regular expression to extract insert sql statement from a text file and to check for hardcoded parameters

I have a bunch of sql statements updated by my team developers.
I intend to run a check before these statements are run against a db.
for example, check if a certain column is hardcoded instead of being fetched from the respective table (foreign key)
for example:
INSERT INTO [Term1] ([CreatedBy]
,[CreateUser]) values(1,'asdadad')
where 1 is hardcoded value.
Is there a regular expression that can extract all insert statements from the file so that they can be parse?
I tried with this expression http://regexlib.com/REDetails.aspx?regexp_id=1750 but it didnot work
You may need to run a multi-level regex on this. First parse the entire parameter string from the whole query, then parse each individual field from the paramter string that you previously got to get each one specifically ignoring all the other characters that may come up.

regular expression or replace function in where clause of a mysql query

I write a mysql query
select * from table where name like '%salil%'
which works fine but it will no return records with name 'sal-il', 'sa#lil'.
So i want a query something like below
select * from table whereremove_special_character_from(name)like '%salil%'
remove_special_character_from(name) is a mysql method or a regular expression which remove all the special characters from name before like executed.
No, mysql doesn't support regexp based replace.
I'd suggest to use normalized versions of the search terms, stored in the separate fields.
So, at insert time you strip all non-alpha characters from the data and store it in the data_norm field for the future searches.
Since I know no way to do this, I'd use a "calculated column" for this, i.e. a column which depends on the value of name but without the special characters. This way, the cost for the transformation is paid only once and you can even create an index on the new column.
See this answer how to do this.
I agree with Aaron and Col. Shrapnel that you should use an extra column on the table e.g. search_name to store a normalised version of the name.
I noticed that this question was originally tagged ruby-on-rails. If this is part of a Rails application then you can use a before_save callback to set the value of this field.
In MYSQL 5.1 you can use REGEXP to do regular expression matching like this
SELECT * FROM foo WHERE bar REGEXP "baz"
see http://dev.mysql.com/doc/refman/5.1/en/regexp.html
However, take note that it will be slow and you should do what others posters suggested and store the clean value in a separate field.

Use cases for regular expression find/replace

I recently discussed editors with a co-worker. He uses one of the less popular editors and I use another (I won't say which ones since it's not relevant and I want to avoid an editor flame war). I was saying that I didn't like his editor as much because it doesn't let you do find/replace with regular expressions.
He said he's never wanted to do that, which was surprising since it's something I find myself doing all the time. However, off the top of my head I wasn't able to come up with more than one or two examples. Can anyone here offer some examples of times when they've found regex find/replace useful in their editor? Here's what I've been able to come up with since then as examples of things that I've actually had to do:
Strip the beginning of a line off of every line in a file that looks like:
Line 25634 :
Line 632157 :
Taking a few dozen files with a standard header which is slightly different for each file and stripping the first 19 lines from all of them all at once.
Piping the result of a MySQL select statement into a text file, then removing all of the formatting junk and reformatting it as a Python dictionary for use in a simple script.
In a CSV file with no escaped commas, replace the first character of the 8th column of each row with a capital A.
Given a bunch of GDB stack traces with lines like
#3 0x080a6d61 in _mvl_set_req_done (req=0x82624a4, result=27158) at ../../mvl/src/mvl_serv.c:850
strip out everything from each line except the function names.
Does anyone else have any real-life examples? The next time this comes up, I'd like to be more prepared to list good examples of why this feature is useful.
Just last week, I used regex find/replace to convert a CSV file to an XML file.
Simple enough to do really, just chop up each field (luckily it didn't have any escaped commas) and push it back out with the appropriate tags in place of the commas.
Regex make it easy to replace whole words using word boundaries.
(\b\w+\b)
So you can replace unwanted words in your file without disturbing words like Scunthorpe
Yesterday I took a create table statement I made for an Oracle table and converted the fields to setString() method calls using JDBC and PreparedStatements. The table's field names were mapped to my class properties, so regex search and replace was the perfect fit.
Create Table text:
...
field_1 VARCHAR2(100) NULL,
field_2 VARCHAR2(10) NULL,
field_3 NUMBER(8) NULL,
field_4 VARCHAR2(100) NULL,
....
My Regex Search:
/([a-z_])+ .*?,?/
My Replacement:
pstmt.setString(1, \1);
The result:
...
pstmt.setString(1, field_1);
pstmt.setString(1, field_2);
pstmt.setString(1, field_3);
pstmt.setString(1, field_4);
....
I then went through and manually set the position int for each call and changed the method to setInt() (and others) where necessary, but that worked handy for me. I actually used it three or four times for similar field to method call conversions.
I like to use regexps to reformat lists of items like this:
int item1
double item2
to
public void item1(int item1){
}
public void item2(double item2){
}
This can be a big time saver.
I use it all the time when someone sends me a list of patient visit numbers in a column (say 100-200) and I need them in a '0000000444','000000004445' format. works wonders for me!
I also use it to pull out email addresses in an email. I send out group emails often and all the bounced returns come back in one email. So, I regex to pull them all out and then drop them into a string var to remove from the database.
I even wrote a little dialog prog to apply regex to my clipboard. It grabs the contents applies the regex and then loads it back into the clipboard.
One thing I use it for in web development all the time is stripping some text of its HTML tags. This might need to be done to sanitize user input for security, or for displaying a preview of a news article. For example, if you have an article with lots of HTML tags for formatting, you can't just do LEFT(article_text,100) + '...' (plus a "read more" link) and render that on a page at the risk of breaking the page by splitting apart an HTML tag.
Also, I've had to strip img tags in database records that link to images that no longer exist. And let's not forget web form validation. If you want to make a user has entered a correct email address (syntactically speaking) into a web form this is about the only way of checking it thoroughly.
I've just pasted a long character sequence into a string literal, and now I want to break it up into a concatenation of shorter string literals so it doesn't wrap. I also want it to be readable, so I want to break only after spaces. I select the whole string (minus the quotation marks) and do an in-selection-only replace-all with this regex:
/.{20,60} /
...and this replacement:
/$0"¶ + "/
...where the pilcrow is an actual newline, and the number of spaces varies from one incident to the next. Result:
String s = "I recently discussed editors with a co-worker. He uses one "
+ "of the less popular editors and I use another (I won't say "
+ "which ones since it's not relevant and I want to avoid an "
+ "editor flame war). I was saying that I didn't like his "
+ "editor as much because it doesn't let you do find/replace "
+ "with regular expressions.";
The first thing I do with any editor is try to figure out it's Regex oddities. I use it all the time. Nothing really crazy, but it's handy when you've got to copy/paste stuff between different types of text - SQL <-> PHP is the one I do most often - and you don't want to fart around making the same change 500 times.
Regex is very handy any time I am trying to replace a value that spans multiple lines. Or when I want to replace a value with something that contains a line break.
I also like that you can match things in a regular expression and not replace the full match using the $# syntax to output the portion of the match you want to maintain.
I agree with you on points 3, 4, and 5 but not necessarily points 1 and 2.
In some cases 1 and 2 are easier to achieve using a anonymous keyboard macro.
By this I mean doing the following:
Position the cursor on the first line
Start a keyboard macro recording
Modify the first line
Position the cursor on the next line
Stop record.
Now all that is needed to modify the next line is to repeat the macro.
I could live with out support for regex but could not live without anonymous keyboard macros.