regular expression or replace function in where clause of a mysql query - regex

I wrote a MySQL query:
select * from table where name like '%salil%'
which works fine, but it will not return records with names such as 'sal-il' or 'sa#lil'.
So I want a query something like the one below:
select * from table where remove_special_character_from(name) like '%salil%'
where remove_special_character_from(name) is a MySQL function or a regular expression that removes all the special characters from name before the LIKE is executed.

No, MySQL (at least through 5.x) doesn't support regexp-based replace.
I'd suggest storing normalized versions of the searchable values in separate fields.
So, at insert time you strip all non-alpha characters from the data and store the result in a data_norm field for future searches.
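A minimal sketch of the normalize-at-insert idea, written in Python with an in-memory SQLite database standing in for MySQL. The table name people, the column name name_norm, and the normalize() helper are all hypothetical, and lower-casing before stripping is my own assumption on top of "strip all non-alpha characters":

```python
import re
import sqlite3


def normalize(name: str) -> str:
    """Lower-case the name and drop every character that is not a-z (assumed rule)."""
    return re.sub(r"[^a-z]", "", name.lower())


# SQLite is only a stand-in here; the same two-column layout applies to MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, name_norm TEXT)")

# Pay the normalization cost once, at insert time.
for raw in ["salil", "sal-il", "sa#lil", "bob"]:
    conn.execute(
        "INSERT INTO people (name, name_norm) VALUES (?, ?)",
        (raw, normalize(raw)),
    )

# Search against the normalized column, normalizing the search term the same way.
term = normalize("salil")
rows = conn.execute(
    "SELECT name FROM people WHERE name_norm LIKE ?", (f"%{term}%",)
).fetchall()
matches = {row[0] for row in rows}
```

The search term goes through the same normalize() call as the stored data, so both sides of the LIKE are compared in the same form.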

Since I know of no way to do this inside the query, I'd use a "calculated column" for it, i.e. a column which holds the value of name without the special characters. This way, the cost of the transformation is paid only once, and you can even create an index on the new column.
See this answer for how to do this.

I agree with Aaron and Col. Shrapnel that you should use an extra column on the table e.g. search_name to store a normalised version of the name.
I noticed that this question was originally tagged ruby-on-rails. If this is part of a Rails application then you can use a before_save callback to set the value of this field.

In MySQL 5.1 you can use REGEXP to do regular expression matching, like this:
SELECT * FROM foo WHERE bar REGEXP "baz"
See http://dev.mysql.com/doc/refman/5.1/en/regexp.html
However, take note that it will be slow, and you should do what the other posters suggested and store the clean value in a separate field.

Related

How can I replace text in a Siebel data mapping?

I have an outgoing web service to send data from Siebel 7.8 to an external system. In order for the integration to work, before I send the data, I must change one of the field values, replacing every occurrence of "old" with "new". How can I do this with EAI data mappings?
In an ideal world I would just use an integration source expression like Replace([Description], "old", "new"). However Siebel is far from ideal, and doesn't have a replace function (or if it does, it's not documented). I can use all the Siebel query language functions which don't need an execution context. I can also use the functions available for calculated fields (sane people could expect both lists to be the same, but Siebel documentation is also far from ideal).
My first attempt was to use the InvokeServiceMethod function and replace the text myself in eScript. So, this is my field map source expression:
InvokeServiceMethod('MyBS', 'MyReplace', 'In="' + [Description] + '"', 'Out')
After some configuration steps it works fine... except if my description field contains the " character: Error parsing expression 'In="This is a "test" with quotes"' for field '3' (SBL-DAT-00481)
I know why this happens. My double quotes are breaking the expression and I have to escape them by doubling the character, as in This is a ""test"" with quotes. However, how can I replace each " with "" in order to call my business service... if I don't have a replace function? :)
Oracle's support site has only one result for the SBL-DAT-00481 error, which, as a workaround, suggests placing the whole parameter inside double quotes (which I had already done). There's a linked document in which they acknowledge that the workaround is valid for a few characters such as commas or single quotes, but due to a bug in Siebel 7.7-7.8 (not present in 8.0+), it doesn't work with double quotes. They suggest instead passing the row id as the argument to the business service, and then retrieving the data directly from the BC.
Before I do that and end up with a performance-affecting workaround (pass only the ID) for the workaround (use double quotes) for the workaround (use InvokeServiceMethod) for not having a replace function... Am I going crazy here? Isn't there a simple way to do a simple text replacement in a Siebel data mapping?
The first thing that comes to mind (quite possibly far from optimal) is to create a calculated field on the source BC, say NEW_VALUE, which becomes "NEW" for every record where the original field has the value "OLD", and simply use this field in the integration map.

Wildcard character

I've a dataframe, and I'm trying to select columns with certain properties in the name.
One example (of many) is I want to select columns called "t*_b**" where * would be a wildcard character. This would select columns with names t1_b2, t2_b2, t3_b2 and t4_b2 (as well as several others like t1_b13, t2_b13 etc.).
If there is such a wildcard character I could use, I know that I could just use the following command:
grep("t*_b", names(df))
As opposed to doing:
c(grep("t1_b", names(df)), grep("t2_b", names(df)), grep("t3_b", names(df)), grep("t4_b", names(df)))
which is messier and harder to read.
Update: the first comment has resolved my issue. I don't have any real need for any further input, thanks for the help!
The wildcard 'character' in regular expressions is a period (.). As such, you could do
grep("t._b", names(df))
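To see why the period works where the asterisk does not, here is a small sketch in Python rather than R (the two regex engines agree on these simple patterns). The column names, including the extra x_b1, are made up for illustration:

```python
import re

# Hypothetical column names; "x_b1" is there to show an unwanted match.
columns = ["t1_b2", "t2_b2", "t3_b2", "t4_b2", "t1_b13", "x_b1", "id"]

# "." matches exactly one character: "t", any single character, then "_b".
dot_hits = [c for c in columns if re.search(r"t._b", c)]

# "*" means "zero or more of the preceding item", so "t*_b" is satisfied
# by a bare "_b" and picks up columns that do not start with "t" at all.
star_hits = [c for c in columns if re.search(r"t*_b", c)]
```

So in a regular expression the asterisk is a quantifier, not a shell-style wildcard, which is why the period is the one to use here.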

Is there a way to search terms in order with RegexpQuery in lucene?

I would like to search my indexed documents in order using RegexpQuery.
For example, I have 2 documents:
text: Oracle unveils better than expected quarterly results.
text: Research In Motion shares gained almost 13 per cent on the Toronto Stock Exchange Friday, a day after the smartphone maker posted better than expected quarterly results.
So far I have tried this, but with no luck:
Query regexq = new RegexpQuery(new Term("text", "^.+better.+quarterly.+results"));
Is there another way of implementing this?
Thanks
I believe a PhraseQuery fits what you are looking for better. You can use PhraseQuery.setSlop(int) to allow terms to appear between the terms of the query. This would look like:
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("text", "better"));
pq.add(new Term("text", "quarterly"));
pq.add(new Term("text", "results"));
pq.setSlop(10); //Or whatever is an appropriate slop value for you.
This sort of query is also supported by the standard QueryParser, as seen here, like:
text:"better quarterly results"~10
I think a PhraseQuery is most definitely the better implementation here, but...
Regarding RegexpQuery:
I believe it is intended to compare terms against the regex, and since the phrase you are searching for (I am assuming) is tokenized, no single Term matches your whole regex. You would need to index the entire field as a single Term to make this work, using StringField, KeywordAnalyzer, or similar.
I believe it works like Matcher.matches(), rather than Matcher.find(), which is to say, it must match the entire input term, rather than a portion of it. So, if you had specified "text" as a StringField, you would need to add a .* to the end to consume the rest of the input.
On a similar note, I'm not sure if it supports the use of the character "^" as the start of input, given that it is redundant in that case. I don't see it specified in Lucene's Regexp, but I have seen references to its use, so I'm not sure whether it would be accepted or not.
To summarize, a RegexpQuery could work like:
Query regexq = new RegexpQuery(new Term("text", ".+better.+quarterly.+results.*"));
if you used a StringField or KeywordAnalyzer to index the entire field as a single Term.
With the leading wildcard in your regexp, though, you could expect very poor performance from it (See the warning at the top of the RegexpQuery documentation).
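The matches()-versus-find() distinction can be illustrated outside Lucene. The sketch below uses Python's re module purely as an analogy (re.fullmatch plays the role of matches(), re.search the role of find()); it does not call Lucene, and the word-splitting line is only a crude stand-in for an analyzer:

```python
import re

text = "Oracle unveils better than expected quarterly results."
without_tail = r".+better.+quarterly.+results"
with_tail = r".+better.+quarterly.+results.*"

# find()-style: the pattern may match any portion of the input.
found_anywhere = re.search(without_tail, text) is not None

# matches()-style: the pattern must consume the whole input, so the
# trailing "." after "results" makes the first pattern fail.
whole_without_tail = re.fullmatch(without_tail, text) is not None
whole_with_tail = re.fullmatch(with_tail, text) is not None

# Crude stand-in for tokenization: no single word can satisfy the whole regex.
tokens = re.findall(r"\w+", text.lower())
any_token_matches = any(re.fullmatch(with_tail, t) for t in tokens)
```

Under whole-input matching, only the pattern with the trailing .* succeeds on the full sentence, and no individual token matches at all, which is consistent with the behaviour described above for a tokenized field.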

regular expression to extract insert sql statement from a text file and to check for hardcoded parameters

I have a bunch of sql statements updated by my team developers.
I intend to run a check before these statements are run against a db.
For example, I want to check whether a certain column is hardcoded instead of being fetched from the respective table (foreign key).
for example:
INSERT INTO [Term1] ([CreatedBy]
,[CreateUser]) values(1,'asdadad')
where 1 is hardcoded value.
Is there a regular expression that can extract all insert statements from the file so that they can be parsed?
I tried with this expression http://regexlib.com/REDetails.aspx?regexp_id=1750 but it did not work.
You may need to run a multi-level regex on this. First extract the entire parameter string from the whole query, then parse each individual field out of that parameter string, ignoring all the other characters that may come up.
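A rough Python sketch of the two-level approach. It assumes simple statements only: no parentheses or commas inside the column list or inside any value, and a hardcoded value is taken to mean a bare integer literal. The function name hardcoded_columns, the watched-column set, and the second sample statement are all hypothetical:

```python
import re

# Level 1: capture the table, the column list and the value list of each INSERT.
# Assumes neither list contains a ")" of its own.
INSERT_RE = re.compile(
    r"INSERT\s+INTO\s+(\[?\w+\]?)\s*\(([^)]*)\)\s*VALUES\s*\(([^)]*)\)",
    re.IGNORECASE,
)


def hardcoded_columns(sql, watched):
    """Return (table, column, value) for watched columns given an integer literal."""
    findings = []
    for table, cols, vals in INSERT_RE.findall(sql):
        # Level 2: split the captured lists into individual fields.
        names = [c.strip().strip("[]") for c in cols.split(",")]
        values = [v.strip() for v in vals.split(",")]
        for name, value in zip(names, values):
            if name in watched and re.fullmatch(r"-?\d+", value):
                findings.append((table.strip("[]"), name, value))
    return findings


sample = """
INSERT INTO [Term1] ([CreatedBy]
,[CreateUser]) values(1,'asdadad')
INSERT INTO [Term1] ([CreatedBy],[CreateUser]) values(@UserId,'bob')
"""
report = hardcoded_columns(sample, {"CreatedBy"})
```

Splitting on commas breaks as soon as a string value contains a comma or a value is a subquery, so anything beyond simple scripts probably needs a real SQL parser rather than regexes.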

Using a RegEx in a SQL Query

Here's the situation I'm in: We have a field in our database that contains a 3 digit number, surrounded by some text. This number is actually a PK in another table, and I need to extract this out so I can implement a proper FK relationship. Here's an example of what would currently reside in the column:
Some Text Goes Here - (305) Followed By Some More Text
So, what I'm looking to do is extract the '305' from the column, and hopefully end up with a result that looks something like this (pseudo code)
SELECT
<My Extracted Value>,
Original Column Text,
Id
FROM dbo.MyTable
It seems to me that using a Regex match in my query is the most effective way to do this. Can anybody point me in the right direction?
EDIT: We're using SQL Server 2005
RegExp in SQL is defined by a SQL-Standard but most databases implemented their own syntax, you should tell us the product name of your RDBMS ;)
This is based on Pranay's first answer that has since been changed.
DECLARE @NumStr varchar(1000)
SET @NumStr = 'Some Text Goes Here - (305) Followed By Some More Text';
SELECT SUBSTRING(@NumStr,PATINDEX('%[0-9][0-9][0-9]%',@NumStr),3)
Returns 305
Microsoft seems to suggest using a CLR assembly to do Regex pattern matching in SQL Server 2005.
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
Apart from LIKE (which is not going to solve your problem) I don't know of a built-in pattern matching functionality in SQL Server 2005 (that is, more advanced than simple string searches).
Just after I implemented a solution in Postgres, I saw that you are using SQL Server... Just for the record, then, here it is with a regex that extracts the data in parentheses.
PostgreSQL solution:
create table main(id text not null);
insert into main values('some text (44) other text');
insert into main values('and more text (78) and even more');
select substring(id from '\\(([^\\(]+)\\)') from main;
The only way to access RegEx-type functions in SQL 2005 (and probably 2008) is by writing (or downloading) and using CLR functions.
If all the strings are always formatted in such a way that you can identify the specific numbers you want, you can do something like the following. This is based on the (big) assumption that the first set of parentheses found in the string contains the number that you want.
/*
CREATE TABLE MyTable
(
MyText varchar(500) not null
)
INSERT MyTable values ('Some Text Goes Here - (305) Followed By Some More Text')
*/
SELECT
MyText -- String
,charindex('(', MyText) -- Where's the open parenthesis
,charindex(')', MyText) -- Where's the closed parenthesis
,substring(MyText
,charindex('(', MyText) + 1
,charindex(')', MyText) - charindex('(', MyText) - 1) -- Glom it all together
from MyTable
Awkward as heck (because SQL has a pathetically limited set of string manipulation functions), but it works.
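For comparison, if the extraction can be done outside the database (say, in a one-off migration script), a regex makes it a one-liner. This is only a Python sketch; the function name extract_code is made up, and it assumes the key is always exactly three digits wrapped in parentheses:

```python
import re


def extract_code(text):
    """Return the three digits inside the first "(ddd)" group, or None if absent."""
    match = re.search(r"\((\d{3})\)", text)
    return match.group(1) if match else None


code = extract_code("Some Text Goes Here - (305) Followed By Some More Text")
```

Unlike the PATINDEX approach, this requires the surrounding parentheses, so a stray three-digit number elsewhere in the text is not picked up.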