Query for only integers Rails 2.3 - regex

My environment is Ruby 1.8.7-p358 with Rails 2.3.11. I am trying to query for all accounts that have a first_name containing only numbers (they are all 3 digit integers, i.e. 001, 143, 815, etc.). I have been trying to use this query:
Account.find(:all, :conditions => ["first_name LIKE ?", "[^0-9][^0-9][^0-9]"])
All I get in return is => [].
If I use the same query only with NOT LIKE I get all accounts, including the ones where first_name is an integer.
I've also tried using:
Account.find(:all, :conditions => ["first_name REGEXP ?", "[^0-9][^0-9][^0-9]"])
but that only gives me:
ActiveRecord::StatementInvalid: PGError: ERROR: syntax error at or near "REGEXP"
LINE 1: SELECT * FROM "accounts" WHERE (first_name REGEXP '[0-9][0-9...
^
: SELECT * FROM "accounts" WHERE (first_name REGEXP '[0-9][0-9][0-9]') ORDER BY first_name ASC
from /Users/kyle/.rvm/gems/ruby-1.8.7-p358#popcorn_recruiter_portal/gems/activerecord-2.3.11/lib/active_record/connection_adapters/abstract_adapter.rb:227:in `log'
from /Users/kyle/.rvm/gems/ruby-1.8.7-p358#popcorn_recruiter_portal/gems/activerecord-2.3.11/lib/active_record/connection_adapters/postgresql_adapter.rb:520:in `execute'
from /Users/kyle/.rvm/gems/ruby-1.8.7-p358#popcorn_recruiter_portal/gems/activerecord-2.3.11/lib/active_record/connection_adapters/postgresql_adapter.rb:1002:in `select_raw'
from /Users/kyle/.rvm/gems/ruby-1.8.7-p358#popcorn_recruiter_portal/gems/activerecord-2.3.11/lib/active_record/connection_adapters/postgresql_adapter.rb:989:in `select'
from /Users/kyle/.rvm/gems/ruby-1.8.7-p358#popcorn_recruiter_portal/gems/activerecord-2.3.11/lib/active_record/connection_adapters/abstract/database_statements.rb:7:in `select_all_without_query_cache'
from /Users/kyle/.rvm/gems/ruby-1.8.7-p358#popcorn_recruiter_portal/gems/activerecord-2.3.11/lib/active_record/connection_adapters/abstract/query_cache.rb:62:in `select_all'
from /Users/kyle/.rvm/gems/ruby-1.8.7-p358#popcorn_recruiter_portal/gems/activerecord-2.3.11/lib/active_record/base.rb:665:in `find_by_sql'
from /Users/kyle/.rvm/gems/ruby-1.8.7-p358#popcorn_recruiter_portal/gems/activerecord-2.3.11/lib/active_record/base.rb:1582:in `find_every'
from /Users/kyle/.rvm/gems/ruby-1.8.7-p358#popcorn_recruiter_portal/gems/activerecord-2.3.11/lib/active_record/base.rb:619:in `find'
from (irb):189
from :0
Are regular expressions not allowed? How am I able to find each account with a first_name that's equal to a three digit integer?

The like operator does not evaluate regular expressions. In instead you can use both similar to:
select '980' similar to '[0-9][0-9][0-9]';
?column?
----------
t
Or ~:
select '980' ~ '[0-9][0-9][0-9]';
?column?
----------
t
(1 row)

Related

Regular Expression: changing matching method from OR to AND

I have a regular expression like the following: (Running on Oracle's regexp_like(), despite the question isn't Oracle-specific)
abc|bcd|def|xyz
This basically matches a tags field on database to see if tags field contains abc OR bcd OR def OR xyz when user has input for the search query "abc bcd def xyz".
The tags field on the database holds keywords separated by spaces, e.g. "cdefg abcd xyz"
On Oracle, this would be something like:
select ... from ... where
regexp_like(tags, 'abc|bcd|def|xyz');
It works fine as it is, but I want to add an extra option for users to search for results that match all keywords. How should I change the regular expression so that it matches abc AND bcd AND def AND xyz ?
Note: Because I won't know what exact keywords the user will enter, I can't pre-structure the query in the PL/SQL like this:
select ... from ... where
tags like '%abc%' AND
tags like '%bcd%' AND
tags like '%def%' AND
tags like '%xyz%';
You can split the input pattern and check that all the parts of the pattern match:
SELECT t.*
FROM table_name t
CROSS APPLY(
WITH input (match) AS (
SELECT 'abc bcd def xyz' FROM DUAL
)
SELECT 1
FROM input
CONNECT BY LEVEL <= REGEXP_COUNT(match, '\S+')
HAVING COUNT(
REGEXP_SUBSTR(
t.tags,
REGEXP_SUBSTR(match, '\S+', 1, LEVEL)
)
) = REGEXP_COUNT(match, '\S+')
)
Or, if you have Java enabled in the database then you can create a Java function to match regular expressions:
CREATE AND COMPILE JAVA SOURCE NAMED RegexParser AS
import java.util.regex.Pattern;
public class RegexpMatch {
public static int match(
final String value,
final String regex
){
final Pattern pattern = Pattern.compile(regex);
return pattern.matcher(value).matches() ? 1 : 0;
}
}
/
Then wrap it in an SQL function:
CREATE FUNCTION regexp_java_match(value IN VARCHAR2, regex IN VARCHAR2) RETURN NUMBER
AS LANGUAGE JAVA NAME 'RegexpMatch.match( java.lang.String, java.lang.String ) return int';
/
Then use it in SQL:
SELECT *
FROM table_name
WHERE regexp_java_match(tags, '(?=.*abc)(?=.*bcd)(?=.*def)(?=.*xyz)') = 1;
Try this, the idea being counting that the number of matches is == to the number of patterns:
with data(val) AS (
select 'cdefg abcd xyz' from dual union all
select 'cba lmnop xyz' from dual
),
targets(s) as (
select regexp_substr('abc bcd def xyz', '[^ ]+', 1, LEVEL) from dual
connect by regexp_substr('abc bcd def xyz', '[^ ]+', 1, LEVEL) is not null
)
select val from data d
join targets t on
regexp_like(val,s)
group by val having(count(*) = (select count(*) from targets))
;
Result:
cdefg abcd xyz
I think dynamic SQL will be needed for this. The match all option will require individual matching with logic to ensure every individual match is found.
An easy way would be to build a join condition for each keyword. Concatenate the join statements in a string. Use dynamic SQL to execute the string as a query.
The example below uses the customer table from the sample schemas provided by Oracle.
DECLARE
-- match string should be just the values to match with spaces in between
p_match_string VARCHAR2(200) := 'abc bcd def xyz';
-- need logic to determine match one (OR) versus match all (AND)
p_match_type VARCHAR2(3) := 'OR';
l_sql_statement VARCHAR2(4000);
-- create type if bulk collect is needed
TYPE t_email_address_tab IS TABLE OF customers.EMAIL_ADDRESS%TYPE INDEX BY PLS_INTEGER;
l_email_address_tab t_email_address_tab;
BEGIN
WITH sql_clauses(row_idx,sql_text) AS
(SELECT 0 row_idx -- build select plus beginning of where clause
,'SELECT email_address '
|| 'FROM customers '
|| 'WHERE 1 = '
|| DECODE(p_match_type, 'AND', '1', '0') sql_text
FROM DUAL
UNION
SELECT LEVEL row_idx -- build joins for each keyword
,DECODE(p_match_type, 'AND', ' AND ', ' OR ')
|| 'email_address'
|| ' LIKE ''%'
|| REGEXP_SUBSTR( p_match_string,'[^ ]+',1,level)
|| '%''' sql_text
FROM DUAL
CONNECT BY LEVEL <= LENGTH(p_match_string) - LENGTH(REPLACE( p_match_string, ' ' )) + 1
)
-- put it all together by row_idx
SELECT LISTAGG(sql_text, '') WITHIN GROUP (ORDER BY row_idx)
INTO l_sql_statement
FROM sql_clauses;
dbms_output.put_line(l_sql_statement);
-- can use execute immediate (or ref cursor) for dynamic sql
EXECUTE IMMEDIATE l_sql_statement
BULK COLLECT
INTO l_email_address_tab;
END;
Variable
Value
p_match_string
abc bcd def xyz
p_match_type
AND
l_sql_statement
SELECT email_address FROM customers WHERE 1 = 1 AND email_address LIKE '%abc%' AND email_address LIKE '%bcd%' AND email_address LIKE '%def%' AND email_address LIKE '%xyz%'
Variable
Value
p_match_string
abc bcd def xyz
p_match_type
OR
l_sql_statement
SELECT email_address FROM customers WHERE 1 = 0 OR email_address LIKE '%abc%' OR email_address LIKE '%bcd%' OR email_address LIKE '%def%' OR email_address LIKE '%xyz%'

Python exclude directory with fnmatch

I'm working with some legacy code that I can't change (for reasons).
It uses fnmatch.fnmatch to filter a list of paths, like so (simplified):
import fnmatch
paths = ['a/x.txt', 'b/y.txt']
for path in paths:
if fnmatch.fnmatch(path, '*.txt'):
print 'do things'
Via configuration I am able to change the pattern used to match the files. I need to exclude everything in b/, is that possible?
From reading the docs (https://docs.python.org/2/library/fnmatch.html) it does not appear to be, but I thought asking was worth a try.
From the fnmatch.fnmatch documentation:
Patterns are Unix shell style:
* matches everything
? matches any single character
[seq] matches any character in seq
[!seq] matches any char not in seq
When I run:
for path in paths:
if fnmatch.fnmatch(path, '[!b]*'):
print path
I get:
a/x.txt
Somehow this method works for alphabet just after "!'
for example in my case from the list col_names
['# Spec No', 'Name', 'Date (DD/MM/YYYY)', 'Time (hh:mm:ss)', 'Year',
'Fractional day', 'Fractional time', 'Scans', 'Tint', 'SZA',
'NO2_UV.RMS', 'NO2_UV.RefZm', 'NO2_UV.RefNumber', 'NO2_UV.SlCol(bro)',
'NO2_UV.SlErr(bro)', 'NO2_UV.SlCol(ring)', 'NO2_UV.SlErr(ring)',
'NO2_UV.SlCol(HCHO)', 'NO2_UV.SlErr(HCHO)', 'NO2_UV.SlCol(O4)',
'NO2_UV.SlErr(O4)', 'NO2_UV.SlCol(O3a)', 'NO2_UV.SlErr(O3a)',
'NO2_UV.SlCol(O3223k)', 'NO2_UV.SlErr(O3223k)', 'NO2_UV.SlCol(NO2)',
'NO2_UV.SlErr(NO2)', 'NO2_UV.SlCol(no2a)', 'NO2_UV.SlErr(no2a)',
'NO2_UV.Offset (Constant)', 'NO2_UV.Err(Offset (Constant))',
'NO2_UV.Offset (Order 1)', 'NO2_UV.Err(Offset (Order 1))',
'NO2_UV.Shift(Spectrum)', 'NO2_UV.Stretch(Spectrum)1',
'NO2_UV.Stretch(Spectrum)2', 'HCHO.RMS', 'HCHO.RefZm', 'HCHO.RefNumber',
'HCHO.SlCol(bro)', 'HCHO.SlErr(bro)', 'HCHO.SlCol(ring)',
'HCHO.SlErr(ring)', 'HCHO.SlCol(HCHO)', 'HCHO.SlErr(HCHO)',
'HCHO.SlCol(O4)', 'HCHO.SlErr(O4)', 'HCHO.SlCol(O3a)',
'HCHO.SlErr(O3a)', 'HCHO.SlCol(O3223k)', 'HCHO.SlErr(O3223k)',
'HCHO.SlCol(NO2)', 'HCHO.SlErr(NO2)', 'HCHO.Offset (Constant)',
'HCHO.Err(Offset (Constant))', 'HCHO.Offset (Order 1)',
'HCHO.Err(Offset (Order 1))', 'HCHO.Shift(Spectrum)',
'HCHO.Stretch(Spectrum)1', 'HCHO.Stretch(Spectrum)2', 'Fluxes 318',
'Fluxes 330', 'Fluxes 390', 'Fluxes 440']
I wanted to search all the names that did not contain NO2_UV.
If I do
header_hcho = fnmatch.filter(col_names, '[!NO2_UV.]*');
it excludes the second element that is "Name"., because it starts with N. And the result is the same as if i do
header_hcho = fnmatch.filter(col_names, '[!N]*');
So, I went by rather an old-school method
header_hcho = []
idx=0
for idx in range(0, len(col_names)):
if col_names[idx].find("NO2_UV") == -1:
header_hcho.append(col_names[idx])
idx=idx+1

How to use regular expressions properly on a SQL files?

I have a lot of undocumented and uncommented SQL queries. I would like to extract some information within the SQL-statements. Particularly, I'm interested in DB-names, table names and if possible column names. The queries have usually the following syntax.
SELECT *
FROM mydb.table1 m
LEFT JOIN mydb.sometable o ON m.id = o.id
LEFT JOIN mydb.sometable t ON p.id=t.id
LEFT JOIN otherdb.sometable s ON s.column='test'
Usually, the statements involes several DBs and Tables. I would like only extract DBs and Tables with any other information. I thought if whether it is possible to extract first the information which begins after FROM & JOIN & LEFT JOIN. Here its usually db.table letters such as o t s correspond already to referenced tables. I suppose they are difficult to capture. What I tried without any success is to use something like:
gsub(".*FROM \\s*|WHERE|ORDER|GROUP.*", "", vec)
Assuming that each statement ends with WHERE/where or ORDER/order or GROUP... But that doesnt work out as expected.
You haven't indicated which database system you are using but virtually all such systems have introspection facilities that would allow you to get this information a lot more easily and reliably than attempting to parse SQL statements. The following code which supposes SQLite can likely be adapted to your situation by getting a list of your databases and then looping over the databases and using dbConnect to connect to each one in turn running code such as this:
library(gsubfn)
library(RSQLite)
con <- dbConnect(SQLite()) # use in memory database for testing
# create two tables for purposes of this test
dbWriteTable(con, "BOD", BOD, row.names = FALSE)
dbWriteTable(con, "iris", iris, row.names = FALSE)
# get all table names and columns
tabinfo <- Map(function(tab) names(fn$dbGetQuery(con, "select * from $tab limit 0")),
dbListTables(con))
dbDisconnect(con)
giving an R list whose names are the table names and whose entries are the column names:
> tabinfo
$BOD
[1] "Time" "demand"
$iris
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
or perhaps long form output is preferred:
setNames(stack(tabinfo), c("column", "table"))
giving:
column table
1 Time BOD
2 demand BOD
3 Sepal.Length iris
4 Sepal.Width iris
5 Petal.Length iris
6 Petal.Width iris
7 Species iris
You could use the stringi package for this.
library(stringi)
# Your string vector
myString <- "SELECT *
FROM mydb.table1 m
LEFT JOIN mydb.sometable o ON m.id = o.id
LEFT JOIN mydb.sometable t ON p.id=t.id
LEFT JOIN otherdb.sometable s ON s.column='test'"
# Three stringi functions used
# stringi_extract_all_regex will extract the strings which have FROM or JOIN followed by some text till the next space
# string_replace_all_regex will replace all the FROM or JOIN followed by space with null string
# stringi_unique will extract all unique strings
t <- stri_unique(stri_replace_all_regex(stri_extract_all_regex(myString, "((FROM|JOIN) [^\\s]+)", simplify = TRUE),
"(FROM|JOIN) ", ""))
> t
[1] "mydb.table1" "mydb.sometable" "otherdb.sometable"

python replace string function throws asterix wildcard error

When i use * i receive the error
raise error, v # invalid expression
error: nothing to repeat
other wildcard characters such as ^ work fine.
the line of code:
df.columns = df.columns.str.replace('*agriculture', 'agri')
am using pandas and python
edit:
when I try using / to escape, the wildcard does not work as i intend
In[44]df = pd.DataFrame(columns=['agriculture', 'dfad agriculture df'])
In[45]df
Out[45]:
Empty DataFrame
Columns: [agriculture, dfad agriculture df]
Index: []
in[46]df.columns.str.replace('/*agriculture*','agri')
Out[46]: Index([u'agri', u'dfad agri df'], dtype='object')
I thought the wildcard should output Index([u'agri', u'agri'], dtype='object)
edit:
I am currently using hierarchical columns and would like to only replace agri for that specific level (level = 2).
original:
df.columns[0] = ('grand total', '2005', 'agriculture')
df.columns[1] = ('grand total', '2005', 'other')
desired:
df.columns[0] = ('grand total', '2005', 'agri')
df.columns[1] = ('grand total', '2005', 'other')
I'm looking at this link right now: Changing columns names in Pandas with hierarchical columns
and that author says it will get easier at 0.15.0 so I am hoping there are more recent updated solutions
You need to the asterisk * at the end in order to match the string 0 or more times, see the docs:
In [287]:
df = pd.DataFrame(columns=['agriculture'])
df
Out[287]:
Empty DataFrame
Columns: [agriculture]
Index: []
In [289]:
df.columns.str.replace('agriculture*', 'agri')
Out[289]:
Index(['agri'], dtype='object')
EDIT
Based on your new and actual requirements, you can use str.contains to find matches and then use this to build a dict to map the old against new names and then call rename:
In [307]:
matching_cols = df.columns[df.columns.str.contains('agriculture')]
df.rename(columns = dict(zip(matching_cols, ['agri'] * len(matching_cols))))
Out[307]:
Empty DataFrame
Columns: [agri, agri]
Index: []

Check if a string value matches up with the content of an existing table in Oracle 11G

At the moment I am not working as efficient as I could be. For the problem I have I almost know certain that there is a smarter and better way to fix it.
What I am trying to do:
I got a string like this:
'NL 4633 4809 KTU'
The NL is a country code from an existing table and KTU is an university code from an existing table. I need to put this string in my function and check if the string is validated.
In my function (to validate the string) this is what I am working on. I have managed to split up the string with this:
countryCode := checkISIN; -- checkISIN is the full string ('NL 4633 4809 KTU') and I am giving the NL value to this variable. countryCode is the type varchar2(50)
countryCode := regexp_substr(countryCode, '[^ ]+', 1, 1);
Now that I have the country code as shown below:
NL
Has valid country code
I want to validate/check the country code for it's existence from it's own table. I tried this:
if countryCode in ('NL', 'FR', 'DE', 'GB', 'BE', 'US', 'CA')
then dbms_output.put_line('Has valid country code');
else
dbms_output.put_line('Has invald country code. Change the country code to a valid one');
end if;
This works, but it's not dynamically. If someone adds a country code then I have to change the function again.
So is there a (smart/dynamically) way to check the country codes for their existing tables?
I hope my question is not too vague
Cheers
If you have Country codes table and it looks like this:
ID | NAME
----------
1 | NL
2 | FR
3 | BE
when you parse string, you can make like this :
select count(1)
into v_quan
from CountryCodes cc
where nvl(countryCode,'') = cc.name
if v_quan > 0 then
dbms_output.put_line('Has valid country code');
else
dbms_output.put_line('Has invald country code. Change the country code to a valid one');
end if;