Shorter REGEXP for MySQL query - regex

I want to do a MySQL query to get the following effect:
table_column [varchar]
-----------------------
1|5|7
25
55|12
5
3&5
5|11
I want a reliable way to get all the values where 5 is the complete value.
So, for example, if I run a REGEXP query for the number 5 on the table above, I would like to get all rows except the ones containing "25" and "55|12".
This is the best I've come up with so far:
[^[:digit:]]5[^[:digit:]] | [^[:digit:]]5 | 5[^[:digit:]] | ^5$
Is there a shorter way?
Thanks.

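Try using word boundaries (these markers work with the older MySQL regex library; in MySQL 8.0.4 and later, which uses ICU regular expressions, the equivalent marker is \b):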
[[:<:]]5[[:>:]]

(^|[^[:digit:]])5([^[:digit:]]|$)
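
For comparison, the same "5 as a complete value" test can be sketched outside MySQL. The Python snippet below is only an illustration: the rows list copies the sample column from the question, and the has_whole_number helper is a made-up name. It uses lookarounds, which Python's re module supports, to require that no digit sits directly before or after the number.

```python
import re

# Sample values copied from the question's table_column.
rows = ["1|5|7", "25", "55|12", "5", "3&5", "5|11"]

def has_whole_number(value, number):
    """Return True if `number` occurs in `value` with no digit directly next to it."""
    pattern = rf"(?<!\d){number}(?!\d)"
    return re.search(pattern, value) is not None

matches = [row for row in rows if has_whole_number(row, 5)]
print(matches)  # ['1|5|7', '5', '3&5', '5|11']
```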


Regex - finding optional arguments

I would like to write a regex to extract arguments and operation sign if given or just extract a value in a given formula.
"=400/500" will find 400, /, and 500
"=400" will find 400
So far I have tried a group-matching approach with a regex like the following:
=(.*)(/|\*|\+|-)(.*)
However, that does not work in all cases. For example, I get the following:
"=400/500" will find 400, /, and 500 which is exactly what I need
"=400" does not find any matches and I expect to get 400
I tried some modifications to my script but so far without any success.
Thanks for your help in advance!
Try
=(.*)(\/)(.*)|=(.*)
if you are going with any-character wildcards and a "/" delimiter.
Try this simple one; I hope it helps. You can add () in different places to capture the matches in separate groups.
Regex: \d+[+*\/-]?\d+
1. [+*\/-]? matches any one of the operators +, -, / and *; the ? makes it optional.
2. \d+ matches one or more digits.
Do you mean something like this? I assumed that the operands are digits only.
=\d+[+\-*\/]?\d+
If you would like to use grouping:
=(\d+)([+\-*\/])?(\d+)
Remove the = if you don't want to match it.
Thanks to (.*) who responded :-)
Your answers were very quick and insightful. I was not 100% clear in my original question, hence people provided different solutions. I wanted to use regex group matching. The expressions are mainly formulae or assignment operations. For example:
=500 // want to get 500 and understand there is no operator
=500+34 // want to get 500 and 34 and understand the operation is addition
=500/100 // want to get 500 and 100 and understand the operation is division
=500-100 //...
=500*100
That way I can easily extract the arguments and the operator, or just a value when there is no operator. I settled on a modified version of Sahil's answer. I am now using the following:
=(\d+)([+*\/-])?(\d+)?
These are results for the following inputs:
Input string: "=400/500" Result: 0: [0,8] =400/500
1: [1,4] 400
2: [4,5] /
3: [5,8] 500
Input string: "=500" Result: 0: [0,4] =500
1: [1,4] 500
2: [-1,-1] null
3: [-1,-1] null
Input string: "=400/" Result: 0: [0,5] =400/
1: [1,4] 400
2: [4,5] /
3: [-1,-1] null
Used this way, the number of groups that matched tells me which type of formula was entered, so I can extract all the values from the groups.
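
As a rough sketch of how the groups might be consumed, here is the final pattern in Python. The parse_formula name is made up for this example, and the use of fullmatch (which rejects trailing text) is an assumption; the results above do not say which engine or matching mode was used.

```python
import re

# The pattern from the post; "/" needs no escaping in Python.
FORMULA = re.compile(r"=(\d+)([+*/-])?(\d+)?")

def parse_formula(text):
    """Return (left, operator, right); groups that did not match are None."""
    match = FORMULA.fullmatch(text)
    if match is None:
        return None  # not a formula of the expected shape
    return match.groups()

print(parse_formula("=400/500"))  # ('400', '/', '500')
print(parse_formula("=500"))      # ('500', None, None)
print(parse_formula("=400/"))     # ('400', '/', None)
```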

How to do a fast regex search on an HDF5 database

I have an HDF5 database with 100 million+ rows of text each storing a simple three column set of values:
ID WORD HEADWORD
1 the the
2 cats cat
3 sat sit
4 on on
5 the the
6 mats mat
...
I want to search the "WORD" column to find all hits for "at" (i.e., 'cats', 'sat', 'mats').
In some other database (e.g. PostgreSQL) I might do this with a simple regex search '?at?'. If I could search the HDF5 index using regex, that would be fine, but I don't think this is possible. Any suggestions for how to do this kind of 'wildcard' (regex) search quickly?
Try the following regex:
[^\s]+[\s]+([a-zA-Z]*at[a-zA-Z]*)[\s]+[^\s]+
Group 1 in the above regex will give you the desired result.
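
A small sketch of that regex applied to the sample rows, assuming each row is available as one whitespace-separated text line (the lines list below is made up from the question's example). Note that this is a plain linear scan in Python and does not use any HDF5 index, so it only illustrates the pattern, not a fast lookup.

```python
import re

# The regex from the answer: ID, then WORD (captured), then HEADWORD.
ROW = re.compile(r"[^\s]+[\s]+([a-zA-Z]*at[a-zA-Z]*)[\s]+[^\s]+")

# Sample rows from the question, one text line per row (an assumption).
lines = ["1 the the", "2 cats cat", "3 sat sit", "4 on on", "5 the the", "6 mats mat"]

hits = []
for line in lines:
    match = ROW.search(line)
    if match:
        hits.append(match.group(1))  # group 1 is the WORD column

print(hits)  # ['cats', 'sat', 'mats']
```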

R: grepl select first character of a string

I apologize in advance; this might be a repeat question. However, I just spent the last two hours on Stack Overflow and can't seem to find a solution.
I want to use grepl to detect rows that begin with a digit. This is what I tried, but it didn't give me the right answer:
grep.numeric=as.data.frame(grepl("^[:digit:]",df_mod$name))
I guess that the problem is from the regular expression "^[:digit:]", but I couldn't figure it out.
UPDATE
My dataframe looks like this. It's huge, but below is an example:
ID mark name
1 whatever name product
2 whatever 10 product
3 whatever 250 product
4 another_mark other product
I want to detect products whose names begin with a number.
UPDATE 2
Applying grep.numeric=grepl("^[[:digit:]]",df_mod$name) to the example above gives me the right answer, which is:
grep.numeric
[1] FALSE TRUE TRUE FALSE
But what drives me crazy is that when I apply this function to my real dataframe:
grep.numeric=grepl("^[[:digit:]]",df_mod[217,]$nom)
it gives me this result:
grep.numeric
[1] FALSE
But actually, what I have is this:
df_mod[217,]$nom
[1] 100 lipo 30 gélules
Please help me.
Apparently, some of your values have leading spaces, so you could either modify your regex to this (or something similar):
grepl("^\\s*[[:digit:]]", df_mod$name)
Or use the built-in trimws function:
grepl("^[[:digit:]]", trimws(df_mod$name))
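
The effect of a leading space can be shown with a small Python sketch. The names list is made up for illustration, and the space in front of "100 lipo 30 gélules" is an assumption, since the question never confirms what the stored value actually contains.

```python
import re

# Hypothetical values; the third one starts with a space (an assumption).
names = ["name product", "10 product", " 100 lipo 30 gélules", "other product"]

# re.match anchors at the start of the string, like ^ in the R pattern.
strict = [bool(re.match(r"\d", n)) for n in names]
lenient = [bool(re.match(r"\s*\d", n)) for n in names]       # allow leading whitespace
trimmed = [bool(re.match(r"\d", n.strip())) for n in names]  # trim first, like trimws

print(strict)   # [False, True, False, False]
print(lenient)  # [False, True, True, False]
print(trimmed)  # [False, True, True, False]
```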

Pattern Matching in MS Access: Is there an "or" operator?

I've tried searching online for the answer to this, but my Google-fu has failed me.
I have an Access database containing records represented by a string. The first 3 characters of that string are a 3-digit representation of the 366-day calendar date on which the record was created (000-366...yes, leap days count).
I'm having trouble coming up with the correct pattern match to include in a query that matches a 3-digit substring that can be between 000 and 366, where you don't lose the significant figures.
I know the query would be something like:
SELECT * FROM myTable WHERE Field1 LIKE "^[0-2]## or 3[0-5]# or 36[0-6]*";
...but I can't find any resource that says, in MS Access, what the "or" operator is. I tried "||" (double pipe) and "|" (single pipe), neither of which worked.
Is there an "or" operator that can be used with a MS Access pattern match?
The LIKE operator in Access is pretty limited, and doesn't support most of the features more 'fully-fledged' regular expression engines provide.
Instead, use multiple conditions in your WHERE clause like this:
SELECT *
FROM myTable
WHERE Field1 LIKE "[0-2]##*" OR
Field1 LIKE "3[0-5]#*" OR
Field1 LIKE "36[0-6]*"
Another alternative is to simply extract the first 3 characters to a string, convert them to an integer and test to see if their value is within the acceptable range.
Why not just pull the first three characters?
SELECT * FROM myTable WHERE CInt(Left(Field1,3)) <= 366
http://www.techonthenet.com/access/functions/datatype/cint.php
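
Both approaches can be sketched in Python for comparison. This is only an illustration: the function names are made up, and a real regex alternation stands in for the three LIKE conditions, which Access itself cannot combine into one pattern.

```python
import re

# The three LIKE patterns combined with a regex "or" (|).
DAY_PREFIX = re.compile(r"(?:[0-2]\d\d|3[0-5]\d|36[0-6])")

def prefix_matches(record):
    """Pattern approach: the first three characters form a value from 000 to 366."""
    return DAY_PREFIX.match(record) is not None

def prefix_in_range(record):
    """Numeric approach: take the first three characters and compare as an integer."""
    head = record[:3]
    return len(head) == 3 and head.isdecimal() and int(head) <= 366

for record in ["000X", "299ABC", "366ABC", "367ABC", "12"]:
    print(record, prefix_matches(record), prefix_in_range(record))
```

The explicit length and digit checks in the numeric version guard against short or non-numeric prefixes, which a bare integer conversion would not handle.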

Using regexp with sphinx

I need to write an algorithm that lets me do a fuzzy (regexp) search in Sphinx.
For example, I need to find a phrase that contains uncertain symbols: "2x4" may look like "2x4" or "2*4" or "2-4".
I want to do something like this: "2(x|*|-)4". But if I try to use this construction in a query, Sphinx splits it into three words: "2", "(x|*|-)" and "4":
$ search -p "2x4"
...
index 'xxx': query '2x4 ': returned 25 matches of 25 total in 0.000 sec
...
words:
1. '2x4': 25 documents, 25 hits
$ search -p "2(x|y)4"
...
index 'xxx': query '2(x|y)4 ': returned 0 matches of 0 total in 0.000 sec
words:
1. '2': 816 documents, 842 hits
2. 'x': 21 documents, 21 hits
3. 'y': 0 documents, 0 hits
4. '4': 2953 documents, 3014 hits
As an ugly hack I can do something like (2x4)|(2*4)|(2-4), but this is not a good solution if I get a longer phrase like "2x4x2.2" and need "2(x|*|-)4(x|*|-)2(.|,)2".
I can use the "charset_table" option to define "*>x", "->x", ",>." and so on, but this is not a flexible solution.
Can you suggest a better solution?
ps: sorry for my english =)
From what I've read, Sphinx doesn't support regex searches. Moreover, while the extended syntax (enabled with the -e option) has operators that support alternatives (the "OR" operator: |) and sequencing (the strict order operator: <<), they only work on words, not atoms, so that 2 << (x|*|-) << 4 would match strings where each element is a separate word, such as '2 x 4', '2 * 4'.
One option is to write a utility that converts a pattern of the form 2(x|*|-)4(x|*|-)2(.|,)2 (or, to follow the regex idiom, 2[-*x]4[-*x]2[.,]2) into a Sphinx extended query.
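
A minimal sketch of such a utility in Python, assuming the pattern contains only literal characters and simple [...] character classes. The function names are hypothetical, and the output format simply mirrors the (2x4)|(2*4)|(2-4) form from the question; whether Sphinx indexes characters such as * and - still depends on the index's charset_table.

```python
from itertools import product

def expand_pattern(pattern):
    """Expand literals and [...] character classes into every concrete string."""
    choices = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "[":
            end = pattern.index("]", i)               # closing bracket of the class
            choices.append(list(pattern[i + 1:end]))  # one option per character
            i = end + 1
        else:
            choices.append([pattern[i]])              # a literal has a single option
            i += 1
    return ["".join(combo) for combo in product(*choices)]

def to_or_query(pattern):
    """Join all expansions with | in the style used in the question."""
    return "|".join(f"({variant})" for variant in expand_pattern(pattern))

print(to_or_query("2[-*x]4"))                   # (2-4)|(2*4)|(2x4)
print(len(expand_pattern("2[-*x]4[-*x]2[.,]2")))  # 18
```

The number of alternatives grows multiplicatively with each class, so this only stays practical for short patterns.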
You can indeed use regular expressions with Sphinx.
While they cannot be used at search time, they can be used while building the index to identify a group of words/symbols that should be considered to be the same token.
http://sphinxsearch.com/docs/current.html#conf-regexp-filter
# index '13"' as '13inch'
regexp_filter = \b(\d+)\" => \1inch
# index 'blue' or 'red' as 'color'
regexp_filter = (blue|red) => color
Sphinx indexes whole words: it 'tokenizes' each word into an integer, which is then stored in the index. As such, regular expressions can't work, because the index doesn't have the original words.
However, there is dict=keywords, which does store the words in the index. But right now this can only be used for the * and ? wildcards; it doesn't support regular expressions.
Also, you could perhaps use the techniques discussed here:
http://swtch.com/~rsc/regexp/regexp4.html
This shows how generic regex searching can be implemented with a trigram index. Sphinx itself would work as the trigram index: you store the trigrams as keywords, which Sphinx then indexes, and Sphinx can run the boolean queries that that system outputs.
(Normal Sphinx works pretty much as the 'Indexed Word Search' section describes, so the trick would be using Sphinx as the backend for the indexed regex search.)
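
To make the trigram idea concrete, here is a small Python sketch. It only covers the simplest case, a literal substring, and both function names are made up; the space-separated output assumes that Sphinx treats space-separated keywords as an implicit AND. Handling a full regex requires the query-building rules described in the linked article.

```python
def trigrams(text):
    """Return every run of three consecutive characters in `text`."""
    return [text[i:i + 3] for i in range(len(text) - 2)]

def and_query(literal):
    """Build a query requiring all trigrams of `literal` (duplicates removed)."""
    return " ".join(dict.fromkeys(trigrams(literal)))

print(trigrams("2x4x2"))   # ['2x4', 'x4x', '4x2']
print(and_query("2x4x2"))  # 2x4 x4x 4x2
```

A document matching all trigrams is only a candidate; the original text still has to be checked against the real regex afterwards.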