Related
I just started with python and I am still a newbie , I want to create a function that grabs parts of filenames corresponding to a certain pattern these files are stored in s3 bucket.
So in my case, let's say I have 5 .txt files
Transfarm_DAT_005995_20190911_0300.txt
Transfarm_SupplierDivision_058346_20190911_0234.txt
Transfarm_SupplierDivision_058346_20200702_0245.txt
Transfarm_SupplierDivision_058346_20200703_0242.txt
Transfarm_SupplierDivision_058346_20200704_0241.txt
I want the script to go through these filenames, grab the string "Category i.e "Transfarm_DAT" and date "20190911"" and before the filename extension.
Can you point me in the direction to which Python modules and possibly guides that could assist me?
Check out the split and join functions if your filenames are always like this. Otherwise, regex is another avenue.
files_list = ['Transfarm_DAT_005995_20190911_0300.txt ', 'Transfarm_SupplierDivision_058346_20190911_0234.txt',
'Transfarm_SupplierDivision_058346_20200702_0245.txt', 'Transfarm_SupplierDivision_058346_20200703_0242.txt', 'Transfarm_SupplierDivision_058346_20200704_0241.txt']
category_list = []
date_list = []
for f in files_list:
date = f.split('.')[0].split('_',2)[2]
category = '_'.join([f.split('.')[0].split('_')[0], f.split('.')[0].split('_')[1]])
# print(category, date)
category_list.append(category)
date_list.append(date)
print(category_list, date_list)
Output lists:
['Transfarm_DAT', 'Transfarm_SupplierDivision', 'Transfarm_SupplierDivision', 'Transfarm_SupplierDivision', 'Transfarm_SupplierDivision'] ['005995_20190911_0300', '058346_20190911_0234', '058346_20200702_0245', '058346_20200703_0242', '058346_20200704_0241']
I'd like to use a regular expression in sqlite, but I don't know how.
My table has got a column with strings like this: "3,12,13,14,19,28,32"
Now if I type "where x LIKE '3'" I also get the rows which contain values like 13 or 32,
but I'd like to get only the rows which have exactly the value 3 in that string.
Does anyone know how to solve this?
As others pointed out already, REGEXP calls a user defined function which must first be defined and loaded into the the database. Maybe some sqlite distributions or GUI tools include it by default, but my Ubuntu install did not. The solution was
sudo apt-get install sqlite3-pcre
which implements Perl regular expressions in a loadable module in /usr/lib/sqlite3/pcre.so
To be able to use it, you have to load it each time you open the database:
.load /usr/lib/sqlite3/pcre.so
Or you could put that line into your ~/.sqliterc.
Now you can query like this:
SELECT fld FROM tbl WHERE fld REGEXP '\b3\b';
If you want to query directly from the command-line, you can use the -cmd switch to load the library before your SQL:
sqlite3 "$filename" -cmd ".load /usr/lib/sqlite3/pcre.so" "SELECT fld FROM tbl WHERE fld REGEXP '\b3\b';"
If you are on Windows, I guess a similar .dll file should be available somewhere.
SQLite3 supports the REGEXP operator:
WHERE x REGEXP <regex>
http://www.sqlite.org/lang_expr.html#regexp
A hacky way to solve it without regex is where ',' || x || ',' like '%,3,%'
SQLite does not contain regular expression functionality by default.
It defines a REGEXP operator, but this will fail with an error message unless you or your framework define a user function called regexp(). How you do this will depend on your platform.
If you have a regexp() function defined, you can match an arbitrary integer from a comma-separated list like so:
... WHERE your_column REGEXP "\b" || your_integer || "\b";
But really, it looks like you would find things a whole lot easier if you normalised your database structure by replacing those groups within a single column with a separate row for each number in the comma-separated list. Then you could not only use the = operator instead of a regular expression, but also use more powerful relational tools like joins that SQL provides for you.
A SQLite UDF in PHP/PDO for the REGEXP keyword that mimics the behavior in MySQL:
$pdo->sqliteCreateFunction('regexp',
function ($pattern, $data, $delimiter = '~', $modifiers = 'isuS')
{
if (isset($pattern, $data) === true)
{
return (preg_match(sprintf('%1$s%2$s%1$s%3$s', $delimiter, $pattern, $modifiers), $data) > 0);
}
return null;
}
);
The u modifier is not implemented in MySQL, but I find it useful to have it by default. Examples:
SELECT * FROM "table" WHERE "name" REGEXP 'sql(ite)*';
SELECT * FROM "table" WHERE regexp('sql(ite)*', "name", '#', 's');
If either $data or $pattern is NULL, the result is NULL - just like in MySQL.
My solution in Python with sqlite3:
import sqlite3
import re
def match(expr, item):
return re.match(expr, item) is not None
conn = sqlite3.connect(':memory:')
conn.create_function("MATCHES", 2, match)
cursor = conn.cursor()
cursor.execute("SELECT MATCHES('^b', 'busy');")
print cursor.fetchone()[0]
cursor.close()
conn.close()
If regex matches, the output would be 1, otherwise 0.
With python, assuming con is the connection to SQLite, you can define the required UDF by writing:
con.create_function('regexp', 2, lambda x, y: 1 if re.search(x,y) else 0)
Here is a more complete example:
import re
import sqlite3
with sqlite3.connect(":memory:") as con:
con.create_function('regexp', 2, lambda x, y: 1 if re.search(x,y) else 0)
cursor = con.cursor()
# ...
cursor.execute("SELECT * from person WHERE surname REGEXP '^A' ")
I don't it is good to answer a question which was posted almost an year ago. But I am writing this for those who think that Sqlite itself provide the function REGEXP.
One basic requirement to invoke the function REGEXP in sqlite is
"You should create your own function in the application and then provide the callback link to the sqlite driver".
For that you have to use sqlite_create_function (C interface). You can find the detail from here and here
An exhaustive or'ed where clause can do it without string concatenation:
WHERE ( x == '3' OR
x LIKE '%,3' OR
x LIKE '3,%' OR
x LIKE '%,3,%');
Includes the four cases exact match, end of list, beginning of list, and mid list.
This is more verbose, doesn't require the regex extension.
UPDATE TableName
SET YourField = ''
WHERE YourField REGEXP 'YOUR REGEX'
And :
SELECT * from TableName
WHERE YourField REGEXP 'YOUR REGEX'
SQLite version 3.36.0 released 2021-06-18 now has the REGEXP command builtin.
For CLI build only.
Consider using this
WHERE x REGEXP '(^|,)(3)(,|$)'
This will match exactly 3 when x is in:
3
3,12,13
12,13,3
12,3,13
Other examples:
WHERE x REGEXP '(^|,)(3|13)(,|$)'
This will match on 3 or 13
You may consider also
WHERE x REGEXP '(^|\D{1})3(\D{1}|$)'
This will allow find number 3 in any string at any position
You could use a regular expression with REGEXP, but that is a silly way to do an exact match.
You should just say WHERE x = '3'.
If you are using php you can add any function to your sql statement by using: SQLite3::createFunction.
In PDO you can use PDO::sqliteCreateFunction and implement the preg_match function within your statement:
See how its done by Havalite (RegExp in SqLite using Php)
In case if someone looking non-regex condition for Android Sqlite, like this string [1,2,3,4,5] then don't forget to add bracket([]) same for other special characters like parenthesis({}) in #phyatt condition
WHERE ( x == '[3]' OR
x LIKE '%,3]' OR
x LIKE '[3,%' OR
x LIKE '%,3,%');
You can use the sqlean-regexp extension, which provides regexp search and replace functions.
Based on the PCRE2 engine, this extension supports all major regular expression features. It also supports Unicode. The extension is available for Windows, Linux, and macOS.
Some usage examples:
-- select messages containing number 3
select * from messages
where msg_text regexp '\b3\b';
-- count messages containing digits
select count(*) from messages
where msg_text regexp '\d+';
-- 42
select regexp_like('Meet me at 10:30', '\d+:\d+');
-- 1
select regexp_substr('Meet me at 10:30', '\d+:\d+');
-- 10:30
select regexp_replace('password = "123456"', '"[^"]+"', '***');
-- password = ***
In Julia, the model to follow can be illustrated as follows:
using SQLite
using DataFrames
db = SQLite.DB("<name>.db")
register(db, SQLite.regexp, nargs=2, name="regexp")
SQLite.Query(db, "SELECT * FROM test WHERE name REGEXP '^h';") |> DataFrame
for rails
db = ActiveRecord::Base.connection.raw_connection
db.create_function('regexp', 2) do |func, pattern, expression|
func.result = expression.to_s.match(Regexp.new(pattern.to_s, Regexp::IGNORECASE)) ? 1 : 0
end
I am currently learning Python 2.7 and am really impressed by how much it can do.
Right now, I'm working my way through basics such as functions and loops. I'd reckon a more 'real-world' problem would spur me on even further.
I use a satellite recording device to capture TV shows etc to hard drive.
The naming convention is set by the device itself. It makes finding the shows you want to watch after the recording more difficult to find as the show name is preceded with lots of redundant info...
The recordings (in .mts format) are dumped into a folder called "HBPVR" at the root of the drive. I'd be running the script on my Mac when the drive is connected to it.
Example.
"Channel_4_+1-15062015-2100-Exams__Cheating_the_....mts"
or
"BBC_Two_HD-19052015-2320-Newsnight.mts"
I included the double-quotes.
I'd like a Python script that (ideally) would remove the broadcaster name, reformat the date info, strip the time info and then put the show's name to the front of the file name.
E.g "BBC_Two_HD-19052015-2320-Newsnight.mts" ->> "Newsnight 19 May 2015.mts"
What may complicate matters is that the broadcaster names are not all of equal length.
The main pattern is that broadcaster name runs up until the first hyphen.
I'd like to be able to re-run this script at later points for newer recordings and not have already renamed recordings renamed further.
Thanks.
Try this:
import calendar
input = "BBC_Two_HD-19052015-2320-Newsnight.mts"
# Remove broadcaster name
input = '-'.join(input.split("-")[1:])
# Get show name
show = ''.join(' '.join(input.split("-")[2:]).split(".mts")[:-1])
# Get time string
timestr = ''.join(input.split("-")[0])
day = int(''.join(timestr[0:2])) # The day is the first two digits
month = calendar.month_name[int(timestr[2:4])] # The month is the second two digits
year = timestr[4:8] # The year is the third through sixth digits
# And the new string:
new = show + " " + str(day) + " " + month + " " + year + ".mts"
print(new) # "Newsnight 19 May 2015.mts"
I wasn't quite sure what the '2320' was, so I chose to ignore it.
Thanks Coder256.
That has given me a bit more insight into how Python can actually help solve real world (first world!) problems like mine.
It tried it out with some different combos of broadcaster and show names and it worked.
I would like though to use the script to rename a batch of recordings/files inside the folder from time to time.
The script did throw and error when processing an already re-named recording, which is to be expected I guess. Should the renamed file have a special character at the start of its name to help avoid this happening?
e.g "_Newsnight 19 May 2015.mts"
Or is there a more aesthetically pleasing way of doing this, with special chars being added on etc.
Thanks.
One way to approach this, since you have a defined pattern is to use regular expressions:
>>> import datetime
>>> import re
>>> s = "BBC_Two_HD-19052015-2320-Newsnight.mts"
>>> ts, name = re.findall(r'.*?-(\d{8}-\d{4})-(.*?)\.mts', s)[0]
>>> '{} {}.mts'.format(name, datetime.datetime.strptime(ts, '%d%m%Y-%H%M').strftime('%d %b %Y'))
'Newsnight 19 May 2015.mts'
I have an xml file that i need to find, multiply (e.g. by 1.25) and replace all prices.
Price tag looks like that:
<price><![CDATA[15.9]]></price>
The price tag should look like that after the operation:
<price><![CDATA[19.875]]></price>
Can this be done in Notepad++ or PowerGrep using a regular expression?
Thanks in advance.
As far as I know you can't use either program to preform the math but you can build a simple program in most any language of your choice to take a file use regex to find the number. Cast that string as a double do the math and put it back into the string. Later on today I could probably build something in c# but it should be relatively straight forward in most languages. You may even be able to build a shell script and use grep if you're not in a windows environment or use Powershell for windows but I have less experience with Powershell.
Edit: There is an easier way to do this http://msdn.microsoft.com/en-us/library/hcebdtae(v=vs.110).aspx
this is essentially what you want to do using the xmldocument object.
Edit2: I did this even though I couldn't get a hold of the original poster I thought someone might be able to use the info and I learned a lot. I can add the source code to github if anyone is interested.
public static void ChangePricesWork(string filepath, double multiply)
{
var document = new XmlDocument();
document.Load(filepath);
XmlNodeList nodeList = document.GetElementsByTagName("price");
foreach (XmlNode node in nodeList)
{
if (!string.IsNullOrEmpty(node.InnerText))
{
node.InnerText = Convert.ToString(multiplyPrice(multiply, node.InnerText));
}
}
string newFilePath = string.Format(#"{0}\{1}_updated.xml", Path.GetDirectoryName(filepath), Path.GetFileNameWithoutExtension(filepath));
document.Save(newFilePath);
}
private static double multiplyPrice(double multiply, string oldPrice)
{
var newPrice = new double();
if (Double.TryParse(oldPrice, out newPrice))
{
newPrice = newPrice * multiply;
}
return newPrice;
}
Notepad++ has a Pythonscript plugin that allows you to create quick Python scripts that have access to your document and Notepad++ itself.
The install and setup is described in this answer.
The API has moved on a little bit since then, you do a regular expression replace with Editor.rereplace now.
# Start a sequence of actions that is undone and redone as a unit. May be nested.
editor.beginUndoAction()
# multiply_price_cdata
from decimal import *
TWOPLACES = Decimal(10) ** -2
def multiply_price_cdata( m ):
price = Decimal( m.group(2) ) * Decimal( 1.25 )
return m.group(1) + str(price.quantize(TWOPLACES)) + m.group(3)
def cdata( m ):
return "CDATA"
# npp++ search/replace
re_price = r'(<price><!\[CDATA\[)(\d+\.\d+|\d+)(\]\]></price>)'
editor.rereplace( re_price , multiply_price_cdata )
# end the undo sequence
editor.endUndoAction()
I'd like to use a regular expression in sqlite, but I don't know how.
My table has got a column with strings like this: "3,12,13,14,19,28,32"
Now if I type "where x LIKE '3'" I also get the rows which contain values like 13 or 32,
but I'd like to get only the rows which have exactly the value 3 in that string.
Does anyone know how to solve this?
As others pointed out already, REGEXP calls a user defined function which must first be defined and loaded into the the database. Maybe some sqlite distributions or GUI tools include it by default, but my Ubuntu install did not. The solution was
sudo apt-get install sqlite3-pcre
which implements Perl regular expressions in a loadable module in /usr/lib/sqlite3/pcre.so
To be able to use it, you have to load it each time you open the database:
.load /usr/lib/sqlite3/pcre.so
Or you could put that line into your ~/.sqliterc.
Now you can query like this:
SELECT fld FROM tbl WHERE fld REGEXP '\b3\b';
If you want to query directly from the command-line, you can use the -cmd switch to load the library before your SQL:
sqlite3 "$filename" -cmd ".load /usr/lib/sqlite3/pcre.so" "SELECT fld FROM tbl WHERE fld REGEXP '\b3\b';"
If you are on Windows, I guess a similar .dll file should be available somewhere.
SQLite3 supports the REGEXP operator:
WHERE x REGEXP <regex>
http://www.sqlite.org/lang_expr.html#regexp
A hacky way to solve it without regex is where ',' || x || ',' like '%,3,%'
SQLite does not contain regular expression functionality by default.
It defines a REGEXP operator, but this will fail with an error message unless you or your framework define a user function called regexp(). How you do this will depend on your platform.
If you have a regexp() function defined, you can match an arbitrary integer from a comma-separated list like so:
... WHERE your_column REGEXP "\b" || your_integer || "\b";
But really, it looks like you would find things a whole lot easier if you normalised your database structure by replacing those groups within a single column with a separate row for each number in the comma-separated list. Then you could not only use the = operator instead of a regular expression, but also use more powerful relational tools like joins that SQL provides for you.
A SQLite UDF in PHP/PDO for the REGEXP keyword that mimics the behavior in MySQL:
$pdo->sqliteCreateFunction('regexp',
function ($pattern, $data, $delimiter = '~', $modifiers = 'isuS')
{
if (isset($pattern, $data) === true)
{
return (preg_match(sprintf('%1$s%2$s%1$s%3$s', $delimiter, $pattern, $modifiers), $data) > 0);
}
return null;
}
);
The u modifier is not implemented in MySQL, but I find it useful to have it by default. Examples:
SELECT * FROM "table" WHERE "name" REGEXP 'sql(ite)*';
SELECT * FROM "table" WHERE regexp('sql(ite)*', "name", '#', 's');
If either $data or $pattern is NULL, the result is NULL - just like in MySQL.
My solution in Python with sqlite3:
import sqlite3
import re
def match(expr, item):
return re.match(expr, item) is not None
conn = sqlite3.connect(':memory:')
conn.create_function("MATCHES", 2, match)
cursor = conn.cursor()
cursor.execute("SELECT MATCHES('^b', 'busy');")
print cursor.fetchone()[0]
cursor.close()
conn.close()
If regex matches, the output would be 1, otherwise 0.
With python, assuming con is the connection to SQLite, you can define the required UDF by writing:
con.create_function('regexp', 2, lambda x, y: 1 if re.search(x,y) else 0)
Here is a more complete example:
import re
import sqlite3
with sqlite3.connect(":memory:") as con:
con.create_function('regexp', 2, lambda x, y: 1 if re.search(x,y) else 0)
cursor = con.cursor()
# ...
cursor.execute("SELECT * from person WHERE surname REGEXP '^A' ")
I don't it is good to answer a question which was posted almost an year ago. But I am writing this for those who think that Sqlite itself provide the function REGEXP.
One basic requirement to invoke the function REGEXP in sqlite is
"You should create your own function in the application and then provide the callback link to the sqlite driver".
For that you have to use sqlite_create_function (C interface). You can find the detail from here and here
An exhaustive or'ed where clause can do it without string concatenation:
WHERE ( x == '3' OR
x LIKE '%,3' OR
x LIKE '3,%' OR
x LIKE '%,3,%');
Includes the four cases exact match, end of list, beginning of list, and mid list.
This is more verbose, doesn't require the regex extension.
UPDATE TableName
SET YourField = ''
WHERE YourField REGEXP 'YOUR REGEX'
And :
SELECT * from TableName
WHERE YourField REGEXP 'YOUR REGEX'
SQLite version 3.36.0 released 2021-06-18 now has the REGEXP command builtin.
For CLI build only.
Consider using this
WHERE x REGEXP '(^|,)(3)(,|$)'
This will match exactly 3 when x is in:
3
3,12,13
12,13,3
12,3,13
Other examples:
WHERE x REGEXP '(^|,)(3|13)(,|$)'
This will match on 3 or 13
You may consider also
WHERE x REGEXP '(^|\D{1})3(\D{1}|$)'
This will allow find number 3 in any string at any position
You could use a regular expression with REGEXP, but that is a silly way to do an exact match.
You should just say WHERE x = '3'.
If you are using php you can add any function to your sql statement by using: SQLite3::createFunction.
In PDO you can use PDO::sqliteCreateFunction and implement the preg_match function within your statement:
See how its done by Havalite (RegExp in SqLite using Php)
In case if someone looking non-regex condition for Android Sqlite, like this string [1,2,3,4,5] then don't forget to add bracket([]) same for other special characters like parenthesis({}) in #phyatt condition
WHERE ( x == '[3]' OR
x LIKE '%,3]' OR
x LIKE '[3,%' OR
x LIKE '%,3,%');
You can use the sqlean-regexp extension, which provides regexp search and replace functions.
Based on the PCRE2 engine, this extension supports all major regular expression features. It also supports Unicode. The extension is available for Windows, Linux, and macOS.
Some usage examples:
-- select messages containing number 3
select * from messages
where msg_text regexp '\b3\b';
-- count messages containing digits
select count(*) from messages
where msg_text regexp '\d+';
-- 42
select regexp_like('Meet me at 10:30', '\d+:\d+');
-- 1
select regexp_substr('Meet me at 10:30', '\d+:\d+');
-- 10:30
select regexp_replace('password = "123456"', '"[^"]+"', '***');
-- password = ***
In Julia, the model to follow can be illustrated as follows:
using SQLite
using DataFrames
db = SQLite.DB("<name>.db")
register(db, SQLite.regexp, nargs=2, name="regexp")
SQLite.Query(db, "SELECT * FROM test WHERE name REGEXP '^h';") |> DataFrame
for rails
db = ActiveRecord::Base.connection.raw_connection
db.create_function('regexp', 2) do |func, pattern, expression|
func.result = expression.to_s.match(Regexp.new(pattern.to_s, Regexp::IGNORECASE)) ? 1 : 0
end