regex to filter results sqlite

I have data in a SQLite table called file_path, like the following:
full_path
---------
H:\new.docx
H:\outer
H:\outer\inner1
H:\outer\inner2
H:\outer\inner1\inner12
H:\new.docx
H:\outer\in1.pdf
H:\outer\inner1\in11.jpg
H:\outer\inner1\inner12\in121.wma
H:\new1.doc
H:\new2.rtf
H:\new.txt
I want to get the rows that are direct children of "H:", meaning I do not want files or folders that are inside a folder. Is this possible using regex?

You'd look for rows that start with h: but contain only one \ character (no subfolders). SQLite's LIKE is case-insensitive for ASCII characters by default, so 'h:%' also matches H:.
So:
select * from file_path where
(full_path like 'h:%') and
not (full_path like '%\%\%');
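
A quick way to check the query is to run it against an in-memory database. The sketch below uses Python's built-in sqlite3 module and a subset of the rows from the question. It assumes SQLite's default LIKE behavior (no ESCAPE clause, so the backslash is matched literally), and the ORDER BY is added here only to make the output deterministic.

```python
import sqlite3

# A subset of the sample rows from the question (raw strings keep the backslashes literal).
rows = [
    r"H:\new.docx",
    r"H:\outer",
    r"H:\outer\inner1",
    r"H:\outer\in1.pdf",
    r"H:\outer\inner1\in11.jpg",
    r"H:\new1.doc",
    r"H:\new.txt",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file_path (full_path TEXT)")
conn.executemany("INSERT INTO file_path VALUES (?)", [(r,) for r in rows])

# Keep rows under h: that do not contain a second backslash.
query = r"""
    SELECT full_path FROM file_path
    WHERE full_path LIKE 'h:%'
      AND full_path NOT LIKE '%\%\%'
    ORDER BY full_path
"""
direct_children = [row[0] for row in conn.execute(query)]
print(direct_children)
```

Only the entries directly under H: remain; anything with a second backslash is filtered out.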

Related

replace expression format xx-xx-xxxx_12345678

IDENTIFIER
31-03-2022_13636075
01-04-2022_13650262
04-04-2022_13663174
05-04-2022_13672025
20220099001
11614491_R
10781198
00000000000
11283627_P
11614491_R
-1
How can I remove (only) the "XX-XX-XXXX_" part from certain values of a column in SSIS, WITHOUT affecting values that don't have this format? For example, "21-05-2022_12345678" should become "12345678", but the other values should not be affected. These are just examples from many rows in this column, so I want only the values that have this format to be changed.
SELECT REVERSE(substring(REVERSE('09-03-2022_13481330'),0,CHARINDEX('_',REVERSE('09-03-2022_13481330'),0)))
result
13481330
but this also affects other values. Also, this is in SSMS, not SSIS, because I am not sure how to translate this expression into SSIS code.
Update: The corrected code in SSIS is as follows:
(FINDSTRING(IDENTIFIER,"__-__-____[_]",1) == 1) ? SUBSTRING(IDENTIFIER,12,LEN(IDENTIFIER) - 11) : IDENTIFIER

Do you have access to the SQL source? You can do this in SQL by using LIKE with a match pattern built from the single-character wildcard _. See the example below:
DECLARE @Value VARCHAR(50) = '09-03-2022_13481330'
SELECT CASE WHEN @Value LIKE '__-__-____[_]%' THEN
SUBSTRING(@Value,12,LEN(@Value)-11) ELSE @Value END
See the Microsoft documentation on LIKE and single-character wildcards.
If you don't have access to the source SQL, it gets a bit trickier: you might need to use regex in a Script Task, or there may be an expression you can apply.
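
If the clean-up ends up in script code rather than in SQL, the same "only touch values that match the format" idea can be sketched with a regular expression. The example is in Python purely to illustrate the logic (an SSIS Script Task would use .NET instead). The helper name strip_date_prefix is hypothetical. The pattern is also stricter than the LIKE version, since it requires digits where LIKE's _ accepts any character.

```python
import re

# Two digits, dash, two digits, dash, four digits, underscore, anchored at the start.
DATE_PREFIX = re.compile(r"^\d{2}-\d{2}-\d{4}_")

def strip_date_prefix(identifier: str) -> str:
    """Remove a leading DD-MM-YYYY_ prefix; return other values unchanged."""
    return DATE_PREFIX.sub("", identifier, count=1)

samples = ["31-03-2022_13636075", "20220099001", "11614491_R", "-1"]
cleaned = [strip_date_prefix(s) for s in samples]
print(cleaned)
```

Values such as 11614491_R contain an underscore but no date prefix, so they pass through untouched.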

Grabbing parts of filename with python & boto3

I just started with Python and am still a newbie. I want to create a function that grabs the parts of filenames that match a certain pattern; the files are stored in an S3 bucket.
So in my case, let's say I have 5 .txt files
Transfarm_DAT_005995_20190911_0300.txt
Transfarm_SupplierDivision_058346_20190911_0234.txt
Transfarm_SupplierDivision_058346_20200702_0245.txt
Transfarm_SupplierDivision_058346_20200703_0242.txt
Transfarm_SupplierDivision_058346_20200704_0241.txt
I want the script to go through these filenames and grab the category (e.g. "Transfarm_DAT") and the date (e.g. "20190911") that appear before the filename extension.
Can you point me toward Python modules and possibly guides that could assist me?

Check out the split and join functions if your filenames are always like this. Otherwise, regex is another avenue.
files_list = [
    'Transfarm_DAT_005995_20190911_0300.txt',
    'Transfarm_SupplierDivision_058346_20190911_0234.txt',
    'Transfarm_SupplierDivision_058346_20200702_0245.txt',
    'Transfarm_SupplierDivision_058346_20200703_0242.txt',
    'Transfarm_SupplierDivision_058346_20200704_0241.txt',
]
category_list = []
date_list = []
for f in files_list:
    # Drop the extension, then split the stem on underscores.
    parts = f.split('.')[0].split('_')
    # The category is the first two parts; the date is the second-to-last part.
    category_list.append('_'.join(parts[:2]))
    date_list.append(parts[-2])
print(category_list, date_list)
Output lists:
['Transfarm_DAT', 'Transfarm_SupplierDivision', 'Transfarm_SupplierDivision', 'Transfarm_SupplierDivision', 'Transfarm_SupplierDivision'] ['20190911', '20190911', '20200702', '20200703', '20200704']
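
For the regex avenue mentioned above, a minimal sketch could look like the following. It assumes every name follows the layout category_id_YYYYMMDD_HHMM.txt seen in the question. The helper name parse_filename is hypothetical, and the names are hard-coded here rather than listed from the S3 bucket.

```python
import re

# Assumed layout: <category>_<numeric id>_<YYYYMMDD>_<HHMM>.txt
FILENAME = re.compile(r"^(?P<category>.+)_\d+_(?P<date>\d{8})_\d{4}\.txt$")

def parse_filename(name: str):
    """Return (category, date) for a matching name, or None otherwise."""
    match = FILENAME.match(name.strip())
    if match is None:
        return None
    return match.group("category"), match.group("date")

names = [
    "Transfarm_DAT_005995_20190911_0300.txt",
    "Transfarm_SupplierDivision_058346_20200702_0245.txt",
    "notes.txt",
]
parsed = [parse_filename(n) for n in names]
print(parsed)
```

Unlike the split-based version, this returns None for names that do not fit the layout instead of silently producing wrong parts.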

SPLUNK subsearch 2 CSV Files join together

I have two files with order data saved under two different sourcetypes in Splunk.
One file contains an orderid, a plnum (prefix + orderid; one order number has 3 plnums), and a model (the type of the order). The second file contains the same plnums and the material numbers for those plnums.
I want to search for the top materials used for one or more models.
So I searched for how to set up a subsearch:
sourcetype=file1 [search sourcetype=file2 MODEL="someting"| fields MODEL] |stats values(MATNR) by MODEL
I don't know why the subsearch doesn't work.

Run the subsearch by itself to verify it works and produces the expected results. I suspect it is working and is returning a list of PLNUMs in the form foo bar baz.... Splunk puts an implicit AND between search terms so your main search is looking for events containing all PLNUMs, which is unlikely.
Try using format in your subsearch. It returns the results in foo OR bar OR baz... format, which should work better in the main search.
sourcetype=file1 [search sourcetype=file2 MODEL="someting"| fields PLNUM | format] |stats values(MATNR) by PLNUM

Regex code how to filter all names that contain only numbers and end with .jpg and/or _number.jpg?

How to filter all names that consist of numbers and end with .jpg and/or _number.jpg?
Background info:
In SSIS 2008 I have a Foreach Loop that stores the filename of every jpg file in a variable. The enumerator configuration for Files is currently: *.jpg
This handles all jpg files.
What pattern would make it handle only names like these:
3417761506233.jpg
3417761506233_1.jpg
5414233177487.jpg
5414233177487_1.jpg
5414233177487_14.jpg
but not names like:
abc.jpg
abc123.jpg
def.png
456.png
The numbers represent EAN codes by the way.
I thought about this code:
\d|_|.jpg
but SSIS returns an error stating that no files meet the criteria, even though files with such names are in the folder.

You could use a Script Task within the loop to do the regex filtering:
http://microsoft-ssis.blogspot.com/2012/04/regex-filter-for-foreach-loop.html
Or you could use a (free) Third Party Enumerator:
http://microsoft-ssis.blogspot.com/2012/04/custom-ssis-component-foreach-file.html

For that, you can use the following regex:
^\d+(_\d+)?\.jpg$
Demo: http://regex101.com/r/qC7oV3

^(\d+(?:_\d+)?\.jpg$)
DEMO --> http://regex101.com/r/dM9rJ7
Matches:
3417761506233.jpg
3417761506233_1.jpg
5414233177487.jpg
5414233177487_1.jpg
5414233177487_14.jpg
Excludes:
abc.jpg
abc123.jpg
def.png
456.png
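
To try the pattern outside SSIS, here is a small Python sketch that applies it to the sample names from the question. It uses fullmatch instead of explicit ^ and $ anchors, and the dot is escaped so that it matches only a literal period. The list name candidates is just an illustration.

```python
import re

# Digits, an optional _digits suffix, then a literal ".jpg".
EAN_JPG = re.compile(r"\d+(?:_\d+)?\.jpg")

candidates = [
    "3417761506233.jpg",
    "3417761506233_1.jpg",
    "5414233177487_14.jpg",
    "abc.jpg",
    "abc123.jpg",
    "def.png",
    "456.png",
]

# fullmatch requires the whole name to match, not just a prefix.
accepted = [name for name in candidates if EAN_JPG.fullmatch(name)]
print(accepted)
```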

How to find all the source lines containing desired table names from user_source by using 'regexp'

For example, we have a large database that contains lots of Oracle packages, and we want to see where a specific table is referenced in the source code. The source code is stored in the user_source table, and the table we are looking for is called 'company'.
Normally, I would use:
select * from user_source
where upper(text) like '%COMPANY%'
This returns every line containing the string 'company', like
121 company cmy
14 company_id, idx_name %% end of coding
453 ;companyname
1253 from db.company.company_id where
989 using company, idx, db_name,
So how can I make this result smarter, using a regular expression to match only source lines that contain a meaningful table name (that is, a table name as the compiler sees it)?
So the matched word may be adjacent to characters like . ; , '' "" but not _
Can anyone make this work?

To find company as a "whole word" with a regular expression:
SELECT * FROM user_source
WHERE REGEXP_LIKE(text, '(^|\s)company(\s|$)', 'i');
The third argument of i makes the REGEXP_LIKE search case-insensitive.
As far as ignoring the characters . ; , '' "", you can use REGEXP_REPLACE to suck them out of the string before doing the comparison:
SELECT * FROM user_source
WHERE REGEXP_LIKE(REGEXP_REPLACE(text, '[.;,''"]'), '(^|\s)company(\s|$)', 'i');
Addendum: The following query will also help locate table references. It won't give the source line, but it's a start:
SELECT *
FROM user_dependencies
WHERE referenced_name = 'COMPANY'
AND referenced_type = 'TABLE';

If you want to identify the objects that refer to your table, you can get that information from the data dictionary:
select *
from all_dependencies
where referenced_owner = 'DB'
and referenced_name = 'COMPANY'
and referenced_type = 'TABLE';
You can't get the individual line numbers from that, but you can then either look at user_source or use a regexp on the specific source code, which would at least reduce false positives.

SELECT * FROM user_source
WHERE REGEXP_LIKE(text,'([^_a-z0-9])company([^_a-z0-9])','i')
Thanks @Ed Gibbs, with a little trick this modified answer could be more intelligent.
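
The matching rule can be checked against the sample lines from the question with a short Python sketch. It uses lookarounds instead of character classes, so a table name at the very start or end of a line also counts; a class such as [^_a-z0-9] on each side needs an actual character to consume. This is Python's regex syntax and Oracle's REGEXP_LIKE may not accept it, so treat it as an illustration of the rule, not as a drop-in query. The helper name references_table is hypothetical.

```python
import re

def references_table(line: str, table: str = "company") -> bool:
    """True if `table` appears without letters, digits, or _ glued to either side."""
    pattern = rf"(?<![A-Za-z0-9_]){re.escape(table)}(?![A-Za-z0-9_])"
    return re.search(pattern, line, flags=re.IGNORECASE) is not None

lines = [
    "company cmy",
    "company_id, idx_name %% end of coding",
    ";companyname",
    "from db.company.company_id where",
    "using company, idx, db_name,",
]
hits = [line for line in lines if references_table(line)]
print(hits)
```

The company_id and companyname lines are rejected, while db.company.company_id is kept because its first occurrence is bounded by dots.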
