SQLite extract string from text in column - regex

I have a Spatialite Database and I've imported OSM Data into this database.
With the following query I get all motorways:
SELECT * FROM lines
WHERE other_tags GLOB '*A [0-9]*'
AND highway='motorway'
I use GLOB '*A [0-9]*' here, because in Germany every Autobahn begins with A, followed by a number (like A 73).
There is a column called other_tags with information about the motorway part:
"bdouble"=>"yes","hazmat"=>"designated","lanes"=>"2","maxspeed"=>"none","oneway"=>"yes","ref"=>"A 73","width"=>"7"
If you look closer there is the part "ref"=>"A 73".
I want to extract the A 73 as the name for the motorway.
How can I do this in sqlite?

If the format doesn't change, that means that you can expect that the other_tags field is something like %"ref"=>"A 73","width"=>"7"%, then you can use instr and substr (note that 8 is the length of "ref"=>"):
SELECT substr(other_tags,
instr(other_tags, '"ref"=>"') + 8,
instr(other_tags, '","width"') - 8 - instr(other_tags, '"ref"=>"')) name
FROM lines
WHERE other_tags GLOB '*A [0-9]*'
AND highway='motorway'
The result will be
name
A 73

Check with following condition..
other_tags like A% -- Begin With 'A'.
abs(substr(other_tags, 3,2)) <> 0.0 -- Substring from 3rd character, two character is number.
length(other_tags) = 4 -- length of other_tags is 4
So here is how your query should be:
SELECT *
FROM lines
WHERE other_tags LIKE 'A%'
AND abs(substr(other_tags, 3,2)) <> 0.0
AND length(other_tags) = 4
AND highway = 'motorway'

Related

Merging two Pandas Dataframes using Regular Expressions

I'm new to Python and Pandas but I try to use Pandas Dataframes to merge two dataframes based on regular expression.
I have one dataframe with some 2 million rows. This table contains data about cars but the model name is often specified in - lets say - a creative way, e.g. 'Audi A100', 'Audi 100', 'Audit 100 Quadro', or just 'A 100'. And the same for other brands. This is stored in a column called "Model". In a second model I have the manufacturer.
Index
Model
Manufacturer
0
A 100
Audi
1
A100 Quadro
Audi
2
Audi A 100
Audi
...
...
...
To clean up the data I created about 1000 regular expressions to search for some key words and stored it in a dataframe called 'regex'. In a second column of this table I save the manufacture. This value is used in a second step to validate the result.
Index
RegEx
Manufacturer
0
.* A100 .*
Audi
1
.* A 100 .*
Audi
2
.* C240 .*
Mercedes
3
.* ID3 .*
Volkswagen
I hope you get the idea.
As far as I understood, the Pandas function "merge()" does not work with regular expressions. Therefore I use a loop to process the list of regular expressions, then use the "match" function to locate matching rows in the car DataFrame and assign the successfully used RegEx and the suggested manufacturer.
I added two additional columns to the cars table 'RegEx' and 'Manufacturer'.
for index, row in regex.iterrows():
cars.loc[cars['Model'].str.match(row['RegEx']),'RegEx'] = row['RegEx']
cars.loc[cars['Model'].str.match(row['RegEx']),'Manufacturer'] = row['Manfacturer']
I learnd 'iterrows' should not be used for performance reasons. It takes 8 minutes to finish the loop, what isn't too bad. However, is there a better way to get it done?
Kind regards
Jiriki
I have no idea if it would be faster (I'll be glad, if you would test it), but it doesn't use iterrows():
regex.groupby(["RegEx", "Manufacturer"])["RegEx"]\
.apply(lambda x: cars.loc[cars['Model'].str.match(x.iloc[0])])
EDIT: Code for reproduction:
cars = pd.DataFrame({"Model": ["A 100", "A100 Quatro", "Audi A 100", "Passat V", "Passat Gruz"],
"Manufacturer": ["Audi", "Audi", "Audi", "VW", "VW"]})
regex = pd.DataFrame({"RegEx": [".*A100.*", ".*A 100.*", ".*Passat.*"],
"Manufacturer": ["Audi", "Audi", "VW"]})
#Output:
# Model Manufacturer
#RegEx Manufacturer
#.*A 100.* Audi 0 A 100 Audi
# 2 Audi A 100 Audi
#.*A100.* Audi 1 A100 Quatro Audi
#.*Passat.* VW 3 Passat V VW
# 4 Passat Gruz VW

Rearrange words in a given sentence in python

I'm very new to python.
I get a serial data in COM port in fixed format as a string like this:
"21-12-2015 10:12:05 005 100 10.5 P"
The format is 'date time id count data data'
Here i don't require count and first data, instead i want to add one more data and send this again through another COM port.
I want to rearrange this and give output as
21-12-2015 10:12:05
SI.NO: 1451
Result: 10.5 P
My attempt:
ip = '21-12-2015_10:12:05_005_100_10.5 P'
dt = ip[0]+ip[1]+ip[3]+..... #save date as dt
tm = ip[9]+ip[10]+ip[11]+.... etc
and at the end
Result = dt + tm +"\n" + " "+ "SI.NO"+.......
Please suggest some good concept to do this in python 2.7.11
If you can mention some ideas i will search for the code.
Thank you
You can split up your string on whitespace into fields with split and build a new string using Python's string formatting syntax:
ip = "21-12-2015 10:12:05 005 100 10.5 P"
fields = ip.split()
s = '{date} {time}\n SI.NO: {sino}\n Result: {x} {y}'.format(
date=fields[0],
time=fields[1],
sino=1451, # Provide your own counter here
x=fields[4],
y=fields[5])
print s
21-12-2015 10:12:05
SI.NO: 1451
Result: 10.5 P
It isn't clear from your question whether your fields are separated by spaces or underscores. In the latter case, use fields = ip.split('_').

In R, use regular expression to match multiple patterns and add new column to list

I've found numerous examples of how to match and update an entire list with one pattern and one replacement, but what I am looking for now is a way to do this for multiple patterns and multiple replacements in a single statement or loop.
Example:
> print(recs)
phonenumber amount
1 5345091 200
2 5386052 200
3 5413949 600
4 7420155 700
5 7992284 600
I would like to insert a new column called 'service_provider' with /^5/ as Company1 and /^7/ as Company2.
I can do this with the following two lines of R:
recs$service_provider[grepl("^5", recs$phonenumber)]<-"Company1"
recs$service_provider[grepl("^7", recs$phonenumber)]<-"Company2"
Then I get:
phonenumber amount service_provider
1 5345091 200 Company1
2 5386052 200 Company1
3 5413949 600 Company1
4 7420155 700 Company2
5 7992284 600 Company2
I'd like to provide a list, rather than discrete set of grepl's so it is easier to keep country specific information in one place, and all the programming logic in another.
thisPhoneCompanies<-list(c('^5','Company1'),c('^7','Company2'))
In other languages I would use a for loop on on the Phone Company list
For every row in thisPhoneCompanies
Add service provider to matched entries in recs (such as the grepl statement)
end loop
But I understand that isn't the way to do it in R.
Using stringi :
library(stringi)
recs$service_provider <- stri_replace_all_regex(str = recs$phonenumber,
pattern = c('^5.*','^7.*'),
replacement = c('Company1', 'Company2'),
vectorize_all = FALSE)
recs
# phonenumber amount service_provider
# 1 5345091 200 Company1
# 2 5386052 200 Company1
# 3 5413949 600 Company1
# 4 7420155 700 Company2
# 5 7992284 600 Company2
Thanks to #thelatemail
Looks like if I use a dataframe instead of a list for the phone companies:
phcomp <- data.frame(ph=c(5,7),comp=c("Company1","Company2"))
I can match and add a new column to my list of phone numbers in a single command (using the match function).
recs$service_provider <- phcomp$comp[match(substr(recs$phonenumber,1,1), phcomp$ph)]
Looks like I lose the ability to use regular expressions, but the matching here is very simple, just the first digit of the phone number.

fetching multiple values from a string using regular expression

I have a table temp that have a column name "REMARKS"
Create script
Create table temp (id number,remarks varchar2(2000));
Insert script
Insert into temp values (1,'NAME =GAURAV Amount=981 Phone_number =98932324 Active Flag =Y');
Insert into temp values (2,'NAME =ROHAN Amount=984 Phone_number =98932333 Active Flag =N');
Now , i want to fetch the corresponding value of NAME ,Amount ,phone_number, active_flag from the remarks column of the table.
I thought of using regular expression ,but i am not comfortable in using it .
I tried with substr and instr to fetch the name from the remakrs column ,but if i want to fetch all four, i need to write a pl sql .Can we achieve this using Regular expression.
Can i get output(CURSOR) like
id Name Amount phone_number Active flag
------------------------------------------
1 Gaurav 981 98932324 Y
2 Rohan 984 98932333 N
-------------------------------------------
Thanks for your help
you can use something like :
SQL> select regexp_replace(remarks, '.*NAME *=([^ ]*).*', '\1') name,
2 regexp_replace(remarks, '.*Amount *=([^ ]*).*', '\1') amount,
3 regexp_replace(remarks, '.*Phone_number *=([^ ]*).*', '\1') ph_number,
4 regexp_replace(remarks, '.*Active Flag *=([^ ]*).*', '\1') flag
5 from temp;
NAME AMOUNT PH_NUMBER FLAG
-------------------- -------------------- -------------------- --------------------
GAURAV 981 98932324 Y
ROHAN 981 98932324 N

VBA Excel : Read file, and stock images in a column

I am new to VBA (I mean, REALLY new) and I would like you to give me some tips.
I have an Excel file with 2 columns: SKU and media_gallery
I also have images stocked in a folder (lets name it /imageFolder)
I need to parse the imageFolder and look for ALL images sarting by SKU.jpg , and put them into the media_gallery column separated by a semicolon ( ; )
Example: My SKU is "1001", I need to parse the image folder for all images starting by 1001 (all image have this pattern: 1001-2.jpg , 1001-3.jpg etc...)
I can do that in Java or C# but I want to give a chance to VBA. :)
How can I do that?
EDIT: I only need file names yes! And I should of said that I have 20 000 images in my folder, and 8000 SKUs , so I don't know how we can handle looping on 20 000 images names.
EDIT2: If SKU contains a dash ( - ), I don't need to treat it, so I can pass to the next SKU. And each SKU has a maximum of 5 images (....;SKU-5.jpg)
Thanks all.
How to insert images given you have one image name per cell in a column: How to get images to appear in Excel given image url
Take the above and introduce an inner loop for the file name:
if instr(url_column.Cells(i).Value, "-") = 0 then
dim cur_file_name as string
cur_file_name = dir("imageFolder\" & url_column.Cells(i).Value & "*.jpg")
do until len(cur_file_name) = 0
image_column.Cells(i).Value = image_column.Cells(i).Value & cur_file_name & ";"
cur_file_name = Dir
loop
end if