Using regex within a case statement to pull out dynamic content - regex

I am working with URL strings that have the following structure:
URL
page/wa/seattle
page/ca/sandiego
page/mi/detroit
I essentially was wondering if it is possible to use Regex in combination with a case statement to create the following:
Page State City
page wa seattle
page ca sandiego
page mi detroit
I currently have the code written that pulls out which pages are state pages and which pages are city pages.
CASE WHEN (regexp_instr(HITSPAGEPAGEPATH::text, '^/page/[a-z]{2}/[a-z]+'::CHARACTER VARYING::text))
THEN (regexp_instr(HITSPAGEPAGEPATH::text, '^/page/[a-z]{2}/[a-z]+'::CHARACTER VARYING::text))
ELSE NULL
END AS city
The part that I can't figure out is what to put after the "then" so that just the city or state is displayed. This is for PostgreSQL on Amazon Redshift using SQL Workbench, if that helps with which syntax to answer in.

No need for a regex; just split the string into 3 elements (separated by /) and use each element as a column:
select elements[1] as page,
elements[2] as state,
elements[3] as city
from (
select string_to_array(hitspagepagepath, '/') as elements
from the_table
) t
order by page;

select hitspagepagepath, split_part(hitspagepagepath,'/',2) as root_url,
split_part(hitspagepagepath,'/',3) as State,
split_part(hitspagepagepath,'/',4) as City
from Table
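
The two answers above index the pieces differently (1, 2, 3 versus 2, 3, 4). One plausible reason, assuming the stored paths start with a slash as the question's pattern `^/page/...` suggests, is that splitting on `/` then yields an empty first element. A minimal Python sketch of that effect; the helper name `split_path` is made up for illustration and is not part of either answer:

```python
def split_path(path):
    """Split a URL path on '/' and label the pieces, skipping empty parts."""
    parts = [p for p in path.split("/") if p]
    # Pad so that state-only paths such as 'page/ca' still yield three fields.
    parts += [None] * (3 - len(parts))
    return {"page": parts[0], "state": parts[1], "city": parts[2]}


# A naive split of a path with a leading slash starts with an empty string.
print("/page/wa/seattle".split("/"))
print(split_path("/page/wa/seattle"))
print(split_path("page/ca"))
```

Dropping empty parts makes the result the same with or without the leading slash, which is the offset the SQL versions have to account for by hand.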

String matching in URL using Hive / Spark SQL

I have two tables, one containing a list of URLs and the other a list of words. My requirement is to filter out the URLs containing any of the words.
For example:
URL
https://www.techhive.com/article/3409153/65-inch-oled-4k-tv-from-lg-at-a-1300-dollar-discount.html
https://www.techradar.com/in/news/lg-c9-oled-65-inch-4ktv-price-drop
https://www.t3.com/news/cheap-oled-tv-deals-currys-august
https://indianexpress.com/article/technology/gadgets/lg-bets-big-on-oled-tvs-in-india-to-roll-out-rollable-tv-by-year-end-5823635/
https://www.sony.co.in/electronics/televisions/a1-series
https://www.amazon.in/Sony-138-8-inches-Bravia-KD-55A8F/dp/B07BWKVBYW
https://www.91mobiles.com/list-of-tvs/sony-oled-tv
Words
Sony
Samsung
Deal
Bravia
Now I want to filter out any URL that contains any of the words. Normally I would do:
Select url from url_table where url not like '%Sony%' and url not like '%Samsung%' and url not like '%Deal%' and url not like '%Bravia%';
But that's cumbersome and doesn't scale. What is the best way to achieve this? How do I apply not like against the words table?
Using regex:
where url not rlike '(?i)Sony|Samsung|Deal|Bravia'
(?i) means case-insensitive.
Now let's build the same regexp from the words table.
You can aggregate the list of words from the table and pass it to rlike. See this example:
with
initial_data as (--replace with your table
select stack(7,
'https://www.techhive.com/article/3409153/65-inch-oled-4k-tv-from-lg-at-a-1300-dollar-discount.html',
'https://www.techradar.com/in/news/lg-c9-oled-65-inch-4ktv-price-drop',
'https://www.t3.com/news/cheap-oled-tv-deals-currys-august',
'https://indianexpress.com/article/technology/gadgets/lg-bets-big-on-oled-tvs-in-india-to-roll-out-rollable-tv-by-year-end-5823635/',
'https://www.sony.co.in/electronics/televisions/a1-series',
'https://www.amazon.in/Sony-138-8-inches-Bravia-KD-55A8F/dp/B07BWKVBYW',
'https://www.91mobiles.com/list-of-tvs/sony-oled-tv'
) as url ) ,
words as (-- replace with your words table
select stack (4, 'Sony','Samsung','Deal','Bravia') as word
),
sub as (--aggregate list of words for rlike
select concat('(?i)',concat_ws('|',collect_set(word))) words_regex from words --no literal quotes: they would become part of the pattern
)
select s.url
from initial_data s cross join sub --cross join with words_regex
where url not rlike sub.words_regex --rlike works fine
Result:
OK
url
https://www.techhive.com/article/3409153/65-inch-oled-4k-tv-from-lg-at-a-1300-dollar-discount.html
https://www.techradar.com/in/news/lg-c9-oled-65-inch-4ktv-price-drop
https://indianexpress.com/article/technology/gadgets/lg-bets-big-on-oled-tvs-in-india-to-roll-out-rollable-tv-by-year-end-5823635/
Time taken: 10.145 seconds, Fetched: 3 row(s)
You can also compute the sub subquery separately and pass its result as a variable instead of using the cross join in my example. Hope you get the idea.
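
The same idea (one case-insensitive alternation built from the word list, then a negated match) can be sketched outside Hive. The Python below is only an illustration of the technique, not the Hive implementation; the function names are invented, and the words are escaped on the assumption that they should match literally rather than as regex fragments:

```python
import re


def build_word_regex(words):
    """Join the words into one case-insensitive alternation, escaping metacharacters."""
    return re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)


def filter_urls(urls, words):
    """Keep only the URLs that contain none of the words."""
    if not words:
        # An empty alternation would match everything, so short-circuit.
        return list(urls)
    pattern = build_word_regex(words)
    return [u for u in urls if not pattern.search(u)]


urls = [
    "https://www.techhive.com/article/3409153/65-inch-oled-4k-tv-from-lg-at-a-1300-dollar-discount.html",
    "https://www.techradar.com/in/news/lg-c9-oled-65-inch-4ktv-price-drop",
    "https://www.t3.com/news/cheap-oled-tv-deals-currys-august",
    "https://indianexpress.com/article/technology/gadgets/lg-bets-big-on-oled-tvs-in-india-to-roll-out-rollable-tv-by-year-end-5823635/",
    "https://www.sony.co.in/electronics/televisions/a1-series",
    "https://www.amazon.in/Sony-138-8-inches-Bravia-KD-55A8F/dp/B07BWKVBYW",
    "https://www.91mobiles.com/list-of-tvs/sony-oled-tv",
]
words = ["Sony", "Samsung", "Deal", "Bravia"]

for url in filter_urls(urls, words):
    print(url)
```

On the sample data this keeps the techhive, techradar and indianexpress URLs, the same three rows as the Hive result above.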

Kettle database lookup case insensitive

I've a table "City" with more than 100k records.
The field "name" contains strings like "Roma", "La Valletta".
I receive a file with the city name, all in upper case as in "ROMA".
I need to get the id of the record that contains "Roma" when I search for "ROMA".
In SQL, I must do something like:
select id from city where upper(name) = upper(%name%)
How can I do this in kettle?
Note: if the city is not found, I use an Insert/update field to create it, so I must avoid duplicates generated by case-sensitive names.
You can use the String Operations step in Pentaho Kettle. Set the Lower/Upper option to Y.
Pass the city name from the City table to the String Operations step, which upper-cases that field in your data stream. Then join/look up against the received file to get the required id.
More on the String Operations step in the Pentaho wiki.
You can use a 'Database join' step. Here you can write the sql:
select id from city where upper(name) = upper(?)
and specify the city field name from the text file as parameter. With 'Number of rows to return' and 'Outer join?' you can control the join behaviour.
This solution doesn't work well with a large number of rows, as it will execute one query per row. In those cases Rishu's solution is better.
This is how I did it:
First, a "Modified JavaScript value" step to create the query:
var queryDest="select coalesce( (select id as idcity from city where upper(name) = upper('"+replace(mycity,"'","\'\'")+"') and upper(cap) = upper('"+mycap+"') ), 0) as idcitydest";
Then I use this string as a query in a Dynamic SQL row.
After that,
IF idcitydest == 0 then
insert new city;
else
use the found record
This approach runs one query per row of the file, but it uses little memory for caching.
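
For reference, the lookup-then-insert logic these answers describe can be sketched outside Kettle. The Python below uses an in-memory SQLite database purely as a stand-in; the table layout and the helper name `get_or_create_city` are assumptions for illustration. It binds the city name as a parameter, as the 'Database join' answer does, rather than concatenating it into the SQL string:

```python
import sqlite3


def get_or_create_city(conn, name):
    """Return the id of the city matching name case-insensitively; insert it if absent."""
    row = conn.execute(
        "SELECT id FROM city WHERE upper(name) = upper(?)", (name,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute("INSERT INTO city (name) VALUES (?)", (name,))
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO city (name) VALUES (?)", [("Roma",), ("La Valletta",)])

roma_id = get_or_create_city(conn, "ROMA")        # finds the existing 'Roma' row
milano_id = get_or_create_city(conn, "MILANO")    # not found, so it is inserted
milano_again = get_or_create_city(conn, "Milano") # matches the row just inserted
print(roma_id, milano_id, milano_again)
```

Because the comparison upper-cases both sides, the second spelling of a new city reuses the first row instead of creating a duplicate. Note that SQLite's built-in `upper` only folds ASCII letters, so accented names would need different handling.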

Capture values of a shuttle list

I am using APEX 4.2.
I have a page (page 31) with a shuttle list. The list contains several job categories (a, e, x, c, etc). I have a button on that page that generates a report based on the selected job category. I click my button, a query runs, and it takes me to a report page (page 27). The query is along these lines
select * from 'table'
where (instr(':'||:P31_JOB_CATEGORY||':',':'||JOB_CATEGORY||':') > 0)
where P31_JOB_CATEGORY is the shuttle list item. This gives me the desired results on page 27. However, is there a way to capture each job category selected in the shuttle list on page 31 and pass it to page 27 for display? It would be nice to have them stored in a concatenated string of some sort for easy handling, e.g. A, E, C, X.
Any help would be greatly appreciated. Thanks in advance.
The reason lies in how shuttle values are saved to session state and how the APEX URL is constructed.
Shuttle values are stored, as you have seen, by concatenating the selected values with colons. For example, I have a shuttle on ENAME from EMP, select 3 values, and submit the page. Session state for the shuttle is: URUGUAY:HOWARD:M BENZ.
Now say that you redirect to another page, setting an item to the value of this shuttle item. The URL will look like this: f?p=54687:6:100741653098795::NO::P6_TEXT:URUGUAY:HOWARD:M BENZ
The APEX URL itself uses colons as delimiters, so putting values that contain colons in it simply doesn't work.
Solution? You could submit the page and use a computation to replace the colons, and then branch to the destination page.
For example, with my enames I'm replacing the colon with a ~: URUGUAY~HOWARD~M BENZ. On your destination page you can then use this value and adapt your SQL, or use a computation in before/after header to replace the separator again.
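
The clash can be seen by splitting the example URL on colons. This Python sketch only imitates the delimiter handling and is not how APEX parses requests internally; it assumes the usual f?p argument order, in which the item values are the eighth colon-separated field, and the helper name `apex_fields` is invented:

```python
def apex_fields(url):
    """Split the part after 'f?p=' on colons, the delimiter used by the f?p syntax."""
    return url.split("f?p=", 1)[1].split(":")


shuttle_value = "URUGUAY:HOWARD:M BENZ"
prefix = "f?p=54687:6:100741653098795::NO::P6_TEXT:"

# Passing the raw shuttle value: it spills over into the following fields.
broken = apex_fields(prefix + shuttle_value)
# Replacing the colons first keeps the whole value in a single field.
safe = apex_fields(prefix + shuttle_value.replace(":", "~"))

print(broken[7:])
print(safe[7:])

# On the destination side, restore the separator or build a display string.
restored = safe[7].replace("~", ":")
display = ", ".join(safe[7].split("~"))
print(restored)
print(display)
```

The last step also gives the comma-separated display string asked for in the question.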

How to parse through a column in Pig to create additional columns

New Apache Pig user here. I have data in a 3-column format and need to split it into 6 columns to match my desired schema, then load it into Pig so my existing script can run.
Sorry if the format below is untidy; I can't upload a picture due to my reputation score.
Existing format has 3 columns
User-Equipment values::key:bytearray values:value:bytearray
user1-mobile 20130306-AC 9
user1-mobile 20130306-AT 21
user2-laptop 20130306-BC 0
Required format:
User Equipment Date Type "Count or Time" Value
user1 mobile 20130306 A C 9
user1 mobile 20130306 A T 21
Any suggestions on how to get this done? Is there a regex I need to write?
The tricky thing here is that all the columns have a delimiter (-) between them except "Type" and "Count or Time".
If you don't have a common delimiter I can think of two possibilities:
1. You could implement your own LoadFunc as explained here: http://ofps.oreilly.com/titles/9781449302641/load_and_store_funcs.html
2. You could use REGEX_EXTRACT_ALL as explained here: Apache Pig: Extra query parameters from web log
Here you go for 2.:
A = LOAD 'abc.txt' AS (line:CHARARRAY);
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line, '^(.+?)\\-(.+?)\\s(.+?)\\-(.)(.)\\s(.+)$')) AS (User:CHARARRAY,Equipment:CHARARRAY,Date:CHARARRAY,Type:CHARARRAY,CountorTime:CHARARRAY,Value:CHARARRAY);
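
To check what the six capture groups pick up, the same pattern can be tried outside Pig. This is a rough Python equivalent, assuming the columns are separated by single whitespace characters as in the sample; Pig's doubled backslashes become single ones in a Python raw string, and the hyphen needs no escaping there:

```python
import re

# Groups: user, equipment, date, type, count-or-time flag, value.
LINE_RE = re.compile(r"^(.+?)-(.+?)\s(.+?)-(.)(.)\s(.+)$")


def parse_line(line):
    """Return the six fields as a tuple, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groups() if match else None


for line in [
    "user1-mobile 20130306-AC 9",
    "user1-mobile 20130306-AT 21",
    "user2-laptop 20130306-BC 0",
]:
    print(parse_line(line))
```

Lines that do not fit the layout return None here, so malformed input is easy to spot before running the real script.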

Rails 4 + MongoDB + Search query LIKE does not give correct output

In Rails, I am trying to fetch data from MongoDB using a LIKE-style query with a regular expression, but I am not getting the correct output.
Model : User
_id, name, display_name, age, address, nick_name
a1, Johny, Johny K, 12, New York, John
b1, James, James Waltor, 15, New York, James
c1, Joshua, Joshua T, 13, California, Josh
Now I have 3 records.
Query 1 : Search User having 'Jo' as keyword in initial name
User.where(name: /^jo/i)
Output - Only One record - instead of two.
Query 2 :- Match the text with all column values
User.where($where: /^jo/i)
Not getting the proper output.
On Query 1, can you output the documents? I believe one of your records has a character, such as whitespace, in front of the value in 'name'. I just ran the same query locally and it returned multiple records.
Try this:
User.where(name: /(.*)jo(.*)/i).count and see what that returns. It should match 2. If that works, then you'll need to look at what is wrong with the stored value.
On Query 2, where have you seen this syntax? $where expects a string containing a JavaScript function to execute to match records. To match any field within the document against an expression, you would need a recursive function that visits each field in each document.
For Query 2 to match against all fields
One solution, although inefficient, is to do it within the Rails app instead of in the MongoDB query.
e.g.
User.all.select { |user| user.attributes.values.grep(/^jo/i).any? }
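
The in-app filtering idea is not specific to Ruby. As a language-neutral illustration, here is a small Python sketch over plain dictionaries that mirror the sample records from the question; no MongoDB driver is involved and the helper name `matches_any_field` is made up:

```python
import re

users = [
    {"_id": "a1", "name": "Johny", "display_name": "Johny K", "age": 12,
     "address": "New York", "nick_name": "John"},
    {"_id": "b1", "name": "James", "display_name": "James Waltor", "age": 15,
     "address": "New York", "nick_name": "James"},
    {"_id": "c1", "name": "Joshua", "display_name": "Joshua T", "age": 13,
     "address": "California", "nick_name": "Josh"},
]


def matches_any_field(doc, pattern):
    """True if any string-valued field of the document matches the pattern."""
    return any(isinstance(v, str) and pattern.search(v) for v in doc.values())


pattern = re.compile(r"^jo", re.IGNORECASE)
hits = [u["_id"] for u in users if matches_any_field(u, pattern)]
print(hits)
```

As with the Ruby version, every document is loaded and scanned in application code, so this only suits small collections.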