I'm trying to extract the product name from the URL between the two slashes using REGEXP_EXTRACT. For example, I want to extract ace-5 from the URLs below:
www.abc.com/products/phones/ace-5/
www.abc.com/products/phones/ace-5/?cid=dm66363&bid
www.abc.com/products/phones/ace-5/?fbclid=iwar30dpnmmpwppnla7
www.abc.com/products/phones/ace-5/?et_cid=em_367029&et_rid=130
I have a RegEx that extracts the domain name, but that is not what I'm actually looking for. Here is the RegEx:
REGEXP_EXTRACT(page,'^[^.]+.([^.]+)')
It gives the following result: abc
Assuming that the product name would always be the fixed fourth path element, we can try:
REGEXP_EXTRACT(page, '(?:[^\/]+\/){3}([^\/]+).*')
or, if the above would not work:
REGEXP_EXTRACT(page, '[^\/]+\/[^\/]+\/[^\/]+\/([^\/]+).*')
Here is a demo for the above:
Demo
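As a quick sanity check outside Data Studio, the fixed-fourth-element pattern can be tried against the sample URLs in Python. This is only a sketch: Data Studio evaluates its regular expressions with RE2 rather than Python's `re`, but this pattern uses only syntax common to both. The helper name `extract_product` is hypothetical, and the unnecessary backslashes before `/` are dropped.

```python
import re

# Skip three "segment/" groups, then capture the fourth segment.
PRODUCT = re.compile(r"(?:[^/]+/){3}([^/]+)")

urls = [
    "www.abc.com/products/phones/ace-5/",
    "www.abc.com/products/phones/ace-5/?cid=dm66363&bid",
    "www.abc.com/products/phones/ace-5/?fbclid=iwar30dpnmmpwppnla7",
    "www.abc.com/products/phones/ace-5/?et_cid=em_367029&et_rid=130",
]

def extract_product(page):
    """Return the fourth path element, or None when there is no match."""
    match = PRODUCT.search(page)
    return match.group(1) if match else None

for url in urls:
    print(extract_product(url))  # prints "ace-5" for every sample URL
```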
I do not have the same pages in my GDS, so I recreated the case with my own data source, i.e. pages from Google Analytics.
You may use the formula below, which returns the path segment after the second slash, as per your requirement.
REGEXP_EXTRACT(Page,'[^/]+/[^/]+/([^/]+)')
You need to create a calculated column with this formula. Once you have created it, you might need to add a filter to remove the rows where the value is null.
Example Page: "/products/phones/ace-5/"
The calculated column value will be "ace-5".
Note that this regex only gives you the path segment that follows phones/; if a record has nothing after that, it returns null.
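The null case, and the fact that this pattern counts segments from wherever the match starts, can be illustrated with a small Python sketch. It assumes the Page field holds a path without the host name, as in the Google Analytics data above; `third_segment` is a hypothetical helper name, and Python's `None` stands in for null.

```python
import re

# Unanchored: two "segment/" pairs, then capture the next segment.
THIRD_SEGMENT = re.compile(r"[^/]+/[^/]+/([^/]+)")

def third_segment(page):
    """Return the captured segment, or None (null in Data Studio)."""
    match = THIRD_SEGMENT.search(page)
    return match.group(1) if match else None

print(third_segment("/products/phones/ace-5/"))  # ace-5
print(third_segment("/products/phones/"))        # None: nothing after phones/

# Caveat: if the field includes the host name, the host counts as a
# segment and the capture shifts one position to the left.
print(third_segment("www.abc.com/products/phones/ace-5/"))  # phones
```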
The REGEXP_EXTRACT Calculated Field below does the trick, extracting all characters after the 3rd / till the next instance of /:
REGEXP_EXTRACT(Page, "^(?:[^/]+/){3}([^/]+)")
Google Data Studio Report and a GIF to elaborate
While importing a CSV file, I want to transform one column of money values so that they can be inserted into the database without problems.
I have values such as "134,245.99 RUB" and the output should be "134,245.99" or, at best, "134245.99".
I tried doing it with a transformation, but there is no documentation (sic!) from Oracle on how to use one.
Do you have any ideas?
@tweant: You can use the regexp_replace function to do this easily. Here's an example:
select trim(regexp_replace(' 2345abc ','\D*$','')) as str from dual;
This removes all the non-digit characters from the end of the string and trims the surrounding whitespace.
More information about the function here.
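For anyone who wants to check the expected output before touching the import, the same idea can be sketched in Python. This is an illustration only: `clean_amount` is a hypothetical helper, and it assumes every value looks like the sample, with a comma as thousands separator and a trailing currency code.

```python
import re

def clean_amount(raw, keep_separators=False):
    """Strip a trailing currency code such as ' RUB' from a money string."""
    # Same pattern as in the SQL example: drop all non-digits at the end.
    value = re.sub(r"\D*$", "", raw.strip())
    if not keep_separators:
        # Assumption: "," is a thousands separator, as in the sample value.
        value = value.replace(",", "")
    return value

print(clean_amount("134,245.99 RUB"))                        # 134245.99
print(clean_amount("134,245.99 RUB", keep_separators=True))  # 134,245.99
```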
I have some cells in OpenOffice Calc which contain links/URLs. They display, of course, in Calc as text, and hovering the mouse shows the URL. Clicking on those cells opens the referenced URL.
I want to match a string in the displayed text. The below shows the spreadsheet:
spreadsheet
Cell A1 contains the string searched for.
Cells A4:A7 contain the links/URLs.
Cells B4:B7 are copies of A4:A7 but with Default format to remove the link/URLs. Cell B3 contains my match formula, which successfully finds the string in B4:B7.
I've tried the following in cell A3 to find the string in A4:A7
`=MATCH("^"&A1&".*";B4:B7;0)` #only works on the default formatted cells.
`=MATCH(".*"&A1&".*";A4:A7;0)` #
`=MATCH(A1&".*";A4:A7;0)` #
`=MATCH(A1;A4:A7;0)` #
I also tried several other regular expressions, none of which work. Yes, I'm rusty on regexes, but what am I doing wrong? Or is the literal string actually not present in the searched cells unless I change the format?
All the problems with the searches were caused by the fact that
'Search criteria = and <> must apply to whole cells'
was enabled in Tools->Options->OpenOffice Calc->Calculate.
Turning this setting off makes everything work as advertised. The clue was that the regex ".*"&A1&".*", which of course matches a full line of plain text, worked with the range B4:B7.
The simplest solution is the expression:
=MATCH(""&A1;A4:A7;0) # "" invoked to trigger regex
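The difference between whole-cell and partial matching can be pictured with a small Python analogy. This is not how OpenOffice implements MATCH; it only contrasts `re.fullmatch` (the search criterion must cover the whole cell) with `re.search` (the criterion may match anywhere in the cell). The function name and the cell texts are hypothetical.

```python
import re

def match_position(needle, cells, whole_cell):
    """Return the 1-based position of the first matching cell, or None."""
    pattern = re.compile(re.escape(needle))
    for position, text in enumerate(cells, start=1):
        if whole_cell:
            found = pattern.fullmatch(text)  # criterion must span the cell
        else:
            found = pattern.search(text)     # criterion may match anywhere
        if found:
            return position
    return None

# Hypothetical displayed texts of four link cells.
cells = ["alpha report", "beta report", "gamma report", "delta report"]

print(match_position("beta", cells, whole_cell=True))   # None
print(match_position("beta", cells, whole_cell=False))  # 2
```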
Given an index where the values of a property 'nodeName' reflect the list below, how can I use Lucene to return only nodes with an exactly matched name?
foo
bar
foobar
foo foo bar
If I search 'bar', I only want the second node returned.
I thought I could use regex in the search term (something like "+nodeName:\"/^{0}$\" where {0} is the query) to match on the start and end of the string, but that's not working - it returns all nodes that include the query.
Also tried an inclusive range ("+nodeName: [{0} TO {0}]") which returned nothing.
A regex query isn't really going to help you here. The regex in your query cannot span multiple analyzed terms. The best way to ensure that a match spans the entire contents of a field is to index it in a way that facilitates that, that is, as a single token. I'm assuming this is a TextField using StandardAnalyzer, or something like it. In order to match against the whole input, a StringField would be a good choice, since it indexes the entire field as one token. Then a simple TermQuery could be used for this sort of search:
new TermQuery(new Term("nodeName", "bar")) would match only the second document, rather than several.
new TermQuery(new Term("nodeName", "foo foo bar")) would match the last example, rather than nothing at all.
If you also need to be able to perform more standard (full-text) searches against analyzed text in this field, I would recommend indexing the same content in two separate fields, one StringField and one TextField.
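Why a single-token field gives exact matches can be seen in a toy inverted index written in Python. This is a simplified model, not Lucene's API: splitting on whitespace stands in for StandardAnalyzer, indexing the whole value as one term stands in for StringField, and `build_index` and `term_query` are hypothetical names.

```python
from collections import defaultdict

NODE_NAMES = ["foo", "bar", "foobar", "foo foo bar"]

def build_index(values, tokenize):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, value in enumerate(values):
        for term in tokenize(value):
            index[term].add(doc_id)
    return index

def term_query(index, term):
    """Look up one exact term, the way a TermQuery does."""
    return sorted(index.get(term, set()))

# Analyzed field: the value is split into several terms.
analyzed = build_index(NODE_NAMES, lambda v: v.lower().split())
# Single-token field: the whole value is one term.
keyword = build_index(NODE_NAMES, lambda v: [v])

print(term_query(analyzed, "bar"))         # [1, 3]: "bar" and "foo foo bar"
print(term_query(keyword, "bar"))          # [1]: only the exact name
print(term_query(analyzed, "foo foo bar")) # []: no single term has spaces
print(term_query(keyword, "foo foo bar"))  # [3]
```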
I'm rendering a list in an HTML template using {{ my_list | join:"<\br>"}} , and it appears as...
$GPGGA,062511,2816.8178,S,15322.3185,E,6,04,2.6,72.6,M,37.5,M,,*68
$GPGGA,062512,2816.8177,S,15322.3184,E,1,04,2.6,72.6,M,37.5,M,,*62
$GPGGA,062513,2816.8176,S,15322.3181,E,1,04,2.6,72.6,M,37.5,M,,*67
$GPGGA,062514,2816.8176,S,15322.3180,E,1,03,2.6,72.6,M,37.5,M,,*66
$GPGGA,062515,2816.8176,S,15322.3180,E,6,03,2.6,72.6,M,37.5,M,,*60
I am attempting to use regular expressions to insert the CSS markup at the 4th and 5th commas so I can highlight the text in this column; however, I haven't been able to figure out the expression to do this. Other methods to achieve this are also appreciated.
Other info:
1) each line ends with a '\n'. Although this can be removed and the HTML display is unchanged, I've left it in for the regular expression to use if required.
2) The string will not always have a nice header such as '$GPGGA' in this example, although I could add one to help ID the start of the line if required by the regex.
3) The columns may not be a uniform number of characters as indicated in this example.
The filters I'm working on are as follows
@register.filter(is_safe=True)
def highight_start(text):
    return re.sub('regex to find 4th comma in each line', ",<span class='my_highlight'>", text, flags=re.MULTILINE)

@register.filter(is_safe=True)
def highight_end(text):
    return re.sub('regex to find 5th comma in each line', "</span>,", text, flags=re.MULTILINE)
Regards
You can achieve that by replacing the 5th value with the value itself wrapped in your <span> tags.
RegEx: ^((?:[\w\d\.\$]+,){4})([\d\.]+)
Replacement: \1<span class='my_highlight'>\2</span>
Explained demo here: http://regex101.com/r/cX5iA0
Note: I assumed the 5th value will be digits and dots
Thanks @ka, who got me on track with this solution. My working filter uses:
expression = '^((?:[^,]+,){4})([^,]+)'
replace = r'\g<1><span class="my_highlight">\g<2></span>'
# [^,] also allows matching of hidden HTML tags in the text
# To get the groups inserted back into the text and not overwritten, they need to be referenced as shown in 'replace'.
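Put together as a plain function, the expression and replacement above can be tried without Django. This is a minimal sketch: `highlight_fifth_field` is a hypothetical name, and the sample uses two lines from the question joined with '\n'.

```python
import re

# Group 1: the first four "field," pairs of a line. Group 2: the fifth field.
EXPRESSION = re.compile(r"^((?:[^,]+,){4})([^,]+)", flags=re.MULTILINE)
REPLACE = r'\g<1><span class="my_highlight">\g<2></span>'

def highlight_fifth_field(text):
    """Wrap the fifth comma-separated field of every line in a <span>."""
    return EXPRESSION.sub(REPLACE, text)

sample = "\n".join([
    "$GPGGA,062511,2816.8178,S,15322.3185,E,6,04,2.6,72.6,M,37.5,M,,*68",
    "$GPGGA,062512,2816.8177,S,15322.3184,E,1,04,2.6,72.6,M,37.5,M,,*62",
])

print(highlight_fifth_field(sample))
```

With re.MULTILINE, ^ matches at the start of every line, so one call handles the whole joined string and a single filter can replace the separate start and end filters.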