I am looking to fix some tables up using XSLT. I need to use the Colspan attribute but the code I am converting from uses namest and nameend.
example:
<entry namest="col1" nameend="col3">
I need to turn this into <td colspan="3">. I thought about setting variables and using substring($var,4,1) to get the number at the end of col3/col1, then subtracting the namest digit from the nameend digit and adding one, but it didn't work.
If entry is the context node, the following expression returns the difference of the "col" values plus one, which should be the colspan value you're looking for:
substring(@nameend, 4) - substring(@namest, 4) + 1
substring(@attr, 4) returns the substring of @attr starting from the fourth character until the end. The substrings are implicitly converted to numbers by the minus operator.
Test of the expression with libxml2's xmllint:
$ echo '<entry namest="col1" nameend="col3"/>' >so.xml
$ xmllint --shell so.xml
/ > cd entry
entry > xpath substring(@nameend, 4) - substring(@namest, 4) + 1
Object is a number : 3
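The same arithmetic can be checked outside XSLT. This is only a minimal Python sketch, assuming the column names always have the form "col" followed by a number; the helper name colspan is made up for illustration.

```python
import xml.etree.ElementTree as ET

def colspan(entry):
    """Span = numeric suffix of nameend - numeric suffix of namest + 1.

    Assumes both attributes look like 'col<N>' (hypothetical helper).
    """
    start = int(entry.get("namest")[3:])   # drop the leading "col"
    end = int(entry.get("nameend")[3:])
    return end - start + 1

entry = ET.fromstring('<entry namest="col1" nameend="col3"/>')
wide = ET.fromstring('<entry namest="col2" nameend="col12"/>')
print(colspan(entry), colspan(wide))
```

Like substring(@nameend, 4), the slice [3:] takes everything after "col", so multi-digit column numbers work; substring($var,4,1) would keep only one digit.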
I have div blocks on website like this: <div id="banner-XXX-1"></div>
So I need to query this banner, where XXX is a number.
How can I do that? Currently I use this:
//div[contains(@id,'banner-') and contains(@id,'-1')]
But this is unreliable if XXX starts with 1. Is there any way to do something like this: //div[contains(@id,'banner-' + <any_decimal> + '-1')]?
It seems the matches() function does not work in the popular Chrome plugin XPath Helper, so I am using XPath 1.0.
https://chrome.google.com/webstore/detail/xpath-helper/hgimnogjllphhhkhlmebbmlgjoejdpjl?hl=en
XPath 1.0
This XPath 1.0 expression,
//div[ starts-with(@id,'banner-')
and translate(substring(@id, 8, 3), '0123456789', '') = ''
and substring(@id, 11) = '-1']
selects all div elements whose id attribute value
starts with banner-,
followed by 3 digits, which a translate() trick mapped to nothing,
followed by -1,
as requested.
XPath 2.0
This XPath 2.0 expression,
//div[matches(@id,'^banner-\d{3}-1$')]
selects all div elements whose id attribute value matches the shown regex and
starts (^) with banner-,
followed by 3 digits, (\d{3}),
and ends ($) with -1,
as requested.
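To see that the two approaches agree, here is a rough Python sketch that mirrors both checks on plain strings. The function names are hypothetical, and Python's str.translate merely stands in for XPath's translate().

```python
import re

def matches_regex(id_value):
    # Counterpart of the XPath 2.0 matches() test; fullmatch plays the
    # role of the ^ and $ anchors.
    return re.fullmatch(r"banner-[0-9]{3}-1", id_value) is not None

def matches_translate_trick(id_value):
    # Counterpart of the XPath 1.0 test: delete every digit from the
    # three characters after "banner-" and require nothing to be left.
    leftover = id_value[7:10].translate(str.maketrans("", "", "0123456789"))
    return (id_value.startswith("banner-")
            and leftover == ""
            and id_value[10:] == "-1")

samples = ["banner-123-1", "banner-1-1", "banner-123-12",
           "banner-12a-1", "xbanner-123-1", "banner--1"]
for s in samples:
    print(s, matches_regex(s), matches_translate_trick(s))
```

Only the first sample is accepted by either check; the others fail on the digit count, the suffix, or the prefix.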
I'm struggling with extracting 2 float numbers from a string using REGEXP_EXTRACT in Hive. I only want the float numbers and no $ sign.
Input String: value=$110.60-$79.30,
Expected outcome: 110.60 and 79.30
I tried all of these variations, but the result was empty.
(str,'value=$$([0-9]* \ .[0-9]*)\ -$$([0-9]* \ .[0-9]*)', 1)
(str,'value=\$([0-9]* \ .[0-9]*)\ -\$([0-9]* \ .[0-9]*)', 1)
(str,'value=(. * ?)-(. * ?)', 2)
If I make a lengthy sub-query and use SUBSTR, I can get rid of the $ sign, but still doesn't return the 2nd value ($79.30).
QUESTION
What RegEx will achieve my desired output from this input?
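This should help:
regexp_extract(str, '[$]([0-9]+[.][0-9]+)-[$]([0-9]+[.][0-9]+)', 1)
regexp_extract(str, '[$]([0-9]+[.][0-9]+)-[$]([0-9]+[.][0-9]+)', 2)
The pattern matches a number preceded by a dollar sign and keeps the number in the first capturing group, which is referenced with index 1. The second number lands in the second capturing group, which you retrieve with index 2. Writing the dollar sign as [$] makes it a literal character instead of the end-of-input anchor.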
regexp_extract(string subject, string pattern, int index)
Returns the string extracted using the pattern. For example, regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar.' Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc. The 'index' parameter is the Java regex Matcher group() method index. See docs/api/java/util/regex/Matcher.html for more information on the 'index' or Java regex group() method.
Docs
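As a quick sanity check outside Hive, the same pattern can be tried with Python's re module, whose syntax agrees with Java's for this particular expression. This is only a sketch; extract_amounts is a made-up helper name.

```python
import re

# [$] is a literal dollar sign; each (...) is one capture group.
PATTERN = re.compile(r"[$]([0-9]+[.][0-9]+)-[$]([0-9]+[.][0-9]+)")

def extract_amounts(text):
    """Return both amounts as strings, or None when nothing matches."""
    m = PATTERN.search(text)
    return (m.group(1), m.group(2)) if m else None

print(extract_amounts("value=$110.60-$79.30,"))
```

Group indexes 1 and 2 here correspond to the third argument of regexp_extract.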
I have a table with a column po_number of type varchar in Postgres 8.4. It stores alphanumeric values with some special characters. I want to ignore the characters [/alpha/?/$/encoding/.] and check whether the column contains a number. If it is a number, it needs to be cast to a number; otherwise the result should be null, as my output field po_number_new is a number field.
Below is the example:
SQL Fiddle.
I tried this statement:
select
(case when regexp_replace(po_number,'[^\w],.-+\?/','') then po_number::numeric
else null
end) as po_number_new from test
But I got an error for explicit cast:
Simply:
SELECT NULLIF(regexp_replace(po_number, '\D','','g'), '')::numeric AS result
FROM tbl;
\D being the class shorthand for "not a digit".
And you need the 4th parameter 'g' (for "globally") to replace all occurrences.
Details in the manual.
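The logic of that one-liner (strip every non-digit, turn an empty result into NULL, then cast) can be sketched in Python as follows. The helper name digits_or_none is hypothetical, and Decimal merely stands in for the numeric type.

```python
import re
from decimal import Decimal

def digits_or_none(po_number):
    """Keep only ASCII digits; return None when no digit is left."""
    if po_number is None:
        return None
    digits = re.sub(r"[^0-9]", "", po_number)  # replaces all occurrences, like 'g'
    return Decimal(digits) if digits else None  # '' -> None, like NULLIF

print(digits_or_none("PO-123/45"), digits_or_none("abc?"))
```

Note that digits separated by other characters are glued together ("123/45" becomes 12345), which is also what the SQL expression does.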
For a known, limited set of characters to replace, plain string manipulation functions like replace() or translate() are substantially cheaper. Regular expressions are just more versatile, and we want to eliminate everything but digits in this case. Related:
Regex remove all occurrences of multiple characters in a string
PostgreSQL SELECT only alpha characters on a row
Is there a regexp_replace equivalent for postgresql 7.4?
But why Postgres 8.4? Consider upgrading to a modern version.
Consider pitfalls for outdated versions:
Order varchar string as numeric
WARNING: nonstandard use of escape in a string literal
I think you want something like this:
select (case when regexp_replace(po_number, '[^\w],.-+\?/', '') ~ '^[0-9]+$'
then regexp_replace(po_number, '[^\w],.-+\?/', '')::numeric
end) as po_number_new
from test;
That is, you need to do the conversion on the string after replacement.
Note: This assumes that the "number" is just a string of digits.
The logic I would use to determine if the po_number field contains numeric digits is that its length should decrease when attempting to remove numeric digits.
If so, then all non-digit characters ([^\d]) should be removed from the po_number column. Otherwise, NULL should be returned.
select case when char_length(regexp_replace(po_number, '\d', '', 'g')) < char_length(po_number)
then regexp_replace(po_number, '[^0-9]', '', 'g')
else null
end as po_number_new
from test
If you want to extract floating numbers try to use this:
SELECT NULLIF(regexp_replace(po_number, '[^\.\d]','','g'), '')::numeric AS result FROM tbl;
It's the same as Erwin Brandstetter answer but with different expression:
[^...] - match any character except a list of excluded characters; put the excluded characters in place of ...
\. - point character (also you can change it to , char)
\d - digit character
Since version 12 (released 2 years and 4 months before the time of writing, but after the last edit that I can see on the accepted answer), you can use a generated column to do this once, when the row is written, rather than having to calculate it each time you wish to SELECT a new po_number.
Furthermore, you can use the TRANSLATE function to extract your digits, which is less expensive than the REGEXP_REPLACE solution proposed by @ErwinBrandstetter!
I would do this as follows (all of the code below is available on the fiddle here):
CREATE TABLE s
(
num TEXT,
new_num INTEGER GENERATED ALWAYS AS
(NULLIF(TRANSLATE(num, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ. ', ''), '')::INTEGER) STORED
);
You can add to the 'ABCDEFG...' string in the TRANSLATE function as appropriate. I have a decimal point (.) and a space ( ) at the end; you may wish to have more characters there depending on your input!
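For readers who want to try the deletion logic without a database, here is a small Python sketch of the same idea. str.translate with a deletion table plays the role of TRANSLATE with an empty replacement string; the helper name to_int_or_none is invented for this example.

```python
# Characters to delete, copied from the TRANSLATE call above.
DELETE_TABLE = str.maketrans("", "", "ABCDEFGHIJKLMNOPQRSTUVWXYZ. ")

def to_int_or_none(num):
    """Delete the listed characters, then cast; empty or None gives None."""
    if num is None:
        return None
    cleaned = num.translate(DELETE_TABLE)
    return int(cleaned) if cleaned else None

for value in ["2", "", None, " ", "AB 12."]:
    print(repr(value), to_int_or_none(value))
```

As with the SQL cast, any character that is neither a digit nor in the deletion list (a lowercase letter, say) makes the conversion fail, so the list has to cover the real input.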
And checking:
INSERT INTO s VALUES ('2'), (''), (NULL), (' ');
INSERT INTO t VALUES ('2'), (''), (NULL), (' ');
SELECT * FROM s;
SELECT * FROM t;
Result (same for both):
num new_num
2 2
NULL
NULL
NULL
So, I wanted to check how efficient my solution was, so I ran the following test inserting 10,000 records into both tables s and t as follows (from here):
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
INSERT INTO t
with symbols(characters) as
(
VALUES ('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
)
select string_agg(substr(characters, (random() * length(characters) + 1) :: INTEGER, 1), '')
from symbols
join generate_series(1,10) as word(chr_idx) on 1 = 1 -- word length
join generate_series(1,10000) as words(idx) on 1 = 1 -- # of words
group by idx;
The differences weren't that huge but the regex solution was consistently slower by about 25% - even changing the order of the tables undergoing the INSERTs.
However, where the TRANSLATE solution really shines is when doing a "raw" SELECT as follows:
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT
NULLIF(TRANSLATE(num, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ. ', ''), '')::INTEGER
FROM s;
and the same for the REGEXP_REPLACE solution.
The differences were very marked, the TRANSLATE taking approx. 25% of the time of the other function. Finally, in the interests of fairness, I also did this for both tables:
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT
num, new_num
FROM t;
Both extremely quick and identical!
I am trying to write a query that takes in a string, where an equation in the form
x^3 + 0.0046x^2 - 0.159x +1.713
is expected. The equation is used to calculate new values in the output table from a list of existing values. Hence I will need to convert whatever the input equation string is into an equation that postgresql can process, e.g.
power(data.value,3) + 0.0046 * power(data.value,2) - 0.159 * data.value + 1.713
A few comforting constraints in this task are
The equation will always be in the form of a polynomial, e.g. sum(A_n * x^n)
The user will always use 'x' to represent the variable in the input equation
I have been pushing my queries into a string and executing it at the end, e.g.
_query TEXT;
SELECT 'select * from ' INTO _query;
SELECT _query || 'product.getlength( ' || min || ',' || max || ')' INTO _query;
RETURN QUERY EXECUTE _query;
Hence I know I only need to somehow
Replace each 'x' with 'data.value'
Find all the places in the equation string where a number immediately precedes an 'x', and add a '*'
Find all exponential operations (x^n) in the equation string and convert them to power(x,n)
This may very well be something very trivial for a lot of people, unfortunately postgresql is not my best skill and I have already spent way more time than I can afford to get this working. Any type of help is highly appreciated, cheers.
Your 9am-noon time frame is over, but here goes.
Every term of the polynomial has 4 elements:
Addition/subtraction modifier
Multiplier
Parameter, always x in your case
Power
The problem is that these elements are not always present. The first term has no addition element, although it could have a subtraction sign - which is then typically connected to the multiplier. Multipliers are only given when not equal to 1. The parameter is not present in the last term and neither is a power in the last two terms.
With optional capture groups in regular expression parsing you can sort out this mess and PostgreSQL has the handy regexp_matches() function for this:
SELECT * FROM
regexp_matches('x^3 + 0.0046x^2 - 0.159x +1.713',
'\s*([+-]?)\s*([0-9.]*)(x?)\^?([0-9]*)', 'g') AS r (terms);
The regular expression says this:
\s* Read 0 or more spaces.
([+-]?) Capture 0 or 1 plus or minus sign.
\s* Read 0 or more spaces.
([0-9.]*) Capture a number consisting of digits and a decimal point, if present.
(x?) Capture the parameter x. This is necessary to differentiate between the last two terms, see query below.
\^? Read the power symbol, if present. Must be escaped because ^ is the constraint character.
([0-9]*) Capture an integer number, if present.
The g modifier repeats this process for every matching pattern in the string.
On your string this yields, in the form of string arrays:
| terms |
|-----------------|
| {'','',x,3} |
| {+,0.0046,x,2} |
| {-,0.159,x,''} |
| {+,1.713,'',''} |
| {'','','',''} |
(I have no idea why the last line with all empty strings comes out. Maybe a real expert can explain that.)
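The same pattern behaves the same way in Python's re module, which makes the parsing easy to inspect. The sketch below is only an illustration, not part of the PostgreSQL solution. It also suggests a likely reason for the trailing row: every part of the pattern is optional, so the pattern can match the empty string that remains at the end of the input.

```python
import re

# Same pattern as in the regexp_matches() call above.
TERM = re.compile(r"\s*([+-]?)\s*([0-9.]*)(x?)\^?([0-9]*)")

# findall returns one (sign, multiplier, parameter, power) tuple per match.
terms = TERM.findall("x^3 + 0.0046x^2 - 0.159x +1.713")
for t in terms:
    print(t)

# Dropping all-empty tuples removes the zero-length match at the end.
real_terms = [t for t in terms if any(t)]
```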
With this result, you can piece your query together:
SELECT id, sum(term)
FROM (
SELECT id,
CASE WHEN terms[1] = '-' THEN -1
WHEN terms[1] = '+' THEN 1
WHEN terms[3] = 'x' THEN 1 -- If no x then NULL
END *
CASE terms[2] WHEN '' THEN 1. ELSE terms[2]::float
END *
value ^ CASE WHEN terms[3] = '' THEN 0 -- If no x then 0 (x^0)
WHEN terms[4] = '' THEN 1 -- If no power then 1 (x^1)
ELSE terms[4]::int
END AS term
FROM data
JOIN regexp_matches('x^3 + 0.0046x^2 - 0.159x +1.713',
'\s*([+-]?)\s*([0-9.]*)(x?)\^?([0-9]*)', 'g') AS r (terms) ON true
) sub
GROUP BY id
ORDER BY id;
SQLFiddle
This assumes you have an id column to join on. If all you have is a value then you can still do it but you should then wrap the above query in a function that you feed the polynomial and the value. The power is assumed to be integral but you can easily turn that into a real number by adding a dot . to the regular expression and a ::float cast instead of ::int in the CASE statement. You can also support negative powers by adding another capture group to the regular expression and a case statement in the query, same as for the multiplier term; I leave this for your next weekend hackfest.
This query will also handle "odd" polynomials such as -4.3x^3+ 101.2 + 0.0046x^6 - 0.952x^7 +4x just so long as the pattern described above is maintained.
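The term-by-term evaluation can be sketched in Python as well, which may help when testing polynomials before running them in the database. This is a rough equivalent under stated assumptions, not a translation of the query: eval_polynomial is a made-up name, and unlike the CASE expression above it treats a term without a sign as positive instead of NULL.

```python
import re

TERM = re.compile(r"\s*([+-]?)\s*([0-9.]*)(x?)\^?([0-9]*)")

def eval_polynomial(expr, value):
    """Sum sign * multiplier * value ** power over all parsed terms."""
    total = 0.0
    for sign, mult, param, power in TERM.findall(expr):
        if not (sign or mult or param or power):
            continue                      # skip zero-length matches
        s = -1.0 if sign == "-" else 1.0  # assumption: no sign means plus
        m = float(mult) if mult else 1.0  # missing multiplier means 1
        if param == "":
            p = 0                         # no x: constant term (x^0)
        elif power == "":
            p = 1                         # x without a power (x^1)
        else:
            p = int(power)
        total += s * m * value ** p
    return total

print(eval_polynomial("x^3 + 0.0046x^2 - 0.159x +1.713", 2.0))
print(eval_polynomial("-4.3x^3+ 101.2 + 0.0046x^6 - 0.952x^7 +4x", 1.0))
```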
I need to search a cell array and return a single boolean value indicating whether any cell matches a regular expression.
For example, suppose I want to find out if the cell array strs contains foo or -foo (case-insensitive). The regular expression I need to pass to regexpi is ^-?foo$.
Sample inputs:
strs={'a','b'} % result is 0
strs={'a','foo'} % result is 1
strs={'a','-FOO'} % result is 1
strs={'a','food'} % result is 0
I came up with the following solution based on How can I implement wildcard at ismember function of matlab? and Searching cell array with regex, but it seems like I should be able to simplify it:
~isempty(find(~cellfun('isempty', regexpi(strs, '^-?foo$'))))
The problem I have is that it looks rather cryptic for such a simple operation. Is there a simpler, more human-readable expression I can use to achieve the same result?
NOTE: The answer refers to the original regexp in the question: '-?foo'
You can avoid the find:
any(~cellfun('isempty', regexpi(strs, '-?foo')))
Another possibility: first concatenate all cells into a single string:
~isempty(regexpi([strs{:}], '-?foo'))
Note that you can remove the "-" sign in any of the above:
any(~cellfun('isempty', regexpi(strs, 'foo')))
~isempty(regexpi([strs{:}], 'foo'))
And that allows using strfind (with lower) instead of regexpi:
~isempty(strfind(lower([strs{:}]),'foo'))
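For comparison, here is how the whole-cell match from the question (the anchored pattern ^-?foo$) could look in Python. This is only a sketch in a different language, with any_match as a hypothetical name; it is not a MATLAB replacement.

```python
import re

# fullmatch requires the entire string to match, which plays the role
# of the ^ and $ anchors; IGNORECASE mirrors regexpi.
PATTERN = re.compile(r"-?foo", re.IGNORECASE)

def any_match(strs):
    """True if at least one string is exactly 'foo' or '-foo', any case."""
    return any(PATTERN.fullmatch(s) is not None for s in strs)

print(any_match(["a", "b"]), any_match(["a", "-FOO"]), any_match(["a", "food"]))
```

Matching per element matters here: concatenating the strings first would let 'food', or a match that spans two neighbouring cells, slip through an anchored search.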