Multiple IF AND statements in one line with OpenOffice Calc - if-statement

I'm using multiple IF/AND statements in one cell and getting Error:509 (missing operator). It worked with fewer variables, so I'm not sure the syntax is correct for Calc here.
I also tried a nested statement, but that returned Error:509 as well.
=IF(M5="Statement 1";L5;K5)IF(AND(M6="Tax";A5=A6); | L6;K6);IF(AND(M7="Discounts";A7=A6); | L7;K7);IF(AND(M8="Alternate";A8=A7); | L8;K8);IF(AND(M9="Other";A9=A8); | L9;K9);IF(AND(M10="Local";A10=A9); | L10;K10);IF(AND(M11="State";A11=A10); | L11;K11)
Desired outcome: provided all conditions are true, the output should be: L5 | L6 | L7 | L8 | L9 | L10 | L11
Current outcome: Error:509, which is the "missing operator" error.

The formula has three problems. The IF calls must be joined with the concatenation operator &; they cannot simply follow one another, and the ; between them does not belong there. The literal | must be a quoted string, "|". And the quoted "|" must itself be joined to the cell reference with &, as in "|" & L6. Corrected formula:
=if(M5="Statement 1";L5;K5)&if(AND(M6="Tax";A5=A6); "|" & L6;K6)&if(AND(M7="Discounts";A7=A6); "|" & L7;K7)&if(AND(M8="Alternate";A8=A7); "|" & L8;K8)&if(AND(M9="Other";A9=A8); "|" & L9;K9)&if(AND(M10="Local";A10=A9); "|" & L10;K10)&if(AND(M11="State";A11=A10); "|" & L11;K11)

Related

What is the maximum length of a regex expression?

In PostgreSQL, I want to exclude rows if the desc field contains any forbidden words.
items:
| id | desc |
|----|------------------|
| 1 | apple foo cat bar|
| 2 | foo bar |
| 3 | foocatbar |
| 4 | foo dog bar |
The forbidden words list is stored in another table; it currently has 400 words to check.
forbidden_word_table:
| word |
|---------|
| apple |
| boy |
| cat |
| dog |
| .... |
SQL query:
-- desc is a reserved word in PostgreSQL, so the column name must be double-quoted
select id, "desc"
from items
where
"desc" !~* (select '\y(' || string_agg(word, '|') || ')\y' from forbidden_word_table)
I am checking that "desc" does not match the regular expression:
"desc" !~* '\y(apple|boy|cat|dog|.............)\y'
Results:
| id | desc |
|----|------------------|
| 2 | foo bar |
| 3 | foocatbar |
** Row 3 is not excluded, since cat does not appear there as a separate word.
My forbidden_word_table will likely grow by many rows, so the regex above will become a very long expression.
Do regular expressions have a maximum length limit (in bytes or characters)? I'm afraid my regex matching approach will stop working if forbidden_word_table keeps growing.
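It seems that Wiktor Stribiżew is right about "catastrophic backtracking".
I'd suggest using ILIKE and ANY instead. Note that this matches substrings rather than whole words, so row 3 (foocatbar) is excluded as well: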
SELECT *
FROM items i
WHERE NOT i."desc" ILIKE ANY
(
SELECT '%' || word || '%'
FROM forbidden_word_table
);
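The whole-word regex and the ILIKE substring test do not select the same rows, which a small Python sketch can illustrate. It uses Python's re module with \b standing in for PostgreSQL's \y, the sample rows from the question, and the four forbidden words shown there; the variable names are illustrative assumptions, not part of either query.

```python
import re

# Sample data from the question (only the four forbidden words shown there).
items = {1: "apple foo cat bar", 2: "foo bar", 3: "foocatbar", 4: "foo dog bar"}
forbidden = ["apple", "boy", "cat", "dog"]

# Whole-word match, analogous to  "desc" !~* '\y(apple|boy|cat|dog)\y'
word_re = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in forbidden) + r")\b",
    re.IGNORECASE,
)
kept_whole_word = [i for i, d in items.items() if not word_re.search(d)]

# Substring match, analogous to  NOT "desc" ILIKE ANY ('%word%', ...)
kept_substring = [
    i for i, d in items.items()
    if not any(w.lower() in d.lower() for w in forbidden)
]

print(kept_whole_word)
print(kept_substring)
```

The whole-word version keeps rows 2 and 3, while the substring version keeps only row 2, so the choice depends on whether foocatbar should count as containing a forbidden word.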

Remove all Unicode space separators in PostgreSQL?

I would like to trim() a column and replace any run of whitespace and Unicode space separators with a single space. The idea is to sanitize usernames, preventing two users from having deceptively similar names such as foo bar (SPACE U+0020) vs foo bar (NO-BREAK SPACE U+00A0).
Until now I've used SELECT regexp_replace(TRIM('some string'), '[\s\v]+', ' ', 'g'); it collapses spaces, tabs and carriage returns, but it lacks support for Unicode space separators.
I would have added \h to the regexp, but PostgreSQL doesn't support it (nor \p{Zs}):
SELECT regexp_replace(TRIM('some string'), '[\s\v\h]+', ' ', 'g');
Error in query (7): ERROR: invalid regular expression: invalid escape \ sequence
We are running PostgreSQL 12 (12.2-2.pgdg100+1) in a Debian 10 docker container, using UTF-8 encoding, and support emojis in usernames.
Is there a way to achieve something similar?
Based on the POSIX "space" character class (class shorthand \s in Postgres regular expressions), the Unicode "Spaces", some space-like "Format characters", and some additional non-printing characters (plus two more from Wiktor's post), I condensed this custom character class:
'[\s\u00a0\u180e\u2007\u200b-\u200f\u202f\u2060\ufeff]'
So use:
SELECT trim(regexp_replace('some string', '[\s\u00a0\u180e\u2007\u200b-\u200f\u202f\u2060\ufeff]+', ' ', 'g'));
Note: trim() comes after regexp_replace(), so it covers converted spaces.
It's important to include the basic space class \s (short for [[:space:]]) to cover all current (and future) basic space characters.
We might include more characters. Or start by stripping all characters encoded with 4 bytes. Because UNICODE is dark and full of terrors.
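For comparison, the same idea (collapse every run of space-like characters to one space, then trim) can be sketched in Python. This is only an illustration under assumptions: normalize_spaces is a hypothetical helper name, and Python's \s is Unicode-aware for str patterns, so its coverage differs somewhat from PostgreSQL's \s; the explicit code points are the ones from the custom class above.

```python
import re

# Custom class from the answer: \s plus extra space-like code points.
SPACE_LIKE = re.compile(
    r"[\s\u00a0\u180e\u2007\u200b-\u200f\u202f\u2060\ufeff]+"
)

def normalize_spaces(name):
    """Collapse runs of space-like characters to one space, then trim.

    Trimming happens after the replacement so that converted
    characters at either end are removed as well.
    """
    return SPACE_LIKE.sub(" ", name).strip(" ")

print(normalize_spaces("foo\u00a0bar") == normalize_spaces("foo bar"))
```

With this normalization, foo bar written with a NO-BREAK SPACE and foo bar written with a plain space compare equal, which is the property the username check needs.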
Consider this demo:
SELECT d AS decimal, to_hex(d) AS hex, chr(d) AS glyph
, '\u' || lpad(to_hex(d), 4, '0') AS unicode
, chr(d) ~ '\s' AS in_posix_space_class
, chr(d) ~ '[\s\u00a0\u180e\u2007\u200b-\u200f\u202f\u2060\ufeff]' AS in_custom_class
FROM (
-- TAB, SPACE, NO-BREAK SPACE, OGHAM SPACE MARK, MONGOLIAN VOWEL, NARROW NO-BREAK SPACE
-- MEDIUM MATHEMATICAL SPACE, WORD JOINER, IDEOGRAPHIC SPACE, ZERO WIDTH NON-BREAKING SPACE
SELECT unnest('{9,32,160,5760,6158,8239,8287,8288,12288,65279}'::int[])
UNION ALL
SELECT generate_series (8192, 8202) AS dec -- UNICODE "Spaces"
UNION ALL
SELECT generate_series (8203, 8207) AS dec -- First 5 space-like UNICODE "Format characters"
) t(d)
ORDER BY d;
decimal | hex | glyph | unicode | in_posix_space_class | in_custom_class
---------+------+----------+---------+----------------------+-----------------
9 | 9 | | \u0009 | t | t
32 | 20 | | \u0020 | t | t
160 | a0 |   | \u00a0 | f | t
5760 | 1680 |   | \u1680 | t | t
6158 | 180e | ᠎ | \u180e | f | t
8192 | 2000 |   | \u2000 | t | t
8193 | 2001 |   | \u2001 | t | t
8194 | 2002 |   | \u2002 | t | t
8195 | 2003 |   | \u2003 | t | t
8196 | 2004 |   | \u2004 | t | t
8197 | 2005 |   | \u2005 | t | t
8198 | 2006 |   | \u2006 | t | t
8199 | 2007 |   | \u2007 | f | t
8200 | 2008 |   | \u2008 | t | t
8201 | 2009 |   | \u2009 | t | t
8202 | 200a |   | \u200a | t | t
8203 | 200b | ​ | \u200b | f | t
8204 | 200c | ‌ | \u200c | f | t
8205 | 200d | ‍ | \u200d | f | t
8206 | 200e | ‎ | \u200e | f | t
8207 | 200f | ‏ | \u200f | f | t
8239 | 202f |   | \u202f | f | t
8287 | 205f |   | \u205f | t | t
8288 | 2060 | ⁠ | \u2060 | f | t
12288 | 3000 |   | \u3000 | t | t
65279 | feff | | \ufeff | f | t
(26 rows)
Tool to generate the character class:
SELECT '[\s' || string_agg('\u' || lpad(to_hex(d), 4, '0'), '' ORDER BY d) || ']'
FROM (
SELECT unnest('{9,32,160,5760,6158,8239,8287,8288,12288,65279}'::int[])
UNION ALL
SELECT generate_series (8192, 8202)
UNION ALL
SELECT generate_series (8203, 8207)
) t(d)
WHERE chr(d) !~ '\s'; -- not covered by \s
[\s\u00a0\u180e\u2007\u200b\u200c\u200d\u200e\u200f\u202f\u2060\ufeff]
Related, with more explanation:
Trim trailing spaces with PostgreSQL
You may construct a bracket expression including the whitespace characters from \p{Zs} Unicode category + a tab:
REGEXP_REPLACE(col, '[\u0009\u0020\u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]+', ' ', 'g')
It will replace all occurrences of one or more horizontal whitespace characters (matched by \h in regex flavors that support it) with a regular space char.
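One way to double-check which code points belong to the Zs category is to enumerate them from a Unicode database. The sketch below does this with Python's standard unicodedata module; the result depends on the Unicode version bundled with the interpreter, so treat it as a cross-check under that assumption rather than a definition. The variable names are illustrative.

```python
import sys
import unicodedata

# Collect every code point whose general category is Zs (space separator).
zs = [
    cp for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Zs"
]

# Format as 4-digit hex, matching the \uXXXX notation used in the pattern.
zs_hex = [f"{cp:04X}" for cp in zs]
print(zs_hex)
```

The tab (U+0009) does not show up in this list because it is a control character rather than a space separator, which is why the bracket expression adds it explicitly.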
Compiling blank characters from several sources, I ended up with the following pattern, which includes tabulations (U+0009 / U+000B / U+0088-008A / U+2409-240A), word joiner (U+2060), space symbol (U+2420 / U+2423), braille blank (U+2800), tag space (U+E0020) and more:
[\x0009\x000B\x0088-\x008A\x00A0\x1680\x180E\x2000-\x200F\x202F\x205F\x2060\x2409\x240A\x2420\x2423\x2800\x3000\xFEFF\xE0020]
To transform blanks effectively, including multiple consecutive spaces and those at the beginning or end of a column, here are the 3 queries to execute in sequence (assuming a column "text" in "mytable"):
-- transform all Unicode blanks/spaces into a "regular" one (U+20) only on lines where "text" matches the pattern
UPDATE
mytable
SET
text = regexp_replace(text, '[\x0009\x000B\x0088-\x008A\x00A0\x1680\x180E\x2000-\x200F\x202F\x205F\x2060\x2409\x240A\x2420\x2423\x2800\x3000\xFEFF\xE0020]', ' ', 'g')
WHERE
text ~ '[\x0009\x000B\x0088-\x008A\x00A0\x1680\x180E\x2000-\x200F\x202F\x205F\x2060\x2409\x240A\x2420\x2423\x2800\x3000\xFEFF\xE0020]';
-- then squeeze multiple spaces into one
UPDATE mytable SET text=regexp_replace(text, '[ ]+ ',' ','g') WHERE text LIKE '% %';
-- and finally, trim leading/ending spaces
UPDATE mytable SET text=trim(both ' ' FROM text) WHERE text LIKE ' %' OR text LIKE '% ';

regex numbers in arithmetic expression

I want to capture all numbers in a string, for example:
+================+============+
| string | match |
+================+============+
| 5*-33 = 75.3 | 5|-33|75.3 |
+----------------+------------+
| s44+2=7 | 2|7 |
+----------------+------------+
| ii2*-5 = 46 | -5|46 |
+----------------+------------+
| -2*-2.1 = 0.1 | -2|-2.1|0.1|
+================+============+
I tried the following expression, but it does not work with signed numbers:
\b([0-9]+(\.\d+)?)\b
Don't forget the optional -. The - is not a digit, so you have to match it separately:
\b(-?\d+(\.\d+)?)\b
Of course, this will have issues with valid expressions such as:
4-3
But that seems to be a different problem.
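One caveat is worth checking: a \b in front of -? only succeeds when the minus follows a word character, so in 5*-33 the sign is not captured by the pattern above. The Python sketch below compares that pattern with a hypothetical variant (my assumption, not part of the original answer) that moves \b behind the optional minus and uses a lookbehind to reject a preceding word character. The function names are illustrative.

```python
import re

# Pattern from the answer: \b sits in front of the optional minus.
ORIGINAL = re.compile(r"\b(-?\d+(\.\d+)?)\b")

# Hypothetical variant: the lookbehind rejects a preceding word character
# (so s44 is skipped and the - in 4-3 reads as subtraction), and \b is
# placed after the optional minus.
SIGNED = re.compile(r"(?<!\w)-?\b\d+(?:\.\d+)?\b")

def numbers_original(text):
    """Numbers found by the pattern from the answer."""
    return [m.group(1) for m in ORIGINAL.finditer(text)]

def numbers(text):
    """Numbers found by the variant pattern."""
    return SIGNED.findall(text)

for s in ["5*-33 = 75.3", "s44+2=7", "ii2*-5 = 46", "-2*-2.1 = 0.1"]:
    print(s, numbers_original(s), numbers(s))
```

Under this variant, 4-3 yields 4 and 3, that is, a minus directly after a digit is treated as an operator. Whether that is the right reading depends on the expressions being parsed.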

Hive: Extract string between first and last occurrence of a character

I have a Hive table column containing strings separated by '-', and I need to extract the substring between the first and last occurrence of '-'.
+-----------------+
| col1 |
+-----------------+
| abc-123-na-00-sf|
| 123-abc-01-sd |
| 123-abcd-sd |
+-----------------+
Required output:
+-----------+
| col1 |
+-----------+
| 123-na-00 |
| abc-01 |
| abcd |
+-----------+
Please suggest some regex to extract the desired output.
Thanks
with t as (select explode(array('abc-123-na-00-sf','123-abc-01-sd','123-abcd-sd')) as str)
select regexp_extract (str,'-(.*)-',1)
from t
;
123-na-00
abc-01
abcd
or
with t as (select explode(array('abc-123-na-00-sf','123-abc-01-sd','123-abcd-sd')) as str)
select regexp_extract (str,'(?<=-).*(?=-)',0)
from t
;
123-na-00
abc-01
abcd
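Both variants rely on .* being greedy: the match starts at the first '-' and the greedy .* stretches as far as possible, so the closing '-' is the last one in the string. A minimal Python sketch of the same pattern, where between_first_and_last is a hypothetical helper name and returning None for a missing match is my choice rather than Hive's behavior:

```python
import re

def between_first_and_last(s, sep="-"):
    """Return the text between the first and last occurrence of sep.

    The greedy (.*) expands as far as it can, so the second separator
    in the pattern ends up matching the last one in the string.
    Returns None when the string has fewer than two separators.
    """
    m = re.search(re.escape(sep) + r"(.*)" + re.escape(sep), s)
    return m.group(1) if m else None

for value in ["abc-123-na-00-sf", "123-abc-01-sd", "123-abcd-sd"]:
    print(between_first_and_last(value))
```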

code optimization

I must write a function "to_string" which receives this datatype
datatype prop = Atom of string | Not of prop | And of prop*prop | Or of prop*prop;
and returns a string.
Example
show
And(Atom("saturday"),Atom("night")) =
"(saturday & night)"
My function is working, but I have 2 problems:
The interpreter tells me -> Warning: match nonexhaustive
I think I can write the function with local functions for each constructor (Not, And, Or) and avoid duplicate code, but I don't know how.
Here is my code:
datatype prop = Atom of string | Not of prop | And of prop*prop | Or of prop*prop;
fun show(Atom(alpha)) = alpha
| show(Not(Atom(alpha))) = "(- "^alpha^" )"
| show(Or(Atom(alpha),Atom(beta))) = "( "^alpha^" | "^beta^" )"
| show(Not(Or(Atom(alpha),Atom(beta)))) = "(- ( "^alpha^" | "^beta^" ))"
| show(Or(Not(Atom(alpha)),Atom(beta))) = "( (-"^alpha^") | "^beta^" )"
| show(Or(Atom(alpha),Not(Atom(beta)))) = "( "^alpha^" | (-"^beta^") )"
| show(Or(Not(Atom(alpha)),Not(Atom(beta)))) = "( (-"^alpha^") | (-"^beta^") )"
| show(And(Atom(alpha),Atom(beta))) = "( "^alpha^" & "^beta^" )"
| show(Not(And(Atom(alpha),Atom(beta)))) = "(- ( "^alpha^" & "^beta^" ))"
| show(And(Not(Atom(alpha)),Atom(beta))) = "( (-"^alpha^") & "^beta^" )"
| show(And(Atom(alpha),Not(Atom(beta)))) = "( "^alpha^" & (-"^beta^") )"
| show(And(Not(Atom(alpha)),Not(Atom(beta)))) = "( (-"^alpha^") & (-"^beta^") )";
Thanks a lot for your help.
The general rule is as follows: if you have a recursive data type, you should use a recursive function to transform it.
Your match expression is not exhaustive because there are many variants it does not handle, e.g. And(And(Atom("a"), Atom("b")), Atom("c")).
You should rewrite the function with recursive calls to itself, e.g. replace the Not(Atom(alpha)) match with Not(expr):
show(Not(expr)) = "(- " ^ show(expr) ^ " )"
I'm sure you can figure out the rest (you'll have two recursive calls for and/or).
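The shape of the full recursive solution (one case per constructor, recursing into the sub-propositions) can be sketched in Python for illustration. The dataclasses below are a hypothetical stand-in for the SML datatype, and the spacing of the output follows the Not clause above and the asker's And/Or clauses; in SML the same structure is four clauses of show.

```python
from dataclasses import dataclass

# Stand-ins for: Atom of string | Not of prop | And of prop*prop | Or of prop*prop
@dataclass
class Atom:
    name: str

@dataclass
class Not:
    arg: object

@dataclass
class And:
    left: object
    right: object

@dataclass
class Or:
    left: object
    right: object

def show(p):
    """Render a proposition; one branch per constructor, so every shape is covered."""
    if isinstance(p, Atom):
        return p.name
    if isinstance(p, Not):
        return "(- " + show(p.arg) + " )"
    if isinstance(p, And):
        return "( " + show(p.left) + " & " + show(p.right) + " )"
    if isinstance(p, Or):
        return "( " + show(p.left) + " | " + show(p.right) + " )"
    raise TypeError("not a proposition: %r" % (p,))

print(show(And(And(Atom("a"), Atom("b")), Atom("c"))))
```

Because each branch delegates to show for its arguments, arbitrarily nested propositions are handled without listing every combination, which is also what removes the nonexhaustive-match warning in the SML version.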