FileNet Content Engine query comparing numbers defined as string - casting

So I have a FileNet search query like this:
SELECT * FROM MyPurchase_Docs
WHERE Purchase_Amount > 100.50
It is a very simple query, but my problem is that Purchase_Amount is defined as a string, so I get results where Purchase_Amount is 2.5, 30.25, etc. (because the values are compared as strings).
I tried the CAST function, but it does not work with FileNet.
I do not have access to change the field type in FileNet, so I am stuck here.
Please let me know if there is a way to solve this problem.

That is not possible; data type conversion is not supported. The comparison predicate in the query grammar only accepts scalar expressions, and there is no cast function among them:
<predicate> ::= <comparison_predicate>
| <null_test>
| <in_test>
| <existence_test>
| <isclass_test>
| <isOfclass_test>
| <content_test>
| <satisfies_test>
| <intersects_test>
<comparison_predicate> ::= <scalar_exp> <comparison_op> <scalar_exp>
<scalar_exp> ::= <literal>
| <property_exp>
| ( '(' <scalar_exp> ')' )
| ( <scalar_exp> <arith_op> <scalar_exp> )
| <property_spec> [<arith_op> <timespan_exp>]
| <now> [<arith_op> <timespan_exp>]
SQL Statement Grammar
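Since the query grammar itself offers no conversion, one hedged workaround (not from the answer above) is to drop the numeric predicate from the WHERE clause and filter the fetched results in application code. A minimal sketch, assuming the Content Engine Java API (SearchSQL / SearchScope); the exact method signatures and the objectStore handle are assumptions and may need adjusting to your API version:
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import com.filenet.api.collection.IndependentObjectSet;
import com.filenet.api.core.Document;
import com.filenet.api.core.ObjectStore;
import com.filenet.api.query.SearchSQL;
import com.filenet.api.query.SearchScope;

// Sketch only: run the query without the numeric predicate, then compare the
// string property as a number on the client side.
static List<Document> purchasesOver(ObjectStore objectStore, double threshold) {
    SearchSQL sql = new SearchSQL("SELECT * FROM MyPurchase_Docs");
    SearchScope scope = new SearchScope(objectStore);
    IndependentObjectSet docs = scope.fetchObjects(sql, null, null, Boolean.TRUE);

    List<Document> matches = new ArrayList<>();
    for (Iterator<?> it = docs.iterator(); it.hasNext(); ) {
        Document doc = (Document) it.next();
        String raw = doc.getProperties().getStringValue("Purchase_Amount");
        try {
            if (Double.parseDouble(raw) > threshold) {
                matches.add(doc); // keep this document
            }
        } catch (NumberFormatException e) {
            // skip values that are not numeric
        }
    }
    return matches;
}
Be aware that this fetches every MyPurchase_Docs row, so it only makes sense for modest result sets or when combined with other indexable predicates.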

Related

Concatenating string_view objects

I've been adding std::string_view to some old code for representing string-like config params, as it provides a read-only view, which is faster because no copying is needed.
However, one cannot concatenate two string_views, as operator+ isn't defined for them. I see this question has a couple of answers stating it's an oversight and that there is a proposal to add it. However, that proposal is for adding a string and a string_view; presumably, if it gets implemented, the result of the concatenation would be a std::string.
Would adding two string_views also fall into the same category? And if not, why shouldn't adding two string_views be supported?
Sample
std::string_view s1{"concate"};
std::string_view s2{"nate"};
std::string_view s3{s1 + s2};
And here's the error
error: no match for 'operator+' (operand types are 'std::string_view' {aka 'std::basic_string_view<char>'} and 'std::string_view' {aka 'std::basic_string_view<char>'})
A view is similar to a span in that it does not own the data; as the name implies, it is just a view of the data. To concatenate the string views, you'd first need to construct std::strings; then you can concatenate them:
std::string s3 = std::string(s1) + std::string(s2);
Note that s3 will be a std::string, not a std::string_view, since it owns this data.
A std::string_view is an alias for std::basic_string_view<char>, which is a std::basic_string_view templated on a specific type of character, i.e. char.
But what does it look like?
Besides the fairly large number of useful member functions such as find, substr, and others (maybe an ordinary number, compared to other container/string-like things offered by the STL), std::basic_string_view<_CharT>, with _CharT being the generic char-like type, has just two data members:
// directly from my /usr/include/c++/12.2.0/string_view
size_t _M_len;
const _CharT* _M_str;
i.e. a pointer to const _CharT to indicate where the view starts, and a size_t (an appropriate type of number) to indicate how long the view is, starting from _M_str's pointee.
In other words, a string view just knows where it starts and how long it is, so it represents a sequence of char-like entities which are consecutive in memory. With just two such members, you can't represent a string which is made up of non-contiguous substrings.
To put it yet another way, if you want to create a std::string_view, you need to be able to tell at which position it starts and how many chars long it is. Can you tell where s1 + s2 would have to start and how many characters long it should be? Think about it: you can't, because s1 and s2 are not adjacent.
Maybe a diagram can help.
Assume these lines of code
std::string s1{"hello"};
std::string s2{"world"};
s1 and s2 are totally unrelated objects, as far as their memory location is concerned; here is what they look like:
&s2[0]
|
| &s2[1]
| |
&s1[0] | | &s2[2]
| | | |
| &s1[1] | | | &s2[3]
| | | | | |
| | &s1[2] | | | | &s2[4]
| | | | | | | |
| | | &s1[3] v v v v v
| | | | +---+---+---+---+---+
| | | | &s1[4] | w | o | r | l | d |
| | | | | +---+---+---+---+---+
v v v v v
+---+---+---+---+---+
| h | e | l | l | o |
+---+---+---+---+---+
I've intentionally drawn them misaligned to mean that &s1[0], the memory location where s1 starts, and &s2[0], the memory location where s2 starts, have nothing to do with each other.
Now, imagine you create two string views like this:
std::string_view sv1{s1};
std::string_view sv2(s2.begin() + 1, s2.begin() + 4);
Here's what they will look like, in terms of the two implementation-defined members _M_str and _M_len:
&s2[0]
|
| &s2[1]
| |
&s1[0] | | &s2[2]
| | | |
| &s1[1] | | | &s2[3]
| | | | | |
| | &s1[2] | | | | &s2[4]
| | | | | | | |
| | | &s1[3] v v v v v
| | | | +---+---+---+---+---+
| | | | &s1[4] | w | o | r | l | d |
| | | | | +---+---+---+---+---+
v v v v v · ^ ·
+---+---+---+---+---+ · | ·
| h | e | l | l | o | +---+ ·
+---+---+---+---+---+ | · ·
· ^ · | · s2._M_len ·
· | · | <----------->
+---+ · |
| · · +-- s2._M_str
| · s1._M_len ·
| <------------------->
|
+-------- s1._M_str
Given the above, can you see what's wrong with expecting that
std::string_view sv3{sv1 + sv2};
works?
How could you possibly define sv3._M_str and sv3._M_len (based on sv1._M_str, sv1._M_len, sv2._M_str, and sv2._M_len) such that they represent a single view covering the characters of both sv1 and sv2?
You can't, because "hello" and "world" are located in two unrelated areas of memory.
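A quick way to see this for yourself (an illustration added here, not part of the original answer) is to print the addresses of the two buffers; they are unrelated, so no single (pointer, length) pair can cover both:
#include <iostream>
#include <string>

int main()
{
    std::string s1{"hello"};
    std::string s2{"world"};

    // The two buffers live at unrelated addresses, so no single
    // (pointer, length) pair can describe "helloworld" without copying.
    std::cout << static_cast<const void*>(s1.data()) << '\n'
              << static_cast<const void*>(s2.data()) << '\n';
}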
std::string_view does not own any data; it is only a view. If you want to join two views to get a joined view, you can use boost::join() from the Boost library. But the result type will not be a std::string_view.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string_view>
#include <boost/range.hpp>
#include <boost/range/join.hpp>
void test()
{
    std::string_view s1{"hello, "}, s2{"world"};
    auto joined = boost::join(s1, s2);
    // print the joined range
    std::copy(joined.begin(), joined.end(), std::ostream_iterator<char>(std::cout, ""));
    std::cout << std::endl;
    // another way to print it
    for (auto c : joined) std::cout << c;
    std::cout << std::endl;
}
Since C++20 the ranges library can also produce a joined view: put the two string_views into an array and flatten it with std::ranges::views::join (C++23 additionally offers std::ranges::views::join_with if you need a delimiter). As above, the result is a lazy view, not a std::string_view.
#include <array>
#include <iostream>
#include <ranges>
#include <string_view>
void test()
{
    std::string_view s1{"hello, "}, s2{"world"};
    // flatten an array of the two views into one view over all the characters
    auto joined = std::array{s1, s2} | std::ranges::views::join;
    for (auto c : joined) std::cout << c;
    std::cout << std::endl;
}

Power BI - M query to join records matching range

I have an import query (table a) and an imported Excel file (table b) with records I am trying to match it up with.
I am looking for a method to replicate this type of SQL in M:
SELECT a.loc_id, a.other_data, b.stk
FROM a INNER JOIN b on a.loc_id BETWEEN b.from_loc AND b.to_loc
Table A
| loc_id | other data |
-------------------------
| 34A032B1 | ... |
| 34A3Z011 | ... |
| 3DD23A41 | ... |
Table B
| stk | from_loc | to_loc |
--------------------------------
| STKA01 | 34A01 | 34A30ZZZ |
| STKA02 | 34A31 | 34A50ZZZ |
| ... | ... | ... |
Goal
| loc_id | other data | stk |
----------------------------------
| 34A032B1 | ... | STKA01 |
| 34A3Z011 | ... | STKA02 |
| 3DD23A41 | ... | STKD01 |
All of the other queries I can find along these lines use numbers, dates, or times in the BETWEEN clause, and seem to work by exploding the (from, to) range into all possible values and then filtering out the extra rows. However, I need to use string comparisons, and exploding those into all possible values would be infeasible.
Among all the various solutions I could find, the closest I've come is to add a custom column on table a:
Table.SelectRows(
table_b,
(a) => Value.Compare([loc_id], table_b[from_loc]) = 1
and Value.Compare([loc_id], table_b[to_loc]) = -1
)
This does return all the columns from table_b; however, when expanding the column, the values are all null.
Your requirement is not very specific ("After 34A01 could be any string..."), which makes it hard to figure out how your series progresses.
But maybe you can just test how a value "sorts" using the native comparison operators in Power Query.
Add a custom column with Table.SelectRows:
= try Table.SelectRows(TableB, (t)=> t[from_loc]<=[loc_id] and t[to_loc] >= [loc_id])[stk]{0} otherwise null
To reproduce with your examples:
let
    TableB = Table.FromColumns(
        {{"STKA01","STKA02"},
         {"34A01","34A31"},
         {"34A30ZZZ","34A50ZZZ"}},
        type table[stk=text, from_loc=text, to_loc=text]),
    TableA = Table.FromColumns(
        {{"34A032B1","34A3Z011","3DD23A41"},
         {"...","...","..."}},
        type table[loc_id=text, #"other data"=text]),
    //determine where each loc_id sorts and return the stk
    #"Added Custom" = Table.AddColumn(TableA, "stk", each
        try Table.SelectRows(TableB, (t)=> t[from_loc] <= [loc_id] and t[to_loc] >= [loc_id])[stk]{0} otherwise null)
in
    #"Added Custom"
Note: if the above algorithm is too slow, there may be faster methods of obtaining these results.
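One simple thing to try (an assumption on my part, not from the original answer): buffer TableB once with Table.Buffer so the per-row Table.SelectRows lookup does not re-evaluate its source for every row of TableA. Only the affected steps of the query above are shown:
// Replace the corresponding steps in the query above with these:
BufferedB = Table.Buffer(TableB),
#"Added Custom" = Table.AddColumn(TableA, "stk", each
    try Table.SelectRows(BufferedB, (t)=> t[from_loc] <= [loc_id] and t[to_loc] >= [loc_id])[stk]{0} otherwise null)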

Django annotate StrIndex for empty fields

I am trying to use Django's StrIndex to find all rows whose user value is a substring of a given string.
E.g., my table contains:
+----------+------------------+
| user | domain |
+----------+------------------+
| spam1 | spam.com |
| badguy+ | |
| | protonmail.com |
| spammer | |
| | spamdomain.co.uk |
+----------+------------------+
but the query
SpamWord.objects.annotate(idx=StrIndex(models.Value('xxxx'), 'user')).filter(models.Q(idx__gt=0) | models.Q(domain='spamdomain.co.uk')).first()
matches <SpamWord: *#protonmail.com>
The generated query is SELECT `spamwords`.`id`, `spamwords`.`user`, `spamwords`.`domain`, INSTR('xxxx', `spamwords`.`user`) AS `idx` FROM `spamwords` WHERE (INSTR('xxxx', `spamwords`.`user`) > 0 OR `spamwords`.`domain` = 'spamdomain.co.uk')
It should be <SpamWord: *#spamdomain.co.uk>
This is happening because
INSTR('xxxx', '') => 1
(and also INSTR('xxxxasd', 'xxxx') => 1, which is correct).
How can I write this query in order to get entry #5 (spamdomain.co.uk)?
The order of the parameters of StrIndex [Django-doc] is swapped. The first parameter is the haystack, the string in which you search, and the second one is the needle, the substring you are looking for.
You thus can annotate with:
from django.db.models import Q, Value
from django.db.models.functions import StrIndex
SpamWord.objects.annotate(
idx=StrIndex('user', Value('xxxx'))
).filter(
Q(idx__gt=0) | Q(domain='spamdomain.co.uk')
).first()
Just filter out the rows where user is empty:
(~models.Q(user='') & models.Q(idx__gt=0)) | models.Q(domain='spamdomain.co.uk')
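Put together (a sketch based on the queryset from the question, keeping its original StrIndex argument order), that condition would be used like this:
from django.db.models import Q, Value
from django.db.models.functions import StrIndex

# Exclude empty user values so INSTR('xxxx', '') = 1 can no longer match
SpamWord.objects.annotate(
    idx=StrIndex(Value('xxxx'), 'user'),
).filter(
    (~Q(user='') & Q(idx__gt=0)) | Q(domain='spamdomain.co.uk')
).first()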

How to get rows where a field contains ( ) , [ ] % or +. using rlike SparkSQL function

Let's say you have a Spark dataframe with multiple columns and you want to return the rows where the columns contains specific characters. Specifically you want to return the rows where at least one of the fields contains ( ) , [ ] % or +.
What is the proper syntax in case you want to use Spark SQL rlike function?
import spark.implicits._
val dummyDf = Seq(("John[", "Ha", "Smith?"),
("Julie", "Hu", "Burol"),
("Ka%rl", "G", "Hu!"),
("(Harold)", "Ju", "Di+")
).toDF("FirstName", "MiddleName", "LastName")
dummyDf.show()
+---------+----------+--------+
|FirstName|MiddleName|LastName|
+---------+----------+--------+
| John[| Ha| Smith?|
| Julie| Hu| Burol|
| Ka%rl| G| Hu!|
| (Harold)| Ju| Di+|
+---------+----------+--------+
Expected Output
+---------+----------+--------+
|FirstName|MiddleName|LastName|
+---------+----------+--------+
| John[| Ha| Smith?|
| Ka%rl| G| Hu!|
| (Harold)| Ju| Di+|
+---------+----------+--------+
My attempts return errors or unexpected results, even when I just try to search for (.
I know that I could use the simple like construct multiple times, but I am trying to figure out how to do it in a more concise way with regex and Spark SQL.
You can try this using the rlike method:
dummyDf.show()
+---------+----------+--------+
|FirstName|MiddleName|LastName|
+---------+----------+--------+
| John[| Ha| Smith?|
| Julie| Hu| Burol|
| Ka%rl| G| Hu!|
| (Harold)| Ju| Di+|
| +Tim| Dgfg| Ergf+|
+---------+----------+--------+
import org.apache.spark.sql.functions.{col, lit}

val df = dummyDf.withColumn("hasSpecial", lit(false))
val result = df.dtypes
  .collect { case (dn, dt) => dn }
  .foldLeft(df)((accDF, c) => accDF.withColumn("hasSpecial", col(c).rlike(".*[\\(\\)\\[\\]%+]+.*") || col("hasSpecial")))
result.filter(col("hasSpecial")).show(false)
Output:
+---------+----------+--------+----------+
|FirstName|MiddleName|LastName|hasSpecial|
+---------+----------+--------+----------+
|John[ |Ha |Smith? |true |
|Ka%rl |G |Hu! |true |
|(Harold) |Ju |Di+ |true |
|+Tim |Dgfg |Ergf+ |true |
+---------+----------+--------+----------+
You can also drop the hasSpecial column if you want.
Try this: .*[()\[\]%\+,.]+.*
.* matches any character, zero or more times
[()\[\]%\+,.]+ matches any of the characters inside the brackets, one or more times
.* matches any character, zero or more times
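For completeness, here is a sketch (not from either answer above; the pattern keeps only the characters listed in the question) of applying such a pattern to every column in one go, without the intermediate hasSpecial column:
import org.apache.spark.sql.functions.col

// Match rows where any column contains (, ), [, ], % or +
val pattern = ".*[()\\[\\]%+]+.*"
val anySpecial = dummyDf.columns
  .map(c => col(c).rlike(pattern)) // one boolean Column per field
  .reduce(_ || _)                  // OR them together

dummyDf.filter(anySpecial).show(false)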

hive parse string from log

I'm having a problem parsing strings from a log file. This is the case:
"skey":"110","scp_id":"OC05","capedge":"3G"
"skey":"140","scp_id":"OC02","capedge":"3G"
"skey":"0","scp_id":"OC01","capedge":"3G"
This is the expected output for our table:
| skey | scp_id | capedge |
| 110 | OC05 | 3G |
| 140 | OC02 | 3G |
| 0 | OC01 | 3G |
I've tried using the parse_url method from https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF, but unfortunately our string is not in URL format. Is there any better approach for this, or do I have to use regexp_extract?
Thank you,
Galih
You may use a combination of the SPLIT and REGEXP_EXTRACT functions:
select REGEXP_EXTRACT( skey , ':"(\\w+)"', 1) as skey,
REGEXP_EXTRACT( scp_id , ':"(\\w+)"', 1) as scp_id,
REGEXP_EXTRACT( capedge , ':"(\\w+)"', 1) as capedge
from (
select SPLIT(log_record, ',' )[0] as skey,
SPLIT(log_record , ',')[1] as scp_id,
SPLIT( log_record , ',')[2] as capedge
FROM yourtable
) a;
You can test this on the HUE demo (user id / password: demo / demo).
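An alternative worth considering (a sketch, not from the answer above): because each record is already a comma-separated list of "key":"value" pairs, Hive's str_to_map can split it into a map in one pass once the double quotes are stripped with regexp_replace.
SELECT kv['skey']    AS skey,
       kv['scp_id']  AS scp_id,
       kv['capedge'] AS capedge
FROM (
  -- strip the double quotes, then split on ',' (pairs) and ':' (key/value)
  SELECT str_to_map(regexp_replace(log_record, '"', ''), ',', ':') AS kv
  FROM yourtable
) t;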