Using Regex to extract decimal numbers from arrays on Snowflake


The column ('DATA') I want to extract the decimal from is in the following format:
{
"unit": "Miles",
"value": 59290.6
}
I've tried the following code but I get a null...
regexp_substr(DATA, '\{\d+.\d+\}') AS RECORDED_DISTANCE

Do you just mean to access the value element?
select
data:value as record_distance
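
As for why the original pattern returns null: it requires literal braces directly around the number, and that never occurs in the data. Here is a minimal sketch that reproduces this with Python's `re` module. It is not Snowflake itself, and pattern syntax differs slightly between engines, so treat it as an illustration only. The variable names are made up for the example.

```python
import json
import re

# Sample payload copied from the question.
data = '{\n"unit": "Miles",\n"value": 59290.6\n}'

# The original pattern demands "{digits.digits}" with nothing else
# between the braces, so it cannot match this payload.
original = re.search(r'\{\d+.\d+\}', data)

# Matching only the decimal itself (with the dot escaped) finds the number.
fixed = re.search(r'\d+\.\d+', data)

# Parsing the JSON and reading the key is the analogue of data:value.
parsed_value = json.loads(data)["value"]

print(original)        # None
print(fixed.group(0))  # 59290.6
print(parsed_value)    # 59290.6
```

Reading the key directly avoids the regex entirely, which is why the `data:value` form is usually the simpler choice when the column holds JSON.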

Related

Search for string that doesn't start with a certain prefix using regex + index

We are trying to use the index when excluding records whose field starts with a certain prefix. With the following query, the index is not used:
{"field": {"$not": {"$regex": "^abcd"}}}
And it takes forever to get the result.
But when using:
{"field": {"$regex": "^abcd"}}
The index is used and we get the result instantly!
Is there a way around this?

If the string is only alpha-numeric, you can use inequalities and avoid the regex altogether:
{$or: [{"field": {"$lt": "abcd"}}, {"field": {"$gt": "abcd~"}}]}
Since ~ is greater than z, that query will return all values that don't start with that prefix, as long as ~ is not a valid character in the value you are examining.
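
To see why the two inequalities are equivalent to "does not start with the prefix", here is a small Python sketch of the same comparison. It assumes plain code-point string ordering (which is what Python uses, and what I would expect from MongoDB without a custom collation) and alphanumeric values only. The function name and sample values are hypothetical.

```python
def outside_prefix_range(value: str, prefix: str = "abcd") -> bool:
    """Mirror the $or query: value < prefix OR value > prefix + '~'."""
    return value < prefix or value > prefix + "~"

# Hypothetical field values, all alphanumeric.
samples = ["abc", "abcd", "abcd1", "abcdzzz", "abce", "zzz", "Abcd"]

by_range = [s for s in samples if outside_prefix_range(s)]
by_prefix_check = [s for s in samples if not s.startswith("abcd")]

print(by_range)  # ['abc', 'abce', 'zzz', 'Abcd']
```

Both lists come out the same for these samples. A value containing `~` or any character above it would break the equivalence, which is the caveat mentioned above.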

Match a text value in a column and perform a function on values in another column that match

Using Google Sheets, I'm attempting to match values in a particular column and then, based on that match, execute a function (SUM) on the corresponding values in a different column.
I've tried LOOKUP and VLOOKUP but those are throwing errors, presumably because they are expecting to only return a single value in a given range and not perform the SUM that I am requiring.
=LOOKUP("[sometext]*", A2:A25, SUM(H2:H25))
Ideally, what I would like to happen is to search the range A2:A25, find any rows that match "sometext*", e.g. "sometext1", "sometext2", "sometext3" etc. and then move over to second range and sum the values in the matched rows, e.g. "1", "2", "3" and return "6".

=ARRAYFORMULA(SUM(IF(REGEXMATCH(A2:A25, "sometext"),
REGEXREPLACE(A2:A25, "\D+", )*1, )))
=ARRAYFORMULA(SUM(IF(REGEXMATCH(A2:A25, "sometext"), B2:B25, )))
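
The logic of the second formula (test each label against the pattern, then sum the value from the same row) can be sketched in Python as follows. The rows are made-up sample data, not taken from the question's sheet.

```python
import re

# Hypothetical rows: (label column, value column).
rows = [
    ("sometext1", 1),
    ("other", 10),
    ("sometext2", 2),
    ("sometext3", 3),
]

# Like REGEXMATCH, re.search succeeds when the pattern occurs anywhere in the label.
total = sum(value for label, value in rows if re.search("sometext", label))

print(total)  # 6
```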

How to convert dots to commas decimal notations using excel formula

Let's say the user enters 1.234.567,89 or 1,234,567.89 or 1 234 567,89 in an Excel cell, one at a time. In all of these cases the output cell should show 1234567,89.

Excel
Try TEXT() with a custom format:
=TEXT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,",",""),".","")," ",""),"[>=100]#\,#0;#")
Google Spreadsheets
Try using REGEXREPLACE():
=TEXT(REGEXREPLACE(TEXT(A1,"#"),"[ ,.]",""),"[>=100]#\,#0;#")

=ARRAYFORMULA(IF(LEN(A1:A),
IFERROR(REGEXREPLACE(A1:A, "\s|\.", ),
REGEXREPLACE(""&A1:A, "\s|\.", ",")), ))
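
The idea behind the first two formulas (strip every separator, then put a comma before the last two digits) can be sketched in Python like this. It assumes the input always carries exactly two decimal digits, which holds for the three examples in the question but is an assumption on my part. The function name is hypothetical.

```python
import re

def to_comma_decimal(raw: str) -> str:
    """Drop all separators, then insert a comma before the last two digits."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) <= 2:
        # Too short to contain a two-digit decimal part; return unchanged.
        return digits
    return digits[:-2] + "," + digits[-2:]

for raw in ("1.234.567,89", "1,234,567.89", "1 234 567,89"):
    print(to_comma_decimal(raw))  # 1234567,89 each time
```

An input without a decimal part (for example 1.234) would be misread by this approach, so it only fits data that is known to follow the two-decimal convention.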

Find a special pattern in a column of MongoDB

I want to match the first one or two digits of a numeric string stored in a field in MongoDB. But when I use this command...
db.rs1.find( { mob: { $regex: /^98/} } )
...in MongoDB, it returns only a limited number of results. What is the solution to this problem?

regex to convert key values in a column to an hstore or jsonb column?

I have a database that has a table called activity with a column called detail that has this unfortunate representation of key/value pairs:
Key ID=[813],\n
Key Name=[Name of Key],\n
Some Field=[2732],\n
Another Field=[2751],\n
Description=[A text string here],\n
Location=[sometext],\n
Other ID=[2360578],\n
As the formatting above suggests, there is one value per line, and \n is a newline character, so there's always one extra newline at the end. I'm trying to avoid having an external program process this data, so I'm looking into PostgreSQL's regex functions. The goal is to convert this to a jsonb or hstore column; I don't really care which.
Schema for the table is like:
CREATE TABLE activity
(
id integer NOT NULL,
activity_type integer NOT NULL,
ts timestamp with time zone,
detail text NOT NULL,
details_hstore hstore,
details_jsonb jsonb,
CONSTRAINT activity_pkey PRIMARY KEY (id)
);
So I'd like to run an UPDATE where I update the details_jsonb or details_hstore with the processed data from detail.
This:
select regexp_matches(activity.detail, '(.*?)=\[(.*?)\]\,[\r|\n]', 'g') as val from activity
gets me these individual rows (this is from pgadmin, I assume these are all strings):
{"Key ID",813}
{"Key Name","Name of Key"}
{"Some Field",2732}
{"Another Field",2751}
{Description,"A text string here"}
{Location,sometext}
{"Other ID",2360578}
I'm not a regex whiz but I think I need some kind of grouping. Also, that's returning as a text array of some kind, but what I really want is like this for jsonb
{"Key ID": "813", "Key Name": "Name of Key"}
or even better, if it's a number only then
{"Key ID": 813, "Key Name": "Name of Key"}
and/or the equivalent for hstore.
I feel like I'm a number of regex-in-postgres concepts away from this goal.
First is how to get ALL the pairs together in some kind of array or something, not as separate rows.
Second is, can I figure out if it's a number, and optionally get "" around strings and nothing around numbers for jsonb or hstore?
Third, get that as some kind of string/text.
Fourth is how to then write that into another jsonb/hstore field using an update.
Is this kind of regex update too much to get working in an update? i.e. update activity set details_jsonb = [[insane regex here]]? hstore is also an option (though I like that jsonb has types), so if it's easier to go to an hstore function like hstore(text[]) that's fine too.
Am I crazy and do I need to just write an external process not-in-postgresql that does this?

I would first split the single value into multiple lines. Each line can then be converted to an array from which this can be aggregated into a JSON object:
select string_to_array(regexp_replace(t.line, '(^\s+)|(\s+$)', '', 'g'), '=')
from activity a, regexp_split_to_table(a.detail, ',\s*\n') t (line)
This returns the following:
element
------------------------------------
{"Key ID",[813]}
{"Key Name","[Name of Key]"}
{"Some Field",[2732]}
{"Another Field",[2751]}
{Description,"[A text string here]"}
{Location,[sometext]}
{"Other ID",[2360578]}
{}
The regex to split the detail value into lines might need some improvements though.
The regexp_replace(t.line, '(^\s+)|(\s+$)', '', 'g') is there to trim the values before converting them to an array.
Now this can be aggregated into a single JSON value, or each line can be converted into a single hstore value (unfortunately there is no hstore_agg()):
with activity (detail) as (
values (
'Key ID=[813],
Key Name=[Name of Key],
Some Field=[2732],
Another Field=[2751],
Description=[A text string here],
Location=[sometext],
Other ID=[2360578],
')
), elements (element) as (
select string_to_array(regexp_replace(t.line, '(^\s+)|(\s+$)', '', 'g'), '=') -- trim only; keeps spaces inside keys
from activity a, regexp_split_to_table(a.detail, ',') t (line)
)
select json_agg(jsonb_object(element))
from elements
where cardinality(element) > 1 -- this removes the empty line
The above returns a JSON array with one single-key object per line:
[ { "Key ID" : "[813]" },
{ "Key Name" : "[Name of Key]" },
{ "Some Field" : "[2732]" },
{ "Another Field" : "[2751]" },
{ "Description" : "[A text string here]" },
{ "Location" : "[sometext]" },
{ "Other ID" : "[2360578]" }
]
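
Note that the values above still carry their square brackets and are all strings. To make the target shape from the question concrete (one object, brackets stripped, numbers unquoted), here is a minimal sketch of the same transformation in Python. It is not a PostgreSQL solution, just an illustration of the parsing steps. The function name is hypothetical, and the sample text is the one from the question.

```python
import json
import re

# Sample detail value from the question (one key=[value] pair per line).
detail = (
    "Key ID=[813],\n"
    "Key Name=[Name of Key],\n"
    "Some Field=[2732],\n"
    "Another Field=[2751],\n"
    "Description=[A text string here],\n"
    "Location=[sometext],\n"
    "Other ID=[2360578],\n"
)

# key, then "=[", then the value, then "]" and an optional trailing comma.
LINE = re.compile(r"\s*(.+?)=\[(.*)\],?\s*")

def parse_detail(text: str) -> dict:
    """Collect all pairs into one dict, turning digit-only values into ints."""
    result = {}
    for line in text.splitlines():
        match = LINE.fullmatch(line)
        if match is None:
            continue  # skip blank or malformed lines
        key, value = match.groups()
        result[key] = int(value) if value.isdecimal() else value
    return result

parsed = parse_detail(detail)
print(json.dumps(parsed))
```

Serialising with `json.dumps` leaves the numeric values unquoted and the text values quoted, which is the distinction the question asks for.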
