I'm trying to use RXExtract as a calculated column to dissect [column1] by its values. [column1] looks like "location- 1234 (abc)" and I'd like to just separate the "1234" out of it.
My current code in Spotfire is RXExtract([SM Code], '(\d)(\d)(\d)(\d)', 1 ) but I get an "invalid escape sequence error". Where am I going wrong?
Thanks!
Try
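RXExtract([SM Code], '\\b\\d{4}\\b', 1)
This matches the first run of exactly four consecutive digits that is bounded by word boundaries (\b). The backslashes are doubled because Spotfire expression strings treat a single backslash as an escape character, which is the likely cause of the "invalid escape sequence" error.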
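To check what the pattern itself captures outside Spotfire, here is a minimal Python sketch. It only illustrates the regex; the function name extract_four_digits is made up for this example, and Python raw strings do not need the extra escaping that a Spotfire expression may require.

```python
import re

def extract_four_digits(text):
    """Return the first standalone run of exactly four digits, or None."""
    # \b on both sides keeps a longer digit run (e.g. 12345) from matching.
    match = re.search(r"\b\d{4}\b", text)
    return match.group(0) if match else None

print(extract_four_digits("location- 1234 (abc)"))
```

A value such as "location- 12345 (abc)" returns None, because the five-digit run has no word boundary after its fourth digit.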
I have 30000 lines that look like the one below.
342800005013000 CON N GORE PT LOT 31 RP 11R2284 PT PART 1 RP 11R4541 PT PART 2
I would like to capture the 15-digit number at the beginning and any "11R***" numbers.
In Notepad++ I've used \d{15}|(11R\d*)* to match everything that I want. Ultimately I would like to get all the matched results into Excel. What would be the best way to do so?
Thanks for your help.
You could try this one
(^[0-9]*)|(11R[0-9A-Za-z]*)
Edit: check it now; the code formatting displays the regex correctly.
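One possible way to get the matches into Excel is to extract them with a script and write a CSV file, which Excel opens directly. The following is a rough Python sketch under that assumption; the names CODE_PATTERN, extract_codes and to_csv are made up for illustration, and the pattern pins the leading number to exactly 15 digits rather than using [0-9]*.

```python
import csv
import io
import re

# Leading 15-digit number, or any token that starts with 11R.
CODE_PATTERN = re.compile(r"^\d{15}|\b11R[0-9A-Za-z]*")

def extract_codes(line):
    """Return all matches in one line, in order of appearance."""
    return CODE_PATTERN.findall(line)

def to_csv(lines):
    """Build CSV text with one row of extracted codes per input line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for line in lines:
        writer.writerow(extract_codes(line))
    return buffer.getvalue()

sample = ("342800005013000 CON N GORE PT LOT 31 RP 11R2284 "
          "PT PART 1 RP 11R4541 PT PART 2")
print(to_csv([sample]))
```

For the real data, the returned text could be written to a .csv file (opened with newline="") instead of being printed.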
I'm using (or I'd like to use) R to extract some information. I have the following sentence that I'd like to split. In the end, I'd like to extract only the number 24.
Here's what I have:
doc <- "Hits 1 - 10 from 24"
And I want to extract the number "24". I know how to extract the number once I can split the sentence into "Hits 1 - 10 from" and "24". I tried using this:
n_docs <- unlist(str_split(key_n_docs, ".\\from"))[1]
But this leaves me with: "Hits 1 - 10"
Obviously the split works somehow, but I'm interested in the part after "from" not the one before. All the help is appreciated!
If you want to extract from a single character string:
strsplit(key_n_docs, "from")[[1]][2]
or the equivalent expression used by @BastiM (sorry I saw your answer after I submitted mine)
unlist(strsplit(key_n_docs, "from"))[2]
If you want to extract from a vector of character strings:
sapply(strsplit(key_n_docs, "from"),`[`, 2)
Splitting on "from" yields two pieces, and the number you're searching for is the second one. R indexing starts at 1, so after unlist you need index 2 rather than 1. Using
unlist(strsplit("Hits 1 - 10 from 24", "from"))[2]
works like a charm for me.
You can use str_extract from stringr:
library(stringr)
numbers <- str_extract(doc, "[0-9]+$")
This returns only the digits at the end of the string.
numbers
"24"
You can use sub to extract the number:
sub(".*from *(\\d+).*", "\\1", doc)
# [1] "24"
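For comparison, the three approaches from the answers (split on "from", anchor the digits at the end, capture the digits after "from") can be sketched in Python. This is only an analogue of the R code above, not a replacement for it; the variable names are made up for the example.

```python
import re

doc = "Hits 1 - 10 from 24"

# 1. Split on the literal "from" and take the second piece.
after_from = doc.split("from")[1].strip()

# 2. Match the run of digits anchored at the end of the string.
at_end = re.search(r"[0-9]+$", doc).group(0)

# 3. Replace the whole string with the digits captured after "from".
captured = re.sub(r".*from *(\d+).*", r"\1", doc)

print(after_from, at_end, captured)
```

All three give the string "24" for this input; they differ in what happens when the text after "from" is not just a number, so the choice depends on how regular the real data is.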
I am trying to extract specific text from an Outlook subject line. This is required to calculate the turnaround time for each order entered in SAP. I have a subject line as below
SO# 3032641559 FW: Attached new PO 4500958640- 13563 TYCO LJ
My final output should be like this: 3032641559
I have been able to do this in MS excel with the formulas like this
=IFERROR(INT(MID([#[Normalized_Subject]],SEARCH(30,[#[Normalized_Subject]]),10)),"Not Found")
In the above formula, [#[Normalized_Subject]] is the name of the column in which the SO number exists. I have been asked to do this in Oracle, but I am very new to it. Your help on this would be greatly appreciated.
Note: in the above subject line the number 30 is common in every subject line.
The last parameter of REGEXP_SUBSTR() indicates the sub-expression you want to pick. In this case you can't just match 30 then some more numbers as the second set of digits might have a 30. So, it's safer to match the following, where x are more digits.
SO# 30xxxxxx
As a regular expression this becomes:
SO#\s30\d+
where \s indicates a whitespace character, \d indicates a numeric character, and + means you want to match one or more of them. This matches the "SO# " prefix as well, though, so to return only the number we can use the sub-expression support. To do that you need sub-expressions, i.e. groups placed where you want to split the string:
(SO#\s)(30\d+)
Put this in the function call and you have it:
regexp_substr(str, '(SO#\s)(30\d+)', 1, 1, 'i', 2)
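The same grouping idea can be tried outside the database. Below is a small Python sketch, not Oracle code, that uses the identical pattern and picks the second group; the function name extract_so_number is hypothetical.

```python
import re

def extract_so_number(subject):
    """Return the digits of the SO number (second group), or None."""
    # Group 1 is the "SO# " prefix, group 2 is the number starting with 30.
    match = re.search(r"(SO#\s)(30\d+)", subject, re.IGNORECASE)
    return match.group(2) if match else None

subject = "SO# 3032641559 FW: Attached new PO 4500958640- 13563 TYCO LJ"
print(extract_so_number(subject))
```

Requiring the "SO# " prefix is what keeps a 30 inside the PO number from being picked up by mistake.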
I am trying to find the rows from a hive table where a particular column does not contain null values or \N values or STX character '\002'. The objective is to find which rows contain some characters other than these three.
I tried this hive query:
select column1,length(regexp_replace(column1,'\N|\002|NULL','')) as value
FROM table1 LIMIT 10;
I was expecting zero in the following cases but I am getting the following:
column1 value
NULL NULL
0
NULL NULL
0
\N\N\N\N\N\N\N\N 8
NULL NULL
\N\N\N\N\N\N\N\N 8
NULL NULL
NULL NULL
\N\N\N 3
Could someone please help me on the correct regex for the above case?
Thank you.
Ravi
It looks like Hive uses Java's regular expression engine, so the problem seems to be with the regex itself, more specifically with the escape sequences.
Try the following and if it doesn't work then please let me know:
(?:(?:\\\\N)+|\002|NULL)
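To see the effect of the escaping, here is a Python sketch of the same idea. It is an analogue, not Hive code: Python raw strings need only one level of escaping, whereas a Hive string literal adds another, which is presumably why the pattern above uses four backslashes. The names NULL_MARKERS and residual_length are made up.

```python
import re

# Literal backslash followed by N (repeated), the STX character, or "NULL".
NULL_MARKERS = re.compile(r"(?:\\N)+|\x02|NULL")

def residual_length(value):
    """Length of what is left after removing the null markers."""
    if value is None:
        return None  # mirrors SQL NULL propagating through length()
    return len(NULL_MARKERS.sub("", value))

print(residual_length("\\N\\N\\N"))  # value is the six characters \N\N\N
# If the backslash is lost from the pattern, only the N characters are
# removed and the backslashes stay behind:
print(len(re.sub(r"N|\x02|NULL", "", "\\N\\N\\N")))
```

The second call leaves three backslashes, which is consistent with the 3 and 8 reported in the question, assuming the unescaped \N was reduced to a plain N.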
I'm trying to validate mm-dd-(2012~2099) date format.
I have the following regular expression.
^(0[1-9]|1[0-2])-(0[1-9]|[10-31])-(20[12-99])$
When I run the following code, I get false. What's wrong with this regular expression?
var reg = new RegExp("^(0[1-9]|1[0-2])-(0[1-9]|[10-31])-(20[12-99])$");
reg.test("05-33-2012");
When I take out the year part, and then test "05-33", it works.
As Oli said, [12-99] does not do what you think it does.
Specifically, the - refers to a range of characters, not numbers. So [12-99] matches...
1
2-9
9
The expression 20(1[2-9]|[2-9][0-9]) would match the years 2012-2099.
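A quick way to try the corrected year part is the Python sketch below. The day alternation (0[1-9]|[12][0-9]|3[01]) is an addition made for this example, on the assumption that [10-31] in the question has the same character-class problem; it was not part of the answer. The names DATE_PATTERN and is_valid_date are made up.

```python
import re

DATE_PATTERN = re.compile(
    r"^(0[1-9]|1[0-2])"            # month 01-12
    r"-(0[1-9]|[12][0-9]|3[01])"   # day 01-31 (assumed fix for [10-31])
    r"-20(1[2-9]|[2-9][0-9])$"     # year 2012-2099
)

def is_valid_date(text):
    """True if text looks like mm-dd-yyyy with a year from 2012 to 2099."""
    return DATE_PATTERN.match(text) is not None

for candidate in ["05-31-2012", "05-33-2012", "12-01-2099", "01-01-2011"]:
    print(candidate, is_valid_date(candidate))

# [12-99] is a character class: it matches one character from 1 to 9.
print(re.fullmatch(r"[12-99]", "5") is not None)
```

The pattern checks the shape only, so an impossible date such as 02-31-2012 still passes; a real calendar check would need a date library.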