Using gregexpr to get position in a string

I want to find the positions of a certain expression in a character string (length 22588).
These are examples of the pattern I'm looking for:
\n,null,[null,null,12.27,800.54]\n,
\n,null,[null,null,12.58,670.84]\n,
\n,null,[null,null,13.45,750.25]\n,
And so on.
Here is an example:
test = "some other stuff \n,null,[null,null,12.27,800.54]\n, other stuff a lot of characters \n,null,[null,null,12.58,670.84]\n, and again \n,null,[null,null,13.45,750.25]\n,"
Now I want to get the positions of the expressions that have this pattern:
\n,null,[null,null,"decimal numbers""comma between decimal numbers""decimal numbers"]\n,
This is what I tried:
mypattern = "\\\\n,null,\\[\null,null,[:alnum:]\\]\\\\\n,"
gg = gregexpr(mypattern,datalines)
Unfortunately this does not work. The coordinates in the middle are different every time, so I need a wildcard for them, and I also guess R has a problem reading the metacharacters.
Thanks in advance!

You can try with this pattern:
"\\\n,null,\\[null,null,\\d+\\.\\d+\\,\\d+\\.\\d+\\]\\\n"
or this pattern if the number of digits before and after each "." stays the same:
"\\\n,null,\\[null,null,\\d{2}\\.\\d{2}\\,\\d{3}\\.\\d{2}\\]\\\n"
With your example:
gregexpr("\\\n,null,\\[null,null,\\d+\\.\\d+\\,\\d+\\.\\d+\\]\\\n",test)
gregexpr("\\\n,null,\\[null,null,\\d{2}\\.\\d{2}\\,\\d{3}\\.\\d{2}\\]\\\n",test)
#[[1]]
#[1] 18 84 129
#attr(,"match.length")
#[1] 32 32 32
#attr(,"useBytes")
#[1] TRUE
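
To cross-check the pattern outside R, here is a minimal Python sketch of the same position lookup. It assumes the string contains real newline characters (which is what "\n" inside an R string literal produces), uses a shortened, made-up sample string rather than the one from the question, and converts Python's 0-based offsets into R-style 1-based positions. The names PATTERN, find_positions and sample are illustrative only.

```python
import re

# Newline, ",null,", a bracketed pair of decimal numbers, newline.
PATTERN = re.compile(r"\n,null,\[null,null,\d+\.\d+,\d+\.\d+\]\n")


def find_positions(text):
    """Return (1-based start, match length) for every match, similar to gregexpr."""
    return [(m.start() + 1, m.end() - m.start()) for m in PATTERN.finditer(text)]


# Hypothetical, shortened input containing real newline characters.
sample = "ab\n,null,[null,null,1.5,20.25]\n,cd\n,null,[null,null,3.0,4.75]\n,"
positions = find_positions(sample)
print(positions)
```

Using \d+ on both sides of each "." keeps the sketch independent of how many digits the coordinates have.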

Related

How to regex German street addresses with numbers (infix)

I've got these two addresses:
Straße des 17 Juni 122a
Str. 545 3
See https://regex101.com/r/2WT48R/5
I need to filter for the street and number.
My current output is:
streets = [Straße des, Str. ]
numbers = [17 Juni 122a, 545 3]
This is my regex:
(?<street>[\S ]+?)\s*(?<number>\d+[\w\s\/-]*)$
Output should look like:
streets = [Straße des 17 Juni, Str. 545]
numbers = [122a, 3]

It looks like the house-number part of your examples never contains spaces. You can use that to keep the extra characters out of your second capture group.
(?<street>[\S ]+)\s(?<number>\d+\S*$)
By allowing no whitespace in the second capture group, it won't match the numbers 17 or 545 too early.
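
As a rough check of this pattern, here is a minimal Python sketch. Python's re module writes named groups as (?P<name>...), so the group syntax is adapted; the rest of the pattern is unchanged. The names ADDRESS and split_address are illustrative, and only the two addresses from the question are tried.

```python
import re

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
ADDRESS = re.compile(r"(?P<street>[\S ]+)\s(?P<number>\d+\S*$)")


def split_address(address):
    """Return (street, number), or None if the address does not match."""
    m = ADDRESS.match(address)
    return (m.group("street"), m.group("number")) if m else None


results = [split_address(a) for a in ["Straße des 17 Juni 122a", "Str. 545 3"]]
print(results)
```

The greedy street group backtracks only as far as the last whitespace that is followed by a digit, which is why "17" and "545" stay in the street part.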
EDIT: after seeing your more detailed list of examples on your own demo, the following regex will match the complete set of your test cases:
(?<street>[\S \t]+?) ?(?<number>[\d\s]+[\w-\/]*?$)

I found one answer by myself:
(?<street>[\S ]+?)\s*(?<number>\d+\s*[a-zA-Z]*\s*([-\/]\s*\d*\s*\w?\s*)*)$
The demo includes several additional test cases.

I want to replace the second occurrence of the number in the string

I have a string, say a URL like the one below:
"www.regexperl.com/1234/34/firstpage/home.php"
Now I need to replace the number 34, which is the second occurrence of a number in the string, with 2.
The resulting string should look like this:
"www.regexperl.com/1234/2/firstpage/home.php"
The challenge I'm facing is that when I try to store the value 34 and replace it, the 34 inside the number 1234 gets replaced instead, giving the result below:
"www.regexperl.com/122/34/firstpage/home.php"
Please suggest a proper regex to solve the problem.

Use \K.
^.*?\d+\b.*?\K\d+
Replace the match with your string. See the demo:
https://regex101.com/r/lW2kK1/1
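
\K is a PCRE feature and is not available in every engine. Python's built-in re module, for example, does not support it. A minimal sketch of the same idea there, assuming a capture group is an acceptable substitute: keep everything up to the second number in group 1 and write it back in the replacement. The names SECOND_NUMBER and replace_second_number are illustrative.

```python
import re

# Capture the text up to the second number instead of discarding it with \K.
SECOND_NUMBER = re.compile(r"^(.*?\d+\b.*?)\d+")


def replace_second_number(text, replacement):
    """Replace the second number, using the same pattern logic as the \\K answer."""
    # A function replacement avoids ambiguity between group 1 and a digit
    # that immediately follows it in a replacement template.
    return SECOND_NUMBER.sub(lambda m: m.group(1) + replacement, text, count=1)


url = "www.regexperl.com/1234/34/firstpage/home.php"
result = replace_second_number(url, "2")
print(result)
```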

If the positions are constant, you can find and replace as follows.
Regex: (\.com\/\d+)(\/\d+)
Input string: www.regexperl.com/1234/34/firstpage/home.php
Replacement: \1/ followed by the number of your choice, for example \1/2.
Output string: www.regexperl.com/1234/2/firstpage/home.php

R - split string before two last digits in each column cell

I have a csv with usernames in a column, followed by each user's feedback rating, out of 100.
E.g. James89
I hope to find a way to split the name and the rating, e.g. by inserting a comma before the two last digits using regex. Is this possible? And/or is there a better way to do this?
df1 = data.frame(Product = c(rep("ARCH78"), rep("AUSFUNGUY91"), rep("AddiesAndXans96"), rep("AfroBro79")))
The code above is a tiny excerpt of the data I'm dealing with. I hope to get this output:
ARCH 78
AUSFUNGUY 91
AddiesAndXans 96
AfroBro 79
I've tried this code (inspired by another answer):
df1$P2 <- gsub("(.*?)(..)", "\\1", df1$Product)
It seems to be working, but there's something wrong with the output:
ARCH78 AR
AUSFUNGUY91 AUUNY
AddiesAndXans96 AdesdXs
AfroBro79 AfBr9

As for the following:
I hope to find a way to split the name and the rating, e.g. by inserting a comma before the two last digits using regex.
You can achieve it with a single gsub call:
df1 = data.frame(Product = c(rep("ARCH78"), rep("AUSFUNGUY91"), rep("AddiesAndXans96"), rep("AfroBro79")))
gsub("(\\d{2})$",",\\1",df1$Product)
## => [1] "ARCH,78" "AUSFUNGUY,91" "AddiesAndXans,96" "AfroBro,79"
You can further adjust the replacement ",\\1" that features a backreference \1 to the last 2 digits.
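
For comparison, a minimal Python sketch of the same substitution, plus a variant that returns the name and the rating as separate values. It assumes every rating is exactly two digits at the end of the string, as in the question's excerpt; the function name split_name_rating is illustrative.

```python
import re


def split_name_rating(value):
    """Split e.g. 'James89' into ('James', 89); None if there is no two-digit suffix."""
    m = re.fullmatch(r"(.*?)(\d{2})", value)
    return (m.group(1), int(m.group(2))) if m else None


products = ["ARCH78", "AUSFUNGUY91", "AddiesAndXans96", "AfroBro79"]

# Same idea as the gsub call: insert a comma before the last two digits.
with_comma = [re.sub(r"(\d{2})$", r",\1", p) for p in products]

# Alternative: keep name and rating as separate values.
pairs = [split_name_rating(p) for p in products]
print(with_comma)
print(pairs)
```

Note that a rating of 100 would not fit the two-digit assumption and would need a different pattern.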

Split sentence by words with regex in R

I'm using (or I'd like to use) R to extract some information. I have the following sentence and I'd like to split it. In the end, I'd like to extract only the number 24.
Here's what I have:
doc <- "Hits 1 - 10 from 24"
And I want to extract the number "24". I know how to extract the number once I can split the sentence into "Hits 1 - 10 from" and "24". I tried using this:
n_docs <- unlist(str_split(key_n_docs, ".\\from"))[1]
But this leaves me with: "Hits 1 - 10"
Obviously the split works somehow, but I'm interested in the part after "from" not the one before. All the help is appreciated!

If you want to extract from a single character string:
strsplit(key_n_docs, "from")[[1]][2]
or the equivalent expression used by @BastiM (sorry, I saw your answer after I submitted mine):
unlist(strsplit(key_n_docs, "from"))[2]
If you want to extract from a vector of character strings:
sapply(strsplit(key_n_docs, "from"),`[`, 2)

Splitting on "from" returns two pieces: element 1 is the text before "from" and element 2 is the text after it. R vectors are indexed from 1, so after unlist you need index 2 rather than 1. Using
unlist(strsplit("Hits 1 - 10 from 24", "from"))[2]
works like a charm for me.

You can use str_extract from stringr:
library(stringr)
numbers <- str_extract(doc, "[0-9]+$")
This returns only the digits at the end of the sentence:
numbers
# [1] "24"

You can use sub to extract the number:
sub(".*from *(\\d+).*", "\\1", doc)
# [1] "24"
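
The capture-group approach carries over to other regex engines as well. A minimal Python sketch, assuming the total always follows the word "from"; the function name total_hits is illustrative.

```python
import re


def total_hits(sentence):
    """Return the number that follows 'from' as an int, or None if absent."""
    m = re.search(r"from\s*(\d+)", sentence)
    return int(m.group(1)) if m else None


doc = "Hits 1 - 10 from 24"
n_docs = total_hits(doc)
print(n_docs)
```

Capturing only the digits avoids the leading space that a plain split on "from" leaves in front of the number.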

How to find numbers and exclude any in parentheses using regex

I'm trying to write a regex pattern that finds numbers with two leading zeros in a string and replaces the two zeros with a single 0. The problem is that I want to ignore numbers in parentheses, and I can't figure out how to do this.
For example, with the string:
Somewhere 001 (2009)
I want to return:
Somewhere 01 (2009)
I can search by using [00] to find the first 00, and replace with 0 but the problem is that (2009) becomes (209) which I don't want. I thought of just doing a replace on (209) with (2009) but the strings I'm trying to fix could have a valid (209) in it already.
Any help would be appreciated!

Search one non digit (or start of line) followed by two zeros followed by one or more digits.
([^0-9]|^)00[0-9]+
What if the number has three leading zeros? How many zeros do you want it to have after the replacement? If you want to catch all leading zeros and replace them with just one:
([^0-9]|^)00+[0-9]+

Ideally, you'd use negative look behind, but your regex engine may not support it. Here is what I would do in JavaScript:
string.replace(/(^|[^(\d])00+/g,"$10");
That will replace any run of two or more zeros that is not preceded by an opening parenthesis or another digit. Change the character class to [^(\d.] if you're also working with decimal numbers.
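
Where the engine does support negative lookbehind, the capture group is not needed. A minimal sketch in Python, whose re module accepts fixed-width lookbehind; the names LEADING_ZEROS and collapse_leading_zeros are illustrative, and the second test string is made up.

```python
import re

# (?<![(\d]) : not preceded by "(" or a digit; 00+ : two or more zeros.
LEADING_ZEROS = re.compile(r"(?<![(\d])00+")


def collapse_leading_zeros(text):
    """Collapse a run of leading zeros to a single 0, skipping numbers right after '('."""
    return LEADING_ZEROS.sub("0", text)


fixed = collapse_leading_zeros("Somewhere 001 (2009)")
print(fixed)
print(collapse_leading_zeros("Room 0007 (0042)"))
```

Because the lookbehind consumes no characters, the replacement is just "0" and nothing has to be copied back.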

?Regex.Replace("Somewhere 001 (2009)", " 00([0-9]+) ", " 0$1 ")
"Somewhere 01 (2009)"
