How to remove anything after a non-slash character in a string?


The problem I am encountering is strange. Suppose I have:
a = "www.XXXXXXX.com"
b = "www.XXXXXXX.com/laskdfj/=*&9809f/12-613"
c = "www.XXXX.comllkjldfjlsadjfjldsf"
d = "http://www.XXXX.CoMmasldfjl"
e = "www.XXX.us/sdf"
f = "www.XXX.us0948klsdf"
If whatever follows the ".com" or ".us" does not start with a slash, remove it. So the result would be:
a = "www.XXXXXXX.com"
b = "www.XXXXXXX.com/laskdfj/=*&9809f/12-613"
c = "www.XXXX.com"
d = "http://www.XXXX.CoM"
e = "www.XXX.us/sdf"
f = "www.XXX.us"
Regular expressions are new to me. I have read several blog posts about them, but none seem to explain how to express an if-condition like this one. Any hints?

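You can use sub for this task. The pattern captures everything up to and including a case-insensitive .com or .us, then matches one or more non-slash characters; the replacement keeps only the captured group: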
sub('(.*\\.(?i:com|us))[^/]+', '\\1', x)
For a more general approach that accepts any two- or three-letter suffix after the last dot, you can use:
sub('(.*\\.[[:alpha:]]{2,3})[^/]*', '\\1', x)
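The same idea carries over to other regex engines. As a rough sketch in Python (the name strip_after_tld is hypothetical, count=1 mirrors R's sub, which replaces only the first match, and the scoped (?i:...) flag assumes Python 3.6 or later):

```python
import re

# Capture everything up to a case-insensitive ".com" or ".us", then
# require at least one non-slash character directly after it.
TLD_TAIL = re.compile(r"(.*\.(?i:com|us))[^/]+")

def strip_after_tld(url):
    """Drop the run of non-slash characters that directly follows .com/.us."""
    return TLD_TAIL.sub(r"\1", url, count=1)

urls = [
    "www.XXXXXXX.com",
    "www.XXXXXXX.com/laskdfj/=*&9809f/12-613",
    "www.XXXX.comllkjldfjlsadjfjldsf",
    "http://www.XXXX.CoMmasldfjl",
    "www.XXX.us/sdf",
    "www.XXX.us0948klsdf",
]
for u in urls:
    print(strip_after_tld(u))
```

Strings where a slash (or nothing) follows the suffix do not match the pattern at all, so they are returned unchanged.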

Related

How to replace the captured group in Ruby

I would like to replace the captured group of a string with the elements of an array.
I am trying something like this:
part_number = 'R1L16SB#AA'
regex = (/\A(RM|R1)([A-Z])(\d\d+)([A-Z]+)#?([A-Z])([A-Z])\z/)
g = ["X","Y","Z"]
g.each do |i|
  ren_m, ch_conf, bit_conf, package_type, packing_val, envo_vals = part_number.match(regex).captures
  m = part_number.sub! packing_val, i
  puts m
end
My code with array g = ["X","Y","Z"] is giving desired output as:
R1L16SB#XA
R1L16SB#YA
R1L16SB#ZA
The captured group packing_val is replaced with
g = ["X","Y","Z"]
But when the array contains elements that are already present in the string, it does not work:
g = ["A","B","C"]
outputs:
R1L16SB#AA
R1L16SB#BA
R1L16SC#BA
But my expected output is:
R1L16SB#AA
R1L16SB#BA
R1L16SB#CA
What is going wrong and what could be the possible solution?

sub! modifies part_number in place, and part_number is defined outside the loop, so each iteration operates on the string left by the previous one and replaces the first occurrence of packing_val.
What happens is:
In the first iteration, the first A is replaced with A, leaving the string unchanged:
R1L16SB#AA
        ^
In the second iteration, the first A is replaced by B, giving:
R1L16SB#BA
        ^
In the third iteration, packing_val is now B, so the first B (the one in SB) is replaced by C, giving:
R1L16SC#BA
      ^
One way to get the desired output is to put part_number = 'R1L16SB#AA' inside the loop.

You mutated your part_number every iteration. That's the reason.
Just switch to sub without bang:
m = part_number.sub(packing_val, i)
You can do it without regex:
part_number = 'R1L16SB#AA'
g = %w[X Y Z]
g.each do |i|
  pn = part_number.dup
  pn[-2] = i
  puts pn
end
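Both answers avoid mutating the original string. A separate pitfall is that substituting by value replaces the first occurrence of that character anywhere in the string, not the captured group itself. A minimal sketch of replacing by position instead, written in Python for illustration (replace_group is a hypothetical helper, and \Z stands in for Ruby's \z):

```python
import re

# Same pattern as in the question; group 5 is the packing value.
PART_RE = re.compile(r"\A(RM|R1)([A-Z])(\d\d+)([A-Z]+)#?([A-Z])([A-Z])\Z")

def replace_group(text, pattern, group, replacement):
    """Return a copy of text with the given capture group replaced."""
    m = pattern.match(text)
    if m is None:
        return text  # no match: leave the input untouched
    start, end = m.span(group)
    return text[:start] + replacement + text[end:]

part_number = "R1L16SB#AA"
results = [replace_group(part_number, PART_RE, 5, letter) for letter in ["A", "B", "C"]]
for r in results:
    print(r)
```

Because the replacement is spliced in at the group's span and part_number is never modified, the B in SB is not touched.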

Return first instance of capturing group if found, otherwise empty string

My inputs are strings that may or may not contain a pattern:
p = r'(\d)'
s = 'abcd3f'
I want to return the capturing group for the first match of this pattern if it is found, and an empty string otherwise.
result = re.search(p, s)[1]
This returns the captured group of the first match. But if s = 'abcdef', then search returns None and the indexing raises a TypeError. Instead, I'd like to get an empty string. I can do:
g = re.search(p, s)
result = ''
if g is not None: result = g[1]
Or even:
try:
    result = re.search(p, s)[1]
except TypeError:
    result = ''
But these both seem pretty complicated for something so simple. Is there a more elegant way of accomplishing what I want, preferably in one line?

You could use an is None check on the match object to accomplish that. For example:
if m is None: result = ''
Example for Python:
import re
m = re.search(r'(\d)', 'ab1cdf')
# Fall back to an empty string when there is no match
result = '' if m is None else m.group(1)
print(result)
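For the one-line form the question asks about, one possible sketch is a small helper, or an assignment expression on Python 3.8 and later. The name first_group is hypothetical, not part of the re module:

```python
import re

def first_group(pattern, string, default=""):
    """Return group 1 of the first match, or default if nothing matches."""
    m = re.search(pattern, string)
    return m[1] if m else default

p = r"(\d)"

found = first_group(p, "abcd3f")
missing = first_group(p, "abcdef")

# Inline variant using the walrus operator (Python 3.8+).
inline = m[1] if (m := re.search(p, "abcdef")) else ""

print(repr(found), repr(missing), repr(inline))
```

A match object is always truthy and None is falsy, so the plain `if m` test is enough here.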

How does regex capturing work in scala?

Here is an example:
object RegexTest {
  def main(args: Array[String]): Unit = {
    val input = "Enjoy this apple 3.14 times"
    val pattern = """.* apple ([\d.]+) times""".r
    val pattern(amountText) = input
    val amount = amountText.toDouble
    println(amount)
  }
}
I understand what this does, but how does val pattern(amountText) = input actually work? It looks very weird to me.

What that line is doing is calling Regex.unapplySeq (which is also called an extractor) to deconstruct input into a list of captured groups, and then bind each group to a new variable. In this particular scenario, only one group is expected to be captured and bound to the value amountText.
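Validation aside, this is roughly what's going on behind the scenes. unapplySeq returns an Option[List[String]], so the .get below stands in for the match check; when the input does not match, the val pattern(...) form throws a MatchError:
val capturedGroups = pattern.unapplySeq(input).get
val amountText = capturedGroups(0)
// And this:
val pattern(a, b, c) = input
// Would be equivalent to this:
val capturedGroups = pattern.unapplySeq(input).get
val a = capturedGroups(0)
val b = capturedGroups(1)
val c = capturedGroups(2)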
It is very similar in essence to extracting tuples:
val (a, b) = (2, 3)
Or even pattern matching:
(2,3) match {
case (a, b) =>
}
In both of these cases, Tuple2.unapply is conceptually what is being called.

I suggest you have a look at this page: http://docs.scala-lang.org/tutorials/tour/extractor-objects.html. It is the official tutorial on extractors, which is the pattern you are looking at.
I find that looking at the source makes it clear how it works: https://github.com/scala/scala/blob/2.11.x/src/library/scala/util/matching/Regex.scala#L243
Note that your code val pattern(amountText) = input works fine, but only if you are sure the input matches the regex; otherwise it throws a MatchError.
If you cannot be sure, I recommend writing it this way:
input match {
  case pattern(amountText) => ...
  case _ => ...
}
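For readers more familiar with Python, a loose analogue of what the extractor does is sketched below. It is only an analogy, not how Scala implements it: unapply_seq is a hypothetical helper that, like Regex.unapplySeq, requires the whole input to match and returns the captured groups, or None in place of Scala's None.

```python
import re

PATTERN = re.compile(r".* apple ([\d.]+) times")

def unapply_seq(pattern, text):
    """Return the captured groups if the whole text matches, else None."""
    m = pattern.fullmatch(text)
    return list(m.groups()) if m else None

groups = unapply_seq(PATTERN, "Enjoy this apple 3.14 times")
if groups is not None:
    # Destructure the single captured group, like `val pattern(amountText) = input`.
    (amount_text,) = groups
    amount = float(amount_text)
    print(amount)
else:
    # Plays the role of `case _ =>` in the safer match form.
    print("no match")
```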

Shortcut to get a statement with certain pattern in R

I have to write the following as it is.
('trial1' = Ozone1, 'trial2' = Ozone2, 'trial3' = Ozone3, ..., 'trial1000' = Ozone1000)
I want to write this with one command in R. How do I do it?
I tried it using paste0
Let us take only 5 as number of repetitions:
paste0("trial",1:5,"= Ozone", 1:5)
I get this as result.
"trial1= Ozone1" "trial2= Ozone2" "trial3= Ozone3" "trial4= Ozone4" "trial5= Ozone5"
But that is not what I want. I want the output to appear exactly like this, without the surrounding double quotes:
('trial1' = Ozone1, 'trial2' = Ozone2, 'trial3' = Ozone3, 'trial4' = Ozone4, 'trial5' = Ozone5)
In other words, the output should not be printed as a quoted string "........"; I want it exactly as shown.
How do I do it?

This will generate the string you want:
paste0('(', paste0("'trial", 1:1000, "' = Ozone", 1:1000, collapse=', '), ')')
This will print the string without quotes:
print(paste0('(', paste0("'trial", 1:10, "' = Ozone", 1:10, collapse=', '), ')'), quote=FALSE)
I hope it answered your question...

Single quotes need no escaping inside a double-quoted R string (\' is accepted but redundant); what matters is the collapse argument of paste0:
paste0("(", paste0("\'trial",1:5,"\' = Ozone",1:5, collapse=", "), ")")
[1] "('trial1' = Ozone1, 'trial2' = Ozone2, 'trial3' = Ozone3, 'trial4' = Ozone4, 'trial5' = Ozone5)"
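The structure of the solution is the same in any language: build each 'trialN' = OzoneN pair, join the pairs with a separator, and wrap the result in parentheses. A minimal sketch of that structure in Python, for comparison only (build_mapping and its parameters are hypothetical names):

```python
def build_mapping(n, key_prefix="trial", value_prefix="Ozone"):
    """Build "('trial1' = Ozone1, ..., 'trialN' = OzoneN)" as one string."""
    pairs = [f"'{key_prefix}{i}' = {value_prefix}{i}" for i in range(1, n + 1)]
    # ", ".join plays the role of collapse=", " in paste0.
    return "(" + ", ".join(pairs) + ")"

text = build_mapping(5)
print(text)  # print shows the string without surrounding quotes
```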

Simplify regular expression

I want to simplify this regular expression:
0*|0*1(ε|0*1)*00*
I used this identity:
(R+S)*=(R*S*)*=(R*+S*)*
and couldn't get better than this:
0*|0*1(0*1)*00* [(ε|0*1)*=(ε*0*1)*=(0*1)*]
Can this regular expression be simplified even more, and how? I have no clue what else to do. :)
EDIT 1: I changed the alternation symbol from + to |, because + could also be read as "one or more times".
Explanation of notation:
1) ε stands for empty word
2) * is Kleene star
3) AB is just a concatenation of languages of regular expressions A and B.
EDIT 2: Formal proof that this reduces to (0*1)*0+|ε:
0*|0*1(ε|0*1)*00* =
= 0*|0*1(0*1)*0+ =
= 0*|(0*1)+0+ =
= 0+|ε|(0*1)+0+ =
= ε0+|(0*1)+0+|ε =
= (ε|(0*1)+)0+|ε =
= (0*1)*0+|ε
Is there any way to reduce it further to (0|1)*0|ε?

I think it reduces to this: (0*1)*0+|ε

(Update: See edit history for long, sad story of previous incorrect attempts).
I (now) believe this reduces to:
ε|(0|1)*0
in other words, either:
The empty string
Any string of ones and zeros ending in 0
Proving this is another matter altogether. ;-)

I managed to formally reduce given regular expression to ε|(0|1)*0.
This is the proof:
0*|0*1(ε|0*1)*00* =
= 0*|0*1(0*1)*0+ =
= 0*|(0*1)+0+ =
= 0+|ε|(0*1)+0+ =
= ε0+|(0*1)+0+|ε =
= (ε|(0*1)+)0+|ε =
= (0*1)*0+|ε =
= (0*1)*0*0|ε = #
= (0|1)*0|ε
The trick was to use the identity (A*B)*A* = (A|B)* of which I wasn't aware when the question was asked, in the step marked with #.
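The proof can also be sanity-checked by brute force. The sketch below is a finite check, not a proof: it compares the two expressions on every binary string up to a chosen length using Python's re module. ε is written as an optional group, and the starting expression uses (0*1)* in place of (ε|0*1)*, as in the first step of the derivation. The names binary_strings and mismatches are hypothetical.

```python
import re
from itertools import product

# 0*|0*1(0*1)*00*  (the expression after the first simplification step)
ORIGINAL = re.compile(r"0*|0*1(?:0*1)*00*")
# ε|(0|1)*0, with ε expressed by making the whole group optional
SIMPLIFIED = re.compile(r"(?:(?:0|1)*0)?")

def binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def mismatches(max_len):
    """Strings accepted by exactly one of the two expressions."""
    return [
        s for s in binary_strings(max_len)
        if bool(ORIGINAL.fullmatch(s)) != bool(SIMPLIFIED.fullmatch(s))
    ]

print(mismatches(10))
```

An empty list means the two expressions agree on all 2047 strings of length at most 10, which is consistent with the reduction above.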
