I'm using groovy to write a script that replaces UNC server names and a part of the directory structure. I have the following:
def patternToFind = /\\\\([a-zA-Z0-9-]+)\\share\\([a-zA-Z]+)/
def patternToReplace = '\\\\\\\\SHARESERVER\\\\share\\\\OPS'
This works, but all those backslashes are pretty ugly. I understand why \\\\ is needed in the regex to match \\, but what confuses me is why the replacement string needs four \'s to produce a single \.
If anyone has a nicer way to do this I would greatly appreciate it. The goal is to replace
\\<server>\share\<env>
with the correct value for <server> and <env>
Thanks!
EDIT: I guess I should clarify. SHARESERVER and OPS are actually variables. So truly the end result would be something like:
def serverName = //some passed in server
def env = //some passed in env
def patternToFind = /\\\\([a-zA-Z0-9-]+)\\NAS\\([a-zA-Z]+)/
def patternToReplace = '\\\\\\\\' + serverName + '\\\\share\\\\' + env
So the only way I think of doing it is building a string literal to replace the section I'm looking for with.
And I'll be the first to admit that I suck at reg ex, so if you can use them to capture a value in a string and replace just that value with another, I'm all ears.
Doesn't it work with
def patternToReplace = $/\\SHARESERVER\share\OPS/$
If you want to use a literal replacement string (as opposed to one that involves $n backreferences) with a regular expression in Java then the safest thing to do is use Matcher.quoteReplacement:
import java.util.regex.Matcher
def patternToReplace = Matcher.quoteReplacement(/\\SHARESERVER\share\OPS/)
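The same two-layer escaping (once for the string literal, once for the replacement syntax) exists in other regex engines. As a rough Python sketch of the idea, a raw string keeps the pattern readable, and passing a function as the replacement makes the engine insert the text verbatim, which plays a similar role to Matcher.quoteReplacement. The function name replace_unc and the sample path are made up for illustration.

```python
import re

def replace_unc(path, server_name, env):
    # Raw string: each \\ in the pattern matches one literal backslash.
    pattern = r"\\\\[a-zA-Z0-9-]+\\share\\[a-zA-Z]+"
    # Plain (non-raw) literal: "\\\\" is two backslashes, "\\" is one.
    replacement = "\\\\" + server_name + "\\share\\" + env
    # A function replacement is inserted as-is, so backslashes in
    # `replacement` are not interpreted as escapes or group references.
    return re.sub(pattern, lambda m: replacement, path)

# Hypothetical input path, used only to show the behaviour.
print(replace_unc(r"\\oldbox\share\DEV\logs", "SHARESERVER", "OPS"))
```

With server_name and env passed in as variables, only the fixed parts of the path need any escaping at all.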
I have a challenge getting the desired result with RegEx (using C#) and I hope that the community can help.
I have a URL in the following format:
https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1
I want to make two modifications, specifically:
1) Remove everything after 'value' e.g. '&ida=0&idb=1'
2) Replace 'category' with e.g. 'newcategory'
So the result is:
https://somedomain.com/subfolder/newcategory/?abc=text:value
I can handle 1), e.g. by keeping only the match of ^[^&]+, but I have been unable to figure out how to replace the 'category' substring.
Any help or guidance would be much appreciated.
Thank you in advance.
Use the following:
Find: /(category/.+?value)&.+
Replace: /new$1 or /new\1 depending on your regex flavor
Update according to comment.
If the new name is completely_different_name, use the following:
Find: /category(/.+?value)&.+
Replace: /completely_different_name$1
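For readers who want to try the find/replace pair above outside a regex tester, here is a small sketch in Python (the question itself is about C#, so treat this only as an illustration of the pattern). Python's re.sub uses \1 for the backreference.

```python
import re

url = "https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1"

# Group 1 keeps everything from the slash after "category" up to "value";
# the trailing "&..." part is matched but not captured, so it is dropped.
new_url = re.sub(r"/category(/.+?value)&.+", r"/newcategory\1", url)
print(new_url)  # https://somedomain.com/subfolder/newcategory/?abc=text:value
```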
I mainly work in Python, so this solution is in Python.
import re
# url holds the original URL string
url = re.sub('category', 'newcategory', re.search('^https.*value', url).group(0))
Explanation
re.sub(a, b, c) replaces every match of pattern a with b in the string c.
re.search looks for the first match of a pattern in a string and returns a match object. In the code above, group(0) is the whole match, i.e. everything from "https" up to and including "value".
Using Python and only built-in string methods (there is no need for regular expressions here):
url = r"https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1"
new_url = (url.split('value')[0] + "value").replace("category", 'newcategory')
print(new_url)
Outputs:
https://somedomain.com/subfolder/newcategory/?abc=text:value
file app_ids.txt is of the following format:
app1 = "0123456789"
app2 = "1234567890"
app3 = "2345678901"
app4 = "3456789012"
app5 = "4567890123"
The following code in find_app_id.jl is meant to print the lines that match a given regex:
#! /opt/julia/julia-1.1.0/bin/julia
function find_app_id()
app_pattern = "r\"app2.*\"i"
open("/path/to/app_ids.txt", "r") do apps
for app in eachline(apps)
if occursin(app_pattern, app)
println(app)
end
end
end
end
find_app_id()
Running $ /home/julia/find_app_id.jl does not print the second line, even though it matches the regex!
How do I solve this problem?
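Your regular expression looks odd: app_pattern is an ordinary String whose contents merely look like a regex literal, so occursin searches for that exact text as a substring. If you change the line which assigns to app_pattern to
app_pattern = r"app2.*"
it should work better (write r"app2.*"i if you want to keep the match case-insensitive).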
For example, the following prints "Found it" when run:
app_pattern = r"app2.*"
if occursin(app_pattern, "app2 = blah-blah-blah")
println("Found it")
else
println("Nothing there")
end
Best of luck.
I'm not sure how regex matching works in Julia; this post might help you figure it out.
However, in general, your pattern is quite simple, and you probably do not need regular expression matching to do this task.
This RegEx might help you to design your expression.
^app[0-9]+\s=\s\x22([0-9]+)\x22$
The capture group ([0-9]+) in the middle matches the app id, which you can then reference as $1.
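As an illustration of how that capture group can be used, here is a minimal sketch in Python rather than Julia. The helper name extract_app_ids and the sample lines are assumptions for the example; in Python the first group is read with group(1) instead of $1.

```python
import re

# \x22 is the double-quote character; group 1 captures the digits of the id.
APP_LINE = re.compile(r'^app[0-9]+\s=\s\x22([0-9]+)\x22$')

def extract_app_ids(lines):
    """Return the ids from lines shaped like: app2 = "1234567890"."""
    ids = []
    for line in lines:
        match = APP_LINE.match(line.strip())
        if match:
            ids.append(match.group(1))
    return ids

sample = ['app1 = "0123456789"', 'app2 = "1234567890"', 'not an app line']
print(extract_app_ids(sample))  # ['0123456789', '1234567890']
```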
As I write this I realise there are two parts to this question, however I think I am only really stuck on the first part and therefore the second is only provided for context:
Part A:
I need to search the contents of each value returned by a for loop (where each value is a url) for the following:
href="/dir/Sub_Dir/dir/163472311232-text-text-text-text/page-n"
where:
the numerals 163472311232 could be any length (ie it could be 5478)
-text-text-text-text could be any number of different words
where page-n could be from page-2 up until any number
where matches are not returned more than once, ie only unique matches are returned and therefore only one of the following would be returned:
href="/dir/Sub_Dir/dir/5422-la-la/page-4
href="/dir/Sub_Dir/dir/5422-la-la/page-4
Part B:
So the logic would be something like:
list_of_urls = original_list
for url in list_of_urls:
    headers = {'User-Agent' : 'Mozilla 5.0'}
    request = urllib2.Request(url, None, headers)
    url_for_re = urllib2.urlopen(request).read()
    another_url = re.findall(r'href="(/dir/Sub_dir\/dir/[^"/]*)"', url_for_re, re.I)
    file.write(url)
    file.write('\n')
    file.write(another_url)
    file.write('\n')
I am hoping this will give me output similar to:
a.html
a/page-2.html
a/page-3.html
a/page-4.html
b.html
b/page-2.html
b/page-3.html
b/page-4.html
So my question is (assuming the logic in part B is ok):
What is the required regex pattern to use for part A?
I am a newbie to python and regex so this will limit my understanding somewhat in regards to relatively complicated regex suggestions etc.
Update:
After the suggestions, I tried to test the following regex, which did not produce any results:
import re
content = 'href="/dir/Sub_Dir/dir/5648342378-text-texttttt-texty-text-text/page-2"'
matches = re.findall(r'href="/dir/Sub_Dir/dir/[0-9]+-[a-zA-Z]+-[a-zA-Z]+-[a-zA-Z]+-[a-zA-Z]+/page-([2-9]|[1-9][0-9]+)"', content, re.I)
prefix = 'http://www.test.com'
for match in matches:
    i = prefix + match + '\n'
    print i
Solution:
I think this is the regex that will work:
matches = re.findall(r'href="(/dir/Sub_Dir/dir/[^"/]*/page-[2-9])"', content, re.I)
You can have... most of what you want. Regexes don't really do the distinct thing, so I suggest you just use them to get all the URLs, and then remove duplicates yourself.
Off the top of my head it would be something like this:
href="/dir/Sub_Dir/dir/[0-9]+-[a-zA-Z]+-[a-zA-Z]+-[a-zA-Z]+-[a-zA-Z]+/page-([2-9]|[1-9][0-9]+)"
Plus or minus escaping rules, specifics on what words are allowed, etc. I'm a Windows guy, and there's a great tool called Expresso that is helpful for learning regexes. I hope there's an equivalent for whatever platform you're using; it comes in handy.
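The "match everything, then remove duplicates yourself" approach could look roughly like the sketch below. It is written for Python 3, and the pattern is one possible reading of the requirements (digits, then any number of hyphenated words, then page-2 or higher); the function name unique_page_links and the sample HTML are invented for the example.

```python
import re

# One capturing group around the path, so findall returns just the paths.
# (?:-[a-zA-Z]+)+ allows any number of hyphen-separated words.
HREF = re.compile(
    r'href="(/dir/Sub_Dir/dir/[0-9]+(?:-[a-zA-Z]+)+/page-(?:[2-9]|[1-9][0-9]+))"',
    re.I,
)

def unique_page_links(html):
    """Return matching links in order of first appearance, without repeats."""
    seen = set()
    result = []
    for link in HREF.findall(html):
        if link not in seen:
            seen.add(link)
            result.append(link)
    return result

html = ('<a href="/dir/Sub_Dir/dir/5422-la-la/page-4">4</a>'
        '<a href="/dir/Sub_Dir/dir/5422-la-la/page-4">4</a>'
        '<a href="/dir/Sub_Dir/dir/5422-la-la/page-12">12</a>'
        '<a href="/dir/Sub_Dir/dir/5422-la-la/page-1">1</a>')
print(unique_page_links(html))
```

Keeping a set next to the list preserves the order in which links were found, which a plain set() conversion would not guarantee.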
I am currently working in an application where I need to find all occurrences of strings like ${[0-9-a-zA-Z]} in a bigger string. Here is my method:
def countVariables(str) {
def pattern = ~'${sss}'
def matcher = str =~ pattern
print matcher.count
}
Now the problem.
When I pass a string like "asidb ${sss} asodniasndin", I get:
groovy.lang.MissingPropertyException: No such property: sss for class: ConsoleScript83
I think that, given that in Groovy ${} are properties, I'm having these conflicts.
Do I have to go through the whole text, searching for the dollar sign and replacing it with something else, or is there a simpler way to do this?
Regards!
Are you passing the string in single quotes, so that Groovy doesn't try to interpolate ${sss} and just gives you a plain string?
I.e.:
countVariables( 'asidb ${sss} asodniasndin' )
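Independently of Groovy's string interpolation, the pattern itself also has to escape $ and the braces, because they are regex metacharacters. A minimal Python sketch of the counting logic, assuming placeholder names consist of letters, digits, and hyphens (the name count_variables and the sample text are made up):

```python
import re

# \$ and \{ \} are escaped so they match literally instead of acting as
# an end-of-line anchor and a repetition quantifier.
PLACEHOLDER = re.compile(r"\$\{[0-9a-zA-Z-]+\}")

def count_variables(text):
    """Count occurrences of ${name} style placeholders in text."""
    return len(PLACEHOLDER.findall(text))

print(count_variables("asidb ${sss} asodniasndin ${a-1}"))  # 2
```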
I have a huge backup of my blog posts. All posts have images like:
"http://www.mysite.com/nonono-nonono.jpg"
or
"http://www.mysite.com/nonono-nonono.gif"
or even
"http://www.mysite.com/nonono.jpg"
But I also have other links to URLs on the same domain, like "http://www.mysite.com/category/post.html", and I just want to replace the URLs of the images (luckily, all images are at the root of the website).
Do I need to learn RegExp to do that? Is there a powerful tool to find and replace text like this? Thanks
Regular expressions will be your best bet... maybe something like this (based on the one from strfriend)?
^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?([a-zA-Z]{1}([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?((/?\w+/)+|/?)(\w+\.(jpg|gif|png))?
Regular expressions are certainly one way to do it, and probably the most flexible. But if all of your image urls start with "http://www.mysite.com/" and end with ".jpg", then you can use string manipulation functions. For example, if you have a string variable called s, that you want to test:
const string mysite = "http://www.mysite.com/";
const string jpg = ".jpg";
string newString = string.Empty;
if (s.StartsWith(mysite))
{
    if (s.EndsWith(jpg))
    {
        // The file name is whatever sits between the site prefix and the extension.
        string textToReplace = s.Substring(mysite.Length, s.Length - mysite.Length - jpg.Length);
        newString = s.Replace(textToReplace, "whatever you want to replace it with.");
    }
}
It's a rather brute force method, but it'll work.
I'm using RegExp in EditPad Pro. I'll also look for a good tutorial for beginners. Thanks for the tip, @CalvinR.
It's possible with regular expressions, but I'd probably write a Python script using Beautiful Soup:
# fix_imgs.py
import sys
from bs4 import BeautifulSoup  # requires the beautifulsoup4 package

for filename in sys.argv[1:]:
    with open(filename) as f:
        contents = f.read()
    soup = BeautifulSoup(contents, "html.parser")
    # rewrite the host in the src attribute of each img tag
    for img in soup.find_all('img', src=True):
        img['src'] = img['src'].replace("http://www.mysite.com", "http://www.example.com")
    new_contents = str(soup)
    output_filename = "replaced." + filename
    with open(output_filename, "w") as f:
        f.write(new_contents)
Honestly, I think you should learn regular expressions regardless; they're a great tool to have up your sleeve, especially in situations like this. They are extremely powerful for string manipulation, and Perl is a great language to learn at the same time, as it makes using regexes a breeze.
To replace all filenames by 'new_image_name_here' in image urls:
$ perl -pe's~(http://.*?/)[^/]+?\.(jpg|gif)\b~$1new_image_name_here.$2~g' huge_file.html > output.html
To replace a netloc part by 'www.othersite.org' in 'http://<netloc>/<image_path>':
$ perl -pe's~(?<=http://)[^/]+(?=/(?:[^/]+/)*[^/]+?\.(?:jpg|gif)\b)~www.othersite.org~g' huge_file.html > output.html
These regexes are simple, so they are easily fooled. Use more specific regexes for your input data.
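For comparison, a roughly equivalent sketch in Python that only touches image URLs at the root of the site and leaves deeper links such as /category/post.html alone. The target host http://www.example.com/ and the function name move_images are placeholders, and the pattern carries the same caveat about being easy to fool.

```python
import re

# [^/"\s]+ forbids further slashes, so only files directly under the
# site root can match; the extension list is limited to jpg, gif, png.
ROOT_IMAGE = re.compile(r'http://www\.mysite\.com/([^/"\s]+\.(?:jpg|gif|png))\b')

def move_images(text, new_base="http://www.example.com/"):
    """Point root-level image URLs at new_base, keeping the file name."""
    return ROOT_IMAGE.sub(lambda m: new_base + m.group(1), text)

sample = ('<img src="http://www.mysite.com/nonono-nonono.jpg"> '
          '<a href="http://www.mysite.com/category/post.html">post</a>')
print(move_images(sample))
```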