Having trouble doing a search and replace in Ruby - regex

I'm using Rails 4.2.3 and trying to do a regular-expression search and replace. If my variable starts out like this:
url = "http://results.mydomain.com/json/search?eventId=974&subeventId=2320&callback=jQuery18305053194007595733_1464633458265&sEcho=3&iColumns=13&sColumns=&iDisplayStart=1&iDisplayLength=100&mDataProp_0="
and then I run it through:
display_start = url.match(/iDisplayStart=(\d+)/).captures[0]
display_start = display_start.to_i + 1000
url = url.gsub(/iDisplayStart=(\d+)/) { display_start }
The result is
http://results.mydomain.com/json/search?eventId=974&subeventId=2320&callback=jQuery18305053194007595733_1464633458265&sEcho=3&iColumns=13&sColumns=&1001&iDisplayLength=100&mDataProp_0=
But what I want is to replace only the value of the iDisplayStart parameter, so I would like the result to be
http://results.mydomain.com/json/search?eventId=974&subeventId=2320&callback=jQuery18305053194007595733_1464633458265&sEcho=3&iColumns=13&sColumns=&iDisplayStart=1001&iDisplayLength=100&mDataProp_0=
How do I do this?

You can achieve what you want with
url = "http://results.mydomain.com/json/search?eventId=974&subeventId=2320&callback=jQuery18305053194007595733_1464633458265&sEcho=3&iColumns=13&sColumns=&iDisplayStart=1&iDisplayLength=100&mDataProp_0="
display_start = url.sub(/(?<=iDisplayStart=)\d+/) {|m| m.to_i+1000}
puts display_start
Since you only replace one substring, you do not need gsub; sub will do.
The block receives the whole match, m (one or more digits immediately preceded by iDisplayStart=, thanks to the (?<=...) lookbehind), converts it to an integer and adds 1000. Ruby converts the block's return value to a string before substituting it.
Another way is to use your regex (or add \b for a safer match) and access the captured value with Regexp.last_match[1] inside the block:
url = "http://results.mydomain.com/json/search?eventId=974&subeventId=2320&callback=jQuery18305053194007595733_1464633458265&sEcho=3&iColumns=13&sColumns=&iDisplayStart=1&iDisplayLength=100&mDataProp_0="
display_start = url.sub(/\biDisplayStart=(\d+)/) {|m| "iDisplayStart=#{Regexp.last_match[1].to_i+1000}" }
puts display_start
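For comparison, here is a minimal Python sketch of both approaches above, assuming a shortened version of the question's URL (the helper names bump_display_start and bump_display_start_group are hypothetical). Python's re.sub accepts a function as the replacement, much like Ruby's block form, and count=1 plays the role of sub instead of gsub. Unlike Ruby, the function must return a string, hence the explicit str().

```python
import re

# Shortened version of the question's URL (callback and mDataProp_0 omitted).
URL = ("http://results.mydomain.com/json/search?eventId=974&subeventId=2320"
       "&sEcho=3&iColumns=13&sColumns=&iDisplayStart=1&iDisplayLength=100")

def bump_display_start(url, step=1000):
    """Lookbehind variant: only the digits are matched, so only they change."""
    return re.sub(r"(?<=iDisplayStart=)\d+",
                  lambda m: str(int(m.group(0)) + step),  # must return a str
                  url, count=1)  # count=1 behaves like Ruby's sub

def bump_display_start_group(url, step=1000):
    """Capture-group variant: the whole match is rebuilt around the new value."""
    return re.sub(r"\biDisplayStart=(\d+)",
                  lambda m: "iDisplayStart=%d" % (int(m.group(1)) + step),
                  url, count=1)

print(bump_display_start(URL))
```

Either variant leaves the rest of the query string untouched; the lookbehind version is shorter because the parameter name never has to be written back.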

Related

The regex in string.find of Lua

I use string.find(str, pattern) in Lua to find some keywords.
local RICH_TAGS = {
"texture",
"img",
}
--\[((img)|(texture))=
local START_OF_PATTER = "\\[("
for index = 1, #RICH_TAGS - 1 do
START_OF_PATTER = START_OF_PATTER .. "(" .. RICH_TAGS[index]..")|"
end
START_OF_PATTER = START_OF_PATTER .. "("..RICH_TAGS[#RICH_TAGS].."))"
function RichTextDecoder.decodeRich(str)
local result = {}
print(str, START_OF_PATTER)
dump({string.find(str, START_OF_PATTER)})
end
output
hello[img=123] \[((texture)|(img))
dump from: [string "utils/RichTextDecoder.lua"]:21: in function 'decodeRich'
"<var>" = {
}
The output means:
str = hello[img=123]
START_OF_PATTER = \[((texture)|(img))
This regex works in some online regex testers, but it finds nothing in Lua.
Is there something wrong in my code?
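You cannot use regular expressions in Lua. Use Lua's string patterns to match strings: they have no alternation (|), and special characters such as [ and ( are escaped with % rather than with a backslash.
See How to write this regular expression in Lua?
To find a literal [, escape it with %, for example dump({str:find("%[img=")}), which finds "[img=" at positions 6 to 10 of "hello[img=123]".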
Also note that this loop:
for index = 1, #RICH_TAGS - 1 do
START_OF_PATTER = START_OF_PATTER .. "(" .. RICH_TAGS[index]..")|"
end
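stops one element short, so the last element of RICH_TAGS is only included because the line after the loop appends it separately (the printed pattern \[((texture)|(img)) contains both tags).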
Edit:
But what I want is to find several specific words. For example, the pattern should find any one of "[img=", "[texture=" and "[font=". With the regex string I wrote in my question, a regex engine can do this. But in Lua, the way to do it is to write code like string.find(str, "%[img="), string.find(str, "%[texture=") and string.find(str, "%[font="). I would expect there to be a way to do it with a single pattern string. I tried a pattern string like "%[%a*=", but obviously it finds many more strings than I need.
You cannot match several specific words with a single Lua pattern unless they appear in the string in a fixed order. The only thing you could do is put all the characters that make up those words into a set, but then you risk matching any word that can be built from those letters.
Usually you either match each word with a separate pattern, or match any word (for example with "%[(%a+)=") and check whether the captured word is one of yours, for example using a lookup table.
So basically you do in a few lines of Lua what a regex library would do for you.
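A minimal sketch of the lookup-table approach, written in Python only for illustration (in Lua you would loop with string.gmatch over "%[(%a+)=" and test each capture against a table such as {img = true, texture = true, font = true}). The tag set and the helper name find_rich_tags are assumptions taken from the question; the returned offsets are 0-based, unlike Lua's 1-based positions.

```python
import re

# Known tags; a set plays the role of a Lua lookup table such as {img = true, ...}.
RICH_TAGS = {"texture", "img", "font"}

def find_rich_tags(text):
    """Return (offset, tag) for every "[word=" whose word is a known tag."""
    found = []
    # Step 1: match ANY bracketed word followed by "=" (cf. Lua's "%[(%a+)=").
    for m in re.finditer(r"\[([A-Za-z]+)=", text):
        # Step 2: keep the match only if the word is in the lookup set.
        if m.group(1) in RICH_TAGS:
            found.append((m.start(), m.group(1)))
    return found

print(find_rich_tags("hello[img=123] [color=red] [texture=abc]"))
# [(5, 'img'), (27, 'texture')]
```

The pattern stays generic, and adding a new tag only means adding an entry to the set.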

How to use regex to extract such url

Here is a text:
<a class="mkapp-btn mab-download" href="javascript:void(0);" onclick="zhytools.downloadApp('C100306099', 'appdetail_dl', '24', 'http://appdlc.hicloud.com/dl/appdl/application/apk/f4/f44d320c2c1b466389e6f6b3d3f5cff4/com.uniquestudio.android.iemoji.1806141014.apk?sign=portal#portal1531621480529&source=portalsite' , 'v1.1.4');">
I want to extract
http://appdlc.hicloud.com/dl/appdl/application/apk/f4/f44d320c2c1b466389e6f6b3d3f5cff4/com.uniquestudio.android.iemoji.1806141014.apk?sign=portal#portal1531621480529&source=portalsite
I use the code below to extract it:
m = re.search("mkapp-btn mab-download.*'http://[^']'", apk_page)
My understanding is that .* should match the text between mkapp-btn mab-download and http, but the search fails.
EDIT
I also tried:
m = re.search("(?<=mkapp-btn mab-download.*)http://[^']'", apk_page)
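You need to add + after the negated character class [^'], because the URL is more than one character long. Also, wrap the part you want in parentheses (a capturing group) so you can extract only that part. Your second attempt cannot work in Python at all: the re module only supports fixed-width lookbehind, so (?<=mkapp-btn mab-download.*) raises an error.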
m = re.search("mkapp-btn mab-download.*'(http[^']+)'", apk_page)
m.groups()
The output is:
('http://appdlc.hicloud.com/dl/appdl/application/apk/f4/f44d320c2c1b466389e6f6b3d3f5cff4/com.uniquestudio.android.iemoji.1806141014.apk?sign=portal#portal1531621480529&source=portalsite',)
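A self-contained Python check of the answer, using the question's HTML as sample data. It also shows why the second attempt fails: Python's re rejects the variable-width lookbehind at compile time. The names EXPECTED and lookbehind_ok exist only for this sketch.

```python
import re

EXPECTED = ("http://appdlc.hicloud.com/dl/appdl/application/apk/f4/"
            "f44d320c2c1b466389e6f6b3d3f5cff4/"
            "com.uniquestudio.android.iemoji.1806141014.apk"
            "?sign=portal#portal1531621480529&source=portalsite")

# The question's HTML, rebuilt around the expected URL.
apk_page = ('<a class="mkapp-btn mab-download" href="javascript:void(0);" '
            "onclick=\"zhytools.downloadApp('C100306099', 'appdetail_dl', "
            "'24', '" + EXPECTED + "' , 'v1.1.4');\">")

# Greedy .* skips ahead; the capturing group keeps only the quoted URL.
m = re.search(r"mkapp-btn mab-download.*'(http[^']+)'", apk_page)
print(m.groups())

# Python's re only allows fixed-width lookbehind, so the second attempt cannot compile.
try:
    re.compile(r"(?<=mkapp-btn mab-download.*)http://[^']+")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False
print(lookbehind_ok)  # False
```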

Get digits between slashes or on the end in URL

I need a regular expression (for Groovy) that matches 7 digits between two slashes in a URL, or at the end of the URL. For example, from
https://stackoverflow.com/questions/6032324/problem-with-this-reg-expression
I need 6032324, but it should also match:
https://stackoverflow.com/questions/6032324
If there is one digit more or less, it should not match.
Maybe it's an easy regex, but I'm not very familiar with them :)
Thanks for your help!
Since you are parsing a URL, it makes sense to use a URL parser to grab the path first and split it with /. You then have direct access to the slash-separated path parts, which you can test against a very simple [0-9]{7} pattern, and get all of them with
def results = new URL(surl).path.split("/").findAll { it.matches(/\d{7}/) }
You may also take the first match:
def results = new URL(surl).path.split("/").findAll { it.matches(/\d{7}/) }.first()
Or last:
def results = new URL(surl).path.split("/").findAll { it.matches(/\d{7}/) }.last()
A complete Groovy example:
def surl = "https://stackoverflow.com/questions/6032324/problem-with-this-reg-expression"
def url = new URL(surl)
final result = url.path.split("/").findAll { it.matches(/\d{7}/) }.first()
print(result) // => 6032324
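The same idea as a Python sketch (Python chosen only for illustration; seven_digit_segments and SEGMENT_RE are hypothetical names): parse the URL with urllib.parse.urlsplit, split the path on /, and keep the segments that are exactly seven digits. The regex-only alternative relies on lookarounds, so a / must precede the digits and a / or the end of the string must follow them.

```python
import re
from urllib.parse import urlsplit

def seven_digit_segments(url):
    """Return every path segment of `url` that is exactly seven digits."""
    path = urlsplit(url).path  # e.g. "/questions/6032324/problem-with-..."
    return [seg for seg in path.split("/") if re.fullmatch(r"[0-9]{7}", seg)]

# Regex-only alternative: seven digits preceded by "/" and followed by "/" or the end.
SEGMENT_RE = re.compile(r"(?<=/)[0-9]{7}(?=/|$)")

surl = "https://stackoverflow.com/questions/6032324/problem-with-this-reg-expression"
print(seven_digit_segments(surl))  # ['6032324']
print(SEGMENT_RE.findall(surl))    # ['6032324']
```

The parser-based version is easier to keep correct if the URL later gains a query string or fragment, since those never end up in the path.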

Regular Expression(^string\s\w+) in python

I'd like to know how to search using a regular expression.
This is my code:
import re
data = "python one test code"
p = re.compile(r"^python\s\w+")
Result:
print(p.findall(data))
['python one']
The result I want to get is:
print(p.findall(data))
['python one test code']
I can get that result with
p = re.compile(r"^python\s\w+\s\w+\s\w+")
but I don't want to repeat \s\w+ as in ^python\s\w+\s\w+\s\w+.
How can I get the result by applying * or + to \s\w+?
You can try this:
^python(?:\s\w+)+
Explanation:
^python: the string starts with python
(?:...): a non-capturing group, so findall returns the whole match instead of the group
\s\w+: matches a whitespace character followed by a word
(?:\s\w+)+: the outer + repeats the group one or more times, so it matches every following space-and-word pair
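Alternatively, you can use a character class (this one also accepts words that merely start with python, and keeps trailing whitespace):
p = re.compile(r"^python[\w\s]*")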
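A quick way to compare the two answers (a sketch; the extra test strings are made up for illustration). Both patterns return the whole string for the sample data, but [\w\s]* is looser. The last line shows why the group should be non-capturing: with a capturing group, findall returns only the last repetition of the group.

```python
import re

data = "python one test code"

# Answer 1: a non-capturing group repeated one or more times.
p1 = re.compile(r"^python(?:\s\w+)+")
# Answer 2: a character class of word characters and whitespace.
p2 = re.compile(r"^python[\w\s]*")

print(p1.findall(data))  # ['python one test code']
print(p2.findall(data))  # ['python one test code']

# The class-based pattern is looser: it accepts "pythonic" and keeps trailing spaces.
print(p1.findall("pythonic code"))  # []
print(p2.findall("pythonic code"))  # ['pythonic code']
print(p2.findall("python one "))    # ['python one ']

# With a capturing group, findall returns only the group's last repetition.
print(re.findall(r"^python(\s\w+)+", data))  # [' code']
```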

Replace using RegEx outside of text markers

I have the following sample text, and I want to replace [core]. with something else, but only where it is not between SQL string delimiters ('):
PRINT 'The result of [core].[dbo].[FunctionX]' + [core].[dbo].[FunctionX] + '.'
EXECUTE [core].[dbo].[FunctionX]
The result should be:
PRINT 'The result of [core].[dbo].[FunctionX]' + [extended].[dbo].[FunctionX] + '.'
EXECUTE [extended].[dbo].[FunctionX]
I hope someone can understand this. Can this be solved by a regular expression?
With RegLove
Kevin
Not in a single step, and not in an ordinary text editor. If your SQL is syntactically valid, you can do something like this:
First, remove every string literal from the SQL and replace it with a placeholder. Then do your replacement of [core]. Finally, restore the original text from the placeholders:
Replace all occurrences of '(?:''|[^'])+' with 'n', where n is an index number (the number of the match), and store each match in an array at index n. This removes all SQL strings from the input and exchanges them for harmless placeholders without invalidating the SQL itself.
Do your replacement of [core]. No regex is required; a normal search-and-replace is enough here.
Iterate over the array, replacing the placeholder '0' with the first array item, '1' with the second, and so on up to n. Now you have restored the original strings.
The regex, explained:
' # a single quote
(?: # begin non-capturing group
''|[^'] # either two single quotes, or anything but a single quote
)+ # end group, repeat at least once
' # a single quote
In JavaScript, this would look something like this:
var sql = 'your long SQL code';
var str = [];
// step 1 - replace every SQL string literal with a numbered placeholder '0', '1', ...
var newSql = sql.replace(/'(?:''|[^'])+'/g, function (m) {
  str.push(m);
  return "'" + (str.length - 1) + "'";
});
// step 2 - actual replacement; a regex with the g flag replaces every occurrence
// (a plain string pattern would only replace the first one)
newSql = newSql.replace(/\[core\]\./g, "[extended].");
// step 3 - restore all original strings in one pass; a replacer function
// keeps "$" sequences in the restored text from being interpreted
newSql = newSql.replace(/'(\d+)'/g, function (m, i) {
  return str[i];
});
// done.
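The same three steps as a Python sketch (replace_outside_strings is a hypothetical helper name). Restoring through a function passed to re.sub means the saved literals are inserted verbatim, without backslashes being treated as replacement escapes, and the final substitution restores every placeholder in a single pass.

```python
import re

# An SQL string literal; '' inside a literal is an escaped single quote.
SQL_STRING = re.compile(r"'(?:''|[^'])+'")
PLACEHOLDER = re.compile(r"'(\d+)'")

def replace_outside_strings(sql, old, new):
    """Replace `old` with `new` everywhere except inside SQL string literals."""
    saved = []

    def stash(match):
        # Step 1: remember the literal and leave a numbered placeholder '0', '1', ...
        saved.append(match.group(0))
        return "'%d'" % (len(saved) - 1)

    masked = SQL_STRING.sub(stash, sql)
    # Step 2: ordinary search-and-replace; literals are hidden behind placeholders.
    masked = masked.replace(old, new)
    # Step 3: restore every placeholder in one pass, inserting the literal verbatim.
    return PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], masked)

sql = ("PRINT 'The result of [core].[dbo].[FunctionX]' + [core].[dbo].[FunctionX] + '.'\n"
       "EXECUTE [core].[dbo].[FunctionX]")
print(replace_outside_strings(sql, "[core].", "[extended]."))
```

This assumes the replacement text itself contains no quoted numbers such as '3', which step 3 would mistake for a placeholder.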
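Here is a shorter one-liner (JavaScript) that handles the sample above:
str.replace(/('[^']*'.*)*\[core\]/g, "$1[extended]");
It is less robust than the placeholder approach, though: because the greedy .* runs to the last [core] on the line, only the last [core] after a string literal is replaced when a line contains several, and a [core] inside a string literal is still replaced when no other [core] follows that string on the same line.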