get number after matching string in ruby with regex - regex

I'm trying to extract the value of tracking_number from this string:
string = '{:rate_type=>"PAYOR_ACCOUNT_PACKAGE", :rated_weight_method=>"ACTUAL", tracking_number=>"795856589804"}, :group_number=>"0", :package_rating=>{:actual_rate_type=>"PAYOR_ACCOUNT_PACKAGE", :package_rate_details=>{:rate_type=>"PAYOR_ACCOUNT_PACKAGE", :rated_weight_method=>"ACTUAL", :minimum_charge_type=>"CUSTOMER_FREIGHT_WEIGHT", :billing_weight=>{:units=>"LB", :value=>"1.0"}}'
I have tried /tracking_number=>['"]((.*?)['"])*/ but it matches the whole rest of the string after tracking_number.
Can anybody help me with this?
I tested it at https://rubular.com/r/ZcmJinTHDQSDsZ
The output I want is 795856589804

Remove the * from the end of your regex. It makes the group ((.*?)['"]) repeat, so the match keeps running through every later quote in the string.
If you want just the number, use this regex:
/tracking_number=>"(\d+)"/
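As a rough cross-check of the corrected pattern, here is a minimal sketch in Python rather than Ruby (this particular pattern reads the same in both). The function name extract_tracking_number is made up for illustration, and the sample string is a shortened version of the one in the question.

```python
import re

# Same pattern as the corrected regex: digits between double quotes
# that come right after "tracking_number=>".
TRACKING_RE = re.compile(r'tracking_number=>"(\d+)"')

def extract_tracking_number(text):
    """Return the tracking number as a string, or None if it is absent."""
    match = TRACKING_RE.search(text)
    return match.group(1) if match else None

# Shortened version of the string from the question.
sample = ('{:rate_type=>"PAYOR_ACCOUNT_PACKAGE", '
          'tracking_number=>"795856589804"}, :group_number=>"0"')
print(extract_tracking_number(sample))  # 795856589804
```

Because the capture group is \d+ and the closing quote is matched outside it, the group can never run past the first quote after the digits.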

Related

Parse comma-separated values in an argument list that's separated by commas

I have this regex:
=([0-9A-Za-z_-]+),?
and I have input strings like:
foo=bar,pine=apple,tree,bar=bie
or
foo=bar,pine=apple,tree
or
pine=apple,tree
The regex works when each key has only one value.
But a key can also have a comma-separated list of values (pine=apple,tree), and there the regex captures only the first value and misses the second.
How do I fix my regex so it captures all the values for a key, regardless of where the key sits in the string: alone, between two others, or at the end?
I tried a few things but couldn't figure it out.
Attempt 1:
=([0-9A-Za-z,_-]+),=?
This matches when the multi-value key is in the middle, but fails in the other cases because no = follows the values.
Attempt 2:
=[0-9A-Za-z_-]+([,]+[0-9A-Za-z_-]*),?
This matches too much: bar,pine and tree,bar, for example.
EDIT:
This seems to work, maybe:
=('[0-9A-Za-z,_-]+'),*|=([0-9A-Za-z_-]+),*
if I put quotes around multi-value lists.
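You can split on the variable names, which leaves only the values:
s := regexp.MustCompile("[^,\\s]+=").Split("foo=bar,pine=apple,tree,bar=bie", -1)
fmt.Println(s)
// prints [ bar, apple,tree, bie]
Note that the first element is an empty string and every value except the last keeps its trailing comma, so trim those before use.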
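The same split-on-keys idea can be sketched in Python. This is an illustration under assumptions, not the answerer's code: the function names values_only and parse_pairs are hypothetical, and parse_pairs uses a slightly different split (on the comma that directly precedes a key=) so that keys and values stay paired.

```python
import re

def values_only(s):
    """Split on 'key=' and return each key's values as one string."""
    parts = re.split(r'[^,\s]+=', s)
    # Drop the empty piece before the first key and the comma
    # left at the end of each value list.
    return [p.rstrip(',') for p in parts if p]

def parse_pairs(s):
    """Map each key to its list of values."""
    # Split only on commas that are directly followed by 'key='.
    chunks = re.split(r',(?=[^,=\s]+=)', s)
    pairs = {}
    for chunk in chunks:
        key, _, values = chunk.partition('=')
        pairs[key] = values.split(',')
    return pairs

print(values_only("foo=bar,pine=apple,tree,bar=bie"))
print(parse_pairs("pine=apple,tree"))
```

Splitting on the separator rather than matching the values avoids having to decide inside one regex whether a comma ends a value list or continues it.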

Extract string of numbers from URL using regex PIG

I'm using Pig to generate a list of URLs that have been recently visited. Each URL contains a string of digits that represents the product page visited. I'm trying to use the regex_extract_all() function to extract just that string of digits, which varies in length from 6 to 8. The digits come directly after jobs2/view/ and usually end with +&cd, but sometimes they end with ).
Here are a few example URLs:
(http://a.com/search?q=cache:QD7vZRHkPQoJ:ca.xyz.com/jobs2/view/17069404+&cd=1&hl=en&ct=clnk&gl=ca)
(http://a.com/search?q=cache:G9323j2oNbAJ:ca.xyz.com/jobs2/view/5977065+&cd=1&hl=en&ct=clnk&gl=ca)
(http://a.com/search?q=cache:aNspmG11qAJ:hk.xyz.com/jobs2/view/16988928+&cd=2&hl=zh-TW&ct=clnk&gl=hk)
(http://a.com/search?q=cache:aNspmG11AJ:hk.xyz.com/jobs2/view/16988928+&cd=2&hl=zh-TW&ct=clnk&gl=hk)
(http://a.com/search?q=cache:aNspmG11qAJ:hk.xyz.com/jobs2/view/16988928+&cd=2&hl=zh-TW&ct=cl k&gl=hk)
Here is the current regex I am using:
J = FOREACH jpage GENERATE FLATTEN(REGEX_EXTRACT_ALL(TEXTCOLUMN, '\/view\/(\d+)\+\&')) as (output:chararray)
I have also tried other forms such as:
'[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]', 'view.([0-9]+)', 'view\/([\d]+)\+',
'[0-9][0-9][0-9]+', and
'[0-9][0-9][0-9]*'; none of which work.
Can anybody assist here or have another way of going about it?
Much appreciated,
MM
The reason for the "Unexpected character 'D'" error is that you need a double backslash instead of a single one, e.g. replace [\d+] with [\\d+].
Here is a solution; please validate it against all your input strings.
input.txt
http://a.com/search?q=cache:QD7vZRHkPQoJ:ca.xyz.com/jobs2/view/17069404+&cd=1&hl=en&ct=clnk&gl=ca
http://a.com/search?q=cache:G9323j2oNbAJ:ca.xyz.com/jobs2/view/5977065+&cd=1&hl=en&ct=clnk&gl=ca
http://a.com/search?q=cache:aNspmG11qAJ:hk.xyz.com/jobs2/view/16988928+&cd=2&hl=zh-TW&ct=clnk&gl=hk
http://a.com/search?q=cache:aNspmG11AJ:hk.xyz.com/jobs2/view/16988928+&cd=2&hl=zh-TW&ct=clnk&gl=hk
http://a.com/search?q=cache:aNspmG11qAJ:hk.xyz.com/jobs2/view/16988928+&cd=2&hl=zh-TW&ct=clk&gl=hk
http://a.com/search?q=cache:aNspmG11qAJ:hk.xyz.com/jobs2/view/16988928)=2&hl=zh-TW&ct=clk&gl=hk
http://webcache.googleusercontent.com/search?q=cache:http://my.linkedin.com/jobs2/view/9919248
Updated Pig script:
A = LOAD 'input.txt' as line;
B = FOREACH A GENERATE REGEX_EXTRACT(line,'.*/view/(\\d+)([+|&|cd|)?]+)?',1);
dump B;
(17069404)
(5977065)
(16988928)
(16988928)
(16988928)
(16988928)
I'm not familiar with PIG, but this regex will match your target:
(?<=/jobs2/view/)\d+
Because the lookbehind is non-consuming, the entire match (not just a group within it) is your number.
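To see the lookbehind at work outside Pig, here is a small Python sketch run against three of the URLs from the input file above. It is not a Pig script, and the helper name extract_job_id is hypothetical.

```python
import re

# Digits that directly follow "/jobs2/view/"; the lookbehind keeps the
# prefix out of the match itself.
JOB_ID_RE = re.compile(r'(?<=/jobs2/view/)\d+')

def extract_job_id(url):
    """Return the digit string after /jobs2/view/, or None if absent."""
    match = JOB_ID_RE.search(url)
    return match.group(0) if match else None

urls = [
    "http://a.com/search?q=cache:QD7vZRHkPQoJ:ca.xyz.com/jobs2/view/17069404+&cd=1&hl=en&ct=clnk&gl=ca",
    "http://a.com/search?q=cache:aNspmG11qAJ:hk.xyz.com/jobs2/view/16988928)=2&hl=zh-TW&ct=clk&gl=hk",
    "http://webcache.googleusercontent.com/search?q=cache:http://my.linkedin.com/jobs2/view/9919248",
]
for url in urls:
    print(extract_job_id(url))
```

Since \d+ stops at the first non-digit, the pattern does not need to know whether the ID is followed by +, ) or nothing at all.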

How to include 2 words within Regex and result must be based on only those 2 words VB.NET

I would like to know how to require two or more keywords in a regex, so that results are shown only when all of the keywords are present, not just one of them.
What I currently have works with multiple keywords, but it matches when either word is present; I want it to require both.
For example:
Dim pattern As String = "(?i)[\t ](?<w>((arma)|(crapo))[a-z0-9]*)[\t ]"
This code works fine for text containing 'arma' or 'crapo'. I want it to match only when BOTH 'arma' AND 'crapo' are present, and otherwise show no results.
I'm searching for keywords in PDF documents, and I only want to see a result if the PDF contains both 'arma' and 'crapo'.
Sorry for sounding so repetitive.
Edit: Here is my code. Please read the comment.
Dim filesz() As String = GetPatternedFiles("c:\temp\", New String() {"tes*.pdf", "fes*.pdf", "Bas*.pdf"})
' GetPatternedFiles is a function; GetTextFromPDF is another function.
For Each s As String In filesz
    Dim thetext As String = Nothing
    Dim pattern As String = "(?i)[\t ](?<w>(crapo)|(arma)[a-z0-9]*)[\t ]"
    thetext = GetTextFromPDF(s)
    For Each m As Match In Regex.Matches(thetext, pattern)
        ListBox1.Items.Add(s)
    Next
Next
You can use this regex:
\barma\b.*?\bcrapo\b|\bcrapo\b.*?\barma\b
The idea is to match arma, then anything, then crapo (or crapo, then anything, then arma), using word boundaries to avoid matching words like karma.
However, if you do want to match karma or crapotos, as you asked in your comment, you can use:
arma.*?crapo|crapo.*?arma
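A minimal sketch of the word-boundary version, written in Python rather than VB.NET. The helper name contains_both is hypothetical. The DOTALL flag is an addition not in the answer: it is included on the assumption that text extracted from a PDF spans several lines, where a plain "." would stop at each line break (in .NET, RegexOptions.Singleline plays the same role).

```python
import re

# arma ... crapo in either order. DOTALL lets "." cross line breaks;
# IGNORECASE mirrors the (?i) flag in the question's pattern.
BOTH_WORDS_RE = re.compile(
    r'\barma\b.*?\bcrapo\b|\bcrapo\b.*?\barma\b',
    re.IGNORECASE | re.DOTALL,
)

def contains_both(text):
    """True only if both whole words occur somewhere in the text."""
    return BOTH_WORDS_RE.search(text) is not None

print(contains_both("the arma report mentions crapo"))
print(contains_both("only arma is mentioned here"))
```

With this check, the file would be added to the list once per document when contains_both is true, instead of once per regex match.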

Python regex to get string in front of hyphen and plus signs

I have the string shown in the code below and want the final result to be ['AA', 'BB', 'CC'].
What I actually get is ['AA', 'BB']. Could you please give me a suggestion? Thank you.
s = "AA-ZZ, BB+ZZ, CC"
a = re.findall(r'(\w+)[-|\\+\\]\w',s)
Use a lookahead to check that the word is followed by +, -, or the end of the string.
a = re.findall(r'(\w+)(?=[-+]|$)',s)
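For completeness, here is the lookahead version as a runnable snippet with its import. The wrapper name leading_tokens is made up for illustration.

```python
import re

def leading_tokens(s):
    """Words directly followed by '-' or '+', or ending the string."""
    # The lookahead checks the next character without consuming it,
    # so the sign never becomes part of the result.
    return re.findall(r'(\w+)(?=[-+]|$)', s)

print(leading_tokens("AA-ZZ, BB+ZZ, CC"))  # ['AA', 'BB', 'CC']
```

One caveat of the $ alternative: the last word of the string is always returned, even when it is a right-hand side, so "AA-ZZ" on its own yields both AA and ZZ.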

Regex to find segment of string searching from end

I'm in Java and have a string that will always be in this format:
;<b>gerg(1314)</b><br> (KC)<br>
This number 461610734 will change and may be any length.. I'd like to pick that number out and use it. As you can see the number is next to a ' (the first one working backwards) and a hash # (again, the first one working backwards).
I can find the numbers after the hash by using ([^\#]+$) and I can find up to the last ' by using ([^\']+$) (but this would be on the wrong side of the '...)
I'm lost... Anyone know how to join these two together and nudge the ' along one to the left to just get the numbers?
Actually, I believe that you could simply extract "the digits that immediately follow a #".
You could then use the following regex: (?<=#)\d+.
On the other hand, if you really want to specify that your digits are following a # and followed by a ', you could (should?) make use of the look-arounds.
The following regex should be what you're looking for:
(?<=#)\d+(?=')
Try this:
String str = ";<b>gerg(1314)</b><br> (KC)<br>";
Pattern pattern = Pattern.compile("onClick=\"return CCL\\(this,'#([0-9]+)'");
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
    System.out.println(matcher.group(1)); // Prints 461610734
}
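Both approaches can be compared in a short Python sketch. The sample string is an assumption: the real markup is not shown in the question, so the fragment below is a hypothetical one built from the onClick pattern in the Java answer and the number mentioned in the question. The function names are also made up for illustration.

```python
import re

# Hypothetical fragment; the actual markup is not shown in the question.
sample = """<a onClick="return CCL(this,'#461610734')">gerg(1314)</a>"""

def number_by_lookaround(text):
    """Digits preceded by '#' and followed by a single quote."""
    match = re.search(r"(?<=#)\d+(?=')", text)
    return match.group(0) if match else None

def number_by_group(text):
    """Same digits, taken from a capture group between '# and '."""
    match = re.search(r"'#(\d+)'", text)
    return match.group(1) if match else None

print(number_by_lookaround(sample))
print(number_by_group(sample))
```

The lookaround form returns the digits as the whole match, while the capture-group form matches the surrounding characters too and picks the digits out of group 1; both work for a number of any length.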