I have been messing with this for a while now and decided to post here to see if anyone could help out. I even tried the RegExr tool, with no luck.
Anyway, I have a String that contains the following text (without the quotes):
"13.5 to 14.1"
I need to create a var with the first number: 13.5 and a var with the second number: 14.1
So I want the following result:
var firstVal:String = "13.5";
var secondVal:String = "14.1";
I got it to work by doing the following for the first number:
var lowRegExp:RegExp=/\d[0-9].\d[0-9]/;
And for the second number I did this:
var highRegExp:RegExp=/\d[0-9].\d[0-9]$/;
My problem here is that I will not know the format of the String. It could also look like this (two digits trailing the decimal):
13.57 to 14.10
So I need to make sure that it works using the following combinations:
13.50 to 14.1, 13.5 to 14.10, 3.50 to 4.10, 3.5 to 4.1 (all combinations must work)
Any help is much appreciated!
Here is what I got to work. I am not sure how clean this is, and I am not a fan of hard coding, but it works for all scenarios. If someone knows a clean way to do this, please let me know.
var myString:String="13.5 to 14.1";
var firstVal:String=myString.substring(0, myString.search(" to "));
var secondVal:String=myString.substring((myString.search(" to ") + 4));
This should be pretty straightforward. You want the following:
- Any # of digits, followed by a period literal, followed by any # of digits.
Pattern: \d+\.\d+
So use something similar:
var mystr:String = "15.4 to 153.93";
var tokens:Array = mystr.match(/\d+\.\d+/g);
Also, I have gotten myself in the habit of using regexpal.com which is way faster than iterative testing in your application. ;)
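For readers who want to check the pattern outside ActionScript, here is a minimal sketch in TypeScript. The function name `extractDecimals` and the sample list are made up for illustration; the pattern itself is the `\d+\.\d+` from the answer above, run against the combinations listed in the question.

```typescript
// Extract every decimal number (digits, a literal dot, digits) from a range string.
// extractDecimals is a hypothetical helper name, not part of any library.
function extractDecimals(text: string): string[] {
  // The g flag makes match() return all matches instead of only the first one.
  return text.match(/\d+\.\d+/g) ?? [];
}

// The combinations from the question.
const samples: string[] = [
  "13.5 to 14.1",
  "13.50 to 14.1",
  "13.5 to 14.10",
  "3.50 to 4.10",
  "3.5 to 4.1",
];

for (const sample of samples) {
  const [firstVal, secondVal] = extractDecimals(sample);
  console.log(`${sample} -> first: ${firstVal}, second: ${secondVal}`);
}
```

Because the quantifier is `+` rather than a fixed number of digits, the number of digits on either side of the decimal point does not matter.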
I'm stuck on how to read and tidy up these values from a multiline cell via ARRAYFORMULA.
I'm using regex because the preceding line can vary.
Just formulas please, no custom code.
The first column looks like a set of these:
name = the_name
texture = blah.dds
cost = 1000
value = 1000
type = ATTR_A
value = 8
type = ATTR_B
name = feature_blah
0 = comp_one,1
res_one = 1
res_five = 1
res_four = 1
To be useful elsewhere, at minimum each [tag] set ([effect\d], [feature\d], etc.) needs to be in its own column. For example, the 'effects' column would look like:
and so on.
The desired output can also be seen in the included spreadsheet.
Here is the example spreadsheet:
This kind of works. It finds each 'type' and 'value' fine, but I can't figure out how to extract just those from the rest. I tried capturing (and non-capturing) groups before and after, but that didn't work.
=ARRAYFORMULA(REGEXREPLACE($A3:$A,"[\n.][effect\d][\n.](.)\n(.)","1:$1 2:$2"))
A different approach entirely also kind of works. It is longer, though, and I am left having to parse the values out of that string, which is where I got stuck again. The idea was to use this to simplify the input, then use REGEXREPLACE like above. I am getting stuck removing the content around the final matches; if I can do that, then the approach above is fine too.
// First ran a substitute
// Then variation of this (gave up on single line 'effect/d' so broke it up to try and get it working)
// Then use regexreplace like above
=ARRAYFORMULA(REGEXREPLACE($B3:$B,"value = (.);type = (.);;","1:$1 2:$2"))
Also, as my updated 'Desired Output' sheet shows (see timestamped comment below), bonus kudos if you can also extract just the values of matching 'type's into those extra columns (see spreadsheet).
All good if you can't, though. I just realized I would need that too for lookups.
**--END OF EDIT--**
I've tried dozens of things, discarding each in turn. I had a quick look in the version history to grab two promising attempts and shared them in separate sheets.
One of these also used SUBSTITUTE to simplify the input column. I'm happy with a solution using either the RAW input or the SUBSTITUTE results.
I have also looked at dozens of Stack Overflow and Google support pages, and tried both REGEXEXTRACT and REGEXREPLACE. Both are promising but missing that final tweak, and I have already tried dozens of tweaks on both.
Any help would be great, and will hopefully help others in future. Examples with spreadsheets are valuable, since every new REGEX seems to be a new adventure ;)
paste in B3:
IF(C3:E<>"", C2:E2&":"&C3:E, )),,999^99))), " ", ", "))
paste in C3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&C2)))
paste in D3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&D2)))
paste in E3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&E2)))
paste in F3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A3:A, "\[feature\d+\]\nname = (.*)")))
paste in G3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A3:A, "\[components\]\n\d+ = (.*)")))
paste in H3:
REGEXREPLACE(A3:A, "\n", ", "), "\[resources\], (.*)"), "["),,1), ", , $", )))
spreadsheet demo
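To see what the pattern in the C3:E3 formulas does, here is a minimal TypeScript sketch of the same regular expression applied to one cell. The cell text is modelled on the sample rows in the question, and `valueForType` is a hypothetical helper name. The capture group grabs the digits at the end of the line directly before `type = <attribute>`.

```typescript
// Hypothetical cell content, modelled on the sample rows in the question.
const cell: string = [
  "cost = 1000",
  "value = 1000",
  "type = ATTR_A",
  "value = 8",
  "type = ATTR_B",
].join("\n");

// Return the digits on the line immediately before "type = <attr>", or null if absent.
// attr is concatenated into the pattern unescaped, just as the header cell is in the formula.
function valueForType(text: string, attr: string): string | null {
  const pattern = new RegExp("(\\d+)\\ntype = " + attr);
  const match = pattern.exec(text);
  return match?.[1] ?? null;
}

console.log(valueForType(cell, "ATTR_A")); // "1000"
console.log(valueForType(cell, "ATTR_B")); // "8"
console.log(valueForType(cell, "ATTR_C")); // null
```

The `null` result plays the role that IFNA plays in the formulas: a cell without the requested attribute yields an empty result instead of an error.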
Caveat first: I have added some "input data". Examples:
name = feature_active_spoiler2
0 = spoiler,1
1 = spoilerA, 2
So the output has "extra" output.
See the tab ADW's Solution.
I have an Angular app using the MongoDB SDK for JS.
I would like to suggest words from my words collection in an input field, so I did:
getSuggestions(term: string) {
  var regex = new stitch.BSON.BSONRegExp('^' + term, 'i');
  return from(this.words.find({ 'Noun': { $regex: regex } }).execute());
}
The problem is that if the user types, for example, Bie, the query returns a lot of documents, but the most accurate ones come last. For example, Bier comes after longer entries such as Bieberbach'sche Vermutung. How can I return the closest documents first?
A regular-expression is probably not enough to do what you are intending to do here. They can only do what they're meant to do – match a string. They might be used to give you a candidate entry to present to the user, but can't judge or weigh them. You're going to have to devise that logic yourself.
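One possible way to "devise that logic yourself" is to rank the candidates after the regular expression has selected them. The sketch below is an assumption about what "closest" means (shortest match first, ties broken alphabetically) and works on a plain array of nouns rather than on the Stitch query result. `escapeRegExp`, `rankSuggestions` and the word list are hypothetical names and data, not part of any SDK.

```typescript
// Escape characters that have a special meaning in a regular expression,
// so that user input is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Keep the nouns that start with the term (case-insensitive) and
// order them shortest first, then alphabetically.
function rankSuggestions(nouns: string[], term: string): string[] {
  const prefix = new RegExp("^" + escapeRegExp(term), "i");
  return nouns
    .filter((noun) => prefix.test(noun))
    .sort((a, b) => a.length - b.length || a.localeCompare(b));
}

// Hypothetical sample data.
const nouns: string[] = ["Bieberbach'sche Vermutung", "Bierglas", "Bier", "Biene", "Brot"];
console.log(rankSuggestions(nouns, "Bie"));
// ["Bier", "Biene", "Bierglas", "Bieberbach'sche Vermutung"]
```

The regular expression only decides which entries are candidates; the comparison function carries the weighting, and it can be swapped for any other scoring rule.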
I am trying to capture image URLs from inside tweets.
REGISTER 'hdfs:///user/cloudera/elephant-bird-pig-4.1.jar';
REGISTER 'hdfs:///user/cloudera/elephant-bird-core-4.1.jar';
REGISTER 'hdfs:///user/cloudera/elephant-bird-hadoop-compat-4.1.jar';
--Load Json
loadJson = LOAD '/user/cloudera/tweetwall' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map []);
B = FOREACH loadJson GENERATE flatten(json#'tweets') as (m:map[]);
tweetText = FOREACH B GENERATE FLATTEN(m#'text') as (str:chararray);
The intermediate data looks like this:
(#somenameontwitter your nan makes me laugh with some of the things she comes out with like http://somepics.com/my.jpg)
Then I try the following to get only the image URL back:
x = foreach tweetText generate REGEX_EXTRACT_ALL(str, '((http)(.*)(.jpg|.bmp|.png))');
dump x;
but that doesn't seem to work. I have also been trying with filter to no avail.
Even when trying the above with .* it returns empty results () or (())
I'm not good with regex and pretty new to Pig so it could be that I'm missing something simple here that I'm just not seeing.
example input data
{"tweets":[{"created_at":"Sat Nov 01 23:15:45 +0000 2014","id":5286804225,"id_str":"5286864225","text":"#Beace_ your nan makes me laugh with some of the things she comes out with blabla http://t.co/b7hjMWNg is an url, but not a valid one http://www.something.com/this.jpg should be a valid url","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":52812992878592,"in_reply_to_status_id_str":"522","in_reply_to_user_id":398098,"in_reply_to_user_id_str":"3","in_reply_to_screen_name":"Be_","user":{"id":425,"id_str":"42433395","name":"SAINS","screen_name":"sa3","location":"Lincoln","profile_location":null,"description":"","url":null,"entities":{"description":{"urls":[]}},"protected":false,"followers_count":92,"friends_count":526,"listed_count":0,"created_at":"Mon May 25 16:18:05 +0000 2009","favourites_count":6,"utc_offset":0,"time_zone":"London","geo_enabled":true,"verified":false,"statuses_count":19,"lang":"en","contributors_enabled":false,"is_translator":false,"is_translation_enabled":false,"profile_background_color":"EDECE9","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme3\/bg.gif","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme3\/bg.gif","profile_background_tile":false,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/52016\/DGDCj67z_normal.jpeg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/526\/DGDCj67z_normal.jpeg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/424395\/13743515","profile_link_color":"088253","profile_sidebar_border_color":"D3D2CF","profile_sidebar_fill_color":"E3E2DE","profile_text_color":"634047","profile_use_background_image":true,"default_profile":false,"default_profile_image":false,"following":false,"follow_request_sent":false,"notifications":false},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_
count":0,"favorite_count":1,"entities":{"hashtags":[],"symbols":[],"user_mentions":[{"screen_name":"e_","name":"\u2601\ufe0f effy","id":3998,"id_str":"398","indices":[0,15]}],"urls":[]},"favorited":false,"retweeted":false,"lang":"en"}]}
Try this and let me know if this works
x = foreach tweetText generate REGEX_EXTRACT(str,'.*(http://.*.[jpg|bmp|png])',1);
I managed to get it working (though I doubt it is totally optimal)
x = foreach tweetText generate REGEX_EXTRACT(str,'(http://.*(.jpg|.bmp|.png))',1) as image;
filtered = FILTER x BY $0 is not null;
dump filtered;
So the initial problem was just the regex (and my lack of knowledge on the subject).
Thanks for the assistance sivasakthi jayaraman!
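The difference between the two patterns above comes down to square brackets versus parentheses: `[jpg|bmp|png]` is a character class that matches a single character, while `(jpg|bmp|png)` is an alternation between three extensions. The TypeScript sketch below illustrates that, and also shows a somewhat stricter pattern. The stricter pattern is my own assumption, not something from the thread: it escapes the dot, accepts `https` as well, and uses `\S` so that a match cannot span two URLs. `extractImageUrls` is a hypothetical helper name.

```typescript
// Find image URLs: scheme, non-whitespace characters, a literal dot, then one of the extensions.
function extractImageUrls(text: string): string[] {
  return text.match(/https?:\/\/\S+?\.(?:jpg|bmp|png)\b/gi) ?? [];
}

// Tweet text taken from the example input in the question.
const tweet: string =
  "blabla http://t.co/b7hjMWNg is an url, but not a valid one " +
  "http://www.something.com/this.jpg should be a valid url";

console.log(extractImageUrls(tweet)); // ["http://www.something.com/this.jpg"]

// A character class matches exactly one character from the set, not a whole word.
console.log(/^[jpg|bmp|png]$/.test("j"));   // true
console.log(/^[jpg|bmp|png]$/.test("jpg")); // false

// An alternation matches one of the listed words.
console.log(/^(jpg|bmp|png)$/.test("jpg")); // true
```

With `.*` in place of `\S+?`, a tweet containing two links would produce one match running from the first `http://` to the last image extension.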
I am new to Swift. I have been working with it for only a few weeks, and now I am trying to parse something like a price list from an incoming string. It has the following format:
2.99 X 3.00 = 10 A
Some text here
1.22 X 1.5 10 A
The hardest part is that sometimes the A or one of the numbers is missing, but the X is always in place.
I would like to find out whether regex can be used in Swift (or something similar, if it does not exist) to write a template for parsing the following value:
d.dd X d.d SomeValueIfExists
I would very much appreciate any useful information, topics to read, or other resources for learning more about Swift.
PS. I have access to the dev. forums but I've never used them before.
I did an example recently, maybe a little harder than necessary, to demonstrate RegEx use in Swift:
let str1: NSString = "I run 12 miles"
let str2 = "I run 12 miles"
let match = str1.rangeOfString("\\d+", options: .RegularExpressionSearch)
let finalStr = str1.substringWithRange(match).toInt()
let n: Double = 2.2*Double(finalStr!)
let newStr = str2.stringByReplacingOccurrencesOfString("\\d+", withString: "\(n)", options: NSStringCompareOptions.RegularExpressionSearch, range: nil)
println(newStr) //I run 26.4 miles
Two of these lines use "RegularExpressionSearch". If you put this in a playground you can see what each line does. Note the double \ escapes: one for the normal RegEx use and another because \ is a special character in Swift.
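As for the template from the question (`d.dd X d.d SomeValueIfExists`), the pattern itself is independent of the language. Here is a minimal sketch in TypeScript rather than Swift, under my own assumptions about the format: two numbers separated by X, then an optional `=`, an optional third number and an optional single trailing letter. `ParsedLine`, `parseLine` and the field names are hypothetical.

```typescript
// Hypothetical shape for one parsed line; the field names are neutral placeholders.
interface ParsedLine {
  first: number;
  second: number;
  third: number | null;  // the number after X's right-hand operand, if present
  suffix: string | null; // the trailing letter (for example "A"), if present
}

// number, X, number, optional "=", optional number, optional single letter.
const LINE_PATTERN =
  /^\s*(\d+(?:\.\d+)?)\s+X\s+(\d+(?:\.\d+)?)(?:\s*=)?(?:\s+(\d+(?:\.\d+)?))?(?:\s+([A-Za-z]))?\s*$/;

// Return the parsed fields, or null for lines that do not fit the template.
function parseLine(line: string): ParsedLine | null {
  const m = LINE_PATTERN.exec(line);
  if (m === null) {
    return null;
  }
  return {
    first: Number(m[1]),
    second: Number(m[2]),
    third: m[3] === undefined ? null : Number(m[3]),
    suffix: m[4] ?? null,
  };
}

// The three sample lines from the question.
for (const line of ["2.99 X 3.00 = 10 A", "Some text here", "1.22 X 1.5 10 A"]) {
  console.log(line, "->", parseLine(line));
}
```

Optional groups that do not take part in a match come back as `undefined`, which is how the missing number or letter is detected. In Swift, the same pattern string would need each backslash doubled, as noted above.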
Also a good article:
I am converting a CoreText based app to Swift and I am facing an issue when getting the matches to a regular expression in the text.
This is the sample code
let regexOptions = NSRegularExpressionOptions.CaseInsensitive | NSRegularExpressionOptions.DotMatchesLineSeparators
let regex = NSRegularExpression.regularExpressionWithPattern("(.*?)(<[^>]+>|\\Z)", options: regexOptions, error: nil)
var results: Array<NSTextCheckingResult> = regex.matchesInString(text, options: 0, range: NSMakeRange(0, countElements(text)))
According to the documentation, the matchesInString function returns an array of NSTextCheckingResults, but the compiler complains, stating that "The Expression of type anyObject[] can't be converted to 'NSMatchingOptions'". Any idea what might be wrong here?
Try assigning to your results variable like this:
var results = regex.matchesInString(text, options: nil, range: NSMakeRange(0, countElements(text))) as Array<NSTextCheckingResult>
The return type is Array<AnyObject>[]!. You can cast here (as in the above example) or later, when you check the members of the collection.
In Swift, options take nil to represent an empty option set (vs. 0 in Objective-C).
I just sat with a problem related to some regexes and thought I would add a warning to the answer submitted above. My regex matches seemed to be cut short, and it turned out that the range I supplied was incorrect. I generated the range in the way described by #fqdn. It turned out that my strings contained carriage returns (\u{D}) next to line feeds (\u{A}), and that the countElements function counted each such pair as a single character.
I countered this by calling .unicodeScalars on the string, which seems to correct the length.
println(countElements("\u{A}\u{A}\u{A}\n\u{D}\n\u{D}\n\u{D}\n\u{D}\n")) //8
println(countElements("\u{A}\u{A}\u{A}\n\u{D}\n\u{D}\n\u{D}\n\u{D}\n".unicodeScalars)) //12
Disclaimer: This is quite probably a Swift bug and might get fixed in a later version.
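One more point on the length: NSRange offsets are measured in UTF-16 code units, which is neither the character count nor, in general, the Unicode scalar count. The sketch below uses TypeScript only as an illustration, because its `length` property is also in UTF-16 code units. The CR LF replacement is a rough stand-in for how Swift groups that pair into one character, and it is only valid for this particular input.

```typescript
// Same content as the string in the post: four line feeds, then four CR LF pairs.
const text: string = "\n\n\n\n\r\n\r\n\r\n\r\n";

// UTF-16 code units, the unit that NSString and NSRange use.
const utf16Length: number = text.length;

// Unicode code points, comparable to counting unicodeScalars in Swift.
const scalarCount: number = Array.from(text).length;

// Stand-in for the character count on this input: each CR LF pair counts once.
const characterCount: number = text.replace(/\r\n/g, "\n").length;

console.log(utf16Length, scalarCount, characterCount); // 12 12 8

// Outside the Basic Multilingual Plane the first two counts differ as well.
const emoji: string = "\uD83D\uDE00"; // one code point stored as a surrogate pair
console.log(emoji.length, Array.from(emoji).length); // 2 1
```

For the string in the post the scalar count and the UTF-16 count happen to agree, but for text with characters such as emoji only the UTF-16 length gives a range that covers the whole string.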