Is it possible to perform two actions after a then in OCaml?
I searched and found that I could use a semicolon.
Should I use it like this?
let test (a:int)=
if a = 0
then print_int(1);print_int(2)
else()
;;
That's just an example. In my case I want to call a function and then return a tuple, like this:
let move_square(x,y:int*int):int*int=
..
let direction : int = Random.int(5);
if direction = 0
then draw_square(x,y+1);x,y+1
else ..
Thanks for helping me
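You can refer to the "Séquence" section of https://caml.inria.fr/pub/old_caml_site/FAQ/qrg-fra.html (in French).
In general, a sequence of expressions inside an if-then-else branch has to be grouped,
either explicitly with the begin and end keywords or with parentheses.
Without grouping, the semicolon ends the if expression: then print_int 1; print_int 2 is parsed as (if ... then print_int 1); print_int 2, so the else that follows is a syntax error. Writing then (print_int 1; print_int 2) or then begin print_int 1; print_int 2 end fixes it; in your second example that would be then (draw_square (x, y+1); (x, y+1)).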
I am trying to capture image URLs from inside tweets.
REGISTER 'hdfs:///user/cloudera/elephant-bird-pig-4.1.jar';
REGISTER 'hdfs:///user/cloudera/elephant-bird-core-4.1.jar';
REGISTER 'hdfs:///user/cloudera/elephant-bird-hadoop-compat-4.1.jar';
--Load Json
loadJson = LOAD '/user/cloudera/tweetwall' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map []);
B = FOREACH loadJson GENERATE flatten(json#'tweets') as (m:map[]);
tweetText = FOREACH B GENERATE FLATTEN(m#'text') as (str:chararray);
The intermediate data looks like this:
(#somenameontwitter your nan makes me laugh with some of the things she comes out with like http://somepics.com/my.jpg)
Then I try the following to get only the image URL back:
x = foreach tweetText generate REGEX_EXTRACT_ALL(str, '((http)(.*)(.jpg|.bmp|.png))');
dump x;
But that doesn't seem to work. I have also tried using FILTER, to no avail.
Even when I replace the pattern above with just .*, it returns empty results: () or (())
I'm not good with regex and pretty new to Pig so it could be that I'm missing something simple here that I'm just not seeing.
update
example input data
{"tweets":[{"created_at":"Sat Nov 01 23:15:45 +0000 2014","id":5286804225,"id_str":"5286864225","text":"#Beace_ your nan makes me laugh with some of the things she comes out with blabla http://t.co/b7hjMWNg is an url, but not a valid one http://www.something.com/this.jpg should be a valid url","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":52812992878592,"in_reply_to_status_id_str":"522","in_reply_to_user_id":398098,"in_reply_to_user_id_str":"3","in_reply_to_screen_name":"Be_","user":{"id":425,"id_str":"42433395","name":"SAINS","screen_name":"sa3","location":"Lincoln","profile_location":null,"description":"","url":null,"entities":{"description":{"urls":[]}},"protected":false,"followers_count":92,"friends_count":526,"listed_count":0,"created_at":"Mon May 25 16:18:05 +0000 2009","favourites_count":6,"utc_offset":0,"time_zone":"London","geo_enabled":true,"verified":false,"statuses_count":19,"lang":"en","contributors_enabled":false,"is_translator":false,"is_translation_enabled":false,"profile_background_color":"EDECE9","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme3\/bg.gif","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme3\/bg.gif","profile_background_tile":false,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/52016\/DGDCj67z_normal.jpeg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/526\/DGDCj67z_normal.jpeg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/424395\/13743515","profile_link_color":"088253","profile_sidebar_border_color":"D3D2CF","profile_sidebar_fill_color":"E3E2DE","profile_text_color":"634047","profile_use_background_image":true,"default_profile":false,"default_profile_image":false,"following":false,"follow_request_sent":false,"notifications":false},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":1,"entities":{"hashtags":[],"symbols":[],"user_mentions":[{"screen_name":"e_","name":"\u2601\ufe0f effy","id":3998,"id_str":"398","indices":[0,15]}],"urls":[]},"favorited":false,"retweeted":false,"lang":"en"}]}
Try this and let me know if this works
x = foreach tweetText generate REGEX_EXTRACT(str,'.*(http://.*.[jpg|bmp|png])',1);
DUMP x;
I managed to get it working (though I doubt it is optimal):
x = foreach tweetText generate REGEX_EXTRACT(str,'(http://.*(.jpg|.bmp|.png))',1) as image;
filtered = FILTER x BY $0 is not null;
dump filtered;
So the initial problem was just the regex (and my lack of knowledge on the subject).
Thanks for the assistance sivasakthi jayaraman!
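For anyone who wants to experiment with the pattern outside Pig, here is a minimal sketch in Python. This is my own illustration, not part of the thread's Pig script, and Python's re module is not identical to the Java regex engine behind Pig's REGEX_EXTRACT, although these constructs behave the same way in both. Two things are worth noting about the working pattern above: an unescaped . matches any character, not just a literal dot, and a greedy .* can run from the first http:// in the tweet to the last image extension. The sketch escapes the dot and uses a lazy \S+? so a match cannot cross whitespace. The name extract_image_urls and the sample text (taken from the example tweet) are only for illustration.

```python
import re

# "http" or "https", then the shortest run of non-space characters
# that ends in a literal dot followed by one of the image extensions.
IMAGE_URL = re.compile(r'https?://\S+?\.(?:jpg|bmp|png)', re.IGNORECASE)


def extract_image_urls(text):
    """Return every image URL found in text, in order of appearance."""
    return IMAGE_URL.findall(text)


sample = ("blabla http://t.co/b7hjMWNg is an url, but not a valid one "
          "http://www.something.com/this.jpg should be a valid url")

print(extract_image_urls(sample))
```

The same pattern string should be usable in REGEX_EXTRACT, but inside a Pig string literal the backslashes most likely need doubling (\\S, \\.), so treat that part as untested.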
s = {"densityThreshold": 2.4543288981124E+14}
I was thinking something like this
re.search(".[A-Za-z]*.:\s\d\.\d+..\d+", k) or if re.search(".[A-Za-z]*.:\s\d\.\w+.\d+", k):
but neither seems to work.
I need to capture "densityThreshold" and "2.4543288981124E+14" as groups so I can build another dictionary. I would usually use group(), but I'm stuck at search!
import re

x = 's = {"densityThreshold": 2.4543288981124E+14}'
k = re.search(r".[A-Za-z]*.:\s\d\.\d+..\d+", x)
print(k.group())
You can do this if you want the whole match in one group. If you want the key and the value separately, add capturing groups:
x = 's = {"densityThreshold": 2.4543288981124E+14}'
k = re.search(r"(.[A-Za-z]*.):(\s\d\.\d+..\d+)", x)
print(k.groups())
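Since the goal is to build a dictionary, here is a slightly stricter sketch of the same idea. It is an illustration under my own assumptions rather than the answerer's code: the names PAIR, parse_pair and parse_json_part are made up, and the pattern assumes the key is a double-quoted word and the value is a plain or scientific-notation number. The groups exclude the quotes and leading space, so the captured pieces can go straight into a dict.

```python
import json
import re

line = 's = {"densityThreshold": 2.4543288981124E+14}'

# A quoted key, a colon, then a number with optional fraction and exponent.
PAIR = re.compile(r'"(\w+)"\s*:\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)')


def parse_pair(text):
    """Return {key: float(value)} for the first key/number pair, or {} if none."""
    match = PAIR.search(text)
    if match is None:
        return {}
    return {match.group(1): float(match.group(2))}


def parse_json_part(text):
    """Alternative without a regex: parse everything after the first '=' as JSON."""
    return json.loads(text.split("=", 1)[1])


print(parse_pair(line))
print(parse_json_part(line))
```

The second function only works because the part after s = happens to be valid JSON in this example; if your real input is looser than that, the regex version is the safer of the two.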
I've been following a tutorial on how to use mallet in R to create topic models. My text file has one sentence per line and about 50 sentences in total. It looks like this:
Thank you again and have a good day :).
This is an apple.
This is awesome!
LOL!
i need 2.
.
.
.
This is my code:
Sys.setenv(NOAWT=TRUE)
#setup the workspace
# Set working directory
dir<-"/Users/jxn"
Dir <- "~/Desktop/Chat/malletR/text" # adjust to suit
require(mallet)
documents1 <- mallet.read.dir(Dir)
View(documents1)
stoplist1<-mallet.read.dir("~/Desktop/Chat/malletR/stoplists")
View(stoplist1)
mallet.instances <- mallet.import(documents1$id, documents1$text, "~/Desktop/Chat/malletR/stoplists/en.txt", token.regexp ="\\p{L}[\\p{L}\\p{P}]+\\p{L}")
Everything works except for the last line of the code:
mallet.instances <- mallet.import(documents1$id, documents1$text, "~/Desktop/Chat/malletR/stoplists/en.txt", token.regexp ="\\p{L}[\\p{L}\\p{P}]+\\p{L}")
I keep getting this error:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.lang.NoSuchMethodException: No suitable method for the given parameters
According to the package, this is how the function should be:
mallet.instances <- mallet.import(documents$id, documents$text, "en.txt",
token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")
I believe it has something to do with the token.regexp argument, since
documents1 <- mallet.read.dir(Dir) works just fine, which suggests that the first 3 arguments supplied to mallet.import were correct.
This is a link to the git repo that I was following the tutorial from:
https://github.com/shawngraham/R/blob/master/topicmodel.R
Any help would be much appreciated.
Thanks,
J
I suspect the problem is with your text file. I have encountered the same error and resolved it by using the as.character() function as follows:
mallet.instances <- mallet.import(as.character(documents$id),
as.character(documents$text),
"en.txt",
FALSE,
token.regexp="\\p{L}[\\p{L}\\p{P}]+\\p{L}")
Are you sure you also converted the id field to character? It is easy to overlook that advice and leave it as an integer.
Also, there is a typo in the code sample: the backslashes have to be escaped:
token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}"
This usually happens because the HTML text editor eats one of the backslashes.
Hi, I'm trying to use find and replace in Notepad++ with a regular expression to do the following:
I have two sets of lines:
first set:
[c][eu][e]I37ANKCB[/e]
[c][eu][e]OIL8ZEPW[/e]
[c][eu][e]4OOEL75O[/e]
[c][eu][e]PPNW5FN4[/e]
[c][eu][e]E2BXCWUO[/e]
[c][eu][e]SD9UQNT8[/e]
[c][eu][e]E6BK6IGO[/e]
second set:
[u]7ubju2jvioks[u2]_261
[u]89j408tah1lz[u2]_262
[u]j673xnd49tq0[u2]_263
[u]dv73osmh1wzu[u2]_264
[u]twz3u4yiaeqr[u2]_265
[u]cuhtg6r71kud[u2]_266
[u]yts0ktvt9a3r[u2]_267
Now I want each line of the second set to be placed after the corresponding line of the first set, like this:
[c][eu][e]I37ANKCB[/e][u]7ubju2jvioks[u2]_261
[c][eu][e]OIL8ZEPW[/e][u]89j408tah1lz[u2]_262
[c][eu][e]4OOEL75O[/e][u]j673xnd49tq0[u2]_263
[c][eu][e]PPNW5FN4[/e][u]dv73osmh1wzu[u2]_264
[c][eu][e]E2BXCWUO[/e][u]twz3u4yiaeqr[u2]_265
[c][eu][e]SD9UQNT8[/e][u]cuhtg6r71kud[u2]_266
[c][eu][e]E6BK6IGO[/e][u]yts0ktvt9a3r[u2]_267
Any suggestions?
You can mark the second block in column mode using ALT and the left mouse button. Then just copy it and paste it at the end of the first row of the first block.
A regex is not needed here, and this is not really possible with one anyway.
I would solve this via a simple script written in Python or Ruby or something equally quick. This works, for example:
import os

# Look for file1 and file2 next to this script.
path = os.path.dirname(os.path.abspath(__file__))

with open(os.path.join(path, 'file1')) as file1, \
        open(os.path.join(path, 'file2')) as file2:
    # Pair line N of file1 with line N of file2 and glue them together.
    for a, b in zip(file1, file2):
        print(a.rstrip() + b.rstrip())
Running it gives the correct result:
> python join.py
[c][eu][e]I37ANKCB[/e][u]7ubju2jvioks[u2]_261
[c][eu][e]OIL8ZEPW[/e][u]89j408tah1lz[u2]_262
[c][eu][e]4OOEL75O[/e][u]j673xnd49tq0[u2]_263
[c][eu][e]PPNW5FN4[/e][u]dv73osmh1wzu[u2]_264
[c][eu][e]E2BXCWUO[/e][u]twz3u4yiaeqr[u2]_265
[c][eu][e]SD9UQNT8[/e][u]cuhtg6r71kud[u2]_266
[c][eu][e]E6BK6IGO[/e][u]yts0ktvt9a3r[u2]_267
Customize to suit your needs.