Riak simple SearchMapReduce throws IOException

I am trying to fetch Riak objects using simple filters.
I have enabled search on the bucket before storing objects to it, and I try the following:
MapReduceResult result = riakClient
    .mapReduce("serviceProvider", "name:oved1")
    .addMapPhase(new NamedJSFunction("Riak.mapValuesJson"), true)
    .execute();
I get this exception:
com.basho.riak.client.RiakException: java.io.IOException: {"error":"map_reduce_error"}
at com.basho.riak.client.query.MapReduce.execute(MapReduce.java:80)
at com.att.cso.omss.datastore.riak.controllers.RiakBaseController.getAllServiceProvider(RiakBaseController.java:339)
at com.att.cso.omss.datastore.riak.App.serviceProviderTests(App.java:64)
at com.att.cso.omss.datastore.riak.App.main(App.java:38)
Caused by: java.io.IOException: {"error":"map_reduce_error"}
at com.basho.riak.client.raw.http.ConversionUtil.convert(ConversionUtil.java:588)
at com.basho.riak.client.raw.http.HTTPClientAdapter.mapReduce(HTTPClientAdapter.java:386)
at com.basho.riak.client.query.MapReduce.execute(MapReduce.java:78)
... 3 more
Any idea what I am missing?

I was able to fix this issue.
Apparently you need to do two things before storing objects that need to be searchable later:
Enable search in app.config (/etc/riak):
{riak_search, [{enabled, true}]}
Enable search on the bucket:
Bucket bucket = riakClient.createBucket(bucketName).enableForSearch().execute();
After doing that, this returns values:
MapReduceResult result = riakClient
    .mapReduce(bucketName, "name:9")
    .addMapPhase(new NamedJSFunction("Riak.mapValuesJson"), true)
    .execute();
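For completeness, here is a minimal sketch of reading the result back out, assuming the legacy com.basho.riak.client HTTP API used above, where MapReduceResult exposes getResultRaw(); the wrapper class and method name are illustrative only:

import com.basho.riak.client.IRiakClient;
import com.basho.riak.client.RiakException;
import com.basho.riak.client.query.MapReduceResult;
import com.basho.riak.client.query.functions.NamedJSFunction;

public class SearchMapReduceExample {
    // Runs the same search-backed MapReduce as above and returns the raw JSON
    // produced by the map phase. Assumes search was enabled on the bucket
    // before the objects were stored.
    public static String fetchByName(IRiakClient riakClient, String bucketName) throws RiakException {
        MapReduceResult result = riakClient
            .mapReduce(bucketName, "name:9")
            .addMapPhase(new NamedJSFunction("Riak.mapValuesJson"), true)
            .execute();
        return result.getResultRaw();
    }
}

From there the JSON string can be parsed with whatever mapper the application already uses.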

Related

Creation of azure_native frontdoor fails with "Frontdoor location must be global."

Migrating my Frontdoor from the classic azure package to the azure-native package, I am facing a strange error message that I cannot make sense of:
azure-native:network:FrontDoor (frontDoor):
error: Code="BadRequest" Message="Frontdoor location must be global."
I took the example at https://www.pulumi.com/registry/packages/azure-native/api-docs/network/frontdoor/ almost one-to-one; I only changed subId and rg.
For the record, I am migrating to the azure-native package because 1) it is advised and 2) I want to add a WAF policy, which I was not able to do with the azure.network package.
Does that ring a bell?
Actually, the location must be set explicitly to global. Something like:
location: "global",
I did not know of this location value, and it is not one of the values in the location enumeration.

IllegalArgumentException, Wrong FS when specifying input/output from s3 instead of hdfs

I have been running my Spark job on a local cluster which has HDFS, from which the input is read and to which the output is written. Now I have set up AWS EMR and an S3 bucket where I have my input, and I want my output to be written to S3 too.
The error:
User class threw exception: java.lang.IllegalArgumentException: Wrong
FS: s3://something/input, expected:
hdfs://ip-some-numbers.eu-west-1.compute.internal:8020
I tried searching for the same issue and there are several questions about it. Some suggested that it only applies to the output, but even when I disable the output I get the same error.
Another suggestion is that there is something wrong with the FileSystem in my code. Here are all of the occurrences of input/output in my program:
The first occurrence is in my custom FileInputFormat, in getSplits(JobContext job), which I have not actually modified myself but could:
FileSystem fs = path.getFileSystem(job.getConfiguration());
A similar case is in my custom RecordReader, which I also have not modified:
final FileSystem fs = file.getFileSystem(job);
In nextKeyValue() of my custom RecordReader which I have written myself I use:
FileSystem fs = FileSystem.get(jc);
And finally when I want to detect the number of files in a folder I use:
val fs = FileSystem.get(sc.hadoopConfiguration)
val status = fs.listStatus(new Path(path))
I assume the issue is with my code, but how can I modify the FileSystem calls to support input/output from S3?
This is what I did to solve this when launching a Spark job on EMR:
val hdfs = FileSystem.get(new java.net.URI(s"s3a://${s3_bucket}"), sparkSession.sparkContext.hadoopConfiguration)
Make sure to replace s3_bucket with the name of your bucket.
I hope this is helpful for someone.
The Hadoop FileSystem APIs do not provide support for S3 out of the box. There are two implementations of the Hadoop FileSystem APIs for S3: S3A and S3N. S3A seems to be the preferred implementation. To use it you have to do a few things:
Add the aws-java-sdk-bundle.jar to your classpath.
When you create the FileSystem include values for the following properties in the FileSystem's configuration:
fs.s3a.access.key
fs.s3a.secret.key
When specifying paths on S3, don't use s3://; use s3a:// instead.
Note: create a simple user and try things out with basic authentication first. It is possible to get it to work with AWS's more advanced temporary credential mechanisms, but it's a bit involved and I had to make some changes to the FileSystem code in order to get it to work when I tried.
Source of info is here
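As a rough sketch of those steps (the bucket name and keys are placeholders, not taken from the question), setting the two S3A properties on a Hadoop Configuration and resolving the FileSystem from an s3a:// path looks like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3AListing {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Basic authentication with an access key / secret key pair
        conf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY");
        conf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY");

        // Note the s3a:// scheme, not s3://
        Path input = new Path("s3a://my-bucket/input");

        // Resolve the FileSystem from the path itself so the S3A implementation
        // is used instead of the cluster's default (hdfs) filesystem.
        FileSystem fs = input.getFileSystem(conf);
        for (FileStatus status : fs.listStatus(input)) {
            System.out.println(status.getPath());
        }
    }
}

The aws-java-sdk-bundle.jar mentioned above (together with the hadoop-aws jar that provides the S3A filesystem) still needs to be on the classpath for the s3a scheme to resolve.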
EMR is configured to avoid the use of keys in the code or in your job configuration.
The problem here is how the FileSystem is created in your example.
The default FileSystem that Hadoop creates is the one for the hdfs scheme.
So the following code will not work if the path scheme is s3://.
val fs = FileSystem.get(sc.hadoopConfiguration)
val status = fs.listStatus(new Path(path))
To create the right FileSystem, you need to use a path with the scheme that you will actually use. For example, something like this:
val conf = sc.hadoopConfiguration
val pObj = new Path(path)
val status = pObj.getFileSystem(conf).listStatus(pObj)
From the Hadoop code:
Implementation in the FileSystem.get
public static FileSystem get(Configuration conf) throws IOException {
return get(getDefaultUri(conf), conf);
}
Implementation using Path.getFileSystem:
public FileSystem getFileSystem(Configuration conf) throws IOException {
return FileSystem.get(this.toUri(), conf);
}
Try setting the default URI for the FileSystem:
FileSystem.setDefaultUri(spark.sparkContext.hadoopConfiguration, new URI(s"s3a://$s3bucket"))
After specifying the key and secret using
fs.s3a.access.key
fs.s3a.secret.key
and getting the file system as noted:
val hdfs = FileSystem.get(new java.net.URI(s"s3a://${s3_bucket}"), sparkSession.sparkContext.hadoopConfiguration)
I would still get the error
java.lang.IllegalArgumentException: Wrong FS: s3a:// ... , expected: file:///
To check the default filesystem, you can look at the hdfs FileSystem created above:
hdfs.getUri, which for me still returned file:///
To get this to work correctly, set the default URI of the filesystem before calling FileSystem.get:
val s3URI = s"s3a://$s3bucket"
FileSystem.setDefaultUri(spark.sparkContext.hadoopConfiguration, new URI(s3URI))
val hdfs: FileSystem = FileSystem.get(spark.sparkContext.hadoopConfiguration)

Timeout error when listing S3 buckets using erlcloud

I'm trying to use the erlcloud library for S3 uploads in my app. As a test, I'm trying to get it to list buckets via an iex console:
iex(4)> s3 = :erlcloud_s3.new("KEY_ID", "SECRET_KEY")
...
iex(5)> :erlcloud_s3.list_buckets(s3)
** (ErlangError) erlang error: {:aws_error, {:socket_error, :timeout}}
(erlcloud) src/erlcloud_s3.erl:909: :erlcloud_s3.s3_request/8
(erlcloud) src/erlcloud_s3.erl:893: :erlcloud_s3.s3_xml_request/8
(erlcloud) src/erlcloud_s3.erl:238: :erlcloud_s3.list_buckets/1
I've checked that inets, ssl, and erlcloud are all started, and I know the credentials work fine, because I've tested them in a similar fashion with a Ruby library in irb.
I've tried configuring it with a longer timeout, but no matter how high I set it I still get this error.
Any ideas? Or approaches I could take to debug this?
I could reproduce the same error, and could resolve it by replacing the double quotes with single quotes.
iex(4)> s3 = :erlcloud_s3.new('KEY_ID', 'SECRET_KEY')
iex(5)> :erlcloud_s3.list_buckets(s3)
Assuming double quotes were used, the error may be caused by a type mismatch: a double-quoted Elixir string is a binary, whereas the single-quoted form is the char list the Erlang function expects.

Failing to fetch CategorizedFacebookType

I have an application which I developed about a year ago, and I'm fetching Facebook accounts like this:
facebookClient = new DefaultFacebookClient(access_token);
Connection<CategorizedFacebookType> con = facebookClient.fetchConnection("me/accounts", CategorizedFacebookType.class);
fbAccounts = con.getData();
It worked fine until about a month ago, but now it returns the
fbAccounts list empty. Why is that?
I was hoping moving from restfb-1.6.2.jar to restfb-1.6.9.jar would help, but no luck; it comes up empty with both.
What am I missing?
EDIT: to provide the code for another error I have with this API. The following code used to work:
String id = page.getFbPageID(); // (a valid facebook page id)
FBInsightsDaily daily = new FBInsightsDaily(); // an object holding some insights values
try {
Parameter param = Parameter.with("asdf", "asdf"); // seems like the param is required
JsonObject allValues = facebookClient.executeMultiquery(createQueries(date, id), JsonObject.class, param);
daily.setPageActiveUsersDaily((Integer)(((JsonArray)allValues.opt("page_active_users_daily")).getJsonObject(0)).opt("value"));
...
This throws the following exception:
com.restfb.json.JsonException: JsonArray[0] not found.
at com.restfb.json.JsonArray.get(JsonArray.java:252)
at com.restfb.json.JsonArray.getJsonObject(JsonArray.java:341)
Again, this used to work fine but now throws this.
You need the manage_pages permission from the user to access their list of administered pages. A year ago I'm not sure you did; check that you're obtaining that permission from your users.
{edit}
Some of the Insights metrics were also deprecated; the specific values you're checking may no longer exist. https://developers.facebook.com/docs/reference/fql/insights/ should have the details of what is available now.
Try checking your queries manually in the Graph API Explorer to eliminate any issues in your code and hopefully get more detailed error messages that your SDK may be swallowing.
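If a metric has been removed, the corresponding multiquery result comes back empty, and indexing element 0 is exactly what raises "JsonArray[0] not found". A defensive sketch, assuming RestFB's bundled JSON types behave like org.json (opt(), length()) as the stack trace suggests; the helper class and method names are made up:

import com.restfb.json.JsonArray;
import com.restfb.json.JsonObject;

public class InsightsHelper {
    // Returns the "value" of the first row for the given metric, or null when
    // the metric is missing or empty (for example because it was deprecated).
    public static Object firstValue(JsonObject allValues, String metric) {
        Object rows = allValues.opt(metric);
        if (!(rows instanceof JsonArray)) {
            return null; // metric key not present in the multiquery result
        }
        JsonArray array = (JsonArray) rows;
        if (array.length() == 0) {
            return null; // empty result, avoids "JsonArray[0] not found"
        }
        return array.getJsonObject(0).opt("value");
    }
}

That lets the caller null-check instead of catching JsonException for every deprecated metric.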

Firing up a cluster using whirr

I'm new to whirr and AWS so apologies in advance if I'm asking something silly.
I'm following the directions here to set up whirr and
bin/whirr launch-cluster --config hadoop.properties
fails with the following:
[~/src/cloudera/whirr-0.1.0+23]$ bin/whirr version rvm:ruby-1.8.7-p299
Apache Whirr 0.1.0+23
[~/src/cloudera/whirr-0.1.0+23]$ bin/whirr launch-cluster --config hadoop.properties rvm:ruby-1.8.7-p299
Launching myhadoopcluster cluster
Exception in thread "main" com.google.inject.CreationException: Guice creation errors:
1) No implementation for java.lang.String annotated with @com.google.inject.name.Named(value=jclouds.credential) was bound.
while locating java.lang.String annotated with @com.google.inject.name.Named(value=jclouds.credential)
for parameter 2 at org.jclouds.aws.filters.FormSigner.<init>(FormSigner.java:91)
at org.jclouds.aws.config.AWSFormSigningRestClientModule.provideRequestSigner(AWSFormSigningRestClientModule.java:66)
1 error
at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:410)
at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:166)
at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:118)
at com.google.inject.InjectorBuilder.build(InjectorBuilder.java:100)
at com.google.inject.Guice.createInjector(Guice.java:95)
at com.google.inject.Guice.createInjector(Guice.java:72)
at org.jclouds.rest.RestContextBuilder.buildInjector(RestContextBuilder.java:141)
at org.jclouds.compute.ComputeServiceContextBuilder.buildInjector(ComputeServiceContextBuilder.java:53)
at org.jclouds.aws.ec2.EC2ContextBuilder.buildInjector(EC2ContextBuilder.java:101)
at org.jclouds.compute.ComputeServiceContextBuilder.buildComputeServiceContext(ComputeServiceContextBuilder.java:66)
at org.jclouds.compute.ComputeServiceContextFactory.buildContextUnwrappingExceptions(ComputeServiceContextFactory.java:72)
at org.jclouds.compute.ComputeServiceContextFactory.createContext(ComputeServiceContextFactory.java:114)
at org.apache.whirr.service.ComputeServiceContextBuilder.build(ComputeServiceContextBuilder.java:41)
at org.apache.whirr.service.hadoop.HadoopService.launchCluster(HadoopService.java:84)
at org.apache.whirr.service.hadoop.HadoopService.launchCluster(HadoopService.java:61)
at org.apache.whirr.cli.command.LaunchClusterCommand.run(LaunchClusterCommand.java:61)
at org.apache.whirr.cli.Main.run(Main.java:65)
at org.apache.whirr.cli.Main.main(Main.java:91)
My hadoop.properties file has an AWS Access Key and Secret Access Key.
Any pointers on what I might have done wrong and what I need to do to fix this?
Thanks!
Okay so this appears to be a problem with the syntax in my hadoop.properties file. In the process of copying my keys across from the AWS management console, "Whirr.credential" got truncated to "Whirr.cred."
A classic face palm moment!
Anyway, leaving this up so that anyone googling for this error message knows to go triple check their hadoop.properties file!
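For reference, a minimal hadoop.properties in the shape of the Whirr quickstart looks roughly like the following; the exact set of properties varies by Whirr version, but the point is that the credential keys must be spelled out in full (values are placeholders):

# Hedged sketch of a Whirr hadoop.properties; adjust to your Whirr version.
whirr.service-name=hadoop
whirr.cluster-name=myhadoopcluster
whirr.instance-templates=1 jt+nn,1 dn+tt
whirr.provider=ec2
whirr.identity=YOUR_AWS_ACCESS_KEY_ID
whirr.credential=YOUR_AWS_SECRET_ACCESS_KEY

If whirr.credential is missing or truncated, jclouds never receives jclouds.credential, which is exactly the Guice binding error shown above.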