Task ID in MapReduce

I am new to MapReduce and Java programming. I am trying to get the task ID of each map task. Basically, I need to use each mapper's task ID as an offset for fetching some data from a common file.
Please help me get the task ID of an individual map task.
Thanks,
Vanamala

Maybe it's too late to answer, but you can get the task attempt ID, and from it the task ID, through the Context object when you set up your mapper. context.getTaskAttemptID() returns the attempt ID; calling getTaskID().getId() on the result gives the numeric task ID.
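To illustrate how the task ID sits inside the attempt ID, here is a minimal Python sketch that pulls the task number out of the string form of an attempt ID. It assumes the layout shown in the Hadoop documentation for TaskAttemptID (for example attempt_200707121733_0003_m_000005_0); the names ATTEMPT_RE, task_number and offset_for are hypothetical. Inside a Java mapper you would call the Context methods directly rather than parse a string.

```python
import re

# Assumed layout: attempt_<jobtracker id>_<job number>_<task type>_<task number>_<attempt number>
ATTEMPT_RE = re.compile(r"^attempt_([^_]+)_(\d+)_([a-z])_(\d+)_(\d+)$")


def task_number(attempt_id):
    """Return the numeric task ID embedded in a task attempt ID string."""
    match = ATTEMPT_RE.match(attempt_id)
    if match is None:
        raise ValueError("not a task attempt ID: %r" % attempt_id)
    return int(match.group(4))


def offset_for(attempt_id, records_per_task):
    """Hypothetical helper: use the task number as an offset into a shared file."""
    return task_number(attempt_id) * records_per_task
```

Because the task number is the same for every attempt of a given task, a retried mapper would compute the same offset as the attempt it replaces.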

How to query for Polkadot transaction info using only txHash?

I cannot find any description in the API documentation of how to get transaction info using just the txHash returned by signAndSend(). It seems like a basic function, so it is odd that it's not there.
As far as I can see, the only way to track the status of a transaction is through the callback of signAndSend(), which is only viable if the transaction was created on my side. However, block explorers like polkadot.subscan.io or polkascan.io can easily find a transaction using just the txHash. Any idea, even briefly, how I can implement such a function?
Please consider using a solution such as Substrate Archive to help you index transactions on a Substrate-based chain.

How can I post a Watson Machine Learning scoring request with a sparse matrix as a parameter

Because of the current limitation on publishing scikit-learn models to the Watson ML service, which does not allow custom transformers and the like in the pipeline (https://datascience.ibm.com/docs/content/analyze-data/ml-scikit-learn.html), I ended up deploying a pipeline that contains only the SVC classifier and not the TfidfVectorizer.
This means I need to "transform" my raw test data with the TfidfVectorizer before invoking the model on Watson ML.
This is working fine as long as I don't try the online deployment approach (which I need, since I want an app to POST a request to my model).
How should I serialise the sparse matrix returned by TfidfVectorizer.transform and pass it as a JSON payload to the WML service?
Thanks!
So actually, I am answering my own question ;-)
If you find yourself having to send a sparse matrix to WML, you can use
<yourmatrix>.todense().tolist()
In the context of my initial issue, I can send the result of the transform like this:
valuesList = tfidf_vectorizer.transform(test).todense().tolist()
payload_scoring = {"values": [[valuesList]]}
response_scoring = requests.post(scoringUrl, json=payload_scoring, headers=header)
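As a rough illustration of why the conversion makes the matrix JSON-friendly, here is a standard-library-only sketch of what densifying amounts to. It assumes the CSR layout (data, indices, indptr) that SciPy uses for the matrix TfidfVectorizer.transform returns; the function name csr_to_dense_rows and the numbers are made up for the example, and it says nothing about the nesting the WML endpoint expects.

```python
import json


def csr_to_dense_rows(data, indices, indptr, n_cols):
    """Expand CSR arrays into a list of dense rows (plain Python lists)."""
    rows = []
    for r in range(len(indptr) - 1):
        row = [0.0] * n_cols
        # Entries indptr[r] .. indptr[r + 1] - 1 belong to row r.
        for k in range(indptr[r], indptr[r + 1]):
            row[indices[k]] = data[k]
        rows.append(row)
    return rows


# Two documents over a four-term vocabulary (made-up values).
dense = csr_to_dense_rows(
    data=[0.5, 0.8, 1.0],
    indices=[0, 3, 2],
    indptr=[0, 2, 3],
    n_cols=4,
)

# Nested lists of floats serialise directly, unlike a sparse matrix object.
serialized = json.dumps(dense)
```

Every zero is written out explicitly, so the payload grows with the vocabulary size times the number of documents; for a large vocabulary it may be worth scoring in small batches.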

Tensorboard logging non-tensor (numpy) information (AUC)

I would like to record in tensorboard some per-run information calculated by some python-blackbox function.
Specifically, I'm envisioning using sklearn.metrics.auc after having run sess.run().
If "auc" was actually a tensor node, life would be simple. However, the setup is more like:
stuff=sess.run()
auc=auc(stuff)
If there is a more tensorflow-onic way of doing this I am interested in that. My current setup involves creating separate train&test graphs.
If there is a way to complete the task as stated above, I am interested in that as well.
You can make a custom summary with your own data using this code:
summary = tf.Summary(value=[tf.Summary.Value(tag="auc", simple_value=auc)])
Then you can add that summary to the summary writer yourself with add_summary(). (Don't forget to pass a step.)

Check if multiple user ids are friends with a particular person

I want to know which people in a list of people are friends with this user. Is there a graph api call that can return the subset of ids that is the user's friends? I've tried:
/me/friends/?ids=xxxxx,xxxx
I know I can use a batch call and do something like this:
/me/friends/xxxx
/me/friends/xxxxx
but it would be nice to do it in one call.
There wasn't an easy way to do this with the graph api, but I was able to do it with an FQL query:
query = '{' +
'"are_friends":"SELECT+uid2+FROM+friend+WHERE+uid1=me()+and+uid2+in(' + people_array.join() + ')+limit+10",' +
'"friend_meta":"SELECT+uid,first_name,last_name,name,pic_square+FROM+user+where+uid+in(SELECT+uid2+FROM+%23are_friends)"}';
The friend_meta json object in the result will have all the meta info you are looking for. It's one call, and more efficient and cleaner than the batch calls.
Did you try the mutualfriends option?
me/mutualfriends/xxxxx
https://developers.facebook.com/docs/reference/api/user/

Is there a signal or anything similar to a "pre_select" in django?

I'm creating a system in Django, and it would be really helpful to have a signal that is called every time a SQL "select" query is run against the database. In other words, does anyone know if there is something like a "pre_select" or "post_select" signal?
I found the "connection_created" signal in the Django docs, but couldn't find any clues on how to use it, let alone how to access the model that triggered it. The official documentation just says that it exists but doesn't give a simple usage example... =/
EDIT:
The connection_created signal only fires when a connection is created (as its name says), so I am still without a solution =/.
An example of what I want would be running these queries on distinct objects:
ExampleObject1.objects.filter(attribute=somevalue)
ExampleObject2.objects.filter(attribute=somevalue)
ExampleObject3.objects.filter(attribute=somevalue)
A function would then be called with the data from each of them, just before each query is sent to the database, in order to process the data, log it, etc.
I imagine some functionality like that exists in Django, because Django's logging system appears to use something similar.
Any help is welcome. Thanks in advance!
From http://dabapps.com/blog/logging-sql-queries-django-13/
It's not in the form of a signal, but it allows you to track all queries. Tracking specific selects should be doable by providing customized log handlers. Note that Django only sends queries to this logger when settings.DEBUG is True.
import logging
l = logging.getLogger('django.db.backends')
l.setLevel(logging.DEBUG)
l.addHandler(logging.StreamHandler())
# make your queries now...
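The "customized log handlers" idea could look roughly like the sketch below, which uses only the standard logging module. SelectOnlyFilter, CollectingHandler and install_select_logging are hypothetical names, and the filter assumes the logged message contains the SQL text (Django formats it roughly as "(0.001) SELECT ...; args=..."). It only observes queries after they run; it cannot change them the way a real pre_select hook would.

```python
import logging


class SelectOnlyFilter(logging.Filter):
    """Pass only log records whose message contains a SQL SELECT."""

    def filter(self, record):
        # Search the rendered message instead of assuming a fixed prefix,
        # since the duration is logged before the SQL text.
        return "SELECT" in record.getMessage().upper()


class CollectingHandler(logging.Handler):
    """Hypothetical handler that keeps matching messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def install_select_logging():
    """Attach the filtered handler to Django's SQL logger and return it."""
    logger = logging.getLogger("django.db.backends")
    logger.setLevel(logging.DEBUG)
    handler = CollectingHandler()
    handler.addFilter(SelectOnlyFilter())
    logger.addHandler(handler)
    return handler
```

After calling install_select_logging(), the returned handler's messages list fills up as select queries are executed; replacing the list append in emit() with your own processing gives a per-query callback.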