I am trying to use aws sdk2 java for s3 select operations but not able to get extract the final data. Looking for an example if someone has implemented it. I got some idea from [this post][1] but not able to figure out how to get and read the full data .
Fetching specific fields from an S3 document
Basically, equivalent of v1 sdk:
``` InputStream resultInputStream = result.getPayload().getRecordsInputStream(
new SelectObjectContentEventVisitor() {
#Override
public void visit(SelectObjectContentEvent.StatsEvent event)
{
System.out.println(
"Received Stats, Bytes Scanned: " + event.getDetails().getBytesScanned()
+ " Bytes Processed: " + event.getDetails().getBytesProcessed());
}
/*
* An End Event informs that the request has finished successfully.
*/
#Override
public void visit(SelectObjectContentEvent.EndEvent event)
{
isResultComplete.set(true);
System.out.println("Received End Event. Result is complete.");
}
}
);```
///IN AWS SDK2, how do get ResultOutputStream ?
```public byte[] getQueryResults() {
logger.info("V2 query");
S3AsyncClient s3Client = null;
s3Client = S3AsyncClient.builder()
.region(Region.US_WEST_2)
.build();
String fileObjKeyName = "upload/" + filePath;
try{
logger.info("Filepath: " + fileObjKeyName);
ListObjectsV2Request listObjects = ListObjectsV2Request
.builder()
.bucket(Constants.bucketName)
.build();
......
InputSerialization inputSerialization = InputSerialization.builder().
json(JSONInput.builder().type(JSONType.LINES).build()).build()
OutputSerialization outputSerialization = null;
outputSerialization = OutputSerialization.builder().
json(JSONOutput.builder()
.build()
).build();
SelectObjectContentRequest selectObjectContentRequest = SelectObjectContentRequest.builder()
.bucket(Constants.bucketName)
.key(partFilename)
.expression(query)
.expressionType(ExpressionType.SQL)
.inputSerialization(inputSerialization)
.outputSerialization(outputSerialization)
.scanRange(ScanRange.builder().start(0L).end(Constants.limitBytes).build())
.build();
final DataHandler handler = new DataHandler();
CompletableFuture future = s3Client.selectObjectContent(selectObjectContentRequest, handler);
//hold it till we get a end event
EndEvent endEvent = (EndEvent) handler.receivedEvents.stream()
.filter(e -> e.sdkEventType() == SelectObjectContentEventStream.EventType.END)
.findFirst()
.orElse(null);```
//Now, from here how do I get the response bytes ?
///////---> ISSUE: How do I get ResultStream bytes ????
return <bytes>
}```
// handler
private static class DataHandler implements SelectObjectContentResponseHandler {
private SelectObjectContentResponse response;
private List receivedEvents = new ArrayList<>();
private Throwable exception;
#Override
public void responseReceived(SelectObjectContentResponse response) {
this.response = response;
}
#Override
public void onEventStream(SdkPublisher<SelectObjectContentEventStream> publisher) {
publisher.subscribe(receivedEvents::add);
}
#Override
public void exceptionOccurred(Throwable throwable) {
exception = throwable;
}
#Override
public void complete() {
}
} ```
[1]: https://stackoverflow.com/questions/67315601/fetching-specific-fields-from-an-s3-document
i came to your post since I was working on the same issue as to avoid V1.
After hours of searching i ended up with finding the answer at. https://github.com/aws/aws-sdk-java-v2/pull/2943/files
The answer is located at SelectObjectContentIntegrationTest.java File
services/s3/src/it/java/software/amazon/awssdk/services/SelectObjectContentIntegrationTest.java
The way to get the bytes is by using the RecordsEvent class, please note for my use case I used CSV, not sure if this would be different for a different file type.
in the complete method you have access to the receivedEvents. this is where you get the first index to get the filtered returned results and casting it to the RecordsEvent class. then this class provides the payload as bytes
#Override
public void complete() {
RecordsEvent records = (RecordsEvent) this.receivedEvents.get(0)
String result = records.payload().asUtf8String();
}
Related
Sorry for my stupid question and thank you in advance.
I need to replace the outputvalue in reduce stage(or map stage). However, it will case too many connection in zookeeper. I don't know how to deal with it.
This is my reduce method:
public static class HbaseToHDFSReducer extends Reducer<Text,Text,Text, Text> {
protected void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
HashSet<String> address = new HashSet<>();
for(Text item :values){
String city = getDataByRowKey("A1",item.toString());
address.add(city);
}
context.write(key,new Text(String.valueOf(address).replace("\"", "")));
}
This is the query method:
public static String getDataByRowKey(String tableName, String rowKey) throws IOException {
Table table = ConnectionFactory.createConnection(conf).getTable(TableName.valueOf(tableName));
Get get = new Get(rowKey.getBytes());
String data = new String();
if (!get.isCheckExistenceOnly()) {
Result result = table.get(get);
for (Cell cell : result.rawCells()) {
String colName = Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength());
String value = Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
if (colName.equals(rowKey)) {
data = value;
}
}
}
table.close();
return data;
}
What should I do to solve it?
Thank you again
You created one connection per query, and connection creation is a heavy-weight operation. Maybe you can get a Connection in your reduce method and change
getDataByRowKey(String tableName, String rowKey) to this
getDataByRowKey(Connection connection, String tableName, String rowKey).
I have a pre-request script that I gathered from another post on StackOverflow, but I'm still getting invalid credentials.
Attempted to do this just with str_1 but it's not working. Not sure what request.data is supposed to do as it keeps returning NaN. I think that the problem might be there, but still at a loss. I've attempted converting all variables to a string, but that still returned the same error.
URL = https://gateway.marvel.com/v1/public/characters?ts={{timeStamp}}&apikey={{apiKey}}&hash={{hash}}
// Access your env variables like this
var ts = new Date();
ts = ts.getUTCMilliseconds();
var str_1 = ts + environment.apiKey + environment.privateKey;
// Or get your request parameters
var str_2 = request.data["timeStamp"] + request.data["apiKey"];
console.log('str_2 = ' + str_2);
// Use the CryptoJS
var hash = CryptoJS.MD5(str_1).toString();
// Set the new environment variable
pm.environment.set('timeStamp', ts);
pm.environment.set('hash', hash);
{
"code": "InvalidCredentials",
"message": "That hash, timestamp and key combination is invalid."
}
If someone can comment on why this is the solution, I would appreciate it. Here is what the issue was. The order of the hash actually matters. So had to flip the order of pvtkey + pubkey to pubkey + pvtkey. Why is this?
INCORRECT
var message = ts+pubkey+pvtkey;
var a = CryptoJS.MD5(message);
pm.environment.set("hash", a.toString());
CORRECT
var message = ts+pvtkey+pubkey;
var a = CryptoJS.MD5(message);
pm.environment.set("hash", a.toString());
I created in Android Studio, a new java class named MD5Hash, following the steps of https://javarevisited.blogspot.com/2013/03/generate-md5-hash-in-java-string-byte-array-example-tutorial.html
I just simplified his (her) code, only to use it with Java utility MessageDigest
public class MD5Hash {
public static void main(String args[]) {
String publickey = "abcdef"; //your api key
String privatekey = "123456"; //your private key
Calendar calendar=Calendar.getInstance();
String stringToHash = calendar
.getTimeInMillis()+ privatekey + publickey;
System.out.println("hash : " + md5Java(stringToHash));
System.out.println("ts : "+ calendar.getTimeInMillis());
}
public static String md5Java(String message){
String digest = null;
try {
MessageDigest md = MessageDigest.getInstance("MD5");
byte[] hash = md.digest(message.getBytes("UTF-8"));
//converting byte array to Hexadecimal String
StringBuilder sb = new StringBuilder(2*hash.length);
for(byte b : hash){
sb.append(String.format("%02x", b&0xff));
}
digest = sb.toString();
} catch (UnsupportedEncodingException ex) {
} catch (NoSuchAlgorithmException ex) {
}
return digest;
}
}
As you can see, if you copy paste this code, it has a green arrow on the left side of the class declaration, clicking it you can run MD5Hash.main() and you'll have printed in your Run Screen the values for the time (ts) and for the hash.
Then go to verify directly into the internet :
https://gateway.marvel.com/v1/public/characters?limit=20&ts=1574945782067&apikey=abcdef&hash=4bbb5dtf899th5132hjj66
This is the method that I am testing. This method gets some Bytes from a Hbase Database based on an specific id, in this case called dtmid. The reason I why I want to return some specific values is because I realized that there is no way to know if an id will always be in Hbase. Also, the column Family and column name could change.
#Override
public void execute(Tuple tuple, BasicOutputCollector collector) {
try {
if (tuple.size() > 0) {
Long dtmid = tuple.getLong(0);
byte[] rowKey = HBaseRowKeyDistributor.getDistributedKey(dtmid);
Get get = new Get(rowKey);
get.addFamily("a".getBytes());
Result result = table.get(get);
byte[] bidUser = result.getValue("a".getBytes(),
"co_created_5076".getBytes());
collector.emit(new Values(dtmid, bidUser));
}
} catch (IOException e) {
e.printStackTrace();
}
}
On my main class when this method is called I want to return a specific value. The method should return some bytes.
byte[] bidUser = result.getValue("a".getBytes(),
"co_created_5076".getBytes());
This is what I have on my Unit Test.
#Test
public void testExecute() throws IOException {
long dtmId = 350000000770902930L;
final byte[] COL_FAMILY = "a".getBytes();
final byte[] COL_QUALIFIER = "co_created_5076".getBytes();
//setting a key value pair to put in result
List<KeyValue> kvs = new ArrayList<KeyValue>();
kvs.add(new KeyValue("--350000000770902930".getBytes(), COL_FAMILY, COL_QUALIFIER, Bytes.toBytes("ExpedtedBytes")));
// I create an Instance of result
Result result = new Result(kvs);
// A mock tuple with a single dtmid
Tuple tuple = mock(Tuple.class);
bolt.table = mock(HTable.class);
Result mcResult = mock(Result.class);
when(tuple.size()).thenReturn(1);
when(tuple.getLong(0)).thenReturn(dtmId);
when(bolt.table.get(any(Get.class))).thenReturn(result);
when(mcResult.getValue(any(byte[].class), any(byte[].class))).thenReturn(Bytes.toBytes("Bytes"));
BasicOutputCollector collector = mock(BasicOutputCollector.class);
// Execute the bolt.
bolt.execute(tuple, collector);
ArgumentCaptor<Values> valuesArg = ArgumentCaptor
.forClass(Values.class);
verify(collector).emit(valuesArg.capture());
Values d = valuesArg.getValue();
//casting this object in to a byteArray.
byte[] i = (byte[]) d.get(1);
assertEquals(dtmId, d.get(0));
}
I am using this down here to return my bytes.For some reason is not working.
when(mcResult.getValue(any(byte[].class), any(byte[].class))).thenReturn(Bytes
.toBytes("myBytes"));
For some reason when I capture the values, I still get the bytes that I specified here:
List<KeyValue> kvs = new ArrayList<KeyValue>();
kvs.add(new KeyValue("--350000000770902930".getBytes(),COL_FAMILY, COL_QUALIFIER, Bytes
.toBytes("ExpedtedBytes")));
Result result = new Result(kvs);
How about replacing
when(bolt.table.get(any(Get.class))).thenReturn(result);
with...
when(bolt.table.get(any(Get.class))).thenReturn(mcResult);
Searched quite a bit. The problem with these errors is that while the text might appear the same, the problem is always different.
My service takes ONE string value and returns a string response. Here is my code:
private class UploadStats extends AsyncTask<String, Void, String>
{
private static final String WSDL_TARGET_NAMESPACE = "http://tempuri.org/";
private static final String SOAP_ADDRESS = "http://192.168.1.101/rss/RSS_Service.asmx?WSDL";
private static final String INSERTURLACTION = "http://192.168.1.101/rss/RSS_Service.asmx/InsertURL";
private static final String INSERTURLMETHOD = "InsertURL";
#Override
protected String doInBackground(String... url)
{
String status = "";
SoapObject request = new SoapObject(WSDL_TARGET_NAMESPACE,
INSERTURLMETHOD);
request.addProperty("url", "www.yahoo.com");
SoapSerializationEnvelope envelope = new SoapSerializationEnvelope(SoapEnvelope.VER11);
envelope.dotNet = true;
envelope.setOutputSoapObject(request);
HttpTransportSE httpTransport = new HttpTransportSE(SOAP_ADDRESS);
try
{
httpTransport.call(INSERTURLACTION, envelope);
SoapObject response = (SoapObject) envelope.bodyIn;
status = response.getProperty(0).toString();
}
catch (Exception exception)
{
Log.d("Error", exception.toString());
}
if(status.equals("1"))
{
Log.d("InsertURL", "New URL Inserterd");
}
else if(status.equals("0"))
{
Log.d("InsertURL", "URL Exists. Count incremented");
}
else
{
Log.d("InsertURL", "Err... No");
}
return status;
}
}
I get the error:
java.lang.classcastexception org.ksoap2.SoapFault
What am I doing wrong? If any more details are needed, I can add them.
The error was related to the webservice.
An incorrect namespace on the service side can cause this error (as can a lot of other problems).
Best way to check is to run the webservice on the local machine (where the service is hosted).
We've created a custom webservice in Umbraco to add (async) files and upload them. After upload the service is called with node and file-information to add a new node to the content tree.
At first our main problem was that the service was running outside of the Umbraco context, giving strange errors with get_currentuser.
Now, we inherit the umbraco BaseWebService from the umbraco.webservices dll and we've set all acces information in the settings file; we authenticatie before doing anything else using (correct and ugly-hardcoded) administrator.
When we now execute the webservice (from the browser or anything else) we get:
at umbraco.DataLayer.SqlHelper`1.ExecuteReader(String commandText, IParameter[] parameters)
at umbraco.cms.businesslogic.CMSNode.setupNode()
at umbraco.cms.businesslogic.web.Document.setupNode()
at umbraco.cms.businesslogic.CMSNode..ctor(Int32 Id)
at umbraco.cms.businesslogic.Content..ctor(Int32 id)
at umbraco.cms.businesslogic.web.Document..ctor(Int32 id)
at FileUpload.AddDocument(String ProjectID, String NodeID, String FileName)*
Where AddDocument is our method. The node (filename w/o extension) does not exist in the tree (not anywhere, it's a new filename/node). We've cleared the recycle bin, so it's not in there either.
Are we missing something vital, does anyone has a solution?
Below is the source for the webservice;
using umbraco.cms.businesslogic.web;
using umbraco.BusinessLogic;
using umbraco.presentation.nodeFactory;
using umbraco.cms.businesslogic.member;
using umbraco.cms;
/// <summary>
/// Summary description for FileUpload
/// </summary>
[WebService(Namespace = "http://umbraco.org/webservices/")]
[WebServiceBinding(ConformsTo = WsiProfiles.BasicProfile1_1)]
public class FileUpload : umbraco.webservices.BaseWebService //System.Web.Services.WebService
{
private string GetMimeType(string fileName)
{
string mimeType = "application/unknown";
string ext = System.IO.Path.GetExtension(fileName).ToLower();
Microsoft.Win32.RegistryKey regKey = Microsoft.Win32.Registry.ClassesRoot.OpenSubKey(ext);
if (regKey != null && regKey.GetValue("Content Type") != null)
mimeType = regKey.GetValue("Content Type").ToString();
return mimeType;
}
[WebMethod]
public string HelloWorld() {
return "Hello World";
}
[WebMethod]
public void AddDocument(string ProjectID, string NodeID, string FileName)
{
Authenticate("***", "***");
string MimeType = GetMimeType(FileName); //"application/unknown";
// Create node
int nodeId = 1197;
string fileName = System.IO.Path.GetFileNameWithoutExtension(#"*****\Upload\" + FileName);
string secGroups = "";
//EDIT DUE TO COMMENT: Behavior remains the same though
Document node = umbraco.cms.businesslogic.web.Document.MakeNew(fileName.Replace(".", ""), new DocumentType(1049), umbraco.BusinessLogic.User.GetUser(0), nodeId);
secGroups = "Intern";
StreamWriter sw = null;
try
{
//EXCEPTION IS THROWN SOMEWHERE HERE
Document doc = NodeLevel.CreateNode(fileName, "Bestand", nodeId);
doc.getProperty("bestandsNaam").Value = fileName;
byte[] buffer = System.IO.File.ReadAllBytes(#"****\Upload\" + FileName);
int projectId = 0;
int tempid = nodeId;
//EXCEPTION IS THROWN TO THIS POINT (SEE BELOW)
try
{
Access.ProtectPage(false, doc.Id, 1103, 1103);
Access.AddMembershipRoleToDocument(doc.Id, secGroups);
}
catch (Exception ex)
{
// write to file
}
try
{
doc.Publish(umbraco.BusinessLogic.User.GetUser(0));
umbraco.library.UpdateDocumentCache(doc.Id);
umbraco.content.Instance.RefreshContentFromDatabaseAsync();
}
catch (Exception ex)
{
// write to file
}
System.IO.File.Delete(FileName);
}
catch (Exception ex)
{
// THIS EXCEPTION IS CAUGHT!!
}
}
public override umbraco.webservices.BaseWebService.Services Service
{
get { return umbraco.webservices.BaseWebService.Services.DocumentService; }
}
}
If anyone has a solution, pointer, hint or whatever; help is appreciated!!
TIA,
riffnl
We've rewritten the whole procedure (dumped all code and restart) and we've got it working now.
I think we've been messing around with the old code so much in trying to get it to work we were missing some key issues, because it functions.
Thanks for thinking along anyway!