Update and Delete Map/Reduce in HBase

I have a table that contains about half a billion records. I want to change the key of these records, i.e. fetch a record, change its key somehow, delete what was fetched, and save the new record. Say, for example, my key is [time-accountId] and I want to change it to [accountId-time]:
I want to fetch the entity, create a new one with the different key, delete the entity keyed by [time-accountId], and save the new entity keyed by [accountId-time].
What is the best way to accomplish this task?
I am thinking of M/R, but how can I delete entities with M/R?

You need a MapReduce job that produces a Put and a Delete for each row of your table. Only a mapper is needed here since you don't need any aggregation on your data, so skip the reducer:
TableMapReduceUtil.initTableReducerJob(
    table, // output table
    null,  // reducer class
    job);
Your mapper has to generate both Puts and Deletes, so the output value class to use is Mutation (https://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/client/Mutation.html):
TableMapReduceUtil.initTableMapperJob(
    table,                        // input table
    scan,                         // Scan instance to control CF and attribute selection
    MyMapper.class,               // mapper class
    ImmutableBytesWritable.class, // mapper output key
    Mutation.class,               // mapper output value
    job);
Then your mapper will look like this:
Delete delete = ...
context.write(oldKey, delete);
Put put = ...
context.write(newKey, put);
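For completeness, here is a minimal sketch of such a mapper. It assumes a reasonably recent HBase client API (1.x+), and the key-swapping logic (splitting on "-") is purely illustrative, not something given in the question:
import java.io.IOException;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;

public class MyMapper extends TableMapper<ImmutableBytesWritable, Mutation> {
    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context context)
            throws IOException, InterruptedException {
        // Illustrative key transform: "time-accountId" -> "accountId-time"
        String[] parts = Bytes.toString(row.copyBytes()).split("-", 2);
        byte[] newKey = Bytes.toBytes(parts[1] + "-" + parts[0]);

        // Copy every cell of the old row into a Put under the new key.
        Put put = new Put(newKey);
        for (Cell cell : result.rawCells()) {
            put.addColumn(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell),
                    cell.getTimestamp(), CellUtil.cloneValue(cell));
        }
        context.write(new ImmutableBytesWritable(newKey), put);

        // Delete the row under the old key.
        context.write(row, new Delete(row.copyBytes()));
    }
}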

Related

Spanner setAllowPartialRead(true) usage and purpose

From the official code snippet example of the Spanner Java client:
https://github.com/GoogleCloudPlatform/java-docs-samples/blob/HEAD/spanner/spring-data/src/main/java/com/example/spanner/SpannerTemplateSample.java
I can see the usage of
new SpannerQueryOptions().setAllowPartialRead(true):
@Component
public class SpannerTemplateSample {

    @Autowired
    SpannerTemplate spannerTemplate;

    public void runTemplateExample(Singer singer) {
        // Delete all of the rows in the Singer table.
        this.spannerTemplate.delete(Singer.class, KeySet.all());
        // Insert a singer into the Singers table.
        this.spannerTemplate.insert(singer);
        // Read all of the singers in the Singers table.
        List<Singer> allSingers = this.spannerTemplate
                .query(Singer.class, Statement.of("SELECT * FROM Singers"),
                        new SpannerQueryOptions().setAllowPartialRead(true));
    }
}
I didn't find any explanation of it. Can anyone help?
Quoting from the documentation:
Partial read is only possible when using queries. In case the rows returned by a query have fewer columns than the entity that they will be mapped to, Spring Data will map the returned columns and leave the rest of the columns as they are.
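In practice this means you can safely select only a subset of the entity's mapped columns. A minimal sketch, reusing the SpannerTemplate from the question (the column names are assumptions about the Singer entity):
// Select only a subset of Singer's columns; with setAllowPartialRead(true)
// the unselected fields of each mapped Singer are simply left unset.
List<Singer> names = this.spannerTemplate.query(
        Singer.class,
        Statement.of("SELECT SingerId, FirstName FROM Singers"),
        new SpannerQueryOptions().setAllowPartialRead(true));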

Delete the record based on ID in one table if all the records with the same ID are deleted in another table

Please help me with the logic.
I have two tables, customers and transactions, and there is an action column with values I, U, and D. If the action column is I or U, upsert the data; if it is D, delete the data in the transactions table. If all records with the same transaction ID are deleted, then delete the customers record; otherwise delete only the transactions record.
We can do insert, upsert, and delete using an Update Strategy on the transaction table, but how can we delete the customer record once all of the same transaction IDs are deleted?
You need to create logic (like you said) to delete from the customer table, and it is safer to either create a new pipeline in the same mapping or a brand-new mapping.
So, you will read customer_key from customer and do a lookup into the transaction table (condition on customer_key); if no row is found, delete that customer:
1. Read all customer_key values from the customer table.
2. Look up the transaction table on customer_key; return customer_key.
3. Use an Update Strategy: link customer_key from SQ #1 and customer_key from the lookup, and create a condition like this: IIF(ISNULL(lkp_customer_key), DD_DELETE)
4. Link customer_key from SQ #1 to the customer target.
You can do this using a left join in the Source Qualifier as well.
Also, most database servers can cascade the delete to the respective tables themselves (e.g. a foreign key declared with ON DELETE CASCADE).

What does a DynamoDB scan return if the item with the ExclusiveStartKey does not exist in the table?

I'm trying to implement pagination for my API. I have a DynamoDB table with a simple primary key.
Since the ExclusiveStartKey in a DynamoDB scan() operation is nothing but the primary key of the last item fetched in the previous scan operation, I was wondering: what would DynamoDB return if I performed a scan() with an ExclusiveStartKey that does not exist in the table?
# Here response contains the same list of items for the same
# primary key passed to the scan operation
response = table.scan(ExclusiveStartKey=NonExistentPrimaryKey)
I expected DynamoDB to return no items (correct me if this assumption of mine is what's wrong), i.e. the scan should resume from the ExclusiveStartKey if it exists in the table, and return no items otherwise.
But what I see happening is that the scan() still returns items. When I give the same non-existent primary key, it keeps returning a list starting from the same item.
Does DynamoDB simply apply the hash function to the ExclusiveStartKey and, from the result of this hash, decide from which partition it has to start returning items, or something like that?
# My theory as to what DynamoDB does in a paginated scan operation
partitionId = dynamodbHashFunction(NonExistentPrimaryKey)
return fetchItemsFromPartition(partitionId)
My end goal is that when an invalid ExclusiveStartKey is provided by the user (i.e. a non-existent primary key), I want to return nothing or, even better, a message that the ExclusiveStartKey is invalid.
Looks like you want to return items based on a value, and if that value does not exist, you want an empty result set. This is possible with the Java SDK v2 DynamoDbTable object's scan method:
https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/enhanced/dynamodb/DynamoDbTable.html
For this solution, one way is to scan a DynamoDB table and return a result set based on the value of a specific column (including the key). You can use an Expression object; this lets you specify the value that you want to match in the result set.
For example, here is Java logic that returns all items where a date column is 2013-11-15. If no items meet this condition, then no items are returned, and there is no need for a pre-check. You just need to set up the ScanEnhancedRequest properly.
public static void scanIndex(DynamoDbClient ddb, String tableName, String indexName) {
    System.out.println("\n***********************************************************\n");
    System.out.print("Select items for " + tableName + " where createDate is 2013-11-15!");
    try {
        // Create a DynamoDbEnhancedClient that uses the DynamoDbClient object.
        DynamoDbEnhancedClient enhancedClient = DynamoDbEnhancedClient.builder()
                .dynamoDbClient(ddb)
                .build();

        // Create a DynamoDbTable object based on Issues.
        DynamoDbTable<Issues> table = enhancedClient.table("Issues", TableSchema.fromBean(Issues.class));

        // Set up the scan based on the index.
        // (Compare String contents with equals(), not ==.)
        if ("CreateDateIndex".equals(indexName)) {
            System.out.println("Issues filed on 2013-11-15");
            AttributeValue attVal = AttributeValue.builder()
                    .s("2013-11-15")
                    .build();

            // Get only items in the Issues table for 2013-11-15.
            Map<String, AttributeValue> myMap = new HashMap<>();
            myMap.put(":val1", attVal);

            Map<String, String> myExMap = new HashMap<>();
            myExMap.put("#createDate", "createDate");

            Expression expression = Expression.builder()
                    .expressionValues(myMap)
                    .expressionNames(myExMap)
                    .expression("#createDate = :val1")
                    .build();

            ScanEnhancedRequest enhancedRequest = ScanEnhancedRequest.builder()
                    .filterExpression(expression)
                    .limit(15)
                    .build();

            // Get items in the Issues table.
            Iterator<Issues> results = table.scan(enhancedRequest).items().iterator();
            while (results.hasNext()) {
                Issues issue = results.next();
                System.out.println("The record description is " + issue.getDescription());
                System.out.println("The record title is " + issue.getTitle());
            }
        }
    } catch (DynamoDbException e) {
        System.err.println(e.getMessage());
        System.exit(1);
    }
}
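A hypothetical invocation, assuming the usual SDK v2 client setup (the region is an arbitrary choice for the example):
// Build a low-level client and hand it to the scan method above.
DynamoDbClient ddb = DynamoDbClient.builder()
        .region(Region.US_EAST_1)
        .build();
scanIndex(ddb, "Issues", "CreateDateIndex");
ddb.close();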

Transform a Sales Order into an Invoice with SuiteTalk

In NetSuite's SuiteTalk how do I transform a record from a Sales Order to an Invoice? It looks like there is a function in SuiteScript, but I can't find anything similar in SuiteTalk.
SuiteScript:
nlapiTransformRecord(type, id, transformType, transformValues)
Initializes a new record using data from an existing record of a
different type and returns an nlobjRecord. This function can be useful
for automated order processing such as creating item fulfillment
transactions and invoices off of orders.
SuiteTalk has an analogous initialize method. With the Java library you'd use it like:
ReadResponse initCS = nsClient.getPort().initialize(
        new InitializeRecord(InitializeType.cashSale,
                new InitializeRef(null, InitializeRefType.salesOrder, soId, null),
                null));
CashSale cs = (CashSale) initCS.getRecord();
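Since the question asks specifically about invoices, the same pattern should apply with the invoice initialize type; a sketch under that assumption, reusing the nsClient and soId from above:
// Initialize an Invoice from an existing sales order (soId).
ReadResponse initInv = nsClient.getPort().initialize(
        new InitializeRecord(InitializeType.invoice,
                new InitializeRef(null, InitializeRefType.salesOrder, soId, null),
                null));
Invoice inv = (Invoice) initInv.getRecord();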

SyncFramework: How to sync all columns from a table?

I created a program to sync tables between two databases.
I use this common code:
DbSyncScopeDescription myScope = new DbSyncScopeDescription("myscope");
DbSyncTableDescription tblDesc = SqlSyncDescriptionBuilder.GetDescriptionForTable("Table", onPremiseConn);
myScope.Tables.Add(tblDesc);
My program creates the tracking table with only the primary key (the id column).
The sync works for deleting and inserting rows, but updates don't come through. I need all the columns to sync, and they are not updated (for example, a telephone column).
I read that I need to add the columns I want to sync MANUALLY with this code:
Collection<string> includeColumns = new Collection<string>();
includeColumns.Add("telephone");
// ...
includeColumns.Add("lastColumn");
And changing the table description in this way:
DbSyncTableDescription tblDesc = SqlSyncDescriptionBuilder.GetDescriptionForTable("Table", includeColumns, onPremiseConn);
Is there a way to add all the columns of the table automatically?
Something like:
Collection<string> includeColumns = GetAllColumns("Table");
Thanks,
SqlSyncDescriptionBuilder.GetDescriptionForTable("Table", onPremiseConn) will already include all the columns of the table.
The tracking table only stores the PK, any filter columns, and some Sync Framework-specific columns.
The tracking is at row level, not column level.
During sync, the tracking table and its base table are joined to get the rows to be synced.