Can we append data to a Vector in FlatBuffer? - c++

I am working on this project where we store the log details for each run.
We are planning to use FlatBuffers for this.
This is my FlatBuffers schema:
table logData
{
    Id:int;
    attemptId:int;
    line:string;
}
table log
{
    maxLimit:int;
    counter:int;
    job:[logData]; // vector of tables
}
For the first run we just add data to the table using the helper functions provided by the auto-generated files:
flatbuffers::FlatBufferBuilder builder;
auto data = builder.CreateVector(some_vector); // nested objects such as vectors are created before the table builder starts
logBuilder log_builder(builder);
log_builder.add_maxLimit(10);
log_builder.add_job(data);
For the second run we have new data, so is there any way to append more data to the vector job while keeping the old data intact?

Solid: correct way to store a JSON object

I want to store a standard JSON object in the user’s Solid pod. After working through the Solid getting started tutorial, I found I could get/set the object in the VCARD.note parameter, but I suspect this is not the right way to do it.
Any advice on how to do this properly? The JSON object will be updated regularly and will typically have ~10-100 key/value pairs.
There are two options here.
Option 1 - store as RDF (recommended)
Generally, the recommended thing to do is not to store data as a standard JSON object, but rather to save the data as RDF. So for example, if you have a JSON object like
const user = {
  name: "Vincent",
};
assuming you're using the JavaScript library @inrupt/solid-client, you'd create what it calls a "Thing" like this:
import { createThing, addStringNoLocale } from "@inrupt/solid-client";
import { foaf } from "rdf-namespaces";
let userThing = createThing();
userThing = addStringNoLocale(userThing, foaf.fn, "Vincent");
You can read more about this approach at https://docs.inrupt.com/developer-tools/javascript/client-libraries/tutorial/read-write-data/
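To actually persist that Thing you would then add it to a dataset and save the dataset back to the Pod. A minimal sketch, assuming a hypothetical dataset URL and that you're inside an async function (pass an authenticated fetch in the options if the resource isn't public):

import {
  createSolidDataset,
  setThing,
  saveSolidDatasetAt,
} from "@inrupt/solid-client";

// Add the Thing to a (new) SolidDataset and write it to the Pod.
let dataset = createSolidDataset();
dataset = setThing(dataset, userThing);
await saveSolidDatasetAt("https://my.pod/profile-data", dataset);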
Option 2 - store a JSON blob directly
The other option is indeed to store a JSON file directly in the Pod. This works, although it goes a bit against the spirit of Solid, and requires you to overwrite the complete file every time, rather than allowing you to just update individual properties whenever you update the data. You could do that as follows:
import { overwriteFile } from "@inrupt/solid-client";
const user = {
  name: "Vincent",
};
// This is assuming you're working in the browser;
// in Node, you'll have to create a Buffer instead of a Blob.
overwriteFile(
  "https://my.pod/location-of-the-file.json",
  new Blob(
    [JSON.stringify(user)],
    { type: "application/json" },
  ),
).then(() => console.log("Saved the JSON file."));
You can read more about this approach here: https://docs.inrupt.com/developer-tools/javascript/client-libraries/tutorial/read-write-files/
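Reading the blob back follows the same pattern; a short sketch assuming the same hypothetical file URL and a browser environment (as in the snippet above):

import { getFile } from "@inrupt/solid-client";

// Fetch the file from the Pod and parse it back into a JSON object.
const file = await getFile("https://my.pod/location-of-the-file.json");
const user = JSON.parse(await file.text());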

Google cloud dataflow - batch insert in bigquery

I was able to create a Dataflow pipeline which reads data from Pub/Sub and, after processing, writes it to BigQuery in streaming mode.
Now, instead of streaming mode, I would like to run my pipeline in batch mode to reduce costs.
Currently my pipeline is doing streaming inserts into BigQuery with dynamic destinations. I would like to know if there is a way to perform a batch insert operation with dynamic destinations.
Below is the pipeline code:
public class StarterPipeline {

    public interface StarterPipelineOption extends PipelineOptions {
        /**
         * Set this required option to specify where to read the input.
         */
        @Description("Path of the file to read from")
        @Default.String(Constants.pubsub_event_pipeline_url)
        String getInputFile();

        void setInputFile(String value);
    }

    @SuppressWarnings("serial")
    public static void main(String[] args) throws SocketTimeoutException {
        StarterPipelineOption options = PipelineOptionsFactory.fromArgs(args).withValidation()
                .as(StarterPipelineOption.class);
        Pipeline p = Pipeline.create(options);

        PCollection<String> datastream = p.apply("Read Events From Pubsub",
                PubsubIO.readStrings().fromSubscription(Constants.pubsub_event_pipeline_url));

        PCollection<String> windowed_items = datastream.apply(Window.<String>into(new GlobalWindows())
                .triggering(Repeatedly.forever(
                        AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(300))))
                .withAllowedLateness(Duration.standardDays(10)).discardingFiredPanes());

        // Write into BigQuery
        windowed_items.apply("Read and make event table row", new ReadEventJson_bigquery())
                .apply("Write_events_to_BQ",
                        BigQueryIO.writeTableRows().to(new DynamicDestinations<TableRow, String>() {
                            @Override
                            public String getDestination(ValueInSingleWindow<TableRow> element) {
                                String destination = EventSchemaBuilder
                                        .fetch_destination_based_on_event(element.getValue().get("event").toString());
                                return destination;
                            }

                            @Override
                            public TableDestination getTable(String table) {
                                String destination = EventSchemaBuilder.fetch_table_name_based_on_event(table);
                                return new TableDestination(destination, destination);
                            }

                            @Override
                            public TableSchema getSchema(String table) {
                                TableSchema table_schema = EventSchemaBuilder.fetch_table_schema_based_on_event(table);
                                return table_schema;
                            }
                        }).withCreateDisposition(CreateDisposition.CREATE_NEVER)
                                .withWriteDisposition(WriteDisposition.WRITE_APPEND)
                                .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));

        p.run().waitUntilFinish();
        log.info("Events Pipeline Job Stopped");
    }
}
Batch or streaming mode is determined by the PCollection, so you would need to transform your streaming PCollection from Pub/Sub into a batch PCollection before writing to BigQuery. The transform that allows you to do this is GroupIntoBatches<K,InputT>.
Note that, since this transform uses key-value pairs, batches will contain only elements of a single key. For non-KV elements, check this related answer.
Once you have created your batch PCollection with this transform, apply the BigQuery write with dynamic destinations as you did with the streaming PCollection.
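For illustration, a rough sketch of that keying-plus-batching step (the constant key and batch size are arbitrary placeholders, not part of the original pipeline):

import org.apache.beam.sdk.transforms.GroupIntoBatches;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// Key every element (a single constant key here) so GroupIntoBatches has something to group on,
// then emit batches of up to 500 elements per key.
PCollection<KV<String, Iterable<String>>> batched = windowed_items
        .apply("Key events", WithKeys.of("events"))
        .apply("Group into batches", GroupIntoBatches.<String, String>ofSize(500));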
You can limit the costs by using file loads for streaming jobs. The Insertion Method section states that BigQueryIO.Write supports two methods of inserting data into BigQuery, specified via BigQueryIO.Write.withMethod (org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method). If no method is supplied, a default method is chosen based on the input PCollection. See BigQueryIO.Write.Method for more information about the available methods.
The different insertion methods provide different tradeoffs of cost, quota, and data consistency. Please see BigQuery documentation for more information about these tradeoffs.
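As a sketch, switching the write in the pipeline above to file loads could look roughly like this (the triggering frequency and shard count are placeholder values; "destinations" stands for the DynamicDestinations<TableRow, String> instance from the question):

// FILE_LOADS batches rows into files and issues BigQuery load jobs instead of streaming inserts.
BigQueryIO.writeTableRows()
        .to(destinations)
        .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
        // For an unbounded input, file loads need a triggering frequency and a shard count.
        .withTriggeringFrequency(Duration.standardMinutes(10))
        .withNumFileShards(1)
        .withCreateDisposition(CreateDisposition.CREATE_NEVER)
        .withWriteDisposition(WriteDisposition.WRITE_APPEND);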

How (and when) do I use iCloud's encodeSystemFields method on CKRecord?

encodeSystemFields is supposed to be used when I keep records locally, in a database.
Once I export that data, must I do anything special when de-serializing it?
In what scenarios should I act upon the information in that data?
As a variation (and if not covered by the previous question), what does this information help me guard against? (Data corruption, I assume.)
encodeSystemFields is useful to avoid having to fetch a CKRecord from CloudKit again to update it (barring record conflicts).
The idea is:
When you are storing the data for a record retrieved from CloudKit (for example, retrieved via CKFetchRecordZoneChangesOperation to sync record changes to a local store):
1.) Archive the CKRecord to NSData:
let record = ...
// archive CKRecord to NSData
let archivedData = NSMutableData()
let archiver = NSKeyedArchiver(forWritingWith: archivedData)
archiver.requiresSecureCoding = true
record.encodeSystemFields(with: archiver)
archiver.finishEncoding()
2.) Store the archivedData locally (for example, in your database) associated with your local record.
When you want to save changes made to your local record back to CloudKit:
1.) Unarchive the CKRecord from the NSData you stored:
let archivedData = ... // TODO: retrieved from your local store
// unarchive CKRecord from NSData
let unarchiver = NSKeyedUnarchiver(forReadingWith: archivedData)
unarchiver.requiresSecureCoding = true
let record = CKRecord(coder: unarchiver)
2.) Use that unarchived record as the base for your changes. (i.e. set the changed values on it)
record["City"] = "newCity"
3.) Save the record(s) to CloudKit, via CKModifyRecordsOperation.
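For example, a minimal sketch of that save (the container/database choice and error handling are just placeholders):

import CloudKit

// Save the modified record; the default save policy surfaces conflicts as "Server Record Changed".
let operation = CKModifyRecordsOperation(recordsToSave: [record], recordIDsToDelete: nil)
operation.savePolicy = .ifServerRecordUnchanged
operation.modifyRecordsCompletionBlock = { savedRecords, _, error in
    if let error = error {
        print("Save failed: \(error)")
    }
}
CKContainer.default().privateCloudDatabase.add(operation)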
Why?
From Apple:
Storing Records Locally
If you store records in a local database, use the encodeSystemFields(with:) method to encode and store the record’s metadata. The metadata contains the record ID and change tag which is needed later to sync records in a local database with those stored by CloudKit.
When you save changes to a CKRecord in CloudKit, you need to save the changes to the server's record.
You can't just create a new CKRecord with the same recordID, set the values on it, and save it. If you do, you'll receive a "Server Record Changed" error - which, in this case, is because the existing server record contains metadata that your local record (created from scratch) is missing.
So you have two options to solve this:
Request the CKRecord from CloudKit (using the recordID), make changes to that CKRecord, then save it back to CloudKit.
Use encodeSystemFields, and store the metadata locally, unarchiving it to create a "base" CKRecord that has all the appropriate metadata for saving changes to said CKRecord back to CloudKit.
#2 saves you network round-trips*.
*Assuming another device hasn't modified the record in the meantime - which is also what this data helps you guard against. If another device modifies the record between the time you last retrieved it and the time you try to save it, CloudKit will (by default) reject your record save attempt with "Server Record Changed". This is your clue to perform conflict resolution in the way that is appropriate for your app and data model. (Often, by fetching the new server record from CloudKit and re-applying appropriate value changes to that CKRecord before attempting the save again.)
NOTE: Any time you save/retrieve an updated CKRecord to/from CloudKit, you must remember to update your locally-stored archived CKRecord.
As of iOS 15 / Swift 5.5 this extension might be helpful:
public extension CKRecord {
var systemFieldsData: Data {
let archiver = NSKeyedArchiver(requiringSecureCoding: true)
encodeSystemFields(with: archiver)
archiver.finishEncoding()
return archiver.encodedData
}
convenience init?(systemFieldsData: Data) {
guard let una = try? NSKeyedUnarchiver(forReadingFrom: systemFieldsData) else {
return nil
}
self.init(coder: una)
}
}
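A possible round trip with that extension (the field name and value are illustrative):

// Archive right after fetching from CloudKit and store the Data locally...
let archived = record.systemFieldsData

// ...then later rebuild a "base" record, apply your changes, and save it back.
if let baseRecord = CKRecord(systemFieldsData: archived) {
    baseRecord["City"] = "newCity" as CKRecordValue
    // save baseRecord via CKModifyRecordsOperation as shown earlier
}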

New migration without erasing old data

We're using FluentMigrator in one project. Let's say I got code like this one below.
So every time we run a new migration, all the previous data is deleted. Is there a way to avoid this and keep the data safe in the places that are not changing?
public class Migration1 : Migration
{
    public override void Up()
    {
        Create.Table("Project")
            .WithColumn("id").AsInt64().PrimaryKey().Identity()
            .WithColumn("name").AsString(30).Nullable()
            .WithColumn("author").AsString(30).Nullable()
            .WithColumn("date").AsDate().Nullable()
            .WithColumn("description").AsString(1000).Nullable();

        Create.Table("Data")
            .WithColumn("id").AsInt64().PrimaryKey().Identity()
            .WithColumn("project_id").AsInt64().ForeignKey("Project", "id")
            .WithColumn("a").AsInt32().Nullable()
            .WithColumn("b").AsInt32().Nullable()
            .WithColumn("c").AsInt32().Nullable()
            .WithColumn("d").AsInt32().Nullable();
    }

    public override void Down()
    {
        Delete.Table("Data");
        Delete.Table("Project");
    }
}
As part of your Down method you could create backup tables that are identical to the tables you are deleting but suffixed with a timestamp, e.g.:
Project_201407091059
Data_201407091059
You could then copy all the data from the tables being deleted into these backup tables.
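As a rough sketch of that idea (the SELECT ... INTO syntax assumes SQL Server; adjust for your database):

public override void Down()
{
    // Timestamp suffix for the backup tables, e.g. Project_201407091059.
    var suffix = DateTime.Now.ToString("yyyyMMddHHmm");

    // Copy the data out before dropping the original tables.
    Execute.Sql("SELECT * INTO Project_" + suffix + " FROM Project");
    Execute.Sql("SELECT * INTO Data_" + suffix + " FROM Data");

    Delete.Table("Data");
    Delete.Table("Project");
}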

How to manually set a primary key in Doctrine2

I am importing data into a new Symfony2 project using Doctrine2 ORM.
All new records should have an auto-generated primary key. However, for my import, I would like to preserve the existing primary keys.
I am using this as my Entity configuration:
type: entity
id:
  id:
    type: integer
    generator: { strategy: AUTO }
I have also created a setter for the id field in my entity class.
However, when I persist and flush this entity to the database, the key I manually set is not preserved.
What is the best workaround or solution for this?
The following answer is not mine but OP's, which was posted in the question. I've moved it into this community wiki answer.
I stored a reference to the Connection object and used that to manually insert rows and update relations. This avoids the persister and identity generators altogether. It is also possible to use the Connection to wrap all of this work in a transaction.
Once you have executed the insert statements, you may then update the relations.
This is a good solution because it avoids any potential problems you may experience when swapping out your configuration on a live server.
In your init function:
// Get the Connection
$this->connection = $this->getContainer()->get('doctrine')->getEntityManager()->getConnection();
In your main body:
// Loop over my array of old data adding records
$this->connection->beginTransaction();
foreach(array_slice($records, 1) as $record)
{
$this->addRecord($records[0], $record);
}
try
{
$this->connection->commit();
}
catch(Exception $e)
{
$output->writeln($e->getMessage());
$this->connection->rollBack();
exit(1);
}
Create this function:
// Add a record to the database using Connection
protected function addRecord($columns, $oldRecord)
{
// Insert data into Record table
$record = array();
foreach($columns as $key => $column)
{
$record[$column] = $oldRecord[$key];
}
$record['id'] = $record['rkey'];
// Insert the data
$this->connection->insert('Record', $record);
}
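Updating the relations afterwards can go through the same Connection; a small sketch with hypothetical column and variable names:

// Point an imported Record at its parent Project once both rows exist.
$this->connection->update(
    'Record',
    array('project_id' => $newProjectId),
    array('id' => $recordId)
);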
You've likely already considered this, but my approach would be to set the generator strategy to 'none' for the import so you can manually set the existing ids in your client code. Then, once the import is complete, change the generator strategy back to 'auto' to let the RDBMS take over from there. A conditional can determine whether the id setter is invoked. Good luck - let us know what you end up deciding to use.
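A sketch of how that temporary switch could be done at runtime rather than by editing the mapping file (assuming an EntityManager $em and a Record entity; the class and variable names are illustrative):

use Doctrine\ORM\Id\AssignedGenerator;
use Doctrine\ORM\Mapping\ClassMetadata;

// Tell Doctrine to keep manually assigned ids for this entity during the import.
$metadata = $em->getClassMetadata('AppBundle\Entity\Record');
$metadata->setIdGeneratorType(ClassMetadata::GENERATOR_TYPE_NONE);
$metadata->setIdGenerator(new AssignedGenerator());

$record = new Record();
$record->setId($oldId); // the preserved primary key
$em->persist($record);
$em->flush();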