Should the Protobufs from two different repositories be aligned - c++

Problem Description
We have two codebases in different repositories. One is in Java and the other is in C++. They share a common protobuf. The problem is that on our side, the C++ side, the message has fewer members than the one on the Java side. As you can see below, on our side work is assigned field number 4, whereas on the Java side it is assigned field number 5. Both members have the same name, work.
Question
If the protobufs are not aligned, what problems can we have? Is it OK for the protobufs not to be aligned?
message CPPContext {
    optional string date = 1;
    optional string time = 2;
    optional string hour = 3;
    optional string work = 4;
}
message JAVAContext {
    optional string date = 1;
    optional string time = 2;
    optional string hour = 3;
    optional string currency = 4;
    optional string work = 5;
}

Protobuf serializes and deserializes messages based on field numbers, not field names.
For example, if a CPPContext message gets deserialized on the other side as a JAVAContext, then your work field will be treated as the currency field on the other side.
It is better to use the same proto files on both communicating sides, or at least (backward-)compatible proto files. For example, it is fine to add new optional fields with new field numbers to the proto files on one side first (they will be ignored on the other side), but it is not fine to change the number of an existing field or to remove a required field.
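To make the field-number point concrete, here is a minimal Python sketch that hand-encodes string fields in the protobuf wire format (a tag made of the field number and wire type, then a varint length, then the bytes). It does not use the protobuf library; the helper names and sample values are made up for illustration, and it only handles string fields.

```python
def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos):
    """Decode a varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_string_field(field_number, text):
    """Encode one string field: tag, length, then UTF-8 bytes."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def decode_strings(data, schema):
    """Decode string fields, naming them by the reader's own schema."""
    fields = {}
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 7
        if wire_type != 2:
            raise ValueError("this sketch only handles string fields")
        length, pos = decode_varint(data, pos)
        value = data[pos:pos + length].decode("utf-8")
        pos += length
        name = schema.get(field_number)
        if name is not None:  # field numbers the reader does not know are skipped
            fields[name] = value
    return fields


# Field-number-to-name maps mirroring the two messages in the question.
CPP_CONTEXT = {1: "date", 2: "time", 3: "hour", 4: "work"}
JAVA_CONTEXT = {1: "date", 2: "time", 3: "hour", 4: "currency", 5: "work"}

# The C++ side writes work as field 4 ...
wire = encode_string_field(1, "2020-01-01") + encode_string_field(4, "office")
# ... and the Java side reads field 4 as currency.
as_java = decode_strings(wire, JAVA_CONTEXT)
```

Note that the field name never appears in the encoded bytes, which is why the reader has no way to notice the mismatch.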

Related

How to normalize colon-delimited fields stored in a single column in Informatica Cloud

I need help normalizing the field "DSC_HASH", whose values are stored in a single column delimited by colons.
Input:
Output:
I achieved what I needed with a Java transformation:
1) In the Java transformation I created 4 output columns: COD1_out, COD2_out, COD3_out and DSC_HASH_out
2) Then I added the following code:
String[] column_split;
String column_delimiter = ";";
String[] column_data;
String data_delimiter = ":";
// Split the incoming DSC_HASH value into its ';'-separated entries.
column_split = DSC_HASH.split(column_delimiter);
COD1_out = COD1;
COD2_out = COD2;
COD3_out = COD3;
for (int i = 0; i < column_split.length; i++) {
    // Keep the part before the ':' and emit one output row per entry.
    column_data = column_split[i].split(data_delimiter);
    DSC_HASH_out = column_data[0];
    generateRow();
}
There is no generic parser or loop construct in Informatica that can take one record and output an arbitrary number of records.
There are some ways you can bypass this limitation:
Using the Java Transformation, as you did, which is probably the easiest... if you know Java :) There may be limitations to performance or multi-threading.
Using a Router or a Normalizer with a fixed number of output records, high enough to cover all your cases, then filtering out empty records. The expressions to extract fields are a bit complex to write (and maintain).
Using the XML Parser, but you have to convert your data to XML first and design an XML schema. For example, your first line would be changed into (on multiple lines for readability):
<e><n>2320</n><h>-1950312402</h></e>
<e><n>410</n><h>103682488</h></e>
<e><n>4301</n><h>933882987</h></e>
<e><n>110</n><h>-2069728628</h></e>
Using SQL Transformation or Stored Procedure Transformation to use database standard or custom functions, but that would result in an SQL query for each input row, which is bad performance-wise.
Using a Custom Transformation. Does anyone want to write C++ for that?
The Java Transformation is clearly a good solution for this situation.
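For readers who want to check the splitting logic outside Informatica, here is a rough Python sketch of what the Java transformation above does: one input row becomes one output row per ';'-separated entry, keeping the part before the ':'. The function name and the sample input string are assumptions (the input format is inferred from the delimiters in the Java code and the XML example), not something taken from the original mapping.

```python
def normalize_row(cod1, cod2, cod3, dsc_hash,
                  column_delimiter=";", data_delimiter=":"):
    """Return one (COD1, COD2, COD3, DSC_HASH) tuple per entry in dsc_hash."""
    rows = []
    for entry in dsc_hash.split(column_delimiter):
        if not entry:
            continue  # skip empty entries, e.g. from a trailing delimiter
        first_part = entry.split(data_delimiter)[0]
        rows.append((cod1, cod2, cod3, first_part))
    return rows


# Hypothetical input row; the real column values are not shown in the question.
rows = normalize_row("A", "B", "C", "2320:-1950312402;410:103682488;")
```

Each tuple corresponds to one generateRow() call in the Java version.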

Can a protocol buffer be partially updated?

I am new to protobuf and here is my question: do protocol buffers support partial updates?
For example, I have these messages:
package model.test;
message Person {
    required int32 id = 1;
    required string name = 2;
    repeated PhoneNumber phone = 3;
}
enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
}
message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
}
The data I have looks like this:
model::test::Person person;
person.set_id(1);
person.set_name("Jack");
model::test::PhoneNumber* _phone3 = person.add_phone();
_phone3->set_number("123567");
_phone3->set_type(model::test::MOBILE);
model::test::PhoneNumber* _phone4 = person.add_phone();
_phone4->set_number("347890");
_phone4->set_type(model::test::WORK);
When only the work phone number changes, I have to rewrite the whole person object with the following code.
fstream out("User.txt", ios::out | ios::binary | ios::trunc);
person.SerializePartialToOstream(&out);
But that is not efficient. I want to update only the PhoneNumber. Is there any partial update in protobuf, or something like that?
Protocol buffers are actually designed such that concatenation is the same as merge, and that the last field wins when merging (except for repeated, which are added). In your case, you should actually be able to serialize a blob containing just the phone-number set, and append this data, and it will over-ride the earlier value. This, however, only works well for the root object. Which yours: isn't. And it doesn't work for repeated, which yours: is.
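Here is a small Python sketch of the "concatenation is merge" behaviour described above, decoding the wire format by hand rather than using the protobuf library. It is simplified: the phone field is treated as a repeated string instead of a nested PhoneNumber message, and all helper names are made up for illustration.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos):
    """Decode a varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def string_field(number, text):
    """Encode one length-delimited (wire type 2) string field."""
    payload = text.encode("utf-8")
    return encode_varint((number << 3) | 2) + encode_varint(len(payload)) + payload


REPEATED_FIELDS = {3}  # field 3 (phone) is repeated; field 2 (name) is not


def parse(data):
    """Parse string fields: the last value wins, repeated fields accumulate."""
    message = {}
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        number, wire_type = tag >> 3, tag & 7
        if wire_type != 2:
            raise ValueError("this sketch only handles string fields")
        length, pos = decode_varint(data, pos)
        value = data[pos:pos + length].decode("utf-8")
        pos += length
        if number in REPEATED_FIELDS:
            message.setdefault(number, []).append(value)
        else:
            message[number] = value
    return message


original = string_field(2, "Jack") + string_field(3, "123567") + string_field(3, "347890")
update = string_field(2, "John") + string_field(3, "999999")

# Appending the update bytes and parsing once behaves like a merge.
merged = parse(original + update)
```

The name is overridden, but the phone from the update is added to the list rather than replacing an existing entry, which is why this trick does not help with the repeated phone field.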
I don't think there is any support for what you want to do. If you think about it, it doesn't really make sense that some sort of partial update serialization would exist in the first place. For protobuf to be able to manipulate an object that is serialized in a file on disk, it needs to read and deserialize the whole object so it knows what fields have been previously populated. Then when serializing and writing the updated object back to disk, you're going to have to overwrite the old file no matter what you do (i.e. you can't shove extra bytes into a file on the file system without overwriting the original file completely).

Best way to compare phone numbers using Regex

I have two databases that store phone numbers. The first one stores them with a country code in the format 15555555555 (a US number), and the other can store them in many different formats (ex. (555) 555-5555, 5555555555, 555-555-5555, 555-5555, etc.). When a phone number unsubscribes in one database, I need to unsubscribe all references to it in the other database.
What is the best way to find all instances of phone numbers in the second database that match the number in the first database? I'm using the entity framework. My code right now looks like this:
using (FusionEntities db = new FusionEntities())
{
    var communications = db.Communications.Where(x => x.ValueType == 105);
    foreach (var com in communications)
    {
        string sRegexCompare = Regex.Replace(com.Value, "[^0-9]", "");
        if (sMobileNumber.Contains(sRegexCompare) && sRegexCompare.Length > 6)
        {
            var contact = db.Contacts.Where(x => x.ContactID == com.ContactID).FirstOrDefault();
            contact.SMSOptOutDate = DateTime.Now;
        }
    }
}
Right now, my comparison checks to see if the first database contains at least 7 digits from the second database after all non-numeric characters are removed.
Ideally, I want to be able to apply the regex formatting to the point in the code where I get the data from the database. Initially I tried this, but I can't use replace in a LINQ query:
var communications = db.Communications.Where(x => x.ValueType == 105 && sMobileNumber.Contains(Regex.Replace(x.Value, "[^0-9]", "")));
Comparing phone numbers is a bit beyond the capability of regex by design. As you've discovered, there are many ways to represent a phone number, with and without things like area codes and formatting. Regex is for pattern matching, so, as you've found, using a regex to strip out all formatting and then comparing strings is doable, but it puts logic into the regex, which is not what it's for.
I would suggest that the first and biggest thing to do is sort out the representation of phone numbers. Since you have database access, you might want to look at creating a new field or table to represent a phone number object. Then put your comparison logic in the model.
Yes, it's more work, but it keeps the code more understandable going forward and helps clean up crap data.
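As a rough illustration of keeping the comparison logic in one place, here is a Python sketch that normalizes both representations to bare digits and then compares them. The function names, the default country code of "1", and the rule that a 7-digit local number matches on the last seven digits are all assumptions chosen to mirror the formats listed in the question.

```python
import re


def normalize_phone(raw, default_country_code="1"):
    """Strip formatting and a leading country code, leaving bare digits."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith(default_country_code):
        digits = digits[len(default_country_code):]
    return digits


def same_number(stored_full, candidate):
    """Compare a full number (with country code) to a loosely formatted one."""
    full = normalize_phone(stored_full)
    other = normalize_phone(candidate)
    if len(other) == 10:
        return full == other
    if len(other) == 7:
        # Local number without an area code: only the last seven digits can match.
        return full.endswith(other)
    return False
```

Storing the normalized digits in their own column would also let the database do the comparison, instead of loading every row and running the regex in application code.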

<Binary> in sql

I want to select all the binary data from a column of a SQL database (SQL Server Enterprise) using a query from C++. I'm not sure what is in the binary data, and all it says is <Binary>.
I tried the code below (it was passed on to me to study from), and I honestly don't 100% understand some parts of it, as I noted in the comment:
SqlConnection^ cn = gcnew SqlConnection();
SqlCommand^ cmd;
SqlDataAdapter^ da;
DataTable^ dt;
cn->ConnectionString = "Server = localhost; Database=portable; User ID = glitch; Pwd = 1234";
cn->Open();
cmd = gcnew SqlCommand("SELECT BinaryColumn FROM RawData", cn);
da = gcnew SqlDataAdapter(cmd);
dt = gcnew DataTable("BinaryTemp"); //I'm confused about this piece of code, is it supposed to create a new table in the database or a temp one in the code?
da->Fill(dt);
for (int i = 0; i < dt->Rows->Count-1; i++)
{
    String^ value_string;
    value_string = dt->Rows[i]->ToString();
    Console::WriteLine(value_string);
}
cn->Close();
Console::ReadLine();
but it only returns a lot of "System.Data.DataRow".
Can someone help me?
(I need to put it into a matrix form after I extract the binary data, so if anyone could provide help for that part as well, it'd be highly appreciated!)
dt->Rows[i] is indeed a DataRow^. To extract a specific field from it, use its indexer:
array<Byte>^ blob = safe_cast<array<Byte>^>(dt->Rows[i][0]);
This extracts the first column (since you have only one). The indexer returns an Object^, so it has to be cast to a byte array, which is how ADO.NET returns binary column values.
To answer the question in your code, the way SqlDataAdapter works is like this:
you build a DataTable to hold the data to retrieve. You can fill in its columns, but you're not required to. Neither are you required to give it a name.
you build the adapter object, giving it a query and a connection object
you call the Fill method on the adapter, giving it the previously created DataTable to fill with whatever your query returns.
and you're done with the adapter. At this point you can dispose of it (for example inside a using statement if you're using C#).

Validating a Salesforce Id

Is there a way to validate a Salesforce ID, maybe using RegEx? They are normally 15 or 18 chars, but do they follow a pattern that we can use to check that it's a valid Id?
There are two levels of validating a Salesforce Id:
check the format using the regular expression [a-zA-Z0-9]{15}|[a-zA-Z0-9]{18}
for 18-character Ids you can check the 3-character checksum.
Code examples provided in comments: C#, Go, JavaScript, Ruby
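A minimal Python sketch of both levels follows. It assumes the checksum scheme as it is commonly described: the 15-character Id is split into three chunks of five, each chunk yields a 5-bit number whose bit i is set when character i is uppercase, and that number indexes into A-Z followed by 0-5. The function names and the sample Id are for illustration only.

```python
import re

CHECKSUM_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ012345"


def checksum_suffix(id15):
    """Compute the 3-character suffix for a 15-character Id."""
    suffix = []
    for start in (0, 5, 10):
        chunk = id15[start:start + 5]
        bits = 0
        for offset, char in enumerate(chunk):
            if "A" <= char <= "Z":
                bits |= 1 << offset  # first character is the lowest bit
        suffix.append(CHECKSUM_ALPHABET[bits])
    return "".join(suffix)


def is_well_formed_id(candidate):
    """Check the format, and for 18-character Ids also the checksum."""
    if not re.fullmatch(r"[a-zA-Z0-9]{15}|[a-zA-Z0-9]{18}", candidate):
        return False
    if len(candidate) == 15:
        return True
    return checksum_suffix(candidate[:15]) == candidate[15:]
```

This only checks that the string is shaped like an Id; it says nothing about whether a record with that Id exists.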
Something like this should work:
[a-zA-Z0-9]{15,18}
It was suggested that the following may be more correct, because it rejects Ids with lengths of 16 and 17 characters; it also tries to match 18 characters first, with 15 characters as a fallback:
[a-zA-Z0-9]{18}|[a-zA-Z0-9]{15}
Just use instanceOf to check if the string is an instance of Id.
String s = '1234';
if (s instanceOf Id) System.debug('valid id');
else System.debug('invalid id');
The easiest way I've come across, is to create a new ID variable and assign a String to it.
ID MyTestID = null;
try {
    MyTestID = MyTestString;
}
catch(Exception ex) { }
If MyTestID is null after trying to assign it, the ID was invalid.
This regex has given me the optimal results so far.
\b[a-z0-9]\w{4}0\w{12}|[a-z0-9]\w{4}0\w{9}\b
You can also check for 15 chars, and then add an extra 3 chars optional, with an expression similar to:
^[a-z0-9]{15}(?:[a-z0-9]{3})?$
in i (case-insensitive) mode, or without it:
^[A-Za-z0-9]{15}(?:[A-Za-z0-9]{3})?$
Javascript: /^(?=.*?\d)(?=.*?[a-z])[a-z\d]{18}$/i
These were the Salesforce Id validation requirements for me.
18 characters only
At least one digit
At least one letter
Case insensitive
Test cases
Should fail
1
a
1234
abgcde
1234aDcde
12345678901234567*
123456789012345678
abcDefghijabcdefgh
Should pass
1234567890abcDeFgh
1234abcd1234abcd12
abcd1234abcd1234ab
1abcDefhijabcdefgf
abcDefghijabcdefg1
12345678901234567a
a12345678901234567
To understand the regex, please refer to this thread.
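The same pattern can be tried against the test cases above with a few lines of Python. This is only a sketch: it uses re.fullmatch instead of the ^ and $ anchors, and adds re.ASCII so that \d matches only 0-9, as it does in JavaScript. The variable and function names are made up.

```python
import re

# Two lookaheads require at least one digit and at least one letter;
# the character class then requires exactly 18 alphanumeric characters.
ID_PATTERN = re.compile(r"(?=.*?\d)(?=.*?[a-z])[a-z\d]{18}",
                        re.IGNORECASE | re.ASCII)


def matches_requirements(candidate):
    """Return True if candidate satisfies the four requirements listed above."""
    return ID_PATTERN.fullmatch(candidate) is not None


SHOULD_FAIL = ["1", "a", "1234", "abgcde", "1234aDcde",
               "12345678901234567*", "123456789012345678", "abcDefghijabcdefgh"]
SHOULD_PASS = ["1234567890abcDeFgh", "1234abcd1234abcd12", "abcd1234abcd1234ab",
               "1abcDefhijabcdefgf", "abcDefghijabcdefg1", "12345678901234567a",
               "a12345678901234567"]

failures = [s for s in SHOULD_FAIL if matches_requirements(s)]
misses = [s for s in SHOULD_PASS if not matches_requirements(s)]
```

Both lists come out empty when the pattern behaves as described.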
The regex provided by Daniel Sokolowski works perfectly to verify if the id is in the correct format.
If you want to verify if an id corresponds to an actual record in the database, you'll need to first find the object type from the first three characters (commonly known as prefix) and then query the object type:
boolean isValidAndExists(String key) {
    // Compare the Id's three-character key prefix against every object type.
    String prefix = key.substring(0, 3);
    Map<String, Schema.SObjectType> objTypes = Schema.getGlobalDescribe();
    for (Schema.SObjectType objType : objTypes.values()) {
        Schema.DescribeSObjectResult objDesc = objType.getDescribe();
        if (objDesc.getKeyPrefix() == prefix) {
            String objName = objDesc.getName();
            // Escape quotes so a malformed key cannot alter the dynamic SOQL query.
            String query = 'SELECT Id FROM ' + objName + ' WHERE Id = \'' + String.escapeSingleQuotes(key) + '\'';
            SObject[] objs = Database.query(query);
            return !objs.isEmpty();
        }
    }
    return false;
}
Be aware that Schema.getGlobalDescribe can be an expensive operation and can degrade the performance of your application if you call it often.
If you need to check that often, I recommend creating a Custom Setting or Custom Metadata to store the relation between prefixes and object types.
Assuming you want to validate Ids in Apex, there are a few approaches discussed in the other answers. Here is an alternative, with notes on the various approaches.
The try-catch method (credit to @matt_k) certainly works, but some folks worry about overhead, especially if testing many Ids.
I used instanceof Id for a long time (credit to @melani_s), until I discovered that it sometimes gives the wrong answer (e.g., '481D0B74-41CF-47E9').
Multiple answers suggest regexen. As the accepted answer correctly points out (credit to @zacheusz), 18-character Ids are only valid if their checksums are correct, which means the regex solutions can be wrong. That answer also helpfully provides code in several languages to test Id checksums. But not in Apex.
I was going to implement the checksum code in Apex, but then I realized that Salesforce had already done the work. So instead I convert 18-character Ids to 15-character Ids (via .to15(), which uses the checksum to fix capitalization, as opposed to truncating the string) and then back to 18 characters to let SF do the checksum calculation, and then I compare the original checksum with the new one. This is my method:
static Pattern ID_REGEX = Pattern.compile('[a-zA-Z0-9]{15}(?:[A-Z0-5]{3})?');
/**
 * @description Determines if a string is a valid SalesforceId. Confirms checksum of 18 digit Ids.
 * Works for cases where `x instanceof id` returns the wrong answer, like '481D0B74-41CF-47E9'.
 * Does NOT check for the existence of a record with the given Id.
 * @param s a string to validate
 *
 * @return true if the string `s` is a valid Salesforce Id.
 */
public static Boolean isValidId(String s) {
    Matcher m = ID_REGEX.matcher(s);
    if (m.matches() == false) return false; // if it doesn't match the regex it cannot be valid
    if (s.length() == 15) return true; // if 15 char string matches the regex, assume it must be valid
    String check = (Id)((Id)s).to15(); // Convert to 15 char Id, then to Id and back to string, giving correct 18-char Id
    return s.right(3) == check.right(3); // if 18 char string matches the regex, valid if checksum correct
}
Additionally checking getSObjectType() != null would be perfect if we are dealing with Salesforce records:
public static boolean isRecordId(string recordId) {
    try {
        return string.isNotBlank(recordId) && ((Id)recordId.trim()).getSObjectType() != null;
    } catch(Exception ex) {
        return false;
    }
}