Deidentification using FPE Primitive Transform and ignoring certain characters - google-cloud-platform

is there a way to have the de-identify DLP ignore certain characters? Currently, encrypting EMails with a custom alphabet that includes the "-" sign ends up encrypting as below. Ideally the encrypted text would all be of the format "XXX-XX-XXXX" I noticed there was a CharsToIgnore call that could be made but not sure where to put that... maybe metadata field in the API that's called (deidentify_content?) or at some other place.
Thanks!
325-7959452
31424943-6

Officially you should reconsider using FPE as it isn't considered the best option by security standards. Only use it if you are in legacy system that is going to be strict about formatting, otherwise deterministic crypto is much better for emails, another technique available in the API will give you better control.
But as for FPE, there is not the ability to skip characters when using FPE, that feature only exists for masking.

Related

How to explicitly specify MachineKey with FormsAuthentication.Decrypt()

I would like to decrypt a FormsAuthentication cookie but it might have been encrypted using different machine keys. I would like to be able to try decrypting successively with let's say 3 machine keys and check if one of them is working. It would be easy if FormsAuthentication.Decrypt() would accept not only the encrypted cookie but also the machine key to use but there is no way to do this (the machine key is always retrieved from the config file). Is there a way to achieve what I'm trying to do ?
There's no way to specify multiple keys in the <machineKey> element. However, if you have a crypto background, you can implement your own DataProtector which allows for key rotation. See http://blogs.msdn.com/b/webdev/archive/2012/10/23/cryptographic-improvements-in-asp-net-4-5-pt-2.aspx (section Introducing DataProtector) for more information.
Warning: writing your own DataProtector is an extremely advanced scenario and should only be attempted if you have a security background and are comfortable working with cryptographic primitives. It's very easy to introduce subtle bugs which could undermine your site security.

Should output be encoded at the API or client level?

We are moving our Web app architecture to being microservice based. We have an internal debate as to whether an REST API that provides content (in JSON, let's say) should be looking to encode content to make it safe, or whether the consumers that take that content and display it (in HTML, for example, or otherwise use it) should be responsible for that encoding. The use case is to prevent XSS attacks and similar.
The provider stance is "Well, we can't know how to encode it for everyone, or how you're going to use the content, so of course the consumers should encode the content."
The consumer stance is "There is one provider and multiple consumers, so it's more secure to do it once in the providing API than to hope that every consumer does it."
Are there any generally accepted best practices on this and why?
As a rule, data when passing through "internal" processes (whatever that might mean to use) should be stored or encoded in whatever "internal" format makes sense. The format chosen is typically designed to minimize encoding/decoding steps and to prevent data loss.
Then, upon output, data is encoded using whatever output format makes sense. Preventing data loss is important, but also proper escaping and formatting is key here.
So for example, with internal APIs, data in binary format may be sufficent. But when you output JSON or HTML or XML or PDF, you have to encode and escape your data appropriately to fit the output format.
The important point here is that different output formats have different concepts of "safe". What's "safe" for HTML may not be safe for JSON, and what's safe for JSON may not be safe for SQL. Data is encoded upon output specifically so that you can use the proper encoding for the task. You cannot assume that this step is done for you ahead of time, nor should you put your output function in the position to determine whether or not encoding must be done. If you stick with the rule: "output function ALWAYS encodes for safety", then you will never have to worry about data injection attacks.
I would say that the two important points are the following:
The encoding used by the provider MUST be specified with extreme clarity and precision in a reference document, so that all consumer implementors can know what to expect.
Whatever default encoding is used by the provider MUST keep all needed information, i.e. still be amenable to transcoding by any consumer who would wish to do it.
If you follow these two rules then you will have done 95% of the job for reliability and security.
As for your specific question, a good practice is a middle-ground: the provider follows by default a "generic" encoding, but consumers can ask (optionally) for a specific encoding which the provider may then apply -- this allows the provider to support a number of dumb, lightweight clients of possibly different kinds and can be extended later on with extra encodings without breaking the API.
I firmly believe it is both the consumer and the provider that need to do their part in being good citizens in the security space.
As the provider I want to make sure I deliver a secure product. I don't need to know the context in which my client is going to use my product, all I need to know is how I am going to deliver it. If my delivery is in JSON, then I can use that context to escape my data before sending it off, similarly for XML, plain text, etc. Further more there are transport methods that aid in security already. JSONP is one such delivery method. This ensures the payload is consumed appropriately.
As the consumer, which by the way in our environment no one is the final consumer, we are all providers to the final end client (the end users via a web browser mostly.). Because of this we have to also secure the data at this end. I would never trust a black box API to do this job for me, I would always make a point to ensure a secure payload. There are many tools out there, the ESAPI project from OWASP comes to mind, that will aid in the sanitization by context of data. Remember that you are eventually sending this data on to the end-user (browser) and if there is something awry you won't be able to pass the buck. Your service will be viewed as the vulnerable one regardless of where the flaw lies. Additionally, as the consumer, you may not always be able to rely on the black box provider to fix their flaws in a timely fashion. What if their support is lacking or they have higher priorities. Does that mean you continue to provide a known flaw to your end-users?
Security is about layers, and having safeguards at the source and end-points is always preferable.

the best approaches for logging localization using c++

I am working on a multinational project where target audience for logs might be from two nationalities. Therefore it is becoming important to log in more than one language , I am thinking about writing to 2 different log folders based on language every time I am logging something, but I am also wondering if there's some out of the box functionality that is coming along with logging frameworks like log4cpp?
As other commenters have mentioned, it sounds like you are going down the wrong track by looking to do multilingual logging.
My recommendation would be to use English (which is the standard for technical information, and which I guess is the language you know best) and to make sure that the language you use is clear, grammatically correct and unambiguous. Then if one of the technicians cannot understand it, they can very easily and efficiently run it through a machine translation engine such as Google Translate. Or indeed they could process the logs and run everything through Google Translate to append translated text, particularly if you annotate the logs to mark the language content.
Assuming that the input language is well-written, machine transation usually gives a good result which the end user can understand. If the message isn't clear, has typos or abbreviations, then that's where machine translation fails spectacularly.
Writing log naturally brings down the speed of execution due to file open, seek and write operations involved as part of it.
This is one primary reason why many developer and architects suggest to write log at different levels.Increasing the depth of log entries as level increases to trace down the problems better. At higher level, you will notice that your process speed drops due to more log entries getting generated.
Rather suggest you to use services that can translate from one language to other.
I'm sure there are libraries free or paid which does this translation. You can create a small utility program that runs in the background and does this conversion during process idle time.
Well one suggestion is you can use a different process/thread which listens for your log messages, which you can log it from there ..
This reduces I/O logging time in your main process/thread and you can make all changes related to Logging language over there..
For multi - Lingual support I think you can try writing with widechar string .. though I am not sure..
the best approaches for logging localization using c++
Install Qt 4 and use QObject::tr/ tr() macro for strings. Write strings in whatever language you want. Hire/Get a translator to localize strings using QT Linguist.
Please note that perfect translation is impossible, so there will be many "amusing" misunderstandings, even if your translator is a genius. So it might be a better idea to select main language for programming team.
--EDIT--
Didn't notice this part before:
in more than one language
One way to approach it is to implement log reader. Instead of writing plaintext messages, you could dump message ids (generated by some kind of macros) and string arguments if strings are formatted. "Log reader" will allow user to select desired language while viewing log file, and translate messages based on their ids/arguments using mechanism similar to QTranslator. The good thing about this approach is that you'll be able to add more languages later - so it'll be possible to retranslate old logs. The bad thing is that this format will be harder to read for "normal human", although you can add plaintext messages in addition to message ids and arguments and you'll need to write log viewer.
Qt 4 has most of this framework implemented (there are routines for dumping variants into text/data streams, and so on) along with translation tool. See QTranslator documentation and Linguist manual for more info.

How to safely store strings (i.e. password) in a C++ application?

I'm working on a wxWidgets GUI application that allows the user to upload files to an FTP server and a pair of username/password is required to access the FTP server.
As far as I know, STL strings or even char* strings are visible to end user even the program is compiled already, using hex editors or maybe string extractors like Sysinternals String Utility.
So, is there a safe/secure way to store sensitive informations inside a C++ application?
PS. I cannot use .NET for this application.
This is actually independent of the programming language used.
FTP is a protocol that transfers its password in plain text. No amount of obfuscation will change that, and an attacker can easily intercept the password as it is transmitted.
And no amount of obfuscation, no matter the protocol used, will change the fact that your application has to be able to decode that password. Any attacker with access to the application binary can reverse-engineer that decoding, yielding the password.
Once you start looking at secure protocols (like SFTP), you also get the infrastructure for secure authentication (e.g. public/private key) when looking at automated access.
Even then you are placing the responsibility of not making that key file accessable to anyone else on the file system, which - depending on the operating system and overall setup - might not be enough.
But since we're talking about an interactive application, the simplest way is to not make the authentication automatic at all, but to query the user for username and password. After all, he should know, shouldn't he?
Edit: Extending on the excellent comment by Kate Gregory, in case that users share a common "technical" (or anonymous) account accessing your server, files uploaded by your app should not be visible on the server before some kind of filtering was done by you. The common way to do this is having an "upload" directory where files can be uploaded to, but not be downloaded from. If you do not take these precautions, people will use your FTP server as turntable for all kind of illegal file sharing, and you will be the one held legally responsible for that.
I'm not sure if that is possible at all, and if, than not easy. If the password is embedded and your program can read it, everybody with enough knowledge should be able to do.
You can improve security against lowlevel attempts (like hexeditor etc.) by encrypting or obfuscating (eg two passwords which generate the real password by XOR at runtime and only at the moment you need it).
But this is no protection against serious attacks by experienced people, which might decompile you program or debug it (well, there are ways to detect that, but it's like cold-war - mutual arms race of debugging-techniques against runtime-detection of these).
edit: If one knows an good way with an acceptable amount of work to protect the data (in c++ and without gigantic and/or expensive frameworks), please correct me. I would be interested in that information.
While it's true that you cannot defend against someone who decompiles your code and figures out what you're doing, you can obscure the password a little bit so that it isn't in plain text inside the code. You don't need to do a true encryption, just anything where you know the secret. For example, reverse it, rot13 it, or interleave two literal strings such as "pswr" and "asod". Or use individual character variables (that are not initialized all together in the same place) and use numbers to set them (ascii) rather than having 'a' in your code.
In your place, I would feel that snooping the traffic to the FTP server is less work than decompiling your app and reading what the code does with the literal strings. You only need to defeat the person who opens the hex and sees the strings that are easily recognized as an ID and password. A littel obscuring will go a long way in that case.
As the others said, storing a password is never really save but if you insist you can use cryptlib for encryption and decryption.
Just a raw idea for you to consider.
Calculate the md5 or SHA-2 of your password and store it in the executable.
Then do the same for input username/password and compare with stored value.
Simple and straightforward.
http://en.wikipedia.org/wiki/MD5
http://en.wikipedia.org/wiki/SHA-2

Allowing code snippets in form input while preventing XSS and SQL injection attacks

How can one allow code snippets to be entered into an editor (as stackoverflow does) like FCKeditor or any other editor while preventing XSS, SQL injection, and related attacks.
Part of the problem here is that you want to allow certain kinds of HTML, right? Links for example. But you need to sanitize out just those HTML tags that might contain XSS attacks like script tags or for that matter even event handler attributes or an href or other attribute starting with "javascript:". And so a complete answer to your question needs to be something more sophisticated than "replace special characters" because that won't allow links.
Preventing SQL injection may be somewhat dependent upon your platform choice. My preferred web platform has a built-in syntax for parameterizing queries that will mostly prevent SQL-Injection (called cfqueryparam). If you're using PHP and MySQL there is a similar native mysql_escape() function. (I'm not sure the PHP function technically creates a parameterized query, but it's worked well for me in preventing sql-injection attempts thus far since I've seen a few that were safely stored in the db.)
On the XSS protection, I used to use regular expressions to sanitize input for this kind of reason, but have since moved away from that method because of the difficulty involved in both allowing things like links while also removing the dangerous code. What I've moved to as an alternative is XSLT. Again, how you execute an XSL transformation may vary dependent upon your platform. I wrote an article for the ColdFusion Developer's Journal a while ago about how to do this, which includes both a boilerplate XSL sheet you can use and shows how to make it work with CF using the native XmlTransform() function.
The reason why I've chosen to move to XSLT for this is two fold.
First validating that the input is well-formed XML eliminates the possibility of an XSS attack using certain string-concatenation tricks.
Second it's then easier to manipulate the XHTML packet using XSL and XPath selectors than it is with regular expressions because they're designed specifically to work with a structured XML document, compared to regular expressions which were designed for raw string-manipulation. So it's a lot cleaner and easier, I'm less likely to make mistakes and if I do find that I've made a mistake, it's easier to fix.
Also when I tested them I found that WYSIWYG editors like CKEditor (he removed the F) preserve well-formed XML, so you shouldn't have to worry about that as a potential issue.
The same rules apply for protection: filter input, escape output.
In the case of input containing code, filtering just means that the string must contain printable characters, and maybe you have a length limit.
When storing text into the database, either use query parameters, or else escape the string to ensure you don't have characters that create SQL injection vulnerabilities. Code may contain more symbols and non-alpha characters, but the ones you have to watch out for with respect to SQL injection are the same as for normal text.
Don't try to duplicate the correct escaping function. Most database libraries already contain a function that does correct escaping for all characters that need escaping (e.g. this may be database-specific). It should also handle special issues with character sets. Just use the function provided by your library.
I don't understand why people say "use stored procedures!" Stored procs give no special protection against SQL injection. If you interpolate unescaped values into SQL strings and execute the result, this is vulnerable to SQL injection. It doesn't matter if you are doing it in application code versus in a stored proc.
When outputting to the web presentation, escape HTML-special characters, just as you would with any text.
The best thing that you can do to prevent SQL injection attacks is to make sure that you use parameterized queries or stored procedures when making database calls. Normally, I would also recommend performing some basic input sanitization as well, but since you need to accept code from the user, that might not be an option.
On the other end (when rendering the user's input to the browser), HTML encoding the data will cause any malicious JavaScript or the like to be rendered as literal text rather than executed in the client's browser. Any decent web application server framework should have the capability.
I'd say one could replace all < by <, etc. (using htmlentities on PHP, for example), and then pick the safe tags with some sort of whitelist. The problem is that the whitelist may be a little too strict.
Here is a PHP example
$code = getTheCodeSnippet();
$code = htmlentities($code);
$code = str_ireplace("<br>", "<br>", $code); //example to whitelist <br> tags
//One could also use Regular expressions for these tags
To prevent SQL injections, you could replace all ' and \ chars by an "innofensive" equivalent, like \' and \, so that the following C line
#include <stdio.h>//'); Some SQL command--
Wouldn't have any negative results in the database.