Web Services Design Question - Logging messages


We had a debate in the office with respect to audit logging of messages received and sent via Web Services.
I am of the opinion that the entire SOAP message should not be logged in the application audit logs unless a requirement explicitly calls for it. Only the salient elements of the request need to be part of the audit log, as they provide the evidence required in the audit trail.
My reasons are:
(1) Audit logs by definition are always turned on and should not be turned off. So if we decide to log the entire message for the audit trail, full-message logging will always be on, which can cause a huge performance impact during production runs (particularly during peak loads).
(2) If the business/technical requirements do not explicitly call for it, this is unnecessary overhead. If the information is needed, the runtime engine's tracing capability can be turned on or off to capture the SOAP messages.
What are the general thoughts of experts in this space?
Thanks,
Manglu
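One way to read the split proposed in the question, sketched in Python: an always-on audit logger receives only a few extracted fields, while the full SOAP envelope goes to a separate trace logger that stays silent unless someone enables DEBUG on it. The element names in `SALIENT_FIELDS`, the logger names and the helper functions are hypothetical; which fields count as salient is exactly the business decision the answers below discuss.

```python
import logging
import xml.etree.ElementTree as ET

# Hypothetical element names; a real service would choose these from its own schema.
SALIENT_FIELDS = ("MessageId", "CustomerId", "Operation")

audit_log = logging.getLogger("audit")       # always on (give it an INFO handler in production)
trace_log = logging.getLogger("soap.trace")  # silent unless DEBUG is enabled on it

def salient_fields(soap_xml: str) -> dict:
    """Extract only the audit-relevant elements, ignoring XML namespaces."""
    root = ET.fromstring(soap_xml)
    found = {}
    for elem in root.iter():
        local = elem.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" prefix
        if local in SALIENT_FIELDS and local not in found:
            found[local] = (elem.text or "").strip()
    return found

def record_request(soap_xml: str) -> dict:
    fields = salient_fields(soap_xml)
    audit_log.info("request %s", fields)
    # Check the level first so a large message is not formatted when tracing is off.
    if trace_log.isEnabledFor(logging.DEBUG):
        trace_log.debug("full message: %s", soap_xml)
    return fields
```

Turning tracing on for an investigation is then a configuration change such as `logging.getLogger("soap.trace").setLevel(logging.DEBUG)` plus a handler, with no code change.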

Don't confuse auditing with logging. If there is a requirement for auditing then you need to perform auditing.
Since auditing is typically required for legal or policy reasons you need to understand what actions and activities need to be logged as well as what data needs to be logged. This is not a technical decision but needs to be determined by the business. Once you have your requirements then you can project your audit volumes and design your application to take these into account (e.g. performance, storage, etc.).
If you think you have an auditing requirement but it is not explicitly stated then ask for clarification. You don't want to find this out only after you have been sued.
If you truly have an auditing requirement, then you should probably audit the entire SOAP request message as well as the response. This is to support non-repudiation.
As an example, let's say that you have a health care application and only audit the key information: personal identifiers (e.g. SSN) and whether the patient is allergic to penicillin. But what happens when a patient dies because "is allergic to penicillin" was false when it should have been true? The audit logs are checked and you say that you were sent a value of false for that patient, but the other system says that they actually sent you a value of true and that you must have a problem with your system. In this scenario you need to show the exact message that was sent to the web service and, because it was signed by the service consumer, prove both that it came from them and that the data in the message is exactly what they sent. Then you would follow that information through your system via the audit logs.
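A minimal sketch of keeping the exact message for such a dispute, assuming the raw bytes are available before any parsing. The function names and record layout are hypothetical. The SHA-256 digest only shows that the stored copy has not changed since it was recorded (and only if the digest is kept somewhere the copy cannot be silently rewritten); proving who sent the message still depends on the consumer's signature, which is why the bytes are stored verbatim rather than re-serialized.

```python
import hashlib
from datetime import datetime, timezone

def audit_record(direction: str, raw_message: bytes) -> dict:
    """Build an audit entry that keeps the exact bytes received or sent."""
    return {
        "direction": direction,  # e.g. "request" or "response"
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(raw_message).hexdigest(),
        "message": raw_message.decode("utf-8"),  # verbatim text, not a re-serialized copy
    }

def verify(record: dict) -> bool:
    """True if the stored message still matches the digest taken when it was recorded."""
    body = record["message"].encode("utf-8")
    return hashlib.sha256(body).hexdigest() == record["sha256"]
```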
Of course, it all goes back to the requirements; if the business finds that only auditing x and y satisfies whatever legislation or policies then go with that.

I know from experience that logging it all can lead to pretty huge files, or a lot of data if kept in a database. It's very helpful during development, but in production it becomes a problem. I would suggest logging as you said. But be aware of a situation I came across: we were providing a web service for third-party companies to use. When there was a dispute about whose fault an error was, we needed the exact SOAP message to prove that it wasn't our fault. I don't know if this scenario applies to you.

Related

Does SNS retain my data?

I am evaluating push notification services and cannot use services on the cloud as laws prohibit customer identification data being stored off-premise.
Question
Is there any chance data will be stored off-premise if I use the AWS SNS API (not the console) to send push notifications to end-user devices from code hosted on-premise (using the AWS SDK)? In other words, will SNS retain my data, or will it forget it right after it sends the notification?
What have I tried so far?
Combed through the documentation as much as I could, but couldn't find anything to be 100% sure.
Would appreciate any pointers on this. TIA.

I would pose this question directly to AWS, as it pertains to a legal requirement. I would clarify whether the laws you need to comply with relate to data at rest, data in transit, or both. Additionally, ask whether there are any circumstances in which either would be acceptable if certain security requirements were met.
Knowing no real detail about your use case I will say that AWS has a Region specifically for use by the US Government. If your solution is for the US Government then you should be making use of this Region as it ticks off a lot of compliance forms for you well in advance.
You can open a support ticket in the AWS console.
Again if there is a legal requirement for your data I thoroughly recommend that you ask AWS directly so that you may reference their answer in writing in the future.

Even if they didn't store it, how can you prove that to auditors?
Besides, what is the difference between storing something in memory (which they obviously have to do) and storing something on disk? One is volatile and the other isn't I guess. But from a compliance point of view, an admin on the box can get both, so who cares if the hardware with your data on it is a stick of RAM or a disk plugged into a SATA port?

How to relieve a rate-limited API?

We run a website that relies heavily on the Amazon Product Advertising API (APAA). When we experience a sudden spike in users, we hit the rate limit and all functions relying on the APAA shut down for a while. What can we do to prevent that?
So, obviously we have some basic caching in place, but the APAA doesn't allow us to cache data for a very long time, and APAA queries can vary a lot so there may not be any cached data at all to query.

I think that your only option is to retry the API calls until they work — but do so in a smart way. Unfortunately, that's what everybody that gets throttled does and AWS expects people to handle that themselves.
You can implement exponential backoff and add jitter to prevent clustered calls. AWS has a great blog post about solutions for this kind of problem: https://www.awsarchitectureblog.com/2015/03/backoff.html
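A sketch of the "full jitter" variant described in that post, where each retry waits a random time between zero and an exponentially growing cap. `ThrottledError` is a hypothetical stand-in for however your API client reports a throttling response, and the defaults for `base`, `cap` and `max_attempts` are arbitrary assumptions to tune.

```python
import random
import time

class ThrottledError(Exception):
    """Hypothetical: raised by the API client when the rate limit is hit."""

def call_with_backoff(call, max_attempts=5, base=0.5, cap=30.0, sleep=time.sleep):
    """Retry `call` on ThrottledError using exponential backoff with full jitter.

    The wait before retry n is uniform in [0, min(cap, base * 2**n)], so clients
    that were throttled at the same moment spread their retries out.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller degrade gracefully
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

The `sleep` parameter exists only so the delays can be observed in tests; in production the default `time.sleep` is used.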

Web services, architectural design advice for central logging

We have a number of SOAP and REST web services that provide legal information to clients. Management requires that all information requested through these services be logged; they want to use the logs to collect statistics and bill clients.
My colleague proposed using a central relational database for logging.
I don't like this solution, because the number of services is growing and I think such an architecture will become a performance bottleneck.
Can you advise me on what architectural design would be good for this kind of task?

When you say the central database will be a bottleneck, do you mean that it will be too slow to keep up with the logging requests? Or are you saying you are expecting database changes for each logging request type?
I would define a more generic payload for logging (figure out your minimum standardized fields), and then create a database for those logs.
<log><loglevel>INFO</loglevel><systemName>ClientValueActualizer</systemName><userIp>123.123.123.43</userIp><logpayload><![CDATA[useful payload for billing]]></logpayload></log>
If you are worried about capacity, you could throw a queue in front of it, which would have the advantage of not bogging down the client if the logs are busy.
You can decouple the consumption of these messages into separate systems, each of which can understand the various payloads. The risk here is that if you want to add new attributes, it will be difficult to control which systems are sending what. But that's just a general issue with decoupled services.
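A minimal in-process sketch of the queue-in-front idea, assuming Python's `queue.Queue` stands in for a real message broker and `sqlite3` stands in for the central relational database. The field names mirror the XML example above; `log_event`, `writer` and `start_writer` are hypothetical names. Services only enqueue, so a slow database delays the writer, not the callers.

```python
import queue
import sqlite3
import threading

# Minimal standardized fields; "payload" carries service-specific data (e.g. billing info).
FIELDS = ("loglevel", "system_name", "user_ip", "payload")

log_queue = queue.Queue()

def log_event(**record) -> None:
    """Called by each service: enqueue the record and return immediately."""
    log_queue.put({field: record.get(field) for field in FIELDS})

def writer(conn: sqlite3.Connection) -> None:
    """Single consumer that drains the queue into one log table until it sees None."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log "
        "(loglevel TEXT, system_name TEXT, user_ip TEXT, payload TEXT)"
    )
    while True:
        record = log_queue.get()
        if record is None:  # sentinel value: stop the writer
            break
        conn.execute(
            "INSERT INTO log VALUES (?, ?, ?, ?)",
            tuple(record[field] for field in FIELDS),
        )
        conn.commit()

def start_writer(conn: sqlite3.Connection) -> threading.Thread:
    """Run the writer on its own thread; open conn with check_same_thread=False."""
    thread = threading.Thread(target=writer, args=(conn,), daemon=True)
    thread.start()
    return thread
```

In a real deployment the queue would be durable and the writer could batch inserts; both are left out to keep the sketch short.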

You can consider Apache Kafka as a distributed commit log. It is good performance-wise, as it scales out horizontally, and consumers pull messages when they are ready rather than having messages pushed to them.
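To illustrate the pull model, here is a toy in-memory version of the commit-log idea: producers append, and each consumer group (for example billing and statistics) reads from its own offset at its own pace. This is not Kafka's API; real Kafka adds partitioning, replication, durable storage and retention on top of the same basic idea. `CommitLog` and its methods are hypothetical.

```python
class CommitLog:
    """Toy append-only log: producers append, each consumer group pulls from its own offset."""

    def __init__(self):
        self._entries = []
        self._offsets = {}  # consumer group name -> next offset to read

    def append(self, entry) -> int:
        """Add an entry and return its offset."""
        self._entries.append(entry)
        return len(self._entries) - 1

    def poll(self, group: str, max_records: int = 100) -> list:
        """Return up to max_records unread entries for this group and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._entries[start:start + max_records]
        self._offsets[group] = start + len(batch)  # "commit" the new position
        return batch
```

Because the log itself never changes when a group reads, a slow billing consumer does not hold back the statistics consumer, and a new consumer can start from offset 0 to replay history.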

Facebook Graph API-Account suspension

I have a .NET application that uses a list of names/email addresses and finds their matches on Facebook using the Graph API. During testing, my list had 900 names... I was checking Facebook matches for each name in a loop... The process completed... After that, when I opened my Facebook page, it gave me a message that my account had been suspended due to suspicious activity.
What am I doing wrong here? Doesn't Facebook allow a large number of search requests to their server? And 900 doesn't seem to be a big number either.

Per the platform policies (https://developers.facebook.com/policy/), this may be a suspected breach of their "Principles" section.
See Policies I.5
If you exceed, or plan to exceed, any of the following thresholds please contact us by creating confidential bug report with the "threshold policy" tag as you may be subject to additional terms: (>5M MAU) or (>100M API calls per day) or (>50M impressions per day).
Also IV.5
Facebook messaging (i.e., email sent to an @facebook.com address) is designed for communication between users, and not a channel for applications to communicate directly with users.
Then the biggie, V. Enforcement: no surprise, it's both automated and monitored by humans. So it may have flagged the 900+ requests coming from your app.
What I'd recommend doing:
Storing what you can client side (in a cache or data store) so you make fewer calls to the API.
Put logging on your API calls so you, the developer, can see exactly what is happening. You might be surprised at what you find there.
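Both suggestions can be combined in a small wrapper, sketched here in Python under the assumption that `fetch` is whatever function actually performs the remote lookup. The class name and the one-hour default TTL are hypothetical; pick a TTL that the platform's terms allow. Counting `remote_calls` makes it easy to see how many requests really leave your application during a batch run like the 900-name test.

```python
import logging
import time

log = logging.getLogger("api.calls")

class CachedApiClient:
    """Wrap a lookup function with a time-limited cache and per-call logging."""

    def __init__(self, fetch, ttl_seconds=3600.0, clock=time.monotonic):
        self._fetch = fetch        # hypothetical: performs the real API request
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}           # key -> (expires_at, value)
        self.remote_calls = 0      # requests that actually went to the API

    def get(self, key):
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            log.debug("cache hit for %r", key)
            return hit[1]
        self.remote_calls += 1
        log.info("API call #%d for %r", self.remote_calls, key)
        value = self._fetch(key)
        self._cache[key] = (now + self._ttl, value)
        return value
```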

Web Service Implementation Changes

To what degree should web service providers limit implementation changes without creating a new service version? One view is that as long as the contract is upheld, the service owner should be free to update the implementation as needed. Schemas are not always airtight, and it is foreseeable that changes within the service implementation affect the service output while still upholding the contract.
To what degree should consumers be notified of implementation changes? It's one thing to notify consumers of updates to your own web service implementation. How feasible is it to track implementation changes to all downstream dependencies? Should service owners create a new version when they know that a change may affect consumers? And try to be a good citizen and notify consumers of all other changes?
Lots of questions and I doubt there is one size fits all answer. It could just depend on the situation. Maybe this is what SLAs are for.

Good questions, and I think you've already answered them. Yes, these details would be in an SLA, and if the contract/WSDL is the same, why would the service need to notify its consumers? Unless, of course, changes to the service impact response times and performance. Maybe the service would notify consumers when another contract is introduced (in addition to the original). Consumers then become aware of any new capabilities and can adjust their clients accordingly if desired.

I'm in an environment where SLAs don't exist for internal clients, so absent an SLA, the following are some common-sense guidelines:
Attempt to limit number of modifications to services
Communicate service implementation releases so consumers can plan test cycles
Provide consumers with the list of direct downstream dependencies and location to find their schedules and release notes
Consider a new version if an implementation change will semantically affect consumers

A lot depends on your specific circumstances. Speaking generally, here are a few top considerations.
The service contract and schema are all that a service and client share in common. A service implementation change that does not change the contract or schema (e.g., fixing a bug in the implementation logic) should not necessitate notifying the clients, nor should it be considered a new version.
OTOH, if you have a poorly constructed, overly-loose contract, such as passing all of the data as one big string, where the client had to do extensive interpretation to consume the service, and now you're looking to exploit that overly-loose contract in a way that would likely break the client, you owe it to all parties to change the contract (and improve it!) and publish that as a new version of the service.
Since services are often used to enable loose coupling between services, it is sometimes not practical or even possible to identify all of the clients of a service. Producing a new version of a service in these situations often entails maintaining multiple versions of a service for some period of time, often as directed by some governance body.
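A sketch of running two contract versions side by side while the old one is phased out, assuming a simple in-process router keyed on a version string. All names and the sample data are hypothetical. v1 keeps its loose single-string format for existing clients, v2 exposes named fields, and both share one implementation so a bug fix lands in both without changing either contract.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CustomerV2:
    """v2 contract: named, typed fields instead of one packed string."""
    customer_id: str
    name: str
    city: str

def _load_customer(customer_id: str) -> CustomerV2:
    # Hypothetical shared implementation; a real service would query its data store.
    return CustomerV2(customer_id=customer_id, name="Ada Lovelace", city="London")

def get_customer_v1(customer_id: str) -> str:
    # v1 keeps its original loose format so existing clients keep working.
    c = _load_customer(customer_id)
    return f"id={c.customer_id};name={c.name};city={c.city}"

def get_customer_v2(customer_id: str) -> CustomerV2:
    return _load_customer(customer_id)

HANDLERS = {"v1": get_customer_v1, "v2": get_customer_v2}

def dispatch(version: str, customer_id: str):
    """Route a request to the contract version the client asked for."""
    handler = HANDLERS.get(version)
    if handler is None:
        raise ValueError(f"unsupported service version: {version!r}")
    return handler(customer_id)
```

Retiring v1 later, as directed by whatever governance body applies, is then a matter of removing one entry from `HANDLERS`.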
Providing details about service implementations, implementation dependencies, etc., encourages creating tight coupling by disclosing non-contract related details that the client may then take a dependency on. That can limit the ability of the service to change independently of the client.
The book Web Service Contract Design and Versioning for SOA by Thomas Erl is a good resource on the topic and details several common scenarios.
