I am using the Google Admin token audit logs, and they have not been returning data even though the events are occurring. This has been going on for a while. Is this a known issue with the audit API?
You may refer to the Admin console audit log support page. You may find that your Admin console reports and audit logs are not fully populated with the latest data. Note that reports do not reflect real-time data, and some reports take longer than others to display updated information. There are specific lag times before collected data becomes available.
It is also stated that retrieving report or audit log data for very old dates or large time ranges can take so long that, by the time the results are available, the most recent log entries are no longer fresh. For applications that require near-real-time log monitoring, use a small time range.
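As an illustration of the small-time-range approach, here is a minimal sketch that queries token audit events for only the last 10 minutes using the Admin SDK Reports API Python client; the service-account file and the delegated admin address are placeholders you would replace with your own.

from datetime import datetime, timedelta, timezone

from google.oauth2 import service_account
from googleapiclient.discovery import build

# Placeholder credentials: a service account with domain-wide delegation,
# impersonating an admin user.
SCOPES = ["https://www.googleapis.com/auth/admin.reports.audit.readonly"]
creds = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES
).with_subject("admin@example.com")

reports = build("admin", "reports_v1", credentials=creds)

# Query only the last 10 minutes so the results stay close to real time.
start_time = (datetime.now(timezone.utc) - timedelta(minutes=10)).isoformat()

response = reports.activities().list(
    userKey="all",
    applicationName="token",  # the token audit log
    startTime=start_time,
    maxResults=100,
).execute()

for item in response.get("items", []):
    print(item["id"]["time"], item["actor"].get("email"))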
I am currently working on a distributed crawling service, and while building it I have a few issues that need to be addressed.
First, let me explain how the crawler works and the problems that need to be solved.
The crawler needs to save all posts on every bulletin board of a particular site.
To do this, it automatically discovers crawling targets and publishes several messages to Pub/Sub. A message looks like this:
{
  "boardName": "test",
  "targetDate": "2020-01-05"
}
When a message is published, a Cloud Run service is triggered and crawls the data described by the JSON payload.
However, if the same message is published more than once, the same data is crawled again and duplicates are created. How can I ignore a message when an identical one has already been processed?
Also, are there Pub/Sub features, or other services, I can use for a stable implementation of a distributed crawler?
Because Pub/Sub is designed, by default, to deliver messages at least once, it's better to have idempotent processing. (Exactly-once delivery is coming.)
In any case, your two situations are very similar: receiving the same message twice, or receiving two different messages with the same content, causes the same issue. There is no magic feature in Pub/Sub for that. You need an external tool, such as a database, to store what has already been received.
Firestore/Datastore is a good, serverless place for that. If you need low latency, Memorystore, with its in-memory database, is the fastest.
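For example, here is a minimal sketch of idempotent processing with Firestore, assuming the handler derives a deduplication key from the message content; the collection name and the crawl_board function are placeholders, not part of any existing API.

from google.api_core.exceptions import AlreadyExists
from google.cloud import firestore

db = firestore.Client()

def handle_message(payload: dict):
    # Build a deterministic key from the message content so that two messages
    # with the same content map to the same Firestore document.
    dedup_key = f"{payload['boardName']}_{payload['targetDate']}"
    doc_ref = db.collection("processed_messages").document(dedup_key)

    try:
        # create() fails if the document already exists, so only the first
        # delivery of this content gets past this point.
        doc_ref.create({"payload": payload})
    except AlreadyExists:
        print(f"Skipping duplicate message: {dedup_key}")
        return

    crawl_board(payload["boardName"], payload["targetDate"])  # placeholder crawl step

def crawl_board(board_name, target_date):
    ...  # your actual crawling logic

Using create() rather than a read-then-write check avoids a race between two instances that receive the same content at the same time.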
I have a Books API project, and the GCP console shows "No data is available for the selected time frame" for the last 30 days. This message appears on both the "Metrics" and "Quotas" pages. See screenshots below.
Clearly there is data, which I can see via my app analytics reports.
Any suggestions on how to fix it?
UPDATE 1:
Here are some points that were missing from the original post:
The Google Books API is used by an iOS app, which is available on the App Store and widely used across many iOS devices (iPhones and iPads) in many countries.
There are thousands of iOS devices running my app, so the Google Books API calls are invoked from thousands of endpoints with different locations and different IPs. All endpoints use the same API_KEY.
The Google Books API calls are performed successfully from the iOS devices and there is no API issue (I can clearly see that using an analytics tool).
The only issue I have is with the GCP console not showing the number of API calls (and other metrics) associated with my API_KEY. As you can see in the previous screenshots, I get "No data is available for the selected time frame" everywhere.
This is a regression, since until recently I could view the actual API usage data. I didn't change anything during this period.
When going to GCP > IAM & Admin > Quotas, you can clearly see that the app indeed consumes API calls (see screenshot below).
Any suggestion as to why the GCP console would say that no data is available, while data is indeed available?
As stated in the documentation [1], Google Books respects copyright, contract, and other legal restrictions associated with the end user's location. As a result, some users might not be able to access book content from certain countries. For example, certain books are "previewable" only in the United States; such preview links are omitted for users in other countries. Therefore, the API results are restricted based on your server's or client application's IP address.
On the other hand, link [2], which seems similar to the issue you are facing, may be helpful. Documentation [3] and [4] also provides more information about using the Books API on Google Cloud Platform.
[1] https://developers.google.com/books/docs/v1/using#UserLocation
[2] Google books api always returns nothing
[3] https://developers.google.com/books/docs/v1/using
[4] https://developers.google.com/books/docs/v1/getting_started
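As a side note on the location restrictions described in [1], the Books API volumes endpoint accepts a country parameter, so you can make the intended market explicit when testing; a minimal sketch, with the query and API key as placeholders:

import requests

API_KEY = "YOUR_API_KEY"  # placeholder

resp = requests.get(
    "https://www.googleapis.com/books/v1/volumes",
    params={"q": "flowers", "key": API_KEY, "country": "US"},
)
resp.raise_for_status()
for item in resp.json().get("items", []):
    print(item["volumeInfo"]["title"])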
Suppose I embed the Stackdriver client library in my application and the Google Stackdriver API has downtime (Google documentation indicates 99.95% availability, i.e. up to 21.92 minutes of downtime per month).
My question is: what will happen to my application during the downtime? Will logging info build up in memory? Will it cause application errors, or will the log data be discarded and the application continue on?
Logging API downtimes can have different root causes and consequences. Google System Engineers have mechanisms in place to track them and take mitigation actions so that the downtime and its consequences are minimal, but Google cannot guarantee that no data is lost in every outage related to the Logging API.
Hopefully your application and pipeline can withstand the expected downtime of up to 21.56 minutes a month (99.95% SLA), as per the internal SLOs and SLAs of GCP.
The three scenarios you listed are all plausible. During such a period, your application may receive 500 responses when sending logs, so it has to be able to deal with that kind of failure.
If the logging data manages to reach Google's platform but an outage prevents it from being accessible, then Google's team will do their best to release backlogs, repopulate data, etc. They will post a general notice on https://status.cloud.google.com/
If the issue is caused by the logging agent not sending data to the platform, then the logging data may not be retrievable. It could still be an infrastructure outage with one of the GCP products, or it could be linked to something other than an outage, such as your application or its underlying host running out of resources, or the logging agent being corrupted; the latter cases are not covered by the GCP Stackdriver SLA [1].
If the pipeline that ingests data from the Logging API is backlogged, it could cause an outage, but the GCP team will do their best to make the data accessible after the outage ends.
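As an illustration of dealing with the transient 500 responses mentioned above, here is a minimal sketch of a defensive wrapper around a log-shipping call; send_to_stackdriver is a placeholder for whatever client call your application makes, and the retry-then-fall-back policy is an assumption, not a built-in Stackdriver feature.

import logging
import time

local_fallback = logging.getLogger("local_fallback")  # writes to stderr by default

def ship_log_entry(entry, send_to_stackdriver, max_retries=3):
    # Try to send a log entry; retry with backoff, then fall back locally.
    # send_to_stackdriver is a placeholder callable that raises on HTTP 5xx.
    for attempt in range(max_retries):
        try:
            send_to_stackdriver(entry)
            return True
        except Exception:  # e.g. a 500 response during an API outage
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s
    # Keep the application running and preserve the entry locally instead of
    # letting it build up in memory or crash the process.
    local_fallback.error("Could not ship log entry, keeping locally: %s", entry)
    return False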
If you suspect the Logging API is malfunctioning, please contact support, file a report on the issue tracker, or inspect the open issues, where Google's product team provides live updates. Links below:
[1] SLA exclusions: https://cloud.google.com/stackdriver/sla#sla_exclusions
[2] Create a new incident: https://issuetracker.google.com/issues/new?component=187203&template=0
[3] Open issues: https://issuetracker.google.com/savedsearches/559764
I operate a number of content websites that have several million user sessions and need a reliable way to monitor some real-time metrics on particular pieces of content (key metrics being: pageviews/unique pageviews over time, unique users, referrers).
The use case here is for the stats to be visible to authors/staff on the site, as well as to act as source data for real-time content popularity algorithms.
We already use Google Analytics, but this does not update quickly enough (4-24 hours depending on traffic volume). Google Analytics does offer a real-time reporting API, but this is currently in closed beta (I have requested access several times, but no joy yet).
New Relic appears to offer a few analytics products, but they are quite expensive ($149 per 500k pageviews; we have several times that volume).
Other answers I found on Stack Overflow suggest building your own, but those are 3-5 years old. Any ideas?
I've heard some good things about Woopra, and they offer 1.2m page views for the same price as New Relic.
https://www.woopra.com/pricing/
If that's too expensive, the alternative is live-loading your logs and using an Elasticsearch service to read them and extract the data you want, but you will need access to your logs while they are being written.
A service like Loggly might suit you; it lets you "live tail" your logs (view them as they are written), but again there is a cost to that.
Failing that, you could build something yourself, or hire a freelancer to knock something up for you that reads the logs and displays them in a format you recognise.
https://www.portent.com/blog/analytics/how-to-read-a-web-site-log-file.htm
If the metrics that you need to track are limited to the ones you listed (page views, unique users, referrers), you might consider collecting your web server logs and using a log analyzer.
There are several free tools available on the Internet to get real-time statistics out of those logs.
Take a look at www.elastic.co, for example.
Hope this helps!
Google Analytics offers real-time data viewing now, if that's what you want:
https://support.google.com/analytics/answer/1638635?hl=en
I believe their API is now released, as we are looking at incorporating it ourselves!
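For reference, here is a minimal sketch of pulling real-time pageviews per page with the Real Time Reporting API via the Python client; the view (profile) ID and service-account file are placeholders, and the service account must have been granted read access to the view.

from google.oauth2 import service_account
from googleapiclient.discovery import build

# Placeholder service-account credentials with read access to the GA view.
creds = service_account.Credentials.from_service_account_file(
    "ga-service-account.json",
    scopes=["https://www.googleapis.com/auth/analytics.readonly"],
)
analytics = build("analytics", "v3", credentials=creds)

result = analytics.data().realtime().get(
    ids="ga:12345678",       # placeholder view ID
    metrics="rt:pageviews",
    dimensions="rt:pagePath",
    max_results=10,
).execute()

for row in result.get("rows", []):
    print(row)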
If you have access to the web server logs, then you can set up Elasticsearch as the search engine, with Logstash as the log parser and Kibana as the front-end tool for analyzing the data.
For more information, please go through the Elasticsearch site: https://www.elastic.co
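To make that concrete, here is a minimal sketch of querying such a setup for pageview counts per page over the last hour with the Elasticsearch Python client; the index pattern and the @timestamp and request field names are assumptions about how Logstash has indexed the access logs.

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder address

# Pageviews per URL over the last hour, assuming Logstash-style access-log
# documents with "@timestamp" and "request" fields.
body = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-1h"}}},
    "aggs": {
        "by_page": {"terms": {"field": "request.keyword", "size": 20}}
    },
}

result = es.search(index="logstash-*", body=body)
for bucket in result["aggregations"]["by_page"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])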
I have a real time web analytics problem to address, and I'm wondering if some of the WSO2 products might be an appropriate solution.
An ecommerce web site shows pages of products to a browser user, and the web site vendor wants to collect details of which products were viewed in a list, which products were selected from the list for more info, which products were put into the basket, and which products were actually purchased, all in real time. I can use web page tagging to generate logging events for the four states (i.e. in list, view detail, in basket, purchased). The web site vendor wants to see results summarized by product and by rolling time band (e.g. last hour, last 6 hours, last 24 hours, last 72 hours) for the four product states.
As a complete WSO2 newbie, I'm hoping somebody can help with some pointers on how to address this. I've been reading about the BAM module for capturing events. Is that a good place to start? Also, can anybody suggest a good in-memory data store to hold the event data aggregated by event type and rolling time period?
TIA
Yes, BAM is more of a batch-processing, monitoring, and complex-event engine; using it, you can capture data, process it, and then present it. From an architectural point of view, the product-state changes made by the browser user will be captured by the web server and published to the BAM server.
A good place to start is learning about data publishing. Once you define the data to be published [in BAM this is known as a stream definition], you can write a Hive script to process it and present it. You can pump all the data to BAM, use a Hive script to process it and store it in the form you want, and later retrieve and present it.
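Independently of WSO2, here is a minimal sketch of the kind of rolling-time-band aggregation the question asks for, kept in process memory; the event shape (a product ID, one of the four states, and a timestamp) is an assumption based on the description above, and a real deployment would use BAM or a shared store rather than a single process.

import time
from collections import deque

# Rolling windows in seconds: last hour, 6h, 24h, 72h.
WINDOWS = {"1h": 3600, "6h": 21600, "24h": 86400, "72h": 259200}
STATES = {"in_list", "view_detail", "in_basket", "purchased"}

events = deque()  # (timestamp, product_id, state), oldest first

def record_event(product_id, state, ts=None):
    assert state in STATES
    events.append((ts or time.time(), product_id, state))

def summarize(window):
    # Count events per (product, state) within the given rolling window.
    now = time.time()
    # Drop events older than the largest window to bound memory use.
    while events and events[0][0] < now - max(WINDOWS.values()):
        events.popleft()
    cutoff = now - WINDOWS[window]
    counts = {}
    for ts, product_id, state in events:
        if ts >= cutoff:
            counts[(product_id, state)] = counts.get((product_id, state), 0) + 1
    return counts

record_event("sku-123", "view_detail")
record_event("sku-123", "in_basket")
print(summarize("1h"))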