Stackdriver Dashboards eating all our memory - google-cloud-platform

I often have to build Stackdriver dashboards. The browser tab of an ordinary dashboard with six charts clocks in at 1.8 GB (!) in Google Chrome Task Manager.
This would normally be a question for the product team, but next to its sluggish behaviour (e.g. compared to Grafana dashboards) and it crippling machines, this hasn't changed since its initial introduction to GCP years ago.
So my question: How to reduce the load on project members' workstations? Via dashboard design, a specific way of using the dashboards, or maybe browser plugins?

Related

Need help building an uptime dashboard for a distributed system

I have a product for which I would like to create a dashboard to show
its availability/uptime over time and display any outages.
Specifically I am looking for
ability to report historical information on service uptime
provide details on any service outages
The product is running on a fleet of linux servers and connects to a DB running
on a separate instance, also we have some dedicated instances that run nightly
batch jobs. My system also relies on some external services to provide
additional functionality for select customers. There is redis cache also for
caching data for multiple customers.
We replicate all the above setup (application servers, DB, jobs servers, redis
cache etc) into dedicated clusters for large customers. Small customers are put
on one of the shared clusters to keep costs low.
Currently we are running health checks on application servers only and providing
that information in a simple HTML page. This is a go to page for end-users/customers
and support teams.
Since the product is constructed using multiple systems/services our current HTML
page often times says that the system is up and running fine while can be experiencing
issues with some of its components or external services.
Current health check is using a simple HTTP request and looks for a 200
status code, this check runs every minute and we plot this data into a simple
chart to show last 30 days. We also show a list of outages with timestamp and
additional static information that is manually added.
We would like to build a more robust solution that monitors much more than the HTTP port
and where we have more details like what part
of the system is having issues and how those issues are impacting the system and
which customers are impacted.
Appreciate any guidance or help. We prefer to build the solution using
open source tools since we dont have much budget. Goal is to improve things for
my team members who are already overloaded.
I'm not sure if this will be overkill or not for your setup, given that I don't know your product, but have a look at the ELK Stack and see if you can use some components or at least some ideas from there:
What is the ELK Stack?
The Complete Guide to the ELK Stack

What could have happened to my website files on my google cloud platform?

I was using google cloud platform to host a ckan based website. The website had a file library with about 5 gigs of documents. Our project got put on hold and I removed billing on the website about 8 months ago. Now we are trying to migrate the data, but when I look at the project in Google Cloud Platform, there are no compute instances, buckets or files under the various storage modules. I cannot find the 5gigs of files we uploaded and filed by various categories on the ckan website. What could have happened to them? I'm not very experience with this platform and a bit confused. Is there any way for me to recover my data?
As the GCP public documentation indicates, if your billing account remains invalid for a protracted period, some resources might be removed from the projects associated with your account. For example, if you use Google Cloud, your Compute Engine resources might be removed.
Removed resources are not recoverable, the best option you had was to review your case with tech support and have a more specific answer, I noticed that there isn't any way to recover resources 30 days later it had been removed, this is also described in Data deletion term, including existing copies.

Stackdriver Logging Client Libraries - What happens during Google Downtime?

If you embed the Stackdrvier client library in your application and the Google stack driver API has downtime (Google documentation indicates 99.95% or 21.92 minutes of downtime/month)
My question is: What will happen to my application during the downtime? Will logging info build up in memory? Will it cause application errors or will it discard the log data and continue on?
Logging API downtimes can have different root causes and consequences. Google System Engineers have mechanisms in place to track and take mitigation actions so the downtime and its consequences are minimal but Google cannot guarantee data loss prevention in all outages all the time related to logging API.
Hopefully your application and pipeline can withstand up to (21.56 minutes) expected downtime a month (SLA 99.95%) as per the internal SLOs and SLAs of GCP.
The three scenarios you listed are plausible. In this period, your application sending the logs may have 500 responses from the network so it has to be able to deal with this kind of issue.
If the logging data manages to reach Google's platform but an outage prevents the data to be accessible, then Google's team will try their best to release backlogs, repopulate data, etc. They will post general notice on https://status.cloud.google.com/
If the issue is caused by the logging agent not sending data to our platform, then logging data may not be retrievable (but it could still be an infrastructural outage with one of the GCP products) or linked to something other than an outage like your application or its underlying host running out of resources or the logging agent being corrupted which is not covered by GCP Stackdriver SLA [1].
If the pipeline that ingests data from Logging API is backlogged, it could cause an outage but GCP team will try their best to make the data accessible after the outage ends.
If you suspect issues with Logging API malfunctioning, please contact support or file issue tracker or inspect open issues where Google's product team will provide updates live. Links below:
[1] https://cloud.google.com/stackdriver/sla#sla_exclusions
[2]
create new incident:
https://issuetracker.google.com/issues/new?component=187203&template=0
[3]
open issues:
https://issuetracker.google.com/savedsearches/559764

What is the best tool to use for real-time web statistics?

I operate a number of content websites that have several million user sessions and need a reliable way to monitor some real-time metrics on particular pieces of content (key metrics being: pageviews/unique pageviews over time, unique users, referrers).
The use case here is for the stats to be visible to authors/staff on the site, as well as to act as source data for real-time content popularity algorithms.
We already use Google Analytics, but this does not update quickly enough (4-24 hours depending on traffic volume). Google Analytics does offer a real-time reporting API, but this is currently in closed beta (I have requested access several times, but no joy yet).
New Relic appears to offer a few analytics products, but they are quite expensive ($149/500k pageviews - we have several times this).
Other answers I found on StackOverflow suggest building your own, but this was 3-5 years ago. Any ideas?
Heard some good things about Woopra and they offer 1.2m page views for the same price as Relic.
https://www.woopra.com/pricing/
If that's too expensive then it's live loading your logs and using an elastic search service to read them to get he data you want but you will need access to your logs whilst they are being written to.
A service like Loggly might suit you which would enable you to "live tail" your logs (view whilst being written) but again there is a cost to that.
Failing that you could do something yourself or get someone on freelancer to knock something up for you enabling logs to be read and displayed in a format you recognise.
https://www.portent.com/blog/analytics/how-to-read-a-web-site-log-file.htm
If the metrics that you need to track are just limited to the ones that you have listed (Page Views, Unique Users, Referrers) you may think of collecting the logs of your web servers and using a log analyzer.
There are several free tools available on the Internet to get real-time statistics out of those logs.
Take a look at www.elastic.co, for example.
Hope this helps!
Google Analytics offers real time data viewing now, if that's what you want?
https://support.google.com/analytics/answer/1638635?hl=en
I believe their API is now released as we are now looking at incorporating this!
If you have access to web server logs then you can actually set up Elastic Search as a search engine and along with log parser as Logstash and Kibana as Front end tool for analyzing the data.
For more information: please go through the elastic search link.
Elasticsearch weblink

Should I use google charts in a production environment [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
Should I use google charts in production environment?
Google charts are very easy to use.
https://google-developers.appspot.com/chart/interactive/docs/quick_start
But is it recommended to be used in a production environment?
The API's are not hosted in house but called form google servers.
There is a risk of google changing them or discontinuing them.
I couldn't find any license agreement to use.
Is the data secure as the data is being sent to google servers.
Are the above real risks or am I over thinking.
I was wondering if anyone has any experience with using google API's in production. Or if anyone can give some recommendations.
The Terms of Service cover some of your questions. Basically, Google's deprecation policy says that the API will be available for 3 years following deprecation (and most of the API - namely, the Interactive Charts API - is not deprecated; the old Image Chart API is, however).
For data security, most charts in the Interactive Charts API do not send any data to Google's servers, though there are exceptions. Each chart's documentation has a Data Policy section which explains what, if any, data is sent to Google (examples: AreaCharts, which do not send any data; and GeoCharts, which may send data if you use the geocoding features). Charts in the Image Chart API do send data to Google's servers, as they generate the images server-side rather than client-side, but this API is deprecated anyway, so you probably shouldn't be using it.
The main risk with using the Visualization API in my experience is that you have (practically) no control over versioning. When the development team releases an update, everyone everywhere gets the update. Usually this is a good thing, as it brings new features, bug fixes, and performance enhancements to everyone. Occasionally, however, a new release may introduce a bug, or change the behavior or appearance of a chart in some way that is undesirable for your application. When this happens, you generally cannot roll back to the previous version. For projects that are under active development for long periods of time, this is generally an acceptable trade-off for the free (as in beer) chart API. For projects that do not have a long-term maintenance budget, this can be problematic.
If your user-base is in an area that has poor connectivity to Google's servers, having the API hosted remotely could be problematic, but in general this is not the case.
I have used it in a production environment. All the questions you have posed are very real possibilities. For use it came down to budget, the money was there to purchase a system so we went with what we could afford at the time. The direction you go really depends on budget and existing systems that might be able to achieve the same thing.