Cloud Spanner read vs Cloud Spanner SQL API - google-cloud-platform

There are two different APIs provided by Cloud Spanner. What’s the difference between the Cloud Spanner Read API and the Cloud Spanner SQL API?

Under the hood, they both use the same execution machinery, so you should see very similar performance for both APIs.
The SQL API is more expressive, since it supports constructs like ORDER BY, LIMIT, filtering, etc. But in some cases the Read API can be simpler to use: for example, if you're just doing a simple range scan on a table with a multi-column primary key, and you want to see all rows with a primary key greater than ("A","B","C") and less than ("X","Y","Z").
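To see why the Read API can be simpler for that case, consider what the same exclusive range looks like when written out as a SQL filter, column by column. The sketch below is hypothetical: it assumes key columns named `k1`, `k2`, `k3` and GoogleSQL-style `@name` query parameters, and `in_key_range` uses Python's lexicographic tuple comparison as the reference meaning of "greater than ("A","B","C")".

```python
# Sketch: spell out a composite-key range scan as a SQL WHERE clause.
# Column names k1, k2, k3 and the @param naming are illustrative assumptions.

KEY_COLUMNS = ("k1", "k2", "k3")


def lex_predicate(cols, values, op, prefix):
    """Expand the tuple comparison (cols) <op> (values) into an OR of ANDs.

    op is ">" or "<". Values are passed as query parameters named
    @<prefix>0, @<prefix>1, ... rather than inlined into the SQL text.
    """
    params = {f"{prefix}{i}": v for i, v in enumerate(values)}
    clauses = []
    for i, col in enumerate(cols):
        # Equal on every earlier key column, strictly ordered on this one.
        parts = [f"{cols[j]} = @{prefix}{j}" for j in range(i)]
        parts.append(f"{col} {op} @{prefix}{i}")
        clauses.append("(" + " AND ".join(parts) + ")")
    return "(" + " OR ".join(clauses) + ")", params


def range_scan_sql(table, cols, start, end):
    """SELECT rows whose key tuple lies strictly between start and end."""
    # Table and column names are inlined, so they must be trusted identifiers.
    low_sql, low_params = lex_predicate(cols, start, ">", "lo")
    high_sql, high_params = lex_predicate(cols, end, "<", "hi")
    sql = (f"SELECT * FROM {table} WHERE {low_sql} AND {high_sql} "
           f"ORDER BY {', '.join(cols)}")
    return sql, {**low_params, **high_params}


def in_key_range(key, start, end):
    """Reference semantics: Python compares tuples lexicographically."""
    return start < key < end


if __name__ == "__main__":
    sql, params = range_scan_sql("MyTable", KEY_COLUMNS,
                                 ("A", "B", "C"), ("X", "Y", "Z"))
    print(sql)
    print(params)
```

With the Read API the same scan is a single key range; in the Python client library it is roughly `KeySet(ranges=[KeyRange(start_open=["A", "B", "C"], end_open=["X", "Y", "Z"])])`. The SQL form is wordier, but it is also the one that can later absorb an extra condition or a different ORDER BY.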
If you have any doubt about which API to use, I would recommend the query (SQL) API, as it can grow with you when a request that started out simple becomes more complex as your application evolves over time. You need to add an extra selection condition? That is no problem with the SQL API. You actually need to change the ordering of your result set? That is also easy.

Related

Simple Search capabilities with NoSql (DynamoDB)

I am new to NoSQL. I am trying to make a simple app with products that you search through. With SQL I would simply have a products table, search any of the columns for substrings with %LIKE%, and pull the returned rows. I would like to use DynamoDB, but seemingly there is no way of doing this without introducing AWS OpenSearch (ElasticSearch), which will probably cost more than all my DynamoDB tables. Is there any simple way to do this in DynamoDB without having to scan the whole table and filter with contains?
No, there is no way to do what you want (search DynamoDB) without adding another layer such as ElasticSearch. Keep it simple: use a traditional database.
IMO, never assume you need a NoSQL database, because you rarely do; always assume you need a traditional database until proven otherwise.
OK, so DynamoDB is not what you are looking for; it is designed for a very different use case.
However, ElasticSearch, which is in no way tied to DynamoDB, is very much what you are looking for, and it will greatly simplify what you are trying to do compared with using a traditional SQL database. Those who say otherwise are giving poor information. A traditional database generally cannot use an ordinary index for a %LIKE% query with a leading wildcard, whereas indexing text for search is precisely what ElasticSearch does on every field in your document.
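The reason a search index can answer contains-style queries without a table scan is the inverted index. The toy below is not how ElasticSearch is implemented (it is built on Lucene and, by default, indexes analyzed terms such as whole words rather than arbitrary substrings); it is a minimal sketch, assuming 3-character grams, of how an n-gram inverted index narrows a substring search to a few candidate documents before verifying them.

```python
from collections import defaultdict


def trigrams(text):
    """All lowercase 3-character substrings of text."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}


class TrigramIndex:
    """Toy inverted index from 3-character grams to document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(doc_id)

    def search(self, needle):
        needle = needle.lower()
        grams = trigrams(needle)
        if not grams:
            # Needle shorter than 3 characters: fall back to scanning.
            candidates = set(self.docs)
        else:
            # Intersect posting lists, smallest first, to get candidates.
            lists = sorted((self.postings.get(g, set()) for g in grams), key=len)
            candidates = set.intersection(*lists)
        # Sharing every trigram does not guarantee a match, so verify.
        return sorted(d for d in candidates if needle in self.docs[d].lower())


if __name__ == "__main__":
    idx = TrigramIndex()
    idx.add(1, "Red cotton T-shirt")
    idx.add(2, "Blue denim jacket")
    idx.add(3, "Cotton socks")
    print(idx.search("cotton"))  # [1, 3]
```

Only documents that share all of the needle's trigrams are ever examined, which is the property that lets a search engine avoid touching every row.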
Getting started with ElasticSearch is super easy. Just download the JAR and run it, then start working through examples that post documents to the index and get them back. If your experience is anything like mine, you will never really want to use a SQL database again; but, as has been mentioned, each has its own place, so I do still use traditional RDBMSs, though I specialize in ElasticSearch.
I have converted many applications that could not reach reasonable performance to ElasticSearch, where response times are almost always sub-second, and typically a fraction of a second. An RDBMS asked to do many %LIKE% matches will not be able to give you sub-second results.
There are also a number of tools that will automatically funnel data from your RDBMS into ElasticSearch, so that you can have the best of both worlds.
NoSQL means a great many things. In general, it has been applied to several classes of datastore:
Columnar Datastore - DynamoDB, Hive
Document/Object Database - MongoDB, CouchDB, MarkLogic, and a great many others
Key/Value - Cassandra, MongoDB, Redis, Memcache
Search Index - SOLR, ElasticSearch, MarkLogic
ElasticSearch bridges the gap between Document Database and Search Index, providing the features of both. It also provides the capabilities of a Key/Value data store.
Columnar datastores are tuned for work across massive amounts of data, generally aggregations, but their query response times are not the kind of performance you are looking for. They are used for datasets with trillions of rows and hundreds of features/columns.
ElasticSearch, however, provides very rapid search across large numbers of JSON documents, and by default it indexes every value in the JSON.
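"Indexes every value" can be pictured as flattening each JSON document into field paths and recording which documents hold each value. The sketch below is a toy under that assumption only; ElasticSearch's dynamic mapping also detects types and analyzes text fields into terms, which this does not attempt. Nested objects become dotted field names and array elements share their parent's field, and the sample documents are made up.

```python
from collections import defaultdict


def flatten(doc, prefix=""):
    """Yield (dotted.field.path, value) for every leaf value of a JSON-like doc."""
    if isinstance(doc, dict):
        for name, value in doc.items():
            yield from flatten(value, f"{prefix}.{name}" if prefix else name)
    elif isinstance(doc, list):
        # Array elements are indexed under the same field path.
        for item in doc:
            yield from flatten(item, prefix)
    else:
        yield prefix, doc


def build_field_index(docs):
    """field path -> value -> set of doc ids, covering every leaf field."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, doc in docs.items():
        for field, value in flatten(doc):
            index[field][value].add(doc_id)
    return index


if __name__ == "__main__":
    docs = {
        1: {"name": "shirt", "tags": ["cotton", "summer"], "specs": {"color": "red"}},
        2: {"name": "socks", "tags": ["cotton"], "specs": {"color": "blue"}},
    }
    index = build_field_index(docs)
    print(sorted(index["tags"]["cotton"]))       # [1, 2]
    print(sorted(index["specs.color"]["red"]))   # [1]
```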
The way to do this with DynamoDB is by using ElasticSearch; however, you do not need DynamoDB to do this with ElasticSearch, so you don't need to pay for both.

Is there an important reason for why Google's Datastore and Firestore are so limited when it comes to querying?

If I understand correctly, both Datastore and Firestore provide very limited options for querying substrings. Options I honestly would have expected to be available, like "LIKE" or "IN", seem not to be supported the way they often are by database management systems, although a limited version of IN seems to be available in Firestore. Is there something I'm missing here? Fetching all entities of a kind to process them yourself server-side seems like a horrible way to deal with these limitations. I thank you in advance.
Firestore has a rather unusual performance guarantee: its query performance is completely independent of the number of documents that it has to consider. In its own terms: query performance is dependent on the size of the result set, not on the size of the collection. This means that if it takes 2s to retrieve 10 documents out of a collection of 1,000 documents, it will also take 2s to retrieve those 10 documents out of a collection of 1,000,000 documents, or even out of a collection of 1,000,000,000 documents.
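One way to picture that guarantee: if every query is served by reading a contiguous slice of a sorted index, the work is a short seek plus one step per matching entry, no matter how many other entries the index holds. The toy below only illustrates that idea and is not Firestore's implementation; the values and document ids are made up. Here the seek is a binary search, so collection size enters only logarithmically.

```python
import bisect


class SortedIndex:
    """Toy single-field index: (value, doc_id) pairs kept in sorted order."""

    def __init__(self, entries):
        self.entries = sorted(entries)

    def range_query(self, low, high):
        """Return (doc_ids, entries_examined) for low <= value < high."""
        # O(log n) seek to the first entry with value >= low.
        i = bisect.bisect_left(self.entries, (low,))
        ids, examined = [], 0
        # Walk forward only while entries still match: O(k) for k results.
        while i < len(self.entries):
            examined += 1
            value, doc_id = self.entries[i]
            if value >= high:
                break
            ids.append(doc_id)
            i += 1
        return ids, examined


if __name__ == "__main__":
    for n in (1_000, 100_000):
        index = SortedIndex((v, f"doc{v}") for v in range(n))
        ids, examined = index.range_query(10, 20)
        print(n, len(ids), examined)  # 10 results, 11 entries examined, for both n
```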
Firestore only allows queries for which it can guarantee this performance. If a query type is not available on Firestore, it's because Firestore can't meet the performance guarantee for that query type.
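This is also why LIKE is missing: a prefix match can be rewritten as a single range over a sorted index, but a "contains" match cannot, so it would need a full scan. A minimal sketch of the contrast, assuming plain Python strings compared by code point as a stand-in for index ordering; the sample values are made up.

```python
import bisect


def prefix_upper_bound(prefix):
    """Exclusive upper bound for all strings that start with prefix."""
    # Assumes the last character is not U+10FFFF (the highest code point).
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def prefix_search(sorted_values, prefix):
    """One range read on a sorted list: prefix <= v < upper bound."""
    if not prefix:
        return list(sorted_values)
    lo = bisect.bisect_left(sorted_values, prefix)
    hi = bisect.bisect_left(sorted_values, prefix_upper_bound(prefix))
    return sorted_values[lo:hi]


def substring_search(values, needle):
    """No ordering helps a 'contains' match: every value must be examined."""
    return [v for v in values if needle in v]


if __name__ == "__main__":
    names = sorted(["apple", "applesauce", "apricot", "banana", "grape", "pineapple"])
    print(prefix_search(names, "app"))       # ['apple', 'applesauce']
    print(substring_search(names, "apple"))  # ['apple', 'applesauce', 'pineapple']
```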
You will have to make a choice for yourself whether your app benefits more from the performance guarantee that Firestore makes, or from the additional query capabilities that another database may offer.
To learn more about trade-offs in NoSQL databases in general, I recommend reading NoSQL data modeling, and for Firestore specifically, watch the Getting to know Cloud Firestore series.

What's the difference between Google Cloud Spanner and Cloud SQL?

I am a novice with the GCP stack, so I am confused by the number of GCP technologies for storing data:
https://cloud.google.com/products/storage
Although Google Cloud Spanner is not mentioned in the article above, I know that it exists and is used for data storage: https://cloud.google.com/spanner
From my current point of view, I don't see any significant difference between Cloud SQL (with Postgres under the hood) and Cloud Spanner. I found that Spanner has slightly different syntax, but that doesn't tell me when I should prefer it over Cloud SQL.
Could you please explain?
P.S.
I consider Cloud SQL to be a traditional database with automatic replication and horizontal scalability, managed by Google.
There is not a big difference between them in terms of what they do (storing data in tables). The difference is in how they handle data at small and large scale.
Cloud Spanner is used when you need to handle massive amounts of data with a high level of consistency and high throughput (100,000+ reads/writes per second). Spanner gives much better scalability and better SLOs.
On the other hand, Spanner is also much more expensive than Cloud SQL.
If you just want to store some customer data cheaply but still don't want to deal with server configuration, Cloud SQL is the right choice.
If you are planning to create a big product or if you want to be ready for a huge increase in users for your application (viral games/applications) Spanner is the right product.
You can find detailed information about Cloud Spanner in this official paper
The main difference between Cloud Spanner and Cloud SQL is horizontal scalability plus global availability of data over 10 TB.
Spanner isn’t for generic SQL needs; it is best used for massive-scale opportunities: 1000s of writes per second, globally, and 10,000s to 100,000s of reads per second, globally.
That volume is extremely difficult to achieve with normal SQL/MySQL without complex sharding of the database. Spanner deals with all of this AND allows ACID updates, which are very hard to provide across a sharded database. It accomplishes this with super-accurate clocks to manage conflicts.
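To make the sharding point concrete: with application-level sharding, rows are routed to separate databases by key, and one transaction can easily touch two of them, at which point no single database can commit it atomically. The hypothetical router below hashes keys to shards and flags such transactions; the shard count and key names are assumptions. Spanner performs this cross-shard coordination for you (the Spanner paper describes two-phase commit across Paxos groups, with TrueTime timestamps).

```python
import hashlib


def shard_for(key, num_shards):
    """Route a row key to a shard with a stable hash.

    hashlib is used instead of hash(), which is salted per process for str.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_touched(keys, num_shards):
    """Set of shard numbers a transaction over these keys would touch."""
    return {shard_for(k, num_shards) for k in keys}


def needs_distributed_commit(keys, num_shards):
    """More than one shard means no single database can commit atomically;
    a protocol such as two-phase commit is needed."""
    return len(shards_touched(keys, num_shards)) > 1


if __name__ == "__main__":
    transfer = ["account:alice", "account:bob"]  # hypothetical keys
    print(shards_touched(transfer, 4), needs_distributed_commit(transfer, 4))
```

Note also that with modulo routing, changing `num_shards` remaps most keys, which is one reason resharding a conventional database is painful.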
In short, Spanner is not for CRM databases; it is more for supermassive global data within an organisation. And since Spanner is somewhat expensive compared to Cloud SQL, the project should be large enough to justify the additional cost.
You can also follow this discussion on Reddit (a good one!): https://www.reddit.com/r/googlecloud/comments/93bxf6/cloud_spanner_vs_cloud_sql/e3cof2r/
Previous answers are correct, the main advantages of Spanner are scalability and availability. While you can scale with Cloud SQL, there is an upper bound to write throughput unless you shard -- which, depending on your use case, can be a major challenge. Dealing with sharded SQL was the big problem that Spanner solved within Google.
I would add to the previous answers that Cloud SQL provides managed instances of MySQL, PostgreSQL, or SQL Server, with the corresponding SQL support. If you're migrating a MySQL database from somewhere else, not having to change your queries can be a huge plus.
Spanner has its own SQL dialect, although recently support for a subset of the PostgreSQL dialect was added.

Selecting the right cloud storage option on GCP

I am an entry-level developer at a startup. I am trying to deploy a text classifier on GCP. For storing inputs (training data) and outputs, I am struggling to find the right storage option.
My data isn't huge in terms of columns, but it is fairly huge in terms of instances. It could even be just key-value pairs. My use case is to retrieve each entity's value from one particular column of the DB, apply a classifier to it, store the result in the corresponding column, and update the DB. Our platform requires a DB that can handle a lot of small queries at once without much delay. Also, the data is completely non-relational.
I looked into GCP's article on choosing a storage option but couldn't narrow my options down to a specific answer. I would love to get some advice on this.
You should take a look at Google's "Choosing a Storage Option" guide: https://cloud.google.com/storage-options/
Your data is structured, your main goal is not analytics, your data isn't relational, and you don't primarily need mobile SDKs, so you should probably use Cloud Datastore. It's a great choice for durable key-value data.
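The read, classify, write-back loop described in the question maps onto batched key lookups and batched writes. A minimal sketch, assuming a client object with the `key()`, `get_multi()` and `put_multi()` methods of `google.cloud.datastore.Client`; the kind, field names and `classify` function are hypothetical placeholders, and the small in-memory client exists only so the sketch runs without GCP credentials.

```python
class FakeEntity(dict):
    """Dict that carries its key, loosely mimicking datastore.Entity."""

    def __init__(self, key, data=()):
        super().__init__(data)
        self.key = key


class InMemoryClient:
    """Hypothetical stand-in exposing key/get_multi/put_multi, for local runs."""

    def __init__(self):
        self.store = {}

    def key(self, kind, ident):
        return (kind, ident)

    def get_multi(self, keys):
        # Entities for keys that do not exist are simply not returned.
        return [FakeEntity(k, self.store[k]) for k in keys if k in self.store]

    def put_multi(self, entities):
        for entity in entities:
            self.store[entity.key] = dict(entity)


def classify_and_store(client, kind, ids, classify,
                       text_field="text", label_field="label"):
    """Fetch entities by id, classify one field, write the label back."""
    keys = [client.key(kind, ident) for ident in ids]
    entities = client.get_multi(keys)  # one batched lookup
    for entity in entities:
        entity[label_field] = classify(entity[text_field])
    client.put_multi(entities)  # one batched write
    return len(entities)


if __name__ == "__main__":
    client = InMemoryClient()
    client.store[("Review", 1)] = {"text": "great product"}
    client.store[("Review", 2)] = {"text": "arrived broken"}
    n = classify_and_store(client, "Review", [1, 2, 3],
                           lambda t: "pos" if "great" in t else "neg")
    print(n, client.store)
```

Against a real project you would pass `datastore.Client()` (from `google.cloud import datastore`) instead of the stand-in. Each batched call has a size limit, so long id lists should be chunked; check the current Datastore limits.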
In brief, these are the storage options available; in the future there may be more or fewer.
Depending on your requirements, you can select the storage option that is best suited.
SOURCE: Linux Academy

looking for a hosted back-end business data storage for analytics

I want a simple hosted data store, licensed for business applications, with the following features:
REST-like access for CRUD operations (primarily adding records)
private and authenticated
makes for easy integration with a front end charting client like Google Visualization Apis
easy to use and set up
What about:
* Google Fusion Tables
* Google Cloud Services
* Google BigQuery
* Google Cloud SQL
Or other non-Google products. But I am imagining a cleaner integration between Google Charts and one of Google's backend data services.
Pros, Cons, Advice?
First, since this is Stack Overflow, I won't attempt to judge how "easy to use and set up" each option is; you can assess that by reading the documentation for each product.
That being said, overall, the "right" answer really depends on what you are trying to do, and how much data you have. It also depends on what type of application you are building (this is Stack Overflow, so I am assuming you are a developer).
Relational Databases (like Google Cloud SQL) are great for maintaining transactional consistency but once your data grows massive it becomes difficult, expensive, or impossible to run analysis queries in a reasonable timeframe.
Google BigQuery is an analysis tool that allows developers to ask questions about really big datasets using an SQL-like language. It is 100% cloud-based and is accessed via a RESTful API, but it only allows appending data, not changing individual records.
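Working with an append-only store usually means recording each change as a new row rather than updating in place, then picking the newest row per key at read time. A minimal sketch of that pattern, assuming hypothetical `id` and `updated_at` fields; in SQL, BigQuery's included, the same step is commonly written with `ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC)` and keeping row 1.

```python
def latest_versions(rows, key="id", version="updated_at"):
    """Collapse an append-only sequence of rows to the newest row per key."""
    latest = {}
    for row in rows:
        k = row[key]
        # Keep this row if it is the first seen for k or newer than the kept one.
        if k not in latest or row[version] > latest[k][version]:
            latest[k] = row
    return latest


if __name__ == "__main__":
    log = [  # each "update" was appended as a new row
        {"id": 1, "updated_at": 1, "status": "new"},
        {"id": 2, "updated_at": 2, "status": "new"},
        {"id": 1, "updated_at": 3, "status": "shipped"},
    ]
    print(latest_versions(log))
```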