In API Store statistics I have created a gadget to sort API request details. The data exposed is in a persisted stream (table), populated by a stream that groups data by date.
The data shown in the gadget (TableChart) is sorted by date in ascending order, so the oldest records are shown first; the record limit imposed in js/core/batch-provider-api.js means the newest records are never shown.
Is it possible to define an ORDER BY clause so that the most recent data is retrieved first?
BigQuery's documentation says the following about clustered tables:
When you create a clustered table in BigQuery, the table data is automatically organized based on the contents of one or more columns in the table’s schema. The columns you specify are used to colocate related data. When you cluster a table using multiple columns, the order of columns you specify is important. The order of the specified columns determines the sort order of the data.
Since the records in the table are already colocated and sorted, is it possible to retrieve all the records from an arbitrary cluster?
I have a table that is too large to use ORDER BY. However, it is already clustered in the manner I need, so I can save a lot of time and expense if I could retrieve all the data from each cluster separately.
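In other words, what I would like is the effect of filtering on the clustering column without a full scan, roughly like this sketch with the BigQuery Java client (the dataset, table, and column names here are placeholders, not my real schema):

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.QueryParameterValue;
import com.google.cloud.bigquery.TableResult;

public class ClusteredRead {
    public static void main(String[] args) throws InterruptedException {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        // Restricting the query to one value of the leading clustering column
        // should let BigQuery prune the blocks that belong to other clusters.
        QueryJobConfiguration query = QueryJobConfiguration
            .newBuilder("SELECT * FROM `my_dataset.my_clustered_table` "
                + "WHERE cluster_col = @value")
            .addNamedParameter("value", QueryParameterValue.string("some-key"))
            .build();

        TableResult result = bigquery.query(query);
        result.iterateAll().forEach(row -> System.out.println(row));
    }
}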
Google BigQuery (BQ) allows you to create a partition using timestamp or date types only.
99% of my data has a very clear selector, idClient. I've created views for my customers with a predicate like idClient = code, so privacy is guaranteed.
The problem with this strategy is that some customers have 5M rows and others 200K, and since BQ has no indexes, each customer's queries end up processing the other customers' data as well (and the costs are rising).
I intend to create a timestamp field where each customer gets a distinct timestamp value that is repeated on every insert into each customer-sensitive table, so I can filter on that fixed timestamp much as I would with a standard ID.
Does this make any sense? If BQ were an indexed database I'd be concerned about skewed data, but since it always does a full table scan, I think there would be only benefits and no downsides.
The solution to your problem is to add a clustering field to your table, which is roughly equivalent to an index in other databases.
This link provides the basics of how to use a clustering field:
Clustering can improve the performance of certain types of queries such as queries that use filter clauses and queries that aggregate data. When data is written to a clustered table by a query job or a load job, BigQuery sorts the data using the values in the clustering columns
Note: When using a clustering field, BigQuery's dryRun doesn't show the cost improvement, which can only be seen post-execution.
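As an illustration of the suggestion above (a sketch only; the dataset and table names are placeholders, idClient is taken from the question):

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;

public class ClusterByIdClient {
    public static void main(String[] args) throws InterruptedException {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        // Recreate the table partitioned by date and clustered by idClient.
        String ddl = "CREATE TABLE `my_dataset.events_by_client` "
            + "PARTITION BY DATE(event_ts) "
            + "CLUSTER BY idClient AS "
            + "SELECT * FROM `my_dataset.events`";
        bigquery.query(QueryJobConfiguration.newBuilder(ddl).build());

        // Per-customer views and queries then filter on the clustering column,
        // so BigQuery only reads the blocks that belong to that customer.
        String filtered = "SELECT * FROM `my_dataset.events_by_client` "
            + "WHERE idClient = 'code'";
        bigquery.query(QueryJobConfiguration.newBuilder(filtered).build());
    }
}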
I added some items to a DynamoDB table using DynamoDBMapper.save. I then queried the item immediately. Will I definitely get the saved item, or should I put a Thread.sleep() before querying it? In a SQL database we use transactions, so we can guarantee that we will get the record once it has been inserted into the table. But for DynamoDB I am not sure. I checked the AWS DynamoDB documentation but didn't find related information.
DynamoDB reads are eventually consistent by default. However, DynamoDB does allow you to specify strongly consistent reads using the ConsistentRead parameter on read operations. It does come at a cost, however: strongly consistent reads consume twice as many Read Capacity Units.
See: Read consistency in DynamoDB
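A minimal sketch of requesting a strongly consistent read with DynamoDBMapper (the Order entity and table name are placeholders):

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBHashKey;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapperConfig;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBTable;

public class ConsistentReadExample {

    @DynamoDBTable(tableName = "Orders")   // placeholder table
    public static class Order {
        private String id;
        @DynamoDBHashKey(attributeName = "id")
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
    }

    public static void main(String[] args) {
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();
        DynamoDBMapper mapper = new DynamoDBMapper(client);

        Order order = new Order();
        order.setId("order-123");
        mapper.save(order);

        // Reads are eventually consistent by default; ask for a strongly
        // consistent read so the item just saved is guaranteed to be visible.
        DynamoDBMapperConfig config = DynamoDBMapperConfig.builder()
            .withConsistentReads(DynamoDBMapperConfig.ConsistentReads.CONSISTENT)
            .build();
        Order fresh = mapper.load(Order.class, "order-123", config);
        System.out.println(fresh != null ? fresh.getId() : "not found");
    }
}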
I am looking to stream some data into BigQuery and had a question about step 3 of Google's best practices for streaming data into BigQuery. The process makes sense at a high level, but I'm struggling with the implementation of step 3. (I am looking to use the Datastore as my transactional data store.) Step 3 says to "append the reconciled data from the transactional data store and truncate the unreconciled data table." My question is this: if my reconciled data is in the Google Datastore, is there a way to automate the backup and deletion of this data without manual intervention?
I know I could achieve this recommended practice by using the Datastore Admin. I could:
1) Pause all writes to the datastore
2) Backup the datastore table to Cloud Storage
3) Delete all entities in the table I just backed up.
4) Import the backup into BigQuery
Is there a way I can automate this so I don't have to do it manually at regular intervals?
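Ideally I'd schedule something roughly like the sketch below, which calls the Datastore export REST endpoint from a cron-style job instead of going through the Datastore Admin console (project, bucket, and kind are placeholders, and obtaining the OAuth access token is left out):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DatastoreExportJob {
    public static void main(String[] args) throws Exception {
        String projectId = "my-project";                   // placeholder
        String accessToken = System.getenv("GCP_TOKEN");   // auth handling omitted

        // Managed export request: write the chosen kinds to Cloud Storage.
        String body = "{"
            + "\"outputUrlPrefix\": \"gs://my-backup-bucket/exports\","
            + "\"entityFilter\": {\"kinds\": [\"Transaction\"]}"
            + "}";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://datastore.googleapis.com/v1/projects/"
                + projectId + ":export"))
            .header("Authorization", "Bearer " + accessToken)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        // The export is a long-running operation; once it finishes, the backup
        // can be loaded into BigQuery and the exported entities deleted.
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}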
Real-time dashboards and queries
In certain situations, streaming data into BigQuery enables real-time analysis over transactional data. Since streaming data comes with a possibility of duplicated data, ensure that you have a primary, transactional data store outside of BigQuery.
You can take a few precautions to ensure that you'll be able to perform analysis over transactional data, and also have an up-to-the-second view of your data:
1) Create two tables with an identical schema. The first table is for the reconciled data, and the second table is for the real-time, unreconciled data.
2) On the client side, maintain a transactional data store for records.
Fire-and-forget insertAll() requests for these records. The insertAll() request should specify the real-time, unreconciled table as the destination table.
3) At some interval, append the reconciled data from the transactional data store and truncate the unreconciled data table.
4) For real-time dashboards and queries, you can select data from both tables. The unreconciled data table might include duplicates or dropped records.
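A minimal sketch of what step 4 could look like with the BigQuery Java client, assuming two placeholder tables named transactions_reconciled and transactions_unreconciled:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;

public class DashboardQuery {
    public static void main(String[] args) throws InterruptedException {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        // Up-to-the-second view: reconciled rows plus whatever is still sitting
        // in the streamed, unreconciled table (which may contain duplicates).
        String sql = "SELECT * FROM `my_dataset.transactions_reconciled` "
            + "UNION ALL "
            + "SELECT * FROM `my_dataset.transactions_unreconciled`";

        TableResult result = bigquery.query(QueryJobConfiguration.newBuilder(sql).build());
        result.iterateAll().forEach(row -> System.out.println(row));
    }
}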
My database has 6-7 tables on the server side. I want only a small list of 10-50 customers, which I get from a stored procedure (selecting records by joining those 6-7 tables).
I created an application (used in both online and offline environments) which syncs tables from server to client and vice versa, and which displays those customer names in a combo box (records from the stored procedure).
I am using Sync Framework, but these 6-7 tables contain a huge number of records, around 67k. I don't want to sync those 6-7 tables; I want to sync only the list of customers for the logged-in user.
I created one table like:
Customer_List (user_Id, Customer_Name, customer_Id)
and the stored procedure returns the list of customers per the above table structure.
I want to sync this table with my stored procedure using Sync Framework.
How can I do this?
There is no publicly available API in Sync Framework for you to specify/invoke a custom stored proc.
Seems like what your SP does can be represented as a filter...
e.g., side.CustomerId IN (SELECT CustomerId FROM Customer_List WHERE User_Id = @User_Id)