Is there a way to store two months of history in my Ripple testnet? - blockchain

I run rippled nodes on both the Ripple testnet and mainnet. Their history storage configurations are identical. However, on my mainnet node the complete_ledgers result starts with a ledger whose close time was at the beginning of October, while on the testnet the earliest available ledger is from mid-November. The only other significant difference between the two configuration files is the node_size (small for testnet and medium for mainnet, as advised). Below is the history-related part of the configuration.
Is there any difference in the configuration of the mainnet and testnet that I am not accounting for? Do you have any suggestions for solving this?
[node_db]
type=NuDB
path=/data/nudb
advisory_delete=0
# How many ledgers do we want to keep (history)?
# Integer value that defines the number of ledgers
# between online deletion events
online_delete=700000
[ledger_history]
# How many ledgers do we want to keep (history)?
# Integer value (ledger count)
# or (if you have lots of TB SSD storage): 'full'
350000
I have seen in the documentation that ledger_history should never be greater than online_delete and vice versa, so I tried setting ledger_history to the same value as online_delete, but that did not work.

It seems the testnet and mainnet chains do not progress at the same rate, or the configuration affects them differently, so I increased the ledger_history and online_delete values for the testnet and I am now storing as much history as I need.
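For scale, a rough way to translate these ledger counts into wall-clock history is to divide by the average ledger close interval. That interval is not fixed and differs between networks, so the sketch below assumes roughly 4 seconds per ledger purely as an approximation:
# Rough conversion between a rippled history size (ledger count) and
# wall-clock time, assuming an average close interval of ~4 seconds.
# The real interval varies, so treat these as ballpark figures.
CLOSE_INTERVAL_SECONDS = 4

def ledgers_to_days(ledger_count):
    return ledger_count * CLOSE_INTERVAL_SECONDS / 86400

def days_to_ledgers(days):
    return int(days * 86400 / CLOSE_INTERVAL_SECONDS)

print(ledgers_to_days(350000))   # roughly 16 days
print(ledgers_to_days(700000))   # roughly 32 days
print(days_to_ledgers(60))       # roughly 1.3 million ledgers for two months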

The amount of ledger history your rippled node can sync depends heavily on the peers it is connected to.
Your node can only download as much history as its connected peers can provide. So if you want to download 700k ledgers but your peers only hold 100k ledgers, your maximum downloadable history will be 100k ledgers.
Most likely, the config change (and the resulting restart of rippled) connected you to different peers that hold more history, so your node was able to download as much as it requested.
You can check your peers with the command:
rippled peers
and see what history they can provide in the complete_ledgers property.
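If you would rather check this programmatically than from the command line, a minimal sketch along these lines should work against the rippled admin JSON-RPC port (assumed here to be 5005 on localhost; adjust it to whatever your [port_rpc_admin_local] stanza uses, and note that the peers method is admin-only):
# Ask the local rippled admin API for connected peers and print the
# ledger range each one advertises. Assumes the admin JSON-RPC
# endpoint is http://localhost:5005.
import json
import urllib.request

request = urllib.request.Request(
    "http://localhost:5005",
    data=json.dumps({"method": "peers", "params": [{}]}).encode(),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    result = json.load(response)["result"]

for peer in result.get("peers", []):
    # complete_ledgers can be missing while a peer is still syncing
    print(peer.get("address"), peer.get("complete_ledgers", "unknown"))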
Best,
Daniel

Related

S3 objects without prefix performance

I'm trying to find out whether storing objects with randomized keys and no "prefix" will give me S3's max performance of 5,500 GET/sec per object, or whether, since I don't have a prefix, all those objects fall into a "no-prefix" category and share the 5,500 limit.
Example: The following objects are stored directly in a bucket
njfoia74G.obj
njfoia74G.obj
njfoia74G.obj
Will I get 5,500 GET/sec for each object, or do they share that?
The S3 documentation suggests that keys are not part of the prefix, so I'm not sure how to calculate throughput for these objects.
https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#object-keys
Has anyone done a benchmark or have documentation that can answer this?
From Request Rate and Performance Guidelines - Amazon Simple Storage Service:
Your application can achieve at least 3,500 PUT/POST/DELETE and 5,500 GET requests per second per prefix in a bucket.
The root of a bucket is effectively an empty prefix, so all objects in the root would share the limit.
By the way, very few systems come anywhere near these volumes. If you have millions of users (generating over 10 million requests per hour), then definitely implement some of the recommended techniques. But the vast majority of sites will never need to worry about it.
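To make the "per prefix" part concrete, here is a minimal sketch (boto3 assumed; the bucket and key names are hypothetical). Objects stored at the bucket root all sit under the same empty prefix and therefore pool the limit, while keys placed under different prefixes each get their own request-rate budget:
# Hypothetical illustration of per-prefix request-rate scaling in S3.
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # hypothetical bucket name

# These keys live at the bucket root, i.e. under one (empty) prefix,
# so their GET rates are pooled under a single 5,500 req/sec budget:
for key in ["a1b2c3.obj", "d4e5f6.obj", "g7h8i9.obj"]:
    s3.put_object(Bucket=bucket, Key=key, Body=b"payload")

# These keys sit under distinct prefixes, and each prefix scales
# independently to its own request-rate limit:
for shard in ["shard-0", "shard-1", "shard-2"]:
    s3.put_object(Bucket=bucket, Key=f"{shard}/a1b2c3.obj", Body=b"payload")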

Google Analytics not filtering internal traffic

I know there have been similar questions in the past, but I have tried many solutions given online to no avail. I am just not able to hide internal traffic in Google Analytics on my Django site.
I am setting the filter from Admin -> View -> Filters. I have tried both Predefined and Custom filters, with a fixed IP as well as a regex pattern. (Yes, I have double-checked my IP on whatismyip.com and I am using the right one.)
I read somewhere that it takes time for filters to come into effect, so I even waited for 24 hours, but I still see a lot of internal traffic.
Google Tag Assistant also tracks the pages when I access them from the internal IP (not sure if it's supposed to know about the filters).
Not sure where I could be going wrong.
(I am using a reverse proxy, but hopefully that doesn't change anything, since the Google Analytics code runs on the client side.)
Do not use any filter on the default view (called 'All Website Data'). Create a separate view and then create a filter on it. That will work.
(After struggling with it for a few days, this response helped me with the above fix)
I struggled with this as well, so here is what I found out.
Note that real-time reporting can take up to 2 hours to catch up to and reflect analytics configuration changes such as the addition of filters.
Possible solutions
1) As suggested in the other answer, leave the default view as default and create an additional view for the filters:
The default view collects all traffic. You need to create a new view for which you can apply your filter. Check out item 3 here: https://support.google.com/analytics/answer/1009618?hl=en
How to add a new view: https://support.google.com/analytics/answer/1009714?hl=en
2) Filter IPv6, not IPv4:
Exclude the IPv6 address as mentioned in the above post. This is the one that "what is my ip address" returns. It's not the IPv4 syntax (xxx.xxx.xxx.xxx). However, I have noticed that wired machines that stay connected seem to keep the same IPv6 address (the 31-digit sequence), whereas wireless devices (mobile phones, tablets) tend to be dynamic. However, as posted above, if you use just the first 15 digits of the sequence and the "begins with" filter type, it will block the devices using the same shared router (i.e. the internet router in your home).
About filtering only the first 15 digits:
I think it is meant to filter the first four blocks, so if your IPv6 looks like 2601:191:c001:2f9:5c5a:1c20:61b6:675a, then filter IP that begins with 2601:191:c001:2f9:.
Information found here.
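For example, with the (hypothetical) address above, the filter would look something like this; the exact UI labels can differ slightly between Analytics versions:
Predefined filter: Exclude -> traffic from the IP addresses -> that begin with
Filter value:      2601:191:c001:2f9:
Custom Exclude filter on the IP Address field, regex pattern:
^2601:191:c001:2f9: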

AWS - Aurora replicas

Scenario:
I have two Aurora reader replicas.
I make many calls to my system (high load).
I see only one replica working at 99.30%, while the other one is not doing anything at all.
Why? Is it because this second replica is ONLY there to cover failures of the first one? Is it not possible to make both share the load?
In your RDS console, you should be able to look at each of the 3 instances
aurora-databasecluster-xxx.cluster-yyy.us-east-1.rds.amazonaws.com:3306
zz0.yyy.us-east-1.rds.amazonaws.com:3306
zz1.yyy.us-east-1.rds.amazonaws.com:3306
If you look at the cluster tab, you will see two endpoints; the second is the following:
aurora-databasecluster-xxx.cluster-ro-yyy.us-east-1.rds.amazonaws.com
Aurora also lets you connect explicitly to a specific read replica. This would allow one set of read-only nodes for OLTP performance and another set for data analysis, with long-running queries that won't impact OLTP.
If you use the -ro endpoint, it should balance across all read-only nodes, or you can have your code take a list of read-only connection strings and do your own randomization. I would have expected the -ro endpoint to be the better option... but I am not yet familiar with their load-balancing technique (fewest connections, round robin, etc.).
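For illustration, a minimal sketch of the two approaches described above. The hostnames are the placeholders from this thread, and the PyMySQL driver plus the user/password/database values are assumptions; use whatever MySQL-compatible client your stack already has:
# Two ways to spread reads across Aurora read replicas (placeholders only).
import random
import pymysql  # assumed MySQL-compatible driver; Aurora MySQL listens on 3306

# Option 1: connect through the cluster reader (-ro) endpoint and let
# Aurora spread connections across the read replicas.
conn = pymysql.connect(
    host="aurora-databasecluster-xxx.cluster-ro-yyy.us-east-1.rds.amazonaws.com",
    port=3306, user="app", password="secret", database="mydb")

# Option 2: keep your own list of instance endpoints and randomize,
# as suggested in the answer above.
replica_hosts = [
    "zz0.yyy.us-east-1.rds.amazonaws.com",
    "zz1.yyy.us-east-1.rds.amazonaws.com",
]
conn = pymysql.connect(
    host=random.choice(replica_hosts),
    port=3306, user="app", password="secret", database="mydb")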

How long does it take for AWS S3 to save and load an item?

The S3 FAQ mentions that "Amazon S3 buckets in all Regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES." However, I don't know how long it takes to reach eventual consistency. I tried to search for this but couldn't find an answer in the S3 documentation.
Situation:
We have a website that consists of 7 steps. When the user clicks Save in a step, we want to save a JSON document (containing the information from all 7 steps) to Amazon S3. Currently we plan to:
Create a single S3 bucket to store all JSON documents.
When the user saves step 1, we create a new item in S3.
When the user saves steps 2-7, we overwrite the existing item.
After the user saves a step and refreshes the page, he should be able to see the information he just saved, i.e. we want to make sure that we always read after write.
The full JSON document (all 7 steps completed) is around 20 KB.
After the user clicks the save button, we can freeze the page for some time so they cannot make other changes until the save is finished.
Question:
How long does it take for AWS S3 to save and load an item? (We can freeze our website when document is being saved to S3)
Is there a function to calculate save/load time based on item size?
Is the save/load time gonna be different if I choose another S3 region? If so which is the best region for Seattle?
I wanted to add to error2007s's answer.
How long does it take for AWS S3 to save and load an item? (We can freeze our website when document is being saved to S3)
It's not only that you will not find the exact time anywhere; there's actually no such thing as an exact time. That's just what "eventual consistency" is all about: consistency will be achieved eventually. You can't know when.
If somebody gave you an upper bound for how long a system would take to achieve consistency, then you wouldn't call it "eventually consistent" anymore. It would be "consistent within X amount of time".
The problem now becomes, "How do I deal with eventual consistency?" (instead of trying to "beat it")
To really find the answer to that question, you need to first understand what kind of consistency you truly need, and how exactly the eventual consistency of S3 could affect your workflow.
Based on your description, I understand that you would write a total of 7 times to S3, once for each step you have. For the first write, as you correctly cited the FAQs, you get strong consistency for any reads after that. For all the subsequent writes (which are really "replacing" the original object), you might observe eventual consistency - that is, if you try to read the overwritten object, you might get the most recent version, or you might get an older version. This is what is referred to as "eventual consistency" on S3 in this scenario.
A few alternatives for you to consider:
don't write to S3 on every single step; instead, keep the data for each step on the client side, and then write one single object to S3 after the 7th step. This way there is only one write and no "overwrites", so no "eventual consistency". This might or might not be possible for your specific scenario; you need to evaluate that.
alternatively, write S3 objects with different names for each step (see the sketch after this list). E.g., something like: after step 1, save to bruno-preferences-step-1.json; then, after step 2, save the results to bruno-preferences-step-2.json; and so on, then save the final preferences file to bruno-preferences.json, or maybe even bruno-preferences-step-7.json, giving yourself the flexibility to add more steps in the future. Note that the idea here is to avoid overwrites, which could cause eventual-consistency issues. Using this approach, you only write new objects, you never overwrite them.
finally, you might want to consider Amazon DynamoDB. It's a NoSQL database that you can securely connect to directly from the browser or from your server. It provides replication, automatic scaling, and load distribution (just like S3). You also have the option to tell DynamoDB that you want to perform strongly consistent reads (the default is eventually consistent reads; you have to set a parameter to get strongly consistent reads). DynamoDB is typically used for "small" records, and 20 KB is definitely within range; the maximum size of a record is 400 KB as of today. You might want to check this out: DynamoDB FAQs: What is the consistency model of Amazon DynamoDB?
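To make the second alternative concrete, here is a minimal sketch under assumed names (boto3 assumed; the bucket is hypothetical and the key pattern follows the bruno-preferences-step-N.json idea above). Because every step writes a brand-new key, reads of that key get the new-object read-after-write behaviour:
# Save each step under its own key so no object is ever overwritten.
# Bucket name is hypothetical; key pattern follows the example above.
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "preferences-bucket"  # hypothetical

def save_step(user_id, step, data):
    key = f"{user_id}-preferences-step-{step}.json"
    s3.put_object(Bucket=BUCKET, Key=key,
                  Body=json.dumps(data).encode("utf-8"),
                  ContentType="application/json")

def load_latest(user_id, total_steps=7):
    # Walk backwards and return the highest-numbered step that exists.
    for step in range(total_steps, 0, -1):
        key = f"{user_id}-preferences-step-{step}.json"
        try:
            obj = s3.get_object(Bucket=BUCKET, Key=key)
            return json.loads(obj["Body"].read())
        except s3.exceptions.NoSuchKey:
            continue
    return None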
How long does it take for AWS S3 to save and load an item? (We can freeze our website when document is being saved to S3)
You will not find an exact time anywhere. If you ask AWS, they will give you approximate timings. Your file is 20 KB, so from my experience with S3 the time will be more or less 60-90 seconds.
Is there a function to calculate save/load time based on item size?
No, there is no function with which you can calculate this.
Is the save/load time gonna be different if I choose another S3 region? If so which is the best region for Seattle?
For Seattle, US West (Oregon) will work with no problem.
You can also take a look at this experiment for comparison https://github.com/andrewgaul/are-we-consistent-yet

Maximum no. of connections that can be held by S3

I am learning about Amazon Web Services. I just want to know roughly the maximum number of connections that Amazon S3 can hold simultaneously without crashing...
Theoretically this is infinite. To achieve this, they use the partitioning scheme explained here: http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
Basically, they partition your bucket across different servers based on the first few characters of the key. If those are random, you scale indefinitely (they just take more characters to partition on). If you prepend all keys with file_ or something similar (so S3 cannot partition the objects well, because all keys start with the same characters), the limit is about 300 GET/sec or 100 PUT/DELETE/POST per second.
See that page for an in-depth explanation.
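As a hedged illustration of that partitioning point (the names below are hypothetical, boto3 assumed): deriving a short hash prefix from each filename keeps keys from all starting with the same characters. Note that the answer below points out that the limits were raised in July 2018, so this kind of key randomization is rarely necessary nowadays:
# Prepend a short hash-derived prefix so keys don't all start with "file_",
# giving S3 varied leading characters to partition on.
# Bucket and filenames are hypothetical.
import hashlib
import boto3

s3 = boto3.client("s3")
bucket = "my-upload-bucket"  # hypothetical

def hashed_key(filename):
    prefix = hashlib.md5(filename.encode()).hexdigest()[:4]
    return f"{prefix}/{filename}"

for name in ["file_0001.dat", "file_0002.dat", "file_0003.dat"]:
    # keys now begin with a pseudo-random hex prefix instead of "file_"
    s3.put_object(Bucket=bucket, Key=hashed_key(name), Body=b"payload")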
According to the AWS documentation, you will receive HTTP 503 Slow Down responses above 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix.
These limits were increased in July 2018.
More information:
https://aws.amazon.com/en/premiumsupport/knowledge-center/s3-request-limit-avoid-throttling/
https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html