Do blockchains contain a websocket server? - blockchain

I was recently reading about blockchains and am very intrigued by this technology. I had a few questions regarding blockchains:
Do blockchains use WebSockets to transmit information between users? If yes, is the information (the blocks) always sent as a JSON object?
Do all users have an entire copy of the blockchain, or do they each just see a partial copy? If they hold the whole chain, how big can the file get?
Also, what determines transactions per second? I read that Bitcoin does about 7 transactions per second. What is needed to make blockchains more scalable: is it a coding factor, such as writing a more efficient algorithm (big-O), or is it some kind of hardware limitation?
Sorry if these questions seem trivial but I am a newbie trying to learn the technology. Any help would be appreciated.

No, usually they use a low-level protocol built on top of TCP.
Users should have an entire copy of the blockchain in order to verify transactions. At the time of writing, the Bitcoin database is about 200 GB and the Ethereum database about 660 GB. You can use lightweight clients, which don't keep a full copy, but in that case you are not a fully validating part of the network.
In Bitcoin, the block size is limited to 1 MB. The average transaction is about 400 bytes, so a full block holds roughly 2,000 to 2,500 transactions; with a new block about every 10 minutes, that works out to only a few transactions per second. Raising the block size limit is technically straightforward, and the Bitcoin Cash network did so (32 MB). But the limit cannot grow without bound, because internet connection speed and the disk/CPU resources needed to verify transactions are finite.
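To make the throughput arithmetic concrete, here is a minimal sketch using the figures from the answer (1 MB blocks, about 400 bytes per transaction, 32 MB for Bitcoin Cash). The 10-minute block interval is an assumption based on Bitcoin's target block time, and the function names are made up for illustration.

```python
# Rough throughput estimate; all numbers are approximations.
BLOCK_SIZE_BYTES = 1_000_000   # 1 MB block size limit (from the answer)
AVG_TX_BYTES = 400             # average transaction size (from the answer)
BLOCK_INTERVAL_S = 600         # assumption: ~10 minute target block interval


def tx_per_block(block_size_bytes: int, tx_size_bytes: int) -> int:
    """How many average-sized transactions fit into one block."""
    return block_size_bytes // tx_size_bytes


def tx_per_second(block_size_bytes: int, tx_size_bytes: int, interval_s: int) -> float:
    """Sustained transactions per second implied by the block size limit."""
    return tx_per_block(block_size_bytes, tx_size_bytes) / interval_s


if __name__ == "__main__":
    print(tx_per_block(BLOCK_SIZE_BYTES, AVG_TX_BYTES))
    print(round(tx_per_second(BLOCK_SIZE_BYTES, AVG_TX_BYTES, BLOCK_INTERVAL_S), 1))
    # Same calculation with a 32 MB limit, as in Bitcoin Cash.
    print(round(tx_per_second(32 * BLOCK_SIZE_BYTES, AVG_TX_BYTES, BLOCK_INTERVAL_S), 1))
```

Under these assumptions the limit comes from protocol parameters (block size and block interval), not from the big-O of any single algorithm.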

Related

Need a high latency, ultra low bandwidth data transfer technique the likes of Google Pub/Sub but inverse

I have an interesting IoT use case. For this example, let's say I need to deploy a thousand IoT cellular displays. Each has a single LED that displays information useful to someone out in the world, for example a sign at the start of a hiking trail that indicates whether the conditions are favorable. Each display needs to receive 3 bytes of data every 5-10 minutes.
I have successfully created a computer-based demo of this system using a basic HTTP GET request and cloud functions on GCP. The "device" asks for its 3 bytes every 10 minutes and receives the data back. The issue here is that the HTTP overhead takes up 200+ bytes, so the bandwidth usage will be high over cellular.
I then decided to try out Google Cloud Pub/Sub, but quickly realized that it is designed for devices transmitting to the cloud rather than receiving. My best guess is that each device would need its own topic, which would scale horribly?
Does anyone have any advice on a protocol that would work with the cloud (hopefully GCP) that could serve low bandwidth receive only devices? Does the pub/sub structure really not work for this case?
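For a sense of scale, the overhead described above can be put into numbers. This is only a back-of-the-envelope sketch: it uses the 3-byte payload, the 200-byte HTTP overhead and the 10-minute interval from the question, and it ignores TCP/TLS handshakes and IP headers, which would add more. The function name and the 30-day month are assumptions for illustration.

```python
def monthly_bytes(payload_bytes: int, overhead_bytes: int,
                  interval_minutes: int, days: int = 30) -> int:
    """Bytes transferred per device per month for one poll every interval."""
    polls = days * 24 * 60 // interval_minutes
    return polls * (payload_bytes + overhead_bytes)


if __name__ == "__main__":
    devices = 1000
    with_http = monthly_bytes(3, 200, 10)
    payload_only = monthly_bytes(3, 0, 10)
    print(f"per device with HTTP overhead: {with_http} bytes/month")
    print(f"per device payload only:       {payload_only} bytes/month")
    print(f"fleet with HTTP overhead:      {devices * with_http / 1e6:.1f} MB/month")
```

With these inputs the payload is well under 2% of the traffic, which is why a protocol with smaller per-message framing matters more here than the polling interval.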

Are websockets a suitable low-latency, robust real-time communication protocol between two nearby servers in the same AWS Availability Zone?

Suitable technologies I am aware of:
Websockets
Zeromq
Please suggest others if they are a better fit for my problem.
For this use case I have just two machines, the sender and the receiver, and it's important to note they are fixed "nearby" each other, as they will be in the same availability zone on AWS. Answers that relate to message passing over large spans of the internet aren't necessarily applicable. Note also that the receiving server isn't queuing these up as tasks; it will just be forwarding select message feeds to website visitors over a websocket. The sending server does a lot of pre-processing and collating of the messages.
The solution needs to:
1. Be very high throughput. At present the sending server is processing about 10,000 messages per second (written in Rust) without breaking a sweat. Bursty traffic may increase this up to 20,000 or a bit more. I know zeromq can handle this.
2. Robust. The communication pipe will be open 24/7 365 days per year. My budget is extremely limited in terms of setting up clusters of machines as failovers so I have to do the best I can with two machines.
3. Message durability isn't required or a concern, the receiving server isn't required to store anything, it just needs all the data. The sender server asynchronously writes a durable 5 second summary of the data to a database and to a cache.
4. Messages must retain the order in which they are sent.
5. Low latency. This is very important as the data needs to be as realtime as possible.
A websocket seems to get the job done for points 1 to 4. What I don't know is how robust a websocket is for communication that runs 24 hours a day, 7 days a week. I've observed websocket connections getting dropped online in general (of course I will write reconnect code and heartbeat monitoring if required, but this still concerns me). I also wonder whether the throughput is too high for a websocket.
I have zero experience in this kind of problem but I have a very good websocket library that I'm comfortable using. I ruled out Apache Kafka as it seems expensive to get high throughput, tricky to manage with dev ops (zookeeper) and seems overkill as I don't need durability and it's only communication between 2 machines. So I'm hoping for a simple solution.
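The reconnect and heartbeat logic mentioned above could look roughly like the following. This is a transport-agnostic sketch, not tied to any particular websocket library: `connect_and_pump` is a hypothetical callable that opens the connection and forwards messages until the connection drops (raising `ConnectionError`), and the delay and timeout values are arbitrary assumptions.

```python
import itertools
import time
from typing import Callable, Iterator


def backoff_delays(base: float = 0.1, cap: float = 5.0) -> Iterator[float]:
    """Yield exponentially growing reconnect delays, capped at `cap` seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


class HeartbeatMonitor:
    """Tracks when the peer was last heard from; the clock is injectable for tests."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_seen = clock()

    def beat(self) -> None:
        """Call whenever a message or heartbeat arrives."""
        self._last_seen = self._clock()

    def is_stale(self) -> bool:
        """True if nothing has arrived within the timeout."""
        return self._clock() - self._last_seen > self._timeout_s


def run_with_reconnect(connect_and_pump: Callable[[], None], max_attempts: int,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Retry connect_and_pump with backoff; return how many attempts failed."""
    failures = 0
    for delay in itertools.islice(backoff_delays(), max_attempts):
        try:
            connect_and_pump()  # hypothetical: returns on clean shutdown
            return failures
        except ConnectionError:
            failures += 1
            sleep(delay)  # wait before the next attempt
    raise ConnectionError(f"gave up after {max_attempts} attempts")
```

A long-running sender would probably retry indefinitely and reset the backoff after each successful connection; the bounded loop here just keeps the sketch easy to test.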
It sounds like you are describing precisely what EC2 cluster placement groups provide: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-ec2-placementgroup.html
Edit: You should be able to create the placement group with 2 machines to cover your limited budget. Using larger instances, according to your budget, will also support higher network throughput.
Point 4 (ordering) looks like it would be supported by SQS FIFO, although SQS FIFO queues only support up to 3,000 messages per second with batching, which is below the throughput you describe.
A managed streaming solution like Kinesis Data Streams would definitely cover your use case, at scale, much better than a raw web socket. Using Kinesis Client Libraries, you can write your consumer to read from the stream.
AWS also has a managed Kafka service that removes the overhead of managing necessary components like Apache ZooKeeper: https://aws.amazon.com/msk/

Ethereum - In regard to its TPS

I want to use Ethereum as a private chain in a network. But as it would sit inside a transactional system, good performance is required. Ethereum, on the other hand, is said to handle 15 TPS.
What does 15 TPS mean? That is not practical performance if I try to use it for payment settlement in a private network.
There would probably be 5000 - 6000 transactions per second, and more later.
How can I then use Ethereum in such a situation?
15 TPS means that in any given second the network can process only 15 transactions, not more. These transactions can be simple value transfers or any smart contract transactions.
Yes, I understand that 15 TPS is very low, which makes Ethereum and other major blockchain implementations not very scalable.
But please keep in mind that the Ethereum Foundation is actively working to solve this problem with a sharded blockchain and other techniques.
So you can assume that in the near future the TPS for Ethereum will be around 10,000, which is good enough.
But if you need a blockchain that provides such transaction capacity right now, then you have to look beyond Ethereum.

How much data transfer does a typical http request generate on EC2?

I'm trying to assess what running a crawler off EC2 would cost. This page says data transfer IN is free and data transfer OUT is not. So if I make an HTTP GET request to some site, with a GET header length of, say, 200 bytes and a response of 20000 bytes, how many bytes of outbound data transfer actually get billed to my account? Is there a case study, or an explanation of how they measure it?
They will measure it at the boundary between the network and the datacenter, which means that TCP/IP overhead is probably included in the overall packet size.
I don't think most people are looking at this, as the cost of processing and storing the data will become a concern long before bandwidth does.
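If TCP/IP overhead is indeed counted, the outbound volume for the example in the question (200-byte request, 20000-byte response) can be estimated roughly as below. Everything here is an assumption for illustration, not a description of how AWS meters traffic: 40 bytes of IPv4 + TCP headers per packet with no options, a 1460-byte segment size, one ACK per two received segments, four outbound control packets for connection setup and teardown, and no TLS or DNS traffic.

```python
import math

IP_TCP_HEADER_BYTES = 40   # assumption: IPv4 (20) + TCP (20), no options
MSS_BYTES = 1460           # assumption: typical segment size on Ethernet


def estimate_outbound_bytes(request_bytes: int, response_bytes: int,
                            acks_per_segment: float = 0.5) -> int:
    """Rough count of bytes leaving the client for one HTTP request."""
    request_segments = math.ceil(request_bytes / MSS_BYTES)
    response_segments = math.ceil(response_bytes / MSS_BYTES)
    # ACKs the client sends for the response data it receives.
    ack_packets = math.ceil(response_segments * acks_per_segment)
    # SYN and handshake ACK on setup, FIN and final ACK on teardown.
    control_packets = 4
    outbound_packets = request_segments + ack_packets + control_packets
    return request_bytes + outbound_packets * IP_TCP_HEADER_BYTES


if __name__ == "__main__":
    print(estimate_outbound_bytes(200, 20000))
```

Under these assumptions the outbound total stays in the hundreds of bytes per request, small next to the 20000 bytes that arrive as free inbound transfer.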

Sitecore ECM Slow to Process/Dispatch

I have a client who is using the ECM and just dispatched an email blast to approximately 18,000 users. The dispatch is taking quite a while (about 2 hours to process not even half of the users).
Has anyone encountered this issue?
Can the ECM not handle such large lists?
As mentioned elsewhere, ECM can handle that load just fine. In general, throughput on ECM is limited by:
Fragmented indices on the "analytics" database (or just limited capacity on same)
Bandwidth limitations. If each mail is 500 KB (lots of images), sending 10 mails per second requires about 5 MB/s (roughly 40 Mbit/s) of bandwidth
CPU on the server in question
From what you've shared so far, I cannot guess as to which of the above is limiting the throughput on your installation. My personal guess would be capacity and speed of the database.
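The bandwidth point and the dispatch rate from the question can be checked with a few lines of arithmetic. This is a sketch with made-up function names; it takes 1 KB as 1000 bytes and uses the figures already mentioned (500 KB per mail, 10 mails per second, and fewer than half of 18,000 mails sent in about 2 hours).

```python
def required_mbit_per_s(mail_size_kb: float, mails_per_second: float) -> float:
    """Bandwidth needed to send mails of the given size at the given rate."""
    bits_per_mail = mail_size_kb * 1000 * 8
    return bits_per_mail * mails_per_second / 1_000_000


def observed_mails_per_second(mails_sent: int, hours: float) -> float:
    """Average dispatch rate actually achieved."""
    return mails_sent / (hours * 3600)


if __name__ == "__main__":
    # 500 KB mails at 10 mails per second.
    print(required_mbit_per_s(500, 10), "Mbit/s")
    # Upper bound on the observed rate: at most 9,000 mails in 2 hours.
    print(observed_mails_per_second(9000, 2), "mails/s")
```

An observed rate of around one mail per second is far below the 10 mails per second used in the bandwidth example, which fits the guess that the database or the thread configuration, rather than the network, is the limiting factor.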
More information here: http://sdn.sitecore.net/upload/sdn5/products/ecm/200/ecm_tuning_guide_20-a4.pdf
I had a similar issue with a client where it was taking hours to send emails. Check the NumberThreads setting in the Sitecore.EmailCampaign.config file. The default is quite low at "1" and most servers should be able to handle more threads.
Definitely follow the tuning guide that Mark posted. The Performance Measurement Tool can help you get the ideal number of sending threads so that you're not over or under utilizing your server.