What is XDQP in MarkLogic Replication - database-replication

https://docs.marklogic.com/guide/database-replication/configuring
I was reading the documentation on Data Replication, the section on security, and it references XDQP, but searching the documents and developer.marklogic.com, I was not able to find anything that describes what XDQP means. Can someone please clarify and point me to documentation with more information?

XDQP is the protocol MarkLogic nodes use to talk to each other.
The name is an acronym for XML Data Query Protocol, if I remember right, but it's evolved to be more than that.
It's an undocumented internal-only protocol.

The most relevant points to consider:
It's a TCP/IP-based (but not HTTP) protocol and runs on port 7999 by default (changeable)
Multiple sockets are opened to each host for redundancy
All hosts in a cluster need to be able to communicate on that port with all other hosts at all times. The 'hostname' of each host must resolve to an IP address at which it can be reached by every other host in the cluster (not necessarily the same IP a client connects to)
Therefore any firewall, iptables rules, routers, network security etc. need to be configured to allow bidirectional TCP/IP traffic, initiated and received by every host in the cluster to every other host, at the TCP/IP level (not HTTP), without port rewriting or content-based filtering/routing (a quick connectivity check is sketched after this list)
There is a continual 'heartbeat' synchronizing all servers to the same clock (transaction timestamp), keeping a consistent view of the 'quorum', and propagating configuration changes. If this is interrupted, a host will become disconnected from the cluster. If that host holds critical data, the cluster may stop being fully functional.
Monitoring the traffic patterns (not content) can sometimes be useful in debugging or predicting performance issues or unusual behaviour
Any 'dead period' between any two hosts on this port is an indication of some kind of problem; conversely, any interruption of network availability on this port will cause the cluster to re-form and determine whether the subset of hosts reachable by any one host is sufficient to be a 'quorum' (the 'live' part of the cluster) or whether the host(s) are the non-active part of a disjoint cluster.
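A quick way to sanity-check the port and hostname requirements above is a small script run from each host. This is only a rough sketch, not anything MarkLogic ships; the hostnames below are placeholders for your own cluster:

```python
# Minimal XDQP connectivity check: verify each cluster hostname resolves
# and that the XDQP port (7999 by default) accepts a plain TCP connection.
import socket

CLUSTER_HOSTS = ["ml-node1.example.com", "ml-node2.example.com", "ml-node3.example.com"]
XDQP_PORT = 7999  # change if your cluster uses a non-default port

for host in CLUSTER_HOSTS:
    try:
        # Each hostname must resolve to an address the other hosts can reach.
        addr = socket.gethostbyname(host)
    except socket.gaierror as err:
        print(f"{host}: DNS resolution failed ({err})")
        continue
    try:
        # A plain TCP connect is enough to prove the port is open;
        # XDQP is not HTTP, so don't expect a readable response here.
        with socket.create_connection((addr, XDQP_PORT), timeout=3):
            print(f"{host} ({addr}): port {XDQP_PORT} reachable")
    except OSError as err:
        print(f"{host} ({addr}): port {XDQP_PORT} NOT reachable ({err})")
```

Run it on every host in the cluster; any failure points at the DNS or firewall issues described above.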

Related

How to handle suspicious access to EC2 servers

FastAPI is running on EC2.
The service is published on 0.0.0.0/0 with a single port number.
There are multiple requests for directory names unrelated to the service itself.
What should I do in such a case?
Is this a common occurrence and is it something I should be concerned about?
This type of traffic is perfectly normal on the Internet.
In fact, if you were to look at the logs on the network router in your home (which connects you to the Internet), you would see hundreds of such attempts every day.
These requests are coming from automated scripts ('bots') running on the Internet. They attempt to take advantage of known security vulnerabilities to gain access to your systems. This is why it is generally a good idea to keep software up-to-date and to limit the number of ports that are opened to the Internet.
WordPress sites are often targets of bots since people do not keep them updated. You will often see requests in your logs that are trying WordPress vulnerabilities, even though you are not running WordPress. The bots just try everything, everywhere!
For a web server, you need to open ports 80 (HTTP) and 443 (HTTPS), but any other ports should be kept closed, or perhaps only opened to a specific range of IP addresses (eg for your home/office).
What should you do?
Only open ports that are strictly necessary, and limit the source IP address range if possible (a boto3 sketch of this follows after the list)
Keep software updated
Live with it -- it's a fact of life on the Internet
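If your instance sits behind an AWS security group, the "only open what you need" advice can be expressed with a short boto3 call. This is just a sketch; the group ID and office CIDR below are placeholders, and the exact ports depend on your service:

```python
# Sketch: allow HTTP/HTTPS from anywhere but restrict SSH to a trusted range.
# The group ID and CIDR values are placeholders -- substitute your own.
import boto3

ec2 = boto3.client("ec2")
SG_ID = "sg-0123456789abcdef0"   # hypothetical security group ID
OFFICE_CIDR = "203.0.113.0/24"   # hypothetical home/office address range

ec2.authorize_security_group_ingress(
    GroupId=SG_ID,
    IpPermissions=[
        {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        # Management access only from the trusted range, never 0.0.0.0/0.
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": OFFICE_CIDR}]},
    ],
)
```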

ARP protocol in GCP for two VMs to communicate directly

I have two machines within GCP. Both machines are on the same subnet.
The way I understand it, GCP is built on SDN, so there is no traditional switching. In other words, there is no ARP recognition for my two machines to communicate directly with each other, omitting the default gateway.
Am I right? Can you please shed some light on this?
I am not sure what you mean by:
In other words, there is no ARP recognition for my two machines to
communicate directly with each other, omitting the default gateway.
ARP and RARP are supported. ARP lookups are handled in kernel software.
Once two systems have communicated, the operating systems know about each other and the IP-to-MAC mapping. If the systems have not communicated previously, then a MAC lookup takes place, which is managed by the VPC network.
VPC networks use Linux's VIRTIO network module to model Ethernet card and router functionality, but higher levels of the networking stack, such as ARP lookups, are handled using standard networking software.
ARP lookup
The instance kernel issues ARP requests and the VPC network issues ARP replies. The mapping between MAC addresses and IP addresses is handled by the instance kernel.
MAC lookup table, IP lookup table, active connection table
These tables are hosted on the underlying VPC network and cannot be inspected or configured.
Advanced VPC concepts
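If you want to see this from inside one of your VMs, a small (hypothetical) script like the following just generates some traffic to the peer and then dumps the kernel's neighbour table, i.e. the IP-to-MAC mapping held by the instance kernel; the peer address is a placeholder:

```python
# Sketch: observe the ARP/neighbour cache on a GCE instance after contacting
# a peer on the same subnet. PEER_IP is a placeholder for the other VM's
# internal address.
import subprocess

PEER_IP = "10.128.0.3"  # hypothetical internal IP of the second instance

# Generate a little traffic so the kernel performs an ARP lookup if needed.
subprocess.run(["ping", "-c", "1", PEER_IP], check=False, capture_output=True)

# Dump the kernel neighbour table: the IP-to-MAC mappings held by the
# instance kernel, with the ARP replies having been supplied by the VPC network.
result = subprocess.run(["ip", "neigh", "show"], capture_output=True, text=True)
print(result.stdout)
```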

TCP-level information on EC2

I'm trying to get the TCP timestamp from the packets for clock-skew purposes in my application, which is hosted on EC2. In my network I have an ALB.
So my question is: how do I get TCP-level packet information in my app, since the ALB filters out all the OSI layers except the application level (HTTP)?
If the only reason to get access to the TCP packets is to detect timestamps and correct clock drift, I would suggest configuring your EC2 instance to use an NTP time server instead.
https://aws.amazon.com/blogs/aws/keeping-time-with-amazon-time-sync-service/
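For example, a rough sketch of checking your instance's offset against the Amazon Time Sync Service (reachable at 169.254.169.123 from inside EC2) using the third-party ntplib package might look like this:

```python
# Sketch: measure the local clock offset against the Amazon Time Sync Service.
import ntplib  # pip install ntplib

client = ntplib.NTPClient()
# 169.254.169.123 is the link-local address of the Amazon Time Sync Service.
response = client.request("169.254.169.123", version=4)
print(f"Clock offset vs. the time sync service: {response.offset:+.6f} seconds")
```

In practice you would let chrony/ntpd keep the clock in sync continuously rather than correcting it yourself.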
That being said, the ALB is not "removing" TCP information from network packets. HTTP connections made to your application are still transported over IP and TCP. If you need low-level access to network packets from an app, I would suggest looking at the pcap library, which is used by tcpdump and many other tools to capture network traffic on an interface.
https://www.tcpdump.org/
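To give an idea of what that looks like in Python, here is a rough sketch using scapy (a libpcap-based library) that prints the TCP timestamp option from packets hitting the instance. Note it needs root privileges, and because of the ALB these are the ALB-to-instance packets, not the original client connection; the backend port is a placeholder:

```python
# Sketch: sniff TCP packets on the backend port and print the TCP timestamp
# option (TSval/TSecr). Requires root and the scapy package (pip install scapy).
from scapy.all import sniff, IP, TCP

BACKEND_PORT = 8080  # placeholder for the port your app listens on

def show_tcp_timestamp(pkt):
    if IP in pkt and TCP in pkt:
        for name, value in pkt[TCP].options:
            if name == "Timestamp":
                tsval, tsecr = value
                print(f"{pkt[IP].src} -> {pkt[IP].dst}: TSval={tsval} TSecr={tsecr}")

# Capture a handful of packets, then stop.
sniff(filter=f"tcp port {BACKEND_PORT}", prn=show_tcp_timestamp, count=20)
```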
[UPDATED to include comments]
It is important to understand that the TCP connection between your client and the ALB is terminated at the ALB level. The ALB creates a second TCP connection to forward HTTP requests to your EC2 instance. The ALB does not remove information from TCP/IP, it just creates a second, new and independent connection. Usually the only information you want to propagate from the initial TCP connection is the source IP address. The ALB, like most load balancers and proxies, captures this information from the original connection (the one received from the client) and embeds it in an HTTP header called X-Forwarded-For.
This is documented at https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/x-forwarded-headers.html
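For illustration, reading that header in a Python web app behind the ALB could look roughly like this (FastAPI is used here purely as an example framework):

```python
# Sketch: recover the original client address from X-Forwarded-For behind an ALB.
from fastapi import FastAPI, Request

app = FastAPI()

@app.get("/whoami")
async def whoami(request: Request):
    # X-Forwarded-For can contain a chain of addresses; the left-most entry
    # is the client address as seen by the load balancer.
    forwarded = request.headers.get("x-forwarded-for")
    client_ip = forwarded.split(",")[0].strip() if forwarded else request.client.host
    return {"client_ip": client_ip}
```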
If you want to capture other information from the original connection, I am afraid it will not be possible using the ALB (but I would also be very curious about the use case, i.e. WHAT you're trying to achieve).

Can internal IP addresses of the worker machines be used to connect the worker to the RabbitMQ queue?

The two machines, the "master RabbitMQ queue handler" and the "worker machine", are servers on the same platform (i.e. DigitalOcean).
Can their internal IPs be used to connect them? If yes, will it be any faster than the external IP connection?
This is more of a network/sysadmin question than a programming one, but anyway:
As long as one machine can access the other through its "internal IP" (I assume you mean a local private IP) and there's nothing blocking connections (firewall etc.) on those IPs, then yes, of course it will work. Just note that two machines belonging to the same hosting company doesn't mean they are connected to the same local network (are they physically in the same datacenter?).
As to whether using the internal IP will be faster or not, it depends on the local network, the Ethernet cards etc. It's usually faster when both servers are on the same local network indeed, but it's not guaranteed either.
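For what it's worth, connecting a worker over the private address is nothing special at the code level. A rough pika sketch, with the address, credentials and queue name all placeholders, would be:

```python
# Sketch: a worker connecting to RabbitMQ over the droplets' private network.
# Address, credentials and queue name below are placeholders.
import pika

PRIVATE_IP = "10.116.0.5"  # hypothetical internal IP of the RabbitMQ host

connection = pika.BlockingConnection(
    pika.ConnectionParameters(
        host=PRIVATE_IP,   # must be reachable on the private network
        port=5672,         # default AMQP port, open on that interface
        credentials=pika.PlainCredentials("worker", "secret"),
    )
)
channel = connection.channel()
channel.queue_declare(queue="tasks", durable=True)
print("Connected to RabbitMQ over the private network")
connection.close()
```

Whether this is measurably faster than going over the public address is exactly the "it depends on the network" point above.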

Keeping a connection after removing a rule / changing the IP address in AWS security policies

In the AWS console we must specify rules in order to make it possible to access EC2 instances from remote locations.
I mean rules like opening some port or allowing access from specific IP addresses.
And it is working for me now.
Consider the following scenario:
Let's assume that we have application A, which maintains a long-running connection, and everything is working because the security rules are properly set. Now,
(a) someone removes the rules allowing application A to connect to the EC2 instance (i.e. the rule for the external IP address used by application A)
(b) at some point the external IP address of the machine used by application A changes.
Is it possible that a connection established before occurrence (a) or (b) keeps working? If yes, then how is that possible?
Here's a pretty basic explanation. Of course, there's a lot more information on the matter, but I guess it is not of importance right now.
If you change a rule, let's assume it is a firewall rule or an AWS security group rule, the connection will terminate as the rule takes effect immediately.
Simply put, you are sending a stream of information packet by packet, so when the change takes effect the packets will no longer be received and you will no longer get a response, i.e. the connection will terminate.
If you change your IP and you are using TCP connections, which I assume you are, they will also terminate, as TCP connections are based on IP:port combinations. BUT if you are using DNS rather than just an IP, your traffic will be routed correctly; you might experience some downtime, but your service will get back to working soon enough.
EDIT: As noted by Michael, a security group change doesn't cut off existing connections. The next time a connection attempt is made, it will be blocked.
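To illustrate the security-group case, removing the rule is a single boto3 call like the sketch below (group ID, port and CIDR are placeholders). Because security groups are stateful and use connection tracking, an already-established, tracked TCP connection keeps working after the call; only new connection attempts from that address are blocked:

```python
# Sketch: revoke the ingress rule that allowed application A's address.
# Existing tracked connections survive; new connection attempts are blocked.
import boto3

ec2 = boto3.client("ec2")

ec2.revoke_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # hypothetical security group ID
    IpPermissions=[
        {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
         "IpRanges": [{"CidrIp": "198.51.100.10/32"}]},  # application A's old address
    ],
)
```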