mmap performance of Amazon ESB - amazon-web-services

I am looking at porting an application to the cloud, more speficially I am looking at Amazon EC2 or Google GCE.
My app heavily uses Linux's mmap to memory map large read-only files and I I would like to understand how mmap would actually work when a file is on the ESB volume.
I would specifically like to know what happens when I call mmap as EBS appears to be a black-box. Also, are the benefits negated?

I can speak for GCE Persistent Disks. It behaves pretty much in the same way a physical disk would. At a high level, pages are faulted in from disk as mapped memory is accessed. Depending on your access pattern these pages might be loaded one by one, or in a larger quantity when readahead kicks in. As the file system cache fills up, old pages are discarded to give space to new pages, writing out dirty pages if needed.
One thing to keep in mind with Persistent Disk is that performance is proportional to disk size. So you'd need to estimate your throughput and IOPS requirements to ensure you get a disk with enough performance for your application. You can find more details here: Persistent disk performance.
Is there any aspect of mmap that you're worried about? I would recommend to write a small app that simulates your workload and test it before deciding to migrate your application.
~ Fabricio.

Related

Recommended minimum system requirements to try out the QuestDB binaries?

I'm trying out QuestDB using the binaries, running them in an Ubuntu container under Proxmox. The docs for the binaries don't say what resources you need, so I guesstimated. Looking at the performance metrics for the container when running some of the CRUD examples with 10,000,000 rows, I still managed to over-provision — by a lot.
Provisioned the container with 4 CPU cores, 4GB RAM & swap, and 8GB SSD. It would probably be fine with a fraction of that: CPU usage during queries is <1%, RAM usage <1.25GB, and storage is <25%.
There is some good info in the capacity planning section of the QuestDB docs (e.g. 8 GB RAM for light workloads), but my question is really about the low end of the scale — what’s the least you can get away with and still be performant when getting started with the examples from the docs?
(I don't mind creating a pull request with this and some other docs additions. Most likely, 2 cores, 2 GB of RAM and 4 GB of storage would be plenty and still give you a nice 'wow, this is quick' factor, with the proviso that this is for evaluation purposes only.)
In QuestDB ingestion and querying are separated by design, meaning if you are planning to ingest medium/high throughput data while running queries, you want to have a dedicated core for ingestion and then another for the shared pool.
The shared pool is used for queries, but also for internal tasks QuestDB needs to run. If you are just running a demo, you probably can do well with just one core for the shared pool, but for production scenarios it is likely you would want to increase that depending on your access patterns.
Regarding disk capacity and memory, it all depends on the size of the data set. QuestDB queries will be faster if the working dataset fits in memory. 2GB of RAM and 4GB of disk storage as you suggested should be more than enough for the examples, but for most production scenarios you would probably want to increase both.

VMWare: About snapshots: do they usually occupy how much% of the disk space source VM? And are they used to downgrade software?

I would like to update the samba on a 3TB NAS. My boss suggested making a clone, however, there is no storage that will fit him whole. If a snapshot of the VM costs a smaller size, and serves to, in case of failure, restore the samba as it was, making it a better idea.
There's no real guide on how much space snapshots occupy. That will greatly depend on the activity on the VM where the snapshot has been taken. If it's an active VM (database or something of the like), there could be a considerable amount of data written. If it's not a very used VM, there could be limited to no data written to the backend datastore.

RDS eating all the swap space

We have been using MariaDB in RDS and we noticed that the swap space is getting increasingly high whithout being recycled. The freeable memory however seems to be fine. Please check the attached files.
Instance type : db.t2.micro
Freeable memory : 125Mb
Swap space : increased by 5Mb every 24h
IOPS : disabled
Storage : 10Gb (SSD)
Soon RDS will eat all the swap space, which will cause lots of issues to the app.
Does anyone have similar issues?
What is the maximum swap space? (didn't find anything in the docs)
Please help!
Does anyone have similar issues?
I had similar issues on different instance types. The trend of swapping stays even if you would switch to higher instance type with more memory.
An explanation from AWS you can find here
Amazon RDS DB instances need to have pages in the RAM only when the pages are being accessed currently, for example, when executing queries. Other pages that are brought into the RAM by previously executed queries can be flushed to swap space if they haven't been used recently. It's a best practice to let the operating system (OS) swap older pages instead of forcing the OS to keep pages in memory. This helps make sure that there is enough free RAM available for upcoming queries.
And the resolution:
Check both the FreeableMemory and the SwapUsage Amazon CloudWatch metrics to understand the overall memory usage pattern of your DB instance. Check these metrics for a decrease in the FreeableMemory metric that occurs at the same time as an increase in the SwapUsage metric. This can indicate that there is pressure on the memory of the DB instance.
What is the maximum swap space?
By enabling Enhanced Monitoring you should be able to see OS metrics, e.g. The amount of swap memory free, in kilobytes.
See details here
Enabling enhanced monitoring in RDS has made things more clear.
Obviously what we needed to watch was Committed Swap instead of Swap Usage. We were able to see how much Free Swap we had.
I now also believe that MySQL is dumping things in swap just because there is too much space in there, even though it wasn't really in urgent need of memory.

Nuodb and HDFS as storage

Using HDFS for Nuodb as storage. Would this have a performance impact?
If I understand correctly, HDFS is better suited for batch mode or write once and read many times, types of application. Would it not increase the latency for record to be fetch in case it needs to read from storage?
On top of that HDFS block size concept, keep the file size small that would increase the network traffic while data is being fetch. Am I missing something here? Please point out the same.
How would Nuodb manage these kind of latency gotchas?
Good afternoon,
My name is Elisabete and I am the Technical Support Engineer over at NuoDB. I believe that I may have just answered this via your post on our own forum, but I'm responding here as well for anyone else who's curious.
First... a mini lesson on NuoDB architecture/layout:
The most basic NuoDB set-up includes:
Broker Agent
Transaction Engine (TE)
Storage Manager (SM) connected to an Archive Directory
Broker Agents keep track of all the moving parts in the domain (collection of machines hosting NuoDB processes) and provide client applications with connection information for the next available Transaction Engine.
Transaction Engines process incoming SQL requests and manage transactions.
Storage Managers read and write data to and from "disk" (Archive Directory)
All of these components can reside on a single machine, but an optimal set up would have them spread across multiple host machines (allowing each process to take full advantage of the host's available CPU/RAM). Also, while it's possible to run with just one of each component, this is a case where more is definitely more. Additional Brokers provide resiliency, additional TE's increase performance/speed and additional SM's ensure durability.
Ok, so now lets talk about Storage:
This is the "Archive Directory" that your storage manager is writing to. Currently, we support three modes of storage:
Local Files System
Amazon Web Services: Simple Storage volume (S3), Elastic Block Storage (EBS)
Hadoop Distributed Files System (HDFS)
So, to elaborate on how NuoDB works with HDFS... it doesn't know about the multiple machines that the HDFS layer is writing to. As far as the SM is concerned, it is reading and writing data atoms to a single directory. The HDFS layer decides how to then distribute and retrieve data to and from the cluster of machines it resides over.
And now to finally address the question of latency:
Here's the thing, whenever we introduce a remote storage device, we inevitably introduce some amount of additional latency because the SM now has further to go when reading/writing atoms to/from memory. HDFS likely adds a bit more, because now it needs to do it's magic divvying up, distributing, retrieving and reassembling data. Add to that discrepancy in network speed, etc.
I imagine that the gained disk space outweighs the cost in travel time, but this is something you'd have to decide on a case by case basis.
Now, all of that said... I haven't mentioned that TE and SM's both have the ability to cache data to local memory. The size of this cache is something you can set, when starting up each process. NuoDB uses a combination of Multi-Version Concurrency Control (MVCC) and a near constant stream of communication between all of the processes, to ensure that data held in cache is kept up to date with all of the changes happening within the system. Garbage Collection also kicks in and clears out atoms in a Least Recently Used order, when the cache grows close to hitting its limit.
All of this helps reduce latency, because the TE's can hold onto the data they reference most often and grab copies of data they don't have from sibling TE's. When they do resort to asking the SM's for data, there's a chance that the SM (or one of its sibling SM's) has a copy of the requested data in local cache, saving itself the trip out to the Archive Directory.
Whew.. that was a lot and I absolutely glossed over more than a few concepts. These topics are covered in greater depth via the new suite of white papers (and the new "green book") available on our main website. I'm currently also working on some visual guides, to help explain all of this.
If you'd like to know more about NuoDB or if I didn't quite answer your question.... please reach out to me directly via the NuoDB Community Forums (I respond to posts there, a bit faster).
Thank you,
Elisabete
Technical Support Engineer at NuoDB

Amazon EC2 and EBS using Windows AMIs

I put our application on EC2 (Windows 2003 x64 server) and attached up to 7 EBS volumes. The app is very I/O intensive to storage -- typically we use DAS with NTFS mount points (usually around 32 mount points, each to 1TB drives) so i tried to replicate that using EBS but the I/O rates are bad as in 22MB/s tops. We suspect the NIC card to the EBS (which are dymanic SANs if i read correctly) is limiting the pipeline. Our app uses mostly streaming for disk access (not random) so for us it works better when very little gets in the way of our talking to the disk controllers and handling IO directly.
Also when I create a volume and attach it, I see it appear in the instance (fine) and then i make it into a dymamic disk pointing to my mount point, then quick format it -- when I do this does all the data on the volume get wiped? Because it certainly seems so when i attach it to another AMI. I must be missing something.
I'm curious if anyone has any experience putting IO intensive apps up on the EC2 cloud and if so what's the best way to setup the volumes?
Thanks!
I've had limited experience, but I have noticed one small thing:
The initial write is generally slower than subsequent writes.
So if you're streaming a lot of data to disk, like writing logs, this will likely bite you. But if you make a big file fill it with data, and do a lot of random access I/O to it, it gets better on the second time writing to any specific location.