limit on string size in c++? - c++

I have like a million records each of about 30 characters coming in over a socket. Can I read all of it into a single string? Is there a limit on the string size I can allocate?
If so, is there some way I can send the data over the socket record by record and receive it record by record? I don't know the size of each record until runtime.

To answer your first question: The maximum size of a C++ string is given by string::max_size

std::string::max_size() will tell you the theoretical limit imposed by the architecture your program is running under. Other than that, as long as you have sufficient RAM and/or disk swap space, you can have std::strings of huge size.
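As a minimal sketch of how to query that theoretical limit, the helpers below wrap std::string::max_size(). The function names string_size_ceiling and length_is_representable are made up for illustration, and passing this check does not guarantee that the allocation itself will succeed.

```cpp
#include <cstddef>
#include <string>

// Largest length a std::string can ever report on this implementation.
// This is a theoretical ceiling, not a promise that the memory exists.
std::size_t string_size_ceiling() {
    return std::string().max_size();
}

// True if a string of n characters is at least representable.
// The allocation can still fail with std::bad_alloc at runtime.
bool length_is_representable(std::size_t n) {
    return n <= string_size_ceiling();
}
```

For the question's workload (about a million records of 30 characters, roughly 30 MB), length_is_representable(30000000) should hold on any mainstream implementation.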
The answer to your second question is yes: you can send record by record. In fact, you might not be able to send big chunks of data over a socket at once, because there are limits on the size of a single send operation. That the size of a single string is not known until runtime is not a problem; it doesn't need to be known at compile time to send it over a socket. How to actually send those strings record by record depends on what socket/networking library you are using; consult the relevant documentation.

There is no official limit on the size of a string. The software will ask your system for memory and, as long as it gets it, it will be able to add characters to your string.
The rest of your question is not clear.

The only practical limit on string size in C++ is your available memory. That being said, it will be expensive to keep reallocating your string as you receive data (assuming you do not know its total size in advance). Normally you would read chunks of the data into a fixed-size buffer and decode it into its natural shape (your records) as you get it.

The size of a string is only limited by the amount of memory available to the program; it is more of an operating system limitation than a C++ limitation. C-style strings are null terminated, so the C string routines will happily process extremely long strings until they find a null; std::string stores its length explicitly instead.
On Win32 the maximum amount of memory available for data is normally around 2 GB.
You can read arbitrarily large amounts of data from a socket, but you must have some way of delimiting the data that you're reading. There must be an end-of-record marker or a length associated with the records that you are reading, so use that to parse the records. Do you really want to read the data into a string? What happens if you don't have enough free memory to hold the data in RAM? I suspect there is a more efficient way to handle this data, but I don't know enough about the problem.
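A rough sketch of the end-of-record-marker approach follows, assuming records are separated by a newline and never contain one. The class name LineRecordSplitter is hypothetical; each call to feed() stands in for the bytes returned by one socket read, which may end in the middle of a record.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits a byte stream into records separated by a delimiter character.
class LineRecordSplitter {
public:
    explicit LineRecordSplitter(char delimiter = '\n') : delimiter_(delimiter) {}

    // Feed one chunk, e.g. what a single recv()/read() call returned.
    // Returns every record completed by this chunk; an incomplete
    // trailing record is kept until a later chunk finishes it.
    std::vector<std::string> feed(const char* data, std::size_t len) {
        assert(data != nullptr || len == 0);
        std::vector<std::string> records;
        if (len == 0) return records;
        pending_.append(data, len);
        std::size_t start = 0;
        std::size_t end = 0;
        while ((end = pending_.find(delimiter_, start)) != std::string::npos) {
            records.emplace_back(pending_, start, end - start);
            start = end + 1;  // skip the delimiter itself
        }
        pending_.erase(0, start);  // drop everything already handed out
        return records;
    }

    // Bytes received so far that do not yet form a complete record.
    const std::string& pending() const { return pending_; }

private:
    char delimiter_;
    std::string pending_;
};
```

Only the unfinished tail of the stream stays in memory, so the million records never need to live in one string unless the caller chooses to keep them.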

In theory, no. But don't go allocating 100GB of memory, because the user will probably not have that much RAM. If you are using std::string, the maximum size is given by std::string::max_size(), which is smaller than std::string::npos.

If we are talking about char*, you are limited to something around 2^32 on 32-bit systems and 2^64 on (surprise) 64-bit ones.
Update: This is wrong. See comments

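How about sending them in a different format, with each record preceded by its length?
On your server:
send(strlen(szRecordServer));   // first send the record length
send(szRecordServer);           // then send the record itself
On your client:
recv(cbRecord);                 // read the record length
alloc(szRecordClient);          // allocate cbRecord bytes
recv(szRecordClient);           // read exactly cbRecord bytes
Repeat this a million times.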
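The length-then-payload idea above can be sketched without committing to a particular socket library. This is one possible framing, assuming a 4-byte big-endian length prefix and records smaller than 4 GB; frame_record and FrameDecoder are illustrative names, not part of any networking API. The sender passes the output of frame_record to its send call, and the receiver feeds whatever each receive call returns into the decoder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Sender side: prepend a 4-byte big-endian length to the record.
std::string frame_record(const std::string& record) {
    assert(record.size() <= 0xFFFFFFFFu);  // assumption: length fits in 32 bits
    const std::uint32_t n = static_cast<std::uint32_t>(record.size());
    std::string out;
    out.reserve(4 + record.size());
    out.push_back(static_cast<char>((n >> 24) & 0xFF));
    out.push_back(static_cast<char>((n >> 16) & 0xFF));
    out.push_back(static_cast<char>((n >> 8) & 0xFF));
    out.push_back(static_cast<char>(n & 0xFF));
    out += record;
    return out;
}

// Receiver side: accumulate bytes and hand out complete records.
class FrameDecoder {
public:
    std::vector<std::string> feed(const char* data, std::size_t len) {
        std::vector<std::string> records;
        if (len == 0) return records;
        buffer_.append(data, len);
        std::size_t pos = 0;
        while (buffer_.size() - pos >= 4) {
            std::uint32_t n = 0;
            for (int i = 0; i < 4; ++i)
                n = (n << 8) | static_cast<unsigned char>(buffer_[pos + i]);
            if (buffer_.size() - pos - 4 < n) break;  // payload not complete yet
            records.emplace_back(buffer_, pos + 4, n);
            pos += 4 + n;
        }
        buffer_.erase(0, pos);  // keep only the unfinished tail
        return records;
    }

private:
    std::string buffer_;
};
```

Because the decoder tolerates records split across reads, it does not matter how the transport chops up the byte stream.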

Related

Create array upto 10^12

I tried to create an array with up to 10^12 elements in C++, but I can only make an array of up to 1000001 elements, i.e.
long long int dp[1000001]
But I want to store up to 10^12 values in the array. Any idea how I can implement this in C++?
First, you must realize that the size of that array is nearly 8 TB. Does your computer have that much memory? Probably not. In such case, you cannot store that much data in memory, and practically cannot have such a large array.
Any Idea how can I implement this
Instead of an array in memory, you could store the data in the file system... Assuming you have 8 TB free storage. You can use a paging mechanism to read and write small pieces of the file at a time.
The simplest way to implement that in C++ is to use operating system functionality to map the file into memory. That way the operating system takes care of the paging. There is no standard way to map files into memory in C++, so the first step is to figure out which operating system you're using. The POSIX standard specifies the mmap function for this purpose.
Before doing that however, I recommend considering whether you actually need to store that much data. Perhaps you need a smarter algorithm instead.
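For the memory-mapping route mentioned above, here is a minimal POSIX-only sketch. The class name MappedArray and the file path passed to it are hypothetical, error handling is reduced to an ok() flag, and on a 32-bit system the mapping is still limited by the address space.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// A file-backed array of long long; the OS pages data in and out on demand.
class MappedArray {
public:
    MappedArray(const char* path, std::size_t count) : count_(count) {
        fd_ = ::open(path, O_RDWR | O_CREAT, 0600);
        if (fd_ < 0) return;
        const std::size_t bytes = count * sizeof(long long);
        // Make the file exactly as large as the array; new space reads as zero.
        if (::ftruncate(fd_, static_cast<off_t>(bytes)) != 0) return;
        void* p = ::mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd_, 0);
        if (p != MAP_FAILED) data_ = static_cast<long long*>(p);
    }

    ~MappedArray() {
        if (data_) ::munmap(data_, count_ * sizeof(long long));
        if (fd_ >= 0) ::close(fd_);
    }

    MappedArray(const MappedArray&) = delete;
    MappedArray& operator=(const MappedArray&) = delete;

    bool ok() const { return data_ != nullptr; }
    std::size_t size() const { return count_; }

    long long& operator[](std::size_t i) {
        assert(data_ != nullptr && i < count_);
        return data_[i];
    }

private:
    int fd_ = -1;
    long long* data_ = nullptr;
    std::size_t count_ = 0;
};
```

Writes go through the shared mapping into the file, so values persist after the object is destroyed and the file is mapped again.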

Creating and managing a byte buffer in memory, in C and/or C++, that can automatically resize as needed

When programming in C and/or C++, how does one set up an in-memory byte-buffer structure such that it can automatically resize as the situation warrants?
Often, I will want to write some unknown quantity of bytes to a buffer, without knowing how much space is needed. I feel like this is a fundamental I/O programming concern – and I don’t know how to approach the problem, let alone solve it.
Specifically, I’m doing this I/O to process image data – the sizes can vary from a few kilobytes on up to hundreds of megabytes, depending on compression settings and (many!) other factors.
My current workaround, in many cases, is to:
1. open() a write-mode descriptor on a temporary file, and write() my indeterminate quantity of bytes to this file;
2. then call fsync() and subsequently close() the descriptor;
3. use stat() to get the size of the file;
4. re-open() the temporary file in read mode;
5. and then finally read() the entire file back into a newly allocated, properly-sized buffer.
My question, therefore, is a two-parter: one, how problematic is my workaround? and two: how can I accomplish this task using only in-memory structures?
Nothing wrong with your approach as long as you can make sure the file doesn't change size between steps 3 and 5. It is, actually, the solution that has most probably the best performance.
In case you realize (by counting bytes read vs. buffer size) while reading the file that there is more to read but you run out of buffer space, you can always use realloc to increase the buffer by an arbitrary amount. How much that "arbitrary amount" would be depends on the nature of your application and your expected memory situation. If memory is plenty, you might want to over-allocate by factor 1.5 and realloc to the actual size once you have read the complete file.
Dynamically re-allocating the buffer has, however, a bit of a speed penalty and might not always be possible when you are working with huge buffers and are already tight on memory (most implementations of realloc will temporarily need to hold both the too-small and the re-sized buffer in memory).
Depending on the buffer sizes, your program might also suffer from a performance penalty when resizing the buffer - after all, the contents you have already read need to be copied over to the new, re-sized buffer.
In C++, you would probably use a vector to do the same thing and may run into the very same problems.
One last method to load large files is memory mapping - But this also has the requirement that you need to know how much space you need.
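The realloc-based growth described above might look roughly like the sketch below. The ByteBuffer struct and the buffer_* function names are made up for illustration, the growth factor of 1.5 is the one suggested in the answer, and overflow of size + len is not checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

struct ByteBuffer {
    unsigned char* data = nullptr;
    std::size_t size = 0;      // bytes in use
    std::size_t capacity = 0;  // bytes allocated
};

// Append len bytes, growing the allocation by about 1.5x when space runs out.
// Returns false and leaves the buffer untouched if realloc fails.
bool buffer_append(ByteBuffer& b, const void* src, std::size_t len) {
    assert(src != nullptr || len == 0);
    if (len == 0) return true;
    if (len > b.capacity - b.size) {
        const std::size_t wanted = b.size + len;
        const std::size_t grown = b.capacity + b.capacity / 2;
        const std::size_t new_cap = grown > wanted ? grown : wanted;
        void* p = std::realloc(b.data, new_cap);
        if (!p) return false;  // the old block is still valid
        b.data = static_cast<unsigned char*>(p);
        b.capacity = new_cap;
    }
    std::memcpy(b.data + b.size, src, len);
    b.size += len;
    return true;
}

// Trim the allocation down to the bytes actually used.
void buffer_shrink_to_fit(ByteBuffer& b) {
    if (b.size == 0 || b.size == b.capacity) return;
    void* p = std::realloc(b.data, b.size);
    if (p) {
        b.data = static_cast<unsigned char*>(p);
        b.capacity = b.size;
    }
}

void buffer_free(ByteBuffer& b) {
    std::free(b.data);
    b = ByteBuffer{};
}
```

In C++ code that does not need to interoperate with C, std::vector<unsigned char> gives the same amortized growth with less manual bookkeeping.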

C++ vector out of memory

I have a very large vector (millions of entries, 1024 bytes each). I am exceeding the maximum size of the vector (getting a bad memory alloc exception). I am doing a recursive operation over the vector of items which requires accessing other elements in the vector. The operations need to be done quickly. I am trying to avoid writing to disk for speed reasons. Is there any other way to store this data that would not require writing to disk? If I have to write the data to disk, what would be the best way to do it?
Edit for a few more details:
The operation that I am performing on the data set generates a string recursively based on other data points in the vector. The data is sorted when it is read in. Data sets range from 50,000 to 50,000,000 entries.
The easiest way to solve this problem is to use STXXL. It's a reimplementation of the STL for large structures that transparently writes to disk when the data won't fit in memory.
Your problem cannot be solved as stated and clarified in the comments.
You have requested a way to have a contiguous in-memory buffer of 50,000,000 entries of size 1024 on a 32 bit system.
A 32 bit system has only 4294967296 bytes of addressable memory. You are asking for 51200000000 bytes of addressable memory, or 11.9 times the amount of memory address space on your system.
If you don't require that your data be contiguous and memory-addressable, if you don't require that your data all be in memory at once, or if you relax other requirements, there may be an answer to your problem. For example, some OSs expose access to a non-memory space of values that corresponds to RAM (there were ways on 8 GB Windows systems to use more than 4 GB of total RAM) through some hacky interface or other.
But as stated, the answer is "no, you cannot do that".
Because your data must be contiguous, and you know how many elements you need to store, just create a std::vector and use the reserve() function to attempt to gain a contiguous block of memory of the required size.
There is very little overhead in storing a vector (just a few pointers to manage the beginning and end). This is as good as you'll be able to do.
If that fails:
add more memory to your machine (may not actually help, if you've run up against addressing or implementation constraints)
switch to a raw array
find a way to reduce the size of your elements
try to find a solution that can tackle the problem in small blocks
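The reserve() attempt suggested above can be wrapped so that failure is reported instead of terminating the program. This is only a sketch: try_reserve and the 1024-byte Entry struct are hypothetical names modelled on the question's description.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <vector>

// Stand-in for the question's 1024-byte entries.
struct Entry {
    char bytes[1024];
};

// Try to obtain one contiguous block for n elements.
// Returns false instead of throwing when the request cannot be met.
template <typename T>
bool try_reserve(std::vector<T>& v, std::size_t n) {
    try {
        v.reserve(n);
        return true;
    } catch (const std::length_error&) {
        return false;  // n exceeds v.max_size()
    } catch (const std::bad_alloc&) {
        return false;  // the allocator could not supply the block
    }
}
```

A false result is the cue to fall back to one of the alternatives in the list, such as processing the data in smaller blocks.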
For a million entries, that is about 1GB of memory (1024 bytes * 10^6 = 1MB * 10^3 = 1GB). In principle, a 32-bit machine can address up to 4GB of memory.
To answer your question, try first a normal malloc() call and allocate 1 GB of memory. This should be done without any error.
Also, please paste the exact error msg that you get while using the vector.

Resizable char buffer container type for C++

I'm using libcurl (HTTP transfer library) with C++ and trying to download files from remote HTTP servers. As file is downloaded, my callback function is called multiple times (e.g. every 10 kb) to send me buffer data.
Basically I need something like a "string buffer", a data structure that lets me append a char buffer to an existing string. In C, I allocate (malloc) a char* and then, as new buffers come in, I realloc and then memcpy so that I can easily copy my buffer into the resized array.
In C++, there are multiple solutions to achieve this.
1. I can keep using malloc, realloc, memcpy but I'm pretty sure that they are not recommended in C++.
2. I can use vector<char>.
3. I can use stringstream.
My use case is that I'll append a few thousand items (chars) at a time, and after it all finishes (download is completed), I will read all of it at once. I may need options like seek in the future (easy to achieve in the array solution (1)), but that is low priority now.
What should I use?
I'd go for stringstream. Just insert into it as you receive the data, and when you're done you can extract a full std::string from it. I don't see why you'd want to seek into an array. Anyway, if you know the block size, you can calculate where in the string the corresponding block went.
I'm not sure if many will agree with this, but for that use case I would actually use a linked list, with each node containing an arbitrarily large array of char that were allocated using new. My reasoning being:
Items are added in large chunks at a time, one at a time at the back.
I assume this could use quite a large amount of space, so you avoid reallocation events when a vector would otherwise need more space.
Since items are read sequentially, the penalty of linked lists being unidirectional doesn't affect you.
Should seeking through the list become a priority, this wouldn't work though. If it's not a lot of data ultimately, I honestly think a vector would be fine, despite not being the most efficient structure.
If you just need to append char buffers, you can also simply use std::string and the member function append. On top of that, stringstream gives you formatting functionality, so you can add numbers, padding etc., but from your description you appear not to need that.
I would use vector<char>. But they will all work even with a seek, so your question is really one of style and there are no definitive answers there.
I think I'd use a deque<char>. Same interface as vector, and vector would do, but vector needs to copy the whole data each time an append exceeds its existing capacity. Growth is exponential, but you'd still expect about log N reallocations, where N is the number of equal-sized blocks of data you append. Deque doesn't reallocate, so it's the container of choice in cases where a vector would need to reallocate several times.
Assuming the callback is handed a char* buffer and length, the code to copy and append the data is simple enough:
mydeque.insert(mydeque.end(), buf, buf + len);
To get a string at the end, if you want one:
std::string mystring(mydeque.begin(), mydeque.end());
I'm not exactly sure what you mean by seek, but obviously deque can be accessed by index or iterator, same as vector.
Another possibility, though, is that if you expect a content-length at the start of the download, you could use a vector and reserve() enough space for the data before you start, which avoids reallocation. That depends on what HTTP requests you're making, and to what servers, since some HTTP responses will use chunked encoding and won't provide the size up front.
Create your own Buffer class to abstract away the details of the storage. If I were you I would likely implement the buffer based on std::vector<char>.
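One way such a Buffer class could look, as a sketch built on std::vector<char>. The names DownloadBuffer and write_to_buffer are invented here; the free function only mirrors the parameter shape of a libcurl write callback (data pointer, size, nmemb, user pointer) and does not call libcurl itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hides the storage choice behind a small append/read interface.
class DownloadBuffer {
public:
    void append(const char* data, std::size_t len) {
        assert(data != nullptr || len == 0);
        if (len == 0) return;
        bytes_.insert(bytes_.end(), data, data + len);
    }

    // Optional: avoid reallocation when the total size is known up front.
    void reserve(std::size_t n) { bytes_.reserve(n); }

    std::size_t size() const { return bytes_.size(); }

    // Random access ("seek") by byte offset; throws std::out_of_range if invalid.
    char at(std::size_t i) const { return bytes_.at(i); }

    // Copy everything out once the download has finished.
    std::string str() const { return std::string(bytes_.begin(), bytes_.end()); }

private:
    std::vector<char> bytes_;
};

// Callback with the same parameter shape as a libcurl write callback.
// userdata is assumed to point at a DownloadBuffer.
std::size_t write_to_buffer(char* ptr, std::size_t size, std::size_t nmemb,
                            void* userdata) {
    DownloadBuffer* buf = static_cast<DownloadBuffer*>(userdata);
    buf->append(ptr, size * nmemb);
    return size * nmemb;  // report that all bytes were consumed
}
```

Swapping the vector for a deque or a string later only touches the class internals, which is the point of the abstraction.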

How many characters can an STL string class hold?

I need to work with a series of characters. The number of characters can be up to 10^11.
In a usual array, it's not possible. What should I use?
I wanted to use gets() function to hold the string. But, is this possible for STL containers?
If not, then what's the way?
Example:
input:
AMIRAHID
output: A.M.I.R.A.H.I.D
How can this be done if the number of characters is reduced to 10^10 on a 32-bit machine?
Thank you in advance.
Well, that's roughly 100GByte of data. No usual string class will be able to hold more than fits into your main memory. You might want to look at STXXL, which is an implementation of STL allowing to store part of the data on disk.
If your machine has 10^11 bytes == 93GB of memory then it's probably a 64-bit machine, so string will work. Otherwise nothing will help you.
Edited answer for the edited question: In that case you don't really need to store the whole string in memory. You can store only small part of it that fits into the memory.
Just read every character from the input, write it to the output and write a dot after it. Repeat until you get an EOF on the input. To increase performance you can read and write the data in large chunks, as long as each chunk still fits into memory.
Such algorithms are called online algorithms.
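A small sketch of that streaming approach, assuming the dots go between characters as in the question's example output. The function name interleave_dots is made up; the stream version holds only one character at a time, and the string overload exists purely for small inputs and testing.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copy in to out, writing a '.' between consecutive characters.
// Memory use is constant regardless of the input length.
void interleave_dots(std::istream& in, std::ostream& out) {
    char c;
    bool first = true;
    while (in.get(c)) {
        if (!first) out.put('.');
        out.put(c);
        first = false;
    }
}

// Convenience overload for short strings.
std::string interleave_dots(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    interleave_dots(in, out);
    return out.str();
}
```

Called as interleave_dots(std::cin, std::cout), this handles input of any length, since the standard streams already buffer reads and writes in chunks.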
It is possible for an array that large to be created. But not on a 32-bit machine. Switching to STL will likely not help, and is unnecessary.
You need to contemplate how much memory that is, and if you have any chance of doing it at all.
10^11 is roughly 100 gigabytes, which means you will need a 64-bit system (and compiler) to even be able to address it.
STL's strings support a max of max_size() characters, so the answer can change with the implementation.
A string suffers from the same problem as an array: it has to fit in memory.
10^11 characters would take up far more than 4GB. That cannot fit into memory on a 32-bit machine, which has a 4GB address space. You either need to split up your data into smaller chunks and only load a bit of it at a time, or switch to 64-bit, in which case both arrays and strings should be able to hold the data (although it may still be preferable to split it up into multiple smaller strings/arrays).
The SGI version of STL has a ROPE class (A rope is a big string, get it).
I am not sure it is designed to handle that much data but you can have a look.
http://www.sgi.com/tech/stl/Rope.html
If all you're trying to do is read in some massive file and write to another file the same data with periods interspersed between each character, why bother reading the whole thing into memory at once? Pick some reasonable buffer size and do it in chunks.