I have a requirement where I need to call a C++ application from the command line and pass it a two-dimensional array of ints. Can anyone please let me know how to do that, and how to interpret it in the C++ application using the argv parameter?
Thanks in advance.
Through argv you can only pass a one-dimensional array of strings; its type is
char* argv[]
So you can't really pass a 2D array, but you can "simulate" it.
For example, pass two extra parameters giving the sizes of the matrix (the number of rows and the number of columns), and then pass all elements one by one.
Then parse the arguments in your program, knowing what format you chose.
For example: if you want to pass
1 2 3
4 5 6
you may run your program like this:
./my_program 2 3 1 2 3 4 5 6
This way you'll know that argv[1] is the number of rows, argv[2] is the number of columns, and the remaining arguments are all the elements of the 2D array, starting from the upper left corner.
Don't forget that argv is an array of char* pointers. In other words, you'll need to convert all the parameters to ints.
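For illustration, a minimal sketch of that parsing (it assumes the layout above; std::stoi would throw on malformed input, which is ignored here for brevity):
#include <iostream>
#include <string>
#include <vector>

int main(int argc, char* argv[]) {
    if (argc < 3) return 1;
    int rows = std::stoi(argv[1]);           // argv[1] = number of rows
    int cols = std::stoi(argv[2]);           // argv[2] = number of columns
    if (argc != 3 + rows * cols) return 1;   // wrong number of elements
    std::vector<std::vector<int>> matrix(rows, std::vector<int>(cols));
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            matrix[r][c] = std::stoi(argv[3 + r * cols + c]);
    // Print the matrix back out to check the parsing.
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c)
            std::cout << matrix[r][c] << ' ';
        std::cout << '\n';
    }
    return 0;
}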
I would recommend passing a file as the only argument. Or data in the same format on stdin as #j_random_hacker suggests. If no human needs to edit it, it could be a binary file. One possible format:
4 bytes = size of first dimension
4 bytes = size of second dimension
4 bytes * size of first * size of second = contents of array
When reading, everything is aligned: just read each four-byte int and interpret it as above.
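A rough sketch of reading that format back (this assumes the file was written with 32-bit ints in the machine's native byte order; matrix.bin is just a made-up name):
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    std::FILE* f = std::fopen("matrix.bin", "rb");
    if (!f) return 1;
    std::int32_t rows = 0, cols = 0;
    std::fread(&rows, 4, 1, f);                  // size of first dimension
    std::fread(&cols, 4, 1, f);                  // size of second dimension
    std::vector<std::int32_t> data(static_cast<std::size_t>(rows) * cols);
    std::fread(data.data(), 4, data.size(), f);  // contents of the array
    std::fclose(f);
    // data[r * cols + c] is the element at row r, column c
    return 0;
}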
If it needs to be human readable I would do csv or space-delimited. There would be no need to specify the dimensions in that case because each row ends in newline.
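And a sketch of the human-readable variant, reading space-delimited rows from stdin (one matrix row per input line, as described above):
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main() {
    std::vector<std::vector<int>> matrix;
    std::string row;
    while (std::getline(std::cin, row)) {        // each row ends in a newline
        std::istringstream iss(row);
        std::vector<int> values;
        int v;
        while (iss >> v) values.push_back(v);
        if (!values.empty()) matrix.push_back(values);
    }
    // matrix.size() rows; matrix[0].size() columns (assuming rectangular input)
    return 0;
}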
Related
I have a text file with one column of numbers. There are about 500 numbers in the column. How would I read only the nth number? For example, is there a way to read and store the 49th number in the column?
If the numbers are a fixed size (you don't show a sample file), then you can seek to size * n and read. Otherwise, just do a read/parse/count loop until you reach n.
If they're stored as text, so the space occupied by each number can vary, you're pretty much stuck with either making each number occupy a fixed amount of space so you can seek directly to the Nth one, or else using a level of indirection--that is, creating an index into the data itself.
For the former, you could (for example) store each number as a 32-bit binary number. On a typical machine that means every number occupies 4 bytes, so to get to the Nth item you multiply N by 4, seek to that point in the file, and read 4 bytes.
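A quick sketch of that binary approach (numbers.bin is a made-up name; it assumes the file was written with the same 32-bit int representation as the reader):
#include <cstddef>
#include <cstdint>
#include <fstream>

// Read the Nth (0-based) number from a file of raw 32-bit ints.
std::int32_t read_nth(const char* path, std::size_t n) {
    std::ifstream in(path, std::ios::binary);
    in.seekg(static_cast<std::streamoff>(n) * 4);  // each number occupies 4 bytes
    std::int32_t value = 0;
    in.read(reinterpret_cast<char*>(&value), sizeof value);
    return value;
}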
If you want to store the numbers as text, but still support seeking like this, you could pad every number with spaces so they all still take up the same amount of space in the file (e.g., 10 characters for every number).
If you really want to avoid that (or have a pre-defined format so you can't do it), then you could create an index into the file. This makes sense primarily when/if you can justify the cost of reading through the entire file and storing the position of the beginning of each line. There are at least two obvious justifications. One is that the indexing can be separated from usage, such as building the index in a batch at night to optimize use of the data during the day. The other possibility is simply that you use the data file enough that the savings from being able to seek to a specific point outweighs the time to index the file once.
There are less obvious justifications as well though--for example, you might need to meet some (at least soft) real-time constraints, and if you read through the file to find each item, you might not be able to meet those constraints--in particular, the size of file you can process can be limited by the real-time constraints. In this case, an index may be absolutely necessary to meet your requirements, rather than just an optimization.
Adding some detail to #pm100's answer (perhaps unnecessarily): "fixed size" here means the same ASCII character count on every line.
001
01
Line 001 takes up 3 bytes while 01 only takes up two.
Thus if your file has numbers formatted like this:
1
2
3
100
10
then using lseek (or fseek) won't work, because that approach relies on every line having the same number of ASCII characters (as far as I am aware).
You also need to account for the \n character at the end of each line if you go this route.
When the widths are fixed, the seek looks something like:
lseek(fd, (size + 1) * n, SEEK_SET);   /* size characters per number, plus 1 for the newline */
You can do it this way:
#include <cstddef>
#include <fstream>
#include <string>

std::ifstream fin("example.txt");
std::string line;
for (std::size_t i = 0; i < n; ++i) {   // n is the 1-based number of the line you want
    std::getline(fin, line);
}
// line now contains line n (assuming the file has at least n lines)
In my C++ code, I need to convert Unicode strings to UTF-8 strings using iconv(). Before calling the function, I need to allocate a buffer of the proper size. A couple of examples I have seen overallocate the buffer (for example, twice the length of the input string). I am wondering if there is a way to determine the exact size that would be required for the conversion. Regards.
Essentially you want to do two things:
Get the actual code point in each character (if "Unicode" means UTF-16, you need to handle surrogate pairs appropriately)
Determine how many bytes the code point will take up in UTF-8.
I'll assume that you have knowledge of how to do the first step and will focus on the second step:
U+0000..U+007F = 1 byte
U+0080..U+07FF = 2 bytes
U+0800..U+FFFF = 3 bytes
U+10000..U+1FFFFF = 4 bytes*
U+200000..U+3FFFFFF = 5 bytes*
U+4000000..U+7FFFFFFF = 6 bytes*
* UTF-8 can encode 2147483648 code points [0...0x7FFFFFFF], but UTF-16 can only encode the first 1114112 of them [0...0x10FFFF], which are the only ones currently designated. As a result, anything beyond U+10FFFF is pointless at the time of this writing. I included the others for completeness only.
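As a rough sketch of the second step (not iconv itself), here is one way to count the UTF-8 bytes needed for a UTF-16 input, assuming the input is valid UTF-16 held in a std::u16string:
#include <cstddef>
#include <string>

std::size_t utf8_length_from_utf16(const std::u16string& s) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t c = s[i];
        if (c < 0x80)                        bytes += 1;  // U+0000..U+007F
        else if (c < 0x800)                  bytes += 2;  // U+0080..U+07FF
        else if (c >= 0xD800 && c <= 0xDBFF) {            // high surrogate
            bytes += 4;                                   // the pair encodes U+10000..U+10FFFF
            ++i;                                          // skip the matching low surrogate
        }
        else                                 bytes += 3;  // U+0800..U+FFFF
    }
    return bytes;
}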
I have a structure that describes a bitmap. It looks like this
struct bitmap {
int XSize;
int YSize;
unsigned char *pData;
};
When an instance of this structure is initialized, pData points to thousands of random-looking but non-zero bytes. When I print the instance of the structure, GDB shows a lot of meaningless bytes, which is very time consuming. When a disp of such a variable is active, I get that output after every step, which delays debugging.
Is there a GDB option that limits the output length?
When the bytes are meaningless I could change the type of pData to void *. But since the structure is used in a precompiled library the type can't be changed. Can the type that GDB uses for print and disp be "overridden"?
As Paul has pointed out, the answer to this question gives the correct command to allow unlimited length.
To limit the length you need the command
set print elements n
where n is the maximum number of elements. Setting n to 0 gives the unlimited length.
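For example (bitmapVar stands in for whatever your variable is actually called):
(gdb) set print elements 4
(gdb) print bitmapVar
(gdb) set print elements 0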
Setting print elements to 4 will limit the number of pData characters to 4, but it will also limit all other strings and arrays, which could be quite annoying (e.g. print filename would produce /tmp... when the actual value is /tmp/foobar).
A possibly better approach is to write a Python pretty-printer for struct bitmap (assuming you have sufficiently recent GDB). See this answer on how to do that.
When I have
char anything[20];
cout << sizeof anything;
it prints 20.
However
string anymore;
cout << sizeof anymore; // it prints 4
getline(cin, anymore); // let's suppose I type more than one hundred characters
cout << sizeof anymore; // it still prints 4 !
I would like to understand how C++ manages this. Thanks.
sizeof is a compile-time construct. It has nothing to do with runtime, but rather gives a fixed result based on the type passed to it (or the type of the value passed to it). So char[20] is 20 bytes, but a string might be 4 or 8 bytes or whatever depending on the implementation. The sizeof isn't telling you how much storage the string allocated dynamically to hold its contents.
sizeof is a compile-time operator. It tells you the size of the type.
It's because anything is an array of 20 characters; each character takes 1 byte, so 20 bytes in total.
The string class, on the other hand, contains something like a pointer to the beginning of its character buffer plus some bookkeeping (a size_t, for example); whatever those fixed members occupy is what sizeof reports, 4 bytes in your case. sizeof doesn't know how much memory you allocated for the string's contents; it only knows that the object holds a pointer to them, because sizeof is evaluated at compile time.
sizeof isn't whatever you have decided it should be. It doesn't magically perceive the semantics of whatever type you throw at it. All it knows is how many bytes are used up directly to store an instance of the type.
For an array of five characters, that's 5. For a pointer (to anything, including an array), that's usually 4 or 8. For a std::string, it's however many bytes your C++ Standard Library implementation happens to need to do its work. That work usually involves dynamic allocation, so the four bytes you're looking at likely represent just enough storage for a pointer.
This is not to be confused with any type-specific "size" semantics. For std::string, that's anymore.length(), which uses whatever internal magic is required to calculate the length of the buffer of characters it has stored somewhere, possibly (and usually) indirectly.
For what it's worth, I'm very surprised that a std::string could take up only four bytes. I'd expect it'd store at least "length" and a pointer, which is usually going to take more than four bytes.
The std::string type is an instantiation of a class template (std::basic_string). A string object typically reaches its character data through a pointer to a separately allocated buffer, and a pointer is 4 bytes on 32-bit systems, which is why the object itself can be that small.
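A small sketch of the distinction (the numbers printed for the std::string object itself vary between implementations):
#include <iostream>
#include <string>

int main() {
    char anything[20];
    std::string anymore = "a string that is considerably longer than twenty characters";
    std::cout << sizeof anything  << "\n";  // 20: the whole array is the object
    std::cout << sizeof anymore   << "\n";  // size of the std::string object only (4, 8, 24, 32, ...)
    std::cout << anymore.length() << "\n";  // the runtime length of the stored text
    return 0;
}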
In http://www.parashift.com/c++-faq-lite/intrinsic-types.html#faq-26.6, it is written that
"Another valid approach would be to define a "byte" as 9 bits, and simulate a char* by two words of memory: the first could point to the 36-bit word, the second could be a bit-offset within that word. In that case, the C++ compiler would need to add extra instructions when compiling code using char* pointers."
I couldn't understand what is meant by "simulating char* by two words" and the rest of the quote.
Could somebody please explain it by giving an example ?
I think this is what they were describing:
The PDP-10 referenced in the second paragraph had 36-bit words and was unable to address anything inside of those words. The following text is a description of one way that this problem could have been solved while fitting within the restrictions of the C++ language spec (that are included in the first paragraph).
Let's assume that you want to make 9-bit-long bytes (for some reason). By the spec, a char* must be able to address individual bytes. The PDP-10 can't do this, because it can't address anything smaller than a 36-bit word.
One way around the PDP-10's limitations would be to simulate a char* using two words of memory. The first word would be a pointer to the 36-bit word containing the char (this is normally as precise as the PDP-10's pointers allow). The second word would indicate an offset (in bits) within that word. Now, the char* can access any byte in the system and complies with the C++ spec's limitations.
ASCII-art visual aid:
| Byte 1 | Byte 2 | Byte 3 | Byte 4 | Byte 5 | Byte 6 | Byte 7 | Byte 8 |
-------------------------------------------------------------------------
| Word 1 | Word 2 |
| (Address) | (Offset) |
-------------------------------------------------------------------------
Say you had a char* with word1 = 0x0100 and word2 = 0x12. This would point to the 18th bit (the start of the third byte) of the 256th word of memory.
If this technique were really used to produce a conforming C++ implementation on the PDP-10, then the C++ compiler would have to do some extra work juggling the extra bits required by this rather funky internal format.
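Purely as an illustration of the idea (not actual PDP-10 code), the two-word char* could be modelled like this in ordinary C++, with 36-bit words stored in uint64_t values; all of the names here are invented:
#include <cstddef>
#include <cstdint>
#include <vector>

struct SimCharPtr {
    std::size_t word;   // first word: which 36-bit word the byte lives in
    unsigned    bit;    // second word: bit offset of the 9-bit byte (0, 9, 18 or 27, counted from the top)
};

// Fetch one 9-bit "char" from simulated word-addressed memory.
unsigned read_char9(const std::vector<std::uint64_t>& memory, SimCharPtr p) {
    std::uint64_t w = memory[p.word] & 0xFFFFFFFFFULL;          // keep only 36 bits
    return static_cast<unsigned>((w >> (27 - p.bit)) & 0x1FF);  // extract the addressed 9 bits
}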
The whole point of that article is to illustrate that a char isn't always 8 bits. It is at least 8 bits, but there is no defined maximum. The internal representation of data types is dependent on the platform architecture and may be different than what you expect.
Since the C++ spec says that a char* must point to individual bytes, and the PDP-6/10 does not allow addressing individual bytes in a word, you have a problem with char* (which is a byte pointer) on the PDP-6/10.
So one workaround is: define a byte as 9 bits; then you essentially have 4 bytes in a word (4 * 9 = 36 bits = 1 word).
You still can't have char* point to individual bytes on the PDP-6/10, so instead have char* be made up of 2 36-bit words. The lower word would be the actual address, and the upper word would be some byte-mask magic that the C++ compiler could use to pick out the right 9 bits within the word the lower word points to.
In this case,
sizeof(int*) (36 bits, one word) is different from sizeof(char*) (72 bits, two words).
It's just a contrived example that shows how the spec doesn't constrain primitives to specific bit/byte sizes.
data: [char1|char2|char3|char4]
To access char1:
ptrToChar = &data
index = 0
To access char2:
ptrToChar = &data
index = 9
To access char3:
ptrToChar = &data
index = 18
...
then to access a char, you would:
(*ptrToChar >> (27 - index)) & 0x1ff   /* char1 sits in the most significant 9 bits, matching the layout above */
but ptrToChar and index would be saved in some sort of structure that the compiler creates so they would be associated with each other.
Actually, the PDP-10 can address (load, store) 'bytes' smaller than a (36-bit) word with a single word pointer. On the -10, a byte pointer includes the word address containing the 'byte', the width (in bits) of the 'byte', and the position (in bits from the right) of the 'byte' within the word. Incrementing the pointer (with an explicit increment, or an increment-and-load/deposit instruction) increments the position part (by the size part) and handles overflow to the next word address. (No decrementing, though.) A byte pointer can e.g. address individual bits, but widths of 6, 8, 9, 18(!) were probably common, as there were specially-formatted versions of byte pointers (global byte pointers) that made their use somewhat easier.
Suppose a PDP-10 implementation wanted to get as close to having 8-bit bytes as possible. The most reasonable way to split up a 36-bit word (the smallest unit of memory that the machine's assembly language can address) is to divide the word into four 9-bit bytes. To access a particular 9-bit byte, you need to know which word it's in (you'd use the machine's native addressing mode for that, using a pointer which takes up one word), and you'd need extra data to indicate which of the 4 bytes inside the word is the one you're interested in. This extra data would be stored in a second machine word. The compiler would generate lots of extra instructions to pull the right byte out of the word, using the extra data stored in that second word.