Binary file containing all ASCII characters - c++

Is there a binary file (.dat file) that contains all 256 ASCII characters? I'd like to test it on my Huffman compression algorithm.
When creating a file normally using a text editor like vim, I won't be able to add in the weird characters like control characters, etc. I wonder if anyone knows whether there is such a file ready to use or if I can make one myself.

You can create one with for(i=0;i<256;i++)printf("%c",i);.

Sounds pretty straightforward.
What is the problem?
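#include <stdio.h>

void MakeFile(void)
{
    unsigned char ascii[128] = {0};
    for (size_t i = 1; i < sizeof ascii; ++i)
    {
        ascii[i] = ascii[i-1] + 1; // fills the array with 0, 1, ..., 127
    }
    // "wb" rather than "w": in text mode some platforms rewrite byte 10 as a line ending
    FILE* fp = fopen("all_ascii.txt", "wb");
    if (fp == NULL)
    {
        return;
    }
    fwrite(ascii, 1, sizeof ascii, fp);
    fclose(fp);
}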
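ASCII proper defines only 128 characters, while the question asks for all 256 byte values. A minimal C++ sketch that writes every value from 0 to 255 exactly once could look like the following; the function name `WriteAllByteValues` and the file name passed to it are assumptions made for illustration.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes the bytes 0..255, once each, to `path`.
// Returns false if the file could not be opened or written.
bool WriteAllByteValues(const std::string& path)
{
    // Binary mode keeps byte 10 and byte 13 from being translated on any platform.
    std::ofstream out(path, std::ios::binary);
    if (!out)
    {
        return false;
    }
    for (int value = 0; value < 256; ++value)
    {
        out.put(static_cast<char>(value));
    }
    out.flush();
    return out.good();
}
```

Calling `WriteAllByteValues("all_bytes.dat")` should leave a 256-byte file. Note that every byte occurs with the same frequency, so it is a fairly uninteresting input for Huffman coding beyond checking that all symbols round-trip.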

I don't know of such a file, but you could use the integer representation, or the hex, etc of those characters. You could check to see if they can be represented by using the function isprint(), which will return false if it can't be printed.
So what I would do is use a loop and start at the first character, then use that function to determine whether it can be represented in a character form, or whether you will need to express it some other way.
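As a rough sketch of that idea, the function below returns a character unchanged when `isprint` accepts it and falls back to a hex escape otherwise. The name `Printable` and the `\xNN` output format are assumptions, and the result of `isprint` depends on the current C locale.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper: a printable character is returned as-is,
// anything else is rendered as a \xNN escape.
std::string Printable(unsigned char c)
{
    if (std::isprint(c))
    {
        return std::string(1, static_cast<char>(c));
    }
    char buf[8];
    std::snprintf(buf, sizeof buf, "\\x%02X", static_cast<unsigned>(c));
    return buf;
}
```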

Take a look at Random.org.
There is a white noise generator. It generates an audio file, but you can use that for your compression tests.

Related

How to put UTF-8 text into std::string through linux sockets

I made a simple C++ server program, which works just fine as long as I use it with simple tools like telnet. However, when a .NET (C#) client connects to it and sends it some strings, the text is somewhat corrupted. I tried multiple encodings on the C# side, and the only result was that it was corrupted in a different way.
I believe the main problem is in this function, which is meant to read a line of text from the socket:
std::string Client::ReadLine()
{
    std::string line;
    while (true)
    {
        char buffer[10];
        read(this->Socket, buffer, 9);
        int i = 0;
        while (i < 10)
        {
            if (buffer[i] == '\r')
            {
                i++;
                continue;
            }
            if (buffer[i] == '\0')
            {
                // end of string reached
                break;
            }
            if (buffer[i] == '\n')
            {
                return line;
            }
            line += buffer[i];
            i++;
        }
    }
    return line;
}
This is a simple output of program into terminal, when I send it string "en.wikipedia.org" using telnet I see:
Subscribed to en.wikipedia.org
When I use C# that opens a stream writer using this code
streamWriter = new StreamWriter(networkStream, Encoding.UTF8);
I see:
Subscribed to en.wiki,pedia.org,
When I use it without UTF-8 (so that default .net encoding is used, IDK what it is)
streamWriter = new StreamWriter(networkStream);
I see:
Subscribed to en.wiki�pedia.org�
However, in both cases it's wrong. What's the simplest way to achieve this using just standard C++ and Linux libraries? (No boost etc. I could do this with a framework like Qt or boost, but I would like to understand it.) Full code: http://github.com/huggle/XMLRCS
A UTF-8 string is just a series of single bytes, which is exactly what std::string is supposed to handle. You have two other problems:
The first is that you don't check how many characters were actually read; you always loop over ten characters. Since you don't loop over the actual number of characters read (and don't check for an error or the end of the connection), you might read data in the buffer beyond what was written by read, and you have undefined behavior.
The second problem is kind of related to the first, and that is that you have a buffer of ten characters, you read up to nine characters into the buffer, and then loop over all ten characters in the buffer. The problem with this is that since you only read up to nine characters, the tenth character will always be uninitialized. Because the tenth entry in the buffer is always uninitialized, its value will be indeterminate and reading it will again lead to undefined behavior.
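A minimal sketch of a loop that respects the return value of `read` might look like this. It is written as a free function taking a file descriptor (the name `ReadLineFromFd` and the parameter `fd` are stand-ins for `Client::ReadLine` and `this->Socket`), and like the original it discards any bytes that arrive after the newline in the same chunk.

```cpp
#include <string>
#include <unistd.h>

// Hypothetical variant of ReadLine: returns the text up to (not including) '\n',
// dropping '\r'. On end of stream or error it returns whatever was collected.
std::string ReadLineFromFd(int fd)
{
    std::string line;
    char buffer[10];
    while (true)
    {
        // n is the number of bytes actually placed in buffer
        ssize_t n = read(fd, buffer, sizeof buffer);
        if (n <= 0)
        {
            return line; // 0: connection closed, -1: error
        }
        for (ssize_t i = 0; i < n; ++i) // only look at bytes that were read
        {
            if (buffer[i] == '\r')
            {
                continue;
            }
            if (buffer[i] == '\n')
            {
                return line; // rest of this chunk is discarded, as in the original
            }
            line += buffer[i]; // UTF-8 bytes are appended unchanged
        }
    }
}
```

A production version would probably keep the leftover bytes in a member buffer so that a second line arriving in the same chunk is not lost.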

What's the easiest way to parse this data?

For my game, I'm creating a tilemap that has tiles with different numeric ids, such as 1, 28, etc. This map data is saved into a .dat file that the user edits, and looks a little bit like this:
0, 83, 7, 2, 4
For the game to generate the correct tiles, it obviously has to read the map data. What I want this small parser to do is skip over whitespace and commas and fetch the numeric data. I also want it to handle not only single-digit ids but 2-3 digit ones as well (83, etc.).
Thanks for any and all help!
Sounds like a job for the strtok function:
#include <stdlib.h>
#include <string.h>

int parseData(char* data, size_t maxAllowed, int* parsedData) {
    size_t idx = 0;
    char* pch = strtok(data, " ,"); // split on spaces and commas
    while (pch != NULL && idx < maxAllowed) // stop at the maximum allowed
    {
        parsedData[idx++] = atoi(pch); // convert the token to an integer
        pch = strtok(NULL, " ,");
    }
    return (int)idx; // return the number found
}

//E.g.
char data[] = "0, 83, 7, 2, 4";
int parsedData[5];
int numFound = parseData(data, 5, parsedData);
The sample above splits the input on spaces and commas, stores each value it finds as an integer, and returns the total number of values found.
Reading the file could be done easily using C functions. You could read it either all at once, or chunk by chunk (calling the function for each chunk).
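Since the question is tagged C++, here is a stream-based sketch of the same idea that reads line by line and needs no fixed-size array. It swaps strtok for std::getline plus std::istringstream; the name `ParseTileIds` is an assumption, and anything that is not an integer ends parsing of that line.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects every integer from comma/space separated text.
std::vector<int> ParseTileIds(std::istream& in)
{
    std::vector<int> ids;
    std::string line;
    while (std::getline(in, line))
    {
        // Turn commas into spaces so operator>> can skip them as whitespace.
        std::replace(line.begin(), line.end(), ',', ' ');
        std::istringstream fields(line);
        int id;
        while (fields >> id)
        {
            ids.push_back(id);
        }
    }
    return ids;
}
```

To read the map from disk, pass an `std::ifstream` opened on the .dat file instead of a string stream.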
This is CSV parsing, but easy is in the eye of the beholder. Depends on whether you want to "own" the code or use what someone else did. There are two good answers on SO, and two good libraries on a quick search.
How can I read and parse CSV files in C++?
Fast, Simple CSV Parsing in C++
https://code.google.com/p/fast-cpp-csv-parser/
https://code.google.com/p/csv-parser-cplusplus/
The advantage of using a pre-written parser is that when you need some other feature, it's probably already there.

No methods of read a file seem to work, all return nothing - C++

EDIT: Problem solved! It turns out Windows 7 won't let me read or write files without explicitly running as administrator. If I run as admin it works fine; if I don't, I get the weird results I explain below.
I've been trying to get a part of a larger program of mine to read a file.
Despite trying multiple methods (istream::getline, std::getline, the >> operator, etc.), all of them return either \0, blank, or a random number/whatever I initialised the variable with.
My first thought was that the file didn't exist or couldn't be opened. However, the state flags .good, .bad and .eof all indicate no problems, and the file I'm trying to read is certainly in the same directory as the debug .exe and contains data.
I'd most like to use istream::getline to read lines into a char array, however reading lines into a string array is possible too.
My current code looks like this:
void startup::load_settings(char filename[]) //master function for opening a file.
{
    int i = 0; //count variable
    int num = 0; //var containing all the lines we read.
    char line[5];
    ifstream settings_file (settings.inf);
    if (settings_file.is_open());
    {
        while (settings_file.good())
        {
            settings_file.getline(line, 5);
            cout << line;
        }
    }
    return;
}
As said above, it compiles but just puts \0 into every element of the char array, much like all the other methods I've tried.
Thanks for any help.
Firstly, your code is not complete: what is settings.inf?
Secondly, you are most probably reading everything fine, but the way you print it is unreliable.
cout << line; with char line[5]; only works if the array is terminated with \0, so make sure the last element is \0.
You can do something like this:
line[4] = '\0'; or you can print the value of each element of the array in a loop.
You can also try printing the character codes in hex, because the values in the array might lie outside the visible range of ASCII. For example:
cout << hex << (int)line[i]
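The hex-printing suggestion can be sketched as a small helper. The name `HexDump` and the two-digit, space-separated format are assumptions; the cast through unsigned char keeps bytes above 0x7F from being printed as sign-extended negative values.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: renders each byte of `line` as two hex digits plus a space.
std::string HexDump(const std::string& line)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (unsigned char c : line)
    {
        // setw applies to the next insertion only, so it is repeated per byte
        out << std::setw(2) << static_cast<int>(c) << ' ';
    }
    return out.str();
}
```

Printing `HexDump(line)` after each read shows whether the buffer really holds only zero bytes or holds data that simply is not visible on the console.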

C++ Text File, Chinese characters

I have a C++ project which is supposed to add <item> to the beginning of every line and </item > to the end of every line. This works fine with normal English text, but I have a Chinese text file I would like to do this to, but it does not work. I normally use .txt files, but for this I have to use .rtf to save the Chinese text. After I run my code, it becomes gibberish. Here's an example.
{\rtf1\adeflang1025\ansi\ansicpg1252\uc1\adeff31507\deff0\stshfdbch31506\stshfloch31506\stshfhich31506\stshfbi31507\deflang1033\deflangfe1033\themelang1033\themelangfe0\themelangcs0{\fonttbl{\f2\fbidi
\fmodern\fcharset0\fprq1{*\panose
02070309020205020404}Courier
New;}
Code:
int main()
{
    ifstream in;
    ofstream out;
    string lineT, newlineT;
    in.open("rawquote.rtf");
    if(in.fail())
        exit(1);
    out.open("itemisedQuote.rtf");
    do
    {
        getline(in,lineT,'\n');
        newlineT += "<item>";
        newlineT += lineT;
        newlineT += "</item>";
        if (lineT.length() >5)
        {
            out<<newlineT<<'\n';
        }
        newlineT = "";
        lineT = "";
    } while(!in.eof());
    return 0;
}
That looks like RTF markup, which makes sense, as you say this is an .rtf file.
Basically, if you dump the raw contents of that file after opening it, you'll see that this is what it looks like...
Also, you should revisit your loop:
std::string line;
while (getline(in, line, '\n'))
{
    // do stuff here; the condition above correctly checks that a line was actually read
    out << "<item>" << line << "</item>" << endl;
}
You can't read the RTF code the same way as plain text as you'll just ignore format tags, etc. and might just break the code.
Try saving your Chinese text as a plain text file in UTF-8 (without BOM) and your code should work. Splitting on '\n' is safe there: every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so the byte 0x0A only ever appears as a real line break. Converting to wide chars (as Chan suggested) is only needed if you have to process individual characters, which is a little bit tricky in C++.
It's kind of a miracle that this works for non-Chinese text. "\n" is not the line separator in RTF, "\par" is. The odds that more damage is done to the RTF header are certainly greater for Chinese.
C++ is not the best language to tackle this. It is a trivial 5 minute program in C# as long as the file doesn't get too large:
using System;
using System.Windows.Forms; // Add reference

class Program {
    static void Main(string[] args) {
        var rtb = new RichTextBox();
        rtb.LoadFile(args[0], RichTextBoxStreamType.RichText);
        var lines = rtb.Lines;
        for (int ix = 0; ix < lines.Length; ++ix) {
            lines[ix] = "<item>" + lines[ix] + "</item>";
        }
        rtb.Lines = lines;
        rtb.SaveFile(args[0], RichTextBoxStreamType.RichText);
    }
}
If C++ is a hard requirement then you'll have to find an RTF parser.
I think you should use wide characters (wchar_t) for the strings instead of regular char.
If I'm understanding the objective of this code, your solution is not going to work. A line break in an RTF document does not correspond to a line break in the visible text.
If you can't just use plain text (Chinese characters are not a problem with a valid encoding), take a look at the RTF spec. You'll discover that it is a nightmare. So your best bet is probably a third-party library that can parse RTF and read it "line" by "line." I have never looked for such a library, so I do not have any suggestions off the top of my head, but I'm sure they are out there.

Need help about monitoring txt file and reading new(last) entry(word) from that txt file

This is my first contact with C++. I have to make a program that will monitor one .txt or .doc file and read every new (last) entry (word) from it. The only thing I have been able to do so far is read the whole txt file, but that is not the point; I can't even get only the last word from the txt file, so I would really appreciate your help with this.
Thank you all in advance!!!
Not sure if this is homework, and just in case it is, I'm trying to avoid spoiling it by "telling too much", and will instead point you to the key ideas you could use.
To avoid reading the whole file, you could first use the seekg method to position the file a certain number of bytes from the end, then perform the "read to the last word" from there.
To perform the "read to the last word" task proper (net of the optimization of not reading the whole file one word at a time, for which see the previous paragraph), use the >> operator with the std::ifstream as the left operand and a std::string as the right operand. Use that extraction itself as the condition of a while loop, as in while (thestream >> word) { ... }, so it keeps reading until it has the last word; testing !thestream.eof() instead can run the loop body once more after the final word when the file ends in whitespace.
BTW, note that reading the text from a .doc file will be orders of magnitude harder than reading it from a text file, unless you can use a suitable ".doc-reading library" (the standard C++ library has no such functionality, per se).
Reading from MS Word from C++ is a tedious task; you'll need to get through the jumble of COM interfaces. Since you are saying it's your first contact with C++, my advice is to concentrate on plain text instead, namely on getting the last line of a plain text file.
I would do something like this. Provide your own implementations of ReadFromEnd and FindRightmostLineSeparator (they should be trivial), and initialize the fileSize variable.
#include <cstddef>
#include <vector>

std::size_t const INITIAL_BUFFER_SIZE = 64;
std::size_t bufferSize = INITIAL_BUFFER_SIZE;
char* lastLine = NULL;
// vector owns the array; auto_ptr would release new[] memory with plain delete
std::vector<char> buffer(bufferSize);
while (true) {
    ReadFromEnd(buffer.data(), bufferSize);
    lastLine = FindRightmostLineSeparator(buffer.data());
    if (lastLine == NULL && bufferSize == fileSize)
        lastLine = buffer.data(); // no separator anywhere: the whole file is one line
    if (lastLine)
        break;
    bufferSize *= 2; // not found yet: look at a larger tail of the file
    if (bufferSize > fileSize)
        bufferSize = fileSize;
    buffer.resize(bufferSize);
}
// lastLine points to your last line