I want to read a txt file and convert two cells from each line to floats.
If I first run:
someString = someString.substr(1, tempLine.size());
And then:
std::stof(someString)
it only converts the first number in 'someString' to a number. The rest of the string is lost.
When I handled the string in my IDE I noticed that copying it and pasting it inside quotation marks gives me "\u00005\u00007\u0000.\u00007\u00001\u00007\u00007\u0000" and not 57.7177.
If I instead do:
std::string someOtherString = "57.7177"
std::stof(someOtherString)
I get 57.7177.
Minimal working example is:
#include <string>
int main() {
    std::string someString = "\u00005\u00007\u0000.\u00007\u00001\u00007\u00007\u0000";
    float someFloat = std::stof(someString);
    return 0;
}
The same problem occurs with both UTF-8 and UTF-16 encoding.
What is happening and what should I do differently? Should I remove the null-characters somehow?
"I want to read a txt file"
What is the encoding of the text file? "Text" is not an encoding. I suspect that your code reads the file as if it were UTF-8 or Windows-1250 and stores the bytes in a std::string. From the bytes, I can see that the file is actually UTF-16BE, so you need to read it into a std::u16string. If your program will only ever run on Windows, you can get by with a std::wstring.
You probably have followup questions, but your original question is vague enough that I can't predict what those questions would be.
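As a rough illustration of the point above, here is a minimal sketch that turns the raw bytes into something std::stof can parse. It assumes the file really is UTF-16BE, as suggested, and that the numbers contain only ASCII characters; the helper name utf16be_ascii_to_narrow is made up for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a standard function): decode UTF-16BE bytes into a
// narrow string, keeping only code units in the ASCII range.
// Assumes the bytes were read from the file in binary mode.
std::string utf16be_ascii_to_narrow(const std::string& bytes) {
    std::string out;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        unsigned char hi = static_cast<unsigned char>(bytes[i]);      // high byte comes first in BE
        unsigned char lo = static_cast<unsigned char>(bytes[i + 1]);  // low byte holds the ASCII value
        if (hi == 0 && lo < 0x80) {
            out.push_back(static_cast<char>(lo));
        }
        // Anything else (a byte order mark, non-ASCII text) is dropped in this sketch.
    }
    return out;
}
```

With the bytes of one cell in raw, std::stof(utf16be_ascii_to_narrow(raw)) then sees the whole number instead of stopping at the first null byte.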
I have a secret "mission": to write a Vigenère cipher, along with its analysis, over the ASCII alphabet.
I am having some trouble with encrypting the text.
There are two problems:
1) If I use the whole ASCII table, decryption goes wrong, because the ciphertext ends up containing control ("system") characters that corrupt my text (which, by the way, is "War and Peace" by Tolstoy). Should I use a truncated version of the table?
If yes, can I still do the operations from the next question with the truncated ASCII table?
2) I want to have my whole text in one string. I can do it like this:
string s;
string p = "";
ifstream in("text_for_encryption.txt");
while (getline(in, s))
{
p+=s;
p+="\n";
}
"s" is the temporary string, and "p" is the string that holds all the text from the file (with the line breaks and, of course, EOF).
I will then loop over "p", roughly like this:
while (not eof in p)
{
take the first keyword.length() chars from "p", check each of them for EOF and encrypt them (they are then deleted from p)
write them to the file "encrypted_text.txt"
}
in (admittedly rough) pseudocode.
So the question is: how can I compare a string element with EOF?
Maybe I am just bad at googling, but I couldn't find the answer to this question.
Thanks in advance for any advice!
Update:
If I encrypt line by line, it will be easy to find the key length with Friedman's method (if the key is quite short),
so I want to encrypt the text together with its line breaks for more security.
For encrypting, it depends largely on what you want to encrypt,
and what you want to do with the encrypted text. The usual
solution is to encrypt the bytes values (not the characters);
this means that you'll have to read and write the encrypted file
in binary mode, but since it's not meant to be readable anyway,
that's usually not an issue.
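To make the "encrypt the byte values" idea concrete, here is a minimal sketch of a Vigenère step over an alphabet of all 256 byte values. The function name vigenere_bytes and its decrypt flag are invented for this example; it assumes the whole file has already been read into a std::string from a stream opened with std::ios::binary.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: Vigenère over byte values (alphabet size 256).
// decrypt == false adds the key bytes, decrypt == true subtracts them.
std::string vigenere_bytes(const std::string& data, const std::string& key, bool decrypt) {
    if (key.empty()) return data;  // nothing to do without a key
    std::string out(data.size(), '\0');
    for (std::size_t i = 0; i < data.size(); ++i) {
        // Go through unsigned char so bytes above 127 do not become negative.
        unsigned d = static_cast<unsigned char>(data[i]);
        unsigned k = static_cast<unsigned char>(key[i % key.size()]);
        unsigned r = decrypt ? (d + 256u - k) % 256u : (d + k) % 256u;
        out[i] = static_cast<char>(static_cast<unsigned char>(r));
    }
    return out;
}
```

Because every byte value is a legal result, line breaks and control characters need no special handling, as long as the encrypted file is also written in binary mode.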
For the rest, strings do not have "EOF" characters. In fact,
there is no such thing as an EOF character[1]. (Nor an endl
character, either.) EOF is, in fact, an "event" which occurs
when reading from a stream; in C++, it is, in fact, treated as
a sort of an error. std::istream functions which can return
EOF (e.g. std::istream::get()) return int, and not char,
in order to be able to return an out of band value.
Strings do have a known length. To visit all of the characters
in a string:
for ( std::string::const_iterator current = s.begin();
      current != s.end();
      ++ current ) {
    // Do something with *current...
}
(If you have C++11, you can replace
std::string::const_iterator with auto. This is much simpler
to type, but until you master the iterator idioms, it's probably
better to write the type out, to ensure you understand what is
going on.)
[1] Historically, text files have had EOF characters on some
systems. This is not the end of file that you see with
std::istream::get(), but even today, if you open a file in
text mode under Windows, a 0x1A in the file will trigger the end
of file event in the input.
I'm coding Huffman compression and it works fine for all extended ASCII (0-255), but when I open a non-text file such as an mp3 that has something like this inside:
ťîxł¸ H…W]`9M ČČ ˇ˘Ł¤Ąxw
it crashes. I tested it, and the cause is not the size but the input data.
It crashes on file save. Here's the code:
for(int i=0;i<=contents.length();i++){
newString +=kod[contents[i]];
}
saveFile("test_nowy.txt", newString);
bool saveFile (string name, string contents)
{
ofstream file;
file.open(name.c_str());
file << contents;
file.close();
}
I also need to say that despite passing all earlier steps(calculating codes etc) the results are wrong. It seems like my program doesn't understand those characters.
You are reading one element past the end of the string, which is undefined behavior before C++11 and, in any case, not part of your data.
for(int i=0;i<=contents.length();i++)
             ^^
should be:
for(int i=0;i<contents.length();i++)
             ^
BTW, this is a good time to learn to use a debugger. Capture the exact point where the program crashes and find out why.
If char is signed on your platform, bytes above 127 turn into negative indices into your kod array. Try
kod[contents[i] & 0xFFu]
See http://ideone.com/2LvmKW
Also fix the overrun that billz spotted.
You say it works for all characters from 0..255, but not for these characters, that would imply you're using something other than an 8 bit char. If so, you're probably indexing outside the range of kod.
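Putting the fixes from these answers together, the encoding loop could look roughly like the sketch below. The function name encode_bytes is invented, and it assumes the code table is a std::vector<std::string> with 256 entries, one per byte value, which may not match how kod is declared in the question.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: concatenate the code of every input byte.
// Assumption: `codes` holds 256 entries, indexed by byte value.
std::string encode_bytes(const std::string& contents, const std::vector<std::string>& codes) {
    std::string bits;
    for (std::size_t i = 0; i < contents.size(); ++i) {  // i < size(), not <=
        // Convert to unsigned char first so the index is always in 0..255.
        unsigned char byte = static_cast<unsigned char>(contents[i]);
        bits += codes[byte];
    }
    return bits;
}
```

For non-text input such as an mp3 it is also worth opening both the input and the output file with std::ios::binary, so that no bytes are translated or cut off on the way.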
I have a file which has text like this:
#1#14#ADEADE#CAH0F#0#0.....
I need to create a code that will find text that follows # symbol, store it to variable and then writes it to file WITHOUT # symbol, but with a space before. So from previous code I will get:
1 14 ADEADE CAH0F 0 0......
I first tried to did it in Python, but files are really big and it takes a really huge time to process file, so I decided to write this part in C++. However, I know nothing about C++ regex, and I'm looking for help. Could you, please, recommend me an easy regex library (I don't know C++ very well) or the well-documented one? It would be even better, if you provide a small example (I know how to perform transmission to file, using fstream, but I need help with how to read file as I said before).
This looks like a job for std::locale and his trusty sidekick imbue:
#include <locale>
#include <iostream>
#include <string>
struct hash_is_space : std::ctype<char> {
    hash_is_space() : std::ctype<char>(get_table()) {}
    static mask const* get_table()
    {
        // Zero-initialized table: '#' is the only character classified as whitespace.
        static mask rc[table_size];
        rc['#'] = std::ctype_base::space;
        return &rc[0];
    }
};
int main() {
    using std::string;
    using std::cin;
    using std::locale;
    cin.imbue(locale(cin.getloc(), new hash_is_space));
    string word;
    while(cin >> word) {
        std::cout << word << " ";
    }
    std::cout << "\n";
}
IMO, C++ is not the best choice for your task. But if you have to do it in C++ I would suggest you have a look at Boost.Regex, part of the Boost library.
If you are on Unix, a simple sed 's/#/ /g' <infile >outfile would suffice (the g flag replaces every # on a line, not just the first one).
Sed stands for 'stream editor' (and supports regexes! whoo!), so it would be well-suited for the performance that you are looking for.
Alright, I'm just going to make this an answer instead of a comment. Don't use regex. It's almost certainly overkill for this task. I'm a little rusty with C++, so I'll not post any ugly code, but essentially what you could do is parse the file one character at a time, putting anything that wasn't a # into a buffer, then writing it out to the output file along with a space when you do hit a #. In C# at least two really easy methods for solving this come to mind:
StreamReader fileReader = new StreamReader(new FileStream("myFile.txt",
    FileMode.Open));
string fileContents = fileReader.ReadToEnd();
string outFileContents = fileContents.Replace("#", " ");
StreamWriter outFileWriter = new StreamWriter(new FileStream("outFile.txt",
    FileMode.Create), Encoding.UTF8);
outFileWriter.Write(outFileContents);
outFileWriter.Flush();
Alternatively, you could replace
string outFileContents = fileContents.Replace("#", " ");
With
StringBuilder outFileContents = new StringBuilder();
string[] parts = fileContents.Split('#');
foreach (string part in parts)
{
    outFileContents.Append(part);
    outFileContents.Append(" ");
}
I'm not saying you should do it either of these ways or my suggested method for C++, nor that any of these methods are ideal - I'm just pointing out here that there are many many ways to parse strings. Regex is awesome and powerful and may even save the day in extreme circumstances, but it's not the only way to parse text, and may even destroy the world if used for the wrong thing. Really.
If you insist on using regex (or are forced to, as in for a homework assignment), then I suggest you listen to Chris and use Boost.Regex. Alternatively, I understand Boost has a good string library as well if you'd like to try something else. Just look out for Cthulhu if you do use regex.
You've left out one crucial point: if you have two (or more) consecutive #s in the input, should they turn into one space, or into the same number of spaces as there are #s?
If you want to turn each entire run of #s into a single space, then @Rob's solution should work quite nicely.
If you want each # turned into a space, then it's probably easiest to just write C-style code:
#include <stdio.h>
int main() {
    int ch;
    while (EOF!=(ch=getchar()))
        if (ch == '#')
            putchar(' ');
        else
            putchar(ch);
    return 0;
}
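For the other reading of the question, where each run of consecutive #s should become a single space, a minimal C++ sketch could look like this. The function name collapse_hashes is made up for the example, and it works on a string that has already been read in; it needs C++11 for the range-based for loop.

```cpp
#include <string>

// Hypothetical helper: every run of one or more '#' becomes exactly one space.
std::string collapse_hashes(const std::string& in) {
    std::string out;
    bool in_run = false;  // true while we are inside a run of '#'
    for (char ch : in) {
        if (ch == '#') {
            if (!in_run) out.push_back(' ');  // emit a space only for the first '#' of a run
            in_run = true;
        } else {
            out.push_back(ch);
            in_run = false;
        }
    }
    return out;
}
```

Applied line by line (for example with std::getline), this keeps memory use small even for very large files.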
So, you want to replace each single '#' character with a single ' ' character, right?
Then it's easy, since you can overwrite any portion of a file with a string of exactly the same length without disturbing the layout of the file.
Repeating such a replacement lets you transform the file chunk by chunk, so you avoid reading the whole file into memory, which is problematic when the file is very big.
Here's the code in Python 2.7.
Maybe the chunk-by-chunk replacement will not be enough to make it faster, and you may have a hard time writing the same thing in C++. But in general, when I have proposed code like this, it has reduced the execution time satisfactorily.
def treat_file(file_path, chunk_size):
    from os import fsync
    from os.path import getsize
    file_size = getsize(file_path)
    with open(file_path,'rb+') as g:
        fd = g.fileno() # file descriptor, it's an integer
        while True:
            x = g.read(chunk_size)
            g.seek(- len(x),1)
            g.write(x.replace('#',' '))
            g.flush()
            fsync(fd)
            if g.tell() >= file_size:
                break
Comments:
open(file_path,'rb+')
it is essential to open the file in binary mode 'b' to control precisely the positions and movements of the file pointer;
mode '+' makes it possible to read AND write in the file
fd = g.fileno()
the file descriptor, which is an integer
x = g.read(chunk_size)
reads a chunk of size chunk_size. Ideally it would match the size of the read buffer, but I don't know how to find that size, so a power of 2 is a good choice.
g.seek(- len(x),1)
the file pointer is moved back to the position where the read of the chunk started. It must be len(x), not chunk_size, because the last chunk read is in general shorter than chunk_size
g.write(x.replace('#',' '))
writes the modified chunk over the same length
g.flush()
fsync(fd)
these two instructions force the write; otherwise the modified chunk could stay in the write buffer and be written at an uncontrolled moment
if g.tell() >= file_size: break
after the last portion of the file has been read, whatever its length (less than or equal to chunk_size), the file pointer is at the maximum position of the file, that is to say file_size, and the program must stop
In case you would like to replace several consecutive '###...' with a single space, the code is easy to adapt, since writing a shortened chunk doesn't erase the characters further along in the file that are still unread. It only needs two file pointers.
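For comparison, here is a rough C++ sketch of the same chunk-by-chunk, in-place idea using std::fstream. The function name replace_hash_in_place and the default chunk size are assumptions made for this example, and whether it actually beats a simple streaming copy would have to be measured.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: overwrite every '#' in the file with ' ' without changing its size.
// Returns false if the file cannot be opened or a write fails.
bool replace_hash_in_place(const std::string& path, std::size_t chunk_size = 1 << 16) {
    std::fstream f(path.c_str(), std::ios::in | std::ios::out | std::ios::binary);
    if (!f) return false;
    std::vector<char> buf(chunk_size);
    std::streampos pos = 0;  // start of the chunk currently being processed
    while (true) {
        f.seekg(pos);
        f.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        std::streamsize n = f.gcount();  // bytes actually read; the last chunk may be short
        if (n <= 0) break;               // nothing left to process
        f.clear();                       // a short final read sets eofbit and failbit
        std::replace(buf.begin(), buf.begin() + n, '#', ' ');
        f.seekp(pos);                    // go back to where this chunk started
        f.write(buf.data(), n);          // same length, so the rest of the file is untouched
        if (!f) return false;
        pos += n;
    }
    return true;
}
```

As in the Python version, the file must be opened in binary mode so that positions and byte counts line up exactly.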
Alright, here's the deal: I'm taking an intro to C++ class at my university and am having trouble figuring out how to change the extension of a file. First, what we are supposed to do is read in a .txt file and count words, sentences, vowels etc. I got that working, but the next step is what's troubling me. We are then supposed to create a new file using the same file name as the input file but with the extension .code instead of .txt (in that new file we are then to encode the string by adding random numbers to the ASCII code of each character, if you are interested). Being a beginner in programming, I'm not quite sure how to do this. I'm using the following piece of code to get the input file:
cout << "Enter filename: ";
cin >> filename;
infile.open(filename.c_str());
I'm assuming to create a new file I'm going to be using something like:
outfile.open("test.code");
But I won't know what the file name is until the user enters it, so I can't say "test.txt". So if anyone knows how to change that extension when I create a new file I would very much appreciate it!
I occasionally ask myself this question and end up on this page, so for future reference, here is the single-line syntax:
string newfilename=filename.substr(0,filename.find_last_of('.'))+".code";
There are several approaches to this.
You can take the super lazy approach, and have them enter in just the file name, and not the .txt extension. In which case you can append .txt to it to open the input file.
infile.open(filename + ".txt");
Then you just call
outfile.open(filename + ".code");
The next approach would be to take the entire filename including extension, and just append .code to it so you'd have test.txt.code.
It's a bit ambiguous if this is acceptable or not.
Finally, you can use std::string methods find, and replace to get the filename with no extension, and use that.
Of course, if this were not homework but a real-world project, you'd probably do yourself -- as well as other people reading your code -- a favor by using Boost.Filesystem's replace_extension() instead of rolling your own. There's just no functionality that is simple enough that you couldn't come up with a bug, at least in some corner case.
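If a C++17 compiler is available, the standard library offers the same operation as std::filesystem::path::replace_extension, so Boost is not strictly required. A minimal sketch, with the wrapper name with_code_extension invented for this example:

```cpp
#include <filesystem>
#include <string>

// Hypothetical wrapper; requires C++17 for <filesystem>.
std::string with_code_extension(const std::string& filename) {
    std::filesystem::path p(filename);
    p.replace_extension(".code");  // replaces an existing extension or adds one if there is none
    return p.string();
}
```

The result can be passed straight to outfile.open(), and names without any extension are handled too.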
Not to give it away since learning is the whole point of the exercise, but here's a hint.
You're probably going to want a combination of find_last_of and replace.
Here are a few hints. You have a filename already entered - what you want to do is get the part of the filename that doesn't include the extension:
std::string basename(const std::string &filename)
{
// fill this bit in
}
Having written that function, you can use it to create the name of the new file:
std::string codeFile = basename(filename) + ".code";
outFile.open(codeFile);
Pseudo code would be to do something like
outFilename = filename;
<change outFilename>
outfile.open(outFilename);
For changing outFilename, look at strrchr and strcpy as a starting point (might be more appropriate methods -- that would work great with a char* though)
In Windows (at least) you can use _splitpath to dissect the base name from the rest of the pieces, and then reassemble them using your favorite string formatter.
Why not use the string method find_last_of()?
std::string new_filename = filename;
std::string::size_type result = new_filename.find_last_of('.');
// The test is needed: new_filename.erase(std::string::npos) would throw std::out_of_range.
if (std::string::npos != result)
    new_filename.erase(result);
// append extension:
new_filename.append(".code");
I would just append ".code" to the filename the user entered. If they entered "test.txt" then the output file would be "test.txt.code". If they entered a file name with no extension, like "test" then the output file would be "test.code".
I use this technique all the time with programs that generate output files and some sort of related logging/diagnostic output. It's simple to implement and, in my opinion, makes the relationships between files much more explicit.
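How about using strrchr (strstr would find the first '.', not the last one):
#include <string.h>
const char* newExtension = ".code";
/* filename must point to a buffer with room for the longer extension */
void ChangeFileExtension(char* filename) {
    char* lastDot = strrchr(filename, '.');
    if (lastDot != NULL)
        strcpy(lastDot, newExtension);
    else
        strcat(filename, newExtension);
}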
What you'll need to do is copy the original filename into a new variable where you can change the extension. Something like this:
string outFilename;
size_t extPos = filename.rfind('.');
if (extPos != string::npos)
{
    // Copy everything up to (but not including) the '.'
    outFilename.assign(filename, 0, extPos);
    // Add the new extension.
    outFilename.append(".code");
    // outFilename now has the filename with the .code extension.
}
It's possible you could use the "filename" variable if you don't need to keep the original filename around for later use. In that case you could just use:
size_t extPos = filename.rfind('.');
if (extPos != string::npos)
{
    // Erase the current extension.
    filename.erase(extPos);
    // Add the new extension.
    filename.append(".code");
}
The key is to look at the definition of the C++ string class and understand what each member function does. Using rfind searches backwards through the string, so as long as the file name itself has an extension you won't accidentally hit a dot in a folder name that is part of the original path (e.g. "C:\MyStuff.School\MyFile.txt"). When working with the offsets from find, rfind, etc., you'll also want to be careful to use them properly when passing them as counts to other methods (e.g. should it be assign(filename, 0, extPos-1), assign(filename, 0, extPos) or assign(filename, 0, extPos+1)?).
Hope that helps.
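One corner case the rfind approach does not cover on its own is a path whose only dot sits in a folder name, such as "C:\MyStuff.School\MyFile". A minimal sketch that guards against it, assuming both '/' and '\' count as path separators; the function name replace_extension is chosen for this example and is not a standard string function:

```cpp
#include <string>

// Hypothetical helper: replace the extension of the last path component only.
std::string replace_extension(const std::string& filename, const std::string& new_ext) {
    std::string::size_type sep = filename.find_last_of("/\\");  // last path separator, if any
    std::string::size_type dot = filename.rfind('.');
    // A dot counts as the start of an extension only if it comes after the last separator.
    bool has_ext = dot != std::string::npos && (sep == std::string::npos || dot > sep);
    return (has_ext ? filename.substr(0, dot) : filename) + new_ext;
}
```

Here substr(0, dot) copies everything up to, but not including, the dot, which is the extPos variant of the three offsets discussed above.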
size_t pos = filename.rfind('.');
if(pos != string::npos)
    filename.replace(pos, filename.length() - pos, ".code");
else
    filename.append(".code");
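Very easy, as long as the old and the new extension are both exactly three characters long (so it does not cover going from .txt to .code):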
string str = "file.ext";
str[str.size()-3]='a';
str[str.size()-2]='b';
str[str.size()-1]='c';
cout<<str;
Result:
"file.abc"