C++ ifstream and "umlauts"

I am having an issue with "umlauts" (letters ä, ü, ö, ...) and ifstream in C++.
I use curl to download an html page and ifstream to read in the downloaded file line by line and parse some data out of it. This goes well until I have a line like one of the following:
te="Olimpija Laibach - Tromsö";
te="Burghausen - Münster";
My code parses these lines and outputs it as the following:
Olimpija Laibach vs. Troms?
Burghausen vs. M?nster
Things like outputting umlauts directly from the code work:
cout << "öäü" << endl; // This works fine
My code looks somewhat like this:
ifstream fin("file");
while(!(fin.eof())) {
getline(fin, line, '\n');
int pos = line.find("te=");
if(pos >= 0) {
pos = line.find(" - ");
string team1 = line.substr(4,pos-4);
string team2 = line.substr(pos+3, line.length()-pos-6);
cout << team1 << " vs. " << team2 << endl;
}
}
Edit: The weird thing is that the same code (the only changed things are the source and the delimiters) works for another text input file (same procedure: download with curl, read with ifstream). Parsing and outputting a line like the following is no problem:
<span id="...">Fernwärme Vienna</span>

What's the locale embedded in fin? In the code you show, it would
be the global locale, which if you haven't reset it, is "C".
If you're anywhere outside the Anglo-Saxon world—and the strings
you show suggest that you are— one of the first things you do in
main should be
std::locale::global( std::locale( "" ) );
This sets the global locale (and thus the default locale for any streams
opened later) to the locale being used in the surrounding environment.
(Formally, to an implementation defined native environment, but in
practice, to whatever the user is using.) In "C" locale, the encoding
is almost always ASCII; ASCII doesn't recognize Umlauts, and according
to the standard, illegal encodings in input should be replaced with an
implementation defined character (IIRC; it's been some time since
I've actually reread this section). In output, of course, you're not
supposed to have any unknown characters, so the implementation doesn't
check for them, and they go through.
Since std::cin, etc. are opened before you have a chance to set the
global locale, you'll have to imbue them with std::locale( "" ) specifically.
If this doesn't work, you might have to find some specific locale to
use.
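A minimal sketch of that setup, assuming the environment provides a usable native locale. The helper names native_locale_or_classic and use_native_locale are made up for illustration. The fallback is there because std::locale("") throws std::runtime_error when the environment names a locale the system does not have.

```cpp
#include <fstream>
#include <iostream>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: the user's native locale, or the classic "C" locale
// if the environment names a locale that is not installed.
std::locale native_locale_or_classic() {
    try {
        return std::locale("");
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Hypothetical helper: call this first thing in main().
void use_native_locale() {
    const std::locale native = native_locale_or_classic();
    std::locale::global(native);   // default for streams constructed from now on
    std::cin.imbue(native);        // the standard streams already exist,
    std::cout.imbue(native);       // so they have to be imbued one by one
    std::cerr.imbue(native);
}
```

Once use_native_locale() has run, an ifstream constructed afterwards picks up the same locale without any extra call. If the native locale still gives the wrong result, a specific locale name can be passed to std::locale instead, but which names exist depends on the system.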

Related

Comparing terms entered by the user to lists in a txt file

I want the user to enter their name, sex, age, medication, and condition. Then look through the text to see if their condition matches any of the other people in the text document, then see if their age, sex, or medication is the same. If it is, output the possible side effect that's also in the text document.
Having trouble getting started since it's been so long since I did anything like this. I just need to know the basics of how to read and compare a text document.
Txt doc is laid out like this:
Name Med Sex Age Cond Effect
Bill DepMed M 33 Depression StomachAche
Tom ADDMed M 24 ADD HeadAche

I don't know how "basic" you need, but to read and write files you need to include the header file "fstream". You can read and write files in a variety of ways. One way is to open a file and instead of using cin for input and cout for output you would use the name of the file stream that you opened the file in. Example:
#include <fstream>
#include <string>
using namespace std;
int main() {
string input;
fstream dataFile; //names stream 'dataFile' sort of like a variable.
dataFile.open("data.txt", ios::in | ios::out); //opens data.txt for reading (ios::in) and writing (ios::out); the file must already exist
dataFile >> input; //stores data to input exactly like 'cin' would from the screen, but in this case the input is coming from 'dataFile'
getline(dataFile, input, '\n'); //stores the rest of the line in input, like 'getline(cin, input)' would
dataFile.clear(); //resets the stream state in case the reads hit the end of the file
dataFile.seekp(0, ios::end); //a seek is needed when switching from reading to writing
dataFile << "String to be added in file" << endl; //prints to file exactly like 'cout' prints to screen
dataFile.close(); //closes the file; the destructor would also do this when dataFile goes out of scope
}
Specifically for your question:
1. Ask the user for one of the columns (you don't need to ask for all of them. Name, condition, or symptom would work best).
2. Open the data file.
3. Use getline(inFile, junk, '\n'); to skip the first line (you don't want to search the column titles). junk is a string variable and inFile is your '.txt' file.
4. Read the next line in the file by using getline() again.
5. For every line, search the string that was read from the file, searchString, for the string that the user inputted, userInput, using found = searchString.find(userInput, 0). You would have to declare size_t found before the loop.
6. For every line, check if userInput was found in searchString using if(found != std::string::npos)
7. If found, print searchString to the screen using 'cout'.
8. Repeat steps 4-7 until the end of the file is reached.
9. Close the file.

Error while reading input from console in D language

I am writing a simple code to input the number of candies and balloons to be brought to a party.
I have written
import std.stdio;
void main()
{
int candiesCount;
readf("%s", &candiesCount);
write("How many balloons are there? ");
int balloonCount;
readf("%s", &balloonCount);
writeln("Got it: There are ", candiesCount, " candies",
" and ", balloonCount, " balloons.");
}
but after entering the number of candies I get this error :
Unexpected '
' when converting from type LockingTextReader to type int
----------------
0x00403B5F
0x004038FF
0x004033AE
0x00402564
0x004024C0
0x00402415
0x0040206A
0x7564173E in BaseThreadInitThunk
0x77C76911 in LdrInitializeThunk
0x77C768BD in LdrInitializeThunk
Pls Help me as I am new to this language.

This stumped me for a while as well. Andrei explains that readf is very picky about the input matching the format string.
You just need to add \n to the end of your format strings. I think this is because you press enter to submit the input, but I'm not entirely sure (I am still new to this language as well).
It should look something like this:
readf("%s\n", &candiesCount);
...
readf("%s\n", &balloonCount);

The error is happening due to unmatched input. The reason for that is whitespace. To fix that, use
readf(" %s", &candiesCount); // notice the space before %s
Adding a space before %s skips whitespace characters.
For more details, check this page (which happens to have an example very similar to yours): http://ddili.org/ders/d.en/input.html

Ada: omit newline when redirecting to stdout (testing Put)

I am trying to write a test for a method with a simple Ada.Text_IO.Put. For the sake of simplicity, this is a made up method that I want to test:
procedure Say_Something is
begin
Put("Something.");
end Say_Something;
In my AUnit test, I have:
procedure Test_Put (T : in out Test) is
pragma Unreferenced (T);
use Ada.Text_IO;
Stdout : constant File_Type := Standard_Output;
Put_File_Name : constant String := "say_something_test.txt";
Put_File : File_Type;
Expected : constant String := "Something.";
begin
-- Create the output file and redirect output
Create (Put_File, Append_File, Put_File_Name);
Set_Output (Put_File);
Say_Something;
-- Redirect output to stdout and close the file
Set_Output (Stdout);
Close (Put_File);
-- Read file
declare
File_Size : constant Natural :=
Natural (Ada.Directories.Size (Put_File_Name));
Actual : String (1 .. File_Size);
begin
Actual := Read_File (Put_File_Name, File_Size);
Ada.Directories.Delete_File (Put_File_Name);
Assert (Expected = Actual,
"Expected " & '"' & Expected & '"' & ", " &
"Got " & '"' & Actual & '"');
end;
end Test_Put;
function Read_File (File_Name : String; File_Size : Natural)
return String is
subtype File_String is String (1 .. File_Size);
package File_String_IO is new Ada.Direct_IO (File_String);
File : File_String_IO.File_Type;
Contents : File_String;
begin
File_String_IO.Open (File, File_String_IO.In_File, File_Name);
File_String_IO.Read (File, Contents);
File_String_IO.Close (File);
return Contents;
end Read_File;
Unfortunately, the result is:
FAIL Test Vectors.Put
Expected "Something.", Got "Something.
"
It seems that Ada automagically adds a newline to the end of the file. I realize that I could add a (CR)LF to my expected string like this:
Expected : constant String := "Something."
& Ada.Characters.Latin_1.CR
& Ada.Characters.Latin_1.LF;
but a) It does not feel right to alter my expected string and b) This will run on a Windows machine, but on Unix/Linux/Mac I would have to drop the "CR". In other words, the success of my test run is platform dependent while my code is not, which is bad.
So my question is: how can I write to the file without appending a newline? Other suggestions on how to test for output are highly welcome as well.
I have seen this related question but couldn't deduce any useful information from it, apart from the hint that I might try the Append_File instead of the Out_File mode, which did not resolve my issue.

Sorry for my previous answer, I missed that the file you were reading was one you created earlier in the program.
In Ada.Text_IO, the RM (A.10(7-8)) says "the end of a file is marked by the combination of a line terminator immediately followed by a page terminator and then a file terminator", and "The actual nature of terminators is not defined by the language and hence depends on the implementation" ... "they are not necessarily implemented as characters or as sequences of characters". So when you create say_something_test.txt, it will always end with a "line terminator" although that doesn't necessarily mean it will end with an LF. That's implementation-dependent. The only thing you're guaranteed is that if you use Ada.Text_IO to create a file, it will work correctly if you read it back in with Ada.Text_IO. But if you want this level of control of the actual bytes written to the file, then Ada.Text_IO would not really be suitable; you'd be better off using Ada.Stream_IO.
Whether a CR and/or LF is written to the end of the file is implementation-dependent. It looks like GNAT does add LF (and maybe CR) to the end, and it doesn't provide a way (such as a Form parameter) to turn this behavior off. At least I didn't see one in the manual.
If you're really determined to use Ada.Text_IO to write say_something_test.txt and Ada.Direct_IO to read it back in, then you need to be aware that the file may or may not contain CR/LF, and the input routine should check for those characters and strip them off so that the string can be compared to the expected value.

You're using Text_IO for the output, but Direct_IO when reading it back in. You shouldn't mix them like that since they do different things. In your simple example, all output is text, so I recommend you use Text_IO to read it back in your test as well.

How do I parse weeks-in-year with boost::date_time?

I want to parse strings that consist of a 4-digit year and the week number within the year. I've followed the boost date/time IO tutorial, producing a test example like this:
std::string week_format = "%Y-W%W";
boost::date_time::date_input_facet<boost::gregorian::date, char> week_facet = boost::date_time::date_input_facet<boost::gregorian::date, char>(week_format);
std::stringstream input_ss;
input_ss.imbue(locale(input_ss.getloc(), &week_facet));
std::string input_week = "2004-W34";
input_ss.str(input_week);
boost::gregorian::date input_date;
input_ss >> input_date;
Unfortunately, input_date just prints as "2004-01-01", implying that it just parsed the year. What am I doing wrong? Is %W not available on input? (The documentation doesn't mark it as such.)

You are correct that the documentation doesn't mark it as such in the "Format Flags" section (no "!" next to it...)
http://www.boost.org/doc/libs/1_35_0/doc/html/date_time/date_time_io.html#date_time.format_flags
But that seems to be an oversight. Because in Boost's format_date_parser.hpp there is no coverage for this case in parse_date...you can look at the switch statement and see that:
http://svn.boost.org/svn/boost/trunk/boost/date_time/format_date_parser.hpp
Despite the absence of any code to do it, even the comments in the source say it handles %W and %U on parse input. What's up with that? :-/
On another note, I believe week_facet needs to be dynamically allocated in your example:
std::string week_format = "%Y-W%W";
boost::date_time::date_input_facet<boost::gregorian::date, char>* week_facet =
new boost::date_time::date_input_facet<boost::gregorian::date, char>(
week_format
);
std::stringstream input_ss;
input_ss.imbue(std::locale(input_ss.getloc(), week_facet));
(Or at least I had to do it that way to keep the example from crashing.)
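If the facet really cannot read %W, one possible workaround is to parse the two numbers by hand. The sketch below does not use Boost at all, only <cstdio> and <ctime>. It assumes %W semantics (weeks start on Monday and week 1 begins on the first Monday of the year), which is not the same as ISO 8601 week numbering, so check which one your data uses. The function name parseYearWeek is hypothetical.

```cpp
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical helper: turns "2004-W34" into the Monday that starts that week.
// Assumes %W numbering: week 1 begins on the first Monday of the year.
// Returns false if the text does not match or the week is outside 1..53.
bool parseYearWeek(const std::string& text, std::tm& result) {
    int year = 0;
    int week = 0;
    if (std::sscanf(text.c_str(), "%4d-W%2d", &year, &week) != 2) return false;
    if (week < 1 || week > 53) return false;

    // Ask mktime which weekday January 1st falls on (0 = Sunday).
    std::tm jan1 = std::tm();
    jan1.tm_year = year - 1900;
    jan1.tm_mon = 0;
    jan1.tm_mday = 1;
    jan1.tm_hour = 12;          // midday keeps clear of daylight-saving edges
    jan1.tm_isdst = -1;
    if (std::mktime(&jan1) == static_cast<std::time_t>(-1)) return false;

    const int daysToFirstMonday = (8 - jan1.tm_wday) % 7;

    // A day number past the end of January is normalised by mktime.
    result = std::tm();
    result.tm_year = year - 1900;
    result.tm_mon = 0;
    result.tm_mday = 1 + daysToFirstMonday + (week - 1) * 7;
    result.tm_hour = 12;
    result.tm_isdst = -1;
    return std::mktime(&result) != static_cast<std::time_t>(-1);
}
```

For "2004-W34" this gives Monday, 23 August 2004. The fields can then be passed to the boost::gregorian::date constructor as (tm_year + 1900, tm_mon + 1, tm_mday).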

String extraction

Currently I am working on a very basic game using the C++ environment. The game used to be a school project but now that I am done with that programming class, I wanted to expand my skills and put some more flourish on this old assignment.
I have already made a lot of changes that I am pleased with. I have centralized all the data into folder hierarchies and I have gotten the code to read those locations.
However my problem stems from a very fundamental flaw that has been stumping me.
In order to access the image data that I am using I have used the code:
string imageLocation = "..\\DATA\\Images\\";
string bowImage = imageLocation + "bow.png";
The problem is that when the player picks up an item on the gameboard my code is supposed to use the code:
hud.addLine("You picked up a " + (*itt)->name() + "!");
to print to the command line, "You picked up a Bow!". But instead it shows "You picked up a ..\DATA\Images\!".
Before I centralized my data I used to use:
name_(item_name.substr(0, item_name.find('.')))
in my Item class constructor to chop the item name to just something like bow or candle. After I changed how my data was structured I realized that I would have to change how I chop the name down to the same simple 'bow' or 'candle'.
I have changed the above code to reflect my changes in data structure to be:
name_(item_name.substr(item_name.find("..\\DATA\\Images\\"), item_name.find(".png")))
but unfortunately as I alluded to earlier this change of code is not working as well as I planned it to be.
So now that I have given that real long winded introduction to what my problem is, here is my question.
How do you extract the middle of a string between two sections that you do not want? Also that middle part that is your target is of an unknown length.
Thank you so very much for any help you guys can give. If you need anymore information please ask; I will be more than happy to upload part or even my entire code for more help. Again thank you very much.

In all honesty, you're probably approaching this from the wrong end.
Your item class should have a string "bow", in a private member. The function Item::GetFilePath would then (at runtime) do "..\\DATA\\Images\\" + this->name + ".png".
The fundamental property of the "bow" item object isn't the filename bow.png, but the fact that it's a "bow". The filename is just a derived property.
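A minimal sketch of that design. The member names and the imageFolder helper are assumptions for illustration and are not taken from the asker's code.

```cpp
#include <string>

class Item {
public:
    explicit Item(const std::string& name) : name_(name) {}

    // The name is the item's real identity, e.g. "bow".
    const std::string& name() const { return name_; }

    // The file path is derived from the name only when it is needed.
    std::string GetFilePath() const {
        return imageFolder() + name_ + ".png";
    }

private:
    // Assumed location of the image files, as given in the question.
    static std::string imageFolder() { return "..\\DATA\\Images\\"; }

    std::string name_;
};
```

With this layout, Item("bow").name() can go straight into the HUD message, and no string chopping is needed when the folder structure changes.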

Assuming I understand you correctly, the short version of your question is: how do I split a string containing a file path so I have removed the path and the extension, leaving just the "title"?
You need the find_last_of method. This gets rid of the path:
std::string::size_type lastSlash = filePath.find_last_of('\\');
if (lastSlash == std::string::npos)
fileName = filePath;
else
fileName = filePath.substr(lastSlash + 1);
Note that you might want to define a constant as \\ in case you need to change it for other platforms. Not all OS file systems use \\ to separate path segments.
Also note that you also need to use find_last_of for the extension dot as well, because filenames in general can contain dots, throughout their paths. Only the very last one indicates the start of the extension:
std::string::size_type lastDot = fileName.find_last_of('.');
if (lastDot == std::string::npos)
{
title = fileName;
}
else
{
title = fileName.substr(0, lastDot);
extension = fileName.substr(lastDot + 1);
}
See http://msdn.microsoft.com/en-us/library/3y5atza0(VS.80).aspx
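Put together, the two snippets might look like the function below. fileTitle is a made-up name, and accepting both \ and / as separators is an assumption added here to cover the portability point above.

```cpp
#include <string>

// Strips the directory part and the extension from a path,
// e.g. "..\\DATA\\Images\\bow.png" becomes "bow".
std::string fileTitle(const std::string& filePath) {
    // Everything after the last separator is the file name.
    const std::string::size_type lastSlash = filePath.find_last_of("\\/");
    const std::string fileName =
        (lastSlash == std::string::npos) ? filePath : filePath.substr(lastSlash + 1);

    // Only the last dot in the file name starts the extension.
    const std::string::size_type lastDot = fileName.find_last_of('.');
    return (lastDot == std::string::npos) ? fileName : fileName.substr(0, lastDot);
}
```

Because the directory is removed first, a dot inside a folder name (as in "dir.v2/readme") is never mistaken for the start of an extension.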

using boost filesystem:
#include "boost/filesystem.hpp"
namespace fs = boost::filesystem;
void some_function(void)
{
string imageLocation = "..\\DATA\\Images\\";
string bowImage = imageLocation + "bow.png";
fs::path image_path( bowImage );
hud.addLine("You picked up a " + image_path.stem().string() + "!"); //prints: You picked up a bow!
}

So combining Paul's and my thoughts, try something like this (broken down for readability):
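const string prefix = "..\\DATA\\Images\\";
string::size_type start = item_name.find(prefix);
start = (start == string::npos) ? 0 : start + prefix.size(); // first char after the folder prefix
string::size_type dot = item_name.find_last_of('.'); // the last dot starts the extension
name_ = item_name.substr(start, dot - start); // second argument is a length, not a position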
You could simplify it a bit if you know that item name always starts with "..DATA" etc (you could store it in a constant and not need to search for it in the string)
Edit: Changed extension finding part to use find_last_of, as suggested by EarWicker, (this avoids the case where your path includes '.png' somewhere before the extension)

item_name.find("..\DATA\Images\") will return the index at which the substring "..\DATA\Images\" starts but it seems like you'd want the index where it ends, so you should add the length of "..\DATA\Images\" to the index returned by find.
Also, as hamishmcn pointed out, the second argument to substr should be the number of chars to return, which would be the index where ".png" starts minus the index where "..\DATA\Images\" ends, I think.

One thing that looks wrong is that the second parameter to substr should be the number of chars to copy, not the position.
