Weird results when I iterate through string with "ñ" char - c++

I have a string of characters and one of those possible characters is the letter "ñ". My problem is, my string seems to behave in a weird way when I try to modify it or iterate through the string. For example if I have the code:
std::ifstream infile(argv[1]);
std::string texto_crudo((std::istreambuf_iterator<char>(infile)), std::istreambuf_iterator<char>());
for (int i = 0; i < texto_crudo.length(); i++) {
    if (es_enie(texto_crudo[i])) {
        texto_crudo[i] = '$';
    }
}
Here es_enie returns true if texto_crudo[i] is ñ. The position where ñ is located seems to behave as if it holds two values instead of one.

I managed to find a solution to my exact problem. As Some programmer dude commented, it was because my text was in UTF-8 and I needed to convert it to iso-8859-1 in order to be able to use "ñ" correctly.
Convert string from UTF-8 to ISO-8859-1
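For reference, a minimal sketch of why the loop in the question sees "two values": in UTF-8, "ñ" is encoded as the two bytes 0xC3 0xB1, so comparing one char at a time never matches the whole letter. Assuming the text is kept in UTF-8 (instead of being converted to ISO-8859-1 as in the fix above), the two-byte sequence can be replaced as a substring. The function name replace_enie is hypothetical.

```cpp
#include <string>

// Replace every UTF-8 encoded "ñ" (bytes 0xC3 0xB1) with a single '$'.
// Assumes `text` holds UTF-8 data; only lowercase ñ is handled.
std::string replace_enie(std::string text) {
    const std::string enie = "\xC3\xB1";  // UTF-8 encoding of U+00F1
    std::string::size_type pos = 0;
    while ((pos = text.find(enie, pos)) != std::string::npos) {
        text.replace(pos, enie.size(), "$");  // two bytes become one
        pos += 1;                             // continue after the '$'
    }
    return text;
}
```

Note that the string gets one byte shorter for each replaced letter, which is why an index-based loop over single chars cannot do this in place.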

Related

Concatenation of strings in C++ (Linux)

I want to concatenate three strings in C++.
I have a vector std::vector<std::string> my_list where the filenames are stored. I want to add the directory and filename extension to each of the filenames in order to read the binary data from the file, so I do it like this:
for (int i = 0; i < my_list.size(); i++) {
    std::string tmp = prefix + my_list[i] + suffix;
    std::ifstream file(tmp.c_str(), std::ifstream::binary);
}
where prefix is std::string prefix = "directory/" and suffix is std::string suffix = ".foo".
And it works in Windows. However it doesn't work in Linux.
Suffix overwrites "tmp"-string. It looks like foo/y_file_timestamp instead of out/my_file_timestamp.foo.
What should I do to prevent this overwriting?
The bug is not in the code you showed us.
The problem is that your strings have unexpected characters in them, specifically carriage returns ('\r') that cause the cursor to return to the beginning of the line when your concatenated string is printed to the terminal window.
Presumably this is a problem caused by careless parsing of input data with Windows-style line endings. You should normalise your data and be sure to strip all line-ending variants during parsing.
Always be sure to check the contents of your strings at the character-level when encountering a problem with string operations.
Thank you @BarryTheHatchet. I forgot to mention that the vector my_list was filled this way:
std::string LIST_WITH_DATA = "data/list.scp";
const char* my_data = LIST_WITH_DATA.c_str();
std::ifstream my_file(my_data);
std::string my_line;
while (std::getline(my_file, my_line)) {
    my_list.push_back(my_line);
}
data/list.scp looks like:
file1/00001-a
file2/00001-b
file3/00001-c
file4/00001-d
std::getline was my problem.
The solution I found here: Getting std::ifstream to handle LF, CR, and CRLF?

C++, Trouble with string and int conversion

I know how to convert a string when it is made up only of digits, or when it begins with digits. I am trying to convert to an integer when the string has non-digit characters at the beginning or in the middle. I've tried running through a for loop, checking if (isdigit(str[i])) before trying stoi, stringstream, atoi, etc. None of them really work. I have the same problem even without the for loop. I've tried Googling my problem, but no luck. Any suggestions, or anything that I can try?
You have to check character by character if it's a digit or not and, if it is, add it to a new string. In the end, you convert your new string to an int like you would normally. Look at the code below. Hope I could help!
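string s = "pc2jjj10";
char temp;
string result;
for (int i = 0; i < s.length(); i++) {
    temp = s.at(i);
    // Cast to unsigned char: isdigit is undefined for negative char values.
    if (isdigit(static_cast<unsigned char>(temp))) {
        result.push_back(temp);
    }
}
int number = stoi(result); // result is "210" here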
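One caveat with the approach above: stoi throws std::invalid_argument if the string contains no digits at all, and std::out_of_range if the collected digits do not fit in an int. A hedged sketch that wraps the same idea in a function and reports failure instead of throwing (the name digits_to_int is hypothetical):

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Collects every decimal digit in `s` and converts the result to int.
// Returns false when `s` has no digits or the value does not fit in an int.
bool digits_to_int(const std::string& s, int& out) {
    std::string digits;
    for (char ch : s) {
        if (std::isdigit(static_cast<unsigned char>(ch))) {
            digits.push_back(ch);
        }
    }
    if (digits.empty()) {
        return false;  // std::stoi would throw std::invalid_argument
    }
    try {
        out = std::stoi(digits);
    } catch (const std::out_of_range&) {
        return false;  // too many digits for an int
    }
    return true;
}
```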

Pull out data from a file and store it in strings in C++

I have a file which contains records of students in the following format.
Umar|Ejaz|12345|umar@umar.com
Majid|Hussain|12345|majid@majid.com
Ali|Akbar|12345|ali@geeks-inn.com
Mahtab|Maqsood|12345|mahtab@myself.com
Juanid|Asghar|12345|junaid@junaid.com
The data has been stored according to the following format:
firstName|lastName|contactNumber|email
The total number of lines (records) cannot exceed 100. In my program, I've defined the following string variables.
#define MAX_SIZE 100
// other code
string firstName[MAX_SIZE];
string lastName[MAX_SIZE];
string contactNumber[MAX_SIZE];
string email[MAX_SIZE];
Now, I want to pull data from the file, and using the delimiter '|', I want to put data in the corresponding strings. I'm using the following strategy to put back data into string variables.
ifstream readFromFile;
readFromFile.open("output.txt");
// other code
int x = 0;
string temp;
while(getline(readFromFile, temp)) {
    int charPosition = 0;
    while(temp[charPosition] != '|') {
        firstName[x] += temp[charPosition];
        charPosition++;
    }
    while(temp[charPosition] != '|') {
        lastName[x] += temp[charPosition];
        charPosition++;
    }
    while(temp[charPosition] != '|') {
        contactNumber[x] += temp[charPosition];
        charPosition++;
    }
    while(temp[charPosition] != endl) {
        email[x] += temp[charPosition];
        charPosition++;
    }
    x++;
}
Is it necessary to attach a null character '\0' at the end of each string? If I do not, will it create problems when I actually use those string variables in my program? I'm new to C++, and I've come up with this solution. If anybody has a better technique, it is surely welcome.
Edit: Also I can't compare a char(acter) with endl, how can I?
Edit: The code that I've written isn't working. It gives me following error.
Segmentation fault (core dumped)
Note: I can only use .txt file. A .csv file can't be used.
There are many techniques to do this. I suggest searching Stack Overflow for "[C++] read file" to see some more methods.
Find and Substring
You could use the std::string::find method to find the delimiter and then use std::string::substr to return a substring between the position and the delimiter.
std::string::size_type position = 0;
position = temp.find('|');
if (position != std::string::npos)
{
    firstName[x] = temp.substr(0, position);
}
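The snippet above extracts only the first field. As a sketch of how the same find/substr idea might be extended to every field in a line, the hypothetical helper split_fields below repeats the search from just past each delimiter. It assumes fields never contain the delimiter themselves.

```cpp
#include <string>
#include <vector>

// Splits `line` on `delim` using std::string::find and std::string::substr.
// An empty line yields one empty field.
std::vector<std::string> split_fields(const std::string& line, char delim = '|') {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type pos = line.find(delim, start);
        if (pos == std::string::npos) {
            fields.push_back(line.substr(start));  // last field
            break;
        }
        fields.push_back(line.substr(start, pos - start));
        start = pos + 1;  // skip the delimiter
    }
    return fields;
}
```

For a record in the format from the question, the result would hold the first name, last name, contact number, and email at indexes 0 through 3.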
If you don't terminate a C-style string with a null character there is no way to determine where the string ends. Thus, you'll need to terminate the strings.
I would personally read the data into std::string objects:
std::string first, last, etc;
while (std::getline(readFromFile, first, '|')
       && std::getline(readFromFile, last, '|')
       && std::getline(readFromFile, etc)) {
    // do something with the input
}
std::endl is a manipulator implemented as a function template. You can't compare a char with that. There is also hardly ever a reason to use std::endl because it flushes the stream after adding a newline which makes writing really slow. You probably meant to compare to a newline character, i.e., to '\n'. However, since you read the string with std::getline() the line break character will already be removed! You need to make sure you don't access more than temp.size() characters otherwise.
Your record also contains arrays of strings rather than arrays of characters and you assign individual chars to them. You either wanted to use char something[SIZE] or you'd store strings!
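Putting these points together, a hedged sketch of reading all four fields into std::string members: each line is read with std::getline, then split with std::getline on '|' through a std::istringstream. The struct Student and the function read_students are hypothetical names, and using std::vector instead of fixed arrays of MAX_SIZE is a substitution on my part, not something the question requires.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Student {
    std::string firstName, lastName, contactNumber, email;
};

// Parses records of the form firstName|lastName|contactNumber|email,
// one per line. Lines that do not contain all four fields are skipped.
std::vector<Student> read_students(std::istream& in) {
    std::vector<Student> students;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Student s;
        if (std::getline(fields, s.firstName, '|') &&
            std::getline(fields, s.lastName, '|') &&
            std::getline(fields, s.contactNumber, '|') &&
            std::getline(fields, s.email)) {
            students.push_back(s);
        }
    }
    return students;
}
```

No '\0' is appended anywhere: std::string tracks its own length, and no index ever runs past the end of a line.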

sscanf for this type of string

I'm not quite sure even after reading the documentation how to do this with sscanf.
Here is what I want to do:
given a string of text:
Read up to the first 64 chars or until space is reached
Then there will be a space, an = and then another space.
Following that I want to extract another string either until the end of the string or if 8192 chars are reached. I would also like it to change any occurrences in the second string of "\n" to the actual newline character.
I have: "%64s = %8192s" but I do not think this is correct.
Thanks
Ex:
element.name = hello\nworld
Would have string 1 with element.name and string2 as
hello
world
I do recommend std::regex for this, but apart from that, you should be fine with a little error checking:
#include <cstdio>
int main(int argc, const char *argv[])
{
    char s1[65];
    char s2[8193];
    if (2 != std::scanf("%64s = %8192s", s1, s2))
        std::puts("oops");
    else
        std::printf("s1 = '%s', s2 = '%s'\n", s1, s2);
    return 0;
}
Your format string looks right to me; however, sscanf will not change occurrences of "\n" to anything else. To do that you would then need to write a loop that uses strtok or even just a simple for loop evaluating each character in the string and swapping it for whatever character you prefer. You will also need to evaluate the sscanf return value to determine if the 2 strings were indeed scanned correctly. sscanf returns the number of fields successfully scanned according to your format string.
@sehe shows the correct usage of the format string, including the check for the proper return value.
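A sketch of both steps combined, using sscanf on an in-memory string and a simple loop for the "\n" replacement. One deliberate difference from the format string discussed above: %8192s stops at the first whitespace, so this sketch assumes the value may contain spaces and uses the scanset %8192[^\n] instead, which reads up to the end of the line. The function name parse_assignment is hypothetical.

```cpp
#include <cstdio>
#include <string>

// Parses "name = value" from `input`. The name is at most 64 non-space
// characters, the value at most 8192 characters. Every two-character
// sequence backslash + 'n' in the value becomes a real newline.
bool parse_assignment(const char* input, std::string& name, std::string& value) {
    char s1[65];
    char s2[8193];
    // sscanf returns the number of fields it managed to fill.
    if (std::sscanf(input, "%64s = %8192[^\n]", s1, s2) != 2) {
        return false;
    }
    name = s1;
    value.clear();
    for (const char* p = s2; *p != '\0'; ++p) {
        if (p[0] == '\\' && p[1] == 'n') {
            value += '\n';
            ++p;  // skip the 'n' as well
        } else {
            value += *p;
        }
    }
    return true;
}
```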

How to read a word into a string ignoring a certain character

I am reading a text file which contains a word with a punctuation mark on it and I would like to read this word into a string without the punctuation marks.
For example, a word may be " Hello, "
I would like the string to get " Hello " (without the comma). How can I do that in C++ using ifstream libraries only?
Can I use the ignore function to ignore the last character?
Thank you in advance.
Try ifstream::get(Ch* p, streamsize n, Ch term).
An example:
char buffer[64];
std::cin.get(buffer, 64, ',');
// will read up to 64 characters until a ',' is found
// For the string "Hello," it would stream in "Hello"
If you need to be more robust than simply a comma, you'll need to post-process the string. The steps might be:
Read the stream into a string
Use string::find_first_of() to help "chunk" the words
Return the word as appropriate.
If I've misunderstood your question, please feel free to elaborate!
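The post-processing steps listed above can be sketched as follows, assuming the text has already been read into a std::string. The helper name first_word and the default set of delimiter characters are assumptions chosen for illustration.

```cpp
#include <string>

// Returns the first word in `text`, skipping leading delimiters and
// stopping at the next delimiter (whitespace or punctuation).
std::string first_word(const std::string& text,
                       const std::string& delims = " \t\n,.;:!?") {
    std::string::size_type start = text.find_first_not_of(delims);
    if (start == std::string::npos) {
        return "";  // nothing but delimiters
    }
    std::string::size_type end = text.find_first_of(delims, start);
    if (end == std::string::npos) {
        return text.substr(start);  // word runs to the end of the text
    }
    return text.substr(start, end - start);
}
```

For the input " Hello, " this returns "Hello", without the comma or the surrounding spaces.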
If you only want to ignore ',', then you can use getline.
const int MAX_LEN = 128;
ifstream file("data.txt");
char buffer[MAX_LEN];
while (file.getline(buffer, MAX_LEN, ','))
{
    cout << buffer;
}
EDIT: This uses std::string and does away with MAX_LEN
ifstream file("data.txt");
string string_buffer;
while (getline(file, string_buffer, ','))
{
    cout << string_buffer;
}
One way would be to use the Boost String Algorithms library. There are several "replace" functions that can be used to replace (or remove) specific characters or strings in strings.
You can also use the Boost Tokenizer library for splitting the string into words after you have removed the punctuation marks.