Translating Program - c++

I am beginning to write a translator program that translates text read from a file, using parallel arrays. The language to translate is Pig Latin. I created a text file to use as a Pig Latin to English dictionary. I don't want to use any two-dimensional arrays; I want to keep the arrays one-dimensional.
Basically, I want to read a text file written in Pig Latin and, using the dictionary I created, output the English translation on the command line.
My pseudo-code idea is:
Open the dictionary text file.
Ask the user for the name of the Pig Latin text file that he/she wants to translate to English.
Look up each word of the user's text file in the dictionary and translate it accordingly. Keep going until there are no more words to translate.
Show the translated words on the command line.
I was thinking of using parallel arrays, one containing the English words and another containing the Pig Latin words.
I would like to know how I can manipulate strings using arrays in C++.
Thank you.

If files will always be translated in one direction (e.g. Pig Latin -> English), then it would be easier and more efficient to use std::map to map one string to another:
std::map<std::string, std::string> dictionary;
dictionary["ashtray"] = "trash";
dictionary["underplay"] = "plunder";
To get a translated word, just use dictionary[] to look it up (e.g. std::cout << dictionary["upidstay"] << std::endl;)
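One caveat: operator[] on a std::map inserts an empty entry when the key is missing, so looking up an unknown word silently grows the dictionary. A minimal sketch of a lookup that avoids this by using std::map::find; the helper name lookup_word is hypothetical, and returning the word unchanged when it is unknown is just one possible policy:

```cpp
#include <map>
#include <string>

// Hypothetical helper: returns the translation of `word`, or `word` itself
// when the dictionary has no entry for it. The dictionary is never modified.
std::string lookup_word(const std::map<std::string, std::string>& dictionary,
                        const std::string& word)
{
    const auto it = dictionary.find(word);
    return it != dictionary.end() ? it->second : word;
}
```

Because the map is taken by const reference, the compiler would reject an accidental dictionary[word] inside the function.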

Pig latin can be translated on the fly.
Just split each word before its first vowel and you won't need a dictionary file. Then concatenate the second part with the first part, delimited with a '-', and add "ay" at the end.
Unless you want to use a dictionary file?
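The rule above can be sketched in both directions. This is only an illustration under stated assumptions: words are lowercase, only a, e, i, o, u count as vowels, and the '-' delimiter is kept in the Pig Latin form so that decoding is unambiguous. The function names to_pig_latin and from_pig_latin are hypothetical.

```cpp
#include <string>

// Encodes one lowercase English word: "trash" becomes "ash-tray".
std::string to_pig_latin(const std::string& word)
{
    const std::string::size_type vowel = word.find_first_of("aeiou");
    if (vowel == std::string::npos) {
        return word + "-ay"; // no vowel: nothing to move
    }
    return word.substr(vowel) + "-" + word.substr(0, vowel) + "ay";
}

// Decodes a word produced by to_pig_latin(). Words without a '-' are
// returned unchanged; the trailing "ay" is assumed, not verified.
std::string from_pig_latin(const std::string& word)
{
    const std::string::size_type dash = word.rfind('-');
    if (dash == std::string::npos || word.size() < dash + 3) {
        return word; // not in the expected form
    }
    // Everything between the '-' and the trailing "ay" is the moved prefix.
    const std::string prefix = word.substr(dash + 1, word.size() - dash - 3);
    return prefix + word.substr(0, dash);
}
```

Without the '-' delimiter the decoding is ambiguous in general, which is one reason a dictionary file can still be useful.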

Declaring an array of strings is easy, the same as declaring an array of anything else.
const int MaxWords = 100;
std::string piglatin[MaxWords];
That's an array of 100 string objects, and the array is named piglatin. The strings start out empty. You can fill the array like this:
int numWords = 0;
std::ifstream input("piglatin.txt");
std::string line;
while (std::getline(input, line) && numWords < MaxWords) {
piglatin[numWords] = line;
++numWords;
}
if (numWords == MaxWords) {
std::cerr << "Too many words" << std::endl;
}
I strongly recommend you not use an array. Use a container object, such as std::vector or std::deque, instead. That way, you can load the contents of the files without knowing in advance how big the files are. With the example declaration above, you need to make sure you don't have more than 100 entries in your file, and if there are fewer than 100, then you need to keep track of how many entries in your array are valid.
std::vector<std::string> piglatin;
std::ifstream input("piglatin.txt");
std::string line;
while (std::getline(input, line)) {
piglatin.push_back(line);
}
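With the Pig Latin words in one container and the English words in another, the parallel-array lookup from the question could look roughly like this. It is a sketch that assumes both containers were filled in the same order, so index i in one corresponds to index i in the other; the function name translate_word is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: finds `word` in `piglatin` and returns the entry at
// the same index in `english`. Unknown words are returned unchanged.
std::string translate_word(const std::vector<std::string>& piglatin,
                           const std::vector<std::string>& english,
                           const std::string& word)
{
    for (std::size_t i = 0; i < piglatin.size() && i < english.size(); ++i) {
        if (piglatin[i] == word) {
            return english[i];
        }
    }
    return word;
}
```

This is a linear search, so each lookup costs time proportional to the dictionary size.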


How can I read CSV file in to vector in C++

I'm working on a project that converts Python code to C++ for better performance. The Python project is called Advanced EAST. So far I have the input data for the nms function in a .csv file like this:
"[ 5.9358170e-04 5.2773970e-01 5.0061589e-01 -1.3098677e+00
-2.7747922e+00 1.5079222e+00 -3.4586751e+00]","[ 3.8175487e-05 6.3440394e-01 7.0218205e-01 -1.5393494e+00
-5.1545496e+00 4.2795391e+00 -3.4941311e+00]","[ 4.6003381e-05 5.9677261e-01 6.6983813e-01 -1.6515008e+00
-5.1606908e+00 5.2009044e+00 -3.0518508e+00]","[ 5.5172237e-05 5.8421570e-01 5.9929764e-01 -1.8425952e+00
-5.2444854e+00 4.5013981e+00 -2.7876694e+00]","[ 5.2929961e-05 5.4777789e-01 6.4851379e-01 -1.3151239e+00
-5.1559062e+00 5.2229333e+00 -2.4008298e+00]","[ 8.0250458e-05 6.1284608e-01 6.1014801e-01 -1.8556541e+00
-5.0002270e+00 5.2796564e+00 -2.2154367e+00]","[ 8.1256607e-05 6.1321974e-01 5.9887391e-01 -2.2241254e+00
-4.7920742e+00 5.4237065e+00 -2.2534993e+00]
Each unit is 7 numbers, but there is a '\n' after the first four numbers.
I want to read this CSV file into my C++ project
so that I can do the math in C++ and make it faster.
using namespace std;
void read_csv(const string &filename)
{
//File pointer
fstream fin;
//open an existing file
fin.open(filename, ios::in);
vector<vector<vector<double>>> predict;
string line;
while (getline(fin, line))
{
std::istringstream sin(line);
vector<double> preds;
double pred;
while (getline(sin, pred, ']'))
{
preds.push_back(preds);
}
}
}
For now my code is not working, of course.
I have no idea how to do this.
Please help me read the CSV data into my code.
Thanks
Unfortunately parsing strings (and consequently files) is very tedious in C++.
I highly recommend using a library, ideally a header-only one, like this one.
If you insist on writing it yourself, maybe you can draw some inspiration from this StackOverflow question on how to parse general CSV files in C++.
You could look at the POSIX function getdelim(), which reads from a FILE* up to a delimiter such as ','.
But the other issue will be those quotes: unless you /know/ the file is always formatted exactly this way, it becomes difficult.
One hack I have used in the past, which is NOT PERFECT: if the first character is a quote, then the last character before the comma must also be a matching quote, and not escaped.
If it is not a quote, then call getdelim() some more; but the auto-alloc feature of getdelim() means you must use another buffer each time. In C++ I end up with a vector of all the pieces returned by getdelim(), which then need to be concatenated to make the final string:
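// Sketch only: assumes POSIX getdelim() (<stdio.h>) and an already opened FILE* named fp.
std::vector<char*> gotLine;
gotLine.push_back(static_cast<char*>(malloc(2)));
*gotLine.back() = static_cast<char>(fgetc(fp));
gotLine.back()[1] = 0;
bool gotquote = *gotLine.back() == '"'; // perhaps different classes of quote
if (*gotLine.back() != ',')
    for (;;)
    {
        char* gotSub = nullptr; // getdelim() allocates this buffer
        size_t cap = 0;
        ssize_t subLen = getdelim(&gotSub, &cap, ',', fp);
        if (subLen < 0) { free(gotSub); break; } // end of file or error
        gotLine.push_back(gotSub);
        if (!gotquote) break;
        // gotSub[subLen-1] is the ',' itself (unless end of file was reached),
        // so a closing quote would sit at subLen-2
        if (subLen > 1 && gotSub[subLen - 2] == '"') // again different classes of quote
            if (subLen == 2 || gotSub[subLen - 3] != '\\') // needs to be a while loop
                break;
    }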
Then just concatenate all these string segments back together.
Note that getdelim supports null bytes. If you expect null bytes in the content, and not represented by the character sequences \000 or \# you need to store the actual length returned by getdelim, and use memcpy to concatenate them.
Oh, and if you allow utf-8 extended quotes it gets very messy!
The case this doesn't cover is a string that ends with \\" or \\\\". Ideally you need a while loop that counts the backslashes immediately preceding the quote, and accepts the quote as a terminator only if the count is even.
Note that this leaves the issue of unescaping the quoted content, i.e. converting any \" into ", and \\ into \, etc., as well as discarding the enclosing quotes.
In the end a library may be easier if you need to deal with completely arbitrary content. But if the content is "known" you can live without.
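For the specific file shown in the question, where the content is "known", a simpler route may be to ignore the CSV quoting entirely and just pull out the numbers. A minimal sketch, assuming every value is a plain number and every bracketed group holds the same count (7 in the sample); parse_groups and group_size are hypothetical names:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: extracts the numbers from text shaped like the sample
// in the question and returns them in groups of `group_size`.
std::vector<std::vector<double>> parse_groups(std::string text,
                                              std::size_t group_size = 7)
{
    // Turn brackets, quotes and commas into spaces so that only
    // whitespace-separated numbers remain (newlines already count as whitespace).
    for (char& c : text) {
        if (c == '[' || c == ']' || c == '"' || c == ',') {
            c = ' ';
        }
    }

    std::istringstream in(text);
    std::vector<std::vector<double>> groups;
    std::vector<double> current;
    double value;
    while (in >> value) {
        current.push_back(value);
        if (current.size() == group_size) {
            groups.push_back(current);
            current.clear();
        }
    }
    return groups; // a trailing incomplete group is dropped
}
```

To use it on a file, the whole file can first be read into a std::string, for example by streaming an std::ifstream's rdbuf() into a std::stringstream. The group boundaries come purely from counting, so this breaks if any group has a different length.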

C++ retrieve numerical values in a line of string

Here is the content of the txt file that I've managed to read.
X-axis=0-9
y-axis=0-9
location.txt
temp.txt
I'm not sure whether it's possible, but after reading the contents of this txt file I'm trying to store just the x- and y-axis ranges in 2 variables so that I'll be able to use them in later functions. Any suggestions? And do I need to use vectors? Here is the code for reading the file.
string configName;
ifstream inFile;
do {
cout << "Please enter config filename: ";
cin >> configName;
inFile.open(configName);
if (inFile.fail()){
cerr << "Error finding file, please re-enter again." << endl;
}
} while (inFile.fail());
string content;
string tempStr;
while (getline(inFile, content)){
// Skip comment lines that start with "//".
if (content.size() >= 2 && content[0] == '/' && content[1] == '/') continue;
cout << endl << content << endl;
}
It depends on the style of your file. If you are sure that the style will remain unchanged, you can read the file character by character and implement pattern recognition like
if (tempStr == "y-axis=")
and then convert the appropriate substring to an integer using a function like
std::stoi
and store it.
I'm going to assume you already have the whole contents of the .txt file in a single string somewhere. In that case, your next task should be to split the string. Personally, yes, I would recommend using vectors. Say you wanted to split that string by newlines. A function like this:
#include <string>
#include <vector>
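std::vector<std::string> split(const std::string& str)
{
    std::vector<std::string> ret;
    std::string::size_type cur_pos = 0;
    std::string::size_type next_delim = str.find('\n');
    // find() returns std::string::npos (not -1 as an int) when nothing is found.
    while (next_delim != std::string::npos) {
        ret.push_back(str.substr(cur_pos, next_delim - cur_pos));
        cur_pos = next_delim + 1;
        next_delim = str.find('\n', cur_pos);
    }
    // Keep the final piece when the input does not end with a newline.
    if (cur_pos < str.size()) ret.push_back(str.substr(cur_pos));
    return ret;
}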
This will split an input string by newlines. From there, you can begin parsing the strings in that vector. The key functions you'll want to look at are std::string's substr() and find() methods. A quick Google search should get you to the relevant documentation, but here you are, just in case:
http://www.cplusplus.com/reference/string/string/substr/
http://www.cplusplus.com/reference/string/string/find/
Now, say you have the string "X-axis=0-9" in vec[0]. Then, what you can do is do a find for = and then get the substrings before and after that index. The stuff before will be "X-axis" and the stuff after will be "0-9". This will allow you to figure that the "0-9" should be ascribed to whatever "X-axis" is. From there, I think you can figure it out, but I hope this gives you a good idea as to where to start!
std::string::find() can be used to search for a character in a string;
std::string::substr() can be used to extract part of a string into another new sub-string;
std::atoi() (given the string's c_str()) or std::stoi() can be used to convert a string into an integer.
So then, these three functions will allow you to do some processing on content, specifically: (1) search content for the start/stop delimiters of the first value (= and -) and the second value (- and string::npos), (2) extract them into temporary sub-strings, and then (3) convert the sub-strings to ints. Which is what you want.
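Those three steps might be sketched as follows. This is only an illustration, assuming lines of the form "X-axis=0-9" with non-negative bounds; the function name parse_range is hypothetical, and std::stoi is used for the conversion because it accepts a std::string directly.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper: parses a line such as "X-axis=0-9" into {0, 9}.
// Throws std::invalid_argument when the expected delimiters are missing
// (std::stoi throws as well when a bound is not a number).
std::pair<int, int> parse_range(const std::string& line)
{
    const std::string::size_type eq = line.find('=');
    if (eq == std::string::npos) {
        throw std::invalid_argument("missing '='");
    }
    // Search for '-' only after the '=', since "X-axis" itself contains one.
    const std::string::size_type dash = line.find('-', eq + 1);
    if (dash == std::string::npos) {
        throw std::invalid_argument("missing '-'");
    }
    const int low = std::stoi(line.substr(eq + 1, dash - eq - 1));
    const int high = std::stoi(line.substr(dash + 1));
    return {low, high};
}
```

The two ints can then be stored in ordinary variables, so no vector is needed for the ranges themselves.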

Converting strings to char arrays using cstring

I have a small assignment which partly requires me to take inputs from a file in the form of strings and place them into char arrays so I can check whether each string has a '*' character at the end of it.
I have been able to extract the strings from the files successfully; however, I have failed to find a way to place them in char arrays so I can process them.
I would be very grateful if someone would let me know how to place a string into a char array using the cstring library. Please keep in mind that the strings are taken from a file and not from user input.
Some of the ways I tried are the following:
//Try 1
char CstringArray[] = LineFromFile;
//Try 2
char CstringArray[100] = LineFromFile;
//Try 3
ifstream Test("Test.txt");
Test>>CstringArray;
//Try 4
ifstream Test("Test.txt");
Test>>CstringArray[0];
Thank you very much
Since this is an assignment, your professor will probably not be happy with you using all of C++'s functionality, particularly if you don't understand it, but since it's a one-liner I figured I'd tell you how I'd print all strings ending in an asterisk. Given that you have successfully opened the file as ifstream Test, you can do:
copy_if(istream_iterator<string>(Test), istream_iterator<string>(), ostream_iterator<string>(cout, " "), [](const auto& i) { return !i.empty() && i.back() == '*'; });
EDIT:
I'm using an istream_iterator to read in each string in Test and operating on these values immediately, but if you needed to start by saving all the strings to a vector<string> you could also do this: vector<string> CstringArray{ istream_iterator<string>(Test), istream_iterator<string>() };
I'm using an ostream_iterator to directly stream out my selected strings rather than storing them
I'm using copy_if to iterate over all the strings that are streamed in, selecting only those that meet a given criterion
I'm using the lambda: [](const auto& i) { return !i.empty() && i.back() == '*'; } to conditionally select non-empty strings which end with an asterisk character
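If the assignment really does require char arrays and the cstring library, the copy itself could be sketched like this. It assumes the line has already been read into a std::string (called LineFromFile in the question); the helper names copy_to_array and ends_with_star are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies `line` into `buffer`, which holds `size` chars.
// Input that does not fit is truncated; the result is always null-terminated.
void copy_to_array(const std::string& line, char* buffer, std::size_t size)
{
    if (size == 0) {
        return;
    }
    std::strncpy(buffer, line.c_str(), size - 1);
    buffer[size - 1] = '\0';
}

// Hypothetical helper: true when the C string ends with '*'.
bool ends_with_star(const char* text)
{
    const std::size_t len = std::strlen(text);
    return len > 0 && text[len - 1] == '*';
}
```

For example, with char CstringArray[100]; one could call copy_to_array(LineFromFile, CstringArray, sizeof CstringArray); and then ends_with_star(CstringArray).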

How to create an inverted index when I've already tokenized my file?

I'm trying to create an inverted index. I'm reading the lines of a text file, the text file has in the first position of each line the id of a document docId and the rest of the line has keywords about this document.
In order to create an inverted index, I first have to tokenize this text file. I did it with a function I wrote, and I store every word in a vector. My only gripe is that I also store the docId as a string in the vector. Here is the header of the tokenize function if you need it:
void tokenize(string& s, char c, vector<string>& v)
Now, after tokenizing the file, I have to create a function that puts every word in a map. I'm thinking of using an unordered map, in which every word appears once. I also have to somehow store the frequency of each word somewhere. I thought that using the docId as the key in the map would be a good idea, but then I realized that a docId key can only map to one word, while in my text file a docId has more than one word.
So, how am I going to solve this problem? Where should I begin?
What a mess of a question. Breaking it down, if I understand correctly you have:
doc1 word1a word1b word1c word1d
doc2 word2a word2b word2c
...
You want mappings from words to documents and vice versa. It's hard to tell from your question whether your talk of word "frequency" reflects the same word being a keyword for multiple documents, or whether the description you have of your file format failed to incorporate a needed count for repetitions within each file. Assuming the former:
// Needs <fstream>, <iostream>, <map>, <sstream>, <string> and <vector>.
if (std::ifstream f{filename})
{
    std::map<std::string, std::vector<std::string>> words_in_doc;
    std::map<std::string, std::vector<std::string>> docs_containing_word;
    std::string line;
    while (getline(f, line))
    {
        // Parse each line through a string stream: docid first, then its words.
        std::istringstream iss(line);
        std::string docid, word;
        if (iss >> docid)
            while (iss >> word)
            {
                words_in_doc[docid].push_back(word);
                docs_containing_word[word].push_back(docid);
            }
    }
    // do whatever with your data/indices...
}
else
    std::cerr << "unable to open input file\n";
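If "frequency" instead means how often a word occurs within each document, one possible variation is to store a count per (word, docId) pair. A minimal sketch under that assumption, reading the same "docId word word ..." line format from any std::istream; the names InvertedIndex and build_index are hypothetical:

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// word -> (docId -> number of times the word appears in that document)
using InvertedIndex = std::map<std::string, std::map<std::string, int>>;

// Hypothetical helper: builds the index from lines of the form
// "docId word word word ...".
InvertedIndex build_index(std::istream& in)
{
    InvertedIndex index;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream iss(line);
        std::string docid, word;
        if (!(iss >> docid)) {
            continue; // skip blank lines
        }
        while (iss >> word) {
            ++index[word][docid]; // operator[] creates missing entries at 0
        }
    }
    return index;
}
```

Here index[word].size() gives the number of documents containing the word, and index[word][docid] gives the count within one document. std::unordered_map could be swapped in for either level if sorted order is not needed.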

sorting strings in a file

I need a solution for sorting a Unix pwd file by last name using C++. The format of the file is username, password, uid, gid, name, homedir, shell, all separated by colon delimiters. The name field contains the first name followed by the last name, separated by a space. I am able to sort the values using a map, and I am posting my code. Can someone suggest improvements to my code? Also, I am unable to see the sorted lines in my file.
string line,item;
fstream myfile("pwd.txt");
vector<string> lines;
map<string,int> lastNames;
map<string,int>::iterator it;
if(myfile.is_open())
{
char delim =':';
int count =0;
while(!myfile.eof())
{
count++;
vector<string> tokens;
getline(myfile,line);
istringstream iss(line);
lines.push_back(line);
while(getline(iss,item,delim))
{
tokens.push_back(item);
}
cout<<tokens.size()<<endl;;
size_t i =tokens[4].find(" ");
string temp = tokens[4].substr(i,(tokens[4].size()-i));
cout<<temp<<endl;
lastNames.insert(pair<string,int>(temp,count));
tokens.clear();
}
myfile.seekg(0,ios::beg);
for(it=lastNames.begin();it!=lastNames.end();it++)
{
cout << (*it).first << " => " << (*it).second << endl;
int value=lastNames[(*it).first ];
myfile<<lines[value-1]<<endl;
cout<<lines[value-1]<<endl;
cout<<value<<endl;
}
}
I am also having a problem writing to the file: I am unable to see the sorted results.
My problem:
Can someone please explain to me why I am unable to see the written results in the file?
Thanks & Regards,
Mousey.
Since the format of the file is fixed:
username, password, uid, gid, first name(space)lastname, homedir, shell
Maintain a std::map whose key is a string (containing the last name) and whose value is the line number.
Start reading the file line by line and extract the last name (split the line by ":" and then split the fifth extracted part on the space).
Store the name along with the line number in the map.
When the complete file has been read, just output the lines in the order given by the map. (The map keeps the last names in sorted order.)
For splitting a string, refer to "Split a string in C++?".
If it's only a few megabytes, you can basically slurp it into memory and use the O(n log n) sorting algorithm of your choice to sort it, then write it out.
Basically, write a code snippet to compare two lines the way you want, and use that with your standard library sort routine to sort the data. Or write your own sort routine, whatever.
If you're interested in how you'd go about dealing with gigabytes of data, take a look at Wikipedia's article on External Sorting for a good jumping-off point.
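For the in-memory case, the "compare two lines, then hand that to the standard sort" idea could be sketched as below. It assumes the colon-separated format from the question, with the name in the fifth field and the last name after the final space; last_name and sort_by_last_name are hypothetical names.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: extracts the last name from one pwd-style line.
// Returns an empty string when the line has fewer than five fields.
std::string last_name(const std::string& line)
{
    std::istringstream iss(line);
    std::string field;
    for (int i = 0; i < 5; ++i) {
        if (!std::getline(iss, field, ':')) {
            return "";
        }
    }
    const std::string::size_type space = field.rfind(' ');
    return space == std::string::npos ? field : field.substr(space + 1);
}

// Sorts whole lines by last name. Lines with equal last names keep their
// original relative order because std::stable_sort is used.
void sort_by_last_name(std::vector<std::string>& lines)
{
    std::stable_sort(lines.begin(), lines.end(),
                     [](const std::string& a, const std::string& b) {
                         return last_name(a) < last_name(b);
                     });
}
```

Unlike a std::map keyed by last name, this keeps every line even when two users share a last name. Writing the result to a separate output stream, rather than back into the stream being read, also sidesteps stream-state issues.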