abort action islemi durdur(MS)
abort sequence durdurma dizisi(IBM)
I have a file.txt like the one above, and I want to read it and split its contents. Besides file.txt, I have two more files, turkce.txt and ingilizce.txt.
Here is what I want to do :
I want to read from file.txt and separate the English and Turkish words. After that, ingilizce.txt should look like this
abort action
abort sequence
and turkce.txt like this
islemi durdur(MS)
durdurma dizisi(IBM)
Also, the file has multiple columns and 5127 rows. The number of columns can change from row to row.
Here is a pic of some part of my file.txt
http://i59.tinypic.com/33m0iu8.png
Thank you for your answers.
Update: I solved the problem. The distance between the first letter of the left column and the first letter of the right column is the same on every line, and it equals 37.
So I use
FILE* fp = fopen("file.txt","r");
char s[256];
fgets(s, 37, fp);
You don't say it explicitly, but your file has two fixed-width columns, which you want to separate.
A substring of a string str from a fixed index i to the end can be expressed with pointer arithmetic: str + i or &str[i]. Strings that are not zero-terminated (like your first column) can be printed by specifying a length with printf's precision field, e.g. printf("%.*s", len, str).
A quick and dirty way to print your two columns is:
char line[80];
int col = 36;
while (fgets(line, sizeof(line), in)) {
fprintf(en, "%.*s\n", col, line);
fprintf(tr, "%s", line + col);
printf("\n");
}
This method has some drawbacks: It will print garbage if the string is shorter than your separation width, i.e. if the right column is empty. It also prints the column padding spaces for the left column, which looks untidy. So let's write a function that splits the strings nicely, which we can call like so:
while (fgets(line, sizeof(line), in)) {
char *stren, *strtr;
split_at(line, &stren, &strtr, 36);
fprintf(en, "%s\n", stren);
fprintf(tr, "%s\n", strtr);
}
The function looks like this:
void split_at(char *line, char **left, char **right, int col)
{
char *trim = line;
char *p = line;
*left = line;
*right = line + col;
while (p < *right) {
if (*p == '\0') {
*right = p;
break;
}
if (!isspace((unsigned char)*p)) trim = p + 1; /* isspace needs <ctype.h>; the cast keeps non-ASCII bytes safe */
p++;
}
*trim = '\0';
trim = p;
while (*p) {
if (!isspace((unsigned char)*p)) trim = p + 1;
p++;
}
if (trim) *trim = '\0';
}
This should work for your example data. It will also work for empty left or right columns. It will not work if there is no space between the left and right columns, i.e. when the left and right parts are pasted together.
This method will also work only if all code points of the strings have the same length in bytes. You haven't said which encoding you use for your data. If you use ISO-8859-9, you will be okay. If you use UTF-8, all non-ASCII code points, i.e. the Turkish special characters, will be represented by more than one byte. What looks like a fixed-width column doesn't have a fixed width in its memory representation.
That said, you should be safe as long as your English text is in the left column. English text is made up of only ASCII characters unless you have fancy formatting with typographic quotation marks or some such.
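For comparison, the same fixed-width split can be sketched in C++ with std::string. This is only an illustration under the same assumptions as above (single-byte text in the left column, the right column starting at byte offset 36); the names rtrim and split_fixed are made up for this example and are not part of any library.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: strip trailing spaces, tabs, and line endings.
static std::string rtrim(const std::string& s)
{
    const std::string::size_type last = s.find_last_not_of(" \t\r\n");
    return last == std::string::npos ? std::string() : s.substr(0, last + 1);
}

// Split a line at byte offset `col` into (left, right), both right-trimmed.
// A line shorter than `col` yields an empty right column.
std::pair<std::string, std::string> split_fixed(const std::string& line,
                                                std::string::size_type col)
{
    std::string left = rtrim(line.substr(0, col));
    std::string right = line.size() > col ? rtrim(line.substr(col))
                                          : std::string();
    return std::make_pair(left, right);
}
```

Each line read with std::getline could be passed to split_fixed(line, 36), writing .first to ingilizce.txt and .second to turkce.txt. Because substr clamps the length, short lines need no special handling for the left column.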
There could be better solutions, but here is a simple one.
#include <iostream>
#include <fstream>
#include <string>
int main()
{
std::ifstream inFile("file.txt");
std::ofstream outFileT("turkce.txt", std::ios::app);
std::ofstream outFileE("ingilizce.txt", std::ios::app);
std::string a;
std::string b;
for (int i = 0; i < 2; i++) {
inFile >> a >> b;
outFileE << a + " " + b + "\n";
inFile >> a >> b;
outFileT << a + " " + b + "\n";
}
}
I assumed the file has two lines and that each column holds exactly two words, but you can determine the line count of the file first.
I'm working on a project where I'm given a file that begins with a header in this format: a1,b3,t11, 2,,5,\3,*4,344,00,. It is always going to be a sequence of a single ASCII character followed by an integer, separated by commas, with the sequence always ending with 00,.
Basically what I have to do is go through this and put each character/integer pair into a data type I have that takes both of these as parameters and make a vector of these. For example, the header I gave above would be a vector with ('a',1), ('b',3), ('t',11), (' ',2), (',',5), ('\',3), ('*',4), ('3',44) as elements.
I'm just having trouble parsing it. So far I've:
Extracted the header from my text file from the first character up until before the ',00,' where the header ends. I can get the header string in string format or as a vector of characters (whichever is easier to parse)
Tried using sscanf to parse the next character and the next int then adding those into my vector before using substrings to remove the part of the string I've already analyzed (this was messy and did not get me the right result)
Tried going through the string as a vector and checking each element to see if it is an integer, a character, or a comma and acting accordingly but this doesn't work for multiple-digit integers or when the character itself is an int
I know I can fairly easily split my string based on the commas but I'm not sure how to do this and still split the integers from the characters while retaining both and accounting for integers that I need to treat as characters.
Any advice or useful standard library or string functions would be greatly appreciated.
One possibility, of many, would be to store the data in a structure. This uses an array of structures but the structure could be allocated as needed with malloc and realloc.
Parsing the string can be accomplished using pointers and strtol which will parse the integer and give a pointer to the character following the integer. That pointer can be advanced to use in the next iteration to get the ASCII character and integer.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define SIZE 100
struct pair {
char ascii;
int integer;
};
int main( void) {
char input[] = "a1,b3,!0,t11, 2,,5,\\3,*4,34400,";
char *pt = input;//start with pt pointing to first character of input
char *end = input;
int each = 0;
int loop = 0;
int length = 0;
struct pair pairs[SIZE] = { { '\0', 0}};
//assuming input will always end in 00, ( or ,00,)
//remove those three ( or 4 ??) characters
length = strlen ( input);
if ( length > 3) {
input[length - 3] = '\0';
}
for ( each = 0; each < SIZE; each++) {
//get the ASCII character and advance one character
pairs[each].ascii = *pt;
pt++;
//get the integer
pairs[each].integer = strtol ( pt, &end, 10);
//end==pt indicates the expected integer is missing
if ( end == pt) {
printf ( "expected an integer\n");
break;
}
//at the end of the string?
if ( *end == '\0') {
//if there are elements remaining, add one to each as one more was used
if ( each < SIZE - 1) {
each++;
}
break;
}
//the character following the integer should be a comma
if ( *end != ',') {
//if there are elements remaining, add one to each as one more was used
if ( each < SIZE - 1) {
each++;
}
printf ( "format problem\n");
break;
}
//for the next iteration, advance pt by one character past end
pt = end + 1;
}
//loop through and print the used structures
for ( loop = 0; loop < each; loop++) {
printf ( "ascii[%d] = %c ", loop, pairs[loop].ascii);
printf ( "integer[%d] = %d\n", loop, pairs[loop].integer);
}
return 0;
}
Another option is to use dynamic allocation.
This also uses sscanf to parse the input. The %n will capture the number of characters processed by the scan. The offset and add variables can then be used to iterate through the input. The last scan will only capture the ascii character and the integer and the return from sscanf will be 2.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
struct pair {
char ascii;
int integer;
};
int main( void) {
char input[] = "a1,b3,!0,t11, 2,,5,\\3,*4,34400,";
char comma = '\0';
char ascii = '\0';
int integer = 0;
int result = 0;
int loop = 0;
int length = 0;
int used = 0;
int add = 0;
int offset = 0;
struct pair *pairs = NULL;//so realloc will work on first call
struct pair *temp = NULL;
//assuming input will always end in 00, ( or ,00,)
//remove those three ( or 4 ??) characters
length = strlen ( input);
if ( length > 3) {
input[length - 3] = '\0';
}
while ( ( result = sscanf ( &input[offset], "%c%d%c%n"
, &ascii, &integer, &comma, &add)) >= 2) {//the last scan will only get two items
if ( ( temp = realloc ( pairs, ( used + 1) * sizeof ( *pairs))) == NULL) {
fprintf ( stderr, "problem allocating\n");
break;
}
pairs = temp;
pairs[used].ascii = ascii;
pairs[used].integer = integer;
//one more element was used
used++;
//the last pair has no trailing comma, so %n was never reached and add is stale
if ( result < 3) {
break;
}
//the character following the integer should be a comma
if ( comma != ',') {
printf ( "format problem\n");
break;
}
//for the next iteration, add to offset
offset += add;
}
for ( loop = 0; loop < used; loop++) {
printf ( "ascii[%d] = %c ", loop, pairs[loop].ascii);
printf ( "value[%d] = %d\n", loop, pairs[loop].integer);
}
free ( pairs);
return 0;
}
Since you have figured out that you can just ignore the last 3 characters, using sscanf will be sufficient.
You can use sscanf to read one character (or getch functions), use sscanf to read an integer and finally even ignore one character.
Comment if you are having problems understanding how to do so.
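Since the question asks for a vector, here is one way the sscanf idea could look in C++. It is a sketch, not a definitive implementation: parse_pairs is a name invented for this example, std::pair<char, int> stands in for the asker's own data type, and the header is assumed to have had its trailing 00, removed already.

```cpp
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Parse text such as "a1,b3,t11" into (character, integer) pairs.
// Assumes the terminating "00," has already been stripped.
std::vector<std::pair<char, int>> parse_pairs(const std::string& header)
{
    std::vector<std::pair<char, int>> pairs;
    const char* p = header.c_str();
    char ch = '\0';
    int value = 0;
    int used = 0;
    // %c takes any single character (including ',' or a digit),
    // %d takes the integer, %n stores how many characters were consumed.
    while (std::sscanf(p, "%c%d%n", &ch, &value, &used) == 2) {
        pairs.push_back(std::make_pair(ch, value));
        p += used;
        if (*p != ',')
            break;          // end of input or unexpected character
        ++p;                // skip the separating comma
    }
    return pairs;
}
```

Reading the character with %c first is what lets a comma, a space, or a digit act as the "character" half of a pair; only the comma after the integer is treated as a separator.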
I want to select the first 8 characters of a string using C++. Right now I create a temporary string which is 8 characters long, and fill it with the first 8 characters of another string.
However, if the other string is not 8 characters long, I am left with unwanted whitespace.
string message = "        ";
const char * word = holder.c_str();
for(int i = 0; i<message.length(); i++)
message[i] = word[i];
If word is "123456789abc", this code works correctly and message contains "12345678".
However, if word is shorter, something like "1234", message ends up being "1234    "
How can I select either the first eight characters of a string, or the entire string if it is shorter than 8 characters?
Just use std::string::substr:
std::string str = "123456789abc";
std::string first_eight = str.substr(0, 8);
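substr works for short strings as well because the count argument is an upper bound: if fewer characters are available, it returns what is there. A small sketch, with first_chars as a made-up wrapper name:

```cpp
#include <cstddef>
#include <string>

// Return at most `n` leading characters of `s`;
// strings shorter than `n` come back unchanged.
std::string first_chars(const std::string& s, std::size_t n = 8)
{
    return s.substr(0, n);
}
```

With this, first_chars("123456789abc") gives "12345678" and first_chars("1234") gives "1234", with no padding. Only a start position beyond the end of the string makes substr throw, and position 0 is always valid.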
Just call resize on the string.
If I have understood correctly you then just write
std::string message = holder.substr( 0, 8 );
If you need to grab characters from a character array, then you can write, for example
const char *s = "Some string";
std::string message( s, std::min<size_t>( 8, std::strlen( s ) ) );
Or you could use this:
#include <iostream>
#include <limits>
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
If the max is 8 it'll stop there. But you would have to set
const char * word = holder.c_str();
to 8. I believe that you could do that by writing
const int SIZE = 9;
char * word = holder.c_str();
Let me know if this works.
If they hit space at any point it would only read up to the space.
const char* messageBefore = "12345678asdfg";
int length = strlen(messageBefore);
char* messageAfter = new char[length + 1]; // +1 for the terminating '\0'
for(int index = 0; index < length; index++)
{
char beforeLetter = messageBefore[index];
// 48 is the char code for '0' and 57 is the char code for '9'
if(beforeLetter >= 48 && beforeLetter <= 57)
{
messageAfter[index] = beforeLetter;
}
else
{
messageAfter[index] = ' ';
}
}
messageAfter[length] = '\0'; // terminate the new string
This will create a character array of the proper size and transfer over every numeric character (0-9) and replace non-numerics with spaces. This sounds like what you're looking for.
Given what other people have interpreted based on your question, you can easily modify the above approach to give you a resulting string that only contains the numeric portion.
Something like:
int length = strlen(messageBefore);
int numericLength = 0;
while(numericLength < length &&
messageBefore[numericLength] >= 48 &&
messageBefore[numericLength] <= 57)
{
numericLength++;
}
Then use numericLength in the previous logic in place of length and you'll get the first bunch of numeric characters.
Hope this helps!
So, I am trying to figure out the best/simplest way to do this. For my algorithms class we are supposed to read in a string (containing up to 40 characters) from a file and use the first character of the string (data[1]; we are starting the array at 1 and want to use data[0] for something else later) as the number of rotations (up to 26) to apply to the letters that follow (it's a Caesar cipher, basically).
An example of what we are trying to do is read in from a file something like : 2ABCD and output CDEF.
I've definitely made attempts, but I am just not sure how to compare the first letter in the array char[] to see which number, up to 26, it is. This is how I had it implemented (not the entire code, just the part that I'm having issues with):
int rotation = 0;
char data[41];
for(int i = 0; i < 41; i++)
{
data[i] = 0;
}
int j = 0;
while(!infile.eof())
{
infile >> data[j+1];
j++;
}
for(int i = 1; i < 27; i++)
{
if( i == data[1])
{
rotation = i;
cout << rotation;
}
}
My output is always 0 for rotation.
I'm sure the problem lies in the fact that I am trying to compare a char to a number and will probably have to convert to ascii? But I just wanted to ask and see if there was a better approach and get some pointers in the right direction, as I am pretty new to C++ syntax.
Thanks, as always.
Instead of formatted input, use unformatted input. Use
data[j+1] = infile.get();
instead of
infile >> data[j+1];
Also, the comparison of i to data[1] needs to be different.
for(int i = 1; i < 27; i++)
{
if( i == data[1]-'0')
// ^^^ need this to get the number 2 from the character '2'.
{
rotation = i;
std::cout << "Rotation: " << rotation << std::endl;
}
}
You can do this using modulo math, since characters can be treated as numbers.
Let's assume only uppercase letters (which makes the concept easier to understand).
Given:
static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const std::string original_text = "MY DOG EATS HOMEWORK";
std::string encrypted_text;
The loop:
for (unsigned int i = 0; i < original_text.size(); ++i)
{
Let's convert the character in the string to a number:
char c = original_text[i];
unsigned int cypher_index = c - 'A';
The cypher_index now contains the alphabetic offset of the letter, e.g. 'A' has index of 0.
Next, we rotate the cypher_index by adding an offset and using modulo arithmetic to "circle around":
cypher_index += (rotation_character - 'A'); // Add in the offset.
cypher_index = cypher_index % (sizeof(letters) - 1); // Wrap around; -1 excludes the terminating '\0'.
Finally, the new, shifted letter is created by looking it up in the letters array and appending it to the encrypted string:
encrypted_text += letters[cypher_index];
} // End of for loop.
The modulo operation, using the % operator, is great for when a "wrap around" of indices is needed.
With some more arithmetic and arrays, the process can be expanded to handle all letters and also some symbols.
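Put together, the modulo approach might look like the following sketch. It assumes an ASCII-compatible character set (so 'A' through 'Z' are contiguous) and leaves anything that is not an uppercase letter, such as the spaces in the sample text, untouched; rotate_upper is a name chosen for this example.

```cpp
#include <string>

// Rotate uppercase letters 'A'..'Z' forward by `rotation` positions,
// wrapping past 'Z'. Every other character is copied unchanged.
std::string rotate_upper(const std::string& text, int rotation)
{
    const int alphabet = 26;
    // Normalise so negative or large rotations still land in 0..25.
    const int shift = ((rotation % alphabet) + alphabet) % alphabet;
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        if (c >= 'A' && c <= 'Z')
            out += static_cast<char>('A' + (c - 'A' + shift) % alphabet);
        else
            out += c;
    }
    return out;
}
```

For the example in the question, rotate_upper("ABCD", 2) yields "CDEF", and letters near the end of the alphabet wrap around, so "XYZ" rotated by 3 becomes "ABC".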
First of all you have to cast the data chars to int before comparing them, just put (int) before the element of the char array and you will be okay.
Second, keep in mind that the ASCII table doesn't start with letters. Control characters, punctuation and digits come first, and the uppercase letters only begin at code 65. So a letter stored in data[1] has a value well above 26, and a loop counting from 1 to 26 never matches it.
The ASCII integer value of uppercase letters ranges from 65 to 90. In C and its descendants, you can just use 'A' through 'Z' in your for loop:
change
for(int i = 1; i < 27; i++)
to
for(int i = 'A'; i <= 'Z'; i++)
and you'll be comparing uppercase values. The statement
cout << rotation;
will print the ASCII values read from infile.
How much of the standard library are you permitted to use? Something like this would likely work better:
#include <iostream>
#include <string>
#include <sstream>
int main()
{
int rotation = 0;
std::string data;
std::stringstream ss( "2ABCD" );
ss >> rotation;
ss >> data;
for ( int i = 0; i < data.length(); i++ ) {
data[i] += rotation;
}
// C++11
// for ( auto& c : data ) {
// c += rotation;
// }
std::cout << data;
}
I used a stringstream instead of a file stream for this example, so just replace ss with your infile. Also note that I didn't handle the wrap-around case (i.e., Z += 1 isn't going to give you A; you'll need to do some extra handling here), because I wanted to leave that to you :)
The reason your rotation is always 0 is that i is never == data[1]. ASCII character digits do not have the same underlying numeric value as the integers they represent. For example, if data[1] is '5', its integer value is actually 53. Hint: you'll need to know these values when you handle the wrap-around case. Do a quick search for "ASCII table" and you'll see all the different values.
Your determination of the rotation is also flawed in that you're only checking data[1]. What happens if you have a two-digit number, like 10?
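One way to cope with multi-digit rotations is to let the stream extract the whole number instead of inspecting a single character. The sketch below uses a string stream for illustration; parse_rotation is a hypothetical helper, and returning -1 for input without a leading number is a convention chosen for this example only.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Split input such as "2ABCD" or "10ABCD" into (rotation, letters).
// Returns a rotation of -1 if the input does not start with a number.
std::pair<int, std::string> parse_rotation(const std::string& input)
{
    std::istringstream ss(input);
    int rotation = 0;
    std::string letters;
    if (!(ss >> rotation))          // extraction stops at the first non-digit
        return std::make_pair(-1, std::string());
    ss >> letters;                  // the remaining characters up to whitespace
    return std::make_pair(rotation, letters);
}
```

Here "10ABCD" is read as rotation 10 followed by "ABCD", which a check of data[1] alone would not handle.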
I'm quite new with programming, and now I'm doing an exercise where I should use a cycle to read 25 first symbols from the file, which contains a string of 25 letters (+spaces if the name is shorter than 25) and two numbers. Example:
Whirlpool machine 324 789.99
as I imagine it should look something like this:
ifstream info("Information.txt");
string str;
int a;
double b;
for(int i = 0; i < 25; i++)
{ // some kind of code to get first 25 symbols into a string.
}
info >> a >> b;
And I just can't seem to find the right code to get 25 characters straight to string. Any suggestions?
An easy way is to use read() to read given number of characters:
int length = 25; // num of chars you want to read
str.resize(length, ' '); // reserve spaces
char* begin = &*str.begin();
info.read(begin, length); // <- read it here
You can use the std::copy_n() algorithm with stream buffer iterators:
std::string str;
std::copy_n(std::istreambuf_iterator<char>(info.rdbuf()),
25, std::back_inserter(str));
An approach that you might be more comfortable with is using get() with a for() loop:
for (char c; str.size() != 25 && info.get(c); )
str += c;
Given the context, I'd read the entire line into a string, using
std::getline, and then extract the substring. Something like:
std::string line;
while ( std::getline( info, line ) ) {
std::string header = line.substr( 0, 25 );
// and later...
std::istringstream rest( line.substr( 25 ) );
int a;
double b;
rest >> a >> b;
// ...
}
In general, when reading line oriented input, read the line,
then use a std::istringstream to parse it. Or, if there are
parts you can use "as is" (as is the case here), use them as is.
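A fuller sketch of that line-oriented approach is shown below. The Record struct and parse_record function are invented for this example, and it assumes the name field is exactly 25 characters wide and padded with spaces, as described in the question.

```cpp
#include <sstream>
#include <string>

// Hypothetical record type for one line of the file.
struct Record {
    std::string name;   // first 25 characters, trailing spaces removed
    int a;
    double b;
};

// Parse one line; returns false if the line is too short
// or the two numbers after the name field cannot be read.
bool parse_record(const std::string& line, Record& out)
{
    const std::string::size_type width = 25;
    if (line.size() <= width)
        return false;
    out.name = line.substr(0, width);
    // Drop the padding spaces at the end of the name field.
    const std::string::size_type last = out.name.find_last_not_of(' ');
    out.name.erase(last == std::string::npos ? 0 : last + 1);
    std::istringstream rest(line.substr(width));
    return static_cast<bool>(rest >> out.a >> out.b);
}
```

Calling parse_record on each line obtained from std::getline keeps a malformed line from disturbing the lines after it, which is the main advantage over extracting directly from the file stream.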
I'm working with a dataset of attributes in a text file which looked something like this:
e,x,y,w,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,s,g
e,f,y,y,t,l,f,c,b,w,e,r,s,y,w,w,p,w,o,p,n,y,p
e,b,s,w,t,a,f,c,b,w,e,c,s,s,w,w,p,w,o,p,n,s,g
e,b,s,w,t,a,f,c,b,w,e,c,s,s,w,w,p,w,o,p,k,s,m
e,x,y,n,t,l,f,c,b,w,e,r,s,y,w,w,p,w,o,p,k,y,g
e,b,s,w,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,k,s,g
e,x,f,g,f,n,f,c,n,g,e,e,s,s,w,w,p,w,o,p,n,y,u
e,b,s,y,t,l,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,s,g
Now, I'm trying to figure out how I can easily read characters from a given column (say, the 5th letter on each line, for example). I can't figure out how to do it though. Does anyone know what I might be able to do?
Since every field in your data is exactly one character and NOT of an arbitrary size, you can rely on each character being followed by a comma, so
each column before the desired one takes up 2 positions in the line (the character and its comma)
If you wanted to read the 5th column, it would be at the (4*2 + 1)th position in the line. You could read the line into a string and index into it, or just take a single char from the file each time until you've reached the desired position.
cout << "What column would you like to read from? ";
cin >> num;
int Length = (num - 1) * 2 + 1;
char ColumnLetter;
for(int i = 0; i < Length; i++)
{
inFile >> ColumnLetter;
}
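The read-the-line-into-a-string variant could be sketched as follows. column_char is a made-up name, columns are counted from 1 as in the prompt above, and the function assumes every field is exactly one character followed by one comma.

```cpp
#include <cstddef>
#include <string>

// Return the character in 1-based column `column` of a line such as
// "e,x,y,w", or '\0' if the line has no such column.
// Assumes every field is exactly one character followed by one comma.
char column_char(const std::string& line, std::size_t column)
{
    if (column == 0)
        return '\0';
    const std::size_t index = (column - 1) * 2;   // skip char + comma pairs
    return index < line.size() ? line[index] : '\0';
}
```

Applied to each line from std::getline, column_char(line, 5) returns 't' for the first row of the sample data.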
If there is no whitespace in your data, every symbol is separated by a comma, and each line ends with the single character "\n", you can do something like this:
#include <iostream>
#include <fstream>
using std::ifstream;
ifstream file;
const int LINE_WIDTH = 23; //number of your chars in line (without commas); 23 matches the sample data
char GetFromFile(int row, int position) //row and position indexes start from 0!
{
file.seekg(row * (LINE_WIDTH * 2) + position * 2);
return file.get();
}
int main()
{
file.open("data.txt", std::ios::binary);
char c = GetFromFile(10, 3);
file.close();
return 0;
}