I am trying to read this input as characters into memory in C, in a two-dimensional array.
00P015
00P116
030000
06P0ZZ
030005
06P1ZZ
04P0ZZ
26P1ZZ
3412ZZ
030010
06P0ZZ
99ZZZZ
030010
06P1ZZ
99ZZZZ
ZZ0000
ZZ0010
My code is
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int pr;
int value;
/*These are variables representing the VM itself*/
char IR[6] ;
short int PC = 0 ;
int P0 ; //these are the pointer registers
int P1 ;
int P2 ;
int P3 ;
int R0 ; //GP regs
int R1 ;
int R2 ;
int R3 ;
int ACC ;
char PSW[2];
char memory [100][6] ; //this is the program memory for first program
short int opcode ; //nice to know what we are doing
int program_line = 0 ;
int fp ;
int i ;
int q = -1; //Used to iterate through memory to execute program
int TrueFalse; //True / False value for check statements, 1 implies true, 0 implies false
int halt = 0;
int address;
char input_line [7] ;
main(int argc, char *argv[])
{ //Read file into VM
fp = open("C:\\Users\\Whiskey Golf\\ClionProjects\\untitled\\program.txt", O_RDONLY) ;
printf("Open is %d\n", fp) ; //always check the return value.
if (fp < 0) //error in read
{printf("Could not open file\n");
exit(0) ;
}
//read in the first line of the program
int charRead = read (fp, input_line, 8 ) ; //returns number of characters read
printf("\n*******************************\n");
printf("* Reading Program Into Memory *\n");
printf("*******************************\n");
while (1)
{ if (charRead <= 0) //indicates end of file or error
break ; //breaks out of infinite loop
for (i = 0; i < 6 ; i++) //If we get here must have correctly read in a line of program code.
memory[program_line][i] = input_line[i] ; //copy from input line into program memory
printf("Program Line %d: ", program_line) ; //printing out program line for debugging purposes
for(i = 0; i < 6; i++)
printf("%c", memory[program_line][i]) ;
printf("\n") ;
opcode = (memory[program_line][0] -48) *10 ; //Get opcode, print out opcode to console
opcode += (memory[program_line][1] -48) ;
printf("Opcode is %d\n", opcode) ;
charRead = read (fp, input_line, 8) ; //read in next line of code
if(input_line[0] == 'Z') //if the firat character is a 'Z' then you are reading data.
break ; //No more program code so break out of loop
program_line++ ; //now at a new line in the prog
printf("%n");
}
The issue I am having is that when I run the program in the IDE I wrote it in, Clion, my output is correct, I get
Program Line 0: 00P015
Opcode is 0
Program Line 1: 00P116
Opcode is 0
Program Line 2: 030000
Opcode is 3
Program Line 3: 06P0ZZ
Opcode is 6
But when I run the code via a shell via gcc compilation then ./a.out execution, the output I get is
Program Line 0: 00P015
Opcode is 0
Program Line 1: 16
Opcode is -528
Program Line 2: 00
Opcode is -528
Program Line 3: ZZ
Opcode is -528
I have been trying to debug this issue for a while now, and I can not get it to work correctly when I do it through the shell, which is the way I need to do it. Any help would be greatly appreciated.
You are reading 8 bytes, which includes the end-of-line character '\n', and trying to store them in a 7-byte array.
read (fp, input_line, 8)
This leads to undefined behavior; it should be
read(fp, input_line, 7)
And then you could just discard the next byte like
char discard;
read(fp, &discard, 1);
I suppose you were reading 8 bytes to consume the end-of-line character. You could instead increase the array size to 8 and ignore the last character, or simply read that character separately and discard it.
EDIT: Looking more closely at the data and your code, I am not sure what you are trying to do. You must read exactly 7 characters per line, which includes the trailing '\n'. The following code works if and only if every line, including the last one, ends with a newline '\n'; otherwise it skips the last line. The fix for that case is straightforward and left to you. Also, see this comment: if you write the program file with a text editor on MS Windows, you will have trouble, because lines there end in two bytes ("\r\n"). To solve that you can use fopen() instead of low-level I/O.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
int
main(void)
{
int file;
ssize_t length;
char buffer[7];
file = open("program.txt", O_RDONLY);
if (file == -1)
return -1;
while ((length = read(file, buffer, sizeof(buffer))) == (ssize_t) sizeof(buffer))
{
int opcode;
/* You will need to overwrite the '\n' for the printf() to work
* but you can skip this if you don't want to print the command
*/
buffer[length - 1] = '\0';
opcode = 10 * (buffer[0] - '0') + buffer[1] - '0';
fprintf(stderr, "Command: `%s'\n\topcode: %d\n", buffer, opcode);
}
close(file);
return 0;
}
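If switching to stdio is acceptable, one way to sidestep the byte counting is to read whole lines with fgets() and strip whatever line ending is present. This is only a sketch: strip_eol and opcode_of are hypothetical helper names, not part of the original code, and it assumes each instruction starts with a two-digit opcode.

```c
#include <ctype.h>
#include <string.h>

/* Cut the string at the first '\r' or '\n' and return the new length.
 * Handles both Unix ("\n") and Windows ("\r\n") line endings. */
size_t strip_eol(char *line)
{
    size_t len = strcspn(line, "\r\n");
    line[len] = '\0';
    return len;
}

/* Return the two-digit opcode at the start of line, or -1 if the first
 * two characters are not both decimal digits (e.g. a "ZZ" data line). */
int opcode_of(const char *line)
{
    if (!isdigit((unsigned char) line[0]) || !isdigit((unsigned char) line[1]))
        return -1;
    return (line[0] - '0') * 10 + (line[1] - '0');
}
```

A reading loop would then be while (fgets(input_line, sizeof input_line, fp) != NULL), with input_line declared as at least char[9] so that six characters, "\r\n" and the terminator all fit.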
char input_line [7] ;
int charRead = read (fp, input_line, 8 ) ;
Reads 8 bytes into a 7 byte array, which is bad. It writes one byte past the end of the array, but since the array is 7 bytes and most data is aligned on 4- or 8-byte boundaries, you probably get away with it because the extra byte does not land on anything important.
But!!! Here is your data:
00P015<EOL>
00P116<EOL>
030000<EOL>
06P0ZZ<EOL>
030005<EOL>
...
On a Unix-based system where the end of line is one byte, reading 8 bytes will read
00P015<EOL>0
And the next eight bytes will read
0P116<EOL>03
etcetera... So here is your data on drugs:
00P015<EOL>0
0P116<EOL>03
0000<EOL>06P
0ZZ<EOL>0300
05<EOL>...
See what happens? Not what you need or want.
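To see the drift for yourself, here is a small sketch that cuts a buffer into fixed-size chunks the way repeated read(fp, buf, 8) calls would. copy_chunk is a made-up helper for illustration, and the sample data assumes one-byte Unix line endings.

```c
#include <stddef.h>
#include <string.h>

/* Copy chunk number `index` (each `chunk` bytes long) of data[0..len)
 * into out, NUL-terminate it, and return how many bytes were copied.
 * out must have room for chunk + 1 bytes. */
size_t copy_chunk(const char *data, size_t len, size_t chunk,
                  size_t index, char *out)
{
    size_t start = index * chunk;
    size_t n = 0;
    if (start < len) {
        n = len - start;
        if (n > chunk)
            n = chunk;          /* a full chunk is available */
        memcpy(out, data + start, n);
    }
    out[n] = '\0';
    return n;
}
```

With the data "00P015\n00P116\n030000\n" and a chunk size of 8, chunk 1 comes out as "0P116\n03". With a chunk size of 7 it is "00P116\n", which is the line you actually wanted.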
How this could work in the IDE, smurfed if I know, unless the input file is actually a windows text file (two byte end of line mark), but it's playing with fire. I'm going to stick with C and pitch fscanf as an alternative to read. I also stripped out all of the stuff not essential to this example.
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
(void) argc; // I'm being pedantic. As pointed out below in the comments, this
// is not needed. Main needs no arguments. I just like them.
(void) argv;
//Read file into VM
// moved all variables into function
char memory [100][6] ; //This is likely program death if you read more than 100
// lines from the file. There are no guards to prevent this
// in the original code.
int opcode ;
int program_line = 0 ;
FILE* fp ; //using a C FILE handle rather than a posix handle for maximum portability
char input_line [8] ;// death if a line is poorly formatted and has extra characters,
// but in this case, the whole program falls apart.
// Went with 8 in case the input file was formatted for Windows.
fp = fopen("asd.txt", "r") ; // using c standard library file open
if (fp == NULL)
{
printf("Could not open file\n");
return 0 ;
}
int itemsRead = fscanf(fp, "%s\n", input_line) ;
//fscanf is a much more tractable reader. This will read one string of characters
// up to the end of line. It will easily and happily run past the end of input_line
// if the line is poorly formatted
// handles a variety of EOL types. and returns the number of the requested
// items read. In this case, one item.
printf("\n*******************************\n");
printf("* Reading Program Into Memory *\n");
printf("*******************************\n");
while (itemsRead == 1 && input_line[0] != 'Z' && program_line < 100)
{ // much better place for the exit conditions. Also added test to prevent
// overrunning memory
for (int i = 0; i < 6 ; i++)
{
memory[program_line][i] = input_line[i] ;
} // this can also be performed with memcpy
printf("Program Line %d: ", program_line) ;
for(int i = 0; i < 6; i++)
{
printf("%c", memory[program_line][i]) ;
} // if we were using properly terminated c-style strings, and we are not,
// this loop and the following printf("\n") could be replaced with
// printf("%s\n", memory[program_line]). As it is putc would be a more
// efficient option
printf("\n") ;
opcode = (memory[program_line][0] -'0') *10 ; // '0' much easier to read than 48
opcode += memory[program_line][1] -'0' ;
printf("Opcode is %d\n", opcode) ;
itemsRead = fscanf(fp, "%s\n", input_line) ;
program_line++ ;
printf("\n"); // fixed typo
}
}
And in C++, this sucker is trivial
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
int main(int argc, char *argv[])
{
(void) argc; // I'm still being pedantic.
(void) argv;
//Read file into VM
std::vector<std::string> memory;
int opcode;
std::ifstream in("asd.txt");
std::cout << "\n*******************************\n"
<< "* Reading Program Into Memory *\n"
<< "*******************************\n";
std::string input_line;
while (std::getline(in, input_line) && input_line[0] != 'Z')
{
memory.push_back(input_line);
std::cout << input_line << std::endl;
opcode = (input_line[0] - '0') * 10 + input_line[1] - '0';
std::cout << "Opcode is " << opcode << std::endl << std::endl;
}
}
A note on being pedantic. There is this wonderful compiler option called -pedantic. It instructs the compiler to do some fairly anally retentive error checking. Add it, -Wall, and -Wextra to your command line. Together they will spot a lot of mistakes. And some stuff that isn't mistakes, but you can't win them all.
I am still a novice when it comes to UNIX and C++, creating a sort of unruly mess.
My task is to create a pipe, fork the process, let the parent process read in characters from a text file, pass those characters through the pipe to the child process, have the child process convert the case of the character from uppercase to lowercase or vice versa, then output the character.
When I run this code I see the following error: (null) Segmentation Fault (Core Dumped)
When I put sleeper print statements into the program, I saw that the program located the file, forked properly, but while the child process began, the parent wouldn't start. Any help is greatly appreciated.
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <signal.h>
#include <stdlib.h>
int main()
{
FILE* fh;
fh = fopen("data.txt", "r");
int pipeID[2];
pipe(pipeID);
int len;
if (fork() == 0) //this is the filter process
{
char filter[len];
read(pipeID[0], filter, len);
if (filter[0] >= 'a' && filter[0] <= 'z')
filter[0] = filter[0] - 32;
else if (filter[0] >= 'A' && filter[0] <= 'Z')
filter[0] = filter[0] + 32;
printf("%s", filter[0]);
}
else {
char ch;
char* toFilter;
for (ch = getc(fh); ch != EOF; ch = getc(fh)) {
printf("%s", ch);
write(pipeID[1], &ch, len);
}
}
}
Why are you printing characters with string specifiers?
printf("%s", ch) treats the character value as a pointer, so you are probably accessing memory you are not allowed to touch.
Try using %c instead of %s.
I see one major problem and several other glitches. Based on your code, I modified it like this (still not good); please see the comments prefixed with #<num>:
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <signal.h>
#include <stdlib.h>
int main()
{
FILE* fh;
fh = fopen("data.txt", "r");
int pipeID[2];
pipe(pipeID);
/* int len; */ // #1 `len` is not needed
if (fork() == 0)
{
close(pipeID[1]); // #2-1 usually we close unused end of pipe
char filter[1024];
int read_len;
while ((read_len = read(pipeID[0], filter, sizeof(filter))) > 0) // #3 see below
{
for (int i = 0; i < read_len; ++i)
{
if (filter[i] >= 'a' && filter[i] <= 'z')
filter[i] = filter[i] - 32;
else if (filter[i] >= 'A' && filter[i] <= 'Z')
filter[i] = filter[i] + 32;
printf("%c", filter[i]);
}
}
}
else {
close(pipeID[0]); // #2-2 same as #2-1
int ch; // getc() returns int so that EOF can be told apart from a valid byte
/* char* toFilter; */ // #4 remove unused variable
while ((ch = getc(fh)) != EOF) { // #5 a while is better than for
char c = (char) ch;
printf("%c", c); // #6 %s -> %c
write(pipeID[1], &c, sizeof(char)); // #7 len -> sizeof(char)
}
}
}
The biggest problem is in the #3 part. You may think that once you write to a pipe, the other end will immediately read exactly that data. However, you can't rely on exactly one char being written and then read. So you need to keep reading as much as is available until read() reports end of file, which indicates that the writing end has been closed. Therefore, I changed the code as in #3.
As for the other problems, they are minor slips rather than design faults; I think they were caused by carelessness.
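As a side note on the case conversion itself: the +/- 32 arithmetic assumes ASCII. Here is a sketch using the standard <ctype.h> functions instead (flip_case is a hypothetical helper name, not something from the question):

```c
#include <ctype.h>
#include <stddef.h>

/* Swap the case of every letter in buf[0..n); other bytes are untouched. */
void flip_case(char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* ctype functions need a value representable as unsigned char */
        unsigned char c = (unsigned char) buf[i];
        if (islower(c))
            buf[i] = (char) toupper(c);
        else if (isupper(c))
            buf[i] = (char) tolower(c);
    }
}
```

In the child process this would be called as flip_case(filter, read_len) after each successful read().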
I have a device that sends serial data to my esp8266.
I need to parse that data to my application.
The format of each line I receive from the serial port is like this:
"\x02""Q,328,013.83,N,00,\x03""1D"
where the 1st character is char 2 (start of transmission) and the 3rd from the end is char 3 (end of transmission). The last number ("1D") is the checksum.
The numbers between, are the values I want to parse.
I've written the following code, which works, but I wonder if this is the correct way of doing it.
#include <iostream>
#include <stdio.h>
#include <string.h>
uint16_t calcCRC(char* str)
{
uint16_t crc=0; // starting value as you like, must be the same before each calculation
for (uint16_t i=0; i<strlen(str); i++) // for each character in the string
{
crc ^= str[i]; // update the crc value
}
printf("CRC: %X\n", crc);
return crc;
}
int main()
{
char * data= "\x02""Q,328,013.83,N,00,\x03""1D";
const char start = 2; //stx start transmission
const char end = 3; //etx end transmission
char* pos1 = strchr(data, start);
char* pos2 = strchr(data, end);
int p1,p2;
if (pos1 && pos2) {
p1 = pos1-data+1;
p2 = pos2-data+1;
printf ("found at %d, %d\n", p1, p2);
} else
return 0;
char* checksumStr;
checksumStr = strrchr(data, end);
int checksum = 0;
if (checksumStr) {
checksum = strtol(checksumStr+1, NULL, 16);
}
printf("checksum char: |%s| check number %X\n", checksumStr+1, checksum);
char cleanData[25];
strncpy(cleanData, data+p1, p2-p1-1);
cleanData[p2-p1-1] = '\0';
uint16_t crc = calcCRC(cleanData);
printf("Clean data to checksum: |%s|\n", cleanData);
char* addr = strtok(cleanData, ","); // Q
int WindDir = atoi(strtok(NULL, ",")); // 328
float WindSpeed = atof(strtok(NULL, ",")); // 13.83
char* unit = strtok(NULL, ","); // N
char* deviceStatus = strtok(NULL,","); // 00
printf("CRC: %X, Speed %3.2f, dir %d, ", crc, WindSpeed, WindDir);
return(0);
}
Run it here
Thank you !!
Break your code into two parts:
Message extraction – extracts the actual message from buffer and checks the checksum.
Message processing – converts the message to tokens, which can then be handled easily, and extracts the information.
Your calling code should look like this:
// extract the message
char msg[ 1024 ];
if ( !extract_message( msg, data ) )
return false;
// process the message
int dir;
double speed;
process_message( msg, dir, speed );
A message extraction function idea:
#define STX '\x02'
#define ETX '\x03'
bool extract_message( char* d, const char* s )
{
// start of text
if ( *s != STX )
return false;
++s;
// actual message
char cs = 0; // checksum
while ( *s && ETX != *s )
{
cs ^= *s;
*d++ = *s++;
}
*d = '\0'; // terminate after the last copied character
// end of text
if ( *s != ETX )
return false;
++s;
// sent checksum
char scs = (char)strtol( s, 0, 16 );
//
return scs == cs;
}
A processing function idea:
#define MAX_TOKENS 5
bool process_message(char* buf, int& dir, double& speed) // assumes the message is correct
{
// breaks the input into tokens (see strtok)
const char* delim = ",";
const char* tokens[ MAX_TOKENS ];
int token_count = 0;
for ( char* cursor = strtok( buf, delim ); cursor && token_count < MAX_TOKENS; cursor = strtok( 0, delim ) )
{
tokens[ token_count ] = cursor;
++token_count;
}
// ...
dir = atoi( tokens[ 1 ] ); // assumes the token is valid
speed = atof( tokens[ 2 ] ); // assumes the token is valid
//
return true;
}
I do not have a handy Arduino compiler, so the code might need some tweaking.
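Since the question's code is essentially plain C, here is a C-only sketch of the checksum check. It assumes the checksum is the XOR of every byte between STX and ETX, written in hexadecimal after ETX, which is what the question's calcCRC computes. xor_checksum and frame_is_valid are names invented for this example.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define STX '\x02'
#define ETX '\x03'

/* XOR of the n bytes starting at s. */
unsigned char xor_checksum(const char *s, size_t n)
{
    unsigned char cs = 0;
    for (size_t i = 0; i < n; i++)
        cs ^= (unsigned char) s[i];
    return cs;
}

/* True if frame looks like STX payload ETX hex-checksum and the
 * transmitted checksum equals the XOR of the payload bytes. */
bool frame_is_valid(const char *frame)
{
    if (frame[0] != STX)
        return false;
    const char *payload = frame + 1;
    const char *etx = strchr(payload, ETX);
    if (etx == NULL)
        return false;
    unsigned char computed = xor_checksum(payload, (size_t) (etx - payload));
    char *end;
    unsigned long sent = strtoul(etx + 1, &end, 16);
    if (end == etx + 1)
        return false;           /* no hex digits after ETX */
    return sent == computed;
}
```

For the sample frame in the question the payload "Q,328,013.83,N,00," XORs to 0x1D, so the frame ending in "1D" passes and one ending in "1C" would be rejected.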
If your code works well, it's fine, but I recommend that you write C++-style code. I rewrote part of your code in C++. I believe you can work out how to read the rest of your data using istream.
I also hope you prefer std::string to const char*, and std::iostream for reading and writing data. There is a more flexible way to split strings using std::regex, but I think it is too hard to understand for now.
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main(void)
{
string data = "Q,328,013.83,N,00,1D";
istringstream iss(data);
string tmp;
int wind_dir;
float wind_speed;
getline(iss, tmp, ','); // read until ','
iss >> wind_dir;
getline(iss, tmp, ','); // read until ','
iss >> wind_speed;
return 0;
}
It is a little unclear what you are asking. When you say:
I've written the following code which works, ...
There is no way the code as written above can work. You do not have the control characters 2 (ASCII stx - start transmission) or 3 (ASCII etx - end transmission) embedded in data, so your conditional if (pos1 && pos2) tests false and you exit with return 0; (normally return 1; is used to indicate an error, since that is the value of EXIT_FAILURE).
To make your code work and include stx at the beginning of data and etx at the end, you can do:
char data[32] = "";
char stx[2] = { 2 };
char etx[2] = { 3 };
char *serial= "Q,328,013.83,N,00,1D";
strcat (data, stx);
strcat (data, serial);
strcat (data, etx);
That now provides data in a form where strchr(data, start); and strchr(data, end); will properly locate the start and end of the serial data in the example.
The remainder of your code DOES work, though note that addr, unit and deviceStatus are pointers to addresses within cleanData and are only valid while cleanData remains in scope (WindDir and WindSpeed are converted copies, so they are not affected). That is fine here as your code ends thereafter, but if you intend to do this within a function and want to return the values, then you will need to provide storage for each and copy each to storage that will remain valid after cleanData has gone out of scope.
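One way to give the parsed values their own storage is to copy each token into a struct the caller owns. A minimal sketch, assuming the five-field layout from the question; struct wind_reading and parse_wind are hypothetical names:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct wind_reading {
    char   addr[16];
    int    dir;
    double speed;
    char   unit[16];
    char   status[16];
};

/* Split payload ("Q,328,013.83,N,00,") on commas and copy the fields
 * into *out. Returns 0 on success, -1 if the payload is too long or
 * does not contain five fields. */
int parse_wind(const char *payload, struct wind_reading *out)
{
    char scratch[64];           /* strtok modifies its input, so work on a copy */
    if (strlen(payload) >= sizeof scratch)
        return -1;
    strcpy(scratch, payload);

    char *tok[5];
    int n = 0;
    for (char *t = strtok(scratch, ","); t != NULL && n < 5; t = strtok(NULL, ","))
        tok[n++] = t;
    if (n != 5)
        return -1;

    snprintf(out->addr, sizeof out->addr, "%s", tok[0]);
    out->dir = atoi(tok[1]);
    out->speed = atof(tok[2]);
    snprintf(out->unit, sizeof out->unit, "%s", tok[3]);
    snprintf(out->status, sizeof out->status, "%s", tok[4]);
    return 0;
}
```

Because every string is copied into the struct, the result stays valid after the scratch buffer goes out of scope.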
Currently addr, unit and deviceStatus are unused in your code. You can write a short printf to output their values to have your code compile cleanly, without warning (you can also cast to (void), but I'm not positive that is supported on Arduino). Adding a short printf solves the warning issue and gives you a look at all the data parsed from cleanData, e.g.
printf ("\naddr : %s\n"
"WinDir : %d\n"
"WindSpeed : %.2f\n"
"unit : %s\n"
"deviceStatus : %s\n\n",
addr, WindDir, WindSpeed, unit, deviceStatus);
With that, your code compiles cleanly and does work, though you need a final '\n' at the end of printf("CRC: %X, Speed %3.2f, dir %d, ", crc, WindSpeed, WindDir); (or a simple putchar ('\n');) for your code to be POSIX compliant.
I'm not sure if the embedded stx and etx were your stumbling blocks or not, that's not 100% clear from the question, but without them the code does not work. Let me know if you have further questions.
After Question Update
If you are looking for an alternative way of parsing the data, you could declare addr, unit and devicestatus as short character arrays, e.g. char [16] (or whatever the longest is), and then simply use sscanf to parse data directly, e.g.
if (sscanf (data, "\x02 %15[^,],%d,%f, %15[^,], %15[^,], \x03",
addr, &winddir, &windspeed, unit, devicestatus) != 5) {
fputs ("error parsing data.\n", stderr);
return 1;
}
sscanf can adequately handle the embedded stx and etx. A minimal example using sscanf on data directly could be:
#include <stdio.h>
#define MINSTR 16
int main (void) {
char *data = "\x02""Q,328,013.83,N,00,\x03""1D",
addr[MINSTR],
unit[MINSTR],
devicestatus[MINSTR];
int winddir;
float windspeed;
if (sscanf (data, "\x02 %15[^,],%d,%f, %15[^,], %15[^,], \x03",
addr, &winddir, &windspeed, unit, devicestatus) != 5) {
fputs ("error parsing data.\n", stderr);
return 1;
}
printf ("addr : %s\n"
"windir : %d\n"
"windspeed : %.2f\n"
"unit : %s\n"
"devicestatus : %s\n", addr, winddir, windspeed, unit, devicestatus);
}
Example Use/Output
$ ./bin/windspeed+dir
addr : Q
windir : 328
windspeed : 13.83
unit : N
devicestatus : 00
The title describes it all. I am reading various files in my program, and once it reaches a relatively large file, the program crashes.
I wrote a shortened version of my program that replicates the issue.
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <iostream>
#include <fstream>
char** load_File(char** preComputed, const int lines, const int sLength,
std::string fileName){
//Declarations
FILE *file;
int C = lines+1;
int R = sLength+2;
int i; //Dummy index
int len;
//Create 2-D array on the heap
preComputed = (char**) malloc(C*sizeof(char*));
for(i = 0; i<C; i++) preComputed[i] = (char *) malloc(R*sizeof(char));
//Need to free each element individually later on
//Create temprary char array
char* line = (char *) malloc(R*sizeof(char));
assert(preComputed);
//Open file to read and store values
file = fopen(fileName.c_str(), "r");
if(file == NULL){ perror("\nError opening file"); return NULL;}
else{
i = 0;
while(fgets(line, R, file) != NULL){
//Remove next line
len = R;
if((line[len-1]) == '\n') (line[len-1]) = '\0';
len--; // Decrement length by one because of replacing EOL
// with null terminator
//Copy character set
strcpy(preComputed[i], line);
i++;
}
preComputed[C-1] = NULL; //Append null terminator
free(line);
}
return preComputed;
}
int main(void){
char** preComputed = NULL;
std::string name = "alphaLow3.txt";
system("pause");
preComputed = load_File(preComputed, 17576, 3, name);
if(preComputed == NULL){
std::cout<<"\nAn error has been encountered...";
system("PAUSE");
exit(1);
}
//Free preComputed
for(int y = 0; y < 17576; y++){
free(preComputed[y]);
}
free(preComputed);
}
This program will crash when it is executed. Here are two links to the text files.
alphaLow3.txt
alphaLow2.txt
To run alphaLow2.txt, change the numbers in the load_file call to 676 and 2 respectively.
When this program reads alphaLow2.txt, it executes successfully. However, when it read alphaLow3.txt, it crashes. This file is only 172KB. I have files that are a MB or larger. I thought I allocated enough memory, but I may be missing something.
The program is supposed to be in C, but I've included some C++ functions for ease.
Any constructive input is appreciated.
You must confirm your file length. The alphaLow3.txt file has a total of 35152 lines, but in your program you set the line count to 17576. This is the main reason for the crash.
In addition, consider this statement:
if((line[len-1]) == '\n') (line[len-1]) = '\0';
fgets always terminates the string with a null character ('\0'). For example, the first line is stored as 'a' 'a' 'a' '\n' '\0', so the newline sits at index len-2, not len-1. So you should do it like this:
if((line[len-2]) == '\n') (line[len-2]) = '\0';
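Rather than hard-coding the number of lines, you could count them before allocating. A rough sketch (count_lines is a hypothetical helper; it assumes the stream is seekable so it can be rewound afterwards):

```c
#include <stdio.h>

/* Count the lines in fp, then rewind it so it can be read again.
 * A final line without a trailing newline still counts as a line. */
long count_lines(FILE *fp)
{
    long lines = 0;
    int c;
    int last = '\n';            /* so an empty file yields 0 */
    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            lines++;
        last = c;
    }
    if (last != '\n')
        lines++;
    rewind(fp);
    return lines;
}
```

The result can then be passed as the lines argument of load_File instead of a literal such as 17576.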
Here is my Code
This code is trying to remove special characters like ",',{,},(,) from a .txt file and replace them with blank space.
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <fcntl.h>
#include <iostream>
#include <time.h>
#include <fstream>
using namespace std;
int main(int argc, char *argv[])
{
int fd;
int i;
int j;
int len;
int count = 0;
int countcoma = 0;
int countquote = 0;
char buf[10];
char spec[] = {',','"',':','{','}','(',')','\''};
fd = open(argv[1],O_RDWR,0777);
while (read(fd,buf,10) != 0) {
len = strlen(buf);
for (i=0;i<len;i++) {
for (j=0;j<8;j++) {
if (buf[i]==spec[j]) {
count =1;
countquote=0;
if (j==1) {
if (countcoma == 0) {
countcoma++;
}
if (countcoma == 1) {
countcoma--;
}
}
if ((j==7) && (countcoma ==1)) {
countquote = 1;
}
break;
}
}
//cout<<countquote;
if ((count != 0) && (countquote == 0)) {
buf[i] = ' ';
}
count = 0;
}
lseek(fd, -sizeof(buf), SEEK_CUR);
write(fd,buf,sizeof(buf));
memset(buf,' ',10);
}
return 0;
}
Now I want the single quotes that are inside double quotes in my file to remain untouched, while all the other special characters are replaced with a space as in the code.
I want this kind of single quote to remain untouched, as in "what's", but after I run the program it becomes what s instead of what's.
Have a look at regex and other libraries. (When on UNIX type man regex.) You don't have to code this anymore nowadays, there are a zillion libraries that can do this for you.
Ok, so the problem with your code is that you are doing one thing, that you then undo in the next section. In particular:
if (countcoma == 0) {
countcoma++;
}
if (countcoma == 1) {
countcoma--;
}
Follow the logic: We come in with countcoma as zero. So the first if is true, and it gets incremented. It is now 1. Next if says if (countcoma == 1) so it is now true, and we decrement it.
I replaced it with countcoma = !countcoma; which is a much simpler way to say "if it's 0, make it 1; if it's 1, make it 0". You could also put an else in front of the second if to get the same effect.
There are also a whole bunch of stylistic things: For example hard-coded constants, writing back into the original file (means that if there is a bug, you lose the original file - good thing I didn't close the editor window with my sample file...), including half the universe in header files, and figuring which of the spec characters it is based on the index.
It seems to me that your code is suffering from a more general flaw than what has been pointed out before:
char buf[10]; /* Buffer is un-initialized here!! */
while (read(fd,buf,10) != 0) { /* read up to 10 bytes */
len = strlen(buf); /* What happens here if no \0 byte was read? */
...
lseek(fd, -sizeof(buf), SEEK_CUR); /* skip sizeof(buf) = 10 bytes anyway */
write(fd,buf,sizeof(buf)); /* write sizeof(buf) = 10 bytes anyway */
memset(buf,' ',10); /* initialize buf to contain all spaces
but no \0, so strlen will still result in
reading past the array bounds */
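Putting these observations together, here is a sketch that works on an explicit byte count instead of strlen() and toggles a quote flag, so apostrophes inside double quotes survive. blank_specials is a hypothetical helper; the caller would pass the number of bytes read() returned and keep in_quotes alive across chunks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Replace special characters in buf[0..n) with spaces. An apostrophe
 * is kept when it appears between double quotes. *in_quotes carries the
 * "inside double quotes" state from one chunk to the next. */
void blank_specials(char *buf, size_t n, bool *in_quotes)
{
    static const char specials[] = ",\":{}()'";
    for (size_t i = 0; i < n; i++) {
        char c = buf[i];
        if (c == '"')
            *in_quotes = !*in_quotes;   /* toggle on every double quote */
        if (c == '\'' && *in_quotes)
            continue;                   /* keep what's as what's */
        /* strchr would also match the terminator, so skip NUL bytes */
        if (c != '\0' && strchr(specials, c) != NULL)
            buf[i] = ' ';
    }
}
```

The file loop then reads n bytes, calls blank_specials(buf, n, &in_quotes), seeks back by n bytes and writes n bytes, so a short final read no longer writes stale data.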
I am currently doing some testing with a new addition to the ICU dictionary-based break iterator.
I have code that allows me to test the word-breaking on a text document but when the text document is too large it gives the error: bash: ./a.out: Argument list too long
I am not sure how to edit the code to break-up the argument list when it gets too long so that a file of any size can be run through the code. The original code author is quite busy, would someone be willing to help out?
I tried removing the printing of what is being examined to see if that would help, but I still get the error on large files (printing what is being examined isn't necessary - I just need the result).
If the code could be modified to read the source text file line by line and export the results line by line to another text file (ending up with all the lines when it is done), that would be perfect.
The code is as follows:
/*
Written by George Rhoten to test how word segmentation works.
Code inspired by the break ICU sample.
Here is an example to run this code under Cygwin.
PATH=$PATH:icu-test/source/lib ./a.exe "`cat input.txt`" > output.txt
Encode input.txt as UTF-8.
The output text is UTF-8.
*/
#include <stdio.h>
#include <unicode/brkiter.h>
#include <unicode/ucnv.h>
#define ZW_SPACE "\xE2\x80\x8B"
void printUnicodeString(const UnicodeString &s) {
int32_t len = s.length() * U8_MAX_LENGTH + 1;
char *charBuf = new char[len];
len = s.extract(0, s.length(), charBuf, len, NULL);
charBuf[len] = 0;
printf("%s", charBuf);
delete[] charBuf;
}
/* Creating and using text boundaries */
int main(int argc, char **argv)
{
ucnv_setDefaultName("UTF-8");
UnicodeString stringToExamine("Aaa bbb ccc. Ddd eee fff.");
printf("Examining: ");
if (argc > 1) {
// Override the default charset.
stringToExamine = UnicodeString(argv[1]);
if (stringToExamine.charAt(0) == 0xFEFF) {
// Remove the BOM
stringToExamine = UnicodeString(stringToExamine, 1);
}
}
printUnicodeString(stringToExamine);
puts("");
//print each sentence in forward and reverse order
UErrorCode status = U_ZERO_ERROR;
BreakIterator* boundary = BreakIterator::createWordInstance(NULL, status);
if (U_FAILURE(status)) {
printf("Failed to create sentence break iterator. status = %s",
u_errorName(status));
exit(1);
}
printf("Result: ");
//print each word in order
boundary->setText(stringToExamine);
int32_t start = boundary->first();
int32_t end = boundary->next();
while (end != BreakIterator::DONE) {
if (start != 0) {
printf(ZW_SPACE);
}
printUnicodeString(UnicodeString(stringToExamine, start, end-start));
start = end;
end = boundary->next();
}
delete boundary;
return 0;
}
Thanks so much!
-Nathan
The Argument list too long error message is coming from the bash shell and is happening before your code even gets started executing.
The only code you can fix to eliminate this problem is the bash source code (or maybe it is in the kernel) and then, you're always going to run into a limit. If you increase from 2048 files on command line to 10,000, then some day you'll need to process 10,001 files ;-)
There are numerous solutions to managing 'too big' argument lists.
The standardized solution is the xargs utility.
find / -print | xargs echo
is an unhelpful, but working, example.
See How to use "xargs" properly when argument list is too long for more info.
Even xargs has problems, because file names can contain spaces, new-line chars, and other unfriendly stuff.
I hope this helps.
The code below reads the content of a file whose name is given as the first parameter on the command line and places it in a std::string named buffer. Then, instead of calling the UnicodeString constructor with argv[1], pass it that buffer.
#include<iostream>
#include<fstream>
using namespace std;
int main(int argc, char **argv)
{
std::string buffer;
if(argc > 1) {
std::ifstream t;
t.open(argv[1]);
std::string line;
while(std::getline(t, line)){ // stop as soon as a read fails, so no extra line is appended
buffer += line + '\n';
}
}
cout << buffer;
return 0;
}
Update:
Input to UnicodeString should be char*. The function GetFileIntoCharPointer does that.
Note that only the most rudimentary error checking is implemented below!
#include<iostream>
#include<fstream>
#include<cstdio>
using namespace std;
char * GetFileIntoCharPointer(char *pFile, long &lRet)
{
FILE * fp = fopen(pFile,"rb");
if (fp == NULL) return 0;
fseek(fp, 0, SEEK_END);
long size = ftell(fp);
fseek(fp, 0, SEEK_SET);
char *pData = new char[size + 1];
lRet = fread(pData, sizeof(char), size, fp);
pData[lRet] = '\0'; // terminate so the buffer can be used as a C string
fclose(fp);
return pData;
}
int main(int argc, char **argv)
{
long Len;
char * Data = GetFileIntoCharPointer(argv[1], Len);
if (Data == NULL)
return 1;
std::cout << Data << std::endl;
delete [] Data;
return 0;
}
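For completeness, here is a plain C sketch that reads an entire stream into a NUL-terminated heap buffer without needing fseek(), so it also works when the text is piped in through stdin. read_all is a hypothetical name; the caller owns the returned buffer and must free() it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read everything from fp into a malloc'd, NUL-terminated buffer.
 * Stores the byte count in *len_out (if not NULL). Returns NULL on
 * allocation failure or read error. */
char *read_all(FILE *fp, size_t *len_out)
{
    size_t cap = 4096, len = 0;
    char *data = malloc(cap);
    if (data == NULL)
        return NULL;
    for (;;) {
        /* always leave one byte free for the terminator */
        len += fread(data + len, 1, cap - len - 1, fp);
        if (len < cap - 1)
            break;              /* short read: end of file or error */
        cap *= 2;
        char *grown = realloc(data, cap);
        if (grown == NULL) {
            free(data);
            return NULL;
        }
        data = grown;
    }
    if (ferror(fp)) {
        free(data);
        return NULL;
    }
    data[len] = '\0';
    if (len_out != NULL)
        *len_out = len;
    return data;
}
```

The program could then be started as ./a.out < input.txt > output.txt, which avoids the argument-length limit entirely, and the buffer from read_all(stdin, &len) used in place of argv[1].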