Concatenating Two FASTA Files in C++

I have two FASTA files:
file1.fasta
>foo
ATCGGGG
>bar
CCCCCC
file2.fasta
>qux
ATCGGAAA
What I want to do now is concatenate them into one file, with this result:
>foo
ATCGGGG
>bar
CCCCCC
>qux
ATCGGAAA
This preserves the name of each sequence, i.e. each line that starts with ">".
Currently my code below replaces each name with an index, namely:
>0
ATCGGGG
>1
CCCCCC
>0
ATCGGAAA
What's the right way to modify my code below?
#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
#include<stdio.h>
#include<string>
using namespace std;
#define MAX_LINE_SIZE 1024
int mk_joint_file(char *ctrlFile, char *tgtFile, char *outFile){
char s[MAX_LINE_SIZE];
FILE *ofp = fopen(outFile,"w");
FILE *cfp = fopen(ctrlFile,"r");
FILE *tfp = fopen(tgtFile,"r");
// char *p;
int flg=false;
int line=0;
while(fgets(s,MAX_LINE_SIZE,cfp) != NULL){
if(s[0]=='>'){
flg=true;
fprintf(ofp,">%d\n",line);
line++;
}else{
if(flg==true){
fprintf(ofp,"%s",s);
}
flg=false;
}
}
flg=false;
line=0;
while(fgets(s,MAX_LINE_SIZE,tfp) != NULL){
if(s[0]=='>'){
flg=true;
fprintf(ofp,">%d\n",line);
line++;
}else{
if(flg==true)
fprintf(ofp,"%s",s);
flg=false;
}
}
fclose(cfp);
fclose(tfp);
fclose(ofp);
return(0);
}
int main(int argc, char **argv)
{
string ifname_control = argv[1];
string ifname_target = argv[2];
string ofname = "newjoin.txt";
mk_joint_file((char *)ifname_control.c_str(), (char *)ifname_target.c_str(), (char *)ofname.c_str());
}

Is it any harder than just changing these lines
fprintf(ofp,">%d\n",line);
to
// fgets() keeps the trailing newline in s, so no extra \n is needed
fprintf(ofp, "%s", s);

Just change the two fprintf(ofp,">%d\n",line); calls, one in each loop, to
fprintf(ofp,"%s",s);


How can I solve this problem? I need to export a line from a text file

I have an input file like this:
Virtual (A) (A) (A) (A) (A) (A) (A) (A) (A) (A) (A) (A)
The electronic state is 1-A.
Alpha occ. eigenvalues -- -0.26426 -0.26166 -0.25915 -0.25885
Alpha occ. eigenvalues -- -0.25284 -0.25172 -0.24273 -0.23559
Alpha occ. eigenvalues -- -0.20078 -0.19615 -0.17676 -0.10810
Alpha virt. eigenvalues -- -0.07062 -0.06520 -0.05969 -0.01767
Alpha virt. eigenvalues -- -0.01604 -0.00951 -0.00428 0.00041
I would like to export the first line whose first 11 characters are " Alpha virt.". How should I do this? I wrote the C++ code below, but it never finishes the while loop. I don't know why; I am a beginner. Please help me. Thank you so much.
My C++ code:
#include <stdio.h>
#include <iostream>
#include <fstream>
#include <string>
#define FILENAME "filelog.txt"
using namespace std;
int main(void) {
char* line_buf = NULL;
size_t line_buf_size = 0;
int line_count = 0;
string s;
std::string dongsosanh = " Alpha virt.";
FILE* fp = fopen(FILENAME, "r");
getline(&line_buf, &line_buf_size, fp);
std::string STRR(line_buf, 11);
do {
line_count++;
getline(&line_buf, &line_buf_size, fp);
} while(STRR.compare(dongsosanh) != 0);
std::cout << STRR << endl;
return 0;
}
Thank you so much.
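Several problems with your program:
line_buf - starting with NULL and size 0 is valid for POSIX getline(), which then allocates the buffer itself, but that buffer is never released with free().
line_count - is only incremented, never used.
The return value of getline() is never checked, so the loop cannot stop at the end of the file.
You are not closing the file at the end.
" Alpha virt." - this string will never be found: it has a space at the beginning, which makes it 12 characters long, while STRR holds only 11.
STRR is never updated after a line has been read, so the loop is endless.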
Working solution:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <iostream>
#include <string>
#define FILENAME "filelog.txt"
int main(void) {
    const std::string dongsosanh = "Alpha virt.";
    char* line_buf = NULL;     // POSIX getline() allocates and grows this buffer
    size_t line_buf_size = 0;
    int line_count = 0;
    FILE* fp = fopen(FILENAME, "r");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }
    // Stop at end of file or at the first line that starts with "Alpha virt."
    while (getline(&line_buf, &line_buf_size, fp) != -1) {
        line_count++;
        if (strncmp(line_buf, dongsosanh.c_str(), dongsosanh.size()) == 0) {
            std::cout << line_buf;
            break;
        }
    }
    free(line_buf);            // the buffer came from malloc/realloc, so use free()
    fclose(fp);
    return 0;
}
This shows how to make your approach work, but in C++ you should prefer std::string and std::getline over a raw char* buffer.
You could just do this:
std::ifstream input(FILENAME);
std::string line;
while(std::getline(input, line)) {
if(line.substr(0, 11) == "Alpha virt.") {
std::cout << line << std::endl;
return 0;
}
}
EDIT: added the return statement to make sure only the first line starting with 'Alpha virt.' is printed.

Flex tokens not working with char* hashtable

I am making a simple compiler, and I use flex and a hashtable (unordered_set) to check if an input word is an identifier or a keyword.
%{
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <unordered_set>
using std::unordered_set;
void yyerror(char*);
int yyparse(void);
typedef unordered_set<const char*> cstrset;
const cstrset keywords = {"and", "bool", "class"};
%}
%%
[ \t\n\r\f] ;
[a-z][a-zA-Z0-9_]* { if (keywords.count(yytext) > 0)
printf("%s", yytext);
else
printf("object-identifier"); };
%%
void yyerror(char* str) {printf("ERROR: Could not parse!\n");}
int yywrap() {}
int main(int argc, char** argv)
{
if (argc != 2) {printf("no input file");}
FILE* file = fopen(argv[1], "r");
if (file == NULL) {printf("couldn't open file");}
yyin = file;
yylex();
fclose(file);
return 0;
}
I tried with an input file that contains only the word "class", and the output is object-identifier, not class.
I also tried a simple program without flex, and there the unordered_set works fine.
int main()
{
cstrset keywords = {"and", "class"};
const char* str = "class";
if (keywords.count(str) > 0)
printf("works");
return 0;
}
What could be the problem?
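Use unordered_set<string> instead of your unordered_set<const char*>. A set of const char* hashes and compares the pointer values, not the characters they point to. yytext points into flex's own buffer, so its address can never equal the address of one of the string literals stored in the set. Your standalone test only appears to work because the compiler is allowed to merge identical string literals into a single object, so both "class" literals may share one address.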

howto: Read input and store it in another file

I want to make a program that reads the highest value from one file and stores it in another. I've read about ifstream and ofstream but how do I let the ofstream store the highest value from the instream in another file? Here is what I have so far:
#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <iterator>
#include <vector>
using namespace std;
struct CsvWhitespace : ctype<char> {
static const mask* make_table() {
static vector<mask> v{classic_table(), classic_table() + table_size};
v[','] |= space; // comma will be classified as whitespace
return v.data();
}
CsvWhitespace(size_t refs = 0) : ctype{make_table(), false, refs} {}
} csvWhitespace;
int main() {
string line;
ifstream myfile ("C:/Users/Username/Desktop/log.csv");
ofstream myfile2 ("C:/Users/Username/Desktop/log2.csv");
auto v = vector<int>{};
myfile.imbue(locale{myfile.getloc(), &csvWhitespace});
copy(istream_iterator<int>{myfile}, istream_iterator<int>{}, back_inserter(v));
myfile2 << *max_element(begin(v), end(v));
return 0;
}
Thanks in advance :)
You could just copy one file into the other, without having to worry about the format, by treating both in binary mode. Here is an example:
#include <stdio.h>
#include <string.h>
#define bufSize 1024
int main(int argc, char *argv[])
{
FILE *ifp, *ofp;
char buf[bufSize];
if (argc != 3)
{
fprintf(stderr,
"Usage: %s <source-file> <target-file>\n", argv[0]);
return 1;
}
if ((ifp = fopen(argv[1], "rb")) == NULL)
{ /* Open source file. */
perror("fopen source-file");
return 1;
}
if ((ofp = fopen(argv[2], "wb")) == NULL)
{ /* Open target file. */
perror("fopen target-file");
return 1;
}
size_t n;
while ((n = fread(buf, 1, sizeof(buf), ifp)) > 0)
{ /* While we don't reach the end of source. */
/* Read up to bufSize bytes from the source file into the buffer. */
/* Write the n bytes actually read to the target file. */
fwrite(buf, 1, n, ofp);
}
fclose(ifp);
fclose(ofp);
return 0;
}
This was given as an example in IP (source). You just need to pass the desired files as command-line arguments.
You can do it like this. Live example using cin and cout rather than files.
#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <iterator>
#include <vector>
using namespace std;
struct CsvWhitespace : ctype<char> {
static const mask* make_table() {
static vector<mask> v{classic_table(), classic_table() + table_size};
v[','] |= space; // comma will be classified as whitespace
return v.data();
}
CsvWhitespace(size_t refs = 0) : ctype{make_table(), false, refs} {}
};
int main() {
string line;
ifstream myfile("log.csv");
ofstream myfile2("log2.csv");
auto v = vector<int>{};
myfile.imbue(locale{myfile.getloc(), new CsvWhitespace{}});
copy(istream_iterator<int>{myfile}, istream_iterator<int>{}, back_inserter(v));
myfile2 << *max_element(begin(v), end(v));
}

Linux File Read and Write - C++

I am supposed to create a program that reads the first 100 characters of source.txt, writes them to destination1.txt, then replaces every "2" with "S" and writes the result to destination2.txt. Below is my code:
#include <sys/types.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <cstdio>
#include <iostream>
using namespace std;
int main(int argc, const char* argv[]){
argv[0] = "source.txt";
argv[1] = "destination1.txt";
argv[2] = "destination2.txt";
int count=100;
char buff[125];
int fid1 = open(argv[0],O_RDWR);
read(fid1,buff,count);
close(fid1);
int fid2 = open(argv[1],O_RDWR);
write(fid2,buff,count);
close(fid2);
//How to change the characters?
return 0;
}
Thanks, I am able to do the copying. But how do I perform the character replacement? With fstream I would know how to do it with a for loop, but I'm supposed to use Linux system calls.
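Define an array out_buf and copy buff into out_buf character by character, replacing each '2' with 'S'. Then write out_buf to destination2.txt (argv[2]), in addition to the copy you already write to destination1.txt.
...
ssize_t n = read(fid1, buff, count); // number of bytes actually read
close(fid1);
if (n < 0)
    return 1; // read failed
char out_buf[125];
for (ssize_t i = 0; i < n; i++) {
    if (buff[i] == '2')
        out_buf[i] = 'S';
    else
        out_buf[i] = buff[i];
}
int fid2 = open(argv[2], O_RDWR); // destination2.txt
write(fid2, out_buf, n);
close(fid2);
return 0;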
You should replace the filename assignments with something like this:
const std::string source_filename = "source.txt";
const std::string dest1_filename = "destination1.txt";
const std::string dest2_filename = "destination2.txt";
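There is no guarantee that argv has room for three entries. When the program is started without arguments, argv holds only argv[0] and the terminating null pointer, so assigning to argv[2] writes past the end of the array.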

Input parameters as HEX from command line

I would like to print what the user enters on the command line as HEX.
When I declare my variable as: unsigned char myargv[] = {0x00,0xFF};
it works fine, I get: 11111111
But when I pass my parameters from the command line, I get a different value.
Example: myApp.exe FF
I get: 01100010
#include <iostream>
#include <string>
using namespace std;
void writeToScreen(unsigned char *data);
int main(int argc,unsigned char *argv[]){
if(argc != 2){
unsigned char myargv[] = {0x00,0xFF};
writeToScreen(&myargv[1]);
}else{
writeToScreen(argv[1]);
}
system("pause");
return 0;
}
void writeToScreen(unsigned char *data){
unsigned char dat;
dat =*(data);
for (unsigned int i=0;i<8;i++)
{
if (dat & 1)
cout<<"1";
else
cout<<"0";
dat>>=1;
}
cout<<endl;
}
Your argument is FF. 'F' is 70 in ASCII, and 70 is 0x46 (0100 0110). You got "0110 0010", which is 0x46 written in reverse: your program prints the first character of the argument, least significant bit first.
So first, you need to convert the argument (FF) into a number, because currently it's only a string. You can use strtol or std::stringstream (with std::hex) for that, for instance.
With strtol:
#include <iostream>
#include <string>
#include <stdlib.h>
using namespace std;
void writeToScreen(char *data);
int main(int argc, char *argv[]){
if (argc != 2) {
cerr << "usage: " << argv[0] << " <hex byte>" << endl;
return 1;
}
writeToScreen(argv[1]);
return 0;
}
void writeToScreen(char *data){
unsigned char dat = strtol(data, NULL, 16);
for (unsigned int i=0;i<8;i++)
{
if (dat & 1)
cout<<"1";
else
cout<<"0";
dat>>=1;
}
cout<<endl;
}
Beware that the byte is still printed from LSB to MSB.
Another way to pass raw bytes to a program as a command-line argument is with the help of Perl:
./main $(perl -e 'print "\xc8\xce"')
In effect, this sends 2 bytes (0xC8 and 0xCE) of data to the main program.