Text file (FASTA) to list - command does weird things

A simple command gave me confusing output.
I need to create a list of sequences.
To convert a FASTA text file, with "\n"s and title lines (marked with ">") in between, into a list of strings separated only where a new title begins, I tried:
x=int(0)
while x<len(my_enzyme_list):
    if my_enzyme_list[x].startswith(">"):
        x += 2
    else:
        my_enzyme_list[x-1:x] = "".join(my_enzyme_list[x-1:x])
        x += 1
and it gave me (scroll to the end):
['>human_NAPRT Q6XQN6-1\n', 'MAAEQDPEARAAARPLLTDLYQATMALGYWRAGRARDAAEFELFFRRCPFGGAFALAAGLRDCVRFLRAFRLRDADVQFLASVLPPDTDPAFFEHLRALDCSEVTVRALPEGSLAFPGVPLLQVSGPLLVVQLLETPLLCLVSYASLVATNAARLRLIAGPEKRLLEMGLRRAQGPDGGLTASTYSYLGGFDSSSNVLAGQLRGVPVAGTLAHSFVTSFSGSEVPPDPMLAPAAGEGPGVDLAAKAQVWLEQVCAHLGLGVQEPHPGERAAFVAYALAFPRAFQGLLDTYSVWRSGLPNFLAVALALGELGYRAVGVRLDSGDLLQQAQEIRKVFRAAAAQFQVPWLESVLIVVSNNIDEEALARLAQEGSEVNVIGIGTSVVTCPQQPSLGGVYKLVAVGGQPRMKLTEDPEKQTLPGSKAAFRLLGSDGSPLMDMLQLAEEPVPQAGQELRVWPPGAQEPCTVRPAQVEPLLRLCLQQGQLCEPLPSLAESRALAQLSLSRLSPEHRRLRSPAQYQVVLSERLQALVNSLCAGQSP\n', '>XP_025969437.1 nicotinate phosphoribosyltransferase isoform X2 [Dromaius novaehollandiae]\n', 'M', 'A', 'L', 'L', 'T', 'D', 'L', 'Y', 'Q', 'V', 'T', 'M', 'A', 'Y', 'G', 'Y', 'W', 'R', 'A', 'G', 'R', 'H', 'R', 'V', 'P', 'A', 'A', 'A', 'E', 'L', 'F', 'F', 'R', 'R', 'C', 'P', 'F', 'R', 'G', 'A', 'F', 'A', 'L', 'G', 'A', 'G', 'L', 'A', 'E', 'G', 'L', 'R', 'F', 'L', 'R', 'A', 'F', 'R', 'F', 'S', 'A', 'A', 'D', 'I', 'A', 'Y', 'L', 'R', 'S', 'V', 'L', 'P', 'S', 'T', 'T', 'E', 'E', 'D', 'F', 'F', '\n', 'E', 'Y', 'L', 'A', 'T', 'V', 'D', 'A', 'S', 'E', 'V', 'T', 'I', 'S', 'S', 'V', 'P', 'E', 'G', 'S', 'V', 'V', 'F', 'S', 'R', 'V', 'P', 'L', 'L', 'Q', 'V', 'K', 'G', 'P', 'L', 'L', 'V', 'V', 'Q', 'L', 'L', 'E', 'T', 'T', 'L', 'L', 'C', 'L', 'V', 'S', 'Y', 'A', 'S', 'L', 'V', 'A', 'T', 'N', 'A', 'A', 'R', 'F', 'R', 'L', 'L', 'A', 'G', 'P', 'A', 'T', 'K', 'L', 'M', 'E', 'M', 'G', 'L', 'R', 'R', 'A', '\n', 'Q', 'G', 'P', 'D', 'G', 'G', 'L', 'S', 'A', 'S', 'K', 'Y', 'S', 'Y', 'I', 'G', 'G', 'F', 'D', 'C', 'T', 'S', 'N', 'V', 'L', 'A', 'G', 'K', 'L', 'Y', 'G', 'I', 'P', (and so on)
What happened?

Related

Is there a way to optimize with a variable name loop?

I wrote a simple program that displays random characters in a 4x4 grid. The characters are stored in arrays named dice1 to dice16. Here is a snippet of the code.
Here are my arrays with their names:
//Use of dice that contain unique characters
char dice1[6]={'E', 'T', 'U', 'K', 'N', 'O'};
char dice2[6]={'E', 'V', 'G', 'T', 'I', 'N'};
char dice3[6]={'D', 'E', 'C', 'A', 'M', 'P'};
char dice4[6]={'I', 'E', 'L', 'R', 'U', 'W'};
char dice5[6]={'E', 'H', 'I', 'F', 'S', 'E'};
char dice6[6]={'R', 'E', 'C', 'A', 'L', 'S'};
Here is the code that produces the output:
int main()
{
    //init the random engine
    random_device rd;
    default_random_engine eng(rd());
    uniform_int_distribution<int> distr(MIN, MAX);
    cout << distr(eng) << endl;
    //Output of random characters in the form of a table
    cout << dice1[distr(eng)] ;
    cout << dice2[distr(eng)] ;
    cout << dice3[distr(eng)] ;
    cout << dice4[distr(eng)] ;
    cout << endl;
    cout << dice5[distr(eng)] ;
    cout << dice6[distr(eng)] ;
    cout << dice8[distr(eng)] ;
    cout << endl;
Can we optimize this code with a loop? I thought about doing a for loop but I didn't find a convincing option to change the number at the end.
Something like this. I didn't add the extra '\n' in there; if you need it, use an indexed for loop (I used a range-based for loop):
#include <array>
#include <iostream>
#include <random>
// note: do not use `using namespace std;`
int main()
{
    // initialization of a 2d array char dices[6][6] would also work.
    std::array<std::array<char, 6>, 6> dices
    { {
        {'E', 'T', 'U', 'K', 'N', 'O'},
        {'E', 'V', 'G', 'T', 'I', 'N'},
        {'D', 'E', 'C', 'A', 'M', 'P'},
        {'I', 'E', 'L', 'R', 'U', 'W'},
        {'E', 'H', 'I', 'F', 'S', 'E'},
        {'R', 'E', 'C', 'A', 'L', 'S'},
    } };
    std::random_device rd;
    std::default_random_engine eng(rd());
    std::uniform_int_distribution<int> distr(0, dices[0].size()-1);
    for (const auto& dice : dices)
    {
        std::cout << dice[distr(eng)];
    }
    return 0;
}

replace() not changing characters in a string to the intended characters they are supposed to be replaced with

I am creating a program in C++ that encrypts text using the Caesar cipher. It allows the user to pick the offset used to encrypt. At the moment I have only written it for offset 1, but when I use the replace() function from the STL, rather than replacing each character with the one I specified, it replaces them all with the same letter.
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
int main()
{
    int Offset;
    string Message;
    cout << "What Would You Like To Offset By" << endl;
    cin >> Offset;
    cout << "Please Enter The Text You Would Like To Encrypt" << endl;
    cin >> Message;
    switch(Offset)
    {
        case 1:
        {
            replace(Message.begin(), Message.end(), 'a', 'b');
            replace(Message.begin(), Message.end(), 'A', 'B');
            replace(Message.begin(), Message.end(), 'b', 'c');
            replace(Message.begin(), Message.end(), 'B', 'C');
            replace(Message.begin(), Message.end(), 'c', 'd');
            replace(Message.begin(), Message.end(), 'C', 'D');
            replace(Message.begin(), Message.end(), 'd', 'e');
            replace(Message.begin(), Message.end(), 'D', 'E');
            replace(Message.begin(), Message.end(), 'e', 'f');
            replace(Message.begin(), Message.end(), 'E', 'F');
            replace(Message.begin(), Message.end(), 'f', 'g');
            replace(Message.begin(), Message.end(), 'F', 'G');
            replace(Message.begin(), Message.end(), 'g', 'h');
            replace(Message.begin(), Message.end(), 'G', 'H');
            replace(Message.begin(), Message.end(), 'h', 'i');
            replace(Message.begin(), Message.end(), 'H', 'I');
            replace(Message.begin(), Message.end(), 'i', 'j');
            replace(Message.begin(), Message.end(), 'I', 'J');
            replace(Message.begin(), Message.end(), 'j', 'k');
            replace(Message.begin(), Message.end(), 'J', 'K');
            replace(Message.begin(), Message.end(), 'k', 'l');
            replace(Message.begin(), Message.end(), 'K', 'L');
            replace(Message.begin(), Message.end(), 'l', 'm');
            replace(Message.begin(), Message.end(), 'L', 'M');
            replace(Message.begin(), Message.end(), 'm', 'n');
            replace(Message.begin(), Message.end(), 'M', 'N');
            replace(Message.begin(), Message.end(), 'n', 'o');
            replace(Message.begin(), Message.end(), 'N', 'O');
            replace(Message.begin(), Message.end(), 'o', 'p');
            replace(Message.begin(), Message.end(), 'O', 'P');
            replace(Message.begin(), Message.end(), 'p', 'q');
            replace(Message.begin(), Message.end(), 'P', 'Q');
            replace(Message.begin(), Message.end(), 'q', 'r');
            replace(Message.begin(), Message.end(), 'Q', 'R');
            replace(Message.begin(), Message.end(), 'r', 's');
            replace(Message.begin(), Message.end(), 'R', 'S');
            replace(Message.begin(), Message.end(), 's', 't');
            replace(Message.begin(), Message.end(), 'S', 'T');
            replace(Message.begin(), Message.end(), 't', 'u');
            replace(Message.begin(), Message.end(), 'T', 'U');
            replace(Message.begin(), Message.end(), 'u', 'v');
            replace(Message.begin(), Message.end(), 'U', 'V');
            replace(Message.begin(), Message.end(), 'v', 'w');
            replace(Message.begin(), Message.end(), 'V', 'W');
            replace(Message.begin(), Message.end(), 'w', 'x');
            replace(Message.begin(), Message.end(), 'W', 'X');
            replace(Message.begin(), Message.end(), 'x', 'y');
            replace(Message.begin(), Message.end(), 'X', 'Y');
            replace(Message.begin(), Message.end(), 'y', 'z');
            replace(Message.begin(), Message.end(), 'Y', 'Z');
            replace(Message.begin(), Message.end(), 'z', 'a');
            replace(Message.begin(), Message.end(), 'Z', 'A');
            cout << Message << endl;
            break;
        }
    }
}
The Golden Rule Of Computer Programming states: "Your computer always does exactly what you tell it to do, instead of what you want it to do".
Now let's explore what you told your computer to do.
replace(Message.begin(), Message.end(), 'a', 'b');
You told your computer to replace every occurrence of the letter 'a' with the letter 'b'. Your computer will do exactly that. Two statements later:
replace(Message.begin(), Message.end(), 'b', 'c');
Here, you told your computer to replace every occurrence of the letter 'b' with the letter 'c'. Your computer will do exactly that. This includes both the letters 'b' that were in the original text and all the letters that were originally 'a' but are now 'b'. That's because, earlier, you told your computer to change all 'a's to 'b's, so now you have a bunch of 'b's that are indistinguishable from one another.
In this manner, if you work out, on paper, what is the result of everything you told your computer to do, it becomes obvious why the resulting string always winds up with the same letter (two letters, actually, both uppercase and lowercase).
Your obvious goal here is to replace each letter by the next one, rotated. The correct approach will be fundamentally different, but this explains why the text gets replaced "all with the same letter".
Your text made me curious, so I tried another approach:
Read the input from cin as a string;
convert each char to (ASCII + offset) % 128;
then write the results to cout as chars again.
If you're interested, I can provide the code, but I don't want to take away your chance to solve it yourself (in one of the countless possible ways) :)

How to make a wrap in the output with a nested list

I have this nested list in my Output:
[['O', 'O', 'X', 'O', 'O', 'O', 'X', 'O', 'O', 'O', 'X', 'O'], ['O',
'O', 'X', 'O', 'O', 'O', 'X', 'O', 'O', 'O', 'X', 'O'], ['O', 'O',
'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']]
How can I bring it in this form:
[
['O', 'O', 'X', 'O', 'O', 'O', 'X', 'O', 'O', 'O', 'X', 'O'],
['O', 'O', 'X', 'O', 'O', 'O', 'X', 'O', 'O', 'O', 'X', 'O'],
['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
]
In other words: How can I make a wrap in a for-loop ?
For example:
for i in range(3):
    my_list.append(i)
    # How to make now a wrap ?
pprint.pprint gives a nicely formatted output:
>>> L = [['O', 'O', 'X', 'O', 'O', 'O', 'X', 'O', 'O', 'O', 'X', 'O'], ['O',
'O', 'X', 'O', 'O', 'O', 'X', 'O', 'O', 'O', 'X', 'O'], ['O', 'O',
'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']]
>>> import pprint
>>> pprint.pprint(L)
[['O', 'O', 'X', 'O', 'O', 'O', 'X', 'O', 'O', 'O', 'X', 'O'],
 ['O', 'O', 'X', 'O', 'O', 'O', 'X', 'O', 'O', 'O', 'X', 'O'],
 ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']]

c++ array returning erroneous characters

Currently having an issue with the following code returning an array with symbols appended to the end. This is the smallest I could get the code in order to reproduce the error. What is causing this issue? I assume that maybe the array is getting numbers in it somehow which get interpreted as ascii symbols, but I can't figure out where this is happening.
#include "stdafx.h"
#include <iostream>
#include <iomanip>
#include <string>
using namespace std;
const int numRow = 6;
const int numCol = 26;
char letters[numRow][numCol] = {
{ 'm', 'w', 'r', 'u', 't', 'v', 'n', 'j', 'd', 'j', 'y', 'k', 'k', 'g', 'g', 'd', 'c', 'v', 'n', 'x', 'm', 'd', 'q', 'y', 'u', 't' },
{ 'y', 'e', 'r', 'y', 'e', 't', 'w', 'y', 'u', 'w', 'r', 's', 'f', 'h', 's', 'g', 'a', 'a', 'g', 'd', 'b', 'b', 'b', 'g', 'x', 'z' },
{ 'j', 'd', 'j', 'y', 'k', 'k', 'g', 'g', 'd', 'c', 'v', 'n', 't', 'w', 'y', 'u', 'w', 'r', 's', 'f', 'h', 's', 'g', 'a', 'a', 'g' },
{ 'y', 'e', 't', 'w', 'y', 'u', 'w', 'r', 's', 'f', 'h', 'j', 'y', 'k', 'k', 'g', 'g', 'd', 'c', 'v', 'g', 'a', 'a', 'g', 'd', 'b' },
{ 'e', 'r', 'y', 'e', 't', 't', 'v', 'n', 'j', 'd', 'j', 'y', 'k', 'w', 'r', 's', 'f', 'h', 's', 'g', 'g', 'g', 'd', 'c', 'v', 'g' },
{ 'y', 'u', 'w', 'r', 's', 'f', 'h', 's', 'g', 's', 'f', 'h', 's', 'g', 'a', 'a', 'g', 'd', 'w', 'y', 'u', 'w', 'r', 's', 'f', 'h' }
};
int main()
{
    char *ltrptr;
    ltrptr = &letters[0][0];
    const int arraySize = 6 * 26;
    int answer = 0;
    cout << " Select row for sort: " << endl;
    cin >> answer;
    char newArray[numCol];
    char *ltrptr2;
    ltrptr2 = &newArray[0];
    for (int i = 0; i < numCol; i++){
        newArray[i] = letters[answer - 1][i];
    }
    cout << "Selected row: before" << newArray << endl;
    selectionSort(ltrptr2, numCol, ascending);
    cout << "Selected row: after " << newArray << endl;
    getchar();
    return 0;
}
It would be better if you actually included what output you got and what output you expected (and also the input given), rather than trying to describe the output.
But your code has an obvious error which is consistent with your vague description: you haven't put a C string in your character array, but you try to print it as if it did contain a C string.
In particular, to store a C string in a character array, you must store the sequence of characters followed by a null character, so the array needs room for one extra element.

Attempting to access 2d array as a pointer

As an exercise, I am attempting to pass a 2D array to a function and iterate through it using pointer notation. I found an example of how to do this on these forums; it's the for loop within the displayTable function. My compiler gives me errors saying that the expressions in that for loop must be of a pointer type. Why is this not working? The example was upvoted.
#include "stdafx.h"
#include <iostream>
#include <iomanip>
#include <string>
using namespace std;
const int numRow = 6;
const int numCol = 26;
char letters[numRow][numCol] = {
{ 'm', 'w', 'r', 'u', 't', 'v', 'n', 'j', 'd', 'j', 'y', 'k', 'k', 'g', 'g', 'd', 'c', 'v', 'n', 'x', 'm', 'd', 'q', 'y', 'u', 't' },
{ 'y', 'e', 'r', 'y', 'e', 't', 'w', 'y', 'u', 'w', 'r', 's', 'f', 'h', 's', 'g', 'a', 'a', 'g', 'd', 'b', 'b', 'b', 'g', 'x', 'z' },
{ 'j', 'd', 'j', 'y', 'k', 'k', 'g', 'g', 'd', 'c', 'v', 'n', 't', 'w', 'y', 'u', 'w', 'r', 's', 'f', 'h', 's', 'g', 'a', 'a', 'g' },
{ 'y', 'e', 't', 'w', 'y', 'u', 'w', 'r', 's', 'f', 'h', 'j', 'y', 'k', 'k', 'g', 'g', 'd', 'c', 'v', 'g', 'a', 'a', 'g', 'd', 'b' },
{ 'e', 'r', 'y', 'e', 't', 't', 'v', 'n', 'j', 'd', 'j', 'y', 'k', 'w', 'r', 's', 'f', 'h', 's', 'g', 'g', 'g', 'd', 'c', 'v', 'g' },
{ 'y', 'u', 'w', 'r', 's', 'f', 'h', 's', 'g', 's', 'f', 'h', 's', 'g', 'a', 'a', 'g', 'd', 'w', 'y', 'u', 'w', 'r', 's', 'f', 'h' }
};
void displayTable(char[][26]);
int main()
{
    char *ltrptr = 0;
    ltrptr = &letters[0][0];
    const int arraySize = 6 * 26;
    int answer = 0;
    char * arr[6][26];
    displayTable(letters);
    getchar();
    return 0;
}
void displayTable(char ans[][26]){
    //pass 2d array, then point to it
    cout << " The table as it stands: " << endl;
    char * ans1;
    ans1 = &ans[0][0];
    for (char * iter = &ans1[0][0]; iter != &ans1[0][0] + 6 * 26; iter++){
        cout << &ans1[0][0] << endl;
    }
}
The problem is that your ans1 is a simple pointer-to-char and you're trying to dereference it twice. The correct way to do the loop would be
for (char * iter = ans1; iter != ans1 + 6 * 26; iter++){
    cout << *iter << endl;
}