D: record separator appears to be lost after casting ubyte[] to string

I am opening a .gz file and reading it chunk by chunk to uncompress it.
The data in the uncompressed file looks like this:
aRSbRScRSd
There is a record separator (RS, ASCII code 30) between each pair of records (the records in my dummy example are a, b, c and d).
import std.stdio, std.string, std.zlib;

File file = File("mylog.gz", "r");
auto uc = new UnCompress();
foreach (ubyte[] curChunk; file.byChunk(4096*1024))
{
    auto uncompressed = cast(string)uc.uncompress(curChunk);
    writeln(uncompressed);
    auto stringRange = uncompressed.splitLines();
    foreach (string line; stringRange)
    {
        // ***************** Do something with line
    }
}
The result of the code above is:
abcd
Unfortunately, the record separators (ASCII 30) are missing.
By examining the data, I found that the record separators go missing after I cast ubyte[] to string.
Now I have two questions:
What should I change in the code to keep the record separators?
How can I write the code above without the for loops? I want to read line by line.
Edit
A more general and understandable code for first question :
ubyte[] temp = [ 65, 30, 66, 30, 67];
writeln(temp);
string tempStr = cast(string) temp;
writeln (tempStr);
The result is ABC, which is not what I want.

Character 30 is not a printable character, although some editors may show a symbol in its place. It is not being lost; it just doesn't show up when printed. You can confirm this by checking the length of the string: tempStr.length is still 5 after the cast.
Also note that casting a ubyte[] to string is usually incorrect, because a ubyte[] is mutable while a string (an alias for immutable(char)[]) is immutable. It is better to cast a ubyte[] to a char[].

Related

Index out of bounds error using Regex Split

I'm posting another question here because the people who answered my last one were extremely helpful. Bear in mind, I'm relatively new to VB.net.
I'm working on a program that pulls the first and third columns out of a text file, using Regex.Split to eliminate the multiple spaces between the alphanumeric fields in the file.
A high level example of what the text file looks like is here:
VARIABLE1 MEAS1 STORAGE1
VARIABLE2 MEAS2 STORAGE2
VARIABLE3 MEAS3 STORAGE3
VARIABLE4 MEAS4 STORAGE4
VARIABLE5 MEAS5 STORAGE5
VARIABLE6 MEAS6 STORAGE6
#VARIABLE7 MEAS7 STORAGE7
VARIABLE8 MEAS8 STORAGE8
VARIABLE9 MEAS9 STORAGE9
VARIABLE10 MEAS10 STORAGE10
VARIABLE11 MEAS11 STORAGE11
VARIABLE12 MEAS12 STORAGE12
VARIABLE13 MEAS13 STORAGE13
VARIABLE14 MEAS14 STORAGE14
The file uses "#" to denote comments, so in my code I skip any line containing that character.
However, when I wrote a test function to try this, I kept getting an Index out of bounds error, but only on some files. Other files in this format work fine, for some reason.
Looking through the execution output, I receive the error after it writes the "STORAGE6" line, so there has to be an error going from STORAGE6 to VARIABLE7, and I can't quite figure it out. Any insight on this would be extremely appreciated!
The test function I have written is below:
Public Function Testing()
    OpenFileDialog1.ShowDialog()
    Dim file = System.IO.File.ReadAllLines(OpenFileDialog1.FileName)
    For Each line In file
        Dim arrWords() As String = System.Text.RegularExpressions.Regex.Split(line, "\s+")
        Dim upBound = arrWords.GetUpperBound(0)
        If upBound <> 0 Then
            If line.Contains("#") Or line.Length = 0 Then
            Else
                Console.WriteLine(arrWords(0) + " " + arrWords(2))
            End If
        End If
    Next
End Function
I get the out of bounds error when calling "arrWords(2)," which I'm sure was pretty obvious, but just trying to make the question as detailed as possible.
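The simple fix is changing these two lines:
If upBound <> 0 Then
If line.Contains("#") Or line.Length = 0 Then
like this:
If upBound >= 2 Then
If line.TrimStart().StartsWith("#") OrElse String.IsNullOrWhiteSpace(line) Then
arrWords(2) only exists when the split produced at least three fields, so the upper bound has to be at least 2 before you index it.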
But I'd really do something more like this:
Public Class DataItem
    Public Property Variable As String
    Public Property Measure As String
    Public Property Storage As String
End Class

' Reads the file lazily, skips comments and blank lines,
' and keeps only the lines that split into exactly three fields.
Public Function ReadDataFile(fileName As String) As IEnumerable(Of DataItem)
    Return System.IO.File.ReadLines(fileName).
        Where(Function(line) Not line.TrimStart().StartsWith("#") AndAlso Not String.IsNullOrWhiteSpace(line)).
        Select(Function(line) System.Text.RegularExpressions.Regex.Split(line, "\s+")).
        Where(Function(fields) fields.Length = 3).
        Select(Function(fields)
                   Return New DataItem With {
                       .Variable = fields(0),
                       .Measure = fields(1),
                       .Storage = fields(2)}
               End Function)
End Function

Public Sub Testing()
    If OpenFileDialog1.ShowDialog() = DialogResult.OK Then
        Dim records = ReadDataFile(OpenFileDialog1.FileName)
        For Each record In records
            Console.WriteLine($"{record.Variable} {record.Storage}")
        Next
    End If
End Sub

not getting proper string in assign function in c++ when trying to get data from file

data.assign( content,
(content.find_first_of("= ")) + 3,
content.find_first_of('\n')-content.find_first_of("= ") + 3);
I am getting a string of the wrong length when I read a line from a file and compare it with a local string in my code. What check have I missed? The line may end with '\n' or '\0' (at the end of the file). I am trying to get the data that follows the '=' character, up to the end of that line. The file can contain one line or several; I just need the data up to '\n' or '\0'.
content : Scenario = sampl_verification
data : sampl_verification
data size : 19
data length : 19
cdata : sampl_verification
cdata size : 18
cdata length : 18
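Use std::getline instead of std::string::assign. std::getline reads up to the next '\n' (or to the end of the input if there is no newline), so you do not have to compute a length yourself.
The extra character most likely comes from the length argument: content.find_first_of('\n') - content.find_first_of("= ") + 3 adds 3 where it should subtract the 3 characters you skipped, so the count runs past the end of the line and the trailing '\n' ends up in data.
For more info, see the reference pages for std::getline and std::string::assign.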
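A minimal sketch of the std::getline approach, assuming the line has the form "key = value" as in the question. The helper name value_after_equals and the set of characters it trims are assumptions made for illustration, not part of the original code:

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: returns the text after the first '=' on the first
// line of `content`, with surrounding spaces, tabs and '\r' removed.
// Returns an empty string if there is no '=' or nothing after it.
std::string value_after_equals(const std::string& content) {
    // std::getline stops at '\n' or at the end of the input,
    // so no length arithmetic is needed to isolate the line.
    std::istringstream in(content);
    std::string line;
    std::getline(in, line);

    const std::size_t eq = line.find('=');
    if (eq == std::string::npos) return "";

    const char* blanks = " \t\r";
    const std::size_t first = line.find_first_not_of(blanks, eq + 1);
    if (first == std::string::npos) return "";

    // `last` cannot be before `first`, because line[first] is not a blank.
    const std::size_t last = line.find_last_not_of(blanks);
    return line.substr(first, last - first + 1);
}
```

Called with the content shown in the question, value_after_equals(content) yields a string of size 18 that compares equal to "sampl_verification", whether or not the line ends with '\n'.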

C++ Reading a text file backwards from the end of each line up until a space

Is it possible to read a text file backwards from the end of each line up until a space? I need to be able to output the numbers at the end of each line. My text file is formatted as follows:
1 | First Person | 123.45
2 | Second Person | 123.45
3 | Third Person | 123.45
So my output would be, 370.35.
Yes. But in your case, it's most likely more efficient to simply read the whole file and parse out the numbers.
You could do something like this (I'm writing this in pseudocode, so you have to actually write the real code, since that's how you learn):
pos = file size - 1
flag = true                  // true while collecting the number at the end of a line
while (pos >= 0)
{
    seek to pos
    read a char from file
    if (char == newline)
    {
        flag = true          // the end of the previous line starts a new number
        clear string
    }
    else if (char == space)
    {
        if (flag) reverse string, fetch out the number and add it to sum
        flag = false         // ignore the rest of this line
    }
    else if (flag)
    {
        add char to string   // chars arrive in reverse order
    }
    pos--
}

Writing cell array into a text file

I'm having a hard time writing a cell array to a text file. If anyone can assist me with this, it would be highly appreciated.
Lets say my cell array is C =
[1x5 double] [0.1962] [1x3 double] [2x3 double]
>> C{:}
ans =
0.9864 0.8223 0.1952 0.0121 0.0012
ans =
0.1962
ans =
0.9864 0.2448 0.0014
ans =
0.9864 0.2448 0.0014
0.9863 0.2448 0.0014
I want to print this to a text file in the same format as shown above, without the 'ans' lines. I use fprintf, but I get all the output in a single row.
[nrows, ncols] = size(C);
fid = fopen(saveDataName, 'w');
for row = 1:nrows
    fprintf(fid, '%12.4f', C{row,:});
end
fclose(fid);
Can anyone help me with this?
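The printf family doesn't implicitly add a newline, so you need to end your format string with '\n'. If you want a newline every so many data points, as in your example, just inject fprintf(fid, '\n') as necessary in the loop. Keep in mind that fprintf reuses the format for every element it is given and walks through a matrix column by column, so a 2x3 matrix has to be printed one row at a time to keep its layout.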

Retrieve particular parts of string from a text file and save it in a new file in MATLAB

I am trying to retrieve particular parts of each line in a text file such as the one below, and I would like to save them in a new text file using MATLAB.
Original text file
D 1m8ea_ 1m8e A: d.174.1.1 74583 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=74583
D 1m8eb_ 1m8e B: d.174.1.1 74584 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=74584
D 3e7ia1 3e7i A:77-496 d.174.1.1 158052 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=158052
D 3e7ib1 3e7i B:77-496 d.174.1.1 158053 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=158053
D 2bhja1 2bhj A:77-497 d.174.1.1 128533 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=128533
Basically, I would like to retrieve the PDB code (e.g. "1m8e"), the chain ID (e.g. "A"), the start value (e.g. "77") and the stop value (e.g. "496"), and I would like all of these values to be written out by an fprintf statement.
Is there a way to use a regular expression that states which position each part starts at, and retrieves those strings based on their position in each line of the text file?
In the end, all I want to have in the fprintf statement is 1m8e, A, 77, 496.
So far I have two fopen calls, one that reads a file and one that writes to a new file, a loop that reads the input line by line, and an fprintf statement:
pdbcode = '';
chainid = '';
start = '';
stop = '';
fin = fopen('dir.cla.scop.txt_1.75.txt', 'r');
fout = fopen('output_scop.txt', 'w');
% TODO: Add error check!
while true
    line = fgetl(fin); % Get the next line from the file
    if ~ischar(line)
        % End of file
        break;
    end
    % Print result into output_scop.txt file
    fprintf(fout, 'INSERT INTO cath_domains (scop_pdbcode, scop_chainid, scopbegin, scopend) VALUES("%s", %s, %s, %s);\n', pdbcode, chainid, start, stop);
end
fclose(fin);
fclose(fout);
Thank you.
You should be able to strsplit on whitespace, get the third ("1m8e") and fourth ("A:77-496") elements, then repeat the process on the fourth element using ":" as the split character, and then again on the second of those two parts using "-" as the split character. That's one approach. Note that strsplit returns a cell array, so the elements are read with curly braces. For example, you could do:
% split on whitespace; consecutive delimiters are collapsed by default
tokens = strsplit(strtrim(line));
pdbcode = tokens{3};
% split fourth token from previous split on colon
parts = strsplit(tokens{4}, ':');
chainid = parts{1};
% split second part from previous split on dash
% (lines such as "A:" have no range here and need separate handling)
range = strsplit(parts{2}, '-');
start = range{1};
stop = range{2};
If you really wanted to use regular expressions, you could try the following:
pattern = '\S+\s+\S+\s+(\S+)\s+([A-Za-z]+):([0-9]+)-([0-9]+)';
% with 'once', tok is a cell array holding the four captured strings
tok = regexp(line, pattern, 'tokens', 'once');
if ~isempty(tok) % lines without a start-stop range do not match
    pdbcode = tok{1};
    chainid = tok{2};
    start = tok{3};
    stop = tok{4};
end