I'm having hard time with writing a cell array into a text file. If anyone can assist me with this,it would be highly appreciated;
Lets say my cell array is C =
[1x5 double] [0.1962] [1x3 double] [2x3 double]
>> C{:}
ans =
0.9864 0.8223 0.1952 0.0121 0.0012
ans =
0.1962
an s =
0.9864 0.2448 0.0014
ans =
0.9864 0.2448 0.0014
0.9863 0.2448 0.0014
I want to print this on a text file in the same format as we see above without 'ans'; I use fprintf and I get all the output in a single row.
[nrows ncols]=size(C);
fid = fopen(saveDataName, 'w');
for row=1:nrows
fprintf(fid, '%12.4f', C{row,:});
fclose(fid);
Can anyone help me with this?
The printf family doesn't implicitly add a newline, so you need to end your format string with "\n". If you want a newline every so many data points, as in your example, just inject fprintf(fid,"\n") as necessary in the loop.
Related
I have more than 10k text files look similar like this, all of them are similar in format but not in size, sometime is bigger or smaller.
[{u'language': u'english', u'area': 3825.8953168044045, u'class': u'machine printed', u'utf8_string': u'troia', u'image_id': 428035, u'box': [426.42422762784093, 225.33333055900806, 75.15151515151516, 50.909090909090864], u'legibility': u'legible', u'id': 1056659}, {u'language': u'na', u'area': 24201.285583103767, u'id': 1056660, u'image_id': 428035, u'box': [223.99998520359847, 249.57575480143228, 172.12121212121215, 140.6060606060606], u'legibility': u'illegible', u'class': u'machine printed'}]
I want to extract two changeable variable in every text using regular expression.
The output should be like this
box = [223.99998520359847, 249.57575480143228, 172.12121212121215, 140.6060606060606]
box1 = .. sometime there is more than one
&
second output
word = troia
word1 = ... sometime there is more than one word
My code 1: for the word extraction
fid = fopen('text1.txt','r');
C = textscan(fid, '%s','Delimiter','');
fclose(fid);
C = C{:};
Lia = ~cellfun(#isempty, strfind(C,'utf8_string'));
output = [C{find(Lia)}];
expression = 'u''utf8_string'': u+'
matchStr = regexp(output, expression,'match');
My code 1 result give me only the
utf8_string
My code 2: for the box number extraction
s = sprintf('text_.txt');
fid = fopen(s);
tline = fgetl(fid);
C = regexp(tline,'u''box'': +\[([0-9\. ,]+)\]','tokens');
C = cellfun(#(x) x{1},C,'UniformOutput',false)';
M = cell2mat(cellfun(#(x) x', cat(1,C2{:}),'UniformOutput',false));
This code 2 is running but not with every text something i got this error
Error using cat Dimensions of matrices being concatenated are not consistent
If you do not insist on regexp: The input strings looks like json, so the following short code does even more than you want:
% Read the whole file
s = fileread('test.txt');
% Remove the odd u'
s = strrep(s, 'u''', '''');
% Replace ' by "
s = strrep(s, '''', '"');
% See http://www.mathworks.com/matlabcentral/fileexchange/20565
t = parse_json(s);
Now t a is cell object containing structs with the data. So
word = t{1}.utf8_string;
box = cell2mat(t{1}.box);
will give you the first word and box. If you have a newer Matlab version you can probably use jsondecode instead of parse_json.
The basic outline of this problem is to read the file, look for integers using the re.findall(), looking for a regular expression of [0-9]+ and then converting the extracted strings to integers and summing up the integers.
I am finding trouble in appending the list. From my below code, it is just appending the first(0) index of the line. Please help me. Thank you.
import re
hand = open ('a.txt')
lst = list()
for line in hand:
line = line.rstrip()
stuff = re.findall('[0-9]+', line)
if len(stuff)!= 1 : continue
num = int (stuff[0])
lst.append(num)
print sum(lst)
import re
ls=[];
text=open('C:/Users/pvkpu/Desktop/py4e/file1.txt');
for line in text:
line=line.rstrip();
l=re.findall('[0-9]+',line);
if len(l)==0:
continue
ls+=l
for i in range(len(ls)):
ls[i]=int(ls[i]);
print(sum(ls));
Great, thank you for including the whole txt file! Your main problem was in the if len(stuff)... line which was skipping if stuff had zero things in it and when it had 2,3 and so on. You were only keeping stuff lists of length 1. I put comments in the code but please ask any questions if something is unclear.
import re
hand = open ('a.txt')
str_num_lst = list()
for line in hand:
line = line.rstrip()
stuff = re.findall('[0-9]+', line)
#If we didn't find anything on this line then continue
if len(stuff) == 0: continue
#if len(stuff)!= 1: continue #<-- This line was wrong as it skip lists with more than 1 element
#If we did find something, stuff will be a list of string:
#(i.e. stuff = ['9607', '4292', '4498'] or stuff = ['4563'])
#For now lets just add this list onto our str_num_list
#without worrying about converting to int.
#We use '+=' instead of 'append' since both stuff and str_num_lst are lists
str_num_lst += stuff
#Print out the str_num_list to check if everything's ok
print str_num_lst
#Get an overall sum by looping over the string numbers in the str_num_lst
#Can convert to int inside the loop
overall_sum = 0
for str_num in str_num_lst:
overall_sum += int(str_num)
#Print sum
print 'Overall sum is:'
print overall_sum
EDIT:
You are right, reading in the entire file as one line is a good solution, and it's not difficult to do. Check out this post. Here is what the code could look like.
import re
hand = open('a.txt')
all_lines = hand.read() #Reads in all lines as one long string
all_str_nums_as_one_line = re.findall('[0-9]+',all_lines)
hand.close() #<-- can close the file now since we've read it in
#Go through all the matches to get a total
tot = 0
for str_num in all_str_nums_as_one_line:
tot += int(str_num)
print('Overall sum is:',tot) #editing to add ()
I am opening a .gz file and reading it chunk by chunk for uncompressing it.
The data in the uncompressed file is like :
aRSbRScRSd, There are record separators(ASCII code 30) between each record (records in my dummy example a,b,c).
File file = File(mylog.gz, "r");
auto uc = new UnCompress();
foreach (ubyte[] curChunk; file.byChunk(4096*1024))
{
auto uncompressed = cast(string)uc.uncompress(curChunk);
writeln(uncompressed);
auto stringRange = uncompressed.splitLines();
foreach (string line; stringRange)
{
***************** Do something with line
The result of the code above is:
abcd unfortunately record separators(ASCII 30) are missing.
I realized by examining the data record separators are missing after I cast ubyte[] to string.
Now I have two questions:
What should I change in the code to keep record separator?
How can I write the code above without for loops? I want to read line by line.
Edit
A more general and understandable code for first question :
ubyte[] temp = [ 65, 30, 66, 30, 67];
writeln(temp);
string tempStr = cast(string) temp;
writeln (tempStr);
Result is : ABC which is not desired.
The character 30 is not a printable character although some editors may show a symbol in its place. It is not being lost, but it doesn't print out.
Also note that casting a ubyte[] to string is usually incorrect because a ubyte[] array is mutable while a string is immutable. It is better to cast a ubyte[] to a char[].
I am beginner for real programming and have the ff problem
I want to read many instances stored in a file/csv/txt/excel
like the folloing
find<S>ing<G>s<p>
Then when I read this file it goes through each character and start from the six position and continue until the 11 position-the max size of a single row is 12
-,-,-,-,-,f,i,n,d,i,n,0
-,-,-,-,f,i,n,d,i,n,g,0
-,-,-,f,i,n,d,i,n,g,s,0
-,-,f,i,n,d,i,n,g,s,-,S//there is an S value next to the letter d
-,f,i,n,d,i,n,g,s,-,-,0
f,i,n,d,i,n,g,s,-,-,-,0
i,n,d,i,n,g,s,-,-,-,-,G // there is a G value here at th end of g
n,d,i,n,g,s,-,-,-,-,-,P */// there is a P value here at th end of s
Here is the code that I tried in python. but can be possible in c++, java, dotNet.
import sys
import os
f = open('/home/mm/exprimentdata/sample3.csv')// can be txt file
string = f.read()
a = []
b = []
i = 0
while (i < len(string)):
if (string[i] != '\n '):
n = string[i]
if (string[i] == ""):
print ' = '
if (string[i] = upper | numeric)
print rep(char).rjust(12),delimiter=','
a.append(n)
i = (i+1)
print (len(a))
print a
my question is how can I compare each string and assign a single char at the rightmost part (position 12 like above G,P,S)
how can I push one step back after aligning the first row?
how can i fix the length
please anyone see fragment and adjust to solve the above case
I don't understand your question.
But some advice:
Firstly, you should be closing the file after you open it.
f = open('/home/mm/exprimentdata/sample3.csv')// can be txt file
string = f.read()
**f.close()**
Secondly, your indentation is problematic. Whitespace matters in Python. (Maybe your real code is indented properly and it's just a StackOverflow thing.)
Thirdly, instead of using a while loop and incrementing, you should be writing:
for i range(len(string)):
# loop code
Fourthly, this line will never evaluate to True:
if (string[i] == ""):
string[i] will always be some character (or cause an out of bounds error).
I advise you read a Python tutorial before you try and write this program.
I am trying to read a file that looks as follows:
Data Sampling Rate: 256 Hz
*************************
Channels in EDF Files:
**********************
Channel 1: FP1-F7
Channel 2: F7-T7
Channel 3: T7-P7
Channel 4: P7-O1
File Name: chb01_02.edf
File Start Time: 12:42:57
File End Time: 13:42:57
Number of Seizures in File: 0
File Name: chb01_03.edf
File Start Time: 13:43:04
File End Time: 14:43:04
Number of Seizures in File: 1
Seizure Start Time: 2996 seconds
Seizure End Time: 3036 seconds
So far I have this code:
fid1= fopen('chb01-summary.txt')
data=struct('id',{},'stime',{},'etime',{},'seizenum',{},'sseize',{},'eseize',{});
if fid1 ==-1
error('File cannot be opened ')
end
tline= fgetl(fid1);
while ischar(tline)
i=1;
disp(tline);
end
I want to use regexp to find the expressions and so I did:
line1 = '(.*\d{2} (\.edf)'
data{1} = regexp(tline, line1);
tline=fgetl(fid1);
time = '^Time: .*\d{2]}: \d{2} :\d{2}' ;
data{2}= regexp(tline,time);
tline=getl(fid1);
seizure = '^File: .*\d';
data{4}= regexp(tline,seizure);
if data{4}>0
stime = '^Time: .*\d{5}';
tline=getl(fid1);
data{5}= regexp(tline,seizure);
tline= getl(fid1);
data{6}= regexp(tline,seizure);
end
I tried using a loop to find the line at which file name starts with:
for (firstline<1) || (firstline>1 )
firstline= strfind(tline, 'File Name')
tline=fgetl(fid1);
end
and now I'm stumped.
Suppose that I am at the line at which the information is there, how do I store the information with regexp? I got an empty array for data after running the code once...
Thanks in advance.
I find it the easiest to read the lines into a cell array first using textscan:
%// Read lines as strings
fid = fopen('input.txt', 'r');
C = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
and then apply regexp on it to do the rest of the manipulations:
%// Parse field names and values
C = regexp(C{:}, '^\s*([^:]+)\s*:\s*(.+)\s*', 'tokens');
C = [C{:}]; %// Flatten the cell array
C = reshape([C{:}], 2, []); %// Reshape into name-value pairs
Now you have a cell array C of field names and their corresponding (string) values, and all you have to do is plug it into struct in the correct syntax (using a comma-separated list in this case). Note that the field names have spaces in them, so this needs to be taken care of before they can be used (e.g replace them with underscores):
C(1, :) = strrep(C(1, :), ' ', '_'); %// Replace spaces with underscores
data = struct(C{:});
Here's what I get for your input file:
data =
Data_Sampling_Rate: '256 Hz'
Channel_1: 'FP1-F7'
Channel_2: 'F7-T7'
Channel_3: 'T7-P7'
Channel_4: 'P7-O1'
File_Name: 'chb01_03.edf'
File_Start_Time: '13:43:04'
File_End_Time: '14:43:04'
Number_of_Seizures_in_File: '1'
Seizure_Start_Time: '2996 seconds'
Seizure_End_Time: '3036 seconds'
Of course, it is possible to prettify it even more by converting all relevant numbers to numerical values, grouping the 'channel' fields together and such, but I'll leave this to you. Good luck!