Strings in Matlab - regex

I have a file of tweets that I have read into MATLAB using dataread and they're stored in a cell. I wanted to find the average number of characters in the tweets. How would I go about doing that? Here is the code I have so far:
fid=fopen('tweets.txt');
lines = dataread('file', 'tweets.txt', '%s', 'delimiter', '\n');
I was thinking I could use something along the lines of cellfun but I'm unsure how to format it. Any help would be greatly appreciated.

Try cellfun(@numel, lines); it returns the length of each line. Wrap it in mean to get the average: mean(cellfun(@numel, lines)).
btw: fid=fopen('tweets.txt'); is unnecessary if you use dataread this way. Simply delete the line.

Related

Need help in ignoring certain lines when reading a file so that I don't have to include them in the calculations using VScode

Okay, so I am writing a program that reads a file and prints all the lines in it. I don't want it to print lines that start with # and + and print all the other lines.
The code that I have written looks like this Code
The output looks like the following Output. So I want the output to print only those lines without starting with + and #.
I have stuck with this problem for a while and will really appreciate it if anyone can provide any hints as to what to do?
Thanks in Advance
Sorry I couldn't format the right coding style on stack overflow because I'm new but if you need help in understanding it don't hesitate to ask me
Regards,
Abdul Hadi

Is it possible to skip a line in a data file?

I have a data file that I am trying to input and the data is split into sections via a blank line. The data will be read in from a text file.
How do I make my code skip a blank line to read in the next piece of data? I am currently just in the planning stages of my application.
I'm a beginner so I'm not really sure how to go about this.
Can anyone advise a method on how to approach this?
I have just written it out and my code looks like this:
string ship2_id;
char ship2_journey_id[20];
float ship2_l;
int ship2_s;
getline(itinerary_file, ship2_id);
if (ship2_id.empty()) // a blank line is read as an empty string
{
itinerary_file.ignore(numeric_limits<streamsize>::max(), '\n');
}
else
getline(itinerary_file, ship2_id);
cout << ship2_id << endl;
Yes:
stream.ignore(max_number_of_chars_to_be_skipped, '\n');
I usually just use 1ul<<30 or similar for the first parameter, but note two things. This could be a DoS vector if the input is untrusted and slow to skip that many chars. The "pedantic" value is std::numeric_limits<std::streamsize>::max() (from <limits>), which ignore treats as "no limit".
I don't know what you are using to read the file, but to search for a blank line, look for two line breaks in a row. Take into account that the line-break sequence differs between operating systems. On Windows, by default, a line break is two characters used together: a carriage return followed by a line feed ("\r\n").

Reading special column while avoiding some special characters in text file in C/C++

Can somebody help me about this please? :
I have a text file in this form:
"1.101511000000E+02","-3.066300000000E+01"
"8.328840000000E+01","-7.020080000000E+01"
"1.053746000000E+02","-4.622800000000E+01"
"1.314320000000E+01","-7.866200000000E+01"
"9.876160000000E+01","-5.844720000000E+01"
"3.129990000000E+01","-7.919930000000E+01"
"7.152530000000E+01","-7.527770000000E+01"
"2.849310000000E+01","-7.933210000000E+01"
"7.602290000000E+01","-7.410480000000E+01"
It has 4003 lines. I want to read these columns while skipping the characters '"' and ','.
Then I want to read the sign (+ or -), recognize the 'E' to get the power, write my results to another file, and use that file later. Here is an example of what I need:
1.101511000000 +02 -3.066300000000 +01
And then what I have to do is for example once I get the first column :
1.101511000000
and when I know that it's power 2 I do this:
1.101511000000 x 10²
and what I will write to my new file is:
110.1511 -30.663
83.2884 -70.2008
105.3746 -46.228
etc.
So the main questions are:
1) How can I read this text file while avoiding these special characters?
2) How do I find the power written in a form such as E+02?
3) How do I do the calculations?
4) How do I avoid all the redundant 000 in the float?
5) How do I put the results in a new file at the same time?
I thank you all in advance, but please note that this is very urgent and important for me right now.
Mojdeh
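You can use fscanf to read the two numbers; its floating-point conversions parse the exponent (E+02) for you.
Something like this:
double number1, number2;
fscanf(pFile, " \"%lf\",\"%lf\"", &number1, &number2);
Then all you have to do is write it back to the second file using fprintf, like this:
fprintf(pFile2, "%.10g %.10g\n", number1, number2);
The %g format picks the shorter of fixed and scientific notation and drops trailing zeros. Its default precision is 6 significant digits, which would print 110.151 instead of 110.1511, hence the explicit .10. The leading space in the fscanf format skips the newline left over from the previous line.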
For more information on these functions go to http://www.cplusplus.com/reference/cstdio/fprintf/.
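As a sketch of the same idea applied to one line at a time, the function below uses std::sscanf and std::snprintf. It assumes every line has exactly the quoted two-column layout shown in the question. The name convert_line and the choice of 10 significant digits are illustrative assumptions:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: converts one input line such as
//   "1.101511000000E+02","-3.066300000000E+01"
// into "110.1511 -30.663".
// Returns an empty string if the line does not match that layout.
std::string convert_line(const std::string& line)
{
    double first = 0.0;
    double second = 0.0;

    // The quotes and the comma are matched literally; %lf parses the
    // mantissa, the sign and the E+02 style exponent in one step.
    if (std::sscanf(line.c_str(), " \"%lf\" , \"%lf\"", &first, &second) != 2)
        return "";

    // %.10g keeps up to 10 significant digits and drops trailing zeros.
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%.10g %.10g", first, second);
    return buffer;
}
```

Calling it for each line read with std::getline and writing the result to an std::ofstream covers reading and writing in one pass.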

How do I get specific lines from a text file in C++?

I need some help with C++
I am trying to create a program which contains exercises to practice the different German cases.
Hard-coding all questions and respective answers seems like an awful lot of work, and super inefficient.
What I want my program to do, is: grab a random line from file X, and grab the same line number from file Y. (This seems like the easiest way to get both questions and answers from external files.) To me, it seems the most logical to get a random number, and use that as a line number. But, that's about how far I got...
I know basic C++, but am very eager to learn.
Can anyone please explain to me how to pull this off, including all necessary commands?
First, I would recommend that you store questions and answers in the same text file, probably by alternating between a question line and then an answer line. This will make correcting mistakes, adding/removing questions, and general maintenance of your data easier.
But if you want to keep them in separate files, the following code snippet will read your text file in and store the questions in an array (a std::vector) which you can then index or iterate any way you'd like:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
int main()
{
    std::ifstream file("questions.txt");
    std::string line;
    std::vector<std::string> questions;
    while (std::getline(file, line))
    {
        questions.push_back( line );
    }
    // Now do something interesting with your questions. You can index them
    // like this: questions[5], or questions[random_index]
}
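To pick a matching question and answer, one option is to load both files into vectors with the loop above and then draw a single random index for both. The sketch below assumes the answers were read into a second vector in the same order. pick_random_pair is a hypothetical helper name:

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the question and the answer stored at the
// same randomly chosen index. Returns two empty strings if either vector
// is empty.
std::pair<std::string, std::string> pick_random_pair(
    const std::vector<std::string>& questions,
    const std::vector<std::string>& answers,
    std::mt19937& rng)
{
    // Only indices that exist in both vectors are valid.
    const std::size_t count = std::min(questions.size(), answers.size());
    if (count == 0)
        return {};

    std::uniform_int_distribution<std::size_t> pick(0, count - 1);
    const std::size_t index = pick(rng);
    return { questions[index], answers[index] };
}
```

Seeding the generator once, for example with std::mt19937 rng(std::random_device{}());, and reusing it for every draw avoids repeating the same sequence on each call.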
There are two ways of doing this:
If you are planning on getting question/answer pairs, you would be best off reading the whole file line by line and storing all the lines. Then you just look them up in the array.
If for some reason you only want to get one line at a time you'll have to read lines and count until you've gotten to the line you want.
You could give each line a keyword, like an ID.
That ID can be paired with both questions and answers if you have multiple files. Or just pair each question with its answer by keeping them in the same order, or even in the same file.
You are constructing a database.
You should use a database.
The problem is that the questions and answers are variable-length records, which makes positioning difficult. If all the records were the same length, you could position to a random record much faster.
In order to find a text line, you will need to read past all the other newlines (since they are not in the same column in every line). This is fine if you only need to search once, but very slow to search many times. Now comes the reason for the database.
To make finding the questions and answers faster, create an index file or table. (Starting to smell like a database.) The index file will contain records of the form [question #, file position], where file position is the position in the questions file at which the question starts.
You would load this file into memory and use it to index into the questions file. By storing the index contents in a file, you won't have to construct it from scratch each time your program starts; only when the questions file changes.

Retrieving file from .dat via getline() w/ c++

I posted this over at Code Review Beta but noticed that there is much less activity there.
I have the following code and it works just fine. Its function is to grab the input from a file and display it (to confirm that it's been grabbed). My task is to write a program that counts how many times a certain word (string) "abc" is found in the input file.
Is it better to store the input as a string or in arrays/vectors and have each line be stored separately (a[1], a[2], etc.)? Perhaps someone could also point me to a resource that I can use to learn how to filter through the input data.
Thanks.
input_file.open ("in.dat");
while (getline(input_file, STRING)) // Reads lines until getline fails at the end of the file.
{
cout << STRING << '\n'; // Prints the line; getline drops the newline, so add it back.
}
input_file.close();
Reading as much of the file as possible into memory is generally more efficient than reading one letter or text line at a time. Disk drives take a long time to spin up and seek to a sector, so your program will run faster if you minimize the number of reads from the file.
Memory is fast to search.
My recommendation is to read the entire file, or as much of it as you can, into memory, then search the memory for a "word". Remember that in English, words can contain hyphens ('-') and apostrophes (as in "don't"). Word recognition becomes more difficult if a word is split across lines or if you include abbreviations (with periods).
Good luck.