How to detect a tab in a text file? - c++

Is detecting tabs the same as detecting spaces? That is, to detect a space I would just compare the character with the space character's ASCII code.
For a tab, do I have to search for the '\t' character in the file, or is there some other way?

if('\t' == myChar)
This would work, and would be better than checking against 9, since 9 is the tab's value only in ASCII-compatible character sets and is not guaranteed on every platform.

Assuming you are working with ASCII data, you can just search for a byte with value '\t' (9) in the text file. Tabs are represented as a single byte in text files and most libraries for reading files don't do anything special with those bytes.

A tab is just another character so you can check for the ASCII value if you want.

Although a tab appears as 4 or 8 spaces in an editor, it is actually represented as a single character ('\t', like you mentioned) inside a file. Both the space character and the tab character take up one byte. So basically, you are correct in your assumption.
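To make the comparison above concrete, here is a minimal sketch that counts tabs by checking each character against '\t'. The function name count_tabs is made up for illustration, and it assumes the input is ASCII-compatible text.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>

// Hypothetical helper: count tab characters in a stream by comparing
// each character with '\t'.
std::size_t count_tabs(std::istream& in) {
    std::size_t tabs = 0;
    char c;
    // get() reads every character, including whitespace (operator>> would skip it).
    while (in.get(c)) {
        if (c == '\t') {
            ++tabs;
        }
    }
    return tabs;
}
```

For a real file you would pass a std::ifstream opened on it instead of a string stream.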

Related

How to replace a char with another in hexadecimal

I'm a new mainframe user. I have a file and I need to change all the dots '.' in it to spaces. I tried this command:
change X'05' X'40' all
after displaying the file in hexadecimal, but it doesn't work.
How can I change all the dots to spaces in the file, in a simple way please?
The dots are non-displayable characters. You can match them using picture strings in the ISPF editor (which is what I assume you're trying to use to edit the file?)
Try the command
change p'.' ' ' all
The "p'.'" part will match any non-displayable character and change it to a blank.
Hans's answer above will certainly change any non-displayable character to a space. However, you need to make sure you really want to change all non-displayable characters to a space. Turn HEX ON to look at the actual data. You can then do an F p'.' to find the non-displayable character(s) before changing them. Browse shows non-displayable characters as a dot. Edit, however, replaces the value with an attribute for display purposes, and this keeps you from typing over the data. You have to turn on HEX mode to manually modify the non-displayable value, or use the Change command as you were trying. Typically any hex value from x'00' to x'3F' would be non-displayable. So a
C P'.' X'40' ALL
would modify every one of those values to a space. This may or may not be desirable depending on the file.

C++ get the size (in bytes) of EOL

I am reading an ASCII text file. It is defined by the size of each field, in bytes. E.g. each row consists of 10 bytes for some string, 8 bytes for a floating point value, 5 bytes for an integer and so on.
My problem is reading the newline character, which has a variable size depending on the OS (usually 2 bytes on Windows and 1 byte on Linux, I believe).
How can I get the size of the EOL sequence in C++?
For example, in python I can do:
len(os.linesep)
The time-honored way to do this is to read a line.
Now, the last char should be \n. Strip it. Then, look at the previous character. It will either be \r or something else. If it's \r, strip it.
For Windows [ascii] text files, there aren't any other possibilities.
This works even if the file is mixed (e.g. some lines are \r\n and some are just \n).
You can tentatively do this on a few lines, just to be sure you're not dealing with something weird.
After that, you know what to expect for most of the file. But the strip method is the generally reliable way. On Windows, you could have a file imported from Unix (or vice versa).
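The strip method described above can be sketched roughly like this. The name strip_eol is hypothetical, and the sketch assumes the line was read in binary mode so that any '\r' is still present.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: remove a trailing "\n" and then a trailing "\r"
// from a raw line. Returns how many bytes were removed (0, 1, or 2).
std::size_t strip_eol(std::string& line) {
    std::size_t removed = 0;
    if (!line.empty() && line.back() == '\n') {
        line.pop_back();
        ++removed;
    }
    // Also covers lines from std::getline, which has already dropped the '\n'.
    if (!line.empty() && line.back() == '\r') {
        line.pop_back();
        ++removed;
    }
    return removed;
}
```

Because each line is checked on its own, this copes with files that mix both line-ending styles.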
I'm not sure that the translation occurs where you think it is. Look at the following code:
// Requires <cstring>, <sstream>, <string>
std::ostringstream buf;
buf << std::endl;
std::string s = buf.str();
std::size_t i = std::strlen(s.c_str());
After this, running on Windows, i == 1. So std::endl puts a single character into the stream. As others have commented, this is the "\n" character.

Scintilla: How do you find the byte position given a specific character position

Given a specific character index on a line, e.g. 10th character on line 3, is there an easy way to calculate Scintilla's 'position' of that character?
It's straightforward when using ASCII characters, but I can't see an easy way to do it when using multi-byte UTF-8 characters, where a single character may take up several byte positions.
Convert the line text to UTF-8 and then count the byte positions. Cache the conversion if multiple requests may be made.
You should start at the beginning of the string and index into the string however many bytes correspond to the character in the current position, (so that you now index the next character), and keep a count of how many characters you have seen so far. This linear-time indexing is one of the drawbacks of UTF-8. Maybe Scintilla already has a facility to do this.
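A minimal sketch of that linear scan, assuming the line text is valid UTF-8 and that "character" means a code point. The name utf8_byte_offset is hypothetical and not part of Scintilla.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: byte offset of the character with 0-based index
// `char_index` in a UTF-8 string, or text.size() if the string is shorter.
// Assumes `text` is valid UTF-8.
std::size_t utf8_byte_offset(const std::string& text, std::size_t char_index) {
    std::size_t chars_seen = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        unsigned char byte = static_cast<unsigned char>(text[i]);
        // Continuation bytes look like 10xxxxxx; anything else starts a character.
        bool starts_character = (byte & 0xC0) != 0x80;
        if (starts_character) {
            if (chars_seen == char_index) {
                return i;
            }
            ++chars_seen;
        }
    }
    return text.size();
}
```

The result is an offset within the line, so you would presumably still need to add the position where the line starts in the document.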
Did you try SCI_FINDCOLUMN?
SCI_FINDCOLUMN(int line, int column)
This message returns the position of a column on a line taking the width of tabs into account. It treats a multi-byte character as a single column. Column numbers, like lines, start at 0.

Delimiting Character

We are loading a Fixed width text file into a SAS dataset.
The character we are using to delimit multi valued field values is being interpreted as 2 characters by SAS. This breaks things, because the fields are of a fixed width.
We can use characters that appear on the keyboard, but obviously this isn't as safe, because our data could actually contain those characters.
The character we would like to use is '§'.
I'm guessing this may be an encoding issue, but don't know what to do about it.
Could you use the hex code for the character, like DLM='09'x, and change 09 to the right code?

ASCII Value for Nothing

Is there an ascii value I can put into a char in C++, that represents nothing? I tried 0 but it ends up screwing up my file so I can't read it.
ASCII 0 is null. Other than that, there are no "nothing" characters in traditional ASCII. If appropriate, you could use a control character like SOH (start of heading), STX (start of text), or ETX (end of text). Their ASCII values are 1, 2, and 3 respectively.
For the full list of ASCII codes that I used for this explanation, see this site
Sure. Use any character value that won't appear in your regular data. This is commonly referred to as a delimited text file. Popular choices for delimiters include spaces, tabs, commas, semi-colons, vertical-bar characters, and tilde.
In a C++ source file, '\0' represents a 0 byte. However, C-style strings (char arrays) are null-terminated, which means that '\0' marks the end of the string, and that may be what is messing up your file.
If you really want to store a 0 byte in a data file, you need to use some other encoding. A simplistic one would use some other character - 0xFF, for example - that doesn't appear in your data, or some length/data format or something similar.
Whatever encoding you choose to use, the application writing the file and the one reading it need to agree on what the encoding is. And that is a whole new nightmare.
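To illustrate the null-termination point, here is a small sketch showing that the same three bytes look different depending on whether you go by an explicit length or by the terminator. The function names are made up for the example.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: build the three bytes 'a', 0x00, 'b' by passing
// the length explicitly, so the 0 byte is kept as data.
std::string make_with_nul() {
    const char raw[] = {'a', '\0', 'b'};
    return std::string(raw, sizeof raw);
}

// Length as a C string function sees it: counting stops at the first 0 byte.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

Here make_with_nul().size() is 3, while c_length(make_with_nul()) is 1, so any code that treats the data as a C string only sees the "a".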
The null character '\0' still takes up a byte.
Does your software recognize the null character as an end-of-file character?
If your software is reading in this file, you can define a place holder character (one that isn't the same as data) but you'll also need to handle that character. As in, say '*' is your place-holder. You will read in the character but not add it to the structure that stores your data. It will still take up space in your file, but it won't take up space in your data structure.
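A rough sketch of that placeholder idea, assuming the placeholder (here '*') never occurs in the real data. The function name read_without_placeholder is hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read every character from `in`, but leave the
// placeholder character out of the returned data.
std::string read_without_placeholder(std::istream& in, char placeholder) {
    std::string data;
    char c;
    while (in.get(c)) {
        if (c != placeholder) {
            data.push_back(c);
        }
    }
    return data;
}
```

The placeholder still occupies a byte in the file; it is only dropped from the in-memory copy.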
Am I answering your question or missing it?
Do you mean a value you can write which won't actually change the file? The answer is no.
Maybe post a little more about what you're trying to accomplish.
It would depend on what kind of file it is and who is parsing it.