iMacros writing double quotes to HTML file, why?

I have iMacros go to a webpage and save the extraction as an HTML file. But for some reason it writes doubled quotes inside the file, which breaks the links. How do I fix this?
EXAMPLE:

Use JavaScript or EVAL to collapse each doubled quote into a single one.
The code is: text = text.replace(/""/g, '"');
http://wiki.imacros.net/EVAL
Edit:
Try this out too:
text=text.replace(/\"\"/g,'"');
Some special characters are written like this \s \n \r /

Related

Why isn't Atom recognizing my regular expressions?

I'm using Atom to format some text data for analysis (I know there are probably better ways of doing it than this so I'm all ears) but it doesn't seem to be recognizing my regular expression.
The text is POS tagged tokens with sentences being delineated with newlines, formatted as such:
good\tJJ\n
workout\tNN\n
.\t.\n
''\t''\n
\n
Perhaps\tRB\n
the\tDT\n
I was able to replace all of the tabs (\t) with a forward slash (/) no problem, but I'm now trying to replace all newlines that DON'T delineate sentences with just a space. I tried \S\n and it "wasn't found". I also tried to highlight all delineating newlines with ^\n$, but there were only two matches, both at the end of the document.
Am I doing this wrong? My only usage of regex is with Python, so maybe there's just a different way to do it in Atom.
EDIT: I'm just giving up and going to use Python to process it. Nothing suggested worked. The search function seemed to be bugging out in general (e.g. one search would not work, but if I closed the search panel and reopened it, the same search would work) because it's a long file (700,000+ lines), despite it not being a large file data-wise (6,235 KB). If anyone can recommend a text editor for large files, though, it would be appreciated.
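Since the asker ends up in Python anyway, here is a minimal sketch of the replacement described above. It assumes the tabs have already been turned into slashes, that the file uses plain LF line endings, and that sentences are separated by blank lines; the function name and the sample string are made up for illustration.

```python
import re

def join_sentence_tokens(text: str) -> str:
    """Put each sentence on one line, with tokens separated by spaces."""
    # A newline with a non-whitespace character on both sides separates two
    # tokens of the same sentence, so it becomes a space.
    joined = re.sub(r"(?<=\S)\n(?=\S)", " ", text)
    # The runs of newlines that remain are the blank-line sentence breaks;
    # collapse each run to a single newline.
    return re.sub(r"\n{2,}", "\n", joined)

# Hypothetical sample in the format from the question (tabs already replaced).
sample = "good/JJ\nworkout/NN\n./.\n''/''\n\nPerhaps/RB\nthe/DT\n"
print(join_sentence_tokens(sample))
```

If the file has Windows (CRLF) line endings, the \r before each \n counts as whitespace and the lookbehind will not match, so the line endings would need to be normalized first.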

How can I add html into VBA?

I'm trying to parse some HTML code that I'd like to paste into my VBA function. Essentially, I want to Dim a string and then let the string = [INSERT HTML HERE]. But if I try to do that, VBA treats the quotation marks and other symbols as VBA code and gives me a compile error of some sort.
I would parse the HTML first and either add a quotation mark wherever I find a quotation mark or escape all the symbols, but that would require pasting the HTML into a compiler that will have the same problem as before. I really can't figure this out!
What can be done about this? Is there a way for me to modify my regex pattern to automatically escape the symbols? I'm really at a loss. I had one idea, but I can't figure out how to do it either: if I could say string = [FILE CONTAINING HTML], that would work as well. Any suggestions, please?
Just open the HTML file and read it into a string variable. Here's an example.
http://www.vbforums.com/showthread.php?342619-Classic-VB-How-can-I-read-write-a-text-file

regexp to replace LF within quotations

I am looking for some help with a CSV file that I am trying to upload into a database. The problem is that the CSV has a quoted text field in which users have added line breaks (LF) and commas, so the database has trouble putting the data into the correct fields. What I would like to do is replace any LF within quotation marks with a space using regular expressions. I have had a look at the following question:
Seeking regex in Notepad++ to search and replace CRLF between two quotation marks ["] only
but the example shown doesn't seem to tackle the problem. If possible, can somebody please advise how I can fix this issue?
Thanks in advance.
Try this:
Find What: (\"[^"]*?)(\r\n)([^"]*?\")
Replace With: $1 $3
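If the regex route proves fragile (the pattern above only looks for \r\n, and a regex has to guess which quotes pair up), a CSV parser can do the pairing instead. This is a rough sketch using Python's csv module, not the method from the answer; the function name and the sample data are assumptions for illustration.

```python
import csv
import io
import re

def flatten_quoted_line_breaks(csv_text: str) -> str:
    """Replace line breaks inside CSV fields with a single space."""
    # newline="" keeps embedded \r\n and \n intact so the csv module sees them.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Each field arrives already unquoted; any line break left in it was
        # inside quotation marks in the source.
        writer.writerow([re.sub(r"\r\n|\r|\n", " ", field) for field in row])
    return out.getvalue()

# Hypothetical sample: the second record has an LF and a comma inside quotes.
sample = 'id,comment\n1,"first line\nsecond, line"\n2,plain\n'
print(flatten_quoted_line_breaks(sample))
```

The writer re-quotes only the fields that still need it (here, the one containing a comma), so the commas inside the text stay in their field.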
Thanks for all your help. I managed to open the file in Excel, and for the column that had the (LF) I used the formula =CLEAN(cell), which brought everything onto one line. When I then opened the same file in Notepad++, the issue was no longer there.
Thanks for taking your time to help me out, really appreciate it.

Remove empty lines in eclipse code editor by find/replace (Ctrl+F)

I want to remove all blank lines from my code using find/replace in the Eclipse code editor.
I used the regular expression \n\s*\n to find all blank lines, but I got the error "Incompatible line delimiter near index 55110" when replacing the blank lines with any string.
Why did I get this error, and how do I properly remove the blank lines? What would a working replacement string be?
Is there an Eclipse plugin for this kind of job?
I am not sure if this is the answer to your specific problem, but the solutions involving \r\n indicate that it is an incompatibility between Windows and UNIX line endings. So a simple solution is to convert the file to UNIX line delimiters:
In the Eclipse menu: File -> Convert Line Delimiters To -> Unix
You can try replacing this:
^\s*\r?\n
with the empty string.
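For anyone who wants to check what this pattern does outside Eclipse, here is a small Python sketch of the same replacement. The function name and sample text are made up; the only assumption is that Python's re.MULTILINE flag stands in for Eclipse treating ^ as start-of-line.

```python
import re

def remove_blank_lines(text: str) -> str:
    # MULTILINE makes ^ match at the start of every line. The match always
    # ends on a newline, so indentation of the next non-blank line is kept.
    return re.sub(r"^\s*\r?\n", "", text, flags=re.MULTILINE)

# Hypothetical sample mixing CRLF and LF endings, with a whitespace-only line.
sample = "int a;\r\n\r\n   \r\nint b;\n\n    int c;\n"
print(repr(remove_blank_lines(sample)))
```

Because \s also matches line breaks, a run of several blank lines is removed in a single match.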
Try using \R instead of \n.
\R matches any Unicode linebreak sequence; it is equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]
I tried your expression, and it combined some lines. I found this one to work:
\n\s*$
with a replacement of [nothing].
Can't help with the mysterious error, though. I wonder if you have a corrupt file, maybe a stray CR/LF confusion.
(As for a plugin... don't know of any, but, well, learn awk, sed, perl... they'll always serve you well for your miscellaneous text-mangling jobs.)
In response to the first part of your question about the incompatible line delimiter near index error: Eclipse seems to have an issue with replacing the given line delimiter, depending on the "Text file encoding" and "New text file line delimiter" settings.
I had an issue where a Windows application mistakenly reformatted UNIX-formatted source files, inserting CRLF wherever it saw fit. Because of the particular situation I had to replace all of the CRLFs with a space. Eclipse wouldn't allow me to do this because of that error, but grabbing the preceding and succeeding characters did the trick:
Find : (.)\r\n(.)
Replace: $1 $2
Using wjans's suggested answer:
Find : ^\s*\r?\n(.)
Replace: $1
I hope this helps with those of you still getting the incompatible line delimiter error.
This worked for me for years:
Replace: [\t ]+$
With blank
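Note that this pattern strips trailing spaces and tabs; a line that contained only whitespace becomes empty but is still there. A quick Python sketch shows the effect. It assumes LF line endings (in Python's re module, $ matches just before \n, so a trailing \r would get in the way), and the function name and sample are made up.

```python
import re

def strip_trailing_whitespace(text: str) -> str:
    # MULTILINE makes $ match at the end of every line, not just the last.
    return re.sub(r"[\t ]+$", "", text, flags=re.MULTILINE)

# Hypothetical sample: trailing spaces, a tab-only line, and a trailing tab.
sample = "a  \n\t\nb \t"
print(repr(strip_trailing_whitespace(sample)))
```

The tab-only second line ends up empty rather than removed, so a blank-line pattern would still be needed afterwards.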
I've been having this problem (or variations of it) for years and I suspect it's caused by sharing fileservers with Mac users, specifically users of Dreamweaver (graphic artists basically). It looks like it changes the files it edits (uploads?) to mixed/weird line endings that appear to be a combination of NL+CR (hex 0a0d), double-CR (0d0d) and solitary newlines (0a).
If you open the same file in vim it isn't double-spaced, BUT the lines all end with a ^M symbol.
Anyway, none of the solutions on this page worked for me but I found something that does.
You need to perform these steps in order (Eclipse 4.2.2)
1.) File -> Convert Line Delimiters To -> MacOS 9 (CR, \r)
2.) Edit -> Find / Replace (Ctrl - F)
Find: \r$
Replace: leave blank
3.) Replace All
If you don't do it in order or you mess with the file first you'll get an error about "incompatible line delimiters" like in the question.
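Outside Eclipse, mixed line endings like the ones described above can also be normalized with a short script. This is only a sketch under stated assumptions: it treats \r\n, \n\r, a lone \r and a lone \n as one line break each (so a double CR counts as two breaks, which may or may not be what such a file needs), and the function name and sample are made up.

```python
import re

def normalize_line_endings(text: str) -> str:
    # The two-character sequences come first in the alternation so that a
    # pair is replaced as a whole instead of as two separate breaks.
    return re.sub(r"\r\n|\n\r|\r", "\n", text)

# Hypothetical sample with CRLF, LF+CR, a lone CR and a lone LF.
sample = "a\r\nb\n\rc\rd\ne"
print(repr(normalize_line_endings(sample)))
```

When reading the file, open it with newline="" (or in binary mode) so Python does not translate the line endings before the regex sees them.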

In Yahoo-Pipes, how to use regex when you can't see non-printable characters and html tags?

I keep having problems extracting data with regex: my result is not what I want because there might be newlines, spaces, HTML tags, etc. in the string. Is there any way to actually see what is in the string? The debugger seems to show only the real text. How do you deal with this?
If the content of the string is HTML then debugger gives you a choice of viewing "HTML" or "Source". Source should show you any HTML tags that are there.
However if your concern is white space, this may not be enough. Your only option is to "view source" on the original page.
The best course of action is to explicitly handle these possibilities in your regex. For example, if you think you might be getting white space in your target string, use the \s* pattern in the critical positions. That will match zero or more spaces, tabs, and new lines (you must also have the "s" option checked in the regex panel for new lines).
However, without specific examples of source text and the regex you are using - advice can only be generic.
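To make the \s* advice concrete, here is a small sketch in Python rather than Yahoo Pipes. The HTML fragment and both patterns are made up for illustration. In Python's re module \s matches newlines without any extra flag, so there is no counterpart to the "s" checkbox here.

```python
import re

# Hypothetical fragment: the link text is surrounded by whitespace that a
# debugger showing only the visible text would hide.
html = '<a href="/item/1">\n   Blue Widget \t\n</a>'

# Without whitespace handling, "." cannot cross the newline, so nothing matches.
strict = re.search(r"<a[^>]*>(.*?)</a>", html)

# \s* in the critical positions absorbs the spaces, tabs and newlines.
tolerant = re.search(r"<a[^>]*>\s*(.*?)\s*</a>", html)

print(strict)
print(repr(tolerant.group(1)))
```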
What I do is use a regex tester (whichever uses the same regex engine that you are using) and I test my pattern on it. I've tried using text editors that display invisible characters but to me they only add to the confusion.
So I just go by trial and error. For instance, if a line ends in:
</a>
Then I'll try the following patterns on the regex tester until I find one that works:
</a>.
</a>..
</a>\s
</a>\s*
</a>\n
</a>\r
</a>\r\n
Etc.
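If the string can be pulled into a script, the trial and error above can be automated, and the invisible characters can be made visible at the same time. This is a rough sketch in Python, assuming the extracted text is available as a string; the sample text is made up.

```python
import re

# Hypothetical extracted string with a hidden CRLF after the closing tag.
text = "<a href='/x'>link</a>\r\nnext"

# repr() prints control characters as escapes (\r, \n, \t) instead of
# rendering them, so they can be read directly.
print(repr(text))

# Try each candidate ending and report which ones match.
candidates = [r"</a>.", r"</a>\s", r"</a>\s*", r"</a>\n", r"</a>\r", r"</a>\r\n"]
matches = {pattern: bool(re.search(pattern, text)) for pattern in candidates}
for pattern, matched in matches.items():
    print(pattern, "matches" if matched else "does not match")
```

The results depend on the regex engine, so this only helps if the tester uses the same flavor as the tool where the pattern will run.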