How to prevent Amazon SageMaker from splitting my .txt file into lines?

I want to create a labeling job for workers to label my text data. Each text file should be labeled as a single entity. SageMaker seems to split my text into lines so that each line is labeled separately, which does not make sense for my project. I used the Ground Truth option ‘Create a labeling job’ and could not find any configuration option to prevent the splitting.

First, replace every newline character ("\n") in your text with a <br/> tag. Then create a custom labeling job; you can start from one of the predefined templates. Inside the template tag that renders your text, add the "skip_autoescape" filter so that <br/> is rendered as a line break instead of being escaped, and the whole text appears as a single entity.
See the docs for more details:
https://docs.aws.amazon.com/sagemaker/latest/dg/sms-custom-templates-step2.html
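
The newline replacement described above could be done while building the input manifest yourself. The following is a minimal sketch, assuming each document is passed inline under the "source" key of a JSON Lines manifest; the function names manifest_line and build_manifest are hypothetical helpers, not part of any AWS SDK.

```python
import json


def manifest_line(text):
    """Return one JSON Lines manifest entry holding the whole text."""
    # Normalise Windows and old Mac line endings first, so every
    # line break becomes exactly one <br/> tag.
    normalised = text.replace("\r\n", "\n").replace("\r", "\n")
    return json.dumps({"source": normalised.replace("\n", "<br/>")})


def build_manifest(texts):
    """Build a manifest body with exactly one entry per document."""
    return "\n".join(manifest_line(t) for t in texts) + "\n"
```

Because every document ends up on a single manifest line, it is presented to workers as one task rather than one task per line of text.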

Related

I am using aws textract StartDocumentTextDetectionCommand and GetDocumentTextDetectionCommand. I want only lines to be returned, not the single words

I am building an internal OCR tool using AWS Textract and Node.js to detect text in a scanned PDF, specifically with StartDocumentTextDetectionCommand and GetDocumentTextDetectionCommand. The result currently comes back as a list of block objects, with the lines first, followed by every individual word. Is there a parameter or some other way to make it return only the lines and not the individual words in the PDF?

I would suggest using the Amazon Textract Textractor library:
pip install amazon-textract-textractor
It makes parsing and using the Textract output much easier than the raw JSON.
from textractor import Textractor
extractor = Textractor(profile_name="default")
document = extractor.detect_document_text('test.png')
print(document.lines)

No, this is not possible. There are multiple block types, lines link to words via relationships.
Is there some reason why you cannot simply select only the block types you are interested in (lines)?

The response will always contain both the lines and the words, but you can iterate over response['Blocks'] and keep only the blocks with BlockType == 'LINE'.
For example:
for block in response["Blocks"]:
    if block["BlockType"] == "LINE":
        print(block)
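
Results from GetDocumentTextDetection can be split across several pages, so the same filter usually has to run over each page. Below is a minimal sketch that assumes the response pages have already been fetched and are passed in as plain dictionaries; collect_line_texts is a hypothetical helper name.

```python
def collect_line_texts(pages):
    """Return the text of every LINE block across a list of response pages."""
    lines = []
    for page in pages:
        # Each page is expected to carry its blocks under the "Blocks" key.
        for block in page.get("Blocks", []):
            if block.get("BlockType") == "LINE":
                lines.append(block.get("Text", ""))
    return lines
```

WORD and PAGE blocks are simply skipped, so the caller only ever sees line text.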

Read-in df from csv before launching main app | Dash

I am trying to get my first dashboard with python dash running.
The whole thing is very similar to this https://github.com/dkrizman/dash-manufacture-spc-dashboard.
At the beginning, a DataFrame is read in from a CSV file. My problem seems easy to solve, but somehow I am not succeeding:
I want to create an initial window that lets the user select (e.g. from a dropdown) the CSV file (or its path) to be read in. All the .csv files have the same structure and differ only in their values.
When I use the modal components, I run into problems installing Bootstrap, and I thought there must be an easier way?
Thanks for your help!
Best,
Nik

Grabbing Text from Sublime Text Regions

I'm new to creating Sublime Text 3 Plugins and I'm trying to understand the Plugin API that's offered. I want to be able to "grab" text I highlight with my mouse and move it somewhere else. Is there a way I can do this using the Sublime Text Plugin API?
So far, all I've managed to do is create a region covering the whole buffer:
allcontent = sublime.Region(0, self.view.size())
I've tried to grab all of the text in the region and put it into a log file:
logfile = open("logfile.txt", "w")
logfile.write(allcontent)
But without success, of course, as the log file is blank after it runs.
I've searched Google and there is not much documentation apart from the unofficial documentation, in which I can't find a way to grab the text. There aren't many tutorials on this either.
Any help is greatly appreciated!

A Region just represents a region of text (i.e. from position 0 to position 10), and isn't tied to any specific view.
To get the underlying text from the view's buffer, you need to call the view.substr method with the region as a parameter.
import os
logpath = os.path.join(sublime.cache_path(), 'logfile.txt')
allcontent = self.view.substr(sublime.Region(0, self.view.size()))
with open(logpath, 'w') as logfile:
    logfile.write(allcontent)
print('written buffer contents to', logpath)
To get the region represented by the first selection, you can use self.view.sel()[0] in place of sublime.Region(0, self.view.size()).
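
Since a view can hold several selections at once, the highlighted text could also be collected from all of them. This is a minimal sketch of that idea; selected_text is a hypothetical helper, and it relies only on view.sel(), view.substr() and Region.empty().

```python
def selected_text(view, separator="\n"):
    """Join the text of every non-empty selection in the given view."""
    return separator.join(
        # Empty regions are bare carets with nothing highlighted, so skip them.
        view.substr(region) for region in view.sel() if not region.empty()
    )
```

Inside a TextCommand this would be called as selected_text(self.view), and the returned string can then be written to a file in the same way as allcontent above.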

How to edit text file data with c++

I have a program that creates a text file of stock items, containing details such as 'total production', 'stock remaining' and so on. My question is how to edit that text file from my program. For example, if I enter incorrect data by mistake (say production was 500 pieces but I entered only 400), how can I correct the file without affecting the other data?

You probably should not create a text file in the first place. Did you consider using SQLite (or indexed files à la GDBM ...) or a real database like PostgreSQL or MongoDB?
If you insist on programmatically editing a text file, the practical way is to process every line: either keep all of them in memory, or copy them (except the one you change) to a new file. There is no portable way to insert or remove content in the middle of a file.
You might also be interested in textual serialization formats like JSON, YAML (or maybe even XML).
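
The copy-to-a-new-file approach can be sketched as follows. It is written in Python for brevity, but the same sequence (read each line, change the one that matches, write everything to a new file, then rename it over the original) applies in C++. The file layout is an assumption: one item per line with comma-separated fields, the item name first. update_field is a hypothetical helper.

```python
import os
import tempfile


def update_field(path, item, column, new_value, sep=","):
    """Rewrite the file at `path`, changing one field of the row named `item`."""
    directory = os.path.dirname(os.path.abspath(path))
    changed = False
    # Write the new version next to the original so the final rename
    # stays on the same filesystem.
    with open(path, "r", newline="") as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False, newline=""
    ) as dst:
        for line in src:
            fields = line.rstrip("\r\n").split(sep)
            if fields[0] == item:
                fields[column] = str(new_value)
                line = sep.join(fields) + "\n"
                changed = True
            dst.write(line)
    # Swap the corrected copy in place of the original file.
    os.replace(dst.name, path)
    return changed
```

For example, update_field("stock.txt", "widget", 1, 500) would correct the second field of the "widget" row and leave every other row untouched.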

PDFlib first line indent

I am using PDFlib as a PDF renderer with the IDML format from InDesign as input. My question: is the parameter
LeftIndent
the only way to create an indent in a bullet list? I found in the PDFlib Cookbook that PDFlib supports LeftIndent, but I see no property equivalent to InDesign's
FirstLineIndent
What I mean is that when several lines belong to one bullet, InDesign lets me apply a so-called FirstLineIndent, which shifts only the first line of text after the bullet.
Do I have to build this functionality myself with PDFlib, or is it already implemented?

You might check out the "parindent" option. See the PDFlib API reference for details.
