I started a news website for a specific business sector a year ago. The website lists news, and every post has a featured image. Unfortunately, about 1,500 news items have been posted in a year and the website now takes up 1.07 GB of space. This seemed totally insane to me, since Joomla itself had been only a few megabytes and I made no big additions of my own (like files or graphics).
I made a HUGE mistake. I trusted the JoomlaWorks guys and installed K2. The main reason I did this was that the default Joomla article manager did not offer a featured image for each post. But this was added in the new 3.0 version!
K2 does something extremely foolish. If you save a 2 MB photo, it keeps the original and saves four additional copies, one for each size (Small, Medium, Large, XL). Insanely, you upload a 2 MB image and it ends up occupying 4 MB of space!
The hosting provider gives me 2 GB of space to store my files. I have started losing sleep at night because the space used grows day by day, and if it goes beyond 2 GB I will have to upgrade the hosting plan, which I cannot afford.
I believe I have three choices:
Move all items, categories and images from K2 back to Joomla articles, which is much faster, and then upgrade to version 3.0, which supports featured images. This seems extremely difficult and I don't know if it's possible at all. Even if I move all the table rows from K2 to Joomla, I don't feel comfortable doing that with 1,500 of them, and the image paths are not saved in the database. Chaos.
Move everything to WordPress. I have no idea how to do that at all.
Compress the images that are in the cache, or find a way to stop K2 from creating them.
K2 saves the images in two different folders. One folder holds exclusively the originals, and the other folder holds all the resized versions. Technically you can just delete the folder with the originals, because those are not the ones used in the articles or anywhere else on the website. Let's not speak poorly of K2 for saving the originals; I think it's a good feature. I once needed to go into that folder on my host to find a file that had been deleted from my computer. You could also easily use that folder in the future to rebuild all the resized files, in case you ever want to change the sizing in your layout.
I would just back up that folder every once in a while and delete the copy on your host. That should save a lot of space. You can also set an option in the back-end to lower the quality of the resized files so they don't take up so much space; at 70-80% the photo quality is still great.
Why do you think that creating small, medium and big images is extremely bad? Do you actually show the image anywhere at a smaller preview size? If so, this is a wise way to do it.
If you really do not use any of the smaller images, I would recommend going line by line through the K2 plugin (or whatever it is), finding exactly the lines that save these additional images, and commenting them out.
Just one other thing: how did you end up with 2 MB images for a news site? In my opinion these must be really high-resolution images, because a normal size is more like 300 KB.
In the folder www__TemplateName__\media\k2\items you will see two subfolders, "cache" and "src"; the latter one is for the source files. Its contents can safely be moved out to a local drive once a month. That said, if you end up with 1,500 news items, the database will take a lot of space too, and most hosts count database space against your quota as well. You won't be able to do ANYTHING about that; you just can't throw away the DB...
Then, most likely you have an email server on the same host, even though you (probably) don't use it. If you have 1,500 news items in one year, I can imagine how much spam ends up in your mail folder, and that also comes out of your 2 GB at the host... Check your mail folder and delete everything you don't need there...
You say "I need an answer from a Joomla expert here", but a Joomla expert won't tell you much; a K2 expert is needed. And the answer was already given: reduce the quality of the images cached by K2 to 70%. It will do just great, saving lots of space with no visible quality drop, and the setting is set once and works for all authors...
For the DB, I'd highly recommend installing http://extensions.joomla.org/extensions/access-a-security/site-security/site-protection/14087 and then clicking Clean Temp and Repair Tables; that helps too.
Another option is to batch-resize the files in K2's originals folder; there are tons of different scripts for that all over the internet. Run one there from time to time and the crazy big files from your users will shrink unbelievably!
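In case a ready-made script isn't handy, here is a minimal sketch of such a batch pass in C++17 with OpenCV; the folder path, the maximum width and the JPEG quality are placeholders to adapt, and it overwrites files in place, so run it on a backup copy first:

```cpp
// Minimal sketch: shrink every oversized JPEG in a folder and re-save it
// at ~75% quality. Folder path, max width and quality are placeholders.
// NOTE: this overwrites files in place, so work on a backup copy.
#include <filesystem>
#include <iostream>
#include <opencv2/opencv.hpp>

int main() {
    const std::filesystem::path folder = "media/k2/items/src";  // hypothetical path
    const int maxWidth = 1200;    // shrink anything wider than this
    const int jpegQuality = 75;   // 70-80% is usually still fine visually

    for (const auto& entry : std::filesystem::directory_iterator(folder)) {
        if (entry.path().extension() != ".jpg") continue;

        cv::Mat img = cv::imread(entry.path().string());
        if (img.empty()) continue;                       // skip unreadable files

        if (img.cols > maxWidth) {                       // downscale proportionally
            double scale = static_cast<double>(maxWidth) / img.cols;
            cv::resize(img, img, cv::Size(), scale, scale, cv::INTER_AREA);
        }
        cv::imwrite(entry.path().string(), img,
                    {cv::IMWRITE_JPEG_QUALITY, jpegQuality});
        std::cout << "Re-saved " << entry.path() << "\n";
    }
}
```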
But most of all: a 2 GB hosting plan these days? That's crazy low. In my case $50 per year gets me 6 GB, and that's not the cheapest host around... So change your host!
My Mac's hard drive failed and I had a replacement drive installed. All R-related programs were reinstalled as if my computer were brand new.
Prior to the failure, if I inserted a graphic into an R Markdown document, the rendered size of the graphic was consistent and directly related to the actual size of the screenshot image.
Now, relatively small screen captures render large. I am not certain what I need to do to fix this situation. This is a book that will soon go to publication, and as it stands the publication is delayed unless I am able to somehow return to 'normal.' A Time Machine restore was attempted, but it is out of the question for reasons I cannot go into; the several Apple senior advisors I spoke with do not recommend that option either. So I am looking for another solution, hopefully one that does not involve individually resizing 1,200+ text images.
My application stores MANY MANY images in S3 - we use Rails 5.2 ActiveStorage for that. The images are used a lot for 6 to 9 months. Then they are used VERY rarely until they are 15 months old and deleted automatically by ActiveStorage.
To save some money I'd like to move the files from S3 Standard to S3 Infrequent Access (S3-IA) 9 months after file creation (this can be done automatically in AWS).
My question is: will ActiveStorage still be able to find/display an image in S3-IA in the rare case someone wants to see it? And will ActiveStorage still be able to find the file in order to delete it at 15 months? Bottom line: I don't want ActiveStorage to lose track of a file when it goes from S3 Standard to S3-IA.
S3-IA just changes the pricing of an object. It doesn't change the visibility of the object, or the time needed to retrieve it (unlike GLACIER storage class).
One thing to be aware of is that IA pricing is based on a minimum object size of 128 KB. If you have a lot of objects that are smaller, your costs may actually increase if you store them as IA.
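For reference, the automatic move the question mentions is normally an S3 lifecycle rule. A rough example using the question's 9-month figure (the rule ID and prefix are placeholders), applied with `aws s3api put-bucket-lifecycle-configuration --bucket your-bucket --lifecycle-configuration file://lifecycle.json`:

```json
{
  "Rules": [
    {
      "ID": "move-old-uploads-to-ia",
      "Filter": { "Prefix": "" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 270, "StorageClass": "STANDARD_IA" }
      ]
    }
  ]
}
```

The transition changes only the storage class; the object keeps its key, which is why it should stay findable, as the next answer notes.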
I haven’t tested, but Active Storage should be able to find the object as long as its name doesn’t change.
I'm working on an academic project (a search engine). The main functions of this search engine are:
1. Crawling
2. Storing
3. Indexing
4. Page ranking
All the sites that my search engine will crawl are available locally, which means it's an intranet search engine.
After the crawler stores the files it finds, these files need to be served quickly for caching purposes.
So I wonder: what is the fastest way to store and retrieve these files?
The first idea that came up was to use FTP or SSH, but these are connection-based protocols; the time to connect, search for the file and fetch it is lengthy.
I've already read Google's "Anatomy" paper and saw that they use a data repository. I'd like to do the same, but I don't know how.
NOTES: I'm using Linux/Debian, and the search engine back-end is coded in C/C++. HELP!
Storing individual files is quite easy - wget -r http://www.example.com will store a local copy of example.com's entire (crawlable) content.
Of course, beware of generated pages, where the content is different depending on when (or from where) you access the page.
Another thing to consider is that maybe you don't really want to store all the pages yourself, but just point to the site that actually contains them; that way, you only need to store a reference to which page contains which words, not the entire page. Since a lot of pages will have much repeated content, you really only need to store the unique words in your database together with a list of the pages that contain each word. (If you also filter out words that occur on nearly every page, such as "if", "and", "it", "to", "do", etc., you can reduce the amount of data you need to store.) Count the occurrences of each word on each page, then compare different pages to find the words that are meaningless to search for.
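To make the "store only which pages contain which words" idea concrete, here is a rough C++ sketch of an inverted index; the stop-word list, the tokenization and the page IDs are simplified placeholders:

```cpp
// Rough sketch of an inverted index: for each word, remember only the set of
// page IDs that contain it, rather than storing whole pages.
#include <iostream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>

using PageId = int;

std::unordered_map<std::string, std::unordered_set<PageId>> wordToPages;
const std::unordered_set<std::string> stopWords = {"if", "and", "it", "to", "do", "the"};

void indexPage(PageId id, const std::string& text) {
    std::istringstream stream(text);
    std::string word;
    while (stream >> word) {
        if (stopWords.count(word)) continue;  // skip words that occur everywhere
        wordToPages[word].insert(id);         // store only the reference, not the page
    }
}

int main() {
    indexPage(1, "intranet news about the quarterly report");
    indexPage(2, "quarterly budget and planning notes");

    for (PageId id : wordToPages["quarterly"])   // which pages mention "quarterly"?
        std::cout << "page " << id << "\n";
}
```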
Well, if the program is to be constantly running during operation, you could just store the pages in RAM. Grab a gigabyte of RAM and you'd be able to store a great many pages, and this would be much faster than caching them on the hard disk.
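A bare-bones version of that in-RAM page store might look like this; the URL-to-HTML mapping and the class name are just illustrative:

```cpp
// Bare-bones in-memory page cache: URL -> HTML held in RAM for fast serving.
#include <iostream>
#include <optional>
#include <string>
#include <unordered_map>

class PageCache {
public:
    void put(const std::string& url, std::string html) {
        pages_[url] = std::move(html);
    }
    // Returns the cached HTML, or std::nullopt if the page was never stored.
    std::optional<std::string> get(const std::string& url) const {
        auto it = pages_.find(url);
        if (it == pages_.end()) return std::nullopt;
        return it->second;
    }
private:
    std::unordered_map<std::string, std::string> pages_;
};

int main() {
    PageCache cache;
    cache.put("http://intranet/news/1.html", "<html>...</html>");
    if (auto html = cache.get("http://intranet/news/1.html"))
        std::cout << *html << "\n";
}
```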
I gather from the question that the user is on a different machine from the search engine, and therefore from the cache. Perhaps I am overlooking something obvious here, but couldn't you just send them the HTML over the connection already established between the user and the search engine? Text is very light data-wise, after all, so it shouldn't be too much of a strain on the connection.
Let me start by giving a quick background on myself (please forgive me). I have an intense interest in programming and computers/technical things in general. I took a year of C/C++ in college and a semester of assembly. I have messed around with Visual BASIC. So, almost all of my programming knowledge is limited to these three languages in order of proficiency:
C/C++
Assembly
Visual BASIC
I have a job at a small business that can't justify hiring a trained/"certified" programmer where I have tasked myself with automating a process that must be completed on a monthly basis. It involves:
Sending faxes that are to be filled out with numbers
Receiving those faxes that are returned (all incoming faxes go to network folder as PDF)
Collecting the numbers from received faxes and entering these numbers into Excel (some are in Word format for some reason) and then into QuickBooks after calculations
Sending emails
Receiving replies to these emails that contain numbers
Manually entering these numbers into Excel and then QuickBooks after calculations
Collecting numbers from a website written in JavaScript. Numbers from the website can be exported to a *.csv file.
Finally, printing invoices out from QuickBooks using the calculated numbers that have been entered.
My goal is to automate this entire process. As of now, everything is done manually. Emails and faxes are sent one at a time. Numbers from website are read and entered into Excel one at a time. Numbers are put into QB and invoices are printed one at a time.
So far I have added an email scheduling add-on to Outlook that automatically sends the emails every month. I am working on setting up faxes to be sent automatically (the only thing I can think of off the top of my head is driving Windows Fax and Scan through its API from either VB or VC++).
Also, I am automating the calculations that must be performed in order to prep the collected numbers for entry into QB using VBA/Excel and, potentially, Access.
Right now I'm brainstorming a way to automatically collect the numbers (along with the customer name) from the returned faxes. My idea was to create a new fax sheet that forces the customer to "bubble in" the numbers like a ScanTron sheet. That way I could write a program (perhaps in C++) to parse the PDF, looking for a certain colored pixel in a specific spot in order to piece together the number. (I wonder if I could automatically OCR the PDFs and collect the customer name simply by extracting the text from each PDF?) The results could then be sent to a database, or perhaps directly to an Excel sheet (the Excel sheets have to stay so that hard copies of the data can be printed, though I suppose this could be accomplished without Excel).
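To sketch what that pixel check might look like (assuming the fax PDF has first been rasterized to a grayscale PNG by some separate step; the coordinates and darkness threshold below are made-up placeholders, and OpenCV is just one possible library):

```cpp
// Sketch of checking whether a "bubble" is filled in on a rasterized fax page.
// Assumes the PDF page was already converted to a grayscale PNG elsewhere.
// The bubble coordinates and darkness threshold are made-up placeholders.
#include <iostream>
#include <opencv2/opencv.hpp>

bool bubbleIsFilled(const cv::Mat& page, int x, int y, int w, int h) {
    cv::Mat region = page(cv::Rect(x, y, w, h));   // the area where one bubble sits
    double meanBrightness = cv::mean(region)[0];   // 0 = black, 255 = white
    return meanBrightness < 128.0;                 // mostly dark => penciled in
}

int main() {
    cv::Mat page = cv::imread("fax_page.png", cv::IMREAD_GRAYSCALE);  // hypothetical file
    if (page.empty()) { std::cerr << "could not read page\n"; return 1; }

    // e.g. the bubble for digit 7 in the first answer box (coordinates invented):
    std::cout << (bubbleIsFilled(page, 120, 340, 20, 20) ? "filled" : "empty") << "\n";
}
```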
And lastly, since some customers refuse to use any of the methods available to them, we have to call some of them manually. Once I am finished with all of the aforementioned work, I would like to develop a way for customers to call a specific phone number and key in the information via voice prompts, which would then deposit the information in a database somewhere. This will be complicated and require special equipment, so it will be the last and lowest priority. I'm not worried about this right now.
Since my experience with programming is only moderate (though I'm sure my working knowledge will expand quickly once I get started, since a lot of it is already in my brain somewhere), I wanted to give myself the best possible advantage and tools to tackle this project before I got so far into it that changing my methods would waste a lot of time and work. To sum up, I need to make an outline of exactly what I need to do and learn and which techniques/applications to use.
This is the site I always come to when searching for my programming questions and I have come to the conclusion that the people here are generally extremely knowledgeable, patient and helpful. I will appreciate any contribution of information, advice and/or insights no matter how small. I realize that in this situation I am the "beggar" and thus will be grateful for whatever I get.
Thanks in advance.
P.S. Before anyone says anything: I have "UTFSE" extensively and have assimilated lots of info from it. However, we all know that there's no equal to a human's problem solving capabilities--especially when proficient in the specific field.
Nice work! You are definitely on the right track. That was a lot of information so I apologize if I repeat anything you already know.
1) Faxes - Microsoft has an excellent resource for learning how to send faxes (they even provide the code). Check this out: http://msdn.microsoft.com/en-us/library/windows/desktop/ms693482(v=vs.85).aspx
2) You will have to OCR the PDFs (as you mentioned) and then you can extract the information. But (as you seem to understand) you cannot modify a PDF with C++.
3) C++ does allow you to save (and open) a file in Excel format. However, it's a very complicated format and will probably cause you some problems; one of them is that it will want to save all of your data into a single cell. A way to get around this is to do your Excel I/O with .csv files: a comma separates the columns and a new line separates the rows. For example,
A1, B1, C1
A2, B2, C2
A3, B3, C3
Excel will open and read these files correctly. However, you won't be able to set fonts, borders, etc. automatically.
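As a small starting point, writing such a .csv from C++ only needs standard file I/O; the file name, headers and values below are placeholders (and note that any value containing a comma would need quoting):

```cpp
// Small sketch: write collected numbers to a .csv file that Excel can open.
// File name, column headers and values are placeholders for illustration.
#include <fstream>
#include <string>
#include <vector>

struct Row {
    std::string customer;
    double reading;
};

int main() {
    std::vector<Row> rows = {
        {"Acme Co", 1042.5},
        {"Smith LLC", 873.0},
    };

    std::ofstream csv("monthly_readings.csv");
    csv << "Customer,Reading\n";                        // header row
    for (const Row& r : rows)
        csv << r.customer << ',' << r.reading << '\n';  // one CSV line per record
}
```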
This is the extent of my knowledge; I have never worked with emails or QuickBooks. Hope it helps!
I am looking for a very general answer to the feasibility of the idea, not a specific implementation.
If you want to serve small variations of the same media file to different people (say, an EPUB or a music file), is it possible to serve most of the file to everybody but an individualized small portion of the file to each recipient for watermarking, using something like Amazon Web Services?
If yes, would it be possible to create a Dropbox-like file hosting service with these individualized media files, where all users "see" mostly the same physically stored file but tiny parts of it are served individually? If, say, 1,000 users each had the same 10 MB MP3 file with a different watermark on the server, that would amount to 10 GB. But if the same 1,000 users were served the same file except for a tiny 10 KB individually watermarked portion, it would only amount to about 20 MB in total.
An EPUB is a single file and must be served/downloaded as such, not in pieces. Why don't you implement simple server-side logic to customize the necessary components, build the EPUB from the common assets and the customized ones, and then let users download that?
The answer is, of course, yes, it can be done, using an EC2 instance or any other machine that can run a web server, for that matter. The problem is that each type of media file has a different level of complexity when it comes to customizing the file: from the simplest case, where the file contains a string of bytes at a known position that can simply be overwritten with your watermark data, to a more complex format that would have to be fully or partially disassembled and repackaged every time a download is requested.
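To illustrate the simplest end of that spectrum, here is a rough sketch of preparing a per-download copy by patching a watermark into a template file at a known byte offset; the file names, the offset and the assumption that the format tolerates this are all hypothetical:

```cpp
// Rough sketch of the simplest case: copy the shared template file once per
// download, then overwrite a small region at a known offset with per-user
// watermark bytes. Offset, names and format tolerance are all assumptions.
#include <filesystem>
#include <fstream>
#include <iostream>
#include <string>

int main() {
    const std::string templateFile = "track_template.mp3";   // shared 10 MB file
    const std::string userFile     = "track_user42.mp3";     // per-download copy
    const std::streamoff watermarkOffset = 4096;              // made-up position
    const std::string watermark = "user-42-license-token";    // per-user payload

    // 1. Copy the shared template (the only full-size I/O in this scheme).
    std::filesystem::copy_file(templateFile, userFile,
                               std::filesystem::copy_options::overwrite_existing);

    // 2. Overwrite the small watermark region in place.
    std::fstream out(userFile, std::ios::in | std::ios::out | std::ios::binary);
    if (!out) { std::cerr << "could not open copy\n"; return 1; }
    out.seekp(watermarkOffset);
    out.write(watermark.data(), static_cast<std::streamsize>(watermark.size()));
}
```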
The bottom line is that for any format I can think of, the server would spend some amount of CPU, possibly a significant amount, crunching the data and preparing/reassembling the file for download. The ultimate solution would be very format-specific and, as a side note, really has nothing to do with AWS beyond the fact that you can host web servers on EC2.