Save and vacuum

Home Forums cherrytree Save and vacuum

Tagged: ,

This topic contains 15 replies, has 4 voices, and was last updated by  eureka 5 months, 3 weeks ago.

Viewing 15 posts - 1 through 15 (of 16 total)
  • Author
    Posts
  • #124680

    AndyChow
    Member

    What does the save and vacuum feature do? It doesn’t seem in the documentation, and the search on this site says nothing. I saw some entries in the changelogs, but none that described what it does. I’m sorry if I’m asking an obvious question, but I can’t figure it out.

    #124681

    Cosmo
    Member

    Out of some reason the space in the database does not get free, if data gets deleted, but the database backend (here sqlite) marks the space as unused. Especially in case, where you have importet very big files, you would get a file, which is more and more growing, even if you would delete the complete content.

    Before CT 0.38 it as necessary to write a new file via “save as”, now the vacuum command does the same for the current file.

    #124685

    Klaas Vaak
    Member

    @cosmo: it sounds to me that that represents some sort of bug. The previous “solution” you mention, “save as”, was, in fact, a “workaround”. Now “save & vacuum” is a shortcut “workaround”, but still a “workaround” to a bug.

    If it is a bug the best ultimate solution would be to get rid of the bug, unless it is a specificity of SQLITE.

    #124687

    Cosmo
    Member

    No, it is no bug and it is not specific to sqlite.

    A database is meant to store big amount (possibly huge amount) of data. But usually if you change / change / delete anything in the database, this is usually only a small (possibly tiny) fraction f the complete database. If now for each and every change the complete database file would have to get written to disk it would take a noticeable time. You wouldn’t like it. So only the changed part gets written, in case of deletion some kind of white space, which overwrites that, what you have deleted. That is the technical consequence how a database works – in contrast to file like e. g. text files where indeed the complete content gets saved, if the file gets saved.

    You find this problem also with other databases inclusive for example for the Windows registry, which is also a database file.

    In principle it would be thinkable to automate the vacuum (I prefer the word “compacting”) process, but it is easy to imagine, that this might than happen in the very wrong moment, where you are working and you need quick reaction by the database, which would during the re-write process be impossible.

    The problem appears in my experience with CT nearly only, if you had inserted big files (e. g. pictures) and delete them at a later time. As long as you use CT only for text (inclusive tables) it should be enough to compact the database perhaps once per year.

    #124688

    Klaas Vaak
    Member

    @cosmo: thanks for your expansive reply, it is clearer to me now.
    What still puzzles me is that, on the one hand, you say “That is the technical consequence how a database works…”, but on the other hand you say “The problem appears in my experience with CT nearly only…”.
    Those 2 statements are contradictory, which makes me think that somehow this is a CT bug.

    #124689

    Cosmo
    Member

    No, this is not a contradiction. I admit, that the last paragraph in my previous post had been written with your delay problem in mind and would perhaps been better placed in the other thread.

    My CT database is 6 MB great and – with spellchecking disabled – I have no delays (used in a Linux system). In my database is only text (with code-boxes, tables and links), so no imported files. But I did some time ago (AFAIR 2 or 3 years ago) some tests during beta testing a new release of CT where I imported for testing very big files, about 1 GB in size and I stumbled about some problems. IIRC giuspen added the vacuum feature to CT as a consequence of the discussion, which we did at that time. This is the background for the last paragraph in my previous post.

    To my remembrance I did the compacting only once, at the time after the release of CT 0.38.0. I actually have forgotten this feature, as I did not find the necessity for it and my database works trouble-free.

    #124690

    Klaas Vaak
    Member

    I use Vacuum regularly now and have turned off spellchecker. I also only have text, tables, and links. And although the delay has shortened, it is still there. You mention compacting, which I have looked for but can’t find in either the menu or the manual. How can I compact the database? Is there a risk?

    #124692

    Cosmo
    Member

    As I wrote 2 days ago I prefer the word “compacting” over “vacuum”; I thinks it says more clearly, what it does. But actually both words stand for the same thing.

    To continue with the delay topic you should post your question in your already existing thread about that problem, as it really does no more belong to this topic here.

    #124694

    eureka
    Member

    Reading this thread for my own education I read about the “vacuum” command through these links.

    https://sqlite.org/lang_vacuum.html

    http://www.sqlitetutorial.net/sqlite-vacuum/

    http://sqlite.1065341.n5.nabble.com/a-small-defrag-tool-for-SQLite-database-file-td91968.html

    I consider vacuum to be like file defragment in Windows.
    However in my Linux Ubuntu environment I don’t worry about defragment.

    I tend to use cherrytree XML format where vacuum should not be needed.

    One untested idea. Could you (Klaas) open the cherrytree SQLite db and save as XML. Then save XML as SQLite format (perhaps a different name to be safe)? This might clear out any fragments in db. Hopefully you have backup files to fallback on?

    • This reply was modified 5 months, 3 weeks ago by  eureka.
    #124696

    Klaas Vaak
    Member

    @cosmo: thanks for your reply & esp. your patience. I understand the concept better now, and Eureka’s 1st link below has also helped to clear it up for me.
    I do not think there is a need to continue in the other thread because the cause of the delay issue has been identified, with the 1st step solution being the “Save & Vacuum” command, possibly in combination with the disablement of the spellchecker.
    Many thanks for your time & help.

    #124697

    Klaas Vaak
    Member

    Hey Eureka, good to hear from you again. Thanks for posting those 3 links, the 1st & 2nd one makes the “vacuum” concept crystal clear. Your comparison with Windows’ fragmentation is a good one.
    As for the separate tool to do the vacuuming faster, I am not quite sure how to run it – perhaps with a command prompt command?

    As for your SQLite db –> XML –> SQLite db, the only db file I have been able to find in the CT folder is pgsc_locales.dbin C:\Program Files (x86)\CherryTree\bin\glade:

    is that the db you mean?
    If so, is that the db for all my CT files?

    #124699

    eureka
    Member

    Klaas

    I would forget the third link. I added it to show that vacuum is a generic problem.

    Regarding the files I was referring to they are in the location where you normally save your CherryTree files.
    The four options are ..
    *.ctd and *.ctz which are XML formats.
    *.ctb and *.ctx which are SQLite formats.

    In fact there is an easier experiment you might run.

    (1) Create a new empty XML file
    (2) Import a very large cherrytree file (.ctb or .ctx)
    (3) Save as XML file.

    Q. Does the XML file run faster (or not)?

    #124700

    Klaas Vaak
    Member

    Eureka, I tried your easier experiment, and indeed the XML file runs faster. I notice that everything (fonts, colours, lay-out, …) are the same in the XML file as in the SQLite file.

    A couple of questions:
    * what is the difference between the 2 formats?
    * if XML is more problem-free, what would be the rationale for using SQLite at all?
    * is there any reason not to use XML?
    * my SQLite file is 4 Mb, the XML file is 4.9 Mb – is there a reason for that? Not that I mind because I have plenty of space.

    P.S.: off-topic, but in terms of Linux I am going to test drive Antergos off a USB stick to get a feel for it. If I like it I will install a VM & install Antergos in there.

    • This reply was modified 5 months, 3 weeks ago by  Klaas Vaak.
    #124702

    eureka
    Member

    Re: SQLite vs. XML

    It is easier to point you to a full explanation.

    https://stackoverflow.com/questions/77726/xml-or-sqlite-when-to-drop-xml-for-a-database

    Basically XML files are processed in memory and SQLite files are in smaller compressed files. But there comes a point where there are tradeoffs between performance and memory.

    My approach is to have a number of smaller XML files and import other XML files as needed. In other words to have a CherryTree containerfile which imports other files as sections or chapters.

    My end goal is to integrate CherryTree as a client to a NoSQL database eXist-db which runs with Java.

    #124704

    Klaas Vaak
    Member

    @eureka: many thanks for the Stack Overflow article which is exactly what I was looking for. All things considered, I will stay with SQLite:the file that is causing a delay will continue to grow, so eventually SQLite is better than XML anyway. Using the vacuum method will keep the delay within reasonable limits. The other CT files will remain relatively limited to probably less than 10 Mb each.

    In terms of your use

    In other words to have a CherryTree container file which imports other files as sections or chapters.

    that almost sounds like TiddlyWiki would be more appropriate since that is set up for small files which you combine every which way you want in accordance with your needs.

    • This reply was modified 5 months, 3 weeks ago by  Klaas Vaak.
Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic.