New Fileformat - basic steps #937
I would keep compatibility for the old format, as it requires virtually no overhead (10 lines of code). Therefore we do not need to keep anything hidden behind compiler flags.
What we could do is prevent saving files in the old format after 2-5 releases, so that we can drop support for the old format after a few more releases to reduce code complexity. But to avoid complaints, we must provide a tool to convert to the new format by then.
No, we cannot drop the old format: we have an export to .xoj, which is the old format without some additions. .xoj is well supported by Xournal, by MrWriter, and maybe by other tools. Dropping the old format is therefore not an option.
Maybe I'm late to the party, but wouldn't it be possible to save the annotations directly to the PDF file? The Windows application "xodo" seems to do this, and I think that might be a great solution (for all files without audio and custom stuff).
It is possible, but it would be very complicated. Some problems include:
Given the above problems, it's far easier for us to keep the annotation files and the PDF files separate, but bundle them together. Our "source of truth" (the PDF files) will remain untouched, as they should; and our annotations are, in fact, annotations: they add to the files without changing them.
We should reevaluate our new file format in terms of using quadtrees for searching and just-in-time reading. Quadtree support should be easy, as it only requires one new element type.
I will write a completely new reader for this, as the current one is hard to maintain with all the overlap between deprecated file versions and new ones.
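To illustrate what "quadtrees for searching" could mean, here is a minimal Python sketch of a region quadtree over element bounding boxes: a query such as "which strokes intersect this visible area or eraser rectangle?" only descends into quadrants that overlap the query. `Rect`, `QuadTree`, `capacity` and `max_depth` are hypothetical names chosen for illustration, not part of Xournal++ or its file format, and the sketch assumes all items lie inside the root bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned box, half-open: [x, x + w) x [y, y + h)."""
    x: float
    y: float
    w: float
    h: float

    def intersects(self, other: "Rect") -> bool:
        return (other.x < self.x + self.w and self.x < other.x + other.w
                and other.y < self.y + self.h and self.y < other.y + other.h)

    def contains(self, other: "Rect") -> bool:
        return (self.x <= other.x and self.y <= other.y
                and other.x + other.w <= self.x + self.w
                and other.y + other.h <= self.y + self.h)


class QuadTree:
    """Hypothetical index of (bounding box, element) pairs for spatial queries."""

    def __init__(self, bounds: Rect, capacity: int = 4, max_depth: int = 8):
        self.bounds = bounds
        self.capacity = capacity      # items a leaf holds before it splits
        self.max_depth = max_depth    # stop splitting below this depth
        self.items: list[tuple[Rect, object]] = []
        self.children: list["QuadTree"] = []

    def _child_for(self, box: Rect):
        # Return the single child quadrant that fully contains box, if any.
        for child in self.children:
            if child.bounds.contains(box):
                return child
        return None

    def _split(self) -> None:
        b = self.bounds
        hw, hh = b.w / 2, b.h / 2
        self.children = [
            QuadTree(Rect(b.x + dx, b.y + dy, hw, hh), self.capacity, self.max_depth - 1)
            for dx, dy in ((0, 0), (hw, 0), (0, hh), (hw, hh))
        ]
        # Push down items that fit entirely in one quadrant; straddlers stay here.
        keep = []
        for box, item in self.items:
            child = self._child_for(box)
            if child is None:
                keep.append((box, item))
            else:
                child.insert(box, item)
        self.items = keep

    def insert(self, box: Rect, item: object) -> None:
        if self.children:
            child = self._child_for(box)
            if child is not None:
                child.insert(box, item)
                return
        self.items.append((box, item))
        if not self.children and len(self.items) > self.capacity and self.max_depth > 0:
            self._split()

    def query(self, area: Rect) -> list:
        # Skip whole subtrees whose bounds do not overlap the query area.
        if not self.bounds.intersects(area):
            return []
        found = [item for box, item in self.items if box.intersects(area)]
        for child in self.children:
            found.extend(child.query(area))
        return found
```

Elements that straddle a quadrant boundary stay at the parent node, so every item is stored exactly once; a file format could persist the same hierarchy so that a reader only loads the quadrants it needs.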
EDIT: My bad, ignore this. I was looking at Save As; it did not occur to me to look in Export As. It's all good.
I know this is a closed issue, but I would like to return to it. The "real world" is not up to "testing" or even the latest LTS release. I have run into difficulties compiling the current source code on EVERY distribution I use, save one (Ubuntu 19.10 - buster/sid - and even there I had to install alternate versions of a few packages). On none of these systems could I simply apt-get install xournalpp. Given that, the ability to share files across platforms with xournal (which, for example, is installed on every lectern computer at my home university) is really important. While I may prefer to use xournalpp on my desktop, I must be able to open the .xoj file I save in xournalpp in xournal. Clearly, some functionality may be lost, so a warning must pop up. LibreOffice saves in .doc format, and any spreadsheet can save in .csv format, with a similar warning. I consider the ability to open, edit, and SAVE .xoj files an ESSENTIAL feature of the program, so I am asking for this issue to be reopened.
Upon further reflection, I want to again suggest that the current choice of not offering .xoj in Save As, as opposed to Export As, be reconsidered. xournalpp is not yet a stable program, in my view. I cannot ask our IT department to install it campus-wide. There is a significant stock of lecture notes files in .xoj. The current policy of quietly forcing a conversion to the .xopp format is poorly thought out. I never want someone to save their lecture notes on their office computer and NOT be able to open them on the lectern computer, which may only have xournal. This, in fact, is discouraging me from recommending that my colleagues try xournalpp. In addition, the current quiet "policy" is not quite correctly described on the front page. I checked it against a very common scenario: I opened an old .xoj file in xournalpp, changed it, and exported it as .xoj under the same name, overwriting what was there. I closed it, reopened it, edited it again, and just saved. In other words, I opened a .xoj file created by xournalpp. According to the front page, this should have saved back as .xoj. Instead, it saved as .xopp, without a warning. If I had been counting on the .xoj file having my latest changes when I took it to class, they would not have made it there. One incident like that and a user will never use xournalpp again. Therefore, I urge the developers to reconsider this policy and to offer a LibreOffice-style pop-up dialog ("Save as a Xournal++ file / Save as a Xournal file - compatibility mode") whenever ANY .xoj file is opened and saved. When, and if, xournalpp becomes stable and is installed everywhere, users will migrate on their own, without "enforcement".
This sounds like a bug--we'll look into it.
This is definitely a UI/UX improvement; thanks for the suggestion.
I'm interested in Xournal++ specifically because the annotations are not saved in the original file. There are many reasons why this is desirable, for instance this issue, which made me abandon Okular annotations: https://bugs.kde.org/show_bug.cgi?id=394775#c5 I have different concerns, though:
I think both native PDF annotations and separate ones have their own merits. So it might be better to keep these two separate:
Despite what @victoriajeegreen said, in my team's workflow we rely heavily on PDF annotation, most importantly:
At the moment I use Xournalpp only for personal note-taking and very seldom for PDF annotation, but I would like to have its nice stylus capabilities for PDFs as well (by implementing a few small special tools alongside the current ones). However, this is not a big deal; the main focus for me is personal note-taking (floats, images, audio, etc.). This is far more important.
I'm not sure about the details, but you might consider
Just to chime in: linking to images (the way xournalpp does for PDF backgrounds) would be nice as an option, independent of the fact that the new format might store images in a zip (which is still "embedded"), but dependent on a switch to a new format version, of course.
It would be cool to use this opportunity and extract the file format handling, among other things, into a standalone library. This would allow third-party tools to become interoperable with it more easily, and it would also be a good step towards #2476.
That's what we plan to do next after releasing 1.1.0 :)
I was doing some planning for the new file format and realized that it is a ton of work. Given constraints on free time, motivation, etc., it will probably be difficult to get into a fully ready state for 1.2, unless people are willing to wait for 1+ years for the next feature release. Instead, let's just take small steps and get something done in a reasonable amount of time. Here are the goals that I have in mind for the basic version:
I propose the following roadmap for the "basic steps" required to meet these goals:
I reviewed this ticket and a few others related to the new

Some thoughts & proposal

The autosave is implemented calling Both the current For large files (think of a book), this is quite expensive. My idea is to split the At least in theory, this will allow us:
From the user's perspective this is totally hidden: the "parts" files generated by the autosave are stored in a cache folder. When the user requests an explicit save or save-as, a compressed single bundled We cannot avoid the cost of doing a full file creation here, but an explicit save / save-as should be less frequent than an autosave.

Next step:

With this general idea, I will try to design and implement a file format for the "pages part" as a first step. In principle I will leave out PNG, PDF, audio and other embedded files and focus only on Xournal-specific attributes (pages, layers, strokes, text). My idea is to use Cap'n Proto (suggested here) and see how feasible it is to support "mergeable" files (feature request). @LittleHuba you suggested supporting quadtrees and just-in-time reading (context). What did you mean? I would like to take that into account.
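The autosave idea above can be sketched as dirty-page tracking: each edit marks a page, autosave rewrites only the marked pages' part files in a cache folder, and an explicit save bundles all parts into a single compressed file. This is a minimal Python sketch under loose assumptions: `PartCache`, the `page-NNNNN.json` naming and JSON as the part encoding are hypothetical stand-ins (the comment above proposes Cap'n Proto for the page parts), and deleted pages are not handled.

```python
import json
import tempfile
import zipfile
from pathlib import Path


class PartCache:
    """Hypothetical autosave cache: one part file per page, rewritten only when dirty."""

    def __init__(self, cache_dir: Path):
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.pages: dict[int, dict] = {}
        self.dirty: set[int] = set()

    def set_page(self, index: int, content: dict) -> None:
        """Record an edit; the page is written on the next autosave."""
        self.pages[index] = content
        self.dirty.add(index)

    def autosave(self) -> list[int]:
        """Write only the pages changed since the last autosave; return their indices."""
        written = sorted(self.dirty)
        for index in written:
            part = self.cache_dir / f"page-{index:05d}.json"
            tmp = part.with_suffix(".tmp")
            tmp.write_text(json.dumps(self.pages[index]))
            tmp.replace(part)  # rename over the old part so readers never see half a file
        self.dirty.clear()
        return written

    def save_bundle(self, target: Path) -> None:
        """Explicit save: flush pending parts, then bundle all parts into one compressed file."""
        self.autosave()
        with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for part in sorted(self.cache_dir.glob("page-*.json")):
                zf.write(part, arcname=f"pages/{part.name}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        cache = PartCache(Path(tmp_dir) / "cache")
        cache.set_page(0, {"strokes": []})
        cache.set_page(1, {"strokes": [[0, 0, 5, 5]]})
        print(cache.autosave())  # [0, 1]: both pages are new
        cache.set_page(1, {"strokes": []})
        print(cache.autosave())  # [1]: only the edited page is rewritten
        cache.save_bundle(Path(tmp_dir) / "notes.zip")
```

The cost of an autosave then scales with the number of edited pages rather than with the document size, while the full bundling cost is paid only on explicit save.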
Hi, a general recommendation for reading big files is to never read the full file at once. Your idea with the parts files is already going in that direction, but not far enough. Imagine we have N pages and want to render just one page near the end of the document. In the worst case, we would parse N-1 pages before finding any relevant data. There are two ways around this:
I would strongly recommend option two, as it decreases the effort for this undertaking tremendously. A Google search should help you find appropriate file formats with libraries that already take care of serializing your data.

Quadtrees

JIT

Hope that helps you in your planning.
That's correct. I will assume for now that As you mentioned, I will probably go with the index solution.
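The "index solution" mentioned above can be sketched generically: write each page as a blob, then append a table of (offset, length) entries and a small fixed-size footer, so a reader seeks to the footer and then straight to the entry for page N, instead of parsing the N-1 pages before it. The layout, the `XIDX` magic and the function names below are hypothetical, chosen only to illustrate the technique.

```python
import io
import struct

# Hypothetical layout: [page blobs...][index: count x (offset, length)][footer: count, magic]
MAGIC = b"XIDX"
ENTRY = struct.Struct("<QQ")    # page offset, page length (little-endian u64)
FOOTER = struct.Struct("<I4s")  # page count (u32), magic bytes


def write_indexed(pages: list[bytes], out) -> None:
    """Write page blobs followed by an index table and a fixed-size footer."""
    index = []
    for blob in pages:
        index.append((out.tell(), len(blob)))
        out.write(blob)
    for offset, length in index:
        out.write(ENTRY.pack(offset, length))
    out.write(FOOTER.pack(len(index), MAGIC))


def read_page(f, page: int) -> bytes:
    """Read a single page with three seeks, without touching the other pages."""
    f.seek(-FOOTER.size, io.SEEK_END)
    count, magic = FOOTER.unpack(f.read(FOOTER.size))
    if magic != MAGIC:
        raise ValueError("not an indexed file")
    if not 0 <= page < count:
        raise IndexError(page)
    # The index table sits directly before the footer.
    index_start = -FOOTER.size - count * ENTRY.size
    f.seek(index_start + page * ENTRY.size, io.SEEK_END)
    offset, length = ENTRY.unpack(f.read(ENTRY.size))
    f.seek(offset)
    return f.read(length)
```

Because the index is written last, pages can be streamed out in a single pass; each blob could also be compressed individually without losing random access.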
I took a quick look at the open issues, and there seem to be two similarly named but different feature requests:
On "infinite number of pages", the file format needs to be "sparse", so we don't assume that if a 100th page exists, then the 1st to 99th also exist. On "beyond boundaries page", the file format must not assume that a stroke is in a particular page: the stroke's coordinates/points are not necessarily within the boundaries of the page (this applies to any other element, not just strokes).

Food for thought

Regarding this last point, what would Xournal++ show when you have a PDF file with "beyond boundaries pages" and you are displaying more than one page? Could a stroke be drawn starting on top of page 1, continuing beyond its boundary, and stopping on top of page 2? Assume that Xournal++ is displaying page 2 below page 1; the stroke would be a kind of vertical line from top to bottom. Now the user changes the layout to display page 2 to the right of page 1. What would happen to the stroke? From the file format point of view I see (a priori) two possible scenarios:
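One way to make the "sparse pages" and "beyond boundaries" requirements concrete is a sparse page map plus strokes that carry their own anchor page. This is a minimal Python sketch under assumptions: `Page`, `Stroke` and `Document` are hypothetical names, and anchoring strokes to a page with page-relative coordinates is just one possible answer to the layout question, not a decision from this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    width: float
    height: float


@dataclass
class Stroke:
    anchor_page: int                   # page whose origin the points are relative to
    points: list[tuple[float, float]]  # may be negative or exceed the page size


@dataclass
class Document:
    # Sparse: storing page 99 does not imply that pages 0..98 exist.
    pages: dict[int, Page] = field(default_factory=dict)
    strokes: list[Stroke] = field(default_factory=list)

    def strokes_on(self, page_index: int) -> list[Stroke]:
        """Strokes anchored to the given page (they may still extend beyond it)."""
        return [s for s in self.strokes if s.anchor_page == page_index]

    def exceeds_page(self, stroke: Stroke) -> bool:
        """True if any point of the stroke lies outside its anchor page."""
        page = self.pages[stroke.anchor_page]
        return any(not (0 <= x <= page.width and 0 <= y <= page.height)
                   for x, y in stroke.points)
```

With page-relative coordinates, switching the layout (page 2 below vs. to the right of page 1) moves an overflowing stroke together with its anchor page; storing all strokes in one global canvas space would be the alternative, at the cost of strokes staying put while pages move.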
Update on the progress of the
As far as I can tell from the thread, the current file container format is supposed to be a ZIP archive. I just want to throw SQLite into the ring as a possible container format; maybe it could bring some advantages with it. Here's a good article about some of the benefits it could bring: https://sqlite.org/appfileformat.html The summary of the article:
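For comparison with the ZIP container, a SQLite-based container might look like this minimal Python sketch using the standard `sqlite3` module: each page is one row, so saving a page runs one transaction that rewrites only the affected database pages instead of the whole file, and SQLite's atomic commit keeps the previous version if the write is interrupted. The `page`/`attachment` schema and the helper names are hypothetical, and the page blob encoding is left open.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS page (
    idx     INTEGER PRIMARY KEY,   -- sparse page index
    content BLOB NOT NULL          -- serialized page (encoding left open)
);
CREATE TABLE IF NOT EXISTS attachment (
    name TEXT PRIMARY KEY,         -- e.g. an embedded PDF, image or audio file
    data BLOB NOT NULL
);
"""


def open_doc(path: str) -> sqlite3.Connection:
    con = sqlite3.connect(path)
    con.executescript(SCHEMA)
    return con


def save_page(con: sqlite3.Connection, idx: int, content: bytes) -> None:
    # "with con" commits on success and rolls back on error: one transaction per save.
    with con:
        con.execute("INSERT OR REPLACE INTO page (idx, content) VALUES (?, ?)",
                    (idx, content))


def load_page(con: sqlite3.Connection, idx: int) -> Optional[bytes]:
    row = con.execute("SELECT content FROM page WHERE idx = ?", (idx,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    con = open_doc(":memory:")
    save_page(con, 5, b"strokes-v1")
    save_page(con, 5, b"strokes-v2")
    print(load_page(con, 5))  # b'strokes-v2'
```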
@RafaelKr that's correct. Current For the moment, I can write some thoughts on why SQLite is or is not a viable solution. The following are the reasons the SQLite authors give in https://sqlite.org/appfileformat.html, with more details and my thoughts.
Yes, implementing a custom
True.
Correct, and this was one of the reasons not to pick SQLite. Like any other relational database, it offers a SQL language, a query planner and an executor. None of this is needed in Xournal++. The data in the app is, roughly speaking, quite plain: pages contain layers and layers contain drawings.
I cannot see the point here. If I want to inspect a SQLite database I need a SQLite program. For
This is a really good point for SQLite and something that I need to resolve for
Absolutely true and
True but on the other hand writing and flushing (fsync) frequently hurts the SSD lifetime and battery.
Nice for a database but it is not the primary goal for
Yes,
It is not an intrinsic property of SQLite, so this can be applied to other formats like
Yes, but this is not a property of SQLite per se either. How much application code must change to add new stuff depends heavily on how much coupling exists.
True, but this is how the application works and how the data is stored, not necessarily a property of SQLite.
No idea but I'm going to check that (and steal some ideas from SQLite!)
Not necessary in Xournal++ for the moment, but we can easily implement multi-reader/single-writer.
I'm planning to offer a Python wrapper for a low-level API for
Yes and no. It is unfair to say that you only need to document the schema because if I have a Puff, a lot of writing, eh?
It's probably too late to implement this, but I thought it might be a useful idea to add to the mix (even if only to provoke some discussion). Would it be possible to make this new format work as a sort of "hybrid PDF", like LibreOffice or Adobe Illustrator do? The benefit would be that such files can be viewed with a generic PDF viewer, which is nice for e.g. mobile devices that lack a Xournal++ port.
I will just throw my opinion in here for the sake of discussion, after reading SQLite's article and the entire discussion. Take this comment with a big spoon of salt, because I am not involved in this project at all, so I don't know the internals. Here is what I am hoping will happen for the sake of "open-source-wide" development of handwritten notes apps (I use rnote because xournal++ lags on my laptop; might be the difference between gtk3 and gtk4?).

Standard library for graphical structures

It defines everything (brushes, shapes, pure text, mathematical equations?, images, PDF, PDF pages, external widgets that could be used to show other stuff such as graphs). Rnote uses Rust's piet crate.

Standard handwritten notes format

From what I've read in this discussion, there is a shit-ton of work done on improving the existing file format and lots of discussion on the new one. So here are my 2 cents on this topic. Note that some of these concepts were already mentioned in this discussion and might already have been implemented. My "ideal" (abstract) handwritten notes application file format:
Structure of the application file format
Tooling for the program

A library for converting this application file format to other formats such as PDF, etc.
I considered this, and it looks like a nice thing to have, but I'm not sure how it would work in practice. Xournal++ never modifies the PDF file; it only "adds" things on top of it. When the user exports a new PDF file, the original and the "added" things are merged together. Having the Related to the new format
Yes, in fact I remember someone did an experiment with different compression algorithms, and zstd won in almost all cases (there is an issue on GitHub, but I don't recall which one it was). But we definitely want compression in
There are good arguments for storing embedded files outside of
Yes, while the current
Naming will be supported by
I'm planning to make a Python API (maybe Lua as well) on top of the
The main issue with the file format is that it's not possible to embed resources like PDFs, audio files, images, etc.
@LittleHuba would now start improving it, but I think we should make the plans public before starting.
@morrolinux may also have a look at it.
Technically, we would use .zip instead of .gz, since it supports multiple files within the .xopp file.
For this, we are considering the library libzip.
Writing of the new file format needs to be enabled by a compiler option; otherwise only reading will be available, for backward compatibility. The previewer, opening a file with a double click, etc. need to be configured. (This should apply for at least 2 releases.)

The contents of the new .zip file (have a look at .odt!):
/mimetype => Textfile with String "application/xournal++"
/Thumbnails/thumbnail.png => The .png thumbnail, which is currently embedded as a preview in the .xml
/content.xml => the current XML, with the same format
/META-INF/version => "current=6\nmin=6": the current document version / the minimum version a reader must support to read the .xopp. This is where we decide whether a change is backward compatible or not.
We increase /META-INF/version and /META-INF/compatible, put e.g. the images into a separate folder like "Pictures", and reference them in the content.xml.
Audio files and .pdf files can also be packed.
Audio files and images don't need to be compressed; they are already compressed.
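The container layout proposed above can be sketched with Python's standard `zipfile` module: `mimetype` is written first and stored uncompressed (the same convention ODF uses, so the type can be detected without decompressing), XML is deflated, and already-compressed PNG and audio data are stored as-is. `write_container`, `can_read` and the attachment paths are hypothetical, and only the `min=` key of `/META-INF/version` is used for the compatibility check; `/META-INF/compatible` is not modeled.

```python
import io
import zipfile

MIMETYPE = "application/xournal++"  # value from the proposal above


def write_container(out, content_xml: str, thumbnail_png: bytes,
                    version: int, min_version: int,
                    attachments: dict[str, bytes]) -> None:
    with zipfile.ZipFile(out, "w") as zf:
        # As in ODF: mimetype first and uncompressed, so it can be sniffed cheaply.
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", content_xml, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("META-INF/version", f"current={version}\nmin={min_version}",
                    compress_type=zipfile.ZIP_DEFLATED)
        # PNG, audio, etc. are already compressed: store them without recompressing.
        zf.writestr("Thumbnails/thumbnail.png", thumbnail_png,
                    compress_type=zipfile.ZIP_STORED)
        for name, data in attachments.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_STORED)


def can_read(container, reader_version: int) -> bool:
    """True if a reader of reader_version meets the file's minimum version."""
    with zipfile.ZipFile(container) as zf:
        if zf.read("mimetype").decode() != MIMETYPE:
            raise ValueError("not a Xournal++ container")
        fields = dict(line.split("=", 1)
                      for line in zf.read("META-INF/version").decode().splitlines())
    return reader_version >= int(fields["min"])


if __name__ == "__main__":
    buf = io.BytesIO()
    write_container(buf, "<xournal/>", b"png-bytes", version=6, min_version=6,
                    attachments={"Pictures/image1.png": b"png-bytes"})
    print(can_read(buf, reader_version=6))  # True
    print(can_read(buf, reader_version=5))  # False
```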