New Fileformat - basic steps #937

Open
andreasb242 opened this issue Feb 25, 2019 · 28 comments
Labels: breaking change (this is or requires a change that will break existing files if there is no migration code), difficulty::hard, enhancement, priority::high

Comments

@andreasb242
Contributor

andreasb242 commented Feb 25, 2019

The main issue with the file format is that it's not possible to embed resources like PDFs, audio files, images, etc.

@LittleHuba would now start improving it, but I think we should make the plans public before starting.
@morrolinux may also have a look at it.

Technically, we would use .zip instead of .gz, since .zip supports multiple files within the .xopp file.
For this we are thinking about using the library libzip.

  1. Writing of the new file format needs to be enabled by a compiler option; by default only reading will be available, for backward compatibility. The previewer, opening files by double click, etc. need to be configured. (This should stay in place for at least 2 releases.)
    The contents of the new .zip file (have a look at .odt!):
    /mimetype => text file with the string "application/xournal++"
    /Thumbnails/thumbnail.png => the .png thumbnail, which is currently embedded as a preview in the .xml
    /content.xml => the current XML, with the same format
    /META-INF/version => "current=6\nmin=6": the current document version and the minimum version a reader needs to support in order to read the .xopp, so this is where we decide whether a change is backward compatible or not.

  2. We increase /META-INF/version and /META-INF/compatible and put e.g. the images into a separate folder like "Pictures", and reference them in content.xml.
    Audio files and .pdfs can also be packed.
    Audio files and images don't need to be compressed; they are already compressed.
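
A minimal sketch of the layout proposed above, for illustration only. It uses Python's standard zipfile module as a stand-in for libzip, drops the leading slash from the entry names (zip entries are normally stored without one), and the function names write_container, read_version and can_read are hypothetical:

```python
import io
import zipfile

MIMETYPE = "application/xournal++"  # string proposed in the comment above


def write_container(content_xml: str, thumbnail_png: bytes,
                    current: int = 6, minimum: int = 6) -> bytes:
    """Build the proposed archive in memory and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # As in .odt, write the mimetype entry first and uncompressed.
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        # PNG data is already compressed, so store it as-is.
        zf.writestr("Thumbnails/thumbnail.png", thumbnail_png,
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", content_xml,
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("META-INF/version", f"current={current}\nmin={minimum}",
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def read_version(data: bytes) -> dict[str, int]:
    """Parse META-INF/version into {'current': ..., 'min': ...}."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        text = zf.read("META-INF/version").decode("utf-8")
    pairs = (line.split("=", 1) for line in text.splitlines())
    return {key: int(value) for key, value in pairs}


def can_read(data: bytes, supported: int) -> bool:
    """A reader that supports format `supported` can open the file if min <= supported."""
    return read_version(data)["min"] <= supported
```

With this split, a backward compatible change would raise only `current`, while a breaking change would raise `min` as well, so older readers can refuse the file cleanly.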

@LittleHuba
Member

Reading of the new fileformat needs to be enabled by a compiler option, only writing will be available, for backward compatibility.

I would keep compatibility for the old format, as it requires virtually no overhead (10 lines of code). Therefore we do not need to keep anything hidden behind compiler flags.

@LittleHuba
Member

What we could do is prevent saving of files in the old format after 2-5 releases, so that we can then drop support for the old format after some more releases to reduce code complexity.

But to avoid complaints we must provide a tool to convert to the new format by then.

@andreasb242
Contributor Author

No, we cannot drop the old format: we have an export as .xoj, which is the old format without some additions. .xoj is well supported by Xournal, by MrWrite, and maybe by other tools. Dropping the old format is therefore not an option.

@Technius added the "breaking change" label on May 11, 2019
@tuxflo

tuxflo commented Aug 13, 2019

Maybe I'm late to the party but wouldn't it be possible to save the annotations directly to the pdf file? The Windows application "xodo" seems to do this and I think that might be a great solution (for all files without audio and custom stuff).

@Technius
Member

It is possible, but it would be very complicated. Some problems include:

  • It's easy to go from .xopp to .pdf, but very difficult to go the other way around without rewriting a lot of our logic. We would not be able to distinguish between annotations added by Xournal++ and objects that are inherently present in a PDF file (as well as images, text, etc.). So it would then be impossible to "clear" annotations.
  • PDF files were not designed to be edited. For example, if you add or remove pages, the entire table of contents will need to be adjusted.
  • PDF files are extremely complicated. The specification is over 1000 pages long! Because of this, it is very difficult to find well-maintained libraries that support PDF formats.
  • We use cairo as our backend, but only cairo 1.17 and newer support PDF metadata features. We can't enable these features until 1.17+ is available on most systems.

Given the above problems, it's far easier for us to keep the annotation files and the PDF files separate, but bundle them together. Our "source of truth" (the PDF files) will remain untouched, as they should; and our annotations are, in fact, annotations: they add to the files without changing them.

@LittleHuba
Member

We should reevaluate our new file format in terms of using quadtrees for searching and just-in-time reading. Quadtree should be easy as it only requires one new element type.
Just-in-time reading will be harder...

@LittleHuba
Member

I will write a completely new reader for this as the current one is hard to maintain with all the overlap between deprecated file versions and new ones.

@esternin

esternin commented Jan 9, 2020

EDIT: My bad, ignore this. I was looking at Save As; it did not occur to me to look in Export As. It's all good.

No, we cannot drop the old format: we have an export as .xoj, which is the old format without some additions. .xoj is well supported by Xournal, by MrWrite, and maybe by other tools. Dropping the old format is therefore not an option.

I know this is a closed issue, but I would like to return to this. The "real world" is not up to "testing" or even the latest LTS release. I have run into difficulties compiling the current source code on EVERY distribution I use, save one (Ubuntu 19.10 - buster/sid - and even there I had to go and install alternate versions of a few packages). On no system could one simply apt-get install xournalpp.

Given that, the ability to share files across platforms with xournal (which, for example is installed on every lectern computer at my home university) is really important. While I may prefer to use xournalpp on my desktop, I must be able to open the .xoj format file I save in xournalpp, in xournal.

Clearly, some functionality may be lost, and so a warning must pop up. LibreOffice saves things in .doc format, and any spreadsheet can save in .csv format, with a similar wording.

I consider this ability to open, edit, and SAVE .xoj files as an ESSENTIAL feature of the program, so I am asking for this issue to be reopened.

@esternin

Upon further reflection, I want to again suggest that the current choice of not offering the option of .xoj in Save As, as opposed to Export As be reconsidered.

xournalpp is not yet a stable program, in my view. I cannot ask our IT department to install it campus-wide. There is a significant stock of lecture notes files, in .xoj. The current policy of quietly forcing a conversion to the .xopp format is poorly thought out. I never want someone to save their lecture notes on their office computer, and NOT be able to open them on the lectern computer which may only have xournal. This, in fact, is discouraging me from recommending that my colleagues try xournalpp.

In addition, the current quiet "policy" is not quite correctly described on the front page. I checked it against a very common scenario: I opened an old .xoj file in xournalpp, changed it, exported it as .xoj under the same name, overwriting what was there. Closed, reopened it again, edited again, and just Saved. In other words, I opened an .xoj file created by xournalpp. According to the front page, this should have saved back as .xoj. Instead, it saved as .xopp, without a warning. If I was counting on the .xoj file to have my latest changes when I took it to class, they would not have made it there.

One incident like that and a user will never use xournalpp again.

Therefore, I urge the developers to reconsider this policy, and to offer the LibreOffice-style pop-up ("Save as a Xournal++ file / Save as a Xournal file - compatibility mode" ) dialog if ANY .xoj file is being opened and saved.

When, and if, xournalpp becomes stable and is installed everywhere, the users will migrate on their own, without "enforcement".

@Technius
Member

I opened an old .xoj file in xournalpp, changed it, exported it as .xoj under the same name, overwriting what was there. Closed, reopened it again, edited again, and just Saved. In other words, I opened an .xoj file created by xournalpp. According to the front page, this should have saved back as .xoj. Instead, it saved as .xopp, without a warning. If I was counting on the .xoj file to have my latest changes when I took it to class, they would not have made it there.

This sounds like a bug--we'll look into it.

Therefore, I urge the developers to reconsider this policy, and to offer the LibreOffice-style pop-up ("Save as a Xournal++ file / Save as a Xournal file - compatibility mode" ) dialog if ANY .xoj file is being opened and saved.

This is definitely a UI/UX improvement; thanks for the suggestion.

@victoriajeegreen

victoriajeegreen commented Aug 7, 2020

Maybe I'm late to the party but wouldn't it be possible to save the annotations directly to the pdf file?

I'm interested in Xournal++ specifically because the annotations are not saved in the original file.

There are many reasons why this is desirable, for instance, in this issue that made me abandon Okular annotations: https://bugs.kde.org/show_bug.cgi?id=394775#c5

I have different concerns though:

  • I don't want to leak personal notes that I add to a document when I add file attachments to email.
  • Semantic desktop indexers will extract plain text out of PDF files to apply full-text searching. I don't want to re-trigger the full-text search indexer when I add notes to large PDF books.

@mizhozan

I think both native PDF annotations and separate ones have their own merits. So, it might be better to keep these two separate:

  • As mentioned it would be very complicated to port all annotation to PDF native format (not even possible).
  • However, it might be possible to have specialized tools for PDF annotation e.g. alongside normal draw, we might have a PDF draw or a native PDF Note and highlight which is very useful.

Despite what @victoriajeegreen said, in my team's workflow we rely heavily on PDF annotation, most importantly:

  • Highlights with notes
  • Small drawing annotations
  • Notes

At the moment I use Xournalpp only for personal note-taking and very seldom for PDF annotation, but I would like to have its nice stylus capabilities in PDF as well (by implementing small special tools alongside the current ones). However, this is not a big deal; the main focus for me is personal note-taking (floats, images, audio, etc.). This is far more important.

@mizhozan

Technically, we would use .zip instead of .gz, since .zip supports multiple files within the .xopp file.
For this we are thinking about using the library libzip.

I'm not sure about the detail, but you might consider zstd performance wise, here

@mjg
Contributor

mjg commented Mar 24, 2021

Just to chime in: linking to images (the way xournalpp does for pdf backgrounds) would be nice as an option, independent of the fact that the new format might store images in a zip (which is still "embedded"), but dependent on a switch to a new format version, of course.

@piegamesde
Contributor

It would be cool to use this opportunity and extract the file format handling among other things into a standalone library. This would allow third-party tools to become interoperable with it more easily and it would also be a good step towards #2476.

@Technius
Member

That's what we plan to do next after releasing 1.1.0 :)

@Technius self-assigned this on Sep 27, 2021
@Technius
Member

Technius commented Feb 18, 2022

I was doing some planning for the new file format and realized that it is a ton of work. Given constraints on free time, motivation, etc., it will probably be difficult to get into a fully ready state for 1.2, unless people are willing to wait for 1+ years for the next feature release.

Instead, let's just take small steps and get something done in a reasonable amount of time. Here are the goals that I have in mind for the basic version:

  • Refactor the existing save/load code, which is very hard to maintain and extend.
  • No additional dependencies allowed at this point.
  • Focus on loading from xopz files for now, as that's the hardest part. Saving is a concern but not a priority.
  • Maintain backwards compatibility. Anything that is loadable by 1.1.1 should be loadable after the changes made to support xopz.
  • The xopz format enables some features that currently aren't available.

I propose the following roadmap for the "basic steps" required to meet these goals:

  • Allow loading arbitrary format images from xopz (Enable image loading from xopz format #3782).
    • Currently, images are stored in PNG format regardless of their original file format, which leads to file size bloat. Enabling arbitrary image format loading would be a big step towards solving issues like "File size efficiency with inserted images" (#3416).
    • Note that distinguishing between raster (e.g., jpg, png) and vector (e.g., svg) image formats is out of scope of these steps due to backwards compatibility issues, and should be addressed in a later step.
    • An important point is that this task is loading only--we can ignore saving .xopp files for now, as images must still be saved as PNG in base64 encoding to maintain backwards compatibility.
  • Clean up the save AND load logic. Currently, it's a bunch of spaghetti code tied to the current file format.
    • When a xopz file is loaded (or an xoj file for that matter), saving should be disabled because there will not be a backwards compatibility layer (yet).
    • A backwards compatible saving mode should be implemented later. This can also help with exporting to .xoj format.
  • Implement some logic to track "resource" use. For example, the application should be able to enumerate all PDF and image files stored as resources in the xopz.
    • Need to be careful to distinguish between resources that are currently stored in the loaded file and resources that have been added but not saved yet.
    • (Optional--low priority?) Users should be able to view a list of resources with a GUI interface
  • Integrate "resources" throughout the application. For example, multiple background PDFs should be supported, and image elements should be loaded directly from the packaged resources instead of directly serialized as images.
  • Allow audio recordings to be loaded from xopz.
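
The resource-tracking step in the roadmap above could be sketched roughly as follows. This is only an illustration in Python, not the planned implementation; the ResourceTracker class and all of its method names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class ResourceTracker:
    """Tracks which resources are in the file on disk and which exist only in memory."""
    stored: dict[str, bytes] = field(default_factory=dict)   # already in the loaded file
    pending: dict[str, bytes] = field(default_factory=dict)  # added but not saved yet

    def add(self, name: str, data: bytes) -> None:
        """Register a newly added resource; it stays pending until the next save."""
        self.pending[name] = data

    def enumerate(self, suffix: str = "") -> list[str]:
        """List all known resource names, optionally filtered by file suffix."""
        names = set(self.stored) | set(self.pending)
        return sorted(n for n in names if n.endswith(suffix))

    def is_saved(self, name: str) -> bool:
        """True only if the resource is on disk and has no unsaved replacement."""
        return name in self.stored and name not in self.pending

    def commit(self) -> None:
        """Call after a successful save: pending resources become stored ones."""
        self.stored.update(self.pending)
        self.pending.clear()
```

Keeping the two dictionaries separate is one simple way to honour the distinction between resources already stored in the loaded file and resources that have been added but not saved.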

@eldipa

eldipa commented Apr 17, 2023

I reviewed this ticket and a few others related to the new .xopz file format and what new features and use cases it may provide (or enable).

Some thoughts & proposal

The autosave is implemented by calling SaveHandler::saveTo. That means that every X minutes a full .xopp will be created and written to disk.

Both the current .xopp and the (future) .xopz file formats contain all the state of the current file and they don't allow partial updates: on save, the whole file is recreated.

For large files (think of a book), this is quite expensive.

My idea is to split the .xopz into uncompressed "parts" where each part will contain an aspect of the current file.

At least in theory, this will allow us:

  • if we have a "pdf part" and a "pages part", a modification on one page will not require any rewrite of the "pdf part" (same goes for images, audio and other resources)
  • if only a few pages are modified, the modifications may be appended to the "pages part", which should be much more efficient.

From the user's perspective this is totally hidden: the "parts" files generated by the autosave are stored in a cache folder.

When the user requests an explicit save or save-as, a single compressed .xopz bundle is created, containing the same "parts" files but in a zip file.

We cannot avoid the cost of a full file creation here, but an explicit save / save-as should be less frequent than an autosave.
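
To make the idea above concrete, here is a rough Python sketch, under the assumption that the "pages part" is an append-only log of JSON lines in a cache folder. The file name pages.part, the record layout and all function names are hypothetical:

```python
import json
import tempfile
import zipfile
from pathlib import Path


def autosave_page(cache_dir: Path, page_no: int, strokes: list) -> None:
    """Append one modified page as a JSON line; earlier records stay untouched."""
    with open(cache_dir / "pages.part", "a", encoding="utf-8") as f:
        f.write(json.dumps({"page": page_no, "strokes": strokes}) + "\n")


def latest_pages(cache_dir: Path) -> dict:
    """Replay the append-only log; the last record for a page wins."""
    pages = {}
    with open(cache_dir / "pages.part", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            pages[record["page"]] = record["strokes"]
    return pages


def explicit_save(cache_dir: Path, target: Path) -> None:
    """Bundle every part file into one compressed archive (the costly full write)."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        for part in sorted(cache_dir.iterdir()):
            zf.write(part, arcname=part.name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "cache"
        cache.mkdir()
        autosave_page(cache, 1, [[0, 0, 10, 10]])
        autosave_page(cache, 1, [[0, 0, 10, 10], [5, 5, 8, 8]])
        explicit_save(cache, Path(tmp) / "notes.xopz")
        print(latest_pages(cache))
```

In this sketch an autosave never rewrites the other part files, and only the explicit save pays for compressing everything.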

Next step:

With this general idea, I will try to design and implement a file format for the "pages part" as a first step.

In principle I will leave out PNG, PDF, audio and other embedded files and focus only on Xournal-specific attributes (pages, layers, strokes, text).

My idea is to use Cap'n Proto (suggested here) and see how feasible it is to support "mergeable" files (feature request).

@LittleHuba you suggested to support quadtrees and just-in-time readings (context). What did you mean? I would like to take that into account.

@LittleHuba
Member

Hi,
Sounds good so far except for a few issues with big files you did not consider so far.

A general recommendation for reading big files is to not read the full file at any given moment. Your idea with the parts files is already going in that direction but not far enough. Imagine we have N pages and want to render just one page that is near the end of the document. In the worst case that would mean we parse N-1 pages before we find any relevant data.

There are two ways around this:

  1. Separate the parts files into chunks. Kind of like a hash set you can then just read a chunk and parse the contained pages. With M chunks your complexity will decrease to (N-1)/M to find your page data.
  2. Use a file format where within the file an index exists that exactly tells you where your data is. Common examples for such formats are used in scientific domains.

I would strongly recommend option two as it decreases the effort for this undertaking tremendously. A Google search should help you with finding appropriate file formats with libraries that already take care of serializing your data.
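
Option two can be illustrated with a toy format: page payloads written back to back, followed by an index and a fixed-size footer that says where the index is. This is only a sketch in Python with a made-up layout (JSON index, 16-byte footer), not one of the existing scientific formats mentioned above:

```python
import io
import json
import struct

FOOTER = struct.Struct("<QQ")  # index offset, index length (little-endian)


def write_indexed(pages: dict[int, bytes]) -> bytes:
    """Write page payloads back to back, then a JSON index, then a fixed-size footer."""
    out = io.BytesIO()
    index = {}
    for page_no, payload in pages.items():
        index[str(page_no)] = (out.tell(), len(payload))
        out.write(payload)
    index_bytes = json.dumps(index).encode("utf-8")
    index_offset = out.tell()
    out.write(index_bytes)
    out.write(FOOTER.pack(index_offset, len(index_bytes)))
    return out.getvalue()


def read_page(f, page_no: int) -> bytes:
    """Read one page using only the footer, the index and that page's bytes."""
    f.seek(-FOOTER.size, io.SEEK_END)
    index_offset, index_len = FOOTER.unpack(f.read(FOOTER.size))
    f.seek(index_offset)
    index = json.loads(f.read(index_len))
    offset, length = index[str(page_no)]
    f.seek(offset)
    return f.read(length)
```

Reading a page near the end of the document then costs three seeks, regardless of how many pages precede it.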

Quadtrees
This is a data structure to organize elements spread in a 2D-area in an easily searchable way.
Space is divided into rectangles. If a rectangle contains more than a given amount of elements it is again divided into four rectangles and so on. Wikipedia has quite a good article. This is especially important for infinite pages.
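
A compact Python sketch of the subdivision rule just described. It stores plain points rather than strokes with bounding boxes, and the capacity of 4 and the depth limit are arbitrary choices made for the example:

```python
from dataclasses import dataclass, field

Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # x, y, width, height
MAX_DEPTH = 8  # stop subdividing here so clusters of equal points cannot split forever


def contains(r: Rect, p: Point) -> bool:
    x, y, w, h = r
    return x <= p[0] < x + w and y <= p[1] < y + h


def intersects(a: Rect, b: Rect) -> bool:
    return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2]
            and a[1] < b[1] + b[3] and b[1] < a[1] + a[3])


@dataclass
class Quadtree:
    bounds: Rect
    capacity: int = 4
    depth: int = 0
    points: list[Point] = field(default_factory=list)
    children: list["Quadtree"] = field(default_factory=list)

    def insert(self, p: Point) -> bool:
        if not contains(self.bounds, p):
            return False
        if not self.children:
            if len(self.points) < self.capacity or self.depth >= MAX_DEPTH:
                self.points.append(p)
                return True
            self._split()
        if not any(child.insert(p) for child in self.children):
            self.points.append(p)  # rounding edge case: keep the point at this level
        return True

    def _split(self) -> None:
        """Divide this rectangle into four and push the stored points down."""
        x, y, w, h = self.bounds
        hw, hh = w / 2, h / 2
        self.children = [Quadtree((cx, cy, hw, hh), self.capacity, self.depth + 1)
                         for cx in (x, x + hw) for cy in (y, y + hh)]
        old, self.points = self.points, []
        for p in old:
            self.insert(p)

    def query(self, area: Rect) -> list[Point]:
        """Return all points inside `area`, skipping subtrees that cannot overlap it."""
        if not intersects(self.bounds, area):
            return []
        found = [p for p in self.points if contains(area, p)]
        for child in self.children:
            found.extend(child.query(area))
        return found
```

A query for the visible area only descends into rectangles that overlap it, which is what makes the structure attractive for very large or unbounded pages.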

JIT
Just in time reading is then the last building block to support big files. Instead of reading the full file when the user opens it, you just read whatever you need to visualize the currently visible area. Anything else you read when you need it.
With that, files can grow into the region of several GBs and you still have snappy responses of your application. I recommend scientific file formats as they already support such scenarios. This part is interesting if you want to support attachments like movies or audio that are easily several hundred MBs big.

Hope that helps you in your planning.

@eldipa

eldipa commented Apr 18, 2023

A general recommendation for reading big files is to not read the full file at any given moment.

That's correct. I will assume for now that LoadHandler reads the full file as it is currently doing it but I will design a file format with support for random access. JIT reading is highly related with this.

As you mentioned, I will probably go with the index solution.

Quadtrees [are] (...) important for infinite pages.

I took a quick look at the open issues and there seem to be two similarly named but different feature requests:

  • "infinite pages": the possibility of having an infinite number of pages.
  • "infinite page": the possibility of writing beyond the boundaries of a page (backed or not by a PDF).

On "infinite number of pages", the file format needs to be "sparse" so we don't assume that if a 100th page exists, then the 1st to 99th also exist.

On "beyond boundaries page", the file format must not assume that a stroke lies within a particular page: the stroke's coordinates/points are not necessarily within the boundaries of the page (this applies to any other element, not just a stroke).

Food for thought

Regarding this last point, what would Xournal++ show when you have a PDF file with "beyond boundaries pages" and you are displaying more than 1 page?

Could a stroke be drawn starting on top of page 1, continue beyond its boundary, and stop on top of page 2?

Assume that Xournal++ is displaying page 2 below page 1; the stroke would be a kind of vertical line from top to bottom. Now the user changes the layout to display page 2 to the right of page 1. What would happen to the stroke?

From the file format point of view I'm seeing (a priori) two possible scenarios:

  • the stroke always belongs to 1 and only 1 page (this is the case in the current .xopp file). The stroke is not limited by the boundaries of the page, however.
  • the stroke does not belong to any page, it is a free object floating in a global space. In this case a quadtree makes sense to index them.

@eldipa

eldipa commented Aug 16, 2023

Update on the progress of the xoz file format

So far we have two milestones completed:

  • we can read and write extents of blocks with data
  • we can request free blocks to write stuff and release them so the blocks can be used for something else.

It may sound like too little, but the devil is in the details. Managing free/in-use blocks is challenging because allocation/deallocation can happen in any order. The xoz library tries hard to fulfill a free space request with enough blocks from previously released blocks, avoiding getting new blocks (which makes the file larger).

Even with arbitrary allocations and deallocations the xoz library manages to get 0% external fragmentation and ~1% to ~2% internal fragmentation. This means that most of the file is being used to store useful data and therefore it stays close to the minimum size.
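
The general technique of reusing released blocks before growing the file can be sketched as below. This is a generic best-fit free list with coalescing, written in Python for illustration; it is not the xoz library's actual algorithm, and the BlockAllocator class is hypothetical:

```python
class BlockAllocator:
    """Hands out extents (start, count) of fixed-size blocks in a growing file."""

    def __init__(self) -> None:
        self.total_blocks = 0                  # current file size, in blocks
        self.free: list[tuple[int, int]] = []  # sorted free extents (start, count)

    def alloc(self, count: int) -> tuple[int, int]:
        # Best fit: pick the smallest free extent that is large enough.
        candidates = [e for e in self.free if e[1] >= count]
        if candidates:
            start, size = min(candidates, key=lambda e: e[1])
            self.free.remove((start, size))
            if size > count:  # give the unused tail back to the free list
                self.free.append((start + count, size - count))
                self.free.sort()
            return (start, count)
        # Nothing reusable: grow the file.
        start = self.total_blocks
        self.total_blocks += count
        return (start, count)

    def release(self, extent: tuple[int, int]) -> None:
        self.free.append(extent)
        self.free.sort()
        # Coalesce neighbouring extents so they can serve larger requests later.
        merged = [self.free[0]]
        for start, size in self.free[1:]:
            last_start, last_size = merged[-1]
            if last_start + last_size == start:
                merged[-1] = (last_start, last_size + size)
            else:
                merged.append((start, size))
        self.free = merged
```

In this sketch the file only grows when no released extent is large enough, which is the behaviour described above; real allocators also have to persist the free list and cope with partially written state.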

Next steps

Indexing. Modeling how these blocks links together and create the layers and pages of a Xournal++ document.
Because a small corruption in an index may render the file useless, we need to handle the consistency and self-healing topics too.

Where is the source code?

For the moment it is in a separate repository. The source code is in the xoz/ folder and the unit tests are in test/. The code has a fairly large battery of tests, passes cppcheck, cpplint and valgrind, and makes heavy use of the RAII idiom (hence virtually no raw pointers or direct new calls).

@RafaelKr

RafaelKr commented Nov 15, 2023

As far as the thread tells, the current file container format is supposed to be a ZIP archive.

I just want to throw SQLite as a possible container format into the room, maybe it could bring some advantages with it.
I'm not following this thread in-depth, it's just something that you could consider having a look into.

Here's a good article about some of the benefits it could bring with it: https://sqlite.org/appfileformat.html

The summary of the article:

An SQLite database file with a defined schema often makes an excellent application file format. Here are a dozen reasons why this is so:

  1. Simplified Application Development
  2. Single-File Documents
  3. High-Level Query Language
  4. Accessible Content
  5. Cross-Platform
  6. Atomic Transactions
  7. Incremental And Continuous Updates
  8. Easily Extensible
  9. Performance
  10. Concurrent Use By Multiple Processes
  11. Multiple Programming Languages
  12. Better Applications

@eldipa

eldipa commented Nov 17, 2023

@RafaelKr that's correct. The current .xopp format is a compressed archive (similar but not equal to .zip). SQLite offers a much more efficient alternative and I considered it as a possible base format for Xournal++, but I eventually decided to go for a custom file format, .xoz.
Whether this was the correct choice is something we will see with time.

For the moment I can write some thoughts on why SQLite is or is not a viable solution. The following are the reasons that the SQLite authors give in https://sqlite.org/appfileformat.html, with more details and my thoughts.

  1. Simplified Application Development. No new code is needed for reading or writing the application file. [...]

Yes, implementing a custom .xoz format requires much more code than using SQLite (just because SQLite is already coded!), but the file format is only half of the battle. The performance issues of Xournal++ are not in the file format but in how the application reads and writes. Unfortunately the current .xopp format does not allow any improvement.
Hence the need for .xoz: to enable the application to be optimized. This translates to rewriting a non-trivial part of Xournal++, regardless of whether SQLite is used or not.

  2. Single-File Documents.

True. .xoz also aims to be a single-file document. However, there are use cases where we may not want this. Examples are detached PDF and font files. It is not clear to me how the application should present these options to the user, but a custom .xoz should at least not enforce one road or the other.
(.xoz will allow embedding PDF and font files, but it will also allow referencing them as external files.)

  3. High-Level Query Language. SQLite is a complete relational database engine, which means that the application can access content using high-level queries.

Correct, and this was one of the reasons not to pick SQLite. Like any other relational database it offers an SQL language, a query planner and an executor. None of this is needed in Xournal++. The data in the app is, roughly speaking, quite plain: pages contain layers and layers contain drawings.
I couldn't find or think of a use case for things like select lines from drawings where page_no > 10 and line_length < 10. For such complex queries I'm all in with SQLite, but otherwise SQLite sounds like too much for Xournal++.
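
For reference, this is roughly what such a query looks like with Python's built-in sqlite3 module. The drawings table, its columns and the sample rows are invented for the example (loosely following the query quoted above), and say nothing about how Xournal++ stores its data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drawings (id INTEGER PRIMARY KEY, page_no INTEGER, line_length REAL)")
conn.executemany(
    "INSERT INTO drawings (page_no, line_length) VALUES (?, ?)",
    [(3, 4.0), (11, 2.5), (12, 40.0), (15, 9.9)])


def short_lines_after(conn: sqlite3.Connection, page_no: int, max_length: float) -> list[int]:
    """Return ids of drawings after `page_no` that are shorter than `max_length`."""
    rows = conn.execute(
        "SELECT id FROM drawings WHERE page_no > ? AND line_length < ? ORDER BY id",
        (page_no, max_length))
    return [row[0] for row in rows]


# The "plain" access pattern described above needs no query engine at all:
# pages contain layers, and layers contain drawings.
document = {11: {"layer 1": ["stroke a", "stroke b"]}}
strokes_on_page_11 = document[11]["layer 1"]
```

The query engine earns its keep only when filters like the first one are actually needed; the nested lookup covers the pages-layers-drawings case.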

  4. Accessible Content. Information held in an SQLite database file is accessible using commonly available open-source command-line tools

I cannot see the point here. If I want to inspect a SQLite database I need a SQLite program. The same will be true for .xoz.
I agree that once the SQLite database is opened, there are tons of programs that visualize the data in different ways (think of database viewers or explorers). Not sure if Xournal++ will need this.

  5. Cross-Platform.

.xoz will be cross platform too.

  6. Atomic Transactions. Writes to an SQLite database are atomic.

This is by far a really good point for SQLite and something that I need to resolve for .xoz. Unfortunately, if I recall correctly, SQLite uses a journal and does frequent writes to disk. For Xournal++ that would be unfortunate, as the application may be running on devices with low-quality SSDs (think of tablets in comparison with full desktops/notebooks).

  7. Incremental And Continuous Updates. When writing to an SQLite database file, only those parts of the file that actually change are written out to disk. [...]

Absolutely true and .xoz currently does the same.

[...] SQLite also supports continuous update. Instead of collecting changes in memory and then writing them to disk only on a File/Save action, changes can be written back to the disk as they occur. This avoids loss of work on a system crash or power failure. [...]

True, but on the other hand writing and flushing (fsync) frequently hurts SSD lifetime and battery.
Minimizing data loss is something that I still need to figure out, but there is a tradeoff between it and performance. From a database's perspective it makes total sense to minimize data loss; for Xournal++ it is not so clear.

[...] An automated undo/redo stack [...]

Nice for a database but it is not the primary goal for .xoz.

  8. Easily Extensible. [...] [by] adding new tables to the schema or by adding new columns to existing tables.

Yes, .xoz is extensible too. It is a matter of designing for that.

[...] so with a modicum of care to ensuring that the meaning of legacy columns and tables are preserved, backwards compatibility is maintained.

It is not an intrinsic property of SQLite so this can be applied to other formats like .xoz. And trust me, backwards and forward compatibility is not trivial.

It is possible to extend custom or pile-of-files formats too [...] then all application code that changes the corresponding tables must be located and modified (as well) [...]

Yes, but this is not a property of SQLite per se either. How much of the application code must be modified to add new stuff depends heavily on how much coupling exists.
For Xournal++, if a new compression algorithm is added, only the library that reads/writes .xoz should be modified; if a new kind of drawing is added, Xournal++ will be modified but for .xoz it would be mostly transparent.
I'm not saying that .xoz guarantees zero coupling and a life without pain, just that we need to apply good practices and good designs.

  9. Performance. [...] SQLite can often dramatically improves start-up times because instead of having to read and parse the entire document into memory, the application can do queries to extract only the information needed for the initial screen.

True, but this depends on how the application works and how the data is stored; it is not necessarily a property of SQLite.
For .xoz, Xournal++ will have to read the index (page and layer ids), which will fit in memory without problem (using a plain std::map). From there, a few reads fetch the data that Xournal++ wants, with no need to read anything else.

[...] SQLite can read and write smaller BLOBs (less than about 100KB in size) from its database faster than those same blobs can be read or written as separate files from the filesystem. [...]

No idea but I'm going to check that (and steal some ideas from SQLite!)

  10. Concurrent Use By Multiple Processes. SQLite automatically coordinates concurrent access to the same document from multiple threads and/or processes.[...]

Not necessary in Xournal++ at the moment, but we can easily implement multi-reader/single-writer.

  11. Multiple Programming Languages. Though SQLite is itself written in ANSI-C, interfaces exist for [...] C++, C#, Objective-C, Java, [...]

I'm planning to offer a Python wrapper for a low-level API for .xoz. There are a few open issues in this repository from people wanting to do some "scripting" on their .xopp files.

  12. Better Applications. If the application file format is an SQLite database, the complete documentation for that file format consists of the database schema, with perhaps a few extra words about what each table and column represents. The description of a custom file format, on the other hand, typically runs on for hundreds of pages. [...]

Yes and no. It is unfair to say that you only need to document the schema, because if I have a .sqlite file I need to know how to read that binary file before trying to guess the schema.
With .xoz it is the same: we need to document the binary layout.
And no, hundreds of pages are for really complex stuff (like PDF). .xoz is much simpler (but not trivial).

Puff, a lot of writing, eh?

@axiopaladin

It's probably too late to implement this, but I thought it might be a useful idea to add to the mix (even if only to provoke some discussion).

Would it be possible to make this new format work as a sort of "hybrid pdf" like how LibreOffice or Adobe Illustrator do it? The benefit would be that such files can be viewed with a generic PDF viewer, which is nice for e.g. mobile devices that lack a xournal++ port.

@Sporknife

Sporknife commented Jan 7, 2024

I will just throw my opinion in here for the sake of discussion, after reading SQLite's article and the entire discussion. Take this comment with a big spoon of salt because I am not involved in this project at all, so I don't know the internals.

Here is what I am hoping will happen for the sake of "open-source-wide" development of handwritten notes apps (I use rnote because xournal++ lags on my laptop; that might be the difference between gtk3 and gtk4?).

Standard library for graphical structures

It defines everything (brushes, shapes, pure text, mathematical equations?, images, pdf, pdf page, external widgets that could be used to implement showing other stuff such as graphs).

Rnote uses rust's piet crate.
Mentions from piet's GitHub page: skia graphics library, C++ 2D graphics api proposal.

Standard handwritten notes format

From what I've read in this discussion, there is a shit-ton of work done on improving the existing file format and lots of discussion on the new one. So here are my 2 cents on this topic. Note that some of these concepts were already mentioned in this discussion and might have been already implemented.

My "ideal" (abstract) handwritten notes application file format:

  • multi-file archive compressed with the ZSTD compression algorithm (it's already used for system packages in Linux afaik, why not use it here as well).
  • embedded files should be stored separately.
  • pages should be stored per page and not everything in one page. As said before (I believe), this would allow apps to load pages that are currently shown and save those that were edited specifically. I also believe that each page should have a unique name that is not connected to its page number (this would allow making links to a random page inside the document without connecting it to a specific page. Easier in the long run). The "problem" with the previously mentioned approach is that a file that specifies the order of pages would need to be stored somewhere.
  • INFO.json containing information about our document (description, which format it uses in case we want to change it in the future, etc.).

Structure of the application file format
my-application-file-format/
├─ INFO.json
├─ embedded/
│  ├─ brain.png
│  ├─ doc.pdf
│  ├─ specific-page.pdf
├─ pages/
│  ├─ {unique_name1}
│  ├─ {unique_name2}
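
A minimal sketch of how the layout above could be written and read, not a definitive implementation. It assumes a plain ZIP container with DEFLATE from Python's standard library as a stand-in for ZSTD, and it assumes the page order is kept in INFO.json under a hypothetical `page_order` key (the `format_version` key and both function names are also made up for illustration).

```python
import io
import json
import uuid
import zipfile


def write_notes_archive(pages, embedded):
    """Build the archive in memory and return its bytes.

    pages:    list of page payloads (bytes), in display order
    embedded: dict mapping file name -> bytes
    """
    buf = io.BytesIO()
    order = []
    # DEFLATE is only a stand-in here; the proposal above asks for ZSTD.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for payload in pages:
            name = uuid.uuid4().hex  # unique name, unrelated to the page number
            order.append(name)
            zf.writestr(f"pages/{name}", payload)
        for fname, data in embedded.items():
            zf.writestr(f"embedded/{fname}", data)
        # Hypothetical keys: the page order has to live somewhere.
        info = {"format_version": 1, "page_order": order}
        zf.writestr("INFO.json", json.dumps(info))
    return buf.getvalue()


def read_page(archive_bytes, index):
    """Load one page by its position; the other pages are not decompressed."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        info = json.loads(zf.read("INFO.json"))
        return zf.read(f"pages/{info['page_order'][index]}")
```

Because every page is its own archive member, a reader can pull out a single page by name, which is the property the per-page storage point is after.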

Tooling for the program

A library for converting this application file format to other formats such as a pdf, etc.

@eldipa

eldipa commented Feb 13, 2024

Would it be possible to make this new format work as a sort of "hybrid pdf" like how LibreOffice or Adobe Illustrator do it? The benefit would be that such files can be viewed with a generic PDF viewer, which is nice for e.g. mobile devices that lack a xournal++ port.

@axiopaladin

I considered this and it looks like a nice thing to have, but I'm not sure how it would work in practice. Xournal++ never modifies the PDF file; it only "adds" things on top of it. When the user exports a new PDF file, the original and the "added" things are merged together.
This clearly shows two kinds of files (the PDF, either the original or the exported one, and the xopp file) and two different operations (exporting a PDF or saving a xopp file).

Having the xopp file embedded in the PDF file may be useful, but I think we need to think more about how it would be used, especially how this feature would work for users who are not working with a PDF at all.

Regarding the new xoz format: it is designed to be editable (the file is edited while Xournal++ is running, not only when the user saves). This imposes a few constraints on where xoz can live and, at the moment, living inside a PDF is not possible. But again, I don't rule out the idea.

@eldipa

eldipa commented Feb 13, 2024

@Sporknife

* multi-file archive compressed with the ZSTD compression algorithm (it's already used for system packages in Linux afaik, why not use it here as well).

Yes, in fact, I remember someone did an experiment with different compression algorithms and zstd won in almost all cases (there is an issue on GitHub, but I don't recall which one).
Compression, however, cannot be applied lightly: if we compress everything as one stream, we may not be able to read or edit one part without decompressing large chunks of data, and this would hurt performance.

But definitely we want compression in xoz.
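
To make the trade-off concrete, here is a rough sketch of compressing each page independently and keeping an offset index, so that one page can be read without decompressing the rest. This is not the xoz layout: `pack_pages`, `read_page` and the index shape are hypothetical, and zlib from the Python standard library stands in for zstd.

```python
import zlib


def pack_pages(pages):
    """Compress each page on its own and return (blob, index).

    index[i] is the (offset, length) of page i's compressed bytes in blob.
    """
    blob = bytearray()
    index = []
    for payload in pages:
        chunk = zlib.compress(payload)  # zlib is a stand-in for zstd
        index.append((len(blob), len(chunk)))
        blob += chunk
    return bytes(blob), index


def read_page(blob, index, i):
    """Decompress only page i; the other chunks are never touched."""
    offset, length = index[i]
    return zlib.decompress(blob[offset:offset + length])
```

The cost is a somewhat worse compression ratio than one stream over the whole document, since each chunk starts without shared history.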

* embedded files should be stored separately.

There are good arguments for storing embedded files outside of xoz (no longer embedded) and inside it (embedded). xoz does not take sides: both can be implemented.

* pages should be stored per page and not everything in one page. As said before (I believe), this would allow apps to load pages that are currently shown and save those that were edited specifically. 

Yes. While the current xopp format stores the data in separate pages, Xournal++ loads everything into memory and saves everything on each save call. The new xoz format will enable reading and writing only a part (like you said, a page), but xoz alone will not be enough: we may have to make some deep changes in Xournal++ to support this.
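
As a rough illustration of what "read/write only a part" would mean on the application side, here is a small sketch with lazy loading and dirty-page tracking. It is not how Xournal++ or xoz work today: the `LazyDocument` class is hypothetical and a plain dict stands in for the file.

```python
class LazyDocument:
    """Loads pages on first access and writes back only the edited ones."""

    def __init__(self, store):
        self._store = store  # dict-like: page id -> bytes (stands in for the file)
        self._cache = {}     # pages loaded so far
        self._dirty = set()  # ids of pages edited since the last save

    def get(self, page_id):
        # Read from the backing store only the first time a page is requested.
        if page_id not in self._cache:
            self._cache[page_id] = self._store[page_id]
        return self._cache[page_id]

    def edit(self, page_id, payload):
        self._cache[page_id] = payload
        self._dirty.add(page_id)

    def save(self):
        """Write back only the dirty pages; return the ids that were written."""
        written = sorted(self._dirty)
        for page_id in written:
            self._store[page_id] = self._cache[page_id]
        self._dirty.clear()
        return written
```

The file format can offer partial reads and writes, but the application still needs this kind of bookkeeping to take advantage of them.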

I also believe that each page should have a unique name that is not connected to its page number (this would allow making links to a random page inside the document without connecting it to a specific page. Easier in the long run).

Naming will be supported by xoz (in addition to a coordinate system). Links will also be supported. But some of these features will require changes in the UI of Xournal++ and that may take some time. I think that we should write down the concrete use cases in the meantime.
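
One possible use case to write down, sketched under assumptions: a link stores a page's stable name instead of its number, so reordering pages does not break it. The names, the `resolve` helper and the link shape are all hypothetical and say nothing about how xoz will actually represent links.

```python
def resolve(link, page_order):
    """Return the current 1-based page number that a link points to."""
    return page_order.index(link["target"]) + 1


# Display order is a list of stable page names (hypothetical values).
page_order = ["p-intro", "p-proof", "p-refs"]
link = {"target": "p-proof"}

number_before = resolve(link, page_order)

# Move the target page to the end; the link itself is left unchanged.
page_order.remove("p-proof")
page_order.append("p-proof")

number_after = resolve(link, page_order)
```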

INFO.json containing information about our document (description, which format it uses in case we want to change it in the future, etc.).
[...]
A library for converting this application file format to other formats such as a pdf, etc.

I plan to make a Python API (maybe Lua as well) on top of the xoz lib, so that things like getting information or converting to other formats would be possible.
