Simplify History Structure #437

Open
vanyauhalin opened this issue Aug 1, 2023 · 2 comments

In this issue, I would like to discuss how I believe we can simplify the structure of the history, explain why I think it's important, and highlight the potential problems that the current implementation might cause.

Contents

  1. The Prelude
    1. In General
    2. For Other Vendors
  2. The Dive
    1. Current Implementation
      1. Create Document
      2. Open The Editor
      3. Write The First Changes
      4. Write The Second Changes
    2. Houston, we have a problem
      1. Create Document
      2. Open The Editor
      3. Write The First Changes
      4. Write The Second Changes
  3. The Potential Solutions
    1. History Structure
      1. Create The Source Directory
      2. Create The Versions Directory
      3. Create The First Version
      4. Write The Changes
      5. Rename The Source File
      6. Write The Changes After Renaming
      7. Restore The Specific Version
    2. History Object

1. The Prelude

Before I dive into our examples, I would like to do some research on how history works in general and also learn a bit about how history works for other vendors in the field.

1.1. In General

Let's imagine that I'm creating a new Node.js tool that simply prints a friendly welcome message to the console. That's all there is to it. My tool will include two files. One of them is main.js, which is where my program starts, and the other is version, which indicates the version of the tool sources.

/tool
$ find .
├─ main.js
└─ version

Let's create the initial version of the tool. I'll add a print message to the console and pin the version.

/tool
$ cat main.js
console.log("Hi Buddy!")

/tool
$ cat version
1

Well, I would like to share my brilliant tool with friends, so I need to publish it in the registry. Imagine that the structure of the registry is as simple as the structure of the program. The registry contains directories named after versions of the tool, and these subdirectories contain the sources of the specific version. After executing the magic command, my program gets published in the registry with the initial version.

/registry
$ find .
└─ versions
   └─ 1
      ├─ main.js
      └─ version

As time goes by, I've made new friends who mostly speak French. I want them to be able to use my instrument as well. In order to achieve this, I've created a new version.

/tool
$ cat main.js
console.log("Hi Buddy! Salut mon pote!")

/tool
$ cat version
2

And of course, I published it in the registry.

/registry
$ find .
└─ versions
   ├─ 1
   │  ├─ main.js
   │  └─ version
   └─ 2
      ├─ main.js
      └─ version

Everything was going well, but my old friends were unaware for a long time that my tool had a new version. The issue arose because they had installed a specific version of the program. To resolve this, I added a symbolic link in the registry that will always point to the latest version.

/registry
$ find .
├─ latest -> /registry/versions/2/
└─ versions
   ├─ 1
   │  ├─ main.js
   │  └─ version
   └─ 2
      ├─ main.js
      └─ version

Starting now, whenever I add a new version, the registry will update the link so that all my friends can be informed about its release.
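
The registry described above can be sketched in a few lines of Python. This is only an illustration under assumed names: the `publish` function, the `latest.tmp` staging link, and the directory layout are hypothetical and exist only to mirror the toy registry from this section.

```python
import os
import shutil
import tempfile
from pathlib import Path


def publish(tool: Path, registry: Path) -> Path:
    """Copy the tool sources into registry/versions/<version> and repoint `latest`."""
    version = (tool / "version").read_text().strip()
    target = registry / "versions" / version
    # copytree refuses to overwrite, so a published version stays immutable.
    shutil.copytree(tool, target)
    # Create the new link under a temporary name, then rename it over the
    # old one so that `latest` never disappears in between.
    staging = registry / "latest.tmp"
    if staging.is_symlink():
        staging.unlink()
    os.symlink(target, staging, target_is_directory=True)
    os.replace(staging, registry / "latest")
    return target


with tempfile.TemporaryDirectory() as root:
    tool, registry = Path(root, "tool"), Path(root, "registry")
    tool.mkdir()
    (tool / "main.js").write_text('console.log("Hi Buddy!")\n')
    (tool / "version").write_text("1\n")
    publish(tool, registry)
    (tool / "version").write_text("2\n")
    publish(tool, registry)
    print(Path(os.readlink(registry / "latest")).name)  # 2
```

The point of the sketch is that every version, including the first one, lives in its own directory, and "latest" is just a pointer.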

1.2. For Other Vendors

I'm a neophyte in the development of office tools, so sometimes I look at how my colleagues in the field implement certain things. This time, I was particularly interested in understanding how the first version of a document is created.

After playing with the most popular office services, I discovered that when a user creates a file, it automatically generates the initial, first version of the file. For me, this was non-intuitive because, in my experience, when I create a Node.js tool, the first version already contains some code. In this case, the initial version refers to an empty document.

2. The Dive

In the following, I'll first focus on the current implementation of the history. Then, I'll discuss potential problems. Finally, I'll propose possible solutions.

I'll consider the base scenario where a user creates a file and then edits it multiple times. I'll be working on an example written in Python because it contains sketches of a new history structure. I'll make sure to include links to all the implementations in all the examples.

2.1. Current Implementation

The user starts the document server with the project and opens the home page.

2.1.1. Create Document

The user creates a new document file. Under the hood, the following processes are launched.

  1. A new blank document is being created. [1-8]
  2. A file with meta information about the new file is being created. [9-16]

The structure of the user storage directory might look something like this.
$ find .
├─ new.docx
└─ new.docx-hist
   └─ createdInfo.json

$ open new.docx
Blank document

$ cat new.docx-hist/createdInfo.json
{
  "created": "2023-07-31 12:05:45",
  "uid": "uid-1",
  "uname": "John Smith"
}

2.1.2. Open The Editor

The user opens the editor, and this action launches the following processes:

  1. A key is being generated on the fly. [17-24]
  2. A history object is being updated. [25-32]

The history object might look something like this.
{
  "history": {
    "currentVersion": 1,
    "history": [
      {
        "key": "871885520052496533",
        "version": 1,
        "created": "2023-07-31 12:05:45",
        "user": {
          "id": "uid-1",
          "name": "John Smith"
        }
      }
    ]
  },
  "historyData": {
    "0": {
      "fileType": "docx",
      "key": "871885520052496533",
      "version": 1,
      "url": "http://proxy/static/172.19.0.3/new.docx",
      "token": "..."
    }
  }
}

2.1.3. Write The First Changes

The user writes their first set of changes. For our convenience, they've added a "V2" sentence to their document. Here I refer to my reflections at the very beginning.

When a document is created, it's automatically assigned the initial, first version.

So the content of "V2" suggests to us that this document should, in theory, be the second version.

The structure of the user storage directory is shown below.
$ find .
├─ new.docx
└─ new.docx-hist
   ├─ 1
   │  ├─ changes.json
   │  ├─ diff.zip
   │  ├─ key.txt
   │  └─ prev.docx
   └─ createdInfo.json

$ open new.docx
V2

$ cat new.docx-hist/1/changes.json
{
  "serverVersion": "7.3.3",
  "changes": [
    {
      "created": "2023-07-31 12:57:43",
      "user": {
        "id": "uid-1",
        "name": "John Smith"
      }
    }
  ]
}

$ unzip new.docx-hist/1/diff.zip
Blank document -> V2

$ open new.docx-hist/1/prev.docx
Blank document

The history object is shown below.
{
  "history": {
    "currentVersion": 2,
    "history": [
      {
        "key": "-1139790747860777407",
        "version": 1,
        "created": "2023-07-31 12:57:07",
        "user": {
          "id": "uid-1",
          "name": "John Smith"
        }
      },
      {
        "key": "-5826708658554727445",
        "version": 2,
        "changes": [
          {
            "created": "2023-07-31 12:57:43",
            "user": {
              "id": "uid-1",
              "name": "John Smith"
            }
          }
        ],
        "serverVersion": "7.3.3",
        "created": "2023-07-31 12:57:43",
        "user": {
          "id": "uid-1",
          "name": "John Smith"
        }
      }
    ]
  },
  "historyData": {
    "0": {
      "fileType": "docx",
      "key": "-1139790747860777407",
      "version": 1,
      "url": "http://proxy/downloadhistory?fileName=new.docx&ver=1&file=prev.docx&userAddress=172.19.0.3",
      "token": "..."
    },
    "1": {
      "fileType": "docx",
      "key": "-5826708658554727445",
      "version": 2,
      "url": "http://proxy/static/172.19.0.3/new.docx",
      "previous": {
        "fileType": "docx",
        "key": "-1139790747860777407",
        "url": "http://proxy/downloadhistory?fileName=new.docx&ver=1&file=prev.docx&userAddress=172.19.0.3"
      },
      "changesUrl": "http://proxy/downloadhistory?fileName=new.docx&ver=1&file=diff.zip&userAddress=172.19.0.3",
      "token": "..."
    }
  }
}

2.1.4. Write The Second Changes

By analogy with the previous section, the user writes the second changes by appending "V3".

The structure of the user storage directory is shown below.
$ find .
├─ new.docx
└─ new.docx-hist
   ├─ 1
   │  ├─ changes.json
   │  ├─ diff.zip
   │  ├─ key.txt
   │  └─ prev.docx
   ├─ 2
   │  ├─ changes.json
   │  ├─ diff.zip
   │  ├─ key.txt
   │  └─ prev.docx
   └─ createdInfo.json

$ open new.docx
V2 V3

$ cat new.docx-hist/2/changes.json
{
  "serverVersion": "7.3.3",
  "changes": [
    {
      "created": "2023-07-31 13:20:53",
      "user": {
        "id": "uid-1",
        "name": "John Smith"
      }
    }
  ]
}

$ unzip new.docx-hist/2/diff.zip
V2 -> V2 V3

$ open new.docx-hist/2/prev.docx
V2

The history object is shown below.
{
  "history": {
    "currentVersion": 3,
    "history": [
      {
        "key": "-1139790747860777407",
        "version": 1,
        "created": "2023-07-31 12:57:07",
        "user": {
          "id": "uid-1",
          "name": "John Smith"
        }
      },
      {
        "key": "-5826708658554727445",
        "version": 2,
        "changes": [
          {
            "created": "2023-07-31 12:57:43",
            "user": {
              "id": "uid-1",
              "name": "John Smith"
            }
          }
        ],
        "serverVersion": "7.3.3",
        "created": "2023-07-31 12:57:43",
        "user": {
          "id": "uid-1",
          "name": "John Smith"
        }
      },
      {
        "key": "3159726309752106985",
        "version": 3,
        "changes": [
          {
            "created": "2023-07-31 13:20:53",
            "user": {
              "id": "uid-1",
              "name": "John Smith"
            }
          }
        ],
        "serverVersion": "7.3.3",
        "created": "2023-07-31 13:20:53",
        "user": {
          "id": "uid-1",
          "name": "John Smith"
        }
      }
    ]
  },
  "historyData": {
    "0": {
      "fileType": "docx",
      "key": "-1139790747860777407",
      "version": 1,
      "url": "http://proxy/downloadhistory?fileName=new.docx&ver=1&file=prev.docx&userAddress=172.19.0.3",
      "token": "..."
    },
    "1": {
      "fileType": "docx",
      "key": "-5826708658554727445",
      "version": 2,
      "url": "http://proxy/downloadhistory?fileName=new.docx&ver=2&file=prev.docx&userAddress=172.19.0.3",
      "previous": {
        "fileType": "docx",
        "key": "-1139790747860777407",
        "url": "http://proxy/downloadhistory?fileName=new.docx&ver=1&file=prev.docx&userAddress=172.19.0.3"
      },
      "changesUrl": "http://proxy/downloadhistory?fileName=new.docx&ver=1&file=diff.zip&userAddress=172.19.0.3",
      "token": "..."
    },
    "2": {
      "fileType": "docx",
      "key": "3159726309752106985",
      "version": 3,
      "url": "http://proxy/static/172.19.0.3/new.docx",
      "previous": {
        "fileType": "docx",
        "key": "-5826708658554727445",
        "url": "http://proxy/downloadhistory?fileName=new.docx&ver=2&file=prev.docx&userAddress=172.19.0.3"
      },
      "changesUrl": "http://proxy/downloadhistory?fileName=new.docx&ver=2&file=diff.zip&userAddress=172.19.0.3",
      "token": "..."
    }
  }
}

2.2. Houston, we have a problem

I propose that we proceed in the same order as before, following the base scenario.

2.2.1. Create Document

My confusion starts at the very moment the file is created.

Note
Further along the text, I'll include small notes in the margins. These notes will highlight certain things that I would like to mention, but they may not be directly related to the current issue.

Let's remember how our registry handles versions of the tool. It creates a new directory for each version that contains the sources of that version. Additionally, it creates a symbolic link for the latest version.

Let's see how our examples handle the initial version.

Note
The first thing I would like to highlight is that the process of creation isn't implemented in the same way everywhere. In some examples it's an endpoint with one name, in others an endpoint with a different name, and in some there is no endpoint at all.

$ find .
├─ new.docx
└─ new.docx-hist
   └─ createdInfo.json

— Do you see the first version of the file?
— No.
— But it exists.

Here, new.docx is both the first version and the latest one. The problem is that this file isn't in its designated, versioned subdirectory, and it's important to always remember this. For instance, when determining the latest version, we begin counting from one, [33, 32, 33, 34, 35, 36, 37, 38] which may not seem logical because the latest version could actually be zero, indicating that the file doesn't exist.

The presence of createdInfo.json has also raised some questions. This file only contains information about the initial version, so it makes sense to store it near the first version. Placing it in the root of the history directory, where only versioned directories should be stored, may create additional branches in the business logic. This situation is already observed in the function that determines the latest version. [39-45]
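
To make the two quirks concrete, here is a minimal sketch of what a "latest version" function has to do with the current layout. It isn't copied from any of the examples; the `latest_version` name and its exact filtering are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def latest_version(hist_dir: Path) -> int:
    """Infer the latest version from a `*-hist` directory of the current layout."""
    if not hist_dir.is_dir():
        # No history yet, but the source file itself already counts as version 1.
        return 1
    # createdInfo.json lives next to the versioned directories, so plain
    # files have to be filtered out before counting.
    archived = [p for p in hist_dir.iterdir() if p.is_dir() and p.name.isdigit()]
    # The latest version has no directory of its own, hence the "+ 1".
    return len(archived) + 1


with tempfile.TemporaryDirectory() as root:
    hist = Path(root, "new.docx-hist")
    hist.mkdir()
    (hist / "createdInfo.json").write_text("{}")
    (hist / "1").mkdir()
    print(latest_version(hist))  # 2
```

Both the filter and the "+ 1" exist only because the first and the latest versions are stored differently from all the others.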

When creating, we don't generate a file with a key. For now, I'll leave this judgment without explanation, since I'll return to this later.

2.2.2. Open The Editor

A key is being generated on the fly.

Let's break the key generation process into stages. [46-49, 21-24]

Note
Key generation isn't always wrapped in a function everywhere. In some examples, it's copy-pasted throughout the codebase. The algorithm may also vary, but not significantly.

  1. Get the path to the source file, that is, the file located outside the history directory.
  2. Generate a download URL for the source file.
  3. Read the meta information of the source file.
  4. Concatenate the download URL with the time of the most recent content modification of the source file.
  5. Get the hash value of the concatenated string.
  6. Trim the hash to the first 20 characters.
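
The stages above can be sketched as follows. This is an approximation, not the code from the examples: the `generate_key` name is hypothetical, and SHA-256 is used here only as a stand-in because the actual hash function differs between the examples.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def generate_key(source: Path, download_url: str) -> str:
    """Derive an editor key from the download URL and the file's mtime."""
    # Stage 3: read the meta information of the source file.
    modified = os.stat(source).st_mtime_ns
    # Stages 4 and 5: concatenate and hash (SHA-256 is an assumption).
    digest = hashlib.sha256(f"{download_url}{modified}".encode()).hexdigest()
    # Stage 6: trim the hash to the first 20 characters.
    return digest[:20]


with tempfile.TemporaryDirectory() as root:
    source = Path(root, "new.docx")
    source.write_text("V2")
    url = "http://proxy/static/172.19.0.3/new.docx"
    os.utime(source, ns=(1_690_000_000 * 10**9, 1_690_000_000 * 10**9))
    before = generate_key(source, url)
    # Same content, newer modification time.
    os.utime(source, ns=(1_690_000_010 * 10**9, 1_690_000_010 * 10**9))
    after = generate_key(source, url)
    print(before != after)  # True
```

The key depends only on the URL and the modification time, so it changes whenever the file is touched, even if the content stays the same.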

Behind this seeming complexity, unreliability is hidden.

While I was typing this text, @aleksandrfedorov97 found a unique situation. He discovered that the standard file-copying method in Java doesn't copy the meta information of a file on the Windows platform. Surprisingly, we even have our own implementation of file copying! [50] We rely on unreliable data and then try to tune it to make it a bit more reliable.

To take a trivial case, if the user opens a file, adds a space, and then immediately deletes it, we theoretically shouldn't create a new version of the file, because the content hasn't changed. However, since the file has been modified, a new key will be generated for the unchanged content.

The problem is exacerbated by the fact that, in some cases, we have a function that generates a key from the download URL [51-58] provided by the document server [59-66] for a file that we have already stored. However, it seems that the document server always returns the key of the file it worked with, if I understand correctly. [67]

It appears to me that the current approach to key generation is like having a loaded gun on the wall, which is bound to go off more than once.

Looking ahead, I want to mention that currently we're unable to replace the key generation with something simpler and more standardized, such as UUID. If two users open the same file, they will receive two different keys because:

A key is being generated on the fly.

Not storing the key also has an impact on the generation of the history.

Note
I want to dwell on the structure of the history object. It's a combination of two documented structures: History and HistoryData, which serve different purposes. [68, 69] When querying the History, we don't always require the HistoryData. Additionally, we often only need the HistoryData of specific elements, rather than all of them. It would be more logical and simpler to have separate endpoints for these two cases, just as the document server provides two separate callbacks for these two structures.

When the document server talks to our examples, it sends a key, and we save it. However, we don't store the key for the latest version. Therefore, when we generate the history object, we need to check whether the version in the loop is the latest. If it is, we use the generated key. In all other cases, we read the key from the file. Essentially, we manually add branching to the universal process, making it non-generic. [70-77]

Warning
Further along the text, I'll rely on changes.json. This is incorrect because the file is a black box from the document server, and its content may be changed at any moment. Instead, we should create our own tracking system. Anyway, my future points will be relevant if we replace this box with our own system.

The situation is similar to the meta for the initial version. All files have a standardized method of storing additional information, but for the first version, we use something different. Once again, we introduce unnecessary complexity to the generic process. [78-85]

2.2.3. Write The First Changes

It's time to take a look at the first changes.

$ find .
├─ new.docx
└─ new.docx-hist
   ├─ 1
   │  ├─ changes.json
   │  ├─ diff.zip
   │  ├─ key.txt
   │  └─ prev.docx
   └─ createdInfo.json

We noticed that a versioned subdirectory isn't created for the latest version. Therefore, the new.docx file must be the second version. When our user initially created these files, they added specific content inside to indicate the version, making it easier for us.

$ open new.docx
V2

That's good. The file is indeed the second version, just as we suspected. Now, let's take a look at the other files in the history directory.

Here we'll find that an initial versioned directory has been created. Following the same logic as in the Node.js registry, this directory should contain the sources, that is, the files of the specific, first version. But what do we have?

$ cat new.docx-hist/1/changes.json
{
  "serverVersion": "7.3.3",
  "changes": [
    {
      "created": "2023-07-31 12:57:43",
      "user": {
        "id": "uid-1",
        "name": "John Smith"
      }
    }
  ]
}

The black box from the document server contains an object that represents the changes made to the source file. This object was submitted after the initial changes were made, specifically after the user typed "V2". Based on this, I can infer that this object doesn't belong to the first version; rather, it marks the starting point of the second version. Another way to check this is to compare the date of these changes with the creation date stored in createdInfo.json.

$ cat new.docx-hist/createdInfo.json | jq -r ".created"
2023-07-31 12:05:45

The next file shows the difference between one version and its earlier one. The first version is a blank document, so it's impossible to have a diff for this version: it doesn't make sense to compare nothing to an empty document. We can unzip this archive and see for ourselves.

$ unzip new.docx-hist/1/diff.zip
Blank document -> V2

The next one is the file with a key. This is the key for the first version, so it's almost correct. I'll explain why it's only almost correct later on.

The last item on this list is the previous file. This file serves as the source for this version. I'm not sure why it was named a previous file. Previous to what?

$ open new.docx-hist/1/prev.docx
Blank document

2.2.4. Write The Second Changes

In this section, I won't analyze the addition of the second changes in such detail because the process is essentially the same as described above. Instead, I want to summarize how the history structure looks in the end.

$ find .
├─ new.docx             The source file of the latest, third version.
└─ new.docx-hist                                           (latest, 3)
   ├─ 1
   │  ├─ changes.json   The black box of the second version. (2)
   │  ├─ diff.zip       The diff of the second version. (2)
   │  ├─ key.txt        The key of the first version. (1)
   │  └─ prev.docx      The source file of the first version. (1)
   ├─ 2
   │  ├─ changes.json   The black box of the third version. (3)
   │  ├─ diff.zip       The diff of the third version. (3)
   │  ├─ key.txt        The key of the second version. (2)
   │  └─ prev.docx      The source file of the second version. (2)
   └─ createdInfo.json  The meta of the first version. (1)

To summarize, I would say that there is no structure here. It's more like a chaotic collection of files.
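
The annotated tree above can be turned into a small lookup to show how many special cases it takes to find the files of a single version. This is a sketch of the mapping only; the `locate` function and its return shape are hypothetical and don't exist in the examples.

```python
from pathlib import Path
from typing import Optional


def locate(storage: Path, name: str, version: int, latest: int) -> dict[str, Optional[Path]]:
    """Map a version number to its files in the current `*-hist` layout."""
    hist = storage / f"{name}-hist"
    ext = Path(name).suffix
    is_latest = version == latest
    return {
        # The latest source lives outside the history directory.
        "source": storage / name if is_latest else hist / str(version) / f"prev{ext}",
        # The key of the latest version isn't stored at all.
        "key": None if is_latest else hist / str(version) / "key.txt",
        # Changes and diff of version N are stored in directory N - 1.
        "changes": hist / str(version - 1) / "changes.json" if version > 1 else None,
        "diff": hist / str(version - 1) / "diff.zip" if version > 1 else None,
        # Only the first version has a dedicated meta file.
        "meta": hist / "createdInfo.json" if version == 1 else None,
    }


print(locate(Path("storage"), "new.docx", 2, 3)["diff"])
```

Every entry needs its own condition, which is the non-generic branching discussed earlier.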

The features of this structure are naturally reflected in the history functions and, as a result, in the history object.

$ curl /history | jq '.historyData."0"'
{
  "fileType": "docx",
  "key": "-1139790747860777407",
  "version": 1,
  "url": "http://proxy/downloadhistory?fileName=new.docx&ver=1&file=prev.docx&userAddress=172.19.0.3",
  "token": "..."
}

Here we have a HistoryData object for the first version. This object contains the URL to download the source file for this version. Let's move on.

$ curl /history | jq '.historyData."2"'
{
  "fileType": "docx",
  "key": "3159726309752106985",
  "version": 3,
  "url": "http://proxy/static/172.19.0.3/new.docx",
  "previous": {
    "fileType": "docx",
    "key": "-5826708658554727445",
    "url": "http://proxy/downloadhistory?fileName=new.docx&ver=2&file=prev.docx&userAddress=172.19.0.3"
  },
  "changesUrl": "http://proxy/downloadhistory?fileName=new.docx&ver=2&file=diff.zip&userAddress=172.19.0.3",
  "token": "..."
}

Here we have a HistoryData object for the third version.

This object contains the URL to download the source file, but with a different endpoint. This is because there is no versioned directory for the latest version in the history structure. As a result, we have two endpoints for the same thing.

This object also contains the URL to download the source file of the previous version. The URL embeds the version number, which is the second version. This is correct because the second version is the previous version of the third version.

However, the object also includes the URL to download the diff between the third and the second versions. This URL actually contains the second version, which by logic should indicate that it's a diff between the second and first versions, but not today, not today.

3. The Potential Solutions

In the final chapter of my text, I would like to discuss and propose potential solutions to the problems that I've previously mentioned.

3.1. History Structure

Let's start with structure.

3.1.1. Create The Source Directory

It's important for me to mention the source directory before discussing the history directory. Currently, we append the history postfix to the source file basename to create the history directory, which may seem strange. Tomorrow we'll introduce a new feature involving files and add another postfix. This process may continue repeatedly, resulting in a messy file structure. I believe it would be easier to keep all the files related to a specific source file in its own directory.

I won't reinvent the wheel. Instead, I'll simply take a hash string from the file basename that the user will create or upload.

Warning
To avoid straining our eyes, I'll visually trim the hash string. However, please refrain from doing this in the source code.

$ echo "new.docx" | openssl sha256 | awk '{print $2}'
0f08dbc2...

$ mkdir 0f08dbc2

Next, for example, if the user creates a new file, I'll copy the document sample from the templates to the source directory.

$ cp templates/new.docx 0f08dbc2/new.docx

In this step, the structure will be as follows.

$ find .
└─ 0f08dbc2
   └─ new.docx
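
In Python, this step might look like the sketch below. The `source_dir_name` and `create_source` names are hypothetical. Note that `echo` appends a newline, so the digest computed here will differ from the one printed by the shell command above unless the newline is hashed too.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def source_dir_name(basename: str) -> str:
    """Return the full, untrimmed SHA-256 hex digest of the basename."""
    return hashlib.sha256(basename.encode("utf-8")).hexdigest()


def create_source(storage: Path, template: Path, basename: str) -> Path:
    """Create the source directory and copy the template into it."""
    directory = storage / source_dir_name(basename)
    directory.mkdir()  # raises FileExistsError if the name is already taken
    shutil.copyfile(template, directory / basename)
    return directory / basename


with tempfile.TemporaryDirectory() as root:
    template = Path(root, "template.docx")
    template.write_text("Blank document")
    source = create_source(Path(root), template, "new.docx")
    print(source.parent.name == source_dir_name("new.docx"))  # True
```

Letting `mkdir` fail on an existing directory doubles as a check for duplicate file names.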

3.1.2. Create The Versions Directory

I would like to move the creation of the versions directory (known as *-hist in the current implementation) into a separate step. My goal is to ensure that, when working with versions in the future, we never have to force the creation of the versions directory. In other words, whenever we create a versioned subdirectory, we can always be certain that the versions directory already exists. This will simplify the logic and prevent unexpected errors.

$ mkdir 0f08dbc2/versions

In this step, the structure will be as follows.

$ find .
└─ 0f08dbc2
   ├─ new.docx
   └─ versions

3.1.3. Create The First Version

Instead of the current implementation, I suggest explicitly creating an initial version of the file when it's created or uploaded. To do this, we first need to create a versioned directory.

$ mkdir 0f08dbc2/versions/1

Copy the source file to the versioned directory without renaming.

$ cp 0f08dbc2/new.docx 0f08dbc2/versions/1/new.docx

Currently, when a file is created, we generate the createdInfo.json which includes the date of creation, user ID, and user name. Additionally, we generate a key on the fly and save it as a key file at a later time.

Instead, I suggest generating a standard UUID here and storing the creation date and only the user ID, without the name. Also, from now on, we need to keep the basename, as we're no longer renaming the version file to prev.*. We can easily retrieve the user name at any time since it's hardcoded in the source code.

$ cat 0f08dbc2/versions/1/meta.json
{
  "key": "55077772-da00-4ee4-bb76-0cac08498d20",
  "basename": "new.docx",
  "created": "2023-07-31 12:57:07",
  "user": {
    "id": "uid-1"
  }
}

In this step, the structure will be as follows.

$ find .
└─ 0f08dbc2
   ├─ new.docx
   └─ versions
      └─ 1
         ├─ meta.json
         └─ new.docx
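
The same step can be sketched in Python. The `create_version` function is a hypothetical helper, not part of the examples; it assumes the source directory and its versions directory already exist, as described in the previous steps.

```python
import json
import shutil
import tempfile
import uuid
from datetime import datetime
from pathlib import Path


def create_version(source: Path, version: int, user_id: str) -> Path:
    """Snapshot `source` into versions/<version> together with its meta.json."""
    directory = source.parent / "versions" / str(version)
    directory.mkdir()  # fails if versions/ is missing or the version exists
    shutil.copyfile(source, directory / source.name)
    meta = {
        "key": str(uuid.uuid4()),  # stored once, never regenerated
        "basename": source.name,
        "created": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "user": {"id": user_id},
    }
    (directory / "meta.json").write_text(json.dumps(meta, indent=2))
    return directory


with tempfile.TemporaryDirectory() as root:
    source = Path(root, "0f08dbc2", "new.docx")
    (source.parent / "versions").mkdir(parents=True)
    source.write_text("Blank document")
    first = create_version(source, 1, "uid-1")
    print(sorted(p.name for p in first.iterdir()))  # ['meta.json', 'new.docx']
```

Because the key is written to meta.json at creation time, two users opening the same version would read the same key.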

3.1.4. Write The Changes

The process of modifying a file is essentially very similar to how this file was created.

Create a subdirectory for the next version.

$ mkdir 0f08dbc2/versions/2

Copy the source file to the versioned directory.

$ cp 0f08dbc2/new.docx 0f08dbc2/versions/2/new.docx

Finally, place the additional files near the version file.

$ echo ... > 0f08dbc2/versions/2/meta.json
$ echo ... > 0f08dbc2/versions/2/diff.zip
$ echo ... > 0f08dbc2/versions/2/changes.json

In this step, the structure will be as follows.

$ find .
└─ 0f08dbc2
   ├─ new.docx
   └─ versions
      ├─ 2
      │  ├─ changes.json
      │  ├─ diff.zip
      │  ├─ meta.json
      │  └─ new.docx
      └─ 1
         ├─ meta.json
         └─ new.docx
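
A possible Python counterpart is sketched below. The `latest_version` and `write_changes` names are hypothetical, and writing meta.json is left out to keep the sketch short. Note that the latest version can now legitimately be zero when nothing has been stored yet.

```python
import json
import shutil
import tempfile
from pathlib import Path


def latest_version(versions: Path) -> int:
    """Return the highest version number, or 0 when nothing is stored yet."""
    numbers = [int(p.name) for p in versions.iterdir() if p.is_dir() and p.name.isdigit()]
    return max(numbers, default=0)


def write_changes(source: Path, diff: bytes, changes: dict) -> Path:
    """Store the edited source, its diff, and its changes as the next version."""
    versions = source.parent / "versions"
    directory = versions / str(latest_version(versions) + 1)
    directory.mkdir()
    shutil.copyfile(source, directory / source.name)
    # The diff and the changes sit next to the version they belong to.
    (directory / "diff.zip").write_bytes(diff)
    (directory / "changes.json").write_text(json.dumps(changes, indent=2))
    return directory


with tempfile.TemporaryDirectory() as root:
    source = Path(root, "0f08dbc2", "new.docx")
    (source.parent / "versions" / "1").mkdir(parents=True)
    source.write_text("V2")
    second = write_changes(source, b"...", {"serverVersion": "7.3.3", "changes": []})
    print(second.name)  # 2
```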

3.1.5. Rename The Source File

In my implementation, I add a directory for the source file and throughout all the steps, I keep the source file's basename. Therefore, I want to discuss the file renaming process.

First, just as when creating a file, we need to take a hash string from the new name. For example, suppose the user wants to rename new.docx to my.docx.

$ echo "my.docx" | openssl sha256 | awk '{print $2}'
a0c1d714...

After this, we need to rename the directory of the source file.

$ mv 0f08dbc2 a0c1d714

And rename the source file itself.

$ mv a0c1d714/new.docx a0c1d714/my.docx

In this step, the structure will be as follows.

$ find .
└─ a0c1d714
   ├─ my.docx
   └─ versions
      ├─ 2
      │  ├─ changes.json
      │  ├─ diff.zip
      │  ├─ meta.json
      │  └─ new.docx
      └─ 1
         ├─ meta.json
         └─ new.docx
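
The renaming could be sketched in Python as follows. The `dir_name` and `rename_source` helpers are hypothetical; the existence check is an added assumption, since the text above doesn't say what should happen when the new name is already taken.

```python
import hashlib
import tempfile
from pathlib import Path


def dir_name(basename: str) -> str:
    """Return the SHA-256 hex digest used as the source directory name."""
    return hashlib.sha256(basename.encode("utf-8")).hexdigest()


def rename_source(storage: Path, old: str, new: str) -> Path:
    """Rename the source directory and the source file; versions stay untouched."""
    old_dir = storage / dir_name(old)
    new_dir = storage / dir_name(new)
    if new_dir.exists():
        raise FileExistsError(f"{new} already exists")
    old_dir.rename(new_dir)
    (new_dir / old).rename(new_dir / new)
    return new_dir / new


with tempfile.TemporaryDirectory() as root:
    storage = Path(root)
    first = storage / dir_name("new.docx") / "versions" / "1"
    first.mkdir(parents=True)
    (first / "new.docx").write_text("Blank document")
    (storage / dir_name("new.docx") / "new.docx").write_text("V2")
    renamed = rename_source(storage, "new.docx", "my.docx")
    print(renamed.name)  # my.docx
```

The archived versions keep their original basenames, which is why the basename is also recorded in the meta.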

3.1.6. Write The Changes After Renaming

Making changes after renaming the file won't be much different from the process described earlier. The only difference is that the meta will now include the new file basename.

Note
In the future, thanks to the basename in the meta, we might consider adding a history of how the file name has changed over time.

$ cat a0c1d714/versions/3/meta.json | jq -r ".basename"
my.docx

In this step, the structure will be as follows.

$ find .
└─ a0c1d714
   ├─ my.docx
   └─ versions
      ├─ 3
      │  ├─ changes.json
      │  ├─ diff.zip
      │  ├─ meta.json
      │  └─ my.docx
      ├─ 2
      │  ├─ changes.json
      │  ├─ diff.zip
      │  ├─ meta.json
      │  └─ new.docx
      └─ 1
         ├─ meta.json
         └─ new.docx

3.1.7. Restore The Specific Version

Thanks to the new structure, restoring a specific version has become much simpler.

Replace the current source file with the file of the requested version.

$ cp a0c1d714/versions/2/new.docx a0c1d714/my.docx

Run the process of bootstrapping the new version, just like we did when we first created the file.

$ mkdir a0c1d714/versions/4
$ cp a0c1d714/my.docx a0c1d714/versions/4/my.docx
$ echo ... > a0c1d714/versions/4/meta.json

In this step, the structure will be as follows.

$ find .
└─ a0c1d714
   ├─ my.docx
   └─ versions
      ├─ 4
      │  ├─ meta.json
      │  └─ my.docx
      ├─ 3
      │  ├─ changes.json
      │  ├─ diff.zip
      │  ├─ meta.json
      │  └─ my.docx
      ├─ 2
      │  ├─ changes.json
      │  ├─ diff.zip
      │  ├─ meta.json
      │  └─ new.docx
      └─ 1
         ├─ meta.json
         └─ new.docx
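
A sketch of the restore step in Python is shown below. The `restore` function is hypothetical; it assumes every version directory contains a meta.json with a `basename` field, as proposed earlier, and uses it to find the archived file even when it carries an older name.

```python
import json
import shutil
import uuid
from datetime import datetime
from pathlib import Path


def restore(source: Path, version: int, user_id: str) -> int:
    """Copy an archived version over the source and record it as a new version."""
    versions = source.parent / "versions"
    # The archived file may carry an older name, so look it up in the meta.
    old_meta = json.loads((versions / str(version) / "meta.json").read_text())
    shutil.copyfile(versions / str(version) / old_meta["basename"], source)

    # Bootstrap the next version exactly as on file creation.
    new_version = max(int(p.name) for p in versions.iterdir() if p.name.isdigit()) + 1
    target = versions / str(new_version)
    target.mkdir()
    shutil.copyfile(source, target / source.name)
    meta = {
        "key": str(uuid.uuid4()),
        "basename": source.name,
        "created": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "user": {"id": user_id},
    }
    (target / "meta.json").write_text(json.dumps(meta, indent=2))
    return new_version
```

For the structure above, calling `restore(Path("a0c1d714/my.docx"), 2, "uid-1")` would be expected to create version 4 containing a copy of the second version under the current name.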

3.2. History Object

Changing the structure also has a positive impact on generating the history objects.

Below I've imagined how the history endpoints could potentially look different, but that isn't the most important aspect. First and foremost, please focus on the embedded versions.

GET /history/{{hash}}/{{version}}/data
GET /history/{{hash}}/{{version}}/download/{{basename}}?userIp=

$ curl /history/a0c1d714/1/data
{
  "fileType": "docx",
  "key": "55077772-da00-4ee4-bb76-0cac08498d20",
  "version": 1,
  "url": "http://proxy/history/a0c1d714/1/download/new.docx?userIp=172.19.0.3",
  "token": "..."
}

$ curl /history/a0c1d714/3/data
{
  "fileType": "docx",
  "key": "a397ffb3-a02d-47d4-9f07-b43025485cd6",
  "version": 3,
  "url": "http://proxy/history/a0c1d714/3/download/my.docx?userIp=172.19.0.3",
  "previous": {
    "fileType": "docx",
    "key": "a7e590f0-603c-48ac-b614-219af9d77766",
    "url": "http://proxy/history/a0c1d714/2/download/new.docx?userIp=172.19.0.3"
  },
  "changesUrl": "http://proxy/history/a0c1d714/3/download/diff.zip?userIp=172.19.0.3",
  "token": "..."
}
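
With the new structure, such an object could be assembled by one code path for every version. The sketch below is an assumption about how that might look: the `history_data` function and its parameters are hypothetical, and the token is omitted because signing is out of scope here.

```python
import json
from pathlib import Path
from urllib.parse import quote


def history_data(storage: Path, base_url: str, dir_hash: str, version: int, user_ip: str) -> dict:
    """Build a HistoryData-like object for one version from its meta.json."""
    versions = storage / dir_hash / "versions"

    def meta(number: int) -> dict:
        return json.loads((versions / str(number) / "meta.json").read_text())

    def download(number: int, name: str) -> str:
        return f"{base_url}/history/{dir_hash}/{number}/download/{quote(name)}?userIp={user_ip}"

    current = meta(version)
    data = {
        "fileType": Path(current["basename"]).suffix.lstrip("."),
        "key": current["key"],
        "version": version,
        "url": download(version, current["basename"]),
    }
    if version > 1:
        previous = meta(version - 1)
        data["previous"] = {
            "fileType": Path(previous["basename"]).suffix.lstrip("."),
            "key": previous["key"],
            "url": download(version - 1, previous["basename"]),
        }
    # The diff lives next to the version it describes, so the version
    # number in the URL matches the version of the object.
    if (versions / str(version) / "diff.zip").exists():
        data["changesUrl"] = download(version, "diff.zip")
    return data
```

There is no check for the latest version and no offset between the version number and the directory that holds its diff.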

The Postscript

I'm concerned that if we keep adding new features to the current structure, it could become confusing and illogical. It took me a while to understand how the current implementation works. I've tried to provide a detailed explanation of the issue I encountered. You can find some sketches of my proposed solution in the feature/python-raw-history-endpoints and feature/ruby-raw-history-manager.

Footnotes

  1. Python, endpoint that creates an initial file. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/python/src/views/actions.py#L97

  2. Node.js, endpoint that creates an initial file. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/nodejs/app.js#L273

  3. Ruby, endpoint that creates an initial file. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/ruby/app/controllers/home_controller.rb#L41

  4. PHP, function that creates an initial file. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/php/functions.php#L825

  5. Java, function that creates an initial file. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java/src/main/java/helpers/DocumentManager.java#L304

  6. Java Spring, function that creates an initial file. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java-spring/src/main/java/com/onlyoffice/integration/documentserver/managers/document/DefaultDocumentManager.java#L217

  7. C#, function that creates an initial file. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/csharp/DocEditor.aspx.cs#L608

  8. C# MVC, function that creates an initial file. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/csharp-mvc/Controllers/HomeController.cs#L49

  9. Python, function that creates the meta. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/python/src/utils/historyManager.py#L82

  10. Node.js, function that creates the meta. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/nodejs/helpers/docManager.js#L147

  11. Ruby, function that creates the meta. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/ruby/app/models/document_helper.rb#L182

  12. PHP, function that creates the meta. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/php/functions.php#L399

  13. Java, function that creates the meta. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java/src/main/java/helpers/DocumentManager.java#L270

  14. Java Spring, function that creates the meta. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java-spring/src/main/java/com/onlyoffice/integration/documentserver/storage/LocalFileStorage.java#L311

  15. C#, function that creates the meta. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/csharp/DocEditor.aspx.cs#L644

  16. C# MVC, function that creates the meta. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/csharp-mvc/Helpers/DocManagerHelper.cs#L229

  17. Python, key generation on the fly. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/python/src/views/actions.py#L175

  18. Node.js, key generation on the fly. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/nodejs/app.js#L936

  19. Ruby, key generation on the fly. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/ruby/app/models/file_model.rb#L193

  20. PHP, key generation on the fly. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/php/views/DocEditorView.php#L73

  21. Java, key generation on the fly. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java/src/main/java/entities/FileModel.java#L73

  22. Java Spring, key generation on the fly. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java-spring/src/main/java/com/onlyoffice/integration/services/configurers/implementations/DefaultDocumentConfigurer.java#L72

  23. C#, key generation on the fly. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/csharp/DocEditor.aspx.cs#L47

  24. C# MVC, key generation on the fly. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/csharp-mvc/Models/FileModel.cs#L59

  25. Python, function that generates the history. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/python/src/utils/historyManager.py#L148

  26. Node.js, function that generates the history. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/nodejs/helpers/docManager.js#L442

  27. Ruby, function that generates the history. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/ruby/app/models/file_model.rb#L190

  28. PHP, function that generates the history. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/php/functions.php#L928

  29. Java, function that generates the history. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java/src/main/java/entities/FileModel.java#L196

  30. Java Spring, function that generates the history. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java-spring/src/main/java/com/onlyoffice/integration/documentserver/managers/history/DefaultHistoryManager.java#L67

  31. C#, function that generates the history. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/csharp/DocEditor.aspx.cs#L339

  32. C# MVC, function that generates the history. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/csharp-mvc/Models/FileModel.cs#L240

  33. Python, force counting from one. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/python/src/utils/historyManager.py#L155

  34. PHP, force counting from one. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/php/functions.php#L317

  35. Java, force counting from one. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java/src/main/java/helpers/DocumentManager.java#L231

  36. Java Spring, force counting from one. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java-spring/src/main/java/com/onlyoffice/integration/documentserver/storage/LocalFileStorage.java#L375

  37. C#, force counting from one. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/csharp/Default.aspx.cs#L246

  38. C# MVC, force counting from one. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/csharp-mvc/Helpers/DocManagerHelper.cs#L167

  39. Python, check if the history directory contains a file. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/python/src/utils/historyManager.py#L47

  40. Ruby, check if the history directory contains a file. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/ruby/app/models/document_helper.rb#L139

  41. PHP, check if the history directory contains a file. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/php/functions.php#L324

  42. Java, check if the history directory contains a file. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java/src/main/java/helpers/DocumentManager.java#L241

  43. Java Spring, check if the history directory contains a file. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java-spring/src/main/java/com/onlyoffice/integration/documentserver/storage/LocalFileStorage.java#L382

  44. C#, check if the history directory contains a file. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/csharp/Default.aspx.cs#L247

  45. C# MVC, check if the history directory contains a file. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/csharp-mvc/Helpers/DocManagerHelper.cs#L168

  46. Python, function that generates a key. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/python/src/utils/docManager.py#L255

  47. Node.js, function that generates a key. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/nodejs/helpers/docManager.js#L403

  48. Ruby, function that generates a key. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/ruby/app/models/file_model.rb#L62

  49. PHP, function that generates a key. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/php/functions.php#L501

  50. Java, own implementation of the file-creating. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java/src/main/java/helpers/DocumentManager.java#L315-L320

  51. Python, function that generates a key from the URL. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/python/src/utils/docManager.py#L265

  52. Node.js, function that generates a key from the URL. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/nodejs/helpers/documentService.js#L108

  53. Ruby, function that generates a key from the URL. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/ruby/app/models/service_converter.rb#L95

  54. PHP, function that generates a key from the URL. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/php/functions.php#L597

  55. Java, function that generates a key from the URL. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java/src/main/java/helpers/ServiceConverter.java#L241

  56. Java Spring, function that generates a key from the URL. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java-spring/src/main/java/com/onlyoffice/integration/documentserver/util/service/DefaultServiceConverter.java#L187

  57. C#, function that generates a key from the URL. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/csharp/DocumentConverter.cs#L170

  58. C# MVC, function that generates a key from the URL. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/csharp-mvc/Helpers/DocumentConverter.cs#L168

  59. Python, key generation from the URL. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/python/src/utils/trackManager.py#L62

  60. Node.js, key generation from the URL. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/nodejs/app.js#L771

  61. Ruby, key generation from the URL. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/ruby/app/controllers/home_controller.rb#L112

  62. PHP, key generation from the URL. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/php/trackmanager.php#L117

  63. Java, key generation from the URL. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java/src/main/java/controllers/IndexServlet.java#L278

  64. Java Spring, key generation from the URL. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java-spring/src/main/java/com/onlyoffice/integration/controllers/FileController.java#L214

  65. C#, key generation from the URL. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/csharp/Default.aspx.cs#L480

  66. C# MVC, key generation from the URL. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/csharp-mvc/WebEditor.ashx.cs#L239

  67. Documentation, description of the key. https://api.onlyoffice.com/editors/callback#key

  68. Documentation, description of the onRequestHistory. https://api.onlyoffice.com/editors/config/events#onRequestHistory

  69. Documentation, description of the onRequestHistoryData. https://api.onlyoffice.com/editors/config/events#onRequestHistoryData

  70. Python, check if the file is the latest version. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/python/src/utils/historyManager.py#L162

  71. Node.js, check if the file is the latest version. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/nodejs/helpers/docManager.js#L468

  72. Ruby, check if the file is the latest version. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/ruby/app/models/file_model.rb#L218

  73. PHP, check if the file is the latest version. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/php/functions.php#L944

  74. Java, check if the file is the latest version. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java/src/main/java/entities/FileModel.java#L218

  75. Java Spring, check if the file is the latest version. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java-spring/src/main/java/com/onlyoffice/integration/documentserver/managers/history/DefaultHistoryManager.java#L83

  76. C#, check if the file is the latest version. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/csharp/DocEditor.aspx.cs#L360

  77. C# MVC, check if the file is the latest version. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/csharp-mvc/Models/FileModel.cs#L261

  78. Python, check if the file is the first version. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/python/src/utils/historyManager.py#L170

  79. Node.js, check if the file is the first version. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/nodejs/helpers/docManager.js#L480

  80. Ruby, check if the file is the first version. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/ruby/app/models/file_model.rb#L218

  81. PHP, check if the file is the first version. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/php/functions.php#L944

  82. Java, check if the file is the first version. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java/src/main/java/entities/FileModel.java#L218

  83. Java Spring, check if the file is the first version. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java-spring/src/main/java/com/onlyoffice/integration/documentserver/managers/history/DefaultHistoryManager.java#L88

  84. C#, check if the file is the first version. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/csharp/DocEditor.aspx.cs#L360

  85. C# MVC, check if the file is the first version. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/csharp-mvc/Models/FileModel.cs#L261

@vanyauhalin
Contributor Author

Note
This comment is another note in the margins.

I was implementing the restore endpoint for the Java Spring example, and everything was going well. However, when I restored the document as the same user who created it, I noticed that the restored version had a different color in the editor's history tree.

$ tree .
.
├── new.docx
└── new.docx-hist
    ├── 1
    │   ├── changes.json
    │   ├── diff.zip
    │   ├── key.txt
    │   └── prev.docx
    ├── 2
    │   ├── changes.json
    │   ├── diff.zip
    │   ├── key.txt
    │   └── prev.docx
    ├── 3
    │   ├── changes.json
    │   ├── key.txt
    │   └── prev.docx
    └── createdInfo.json

I was confused by this, so I checked whether changes.json had been written with a different user, but it hadn't.

$ cat new.docx-hist/3/changes.json | jq
{
  "serverVersion": null,
  "changes": [
    {
      "created": "2023-08-03 07:28:29",
      "user": {
        "name": "John Smith",
        "id": "uid-1"
      }
    }
  ]
}

I also decided to take a look at the changes.json file for a previous version. Instead of the usual user ID with the uid- prefix, I found a plain integer value.

$ cat new.docx-hist/1/changes.json | jq
{
  "serverVersion": "7.3.3",
  "changes": [
    {
      "created": "2023-08-03 07:27:20",
      "user": {
        "id": "1",
        "name": "John Smith"
      }
    }
  ]
}

My guess is that the mismatch comes from how the relationship between the User entity [1] and the User model [2] is set up. The User entity is stored in the database and has an integer primary key (its ID), while the User model stores the ID as a string.
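
One way to avoid this kind of mismatch is to derive the history user from the stored entity in a single place and use that helper everywhere a changes.json is written. The following is a minimal Python sketch of the idea, not the Java Spring code: `UserEntity`, `history_user`, and the choice of the uid- prefix as the canonical format are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class UserEntity:
    # Hypothetical stand-in for the database entity with an integer primary key.
    id: int
    name: str


def history_user(entity: UserEntity, prefix: str = "uid-") -> dict:
    """Return the user object to write into changes.json.

    The prefixed string ID is assumed to be the format the editor receives,
    so the same value ends up in every version of the history.
    """
    return {"id": f"{prefix}{entity.id}", "name": entity.name}
```

If both the save callback and the restore endpoint went through such a helper, version 1 and version 3 in the example above would carry the same user ID.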

Footnotes

  1. Java Spring, User entity. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java-spring/src/main/java/com/onlyoffice/integration/entities/User.java#L36

  2. Java Spring, User model. https://github.com/ONLYOFFICE/document-server-integration/blob/7f659bc95e7e92c5d428b00b17b68de5f374da99/web/documentserver-example/java-spring/src/main/java/com/onlyoffice/integration/documentserver/models/filemodel/User.java#L31

@vanyauhalin
Contributor Author

Some additions after talking with the team.

File Directory Name

I suggested taking a hash of the file name and using it as the directory name to avoid possible problems when working with the directory (spaces in the name, a file extension in the name, and so on). But this approach breaks down when the user creates a file with the same name as an existing one. If we're working with a hash, then we must increase the counter, take a new hash, and check whether a directory with this hash exists. If it does, then we must repeat the process until we find a directory name that isn't taken.

It's better to just create a new UUID for each directory. With this approach, we can store many files with the same name, and everything will be fine.
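
A minimal Python sketch of this idea, assuming a hypothetical `create_file_directory` helper and a `storage` root directory (neither exists in the examples today):

```python
import uuid
from pathlib import Path


def create_file_directory(storage: Path) -> Path:
    """Create a directory for a new file, named with a random UUID.

    The name does not depend on the file name, so two files called
    "new.docx" simply end up in two different directories.
    """
    directory = storage / str(uuid.uuid4())
    # exist_ok=False makes an (extremely unlikely) collision fail loudly
    # instead of silently reusing another file's directory.
    directory.mkdir(parents=True, exist_ok=False)
    return directory
```

Compared with the hash-and-counter approach, there is no loop that probes the file system for a free name.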

Key Length

Previously, the document server had a key length limit of 20 characters. However, if I understand correctly, this limitation isn't too strict anymore. Therefore, we can now use a UUID for the file key.

Latest Version

It may not make sense to store the latest version outside of the versions directory since we already store it inside the versions directory.

Diff and Changes Locations

In the current history implementation, the diff.zip and changes.json for the next version are kept in the current version's directory. This is based on the assumption that these files are instructions for reaching the next version and that these instructions can only be applied to the current version.

Part of a snippet from the first issue message.

├─ 1
│  ├─ changes.json   The black box of the second version. (2)
│  ├─ diff.zip       The diff of the second version. (2)
│  ├─ key.txt        The key of the first version. (1)
│  └─ prev.docx      The source file of the first version. (1)

On the one hand, I now understand why this approach was chosen. On the other hand, it doesn't seem logical to me. I'll try to demonstrate this with a primitive example of how history works in Git.

Let's imagine we have a main branch, plus a feature branch and a fix branch based on it. Each of the new branches has its own changes to the source file.

                      ┌───────────────────┐
                      │  branch: main     │
                   ┌──┤  source: main.js  ├──┐
                   │  └───────────────────┘  │
┌──────────────────┴─┐                     ┌─┴──────────────────────┐
│  branch: fix       │                     │  branch: feature       │
│  source: main.js   │                     │  source: main.js       │
│  diff:   fix.diff  │                     │  diff:   feature.diff  │
└────────────────────┘                     └────────────────────────┘

Yes, there is no such branching in our product, but that doesn't change the essence. It's more logical to store the diff.zip and changes.json not near the file to which you want to apply these changes, but near the file that will be obtained after applying these changes.
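
To make the difference concrete, here is a minimal Python sketch that maps the current layout to the one suggested here. The `relocate` function and the dictionary representation of the directories are assumptions made for the example; it only computes the new layout and doesn't touch any files.

```python
from typing import Dict, List

# Files that describe how a version was obtained.
CHANGE_FILES = ("changes.json", "diff.zip")


def relocate(current: Dict[int, List[str]]) -> Dict[int, List[str]]:
    """Move changes.json and diff.zip from directory N to directory N + 1.

    `current` maps a version directory number to the file names inside it.
    """
    proposed: Dict[int, List[str]] = {}
    for number, files in current.items():
        for name in files:
            # The change files belong to the version they produce.
            target = number + 1 if name in CHANGE_FILES else number
            proposed.setdefault(target, []).append(name)
    return proposed
```

For the snippet above, directory 1 would keep key.txt and prev.docx, while changes.json and diff.zip would move to directory 2, next to the file they produce.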
