Wrong MIME type on HTML5 pages in XHTML namespace #198

Closed
ghost opened this issue Jul 9, 2018 · 7 comments

Comments

@ghost

ghost commented Jul 9, 2018

@dariok commented on Jul 8, 2018, 12:59 PM UTC:

What is the problem

When using eXide to create an HTML file with the option "HTML5 file in XHTML namespace", this file is saved (and consequently served) with the incorrect MIME type of text/html. Additionally, the doctype is omitted by Jetty. While it is possible to manually change the MIME type, it is changed back to text/html each time you save the file from eXide.

The beginning of the file in eXide as auto-created:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>

The response of eXist with such a file:

HTTP header: Content-Type: text/html;charset=utf-8
Beginning of file:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>

What did you expect

  1. The correct MIME type, application/xhtml+xml;
  2. The file with the correct doctype, just as it is in eXide

Describe how to reproduce or add a test

Create a file in eXide with the above parameters and check the response as served by Jetty.
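
One way to check the served response without a browser is sketched below in Python. This is a minimal sketch and not part of eXist or eXide: the helper names `check_response` and `fetch` are hypothetical, and the check only covers the two symptoms reported above (the Content-Type header and the presence of `<!DOCTYPE html>`).

```python
from urllib.request import urlopen

EXPECTED_TYPE = "application/xhtml+xml"


def check_response(content_type: str, body: str) -> dict:
    """Report the two symptoms from this issue for one HTTP response."""
    # Drop parameters such as ";charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return {
        "media_type": media_type,
        "mime_ok": media_type == EXPECTED_TYPE,
        "has_doctype": "<!doctype html" in body.lower(),
    }


def fetch(url: str) -> dict:
    """Fetch a URL and run the check (hypothetical helper, needs network access)."""
    with urlopen(url) as resp:
        content_type = resp.headers.get("Content-Type", "")
        # The doctype, if present, sits at the very start of the body.
        body = resp.read(2048).decode("utf-8", errors="replace")
    return check_response(content_type, body)
```

Passing the header and the beginning of the file quoted above to `check_response` reports both `mime_ok` and `has_doctype` as false, which matches the behavior described in this report.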

Please always add the following information

  • eXist 3.6.1, Build 201801032119
  • Java 1.8.0_151
  • Ubuntu 16.04 / Kernel Linux 4.4.0-042stab126.1 amd64
  • JAR installer
  • No custom changes

This issue was moved by adamretter from eXist-db/exist#2006.

@ghost
Author

ghost commented Jul 9, 2018

@joewiz commented on Jul 8, 2018, 1:36 PM UTC:

Which file extension are you using? If you’re using .html could you please test .xhtml and report your results?

In my next reply, I’ll cover the difference between eXide’s templates, eXide’s saving behavior, eXist’s file extension-based mime-type resolution, and XQuery serialization.

@ghost
Author

ghost commented Jul 9, 2018

@dariok commented on Jul 8, 2018, 1:39 PM UTC:

The extension does not seem to make a difference. Cf. http://exist.baukast.digital:8888/exist/apps/pageCount/index.xhtml which also is returned as text/html.


Added: I assumed that there are 4 things at work here: eXide's templates, its saving mechanism, what eXist actually saves, and lastly how the file is served (e.g. by Jetty). This is why I decided not to open this issue in the eXide repo.
However, if I explicitly choose to create an XHTML document, these 4 steps should work in unison to actually (save and) serve an XHTML file with the correct MIME type.

@joewiz
Member

joewiz commented Jul 10, 2018

@dariok Ok, so it sounds like you've anticipated these points, but here are the various components involved in the create-save-edit-save-serve processes you described:

  1. eXide's templates are starter files provided for your convenience and associated with an eXide editing "mode"—which defines syntax highlighting, linting, and mime type for the contents of the new editor pane. See https://github.com/eXist-db/eXide/blob/develop/templates/documents.xml for the template definitions, and https://github.com/eXist-db/eXide/blob/develop/src/editor.js#L573-L580 where the modes are instantiated.
  2. eXide's "Save" action triggers an HTTP PUT request to eXide's /store endpoint, with a Content-Type header corresponding to the mode of the editor pane. This header is used to supply a mime type parameter to the xmldb:store function in https://github.com/eXist-db/eXide/blob/develop/modules/store.xql#L74-L80. You'll see some other rules for binary documents and fallbacks, but this is the basic path for saving documents/resources via eXide.
  3. eXist's file extension-based mime-type resolution is used when storing documents/resources into the database when an explicit mime-type isn't available. See https://github.com/eXist-db/exist/blob/develop/mime-types.xml.tmpl for the list of associations. This explains why eXist raises an error when evaluating xmldb:store("/db", "test.html", "blah") but allows xmldb:store("/db", "test.txt", "blah") without error; the parsing/validation behavior and mime type assigned to the resource are determined based on the supplied file extension.
  4. XQuery serialization rules affect how a document/resource is served/sent to the client. In eXide, the results of a query are stored to the current session, then serialized according to eXide's output method dropdown: https://github.com/eXist-db/eXide/blob/develop/modules/session.xql#L37-L56. For example, to see the <!DOCTYPE html> doctype declaration, select the "HTML5 Output" option in eXide.
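
The interaction between points 2 and 3 can be modeled roughly as follows. This Python sketch is only an illustration of the precedence described above (an explicit type from the editor mode wins, the file extension is a fallback); it is not eXist or eXide code. The function names, both lookup tables, and the `application/octet-stream` fallback are assumptions made for the example, and the real associations are in the mime-types.xml.tmpl file linked above.

```python
import os
from typing import Optional

# Illustrative subset only; not copied from eXist's mime-types.xml.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".xhtml": "application/xhtml+xml",
    ".xml": "application/xml",
    ".txt": "text/plain",
}

# Assumed editor-mode table: HTML and XHTML templates share one "html" mode.
MODE_TYPES = {
    "html": "text/html",
    "xml": "application/xml",
    "xquery": "application/xquery",
}


def resolve_mime_type(filename: str, explicit: Optional[str] = None) -> str:
    """Prefer an explicitly supplied type; otherwise fall back to the extension."""
    if explicit:
        return explicit
    ext = os.path.splitext(filename)[1].lower()
    # Assumed fallback for unknown extensions.
    return EXTENSION_TYPES.get(ext, "application/octet-stream")


def save_from_editor(filename: str, mode: str) -> str:
    """Model a save where the editor mode supplies the Content-Type header."""
    return resolve_mime_type(filename, explicit=MODE_TYPES.get(mode))
```

Under these assumptions, `save_from_editor("index.xhtml", "html")` yields text/html regardless of the extension, while `resolve_mime_type("index.xhtml")` with no explicit type yields application/xhtml+xml. That would be consistent with the type being reset on every save from eXide.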

To answer your question more directly, eXist does not store a file's doctype when it is ingesting a file; nor does it particularly distinguish between different kinds of HTML documents, as long as they're well-formed XML documents. eXide, in particular, conflates HTML and XHTML documents as belonging to the same "mode" and assigns the same mime type to these documents. Once a document is stored, if you want to serve (serialize) it using HTML5 rules and a doctype declaration (as opposed to the default XML method), you need to tell eXist to serialize it as such; see http://exist-db.org/exist/apps/doc/xquery#serialization.
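
As an analogy only, the same effect can be seen with a generic XML parser outside eXist. The Python sketch below uses the standard library's ElementTree, whose element tree does not carry the doctype, so a parse-and-serialize round trip loses it unless the serializing side adds it back. It says nothing about how eXist stores documents internally; the function names `round_trip` and `with_doctype` are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize the XHTML namespace as the default namespace (no prefix).
ET.register_namespace("", XHTML_NS)

SOURCE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
    <head><title>t</title></head>
</html>"""


def round_trip(data: bytes) -> bytes:
    """Parse to an element tree and serialize again; the doctype is not kept."""
    root = ET.fromstring(data)
    return ET.tostring(root, encoding="utf-8")


def with_doctype(data: bytes) -> bytes:
    """Add the doctype back at serialization time, since the tree lacks it."""
    return b"<!DOCTYPE html>\n" + round_trip(data)
```

Here the doctype is a decision made when the document is written out, not a property of the parsed tree, which is the same division of labor as choosing an HTML5 serialization method.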

I do see one possible opportunity for avoiding the unexpected behavior you described in your original post: We could add an "XHTML" mode to eXide, which preserves the mime type of application/xhtml+xml instead of conflating XHTML files with text/html. That way, files you store with this mode will keep this mime type, and it will persist across subsequent edit and save actions in eXide. I'm not sure if this will work (or if the fix is within my abilities), but if you think this will help, we could refocus the issue on this request for enhancement.

@dariok
Contributor

dariok commented Jul 10, 2018

@joewiz Let me start from the back:

ad 4 I of course know about XQuery serialization, but I do not understand how the XQuery serialization rules apply here. They obviously do if I have an XQuery that generates or otherwise handles the XHTML file. Setting the output option for the way a query result is displayed is fair enough, but it is not applicable here as the file is not part of any query.
In this case, the file is stored as an XHTML file in the database and is addressed directly (i.e. there is no templating in use in this case). Thus, there is, as far as I can see, nowhere I could set any serialization option.

ad 3 That indeed seems to choose the correct MIME type:

let $orig := doc('/db/apps/pageCount/index.xhtml')
return xmldb:store('/db/apps/pageCount', 'test.xhtml', $orig)

http://exist.baukast.digital:8888/exist/apps/pageCount/test.xhtml is served as application/xhtml+xml.

ad 2 and 1 If I want to create an HTML file from within eXide, I have to choose this after selecting "new". Obviously, other selections in this dialog result in different MIME types. That is the way I expect it to be.

Now, if I choose HTML, I have two options: one is actually an HTML fragment for templating – which should never be served on its own, so its MIME type is of secondary importance –, the other is the option in question: “HTML5 Document in XHTML Namespace”.

It is obvious that this file is meant to be an XHTML file and eXide's template dutifully includes the XML declaration on top. Consequently, the MIME type should be application/xhtml+xml and nothing else. If a user decides to change things so that the file is not an XHTML file any more, then changing the MIME type is up to them. But if I stick to the template, I actually expect it to do what it says on the box.

I myself do not actually need the file to be XHTML expressly. I'd be perfectly happy with an HTML5 file served as text/html in no namespace and without an XML declaration.

I think there are three different options:

  1. When the second option for the type “HTML” is selected from the “New” dialog, save the file with the correct MIME type of application/xhtml+xml;
  2. leave out the XML declaration and the namespace, change the name of the second option to reflect this and stick to text/html (which means that this file would have to be considered a binary resource, I presume, for it may be HTML tag soup);
  3. as above but actually consider it an XML file and thus enforce wellformedness but still use text/html.
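
The difference between options 2 and 3 comes down to whether the content has to parse as XML. A minimal Python sketch of such a check is shown below, using the standard library parser; the `storage_kind` policy and both function names are hypothetical and only illustrate the distinction, not how eXist decides between XML and binary resources.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """Return True if the bytes parse as XML, False for tag soup."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


def storage_kind(data: bytes) -> str:
    """Hypothetical policy: XML resource if well-formed, otherwise binary."""
    return "xml" if is_well_formed(data) else "binary"
```

With this policy, an HTML5 file that happens to be well-formed could be stored as XML under option 3, while a file with an unclosed `<br>` would only be acceptable under option 2.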

As regards the DOCTYPE: I do not really understand why eXist should drop <!DOCTYPE html>. If the file is treated as a binary, it should be stored as-is without interfering with the file's contents – after all, nobody would expect eXist to replace any occurrence of “joe” by “dario” ;)

If the file is considered to be an XML file, I'd be puzzled if eXist drops the doctype declaration. After all, it is a part of the XML spec and, albeit used less and less frequently due to current alternatives, still an integral part of many XML files. I do not know whether that still is the case, but I remember reading or hearing that for XML files the doctype part is stored separately but added back by the serializer.

Of course, an XHTML file does not need the doctype, while a normal HTML5 file does need it.

In any case, if it is dropped anyway, the template should not contain it in the first place. It would be better, though, to retain it for HTML files as it is a required part of the standard (even though the file should get rendered by browsers nonetheless).


Options 2 and 3 from above would need the doctype to be included while it could be left out for option 1.
Which solution is chosen is of course up to you. However, it really should be standards compliant.

@joewiz
Member

joewiz commented Jul 13, 2018

@dariok Ah, I wasn't clear exactly how you were serving up the files. Please disregard my explanation of serialization options then.

As regards the DOCTYPE, if you'd like to pursue the issue of eXist storing a document's doctype in addition to the document, you're right that this is an eXist issue and not an eXide issue. Please see https://markmail.org/thread/27fxcjicang6yk3q for the most recent discussion about this, and open an issue or discussion on exist-open specifically about this.

Would you be interested in filing a PR improving eXide's templates or adding support for an XHTML mode? I think the links to the source in my first note would provide pointers to the places that would need work.

@dariok
Contributor

dariok commented Jul 13, 2018

@joewiz I'll gladly have a look at it next week when I'm back in the office.

Thank you for the pointer to the discussion. I will look at it, too, and see how I can contribute.

All best!

@joewiz
Member

joewiz commented Aug 22, 2018

Closed by #199.
