MHTML – the path to enlightenment…

Further to my cryptic last entry, I thought I’d better explain a bit about this MHTML thing. In essence, MHTML (MIME HTML) is an Internet standard for the MIME encapsulation of aggregate documents, such as HTML pages. In other words, it allows you to save (and therefore distribute) an HTML page, with all its images and other resources, in one handy file.

More technical information can be found here and here.
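
To make the format a bit less abstract, here is a minimal sketch of what sits inside an MHTML file. It uses Python's standard email package rather than the ASP used in the rest of this post, purely to show the structure: a multipart/related MIME message holding the HTML plus each resource, with a Content-Location header tying each part to the URL the HTML refers to. The URLs, the function name and the placeholder image bytes are all made up for illustration.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical page and image URLs, invented for this illustration.
PAGE_URL = "http://example.com/report.html"
IMAGE_URL = "http://example.com/chart.gif"


def build_mhtml(html, image_bytes):
    """Wrap one HTML page and one GIF image in a multipart/related message."""
    archive = MIMEMultipart("related", type="text/html")

    # The HTML part, labelled with the URL it was (notionally) fetched from
    page = MIMEText(html, "html", "utf-8")
    page["Content-Location"] = PAGE_URL
    archive.attach(page)

    # The subtype is given explicitly because the bytes are only a placeholder
    image = MIMEImage(image_bytes, "gif")
    image["Content-Location"] = IMAGE_URL
    archive.attach(image)
    return archive


archive = build_mhtml(
    '<html><body><img src="%s"></body></html>' % IMAGE_URL,
    b"GIF89a placeholder bytes",
)
mhtml_text = archive.as_string()  # the text you would save as a .mht file
```

The point to notice is that the image is not referenced by a file path: the reader matches the img src in the HTML against the Content-Location of another part in the same file.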

If you’ve been working with web pages for a while, you’ll know that at some point you want to send a web page, complete with images (and occasionally other resources, such as embedded applets or sound files), to a friend. Sending the HTML itself is easy: almost all browsers will let you save a page as a .html file. Making the images go along with it isn’t so easy.

Microsoft’s Internet Explorer allows users to save web pages as “Web archive, single file (*.mht)”, which compiles the current page, along with its images, into an MHTML file. Just click File > Save As, then choose the right option. Fantastic, very useful.

However, that’s not much use when we want web applications to be independent of any particular browser’s technology. So, a solution had to be found.

The problem I had was that my software produced nicely formatted HTML reports, which the client wanted in Word format so they could amend bits if required before printing. I could get the HTML file to appear to the browser as a Word file with a bit of ContentType jiggery-pokery:

(All these examples will be ASP, VBScript, suitable for use on a Windows web server)

<%
Response.ContentType = "application/msword"
Response.AddHeader "content-disposition", "attachment; filename=Report.doc"
%>

However, that missed the images out when the user saved the document, as it was still only HTML faked as a Word document. In case you didn’t know, changing the file extension of an HTML document to “.doc” will produce a reasonably accurate Word file of the web page (albeit without images), as Word likes to think of itself as an HTML editor.

So, a solution was needed to get the images back in there. In steps MHTML, and a CDO (Collaboration Data Objects) object, available from VBScript, that was designed for creating HTML emails. Here’s the code, with comments for you geeky types out there:

<%
' Let the script carry on after a failure so Err can be checked below
On Error Resume Next

' Create an instance of the CDO.Message object
Set iMsg = CreateObject("CDO.Message")

' Convert a URL to MHTML (yes, it really is this easy)
iMsg.CreateMHTMLBody "http://www.google.com"

' Check if we have any errors
If Err.Number <> 0 Then

    ' if so, write it to the screen
    Response.Write "Error: " & Err.Description

Else

    ' if not, create a new Word file on the server
    Set fs = CreateObject("Scripting.FileSystemObject")
    Set f = fs.CreateTextFile(Server.MapPath("Report.doc"), True)

    ' write the MHTML to the file
    f.Write iMsg.GetStream.ReadText

    ' close the file
    f.Close

End If
%>

Easy! Of course, you could do all sorts of things after that. What I do is then send the file I’ve just created back to the browser as an attachment (using the ContentType code above) so that it’s offered to the user as a download.

So, using ASP you can create a single-file archive of a complete web page including images and save it as a Word document. Oh, saving it as a .eml file will make it a valid Outlook HTML email as well!
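
The reason the same file works as a web archive and as an email is that it is just a MIME message, so any MIME parser can open it. As a rough check, the sketch below (Python again, not ASP) lists the parts inside an archive. The sample text is hand-written to stand in for a real Report.doc or Report.eml and is not actual CDO output; the function name is hypothetical too.

```python
from email import message_from_string

# A tiny hand-written archive standing in for a real saved file.
SAMPLE = "\n".join([
    "MIME-Version: 1.0",
    'Content-Type: multipart/related; boundary="XX"; type="text/html"',
    "",
    "--XX",
    "Content-Type: text/html; charset=us-ascii",
    "Content-Location: http://example.com/report.html",
    "",
    '<html><body><img src="http://example.com/chart.gif"></body></html>',
    "--XX",
    "Content-Type: image/gif",
    "Content-Transfer-Encoding: base64",
    "Content-Location: http://example.com/chart.gif",
    "",
    "R0lGODlh",
    "--XX--",
    "",
])


def list_parts(archive_text):
    """Return (content type, Content-Location) for each part in the archive."""
    message = message_from_string(archive_text)
    return [
        (part.get_content_type(), part["Content-Location"])
        for part in message.walk()
        if not part.is_multipart()  # skip the multipart/related wrapper itself
    ]


parts = list_parts(SAMPLE)
```

If an image is missing from the saved report, a listing like this is a quick way to see whether its part ever made it into the file.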

Problems: I have met one or two problems using images with the widths set using percentages and centimetres (for instance, when creating a dynamic bar chart), but images left with their natural size appear just fine.

Content follows form – or does it?…

Over at OK/Cancel (a very fine comic, by the way) they say that content should follow form, as there’s no use showing a very long piece of text on a mobile phone. I agree – to a certain extent.

I’m not sure that the writer (the estimable Tom Chi) has a full grip on what web standards (CSS

In practice, the CSS step is helpful for dealing with maintenance of web layouts/styles, but truly valuable separation happens in a different place – it happens simply in deciding to store/serve your data using a database instead of hand-coding static content.

But this isn’t the only way to do it. Consider a long piece of text. I’ve shortened it into 4 “paragraphs”. You’ll have to imagine lots more text in each paragraph.

Paragraph 1

Paragraph 2

Paragraph 3

Paragraph 4

So, on a screen (computer monitor, TV, projector screen etc) this would be fine. But on a mobile phone or other small output device, it would be too much to read. In comes CSS, like the Lone Ranger, to sort out all our problems. Add a class to each paragraph:

<p class="par1">Paragraph 1</p>

<p class="par2">Paragraph 2</p>

<p class="par3">Paragraph 3</p>

<p class="par4">Paragraph 4</p>

Then you can use CSS to select which paragraphs to send to each device, like this:

@media screen
{
    p.par1, p.par2, p.par3, p.par4
    {
        display: block;
    }
}
@media handheld
{
    p.par1
    {
        display: block;
    }
    p.par2, p.par3, p.par4
    {
        display: none;
    }
}

More information on media types can be found here. There, problem sorted.

You could go a lot further and use the very powerful child selectors (due to get even better in CSS3) to automatically select the first X paragraphs for display on different media. And, of course, a smattering of DOM manipulation could add a link to the full-length article for devices that don’t get all the text by default.
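
As a rough sketch of that idea, the CSS3 structural pseudo-class :nth-of-type (swapped in here for the child selector) can hide everything after the first few paragraphs without any per-paragraph classes. The helper below is Python, used only to generate the rule for a given cutoff. The function name and the cutoff are made up for illustration, and it assumes the paragraphs are siblings inside the same parent element.

```python
def handheld_rule(visible_paragraphs):
    """Build a CSS rule hiding every paragraph after the first N on handhelds."""
    if visible_paragraphs < 1:
        raise ValueError("at least one paragraph must stay visible")
    # :nth-of-type(n+K) matches the K-th <p> among its siblings and all later ones
    first_hidden = visible_paragraphs + 1
    return (
        "@media handheld {\n"
        f"  p:nth-of-type(n+{first_hidden}) {{ display: none; }}\n"
        "}"
    )


css = handheld_rule(1)
print(css)
```

With a cutoff of 1 this produces the same effect as the par1 to par4 classes above, but it keeps working however many paragraphs the article has.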

So, while content can follow form, with CSS it doesn’t have to.