Search word highlighting…

Just a quickie. I wanted to highlight search words in an HTML document (using PHP this time) without breaking links, and couldn’t get the regular expression right. But Aidan Lister came to my rescue. What a guy!
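For anyone wondering what the trick looks like, here is a rough sketch of the general idea. It is not Aidan’s PHP code; it is my own VBScript version, to match the other examples on this page. The function name HighlightWord and the "highlight" CSS class are made up for illustration. The key is a negative lookahead that skips any match sitting inside a tag, so href attributes are left alone.

```vbscript
<%
' Hypothetical helper: wraps each occurrence of word in a <span>,
' but only where the match is in the page text, not inside a tag.
Function HighlightWord(html, word)
    Dim esc, re

    ' Escape regular expression metacharacters in the search word
    Set esc = New RegExp
    esc.Global = True
    esc.Pattern = "([\\\^\$\.\|\?\*\+\(\)\[\]\{\}])"

    Set re = New RegExp
    re.Global = True
    re.IgnoreCase = True

    ' (?![^<>]*>) rejects a match if the next angle bracket after it
    ' is a closing ">", which would mean we are inside a tag
    re.Pattern = "(" & esc.Replace(word, "\$1") & ")(?![^<>]*>)"

    HighlightWord = re.Replace(html, "<span class=""highlight"">$1</span>")
End Function
%>
```

Called on a link such as `<a href="php.html">PHP manual</a>` with the word "php", this should wrap the visible link text and leave the href untouched. It assumes reasonably well-formed HTML with no stray angle brackets in the text.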

MHTML – the path to enlightenment…

Further to my cryptic last entry, I thought I’d better explain a bit about this MHTML thing. In essence, MHTML (MIME HTML) is an Internet standard that allows for the MIME encapsulation of aggregate documents, such as HTML. In other words, it allows you to save (and therefore distribute) an HTML page, with all its images and other resources, in one handy file.
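To make that a bit more concrete, here is a hand-written, simplified illustration of what sits inside such a file. The URLs, subject and boundary string are invented for the example, and the image data is replaced by a placeholder; a real file will have more headers. The page and each resource become one part of a multipart/related MIME message, and each part carries a Content-Location header so the HTML can find its images.

```
Subject: Example page
MIME-Version: 1.0
Content-Type: multipart/related; type="text/html"; boundary="example-boundary"

--example-boundary
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Location: http://www.example.com/index.html

<html><body><img src=3D"http://www.example.com/logo.gif"></body></html>

--example-boundary
Content-Type: image/gif
Content-Transfer-Encoding: base64
Content-Location: http://www.example.com/logo.gif

(base64-encoded image data goes here)

--example-boundary--
```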

More technical information can be found here and here.

If you’ve been working with web pages for a while, you’ll know that at some point you want to send a web page, complete with images (and occasionally other resources, such as embedded applets, sound files etc), to a friend. Sending the HTML itself is easy: almost all browsers will let you save a page as a .html file. Making the images go along with it isn’t so easy.

Microsoft’s Internet Explorer allows users to save web pages as “Web archive, single file (*.mht)”, which compiles the current page, along with its images, into an MHTML file. Just click File > Save As then choose the right option. Fantastic, very useful.

However, that’s not very useful when we want web applications to be independent of a particular browser’s technology. So, a solution had to be found.

The problem I had was that my software produced nicely-formatted HTML reports which the client wanted in Word format, so they could amend bits if required before printing. I could get the HTML file to appear to the browser as a Word file with a bit of ContentType jiggery-pokery:

(All these examples are classic ASP with VBScript, suitable for use on a Windows web server.)

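<%
' Tell the browser this is a Word document and offer it as a download.
' These headers must be set before any page content is sent.
Response.ContentType = "application/msword"
Response.AddHeader "content-disposition", "attachment; filename=Report.doc"
%>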

However, that missed the images out when the user saved the document, it being an HTML-faked-to-Word document. In case you didn’t know, changing the file extension of an HTML document to “.doc” will produce a reasonably accurate Word file of the web page (albeit without images), as Word likes to think of itself as an HTML editor.

So, a solution was needed to get the images back in there. In steps MHTML, and a CDO (Collaboration Data Objects) object, usable from VBScript, that was designed for creating HTML emails. Here’s the code, with comments for you geeky-types out there:

<%
' Carry on after a failure so that Err can be checked below
On Error Resume Next

' Create an instance of the CDO.Message object
Set iMsg = CreateObject("CDO.Message")

' Convert a URL to MHTML (yes, it really is this easy)
iMsg.CreateMHTMLBody "http://www.google.com"

' Check if we have any errors
If Err.Number <> 0 Then

    ' if so, write it to the screen
    Response.Write "Error: " & Err.Description

Else

    ' if not, create a new Word file on the server
    Set fs = Server.CreateObject("Scripting.FileSystemObject")
    Set f = fs.CreateTextFile(Server.MapPath("Report.doc"), True)

    ' write the MHTML to the file
    f.Write iMsg.GetStream.ReadText

    ' close the file
    f.Close

End If
%>

Easy! Of course, you could do all sorts of things after that. What I do is then send the file I’ve just created back in the response (using the ContentType code above) so that it’s offered as a download to the user.
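One way to do that last step, as a rough sketch rather than the exact code from my application: read the saved file back with an ADODB.Stream and write it out to the browser. It assumes the Report.doc file created above sits in the same folder as the page, and that nothing has been written to the response yet.

```vbscript
<%
Dim stream

' Read the saved file from disk as raw bytes
Set stream = Server.CreateObject("ADODB.Stream")
stream.Type = 1 ' adTypeBinary
stream.Open
stream.LoadFromFile Server.MapPath("Report.doc")

' Same headers as before, so the browser offers a Word download
Response.ContentType = "application/msword"
Response.AddHeader "content-disposition", "attachment; filename=Report.doc"

' Send the file contents, then stop so no page HTML follows it
Response.BinaryWrite stream.Read
stream.Close
Set stream = Nothing
Response.End
%>
```

Reading the file as binary avoids any character set conversion on the way out, which matters because the MHTML contains encoded image data.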

So, using ASP you can create a single-file archive of a complete web page including images and save it as a Word document. Oh, saving it as a .eml file will make it a valid Outlook HTML email as well!
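For the .eml variant, a minimal sketch: the stream returned by GetStream can be saved straight to disk, so the FileSystemObject isn’t strictly needed. The file name Report.eml is just an example, and the web server account needs write permission on the folder.

```vbscript
<%
' Build the MHTML message exactly as before
Set iMsg = CreateObject("CDO.Message")
iMsg.CreateMHTMLBody "http://www.google.com"

' Save the whole message to disk, overwriting any existing file
iMsg.GetStream.SaveToFile Server.MapPath("Report.eml"), 2 ' adSaveCreateOverWrite
%>
```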

Problems: I have met one or two problems using images with the widths set using percentages and centimetres (for instance, when creating a dynamic bar chart), but images left with their natural size appear just fine.