Utility: Beautify HTML

by Andy Prevost

Friday August 25 2023

I nearly gave up after trying several HTML "prettify" tools.

They all work, well, sort of. Tidy ships with PHP as a bundled extension. The documentation is very poor: each option is described in a way that doesn't really explain what the option does.

Tidy, like many software tools, tries to be all things. It's bloated with too many options. It adds tags, adds tag attributes, and modifies existing attributes – all on its own with no apparent way to alter that behaviour.

I did use Tidy for years ... but I had to give up on a small project where trying to figure out the correct options was taking longer than manually doing the work.
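To give a sense of the option hunting involved, here is a minimal sketch of asking Tidy for indentation only. The function name is hypothetical, and the option set is just one plausible starting point rather than a recommended configuration. It assumes the tidy extension is enabled and returns null when it is not.

```php
<?php
// Hypothetical helper: ask Tidy for 2-space indentation and little else.
// Returns null when the tidy extension is not available.
function tidy_indent_only(string $html): ?string
{
    if (!extension_loaded('tidy')) {
        return null;
    }
    $config = [
        'indent'         => true,  // indent block-level content
        'indent-spaces'  => 2,     // width of one indent level
        'wrap'           => 0,     // 0 turns off line wrapping
        'show-body-only' => true,  // leave out the html/head/body wrapper Tidy adds
        'tidy-mark'      => false, // no generator meta tag
    ];
    $result = tidy_repair_string($html, $config, 'utf8');
    return $result === false ? null : $result;
}

$out = tidy_indent_only('<div><p>Hello<p>World</div>');
echo $out ?? "tidy extension not loaded\n";
```

Even with these options Tidy still repairs the markup as it sees fit (closing the open paragraphs here, for instance), so this trims the behaviour described above rather than switching it off.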

Next on the list was DOMDocument. It gives you access to the web page's document tree so you can modify the structure on the fly. I ran into problems similar to those with Tidy, though, including poor documentation and examples.

The closest I came to perfection with the DOMDocument was processing the web page as XML and then removing the XML artifacts (mainly a header line).
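The XML approach described above can be sketched roughly as follows. This is a minimal illustration, not the exact code I used: the function name is hypothetical, and it assumes the input is already well-formed XML (XHTML without undeclared entities such as `&nbsp;`).

```php
<?php
// Hypothetical helper: pretty-print well-formed XHTML through DOMDocument,
// then strip the XML declaration that saveXML() puts on the first line.
function beautify_with_dom(string $xhtml): string
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false; // must be set before loading
    $doc->formatOutput = true;        // must be set before saving

    $previous = libxml_use_internal_errors(true); // collect parse errors quietly
    $loaded = $doc->loadXML($xhtml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        throw new InvalidArgumentException('Input is not well-formed XML');
    }

    $xml = $doc->saveXML();
    if ($xml === false) {
        throw new RuntimeException('Could not serialize the document');
    }
    // Remove the XML artifact: the leading <?xml ... ?> header line.
    return preg_replace('~^<\?xml[^>]*\?>\s*~', '', $xml) ?? $xml;
}

echo beautify_with_dom('<html><body><p>Hi</p></body></html>');
```

One caveat with this route: the XML serializer writes empty elements in self-closed form, which is another artifact to clean up if the page has empty textarea or script tags.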

Both Tidy and DOMDocument had some behaviour I had to ignore (or overlook), mainly how they handled the tiers (indentation levels) and spacing of styles and scripts.

There are a few other "open source" alternatives available. Most are applications with multiple files, and again most are trying to be all things and end up bloated. For example, one package had a module for beautifying HTML, another for minifying CSS, and another for minifying JavaScript. There are language files and other options. Really, all I want is the ability to specify how many characters to use as an indent (2) and what character to use (space).

I did end up finding an older program called "Beautify Html class", originally ported to PHP by Ivan Weiler (in 2014). The package is now MIT licensed with a copyright by Einar Lielmanis. It turns out that this single-file class is about 95% of what I want and need. I modified the class to suit my needs. The two major changes are added support for XHTML and a fix for an issue with textarea support.

The textarea issue comes from how this package handles tiers and spacing: it puts a line ending after an opening tag, then the text on a separate line with its own line ending, and then the closing tag, all with indentation levels. With the textarea tag, this is not workable. Anything between the start and end textarea tags is part of the value, so the indentation becomes part of the textarea value. When a page first loads, the textarea is usually meant to be empty, which triggers the display of a placeholder. With the spaces from the indentation levels in there, the placeholder never displays, and the user loses the fill-in hint.

Let's deal with a few questions:

  1. Why "beautify" HTML? No one ever sees it.
    That's true: unless you view the source, no one ever sees the underlying code, text and tags. I find it important to be able to read the page and find specific elements. I am more interested, however, in the XHTML variation, that is, identifying tags that are malformed or missing closing tags and having them inserted for me. This helps with search engine optimization.
  2. Why not just use the basics of Tidy?
    The basics of Tidy are far more than simply spacing and indent levels. There is no way to avoid setting options, and you still have to find the right ones, and the right values, for each specific project.
  3. Why not just use the basics of DOMDocument?
    Again, there is no such thing. DOMDocument is not a utility; it is low-level access to the parser's document tree, and you have to tell it everything you want to do.

I am thrilled to have found Beautify Html class. You can download it yourself; since it is MIT licensed, you can use it any way you want and make any modifications you want. I will share two of my modifications. The first is a fix for the XHTML variation of required, which turns the empty-valued attribute back into the bare attribute:

    $data = str_ireplace('required=""','required',$data);

and the second is the fix for the textarea spacing issue:

    $data = preg_replace('~<textarea(.*?)>\s*(.*?)\s*</textarea>~si','<textarea$1>$2</textarea>', $data);
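To see both fixes at work, here is a small self-contained sketch. The wrapper function name and the sample markup are made up for illustration; only the two replacement lines come from my modified class.

```php
<?php
// Hypothetical wrapper around the two replacements shown above.
function fix_beautified_output(string $data): string
{
    // XHTML-style required="" goes back to the bare boolean attribute.
    $data = str_ireplace('required=""', 'required', $data);
    // Drop whitespace just inside the textarea tags so it cannot become the value.
    $fixed = preg_replace(
        '~<textarea(.*?)>\s*(.*?)\s*</textarea>~si',
        '<textarea$1>$2</textarea>',
        $data
    );
    return $fixed ?? $data; // preg_replace() returns null on error
}

// Sample of what a beautifier might emit for an empty textarea (illustrative).
$beautified = "<form>\n"
    . "  <textarea name=\"notes\" placeholder=\"Your notes\" required=\"\">\n"
    . "  </textarea>\n"
    . "</form>";

echo fix_beautified_output($beautified), "\n";
```

After the call, the textarea's opening and closing tags sit directly next to each other, so the value is empty and the placeholder can display.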

Enjoy!
