Last edit: 05-12-13 Graham Wideman
HTMLTagClean: Fixing Office's Baroque HTML
Article created: 2005-03-04

Overview

If you've ever wanted to paste data from Excel into an HTML page, say in FrontPage, you've confronted the problem that Excel's HTML is heavily laden with gratuitous formatting.

In theory there's an argument that this is so that the HTML can be read back into Excel with enough detail to recreate the original... but as often as not it leaves you tearing your hair out trying to get rid of all that ^&%$# formatting. And FrontPage's "Remove Formatting" is not much help here, as at least some of the formatting is in the form of styles and attributes that this function ignores.

Typical Example:

It looks simple in Excel:

... but once copied and pasted into FrontPage, the HTML is hideous:

<table x:str border="0" cellpadding="0" cellspacing="0" width="192" style="border-collapse: collapse;width:144pt" id="table2">
<colgroup>
<col width="64" span="3" style="width:48pt">
</colgroup>
<tr height="15" style="height:11.25pt">
<td height="15" align="right" width="64"
style="height: 11.25pt; width: 48pt; font-size: 8.0pt; color: windowtext; font-weight: 400; font-style: normal; text-decoration: none; font-family: Arial; text-align: general; vertical-align: bottom; white-space: nowrap; border: medium none; padding-left: 1px; padding-right: 1px; padding-top: 1px"
x:num>1</td>
<td width="64"
style="width: 48pt; font-size: 8.0pt; font-weight: 700; font-family: Arial, sans-serif; color: windowtext; font-style: normal; text-decoration: none; text-align: general; vertical-align: bottom; white-space: nowrap; border: medium none; padding-left: 1px; padding-right: 1px; padding-top: 1px">
One</td>
<td width="64"
style="width: 48pt; font-size: 8.0pt; font-weight: 700; font-family: Arial, sans-serif; color: windowtext; font-style: normal; text-decoration: none; text-align: general; vertical-align: bottom; white-space: nowrap; border: medium none; padding-left: 1px; padding-right: 1px; padding-top: 1px">
Ten</td>
</tr>
<tr height="15" style="height:11.25pt">
<td height="15" align="right"
style="height: 11.25pt; color: red; font-size: 8.0pt; font-weight: 400; font-style: normal; text-decoration: none; font-family: Arial; text-align: general; vertical-align: bottom; white-space: nowrap; border: medium none; padding-left: 1px; padding-right: 1px; padding-top: 1px"
x:num>2</td>
<td style="font-size: 8.0pt; font-weight: 700; font-family: Arial, sans-serif; color: windowtext; font-style: normal; text-decoration: none; text-align: general; vertical-align: bottom; white-space: nowrap; border: medium none; padding-left: 1px; padding-right: 1px; padding-top: 1px">
Two</td>
<td style="font-size: 8.0pt; font-weight: 700; font-family: Arial, sans-serif; color: windowtext; font-style: normal; text-decoration: none; text-align: general; vertical-align: bottom; white-space: nowrap; border: medium none; padding-left: 1px; padding-right: 1px; padding-top: 1px">
Twenty</td>
</tr>

.... and that's just the first couple of rows! It's faster to just retype it from scratch. And what Excel saves as an HTML file is similar.

Enter HTMLTagClean

In a few quick steps, HTMLTagClean helps get rid of all attributes from selected tags, and can remove other tags completely:

Step: Description

Launch HTMLTagClean: ... if it's not already running. (In Windows Explorer, double-click HTMLTagClean.exe, or use a shortcut.)

Copy the HTML from the HTML editor: Select and copy the chunk of HTML (source) that you want to fix. Hint: if you've pasted an Excel table into FrontPage, select it while in Design view (because that's easy), then flip to Code view and copy it.

...or for a quick test, just hit the (Example) button in HTMLTagClean.

Paste into HTMLTagClean: Press the Clear button if there's existing text, then press the Paste button.

Clean attribs from tags:

... or ...

a) In the "2. Remove Attribs from Tags" area, enter a list of tags into the slot next to the "From these tags:" button.

b) Decide whether you want to remove or retain particular attribs, and set the Retain/Remove radio button and attribs slot accordingly.

c) Hit the "From these tags" button.

Remove tags themselves:

a) In the "3. Remove Tags" area, enter the tags to be removed.

b) Hit the "Remove these tags" button.

Results: ... aaahhhh! What a relief!

Select, Copy, Paste: Use the Select All...Copy button to copy the cleaned HTML to the clipboard. Then paste into FrontPage in place of the original HTML.

Format: Now you can format the minimal HTML the way you want to.
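For readers who would rather script the same kind of cleanup, here is a minimal Python sketch of the two operations described in the steps above: stripping attributes from selected tags (with a retain/remove choice) and deleting tags outright. It is not HTMLTagClean's actual implementation. The function names `clean_attribs` and `remove_tags` are made up for this illustration, and the regex-based approach assumes reasonably tidy input, with no ">" inside attribute values.

```python
import re

# Opening tag: tag name, raw attribute text, optional trailing "/".
# Assumption: attribute values contain no "<" or ">" characters.
_OPEN_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b([^<>]*?)(/?)>")
# One attribute: a name, optionally followed by =value (quoted or bare).
_ATTR = re.compile(r"""([^\s=/]+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"']+))?""")


def clean_attribs(html, tags, attribs=(), retain=True):
    """Strip attributes from the listed tags (hypothetical helper).

    retain=True keeps only the attributes named in `attribs`;
    retain=False removes only those and keeps the rest.
    With the defaults, every attribute is stripped.
    """
    tags = {t.lower() for t in tags}
    attribs = {a.lower() for a in attribs}

    def fix(match):
        name, raw, slash = match.groups()
        if name.lower() not in tags:
            return match.group(0)  # not a selected tag: leave untouched
        kept = []
        for attr in _ATTR.finditer(raw):
            listed = attr.group(1).lower() in attribs
            if listed == retain:
                kept.append(attr.group(0))
        return "<" + " ".join([name] + kept) + slash + ">"

    return _OPEN_TAG.sub(fix, html)


def remove_tags(html, tags):
    """Delete opening and closing forms of the listed tags, keeping their content."""
    names = "|".join(re.escape(t) for t in tags)
    return re.sub(r"</?(?:%s)\b[^<>]*>" % names, "", html, flags=re.IGNORECASE)


sample = '<td height="15" align="right" style="color: red" x:num>2</td>'
print(clean_attribs(sample, ["td"]))             # <td>2</td>
print(clean_attribs(sample, ["td"], ["align"]))  # <td align="right">2</td>
print(remove_tags("<b>Two</b>", ["b"]))          # Two
```

For anything messier than Excel's output, a real HTML parser would be a safer basis than regular expressions; the sketch only aims to show the retain/remove logic.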

Download and Release Notes

Item, Version, Date: Description/Release Notes

HTMLTagClean_111.zip, 1.1.1, 2005-12-03:
Added Retain/Remove feature. (Thanks to Lee Tagg for the suggestions.)
Added Example button.
Revised HTML buffer to avoid truncating really long lines (over 1000 characters).

HTMLTagClean_101.zip, 1.0.1, 2005-08-27: Minor adjustments to layout and memo behavior.

HTMLTagClean, 1.0.0, 2005-03-15: Original version.
       

Installation

Step: Description

Download: .... from this page

Unzip: Using WinXP Windows Explorer, or WinZip. Copy the contents to some convenient folder, perhaps:
c:\Program Files\HTMLTagClean

Shortcuts: Optionally create a shortcut on the desktop or in the Start menu in the usual way, by Alt-dragging the executable file from Windows Explorer.
   

 

