The `HtmlTransformer` class is an edge function helper that modifies HTML responses from the origin server. It wraps lol_html::HtmlRewriter from the lol_html Rust crate and supports streaming HTML response bodies.

This class provides a simple API for transforming HTML responses. You define transformations keyed to HTML selectors (elements, comments, and text), and they are applied to the HTML response as it streams in from the origin server. Because the response is processed as it streams rather than buffered in full, this approach avoids memory limitations and suits large HTML responses.
To use the `HtmlTransformer`, first define the transformations you want to apply to the HTML response. The following sample code defines a set of transformations that can be passed to the `HtmlTransformer` class:

```javascript
const transformerDefinitions = [
  {
    selector: 'a[href]',
    element: async (el) => {
      const href = el.get_attribute('href');
      el.set_attribute('href', href.replace('http://', 'https://'));
    },
  },
  {
    selector: 'body',
    comment: async (c) => {
      c.remove();
    },
  },
  {
    doc_end: async (d) => {
      d.append(
        `<!-- Transformed at ${new Date().toISOString()} by Edg.io -->`,
        'html'
      );
    },
  },
];
```
Two types of rewriter callback definitions can be passed into the `HtmlTransformer`:
- Transformations that match an HTML selector and trigger a callback function when the selector is found.
  - `comment`: Operates on any HTML comment matching the specified selector.
  - `element`: Operates on the HTML element matching the specified selector and the element’s attributes.
  - `text`: Operates on any text matching the specified selector.
- Transformations that trigger the callback function at the document level. This type of transformation does not require an HTML selector.
  - `doc_comment`: Operates on any HTML comment in the document.
  - `doc_text`: Operates on any text in the document.
  - `doc_type`: Provides read-only information on the HTML document type.
  - `doc_end`: Triggered when the end of the HTML document is reached.
Learn more about Definitions and Selectors.
Quick Start Examples
Example 1: Basic Usage
Here is an edge function that uses `HtmlTransformer` to:
- Ensure all `<a href="...">` links are HTTPS.
- Remove all HTML comments from the `<body>`.
- Append a comment for the transformation timestamp to the end of the document.

```javascript
export async function handleHttpRequest(request, context) {
  const transformerDefinitions = [
    // This first definition replaces http with https in the <a href=...> elements
    {
      // This selector will match all <a> elements which have an href attribute
      selector: 'a[href]',
      element: async (el) => {
        const href = el.get_attribute('href');
        el.set_attribute('href', href.replace('http://', 'https://'));
      },
    },

    // This second definition removes all comments from the <body>
    {
      selector: 'body',
      comment: async (c) => {
        // remove the comment
        c.remove();
      },
    },

    // This third definition appends a timestamp to the end of the HTML document
    {
      // Since this operates on the document, there is no need to specify a selector
      doc_end: async (d) => {
        // Use the append() method to append the timestamp to the end of the document
        // Specify 'html' as the second argument to indicate the content is HTML, not plain text
        d.append(
          `<!-- Transformed at ${new Date().toISOString()} by Edg.io -->`,
          'html'
        );
      },
    },
  ];

  // Get the HTML from the origin server and stream the response body through the HtmlTransformer
  return fetch(request.url, {edgio: {origin: 'api_backend'}}).then(
    async (response) => {
      // stream() is documented as async, so await it before touching the headers
      const transformedResponse = await HtmlTransformer.stream(
        transformerDefinitions,
        response
      );
      // Make any changes to the response headers here.
      transformedResponse.headers.set('x-html-transformer-ran', 'true');
      return transformedResponse;
    }
  );
}
```
We will now examine how this edge function will transform the following HTML response provided by an origin server:
```html
<!DOCTYPE html>
<html>
<head><title>Script Example</title></head>
<body>
  <h1>Script Example</h1>
  <p>Script example.
  <!-- This is a <p> comment -->
  </p>
  <a href="http://edg.io/">Edgio Homepage</a>
  <!-- This is a <body> comment -->
</body>
</html>
```
This edge function transforms the above HTML to ensure HTTPS links, remove HTML comments, and append a transformation timestamp. The transformed HTML is shown below.
```html
<!DOCTYPE html>
<html>
  <head>
    <title>Script Example</title>
  </head>
  <body>
    <h1>Script Example</h1>
    <p>Script example.</p>
    <a href="https://edg.io/">Edgio Homepage</a>
  </body>
</html>
<!-- Transformed at 2023-11-29T00:52:09.942Z by Edg.io -->
```
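One caveat about the `href.replace('http://', 'https://')` call in this example: `String.prototype.replace` with a string pattern rewrites the first occurrence anywhere in the value, so an `http://` that appears later in the URL (for instance in a query string) could be changed instead of the scheme. The following is a minimal sketch of a stricter variant. The helper name `upgradeScheme` is hypothetical and not part of the HtmlTransformer API.

```javascript
// Hypothetical helper: rewrites the scheme only when the URL itself starts with http://
function upgradeScheme(href) {
  if (typeof href !== 'string') return href;
  return href.startsWith('http://')
    ? 'https://' + href.slice('http://'.length)
    : href;
}

// Intended use inside an element callback (el is the Element passed by HtmlTransformer)
const httpsDefinition = {
  selector: 'a[href]',
  element: async (el) => {
    el.set_attribute('href', upgradeScheme(el.get_attribute('href')));
  },
};
```

This sketch is case-sensitive, so a link written as `HTTP://` is left unchanged.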
Example 2: Edge Side Includes with fetch()
This sample edge function uses `HtmlTransformer` to replace all `<esi:include src="..."/>` elements with the content of the specified URL.

./edge-functions/esi_example.js
```javascript
export async function handleHttpRequest(request, context) {
  // This definition replaces <esi:include src="..." /> with the response from the src
  const transformerDefinitions = [
    {
      // This selector will match all <esi:include /> elements which have a src attribute.
      // We escape the : character with 2 backslashes to indicate it is part of the tag name
      // and not an attribute of the selector.
      selector: 'esi\\:include[src]',
      element: async (el) => {
        const url = el.get_attribute('src');
        const response = await fetch(url, {edgio: {origin: 'edgio_self'}});
        if (response.status === 200) {
          const body = await response.text();
          el.replace(body, 'html');
        } else {
          el.replace(
            '<a>We encountered an error, please try again later.</a>',
            'html'
          );
        }
      },
    },
  ];

  // For demo purposes, we'll fetch a local asset HTML file that contains an ESI include.
  const esiIncludeSource = new URL(request.url);
  esiIncludeSource.pathname = '/assets/esi_include.html';

  // Get the HTML from the origin server and stream the response body through the HtmlTransformer
  return fetch(esiIncludeSource, {edgio: {origin: 'edgio_self'}}).then(
    async (response) => {
      // stream() is documented as async, so await it before touching the headers
      const transformedResponse = await HtmlTransformer.stream(
        transformerDefinitions,
        response
      );
      // Make any changes to the response headers here.
      transformedResponse.headers.set('x-html-transformer-ran', 'true');
      return transformedResponse;
    }
  );
}
```
We will now examine how this edge function will transform the following HTML response provided by an origin server:
./assets/esi_include.html
```html
<!DOCTYPE html>
<html>
  <head>
    <title>Script Example</title>
  </head>
  <body>
    <h1>Script Example</h1>
    <esi:include src="/assets/esi_snippet.html" />
  </body>
</html>
```
Our ESI snippet file `/assets/esi_snippet.html` contains the following HTML:

./assets/esi_snippet.html
```html
<div>
  <h1>Hello, World!</h1>
  <p>This snippet will be included in the response via ESI.</p>
</div>
```
This edge function transforms the above HTML to replace `<esi:include ... />` with the results of the fetch. The transformed HTML is shown below.

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Script Example</title>
  </head>
  <body>
    <h1>Script Example</h1>
    <div>
      <h1>Hello, World!</h1>
      <p>This snippet will be included in the response via ESI.</p>
    </div>
  </body>
</html>
```
Example 3: Using fetch() with response streaming
This example is a modified version of Example 2 without the `HtmlTransformer.stream()` helper function.

./edge-functions/esi_response_stream_example.js
```javascript
export async function handleHttpRequest(request, context) {
  // This definition replaces <esi:include src="..." /> with the response from the src
  const transformerDefinitions = [
    {
      // This selector will match all <esi:include /> elements which have a src attribute.
      // We escape the : character with 2 backslashes to indicate it is part of the tag name
      // and not an attribute of the selector.
      selector: 'esi\\:include[src]',
      element: async (el) => {
        const url = el.get_attribute('src');
        const response = await fetch(url, {edgio: {origin: 'edgio_self'}});
        if (response.status === 200) {
          const body = await response.text();
          el.replace(body, 'html');
        } else {
          el.replace(
            '<a>We encountered an error, please try again later.</a>',
            'html'
          );
        }
      },
    },
  ];
  const textDecoder = new TextDecoder();

  // For demo purposes, we'll fetch a local asset HTML file that contains an ESI include.
  const esiIncludeSource = new URL(request.url);
  esiIncludeSource.pathname = '/assets/esi_include.html';

  // Get the HTML from the origin server and stream the response body through the
  // HtmlTransformer to the Response object
  const response = fetch(esiIncludeSource, {edgio: {origin: 'edgio_self'}})
    // Retrieve its body as ReadableStream
    .then(async (response) => {
      const reader = response.body.getReader();
      return new ReadableStream({
        start(controller) {
          const htmlTransformer = new HtmlTransformer(
            transformerDefinitions,
            (chunk) => {
              controller.enqueue(chunk);
            }
          );
          return pump();
          function pump() {
            return reader.read().then(async ({done, value}) => {
              if (value) {
                // Send the HTML to the HtmlTransformer. The stream option keeps
                // multi-byte characters intact when they span two chunks.
                await htmlTransformer.write(
                  textDecoder.decode(value, {stream: true})
                );
              }
              // When no more data needs to be consumed, close the stream
              if (done) {
                await htmlTransformer.end();
                controller.close();
                return;
              }
              return pump();
            });
          }
        },
      });
    })
    // Create a new response out of the stream
    .then((stream) => new Response(stream));

  return response;
}
```
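When chunks are decoded manually, as in this example, a multi-byte UTF-8 character can be split across two network chunks. The standard `TextDecoder` handles this when `decode()` is called with `{stream: true}`, which holds back an incomplete byte sequence until the next call. The sketch below shows that behavior in isolation. `decodeChunks` is a hypothetical helper, not part of the HtmlTransformer API.

```javascript
// Hypothetical helper: decodes an array of Uint8Array chunks into strings,
// keeping multi-byte characters intact across chunk boundaries.
function decodeChunks(chunks) {
  const decoder = new TextDecoder();
  const pieces = chunks.map((chunk) => decoder.decode(chunk, {stream: true}));
  // A final call without arguments flushes any bytes still buffered.
  pieces.push(decoder.decode());
  return pieces;
}

// "é" is encoded as two bytes (0xC3 0xA9); split them across two chunks.
const pieces = decodeChunks([
  new Uint8Array([0x3c, 0x70, 0x3e, 0xc3]), // "<p>" plus the first byte of "é"
  new Uint8Array([0xa9, 0x3c, 0x2f, 0x70, 0x3e]), // second byte of "é" plus "</p>"
]);
console.log(pieces.join('')); // prints "<p>é</p>"
```

Without the `stream` option, each chunk would be decoded on its own and the split character would come out as replacement characters.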
HtmlTransformer Class
static async stream(transformerDefinitions, response)
Static helper function to easily stream `fetch()` responses through the HtmlTransformer. This is the recommended method to use when transforming HTML responses.
- `transformerDefinitions` is an array of transformer definitions.
- `response` is the `fetch()` response object.

```javascript
// Transforms response from origin
return fetch(request.url, {edgio: {origin: 'api_backend'}}).then((response) =>
  HtmlTransformer.stream(transformerDefinitions, response)
);
```

```javascript
// Transforms response from origin and optionally manipulates the response headers
return fetch(request.url, {edgio: {origin: 'api_backend'}}).then(
  async (response) => {
    const transformedResponse = await HtmlTransformer.stream(
      transformerDefinitions,
      response
    );
    // Make any changes to the response headers here.
    transformedResponse.headers.set('x-html-transformer-ran', 'true');
    return transformedResponse;
  }
);
```
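Not every origin response is HTML, so it can be useful to check the `Content-Type` header before streaming a response through the transformer. The following is a sketch under stated assumptions: `isHtmlResponse` and `transformIfHtml` are hypothetical names, the response is assumed to expose a standard `headers.get()`, and the transformer class is passed in as a parameter so that the snippet does not depend on the edge runtime.

```javascript
// Hypothetical helper: true when the Content-Type is text/html,
// with or without parameters such as "; charset=utf-8".
function isHtmlResponse(response) {
  const contentType = response.headers.get('content-type') || '';
  return contentType.toLowerCase().split(';')[0].trim() === 'text/html';
}

// Hypothetical wrapper: streams through the transformer only for HTML responses.
// In an edge function, pass the HtmlTransformer global as the first argument.
async function transformIfHtml(Transformer, transformerDefinitions, response) {
  if (!isHtmlResponse(response)) return response;
  return Transformer.stream(transformerDefinitions, response);
}
```

In an edge function this might be called as `transformIfHtml(HtmlTransformer, transformerDefinitions, response)`.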
new HtmlTransformer(transformerDefinitions, callback)
Creates a new HtmlTransformer instance.
- `transformerDefinitions` is an array of transformer definitions.
- `callback` is the function `(chunk) => { ... }` that receives the transformed HTML data chunks.

This usage is not recommended for large responses as it may exceed memory limitations. Use the `HtmlTransformer.stream()` method to stream the response body during transformation.

```javascript
const htmlTransformer = new HtmlTransformer(transformerDefinitions, callback);
```
async write(string)
Writes the string to the transformer stream. This function can be called multiple times.
```javascript
await htmlTransformer.write('<html><body><h1>Hello World</h1>');
await htmlTransformer.write('<a href="https://edg.io/">Edgio Homepage</a>');
await htmlTransformer.write('</body></html>');
await htmlTransformer.end();
```
async write(Promise<Response>)
Passes a Promise that resolves to a Response to the transformer stream. This writes the entire response to the transformer as a stream.

```javascript
const responsePromise = fetch(request.url, {edgio: {origin: 'api_backend'}});
await htmlTransformer.write(responsePromise);
await htmlTransformer.end();
```
async write(Response)
Passes the Response object to the transformer stream. This writes the entire response to the transformer as a stream.

```javascript
const response = await fetch(request.url, {edgio: {origin: 'api_backend'}});
if (response.status === 200) {
  await htmlTransformer.write(response);
  await htmlTransformer.end();
}
```
async write(readableStream)
Passes the ReadableStream to the transformer stream. This writes the entire response to the transformer as a stream.

```javascript
const response = await fetch(request.url, {edgio: {origin: 'api_backend'}});
if (response.status === 200) {
  // Content-Type may carry parameters such as "; charset=utf-8"
  const contentType = response.headers.get('content-type') || '';
  if (contentType.includes('text/html')) {
    const readableStream = response.body;
    await htmlTransformer.write(readableStream);
    await htmlTransformer.end();
  }
}
```
async end()
Flushes the transformer and completes the transformation. This function must be called after the last call to `await htmlTransformer.write()`.

```javascript
await htmlTransformer.end();
```
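The constructor, `write()`, and `end()` are typically used together: create an instance with a chunk callback, write the input in one or more pieces, then call `end()` to flush. The sketch below wraps that sequence in a hypothetical helper, `transformToChunks`, which collects the emitted chunks in an array. The transformer class is injected as a parameter (an assumption made so that the snippet is self-contained); in an edge function it would be the `HtmlTransformer` global.

```javascript
// Hypothetical helper: runs string pieces through a transformer instance and
// returns the chunks passed to the callback.
async function transformToChunks(Transformer, transformerDefinitions, pieces) {
  const chunks = [];
  const transformer = new Transformer(transformerDefinitions, (chunk) => {
    chunks.push(chunk);
  });
  for (const piece of pieces) {
    await transformer.write(piece);
  }
  // end() must follow the last write() so that buffered output is flushed
  await transformer.end();
  return chunks;
}
```

Because every chunk is held in memory, this pattern fits small documents only, in line with the memory warning given for the constructor.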
Definitions
The HtmlTransformer definitions are an array of objects that define the transformations performed on the HTML stream. These definition objects can contain one selector and one asynchronous callback:
- `selector`: A string defining the HTML selector to match. (See Selectors)
- `comment`: The `async (Comment) => { }` function called when an HTML comment is found matching the selector.
- `element`: The `async (Element) => { }` function called when an HTML element is found matching the selector.
- `text`: The `async (Text) => { }` function called when text is found matching the selector.
These definition objects contain one callback that is not associated with a selector:
- `doc_comment`: The `async (Comment) => { }` function called when an HTML comment is found in the document.
- `doc_text`: The `async (Text) => { }` function called when text is found in the document.
- `doc_type`: The `async (Doctype) => { }` function called when the HTML document type is found in the document.
- `doc_end`: The `async (DocEnd) => { }` function called when the end of the HTML document is reached.
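The rules above can be checked before the definitions are handed to the transformer. The following is a minimal sketch based on one reading of those rules (one callback per definition; selector-based callbacks need a `selector`, document-level callbacks do not use one). The helper `checkDefinitions` is hypothetical, and the runtime itself may be stricter or more lenient.

```javascript
const SELECTOR_CALLBACKS = ['comment', 'element', 'text'];
const DOCUMENT_CALLBACKS = ['doc_comment', 'doc_text', 'doc_type', 'doc_end'];

// Hypothetical helper: returns a list of problems found in a definitions array.
function checkDefinitions(definitions) {
  const problems = [];
  definitions.forEach((def, index) => {
    const selectorKeys = SELECTOR_CALLBACKS.filter((k) => typeof def[k] === 'function');
    const documentKeys = DOCUMENT_CALLBACKS.filter((k) => typeof def[k] === 'function');
    if (selectorKeys.length + documentKeys.length !== 1) {
      problems.push(`definition ${index}: expected exactly one callback`);
    } else if (selectorKeys.length === 1 && typeof def.selector !== 'string') {
      problems.push(`definition ${index}: "${selectorKeys[0]}" needs a selector string`);
    } else if (documentKeys.length === 1 && 'selector' in def) {
      problems.push(`definition ${index}: "${documentKeys[0]}" does not use a selector`);
    }
  });
  return problems;
}
```

An empty array means no problems were found under these assumptions.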
Selectors
The HtmlTransformer supports the following selector types: (ref: lol_html::Selector )
Pattern | Represents |
---|---|
* | any element |
E | any element of type E |
E:nth-child(n) | an E element, the n-th child of its parent |
E:first-child | an E element, first child of its parent |
E:nth-of-type(n) | an E element, the n-th sibling of its type |
E:first-of-type | an E element, first sibling of its type |
E:not(s) | an E element that does not match either compound selector s |
E.warning | an E element belonging to the class warning |
E#myid | an E element with ID equal to myid |
E[foo] | an E element with a foo attribute |
E[foo="bar"] | an E element whose foo attribute value is exactly equal to bar |
E[foo="bar" i] | an E element whose foo attribute value is exactly equal to any (ASCII-range) case-permutation of bar |
E[foo="bar" s] | an E element whose foo attribute value is exactly and case-sensitively equal to bar |
E[foo~="bar"] | an E element whose foo attribute value is a list of whitespace-separated values, one of which is exactly equal to bar |
E[foo^="bar"] | an E element whose foo attribute value begins exactly with the string bar |
E[foo$="bar"] | an E element whose foo attribute value ends exactly with the string bar |
E[foo*="bar"] | an E element whose foo attribute value contains the substring bar |
E[foo\|="en"] | an E element whose foo attribute value is a hyphen-separated list of values beginning with en |
E F | an F element descendant of an E element |
E > F | an F element child of an E element |
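To illustrate how the patterns in this table combine, here is a short sketch of definitions that use a few of them. The transformations themselves (adding an empty `alt`, setting `rel`, removing warning paragraphs) are illustrative assumptions rather than recommended behavior.

```javascript
const selectorExamples = [
  // E:not(s) with an attribute selector: images that lack an alt attribute
  {
    selector: 'img:not([alt])',
    element: async (el) => el.set_attribute('alt', ''),
  },
  // E[foo^="bar"]: links whose href begins with http://
  {
    selector: 'a[href^="http://"]',
    element: async (el) => el.set_attribute('rel', 'nofollow'),
  },
  // E.warning combined with E > F: paragraphs that are direct children of div.warning
  {
    selector: 'div.warning > p',
    element: async (el) => el.remove(),
  },
];
```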
Use a double backslash to escape special characters within an `E` selector. For example, use the selector `esi\\:include[src]` to match `<esi:include src="...">`.
Classes
Comment Class
The Comment class is passed to the callback function for `comment:` and `doc_comment:` definitions. (ref: lol_html::html_content::Comment)
The Comment class has the following methods:

Method | Description |
---|---|
text(): string | Returns the comment text |
set_text(text: string) | Sets the comment text |
before(text: string, content_type: string) | Inserts the text before the comment. Content type is ‘html’ or ‘text’ |
after(text: string, content_type: string) | Inserts the text after the comment. Content type is ‘html’ or ‘text’ |
replace(text: string, content_type: string) | Replaces the comment with the text. Content type is ‘html’ or ‘text’ |
remove() | Removes the entire comment |
removed(): boolean | Returns true if the comment has been removed |
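As an illustration of `text()` and `remove()`, the sketch below removes comments selectively instead of unconditionally. The policy of keeping comments that start with `!` (a common convention for license banners) and the helper name `stripComment` are assumptions made for the example.

```javascript
// Hypothetical policy: keep comments whose text starts with "!" and remove all others.
function stripComment(c) {
  if (c.text().trimStart().startsWith('!')) return;
  c.remove();
}

// Document-level definition, so no selector is needed
const commentDefinition = {
  doc_comment: async (c) => stripComment(c),
};
```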
Element Class
The Element class is passed to the callback function for `element:` definitions. (ref: lol_html::html_content::Element)
The Element class has the following methods:

Method | Description |
---|---|
tag_name(): string | Returns the tag name of the element |
tag_name_preserve_case(): string | Returns the tag name of the element, preserving the case of the original tag name |
set_tag_name(name: string) | Sets the tag name of the element. Returns an error if the tag name is invalid. |
is_self_closing(): boolean | Returns true if the element is self closing. E.g. <foo /> |
can_have_content(): boolean | Returns true if the element can have content |
namespace_uri(): string | Returns the namespace URI of the element |
attributes(): [Attributes] | Returns an array of Attribute objects |
get_attribute(name: string): string | Returns the value of the attribute with the specified name |
has_attribute(name: string): boolean | Returns true if the element has an attribute with the specified name |
set_attribute(name: string, value) | Sets the value of the attribute with the specified name. Returns an error if the attribute name is invalid. |
remove_attribute(name: string) | Removes the attribute with the specified name |
before(text: string, content_type: string) | Inserts the text before the element. Content type is ‘html’ or ‘text’ |
after(text: string, content_type: string) | Inserts the text after the element. Content type is ‘html’ or ‘text’ |
prepend(text: string, content_type: string) | Prepends the text to the element. Content type is ‘html’ or ‘text’ |
append(text: string, content_type: string) | Appends the text to the element. Content type is ‘html’ or ‘text’ |
set_inner_content(text: string, content_type: string) | Sets the inner content of the element. Content type is ‘html’ or ‘text’ |
replace(text: string, content_type: string) | Replaces the element with the text. Content type is ‘html’ or ‘text’ |
remove() | Removes the entire element |
remove_and_keep_content() | Removes the element and keeps its content |
removed(): boolean | Returns true if the element has been removed |
start_tag(): StartTag | Returns the StartTag object for the element |
end_tag_handlers() | Not implemented |
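The attribute methods can be combined to adjust an element based on its current state. The following sketch adds `noopener` and `noreferrer` to the `rel` attribute of links that open in a new tab. The helper name `hardenExternalLink` and the policy itself are assumptions for the example. Because the table does not state what `get_attribute()` returns for a missing attribute, the sketch checks `has_attribute()` first.

```javascript
// Hypothetical helper: ensures rel contains noopener and noreferrer on target="_blank" links.
function hardenExternalLink(el) {
  if (!el.has_attribute('target') || el.get_attribute('target') !== '_blank') return;
  const rel = el.has_attribute('rel') ? el.get_attribute('rel') : '';
  // Keep existing tokens, then add the two required ones without duplicates
  const tokens = new Set(rel.split(/\s+/).filter(Boolean));
  tokens.add('noopener');
  tokens.add('noreferrer');
  el.set_attribute('rel', [...tokens].join(' '));
}

const linkDefinition = {
  selector: 'a[target]',
  element: async (el) => hardenExternalLink(el),
};
```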
Text Class
The Text class is passed to the callback function for `text:` and `doc_text:` definitions. (ref: lol_html::html_content::TextChunk)
The Text class has the following methods:

Method | Description |
---|---|
as_str(): string | Returns the text |
set_str(text: string) | Sets the text |
text_type(): string | Returns the text type. |
last_in_text_node(): boolean | Returns true if the chunk is the last in an HTML text node. |
before(text: string, content_type: string) | Inserts the text before the text chunk. Content type is ‘html’ or ‘text’ |
after(text: string, content_type: string) | Inserts the text after the text chunk. Content type is ‘html’ or ‘text’ |
replace(text: string, content_type: string) | Replaces the text chunk with the text. Content type is ‘html’ or ‘text’ |
remove() | Removes the entire text chunk |
removed(): boolean | Returns true if the text chunk has been removed |
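Since the callback receives text chunks and `last_in_text_node()` marks the final chunk of a node, a single text node may arrive in more than one call. One way to rewrite the full text of a node is to buffer the chunks and emit the result on the last one. The sketch below assumes that `as_str()` returns the source text as written, which is why the result is written back as `'html'`. The factory name `bufferedTextRewriter` is hypothetical.

```javascript
// Hypothetical factory: returns a text callback that applies `rewrite` to the
// full text of each text node, even when the node arrives in several chunks.
function bufferedTextRewriter(rewrite) {
  let buffer = '';
  return async (t) => {
    buffer += t.as_str();
    if (t.last_in_text_node()) {
      // Assumption: the buffered text is raw source text, so it is written back as 'html'
      t.replace(rewrite(buffer), 'html');
      buffer = '';
    } else {
      // Drop the partial chunk; its text is emitted together with the final chunk
      t.remove();
    }
  };
}

const titleDefinition = {
  selector: 'h1',
  text: bufferedTextRewriter((text) => text.replace('Script', 'Transformed')),
};
```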
Doctype Class
The Doctype class is passed to the callback function for `doc_type:` definitions. (ref: lol_html::html_content::Doctype)
The Doctype class has the following methods:

Method | Description |
---|---|
name(): string | Returns the name of the document type |
public_id(): string | Returns the public ID of the document type |
system_id(): string | Returns the system ID of the document type |
remove() | Removes the entire document type |
removed(): boolean | Returns true if the document type has been removed |
DocEnd Class
The DocEnd class is passed to the callback function for `doc_end:` definitions. (ref: lol_html::html_content::DocumentEnd)
The DocEnd class has the following methods:

Method | Description |
---|---|
append(text: string, content_type: string) | Appends the text to the end of the document. Content type is ‘html’ or ‘text’ |
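The Doctype and DocEnd classes can be used together by sharing state between two document-level definitions. The following sketch records the doctype name when it is seen and reports it in a comment appended at the end of the document. The factory name `doctypeReportDefinitions` and the comment format are assumptions for the example.

```javascript
// Hypothetical factory: builds two document-level definitions that share state.
function doctypeReportDefinitions() {
  let doctypeName = null;
  return [
    {
      doc_type: async (d) => {
        doctypeName = d.name();
      },
    },
    {
      doc_end: async (end) => {
        const label = doctypeName === null ? 'none' : doctypeName;
        end.append(`<!-- doctype: ${label} -->`, 'html');
      },
    },
  ];
}
```

Calling the factory once per request keeps the recorded name from leaking between responses.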
Attribute Class
The Attribute class is returned by the `attributes()` method of the Element class. (ref: lol_html::html_content::Attribute)
The Attribute class has the following methods:

Method | Description |
---|---|
name(): string | Returns the name of the attribute |
name_preserve_case(): string | Returns the name of the attribute, preserving its case. |
value(): string | Returns the value of the attribute |
StartTag Class
The StartTag class is returned by the `start_tag()` method of the Element class. (ref: lol_html::html_content::StartTag)
The StartTag class has the following methods:

Method | Description |
---|---|
name(): string | Returns the name of the tag |
name_preserve_case(): string | Returns the name of the tag, preserving its case. |
namespace_uri(): string | Returns the namespace URI of the tag |
attributes(): [Attributes] | Returns an array of Attribute objects |
set_attribute(name: string, value) | Sets the value of the attribute with the specified name. Returns an error if the attribute name is invalid. |
remove_attribute(name: string) | Removes the attribute with the specified name |
self_closing(): boolean | Returns true if the tag is self closing. E.g. <foo /> |
before(text: string, content_type: string) | Inserts the text before the tag. Content type is ‘html’ or ‘text’ |
after(text: string, content_type: string) | Inserts the text after the tag. Content type is ‘html’ or ‘text’ |
replace(text: string, content_type: string) | Replaces the tag with the text. Content type is ‘html’ or ‘text’ |
remove() | Removes the entire tag |
Types
Text Type
Valid values are:
- `'PlainText'` - Text inside a `<plaintext>` element.
- `'RCData'` - Text inside `<title>` and `<textarea>` elements.
- `'RawText'` - Text inside `<style>`, `<xmp>`, `<iframe>`, `<noembed>`, `<noframes>`, and `<noscript>` elements.
- `'ScriptData'` - Text inside a `<script>` element.
- `'Data'` - Regular text.
- `'CDataSection'` - Text inside a `CDATA` section.
Content Types
The HtmlTransformer supports the following content types: (ref: lol_html::html_content::ContentType)
- `'html'` - HTML content. The transformer will not escape HTML entities.
- `'text'` - Plain text content. The transformer will escape HTML entities. E.g. `<` will be converted to `&lt;`.
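The choice of content type matters most when the inserted string is not fully trusted. The sketch below inserts a visitor-supplied name as `'text'`, so that any markup in it is escaped, and inserts fixed, trusted markup as `'html'`. The helper name `renderGreeting` and the `displayName` parameter are hypothetical.

```javascript
// Hypothetical helper: writes an untrusted display name into a matched element.
function renderGreeting(el, displayName) {
  // Untrusted value: 'text' makes the transformer escape HTML entities
  el.set_inner_content(displayName, 'text');
  // Trusted, fixed markup: 'html' inserts it as-is
  el.after('<!-- greeting rendered at the edge -->', 'html');
}
```

A definition could call it as `element: async (el) => renderGreeting(el, name)`, where `name` comes from the request.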