Edgio

HtmlTransformer

The HtmlTransformer class is an edge function helper for modifying HTML responses from the origin server. It wraps the HtmlRewriter of the lol_html Rust crate (lol_html::HtmlRewriter) and supports streaming HTML response bodies.
This class provides a simple API for transforming HTML responses. You define transformations based on HTML selectors, such as elements, comments, and text, and they are applied to the HTML response as it streams in from the origin server. Because the response is transformed as it streams rather than buffered in full, this approach avoids memory limits and suits large HTML responses.
To use the HtmlTransformer, first define the transformations you want to apply to the HTML response. The following sample code defines three transformations:
JavaScript
const transformerDefinitions = [
  {
    selector: 'a[href]',
    element: async (el) => {
      const href = el.get_attribute('href');
      // Only rewrite the scheme at the start of the URL
      el.set_attribute('href', href.replace(/^http:\/\//, 'https://'));
    },
  },
  {
    selector: 'body',
    comment: async (c) => {
      c.remove();
    },
  },
  {
    doc_end: async (d) => {
      d.append(
        `<!-- Transformed at ${new Date().toISOString()} by Edg.io -->`,
        'html'
      );
    },
  },
];
There are two types of rewriter callback definitions that can be passed into the HtmlTransformer:
  • Transformations that match an HTML selector and trigger a callback function when the selector is found.
    • comment: Operates on any HTML comment matching the specified selector.
    • element: Operates on the HTML element matching the specified selector and the element’s attributes.
    • text: Operates on any text matching the specified selector.
  • Transformations that operate on the document as a whole and trigger the callback function when the corresponding part of the document is found. This type of transformation does not require an HTML selector.
    • doc_comment: Operates on any HTML comment in the document.
    • doc_text: Operates on any text in the document.
    • doc_type: Provides read-only information on the HTML document type.
    • doc_end: Triggered when the end of the HTML document is reached.
Learn more about Definitions and Selectors.
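As a rough illustration of the two kinds of definitions, the sketch below pairs a selector-based text callback with a document-level doc_comment callback. The 'title' selector and the upper-casing rule are illustrative assumptions; only as_str(), set_str(), and remove() come from the class tables on this page.

```javascript
// Hypothetical definitions, for illustration only.
const exampleDefinitions = [
  {
    // Selector-based: runs for text chunks found inside <title>.
    selector: 'title',
    text: async (t) => {
      t.set_str(t.as_str().toUpperCase());
    },
  },
  {
    // Document-level: no selector; runs for every comment in the document.
    doc_comment: async (c) => {
      c.remove();
    },
  },
];
```

The second object has no selector key because doc_comment applies to the whole document.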

Quick Start Examples

Example 1: Basic Usage

Here is an edge function that uses HtmlTransformer to:
  • Ensure all <a href="..."> links are HTTPS.
  • Remove all HTML comments.
  • Append a comment for the transformation timestamp to the end of the document.
JavaScript
export async function handleHttpRequest(request, context) {
  const transformerDefinitions = [
    // This first definition replaces http with https in the <a href=...> elements
    {
      // This selector will match all <a> elements which have an href attribute
      selector: 'a[href]',
      element: async (el) => {
        const href = el.get_attribute('href');
        // Only rewrite the scheme at the start of the URL
        el.set_attribute('href', href.replace(/^http:\/\//, 'https://'));
      },
    },

    // This second definition removes all comments from the <body>
    {
      selector: 'body',
      comment: async (c) => {
        // remove the comment
        c.remove();
      },
    },

    // This third definition appends a timestamp to the end of the HTML document
    {
      // Since this operates on the document, there is no need to specify a selector
      doc_end: async (d) => {
        // Use the append() method to append the timestamp to the end of the document
        // Specify 'html' as the second argument to indicate the content is HTML, not plain text
        d.append(
          `<!-- Transformed at ${new Date().toISOString()} by Edg.io -->`,
          'html'
        );
      },
    },
  ];

  // Get the HTML from the origin server and stream the response body through the HtmlTransformer
  return fetch(request.url, {edgio: {origin: 'api_backend'}}).then(
    async (response) => {
      // stream() is documented as async, so await the transformed response
      const transformedResponse = await HtmlTransformer.stream(
        transformerDefinitions,
        response
      );
      // Make any changes to the response headers here.
      transformedResponse.headers.set('x-html-transformer-ran', 'true');
      return transformedResponse;
    }
  );
}
We will now examine how this edge function will transform the following HTML response provided by an origin server:
HTML
<!DOCTYPE html>
<html>
<head><title>Script Example</title></head>
<body>
  <h1>Script Example</h1>
  <p>Script example.
  <!-- This is a <p> comment -->
  </p>
  <a href="http://edg.io/">Edgio Homepage</a>
  <!-- This is a <body> comment -->
</body>
</html>
This edge function transforms the above HTML to ensure HTTPS links, remove HTML comments, and append a transformation timestamp. The transformed HTML is shown below.
HTML
<!DOCTYPE html>
<html>
  <head>
    <title>Script Example</title>
  </head>
  <body>
    <h1>Script Example</h1>
    <p>Script example.</p>
    <a href="https://edg.io/">Edgio Homepage</a>
  </body>
</html>
<!-- Transformed at 2023-11-29T00:52:09.942Z by Edg.io -->

Example 2: Edge Side Includes with fetch()

This sample edge function uses HtmlTransformer to replace all <esi:include src="..."/> elements with the content of the specified URL.
JavaScript ./edge-functions/esi_example.js
export async function handleHttpRequest(request, context) {
  // This definition replaces <esi:include src="..." /> with the response from the src
  const transformerDefinitions = [
    {
      // This selector will match all <esi:include /> elements which have a src attribute.
      // We escape the : character with 2 backslashes to indicate it is part of the tag name
      // and not the start of a pseudo-class.
      selector: 'esi\\:include[src]',
      element: async (el) => {
        const url = el.get_attribute('src');
        const response = await fetch(url, {edgio: {origin: 'edgio_self'}});
        if (response.status === 200) {
          const body = await response.text();
          el.replace(body, 'html');
        } else {
          el.replace(
            '<a>We encountered an error, please try again later.</a>',
            'html'
          );
        }
      },
    },
  ];

  // For demo purposes, we'll fetch a local asset HTML file that contains an ESI include.
  const esiIncludeSource = new URL(request.url);
  esiIncludeSource.pathname = '/assets/esi_include.html';

  // Get the HTML from the origin server and stream the response body through the HtmlTransformer
  return fetch(esiIncludeSource, {edgio: {origin: 'edgio_self'}}).then(
    async (response) => {
      // stream() is documented as async, so await the transformed response
      const transformedResponse = await HtmlTransformer.stream(
        transformerDefinitions,
        response
      );
      // Make any changes to the response headers here.
      transformedResponse.headers.set('x-html-transformer-ran', 'true');
      return transformedResponse;
    }
  );
}
We will now examine how this edge function will transform the following HTML response provided by an origin server:
HTML ./assets/esi_include.html
<!DOCTYPE html>
<html>
  <head>
    <title>Script Example</title>
  </head>
  <body>
    <h1>Script Example</h1>
    <esi:include src="/assets/esi_snippet.html" />
  </body>
</html>
Our ESI snippet file /assets/esi_snippet.html contains the following HTML:
HTML ./assets/esi_snippet.html
<div>
  <h1>Hello, World!</h1>
  <p>This snippet will be included in the response via ESI.</p>
</div>
This edge function transforms the above HTML to replace <esi:include ... /> with the results of the fetch. The transformed HTML is shown below.
HTML
<!DOCTYPE html>
<html>
  <head>
    <title>Script Example</title>
  </head>
  <body>
    <h1>Script Example</h1>
    <div>
      <h1>Hello, World!</h1>
      <p>This snippet will be included in the response via ESI.</p>
    </div>
  </body>
</html>

Example 3: Using fetch() with response streaming

This example modifies Example 2 to feed the response body through an HtmlTransformer instance manually, without the HtmlTransformer.stream() helper function.
JavaScript ./edge-functions/esi_response_stream_example.js
export async function handleHttpRequest(request, context) {
  // This definition replaces <esi:include src="..." /> with the response from the src
  const transformerDefinitions = [
    {
      // This selector will match all <esi:include /> elements which have a src attribute.
      // We escape the : character with 2 backslashes to indicate it is part of the tag name
      // and not the start of a pseudo-class.
      selector: 'esi\\:include[src]',
      element: async (el) => {
        const url = el.get_attribute('src');
        const response = await fetch(url, {edgio: {origin: 'edgio_self'}});
        if (response.status === 200) {
          const body = await response.text();
          el.replace(body, 'html');
        } else {
          el.replace(
            '<a>We encountered an error, please try again later.</a>',
            'html'
          );
        }
      },
    },
  ];
  // Decodes the response bytes into strings for the HtmlTransformer
  const textDecoder = new TextDecoder();

  // For demo purposes, we'll fetch a local asset HTML file that contains an ESI include.
  const esiIncludeSource = new URL(request.url);
  esiIncludeSource.pathname = '/assets/esi_include.html';

  // Get the HTML from the origin server and stream the response body through the
  // HtmlTransformer to the Response object
  const response = fetch(esiIncludeSource, {edgio: {origin: 'edgio_self'}})
    // Retrieve its body as ReadableStream
    .then(async (response) => {
      const reader = response.body.getReader();
      return new ReadableStream({
        start(controller) {
          const htmlTransformer = new HtmlTransformer(
            transformerDefinitions,
            (chunk) => {
              controller.enqueue(chunk);
            }
          );
          return pump();
          function pump() {
            return reader.read().then(async ({done, value}) => {
              // Send the HTML to the HtmlTransformer.
              if (value) {
                // {stream: true} keeps multi-byte characters that span chunks intact
                await htmlTransformer.write(
                  textDecoder.decode(value, {stream: true})
                );
              }
              // When no more data needs to be consumed, close the stream
              if (done) {
                await htmlTransformer.end();
                controller.close();
                return;
              }
              return pump();
            });
          }
        },
      });
    })
    // Create a new response out of the stream
    .then((stream) => new Response(stream));

  return response;
}

HtmlTransformer Class

static async stream(transformerDefinitions, response)

Static helper function to easily stream fetch() responses through the HtmlTransformer. This is the recommended method to use when transforming HTML responses.
JavaScript
// Transforms response from origin
return fetch(request.url, {edgio: {origin: 'api_backend'}}).then((response) =>
  HtmlTransformer.stream(transformerDefinitions, response)
);
JavaScript
// Transforms response from origin and optionally manipulates the response headers
return fetch(request.url, {edgio: {origin: 'api_backend'}}).then(async (response) => {
  // stream() is async, so await the transformed response before touching its headers
  const transformedResponse = await HtmlTransformer.stream(
    transformerDefinitions,
    response
  );
  // Make any changes to the response headers here.
  transformedResponse.headers.set('x-html-transformer-ran', 'true');
  return transformedResponse;
});

new HtmlTransformer(transformerDefinitions, callback)

Creates a new HtmlTransformer instance. The transformerDefinitions is an array of transformer definitions. The callback is the function (chunk) => { ... } that receives the transformed HTML data chunks.
This usage is not recommended for large responses as it may exceed memory limitations. Use the HtmlTransformer.stream() method to stream the response body during transformation.
JavaScript
const htmlTransformer = new HtmlTransformer(transformerDefinitions, callback);
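The constructor, write(), and end() can be combined as in the following minimal sketch, which collects every transformed chunk in an array. The helper name collectTransformedChunks is hypothetical, and the transformer class is passed in as a parameter purely so the helper can be exercised with a stand-in outside the edge function runtime; the type of each chunk is whatever the callback receives and is not assumed here.

```javascript
// Hypothetical helper: run a complete HTML string through a transformer and
// return the chunks handed to the callback, in order.
async function collectTransformedChunks(Transformer, definitions, html) {
  const chunks = [];
  const transformer = new Transformer(definitions, (chunk) => {
    chunks.push(chunk);
  });
  await transformer.write(html);
  // end() flushes any remaining output, so it must come after the last write()
  await transformer.end();
  return chunks;
}
```

Inside an edge function this would be called as collectTransformedChunks(HtmlTransformer, transformerDefinitions, html). Because every chunk is held in memory, the sketch suits small documents only.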

async write(string)

Writes the string to the transformer stream. This function can be called multiple times.
JavaScript
await htmlTransformer.write('<html><body><h1>Hello World</h1>');
await htmlTransformer.write('<a href="https://edg.io/">Edgio Homepage</a>');
await htmlTransformer.write('</body></html>');
await htmlTransformer.end();

async write(Promise<Response>)

Pass the Response’s Promise to the transformer stream. This writes the entire response to the transformer as a stream.
JavaScript
const responsePromise = fetch(request.url, {edgio: {origin: 'api_backend'}});
await htmlTransformer.write(responsePromise);
await htmlTransformer.end();

async write(Response)

Pass the Response Object to the transformer stream. This writes the entire response to the transformer as a stream.
JavaScript
const response = await fetch(request.url, {edgio: {origin: 'api_backend'}});
if (response.status === 200) {
  await htmlTransformer.write(response);
  await htmlTransformer.end();
}

async write(readableStream)

Pass the ReadableStream to the transformer stream. This writes the entire response to the transformer as a stream.
JavaScript
const response = await fetch(request.url, {edgio: {origin: 'api_backend'}});
if (response.status === 200) {
  // The content-type header may carry parameters, e.g. "text/html; charset=utf-8"
  const contentType = response.headers.get('content-type') || '';
  if (contentType.startsWith('text/html')) {
    const readableStream = response.body;
    await htmlTransformer.write(readableStream);
    await htmlTransformer.end();
  }
}

async end()

Flushes the transformer and completes the transformation. This function must be called after the last call to await htmlTransformer.write().
JavaScript
await htmlTransformer.end();

Definitions

The HtmlTransformer definitions are an array of objects that define the transformations performed on the HTML stream. A definition object can contain one selector and one asynchronous callback:
  • selector: A string defining the HTML selector to match. (See Selectors)
  • comment: The async (Comment) => { } function called when an HTML comment is found matching the selector.
  • element: The async (Element) => { } function called when an HTML element is found matching the selector.
  • text: The async (Text) => { } function called when text is found matching the selector.
Alternatively, a definition object can contain one callback that is not associated with a selector:
  • doc_comment: The async (Comment) => { } function called when an HTML comment is found in the document.
  • doc_text: The async (Text) => { } function called when text is found in the document.
  • doc_type: The async (Doctype) => { } function called when the HTML document type is found in the document.
  • doc_end: The async (DocEnd) => { } function called when the end of the HTML document is reached.
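The shape described above can be checked before the definitions are handed to the transformer. The following is a minimal sketch, not part of the HtmlTransformer API: checkDefinitions is a hypothetical helper, and its "exactly one callback per definition" rule is this sketch's reading of the description above; HtmlTransformer itself may be more lenient.

```javascript
const SELECTOR_CALLBACKS = ['comment', 'element', 'text'];
const DOCUMENT_CALLBACKS = ['doc_comment', 'doc_text', 'doc_type', 'doc_end'];

// Hypothetical helper: returns a list of problems found. An empty list means
// every definition matches the shape described above.
function checkDefinitions(definitions) {
  const problems = [];
  definitions.forEach((def, i) => {
    const selectorCbs = SELECTOR_CALLBACKS.filter((k) => typeof def[k] === 'function');
    const documentCbs = DOCUMENT_CALLBACKS.filter((k) => typeof def[k] === 'function');
    if (selectorCbs.length + documentCbs.length !== 1) {
      problems.push(`definition ${i}: expected exactly one callback`);
    } else if (selectorCbs.length === 1 && typeof def.selector !== 'string') {
      problems.push(`definition ${i}: ${selectorCbs[0]} requires a selector string`);
    } else if (documentCbs.length === 1 && 'selector' in def) {
      problems.push(`definition ${i}: ${documentCbs[0]} does not take a selector`);
    }
  });
  return problems;
}
```

Running a check like this during development can surface a misspelled callback key, which would otherwise simply never fire.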

Selectors

The HtmlTransformer supports the following selector types (ref: lol_html::Selector):
Pattern | Represents
* | any element
E | any element of type E
E:nth-child(n) | an E element, the n-th child of its parent
E:first-child | an E element, first child of its parent
E:nth-of-type(n) | an E element, the n-th sibling of its type
E:first-of-type | an E element, first sibling of its type
E:not(s) | an E element that does not match the compound selector s
E.warning | an E element belonging to the class warning
E#myid | an E element with ID equal to myid
E[foo] | an E element with a foo attribute
E[foo="bar"] | an E element whose foo attribute value is exactly equal to bar
E[foo="bar" i] | an E element whose foo attribute value is exactly equal to any (ASCII-range) case-permutation of bar
E[foo="bar" s] | an E element whose foo attribute value is exactly and case-sensitively equal to bar
E[foo~="bar"] | an E element whose foo attribute value is a list of whitespace-separated values, one of which is exactly equal to bar
E[foo^="bar"] | an E element whose foo attribute value begins exactly with the string bar
E[foo$="bar"] | an E element whose foo attribute value ends exactly with the string bar
E[foo*="bar"] | an E element whose foo attribute value contains the substring bar
E[foo|="en"] | an E element whose foo attribute value is a hyphen-separated list of values beginning with en
E F | an F element descendant of an E element
E > F | an F element child of an E element
In a JavaScript string literal, use a double backslash to escape special characters within a selector. For example, use the selector 'esi\\:include[src]' to match <esi:include src="...">.
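The double backslash exists only in the JavaScript source: '\\' in a string literal is a single backslash character, so the selector the transformer receives contains one backslash before the colon. A small sketch, using a hypothetical escapeTagName helper, makes this visible:

```javascript
// Hypothetical helper: escape ':' in a namespaced tag name so it can be used
// as the element part of a selector.
function escapeTagName(tagName) {
  return tagName.replace(/:/g, '\\:');
}

const selector = `${escapeTagName('esi:include')}[src]`;
console.log(selector); // prints esi\:include[src]
```

The resulting string is identical to the literal 'esi\\:include[src]' used in the examples.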

Classes

Comment Class

The Comment class is passed to the callback function for comment: and doc_comment: definitions. (ref: lol_html::html_content::Comment) The Comment class has the following methods:
Method | Description
text(): string | Returns the comment text
set_text(text: string) | Sets the comment text
before(text: string, content_type: string) | Inserts the text before the comment. Content type is 'html' or 'text'
after(text: string, content_type: string) | Inserts the text after the comment. Content type is 'html' or 'text'
replace(text: string, content_type: string) | Replaces the comment with the text. Content type is 'html' or 'text'
remove() | Removes the entire comment
removed(): boolean | Returns true if the comment has been removed
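A short sketch of a Comment callback, using only text() and remove() from the table above. The rule itself (keep conditional comments, remove everything else) is an illustrative assumption, as is the expectation that text() returns the comment body without the <!-- and --> delimiters.

```javascript
// Hypothetical rule: keep conditional comments such as
// <!--[if IE]> ... <![endif]--> and remove every other comment.
const stripCommentsDefinition = {
  doc_comment: async (c) => {
    if (!c.text().trim().startsWith('[if')) {
      c.remove();
    }
  },
};
```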

Element Class

The Element class is passed to the callback function for element: definitions. (ref: lol_html::html_content::Element) The Element class has the following methods:
Method | Description
tag_name(): string | Returns the tag name of the element
tag_name_preserve_case(): string | Returns the tag name of the element, preserving the case of the original tag name
set_tag_name(name: string) | Sets the tag name of the element. Returns an error if the tag name is invalid.
is_self_closing(): boolean | Returns true if the element is self closing. E.g. <foo />
can_have_content(): boolean | Returns true if the element can have content
namespace_uri(): string | Returns the namespace URI of the element
attributes(): [Attributes] | Returns an array of Attribute objects
get_attribute(name: string): string | Returns the value of the attribute with the specified name
has_attribute(name: string): boolean | Returns true if the element has an attribute with the specified name
set_attribute(name: string, value) | Sets the value of the attribute with the specified name. Returns an error if the attribute name is invalid.
remove_attribute(name: string) | Removes the attribute with the specified name
before(text: string, content_type: string) | Inserts the text before the element. Content type is 'html' or 'text'
after(text: string, content_type: string) | Inserts the text after the element. Content type is 'html' or 'text'
prepend(text: string, content_type: string) | Prepends the text to the element. Content type is 'html' or 'text'
append(text: string, content_type: string) | Appends the text to the element. Content type is 'html' or 'text'
set_inner_content(text: string, content_type: string) | Sets the inner content of the element. Content type is 'html' or 'text'
replace(text: string, content_type: string) | Replaces the element with the text. Content type is 'html' or 'text'
remove() | Removes the entire element
remove_and_keep_content() | Removes the element and keeps its content
removed(): boolean | Returns true if the element has been removed
start_tag(): StartTag | Returns the StartTag object for the element
end_tag_handlers() | Not implemented
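As an illustration of the attribute methods above, the following sketch adds noopener to the rel attribute of links that open in a new tab. The rule and the definition name are assumptions made for this example; only has_attribute(), get_attribute(), and set_attribute() are taken from the table.

```javascript
// Hypothetical hardening rule for links that open a new tab.
const noopenerDefinition = {
  selector: 'a[target="_blank"]',
  element: async (el) => {
    // Guard with has_attribute() so a missing rel is treated as empty
    const rel = el.has_attribute('rel') ? el.get_attribute('rel') : '';
    const tokens = rel.split(/\s+/).filter(Boolean);
    if (!tokens.includes('noopener')) {
      tokens.push('noopener');
      el.set_attribute('rel', tokens.join(' '));
    }
  },
};
```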

Text Class

The Text class is passed to the callback function for text: and doc_text: definitions. (ref: lol_html::html_content::TextChunk) The Text class has the following methods:
Method | Description
as_str(): string | Returns the text
set_str(text: string) | Sets the text
text_type(): string | Returns the text type.
last_in_text_node(): boolean | Returns true if the chunk is last in an HTML text node.
before(text: string, content_type: string) | Inserts the text before the text chunk. Content type is 'html' or 'text'
after(text: string, content_type: string) | Inserts the text after the text chunk. Content type is 'html' or 'text'
replace(text: string, content_type: string) | Replaces the text chunk with the text. Content type is 'html' or 'text'
remove() | Removes the entire text chunk
removed(): boolean | Returns true if the text chunk has been removed
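Because a Text object represents a chunk, a single text node may reach the callback in several pieces, and a search-and-replace on one chunk can miss a match that straddles two of them. One way to handle this, sketched below under the assumption that chunks of a node arrive in order and the final one reports last_in_text_node() as true, is to buffer the chunks and rewrite the node once it is complete. The helper name wholeTextDefinition is hypothetical.

```javascript
// Hypothetical helper: build a text definition that rewrites whole text
// nodes, even when a node arrives split across several chunks.
function wholeTextDefinition(selector, rewrite) {
  let buffer = '';
  return {
    selector,
    text: async (t) => {
      buffer += t.as_str();
      if (t.last_in_text_node()) {
        // Emit the rewritten node in place of the final chunk
        t.replace(rewrite(buffer), 'text');
        buffer = '';
      } else {
        // Earlier chunks are dropped; their text lives on in the buffer
        t.remove();
      }
    },
  };
}
```

For example, wholeTextDefinition('p', (s) => s.replace('colour', 'color')) still matches when 'col' and 'our' arrive in different chunks.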

Doctype Class

The Doctype class is passed to the callback function for doc_type: definitions. (ref: lol_html::html_content::Doctype) The Doctype class has the following methods:
Method | Description
name(): string | Returns the name of the document type
public_id(): string | Returns the public ID of the document type
system_id(): string | Returns the system ID of the document type
remove() | Removes the entire document type
removed(): boolean | Returns true if the document type has been removed

DocEnd Class

The DocEnd class is passed to the callback function for doc_end: definitions. (ref: lol_html::html_content::DocumentEnd) The DocEnd class has the following methods:
Method | Description
append(text: string, content_type: string) | Appends the text to the end of the document. Content type is 'html' or 'text'

Attribute Class

The Attribute class is returned by the attributes() method of the Element class. (ref: lol_html::html_content::Attribute) The Attribute class has the following methods:
Method | Description
name(): string | Returns the name of the attribute
name_preserve_case(): string | Returns the name of the attribute, preserving its case.
value(): string | Returns the value of the attribute
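The sketch below shows one way to iterate over Attribute objects: it strips inline event handler attributes such as onclick from every element. The rule is an illustrative assumption, and it relies on name() returning a lower-cased name, which the existence of name_preserve_case() suggests but this page does not state outright.

```javascript
// Hypothetical rule: remove inline event handlers (onclick, onload, ...).
const stripInlineHandlersDefinition = {
  selector: '*',
  element: async (el) => {
    // Collect the names first so removal does not disturb the iteration
    const names = el.attributes().map((attr) => attr.name());
    for (const name of names) {
      if (name.startsWith('on')) {
        el.remove_attribute(name);
      }
    }
  },
};
```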

StartTag Class

The StartTag class is returned by the start_tag() method of the Element class. (ref: lol_html::html_content::StartTag) The StartTag class has the following methods:
Method | Description
name(): string | Returns the name of the tag
name_preserve_case(): string | Returns the name of the tag, preserving its case.
namespace_uri(): string | Returns the namespace URI of the tag
attributes(): [Attributes] | Returns an array of Attribute objects
set_attribute(name: string, value) | Sets the value of the attribute with the specified name. Returns an error if the attribute name is invalid.
remove_attribute(name: string) | Removes the attribute with the specified name
self_closing(): boolean | Returns true if the tag is self closing. E.g. <foo />
before(text: string, content_type: string) | Inserts the text before the tag. Content type is 'html' or 'text'
after(text: string, content_type: string) | Inserts the text after the tag. Content type is 'html' or 'text'
replace(text: string, content_type: string) | Replaces the tag with the text. Content type is 'html' or 'text'
remove() | Removes the entire tag

Types

Text Type

(ref: lol_html::html_content::TextType)
Valid values are:
  • 'PlainText' - Text inside a <plaintext> element.
  • 'RCData' - Text inside <title> and <textarea> elements.
  • 'RawText' - Text inside <style>, <xmp>, <iframe>, <noembed>, <noframes>, and <noscript> elements.
  • 'ScriptData' - Text inside a <script> element.
  • 'Data' - Regular text.
  • 'CDataSection' - Text inside a CDATA section.

Content Types

The HtmlTransformer supports the following content types: (ref: lol_html::html_content::ContentType)
  • 'html' - HTML content. The transformer will not escape HTML entities.
  • 'text' - Plain text content. The transformer will escape HTML entities. E.g. < will be converted to &lt;.
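The choice of content type matters most when the inserted string is not under your control. In the sketch below, a message of unknown origin is inserted as 'text' so the transformer escapes it, while hard-coded markup is inserted as 'html'. The '#notice' selector and the noticeDefinition helper are hypothetical.

```javascript
// Hypothetical helper: `message` may come from an untrusted source.
function noticeDefinition(message) {
  return {
    selector: '#notice',
    element: async (el) => {
      // Untrusted string: 'text' asks the transformer to escape HTML entities
      el.set_inner_content(message, 'text');
      // Trusted, hard-coded markup: 'html' inserts it unescaped
      el.after('<hr>', 'html');
    },
  };
}
```

Passing an untrusted string as 'html' would let any markup it contains, including script elements, through to the page.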