
HtmlTransformer

The HtmlTransformer class is a compact, efficient edge function helper for modifying the HTML response from the origin server. It wraps the HtmlRewriter struct of the lol_html Rust crate (lol_html::HtmlRewriter) and supports streaming HTML response bodies, so a response can be transformed as it arrives.
To use the HtmlTransformer, create an instance with new HtmlTransformer(definitions, callback), passing in the rewriter definitions and the callback function that receives the streaming output. Pass HTML response data to the transformer as it arrives with await htmlTransformer.write(data); write() can be called multiple times. Once the HTML response is complete, call await htmlTransformer.end() to complete the transformation and flush any remaining streaming data.
There are two types of rewriter callback definitions that can be passed into the HtmlTransformer:
  • Transformations that match an HTML selector and trigger a callback function when the selector is found.
    • comment: Operates on any HTML comment matching the specified selector.
    • element: Operates on the HTML element matching the specified selector and the element’s attributes.
    • text: Operates on any text matching the specified selector.
  • Transformations that operate on the document as a whole and trigger a callback function when the corresponding part of the document is encountered. This type of transformation does not require an HTML selector.
    • doc_comment: Operates on any HTML comment in the document.
    • doc_text: Operates on any text in the document.
    • doc_type: Provides read-only information on the HTML document type.
    • doc_end: Triggered when the end of the HTML document is reached.
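
The two definition types can be combined in a single definitions array. The following is a minimal sketch rather than a recommended transformation: the handler names upperCaseTitle and dropComment are hypothetical, and the handlers only call methods listed in the Classes section of this page.

```javascript
// Handlers are written as named functions so they can be exercised with
// stand-in objects outside the edge function runtime.

// Selector-based: called for each text chunk inside <title>.
async function upperCaseTitle(text) {
  text.replace(text.as_str().toUpperCase(), 'text');
}

// Document-level: called for every comment in the document (no selector).
async function dropComment(comment) {
  comment.remove();
}

const transformerDefinitions = [
  { selector: 'title', text: upperCaseTitle },
  { doc_comment: dropComment },
];
```

The first object pairs a selector with a text callback; the second has no selector because doc_comment applies to the whole document.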

Quick Start Examples

Example 1: Basic Usage

Here is an edge function that uses HtmlTransformer to:
  • Ensure all <a href="..."> links are HTTPS.
  • Remove all HTML comments.
  • Append a comment for the transformation timestamp to the end of the document.
JavaScript
export async function handleHttpRequest(request, context) {
  const transformerDefinitions = [
    // This first definition replaces http with https in the <a href=...> elements
    {
      // This selector will match all <a> elements which have an href attribute
      selector: 'a[href]',
      element: async (el) => {
        const href = el.get_attribute('href');
        el.set_attribute('href', href.replace('http://', 'https://'));
      },
    },

    // This second definition removes all comments from the <body>
    {
      selector: 'body',
      comment: async (c) => {
        // Remove the comment
        c.remove();
      },
    },

    // This third definition appends a timestamp to the end of the HTML document
    {
      // Since this operates on the document, there is no need to specify a selector
      doc_end: async (d) => {
        // Use the append() method to append the timestamp to the end of the document
        // Specify 'html' as the second argument to indicate the content is HTML, not plain text
        d.append(
          `<!-- Transformed at ${new Date().toISOString()} by Edg.io -->`,
          'html'
        );
      },
    },
  ];

  // Define the callback function to accept the transformed HTML stream
  let responseBody = '';
  const streamingCallback = (chunk) => {
    responseBody += chunk;
  };

  // Create a new HTML transformer, passing in the definitions and streaming callback
  let htmlTransformer = new HtmlTransformer(
    transformerDefinitions,
    streamingCallback
  );

  // Get the HTML from the origin server
  const response = await fetch(request.url, {edgio: {origin: 'api_backend'}});

  // Pass the HTML response to the transformer.
  await htmlTransformer.write(response);

  // Flush the HTML transformer
  await htmlTransformer.end();

  // Return the transformed HTML body
  return new Response(responseBody);
}
We will now examine how this edge function will transform the following HTML response provided by an origin server:
HTML
<!DOCTYPE html>
<html>
<head><title>Script Example</title></head>
<body>
  <h1>Script Example</h1>
  <p>Script example.
  <!-- This is a <p> comment -->
  </p>
  <a href="http://edg.io/">Edgio Homepage</a>
  <!-- This is a <body> comment -->
</body>
</html>
This edge function transforms the above HTML to ensure HTTPS links, remove HTML comments, and append a transformation timestamp. The transformed HTML is shown below.
HTML
<!DOCTYPE html>
<html>
  <head>
    <title>Script Example</title>
  </head>
  <body>
    <h1>Script Example</h1>
    <p>Script example.</p>
    <a href="https://edg.io/">Edgio Homepage</a>
  </body>
</html>
<!-- Transformed at 2023-11-29T00:52:09.942Z by Edg.io -->
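
Note that href.replace('http://', 'https://') replaces the first occurrence of http:// anywhere in the attribute value, so an href such as https://example.com/?next=http://other.example would also be changed. One possible refinement, sketched below, rewrites only a leading scheme. The helper name toHttps is hypothetical and is not part of the HtmlTransformer API.

```javascript
// Hypothetical helper: upgrade only a leading "http://" scheme (case-insensitive).
function toHttps(href) {
  return href.replace(/^http:\/\//i, 'https://');
}

const transformerDefinitions = [
  {
    selector: 'a[href]',
    element: async (el) => {
      el.set_attribute('href', toHttps(el.get_attribute('href')));
    },
  },
];

console.log(toHttps('http://edg.io/')); // https://edg.io/
console.log(toHttps('/docs?next=http://edg.io/')); // /docs?next=http://edg.io/
```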

Example 2: Using fetch()

This sample edge function uses HtmlTransformer to replace all <esi:include src="..."/> elements with the content of the specified URL.
JavaScript
export async function handleHttpRequest(request, context) {
  // This definition replaces <esi:include src="..." /> with the response from the src
  const transformerDefinitions = [
    {
      // This selector will match all <esi:include /> elements which have a src attribute.
      // We escape the : character with 2 backslashes to indicate it is part of the tag name
      // and not an attribute of the selector.
      selector: 'esi\\:include[src]',
      element: async (el) => {
        const url = el.get_attribute('src');
        const response = await fetch(url, {edgio: {origin: 'api_backend'}});
        if (response.status === 200) {
          const body = await response.text();
          el.replace(body, 'html');
        } else {
          el.replace(
            '<a>We encountered an error, please try again later.</a>',
            'html'
          );
        }
      },
    },
  ];

  // Define the callback function to accept the transformed HTML stream
  let responseBody = '';
  const streamingCallback = (chunk) => {
    responseBody += chunk;
  };

  // Create a new HTML transformer, passing in the definitions and streaming callback
  let htmlTransformer = new HtmlTransformer(
    transformerDefinitions,
    streamingCallback
  );

  // Get the HTML from the origin server
  const response = await fetch(request.url, {edgio: {origin: 'api_backend'}});

  // Pass the HTML response to the transformer.
  await htmlTransformer.write(response);

  // Flush the HTML transformer
  await htmlTransformer.end();

  // Return the transformed HTML body
  return new Response(responseBody);
}
We will now examine how this edge function will transform the following HTML response provided by an origin server:
HTML
<!DOCTYPE html>
<html>
  <head>
    <title>Script Example</title>
  </head>
  <body>
    <h1>Script Example</h1>
    <esi:include src="https://api.backend.com/body" />
  </body>
</html>
This edge function transforms the above HTML to replace <esi:include ... /> with the results of the fetch. The transformed HTML is shown below.
HTML
<!DOCTYPE html>
<html>
  <head>
    <title>Script Example</title>
  </head>
  <body>
    <h1>Script Example</h1>
    <div>
      <p>Here is some HTML text with a link returned by the fetch().</p>
      <a href="https://edg.io/">Edgio Homepage</a>
    </div>
  </body>
</html>
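
The element callback in this example handles a non-200 status but assumes the fetch() call itself does not reject. As a sketch, the same logic can be moved into a function that also catches a rejected fetch and falls back to the same error markup. The names resolveInclude, FALLBACK_HTML, and the fetchFn parameter are hypothetical; fetchFn is injected only so the logic can be exercised outside the edge function runtime, and it defaults to the global fetch.

```javascript
const FALLBACK_HTML = '<a>We encountered an error, please try again later.</a>';

// Hypothetical helper: replace an <esi:include src="..." /> element with the
// fetched body, or with fallback markup on a non-200 status or a failed fetch.
async function resolveInclude(el, fetchFn = fetch) {
  const url = el.get_attribute('src');
  try {
    const response = await fetchFn(url, { edgio: { origin: 'api_backend' } });
    if (response.status === 200) {
      el.replace(await response.text(), 'html');
      return;
    }
  } catch (err) {
    // Network-level failure: fall through to the fallback below.
  }
  el.replace(FALLBACK_HTML, 'html');
}

const transformerDefinitions = [
  { selector: 'esi\\:include[src]', element: (el) => resolveInclude(el) },
];
```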

Example 3: Using fetch() with HtmlTransformer.stream() helper function

This example is a modified version of Example 2 which incorporates streaming the response body. This technique saves memory by streaming the response body during transformation rather than reading the whole body into memory before transforming it.
JavaScript
export async function handleHttpRequest(request, context) {
  // This definition replaces <esi:include src="..." /> with the response from the src
  const transformerDefinitions = [
    {
      // This selector will match all <esi:include /> elements which have a src attribute.
      // We escape the : character with 2 backslashes to indicate it is part of the tag name
      // and not an attribute of the selector.
      selector: 'esi\\:include[src]',
      element: async (el) => {
        const url = el.get_attribute('src');
        const response = await fetch(url, {edgio: {origin: 'api_backend'}});
        if (response.status === 200) {
          const body = await response.text();
          el.replace(body, 'html');
        } else {
          el.replace(
            '<a>We encountered an error, please try again later.</a>',
            'html'
          );
        }
      },
    },
  ];

  // Get the HTML from the origin server and stream the response body through the
  // HtmlTransformer to the Response object
  const response = fetch(request.url, {edgio: {origin: 'api_backend'}})
    .then(HtmlTransformer.stream(transformerDefinitions))
    .then((stream) => new Response(stream));
  return response;
}

Example 4: Using fetch() with response streaming

This example is a modified version of Example 3 that does not use the HtmlTransformer.stream() helper function.
JavaScript
export async function handleHttpRequest(request, context) {
  // This definition replaces <esi:include src="..." /> with the response from the src
  const transformerDefinitions = [
    {
      // This selector will match all <esi:include /> elements which have a src attribute.
      // We escape the : character with 2 backslashes to indicate it is part of the tag name
      // and not an attribute of the selector.
      selector: 'esi\\:include[src]',
      element: async (el) => {
        const url = el.get_attribute('src');
        const response = await fetch(url, {edgio: {origin: 'api_backend'}});
        if (response.status === 200) {
          const body = await response.text();
          el.replace(body, 'html');
        } else {
          el.replace(
            '<a>We encountered an error, please try again later.</a>',
            'html'
          );
        }
      },
    },
  ];
  const textDecoder = new TextDecoder();

  // Get the HTML from the origin server and stream the response body through the
  // HtmlTransformer to the Response object
  const response = fetch(request.url, {edgio: {origin: 'api_backend'}})
    // Retrieve its body as ReadableStream
    .then(async (response) => {
      const reader = response.body.getReader();
      return new ReadableStream({
        start(controller) {
          const htmlTransformer = new HtmlTransformer(transformerDefinitions, (chunk) => {
            controller.enqueue(chunk);
          });
          return pump();
          function pump() {
            return reader.read().then(async ({done, value}) => {
              // Send the HTML to the HtmlTransformer. {stream: true} keeps
              // multi-byte characters that span two chunks intact.
              if (value) {
                await htmlTransformer.write(textDecoder.decode(value, {stream: true}));
              }
              // When no more data needs to be consumed, close the stream
              if (done) {
                await htmlTransformer.end();
                controller.close();
                return;
              }
              return pump();
            });
          }
        },
      });
    })
    // Create a new response out of the stream
    .then((stream) => new Response(stream));

  return response;
}
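
When chunks are decoded by hand as in this example, a chunk boundary can fall in the middle of a multi-byte UTF-8 character. The stream option of TextDecoder.decode() buffers the incomplete bytes until the next call. The following standalone sketch is not Edgio-specific; it shows the difference using the two bytes of "é" split across two chunks.

```javascript
// "é" is encoded in UTF-8 as the two bytes 0xC3 0xA9.
const firstChunk = new Uint8Array([0x63, 0x61, 0x66, 0xc3]); // "caf" + first byte of "é"
const secondChunk = new Uint8Array([0xa9]); // second byte of "é"

// Without the stream option, each call decodes in isolation and the split
// character turns into U+FFFD replacement characters.
const naive =
  new TextDecoder().decode(firstChunk) + new TextDecoder().decode(secondChunk);

// With {stream: true}, the decoder carries the incomplete byte over to the next call.
const decoder = new TextDecoder();
const streamed =
  decoder.decode(firstChunk, { stream: true }) +
  decoder.decode(secondChunk, { stream: true }) +
  decoder.decode(); // final call without arguments flushes the decoder

console.log(streamed); // café
console.log(naive === streamed); // false
```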

HtmlTransformer Class

new

JavaScript
const htmlTransformer = new HtmlTransformer(transformerDefinitions, callback);
Creates a new HtmlTransformer instance. transformerDefinitions is an array of transformer definitions, and callback is the function (chunk) => { ... } that receives the transformed HTML data chunks.
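
As a rough sketch of the full lifecycle, the helper below creates a transformer, writes each chunk, calls end(), and returns the collected output. The helper name transformChunks is hypothetical. It takes the transformer class as a parameter, defaulting to the HtmlTransformer global (assumed to exist only inside the edge function runtime), and it assumes the callback receives string chunks, as in Example 1.

```javascript
// Hypothetical helper: run a list of HTML string chunks through a transformer
// and return the concatenated output.
async function transformChunks(
  chunks,
  transformerDefinitions,
  Transformer = globalThis.HtmlTransformer
) {
  let output = '';
  const transformer = new Transformer(transformerDefinitions, (chunk) => {
    output += chunk; // assumes string chunks
  });
  for (const chunk of chunks) {
    await transformer.write(chunk); // write() may be called any number of times
  }
  await transformer.end(); // required: completes the transformation and flushes output
  return output;
}
```

Inside an edge function this could be called as await transformChunks(['<html>', '</html>'], transformerDefinitions).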

async write(string)

JavaScript
await htmlTransformer.write('<html><body><h1>Hello World</h1>');
await htmlTransformer.write('<a href="https://edg.io/">Edgio Homepage</a>');
await htmlTransformer.write('</body></html>');
await htmlTransformer.end();
Writes the string to the transformer stream. This function can be called multiple times.

async write(Promise<Response>)

JavaScript
const responsePromise = fetch(request.url, {edgio: {origin: 'api_backend'}});
await htmlTransformer.write(responsePromise);
await htmlTransformer.end();
Passes the Promise for a Response (as returned by fetch()) to the transformer stream. This writes the entire response to the transformer as a stream.

async write(Response)

JavaScript
const response = await fetch(request.url, {edgio: {origin: 'api_backend'}});
if (response.status === 200) {
  await htmlTransformer.write(response);
  await htmlTransformer.end();
}
Passes the Response object to the transformer stream. This writes the entire response to the transformer as a stream.

async write(readableStream)

JavaScript
const response = await fetch(request.url, {edgio: {origin: 'api_backend'}});
if (response.status === 200) {
  // The header may carry parameters, e.g. "text/html; charset=utf-8"
  const contentType = response.headers.get('content-type') || '';
  if (contentType.startsWith('text/html')) {
    const readableStream = response.body;
    await htmlTransformer.write(readableStream);
    await htmlTransformer.end();
  }
}
Passes the ReadableStream to the transformer stream. This writes the entire response to the transformer as a stream.

async end()

JavaScript
await htmlTransformer.end();
Flushes the transformer and completes the transformation. This function must be called after the last call to await htmlTransformer.write().

static async stream()

JavaScript
const response = fetch(request.url, {edgio: {origin: 'api_backend'}})
  .then(HtmlTransformer.stream(transformerDefinitions))
  .then((stream) => new Response(stream));
return response;
Static helper function to easily stream fetch() responses through the HtmlTransformer.

Definitions

The HtmlTransformer definitions are an array of objects that define the transformations performed on the HTML stream. A selector-based definition object contains one selector and one asynchronous callback:
  • selector: A string defining the HTML selector to match. (See Selectors)
  • comment: The async (Comment) => { } function called when an HTML comment is found matching the selector.
  • element: The async (Element) => { } function called when an HTML element is found matching the selector.
  • text: The async (Text) => { } function called when text is found matching the selector.
A document-level definition object contains one callback that is not associated with a selector:
  • doc_comment: The async (Comment) => { } function called when an HTML comment is found in the document.
  • doc_text: The async (Text) => { } function called when text is found in the document.
  • doc_type: The async (Doctype) => { } function called when the HTML document type is found in the document.
  • doc_end: The async (DocEnd) => { } function called when the end of the HTML document is reached.

Selectors

The HtmlTransformer supports the following selector types (ref: lol_html::Selector):
Pattern           Represents
*                 any element
E                 any element of type E
E:nth-child(n)    an E element, the n-th child of its parent
E:first-child     an E element, first child of its parent
E:nth-of-type(n)  an E element, the n-th sibling of its type
E:first-of-type   an E element, first sibling of its type
E:not(s)          an E element that does not match the compound selector s
E.warning         an E element belonging to the class warning
E#myid            an E element with ID equal to myid
E[foo]            an E element with a foo attribute
E[foo="bar"]      an E element whose foo attribute value is exactly equal to bar
E[foo="bar" i]    an E element whose foo attribute value is exactly equal to any (ASCII-range) case-permutation of bar
E[foo="bar" s]    an E element whose foo attribute value is exactly and case-sensitively equal to bar
E[foo~="bar"]     an E element whose foo attribute value is a list of whitespace-separated values, one of which is exactly equal to bar
E[foo^="bar"]     an E element whose foo attribute value begins exactly with the string bar
E[foo$="bar"]     an E element whose foo attribute value ends exactly with the string bar
E[foo*="bar"]     an E element whose foo attribute value contains the substring bar
E[foo|="en"]      an E element whose foo attribute value is a hyphen-separated list of values beginning with en
E F               an F element descendant of an E element
E > F             an F element child of an E element
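Use a double backslash to escape special characters within an E selector. In a JavaScript string literal the double backslash produces the single backslash that the selector parser expects. For example, use the selector 'esi\\:include[src]' to match <esi:include src="...">.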
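
The following standalone snippet illustrates the escaping at the JavaScript level only; it does not call the HtmlTransformer. The variable names are illustrative.

```javascript
// In a regular string literal, "\\" is a single backslash character at runtime.
const selector = 'esi\\:include[src]';
console.log(selector); // esi\:include[src]
console.log(selector.length); // 17

// String.raw avoids the doubled backslash in source code.
const sameSelector = String.raw`esi\:include[src]`;
console.log(selector === sameSelector); // true

// A single backslash before ":" is silently dropped by the string literal,
// so the selector loses its escape.
const broken = 'esi\:include[src]';
console.log(broken); // esi:include[src]
```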

Classes

Comment Class

The Comment class is passed to the callback function for comment: and doc_comment: definitions. (ref: lol_html::html_content::Comment) The Comment class has the following methods:
Method                                       Description
text(): string                               Returns the comment text
set_text(text: string)                       Sets the comment text
before(text: string, content_type: string)   Inserts the text before the comment. Content type is 'html' or 'text'
after(text: string, content_type: string)    Inserts the text after the comment. Content type is 'html' or 'text'
replace(text: string, content_type: string)  Replaces the comment with the text. Content type is 'html' or 'text'
remove()                                     Removes the entire comment
removed(): boolean                           Returns true if the comment has been removed
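
As an illustration of the Comment methods, the sketch below removes ordinary comments but keeps Internet Explorer conditional comments. The handler name stripOrdinaryComment is hypothetical, and the sketch assumes text() returns the comment body without the <!-- and --> delimiters.

```javascript
// Hypothetical handler: remove ordinary comments but keep conditional
// comments such as <!--[if lt IE 9]> ... <![endif]-->.
async function stripOrdinaryComment(comment) {
  const isConditional = comment.text().trim().startsWith('[if');
  if (!isConditional) {
    comment.remove();
  }
}

const transformerDefinitions = [{ doc_comment: stripOrdinaryComment }];
```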

Element Class

The Element class is passed to the callback function for element: definitions. (ref: lol_html::html_content::Element) The Element class has the following methods:
Method                                                 Description
tag_name(): string                                     Returns the tag name of the element
tag_name_preserve_case(): string                       Returns the tag name of the element, preserving the case of the original tag name
set_tag_name(name: string)                             Sets the tag name of the element. Returns an error if the tag name is invalid.
is_self_closing(): boolean                             Returns true if the element is self closing. E.g. <foo />
can_have_content(): boolean                            Returns true if the element can have content
namespace_uri(): string                                Returns the namespace URI of the element
attributes(): [Attributes]                             Returns an array of Attribute objects
get_attribute(name: string): string                    Returns the value of the attribute with the specified name
has_attribute(name: string): boolean                   Returns true if the element has an attribute with the specified name
set_attribute(name: string, value)                     Sets the value of the attribute with the specified name. Returns an error if the attribute name is invalid.
remove_attribute(name: string)                         Removes the attribute with the specified name
before(text: string, content_type: string)             Inserts the text before the element. Content type is 'html' or 'text'
after(text: string, content_type: string)              Inserts the text after the element. Content type is 'html' or 'text'
prepend(text: string, content_type: string)            Prepends the text to the element. Content type is 'html' or 'text'
append(text: string, content_type: string)             Appends the text to the element. Content type is 'html' or 'text'
set_inner_content(text: string, content_type: string)  Sets the inner content of the element. Content type is 'html' or 'text'
replace(text: string, content_type: string)            Replaces the element with the text. Content type is 'html' or 'text'
remove()                                               Removes the entire element
remove_and_keep_content()                              Removes the element and keeps its content
removed(): boolean                                     Returns true if the element has been removed
start_tag(): StartTag                                  Returns the StartTag object for the element
end_tag_handlers()                                     Not implemented
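
A short sketch of the attribute methods in use: the handler below makes sure links that open in a new tab carry rel="noopener noreferrer" while preserving any existing rel tokens. The handler name hardenBlankTarget is hypothetical, and the transformation itself is only an example.

```javascript
// Hypothetical handler: add "noopener" and "noreferrer" to the rel attribute,
// keeping any tokens that are already present.
async function hardenBlankTarget(el) {
  const rel = el.has_attribute('rel') ? el.get_attribute('rel') : '';
  const tokens = new Set(rel.split(/\s+/).filter(Boolean));
  tokens.add('noopener');
  tokens.add('noreferrer');
  el.set_attribute('rel', [...tokens].join(' '));
}

const transformerDefinitions = [
  { selector: 'a[target="_blank"]', element: hardenBlankTarget },
];
```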

Text Class

The Text class is passed to the callback function for text: and doc_text: definitions. (ref: lol_html::html_content::TextChunk) The Text class has the following methods:
Method                                       Description
as_str(): string                             Returns the text
set_str(text: string)                        Sets the text
text_type(): string                          Returns the text type.
last_text_in_node(): boolean                 Returns true if the chunk is last in a HTML text node.
before(text: string, content_type: string)   Inserts the text before the text chunk. Content type is 'html' or 'text'
after(text: string, content_type: string)    Inserts the text after the text chunk. Content type is 'html' or 'text'
replace(text: string, content_type: string)  Replaces the text chunk with the text. Content type is 'html' or 'text'
remove()                                     Removes the entire text chunk
removed(): boolean                           Returns true if the text chunk has been removed
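
Because a text node can be delivered as several chunks, a search string may be split across two of them. One way to handle this, sketched below, is to buffer chunks until last_text_in_node() returns true and then emit the rewritten text once. The factory name makeTextReplacer is hypothetical. The sketch passes 'html' as the content type on the assumption that as_str() returns the text as it appears in the source (with entities still encoded), so that it is not escaped a second time.

```javascript
// Hypothetical factory: returns a text handler that buffers the chunks of one
// text node and emits the rewritten text when the last chunk arrives.
function makeTextReplacer(search, replacement) {
  let buffer = '';
  return async (text) => {
    buffer += text.as_str();
    if (text.last_text_in_node()) {
      // Replace every occurrence of the search string in the complete text.
      text.replace(buffer.split(search).join(replacement), 'html');
      buffer = '';
    } else {
      // Drop the partial chunk from the output; its content is kept in the buffer.
      text.remove();
    }
  };
}

const transformerDefinitions = [
  { selector: 'p', text: makeTextReplacer('Edgio', 'Edg.io') },
];
```

The trade-off is that output for a text node is held back until its last chunk arrives, which is usually acceptable for short nodes.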

Doctype Class

The Doctype class is passed to the callback function for doc_type: definitions. (ref: lol_html::html_content::Doctype) The Doctype class has the following methods:
Method               Description
name(): string       Returns the name of the document type
public_id(): string  Returns the public ID of the document type
system_id(): string  Returns the system ID of the document type
remove()             Removes the entire document type
removed(): boolean   Returns true if the document type has been removed

DocEnd Class

The DocEnd class is passed to the callback function for doc_end: definitions. (ref: lol_html::html_content::DocumentEnd) The DocEnd class has the following methods:
Method                                      Description
append(text: string, content_type: string)  Appends the text to the end of the document. Content type is 'html' or 'text'

Attribute Class

The Attribute class is returned by the attributes() method of the Element class. (ref: lol_html::html_content::Attribute) The Attribute class has the following methods:
Method                        Description
name(): string                Returns the name of the attribute
name_preserve_case(): string  Returns the name of the attribute, preserving its case.
value(): string               Returns the value of the attribute

StartTag Class

The StartTag class is returned by the start_tag() method of the Element class. (ref: lol_html::html_content::StartTag) The StartTag class has the following methods:
Method                                       Description
name(): string                               Returns the name of the tag
name_preserve_case(): string                 Returns the name of the tag, preserving its case.
set_name(name: string)                       Sets the name of the tag. Returns an error if the tag name is invalid.
namespace_uri(): string                      Returns the namespace URI of the tag
attributes(): [Attributes]                   Returns an array of Attribute objects
set_attribute(name: string, value)           Sets the value of the attribute with the specified name. Returns an error if the attribute name is invalid.
remove_attribute(name: string)               Removes the attribute with the specified name
self_closing(): boolean                      Returns true if the tag is self closing. E.g. <foo />
before(text: string, content_type: string)   Inserts the text before the tag. Content type is 'html' or 'text'
after(text: string, content_type: string)    Inserts the text after the tag. Content type is 'html' or 'text'
replace(text: string, content_type: string)  Replaces the tag with the text. Content type is 'html' or 'text'
remove()                                     Removes the entire tag

Types

Text Type

(ref: lol_html::html_content::TextType)
Valid values are:
  • 'PlainText' - Text inside a <plaintext> element.
  • 'RCData' - Text inside <title> and <textarea> elements.
  • 'RawText' - Text inside <style>, <xmp>, <iframe>, <noembed>, <noframes>, and <noscript> elements.
  • 'ScriptData' - Text inside a <script> element.
  • 'Data' - Regular text.
  • 'CDataSection' - Text inside a CDATA section.

Content Types

The HtmlTransformer supports the following content types: (ref: lol_html::html_content::ContentType)
  • 'html' - HTML content. The transformer will not escape HTML entities.
  • 'text' - Plain text content. The transformer will escape HTML entities. E.g. < will be converted to &lt;.
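
To make the difference concrete, the function below shows a typical entity-escaping routine. It is an illustration only: escapeHtml is a hypothetical name, and the transformer's own escaping for 'text' content may cover a different set of characters.

```javascript
// Illustration only: roughly what escaping plain text for HTML involves.
// "&" is replaced first so that the entities produced below are not re-escaped.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

const snippet = '<b>Tom & Jerry</b>';

// Passed as 'html', the snippet would be inserted as markup, unchanged.
console.log(snippet); // <b>Tom & Jerry</b>

// Passed as 'text', it would be inserted in escaped form, along these lines.
console.log(escapeHtml(snippet)); // &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;
```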