AnyWhichWay, published in codeburst, Jan 20, 2018

A Little Cleaner: Protecting HTML & Reducing Cross-Site Script Attacks

The risks of cross-site scripting (XSS) and HTML injection are well documented. These attacks can disrupt your own operations. They also let nefarious parties hijack your website to trick others into revealing their usernames, passwords, or other private information. If you care about the general safety of others on the Internet, you should take measures to protect your site from them.

In this article you will see how to implement a lightweight, configurable function that helps reduce the risk of these exploits. Some useful links to other resources are also provided at the end of the article.

There are three primary ways that malicious code can enter your system and be used to hijack your web pages or get inserted into a back-end system:

  1. Direct entry into a web page form or inline editable HTML.
  2. As part of a URL path or query string which is used in dynamic HTML generation or form population.
  3. Data acquired by other means that is added to objects or arrays your code creates. This could include POST data sent directly to your server, or headers and content your code receives when it makes a request to a malicious or hijacked server.
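The second entry path can be made concrete with a hypothetical JavaScript sketch. It shows a query string value flowing into generated HTML, first verbatim and then with basic entity escaping. The function names unsafeGreeting, saferGreeting, and escapeHtml are illustrative and are not part of little-cleaner.

```javascript
// Hypothetical illustration: a query string value is interpolated into HTML.
function unsafeGreeting(search) {
  const name = new URLSearchParams(search).get("name") || "guest";
  return `<p>Hello ${name}</p>`; // name is inserted verbatim
}

// Minimal entity escaping for HTML text content.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function saferGreeting(search) {
  const name = new URLSearchParams(search).get("name") || "guest";
  return `<p>Hello ${escapeHtml(name)}</p>`;
}

// URLSearchParams strips the leading "?" and percent-decodes the value.
const attack = "?name=%3Cimg%20src%3Dx%20onerror%3Dalert(1)%3E";
console.log(unsafeGreeting(attack)); // <p>Hello <img src=x onerror=alert(1)></p>
console.log(saferGreeting(attack));  // <p>Hello &lt;img src=x onerror=alert(1)&gt;</p>
```

Entity escaping like this suits HTML text content. Attribute, URL, and script contexts need their own handling.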

The code presented in this article does five things:

  1. Cleans string input by stripping out or sterilizing/escaping potentially malicious content that could redirect users to another site, pop up fake dialog boxes, present content you don’t want to present, or mess with your server.
  2. Cleans query strings.
  3. Cleans arrays and objects you may have assembled from foreign data.
  4. Automatically protects the value attribute on HTML input elements, which is otherwise an open attack surface.
  5. Automatically protects from injection via the JavaScript prompt function, which is an open attack vector.

Once implemented, the following behaviors can be expected:

  1. cleaner("function() { alert('pow!'); }") will return undefined.
  2. document.createElement("input").value = "()=> alert('pow!'); }" will not set the value; it stays empty.
  3. <some element>.setAttribute("title","()=> alert('pow')") will leave the title unchanged.
  4. <some element>.setAttribute("onclick","alert('pow!')") will still set the onclick handler, so clicking the element alerts “pow!”. Attributes whose names begin with "on" are deliberately left uncleaned.

Functional Overview

The cleaner function provides a foundation from which you can easily expand protection as additional vulnerabilities are identified. It implements a five-step process driven by five options.

  1. It attempts to coerce malformed data into an acceptable format by reducing it across an array of coerce functions.
  2. It returns the data unchanged if at least one function in an array of accept functions returns a truthy value.
  3. It rejects the data by returning undefined if any single function in an array of reject functions returns a truthy value.
  4. It escapes strings by reducing them across an array of escape functions. Arrays and objects are instead cleaned item by item, recursively.
  5. It optionally evaluates strings, based on an eval flag. It then passes the result back through cleaner, so the result must also satisfy steps 1 through 4.
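Steps 1 through 4 reduce to two array idioms: reduce for the transforming steps (coerce, escape) and some for the yes/no steps (accept, reject). Below is a stripped-down sketch under that reading, using a hypothetical miniClean function. It handles single values only, with no recursion into arrays or objects and no eval step. The accept test mirrors cleaner's default; the other option functions are made up for illustration.

```javascript
// Hypothetical miniClean: steps 1-4 for a single value, with no
// recursion into arrays/objects and no eval step.
function miniClean(data, { coerce = [], accept = [], reject = [], escape = [] } = {}) {
  data = coerce.reduce((value, fn) => fn(value), data); // 1. coerce
  if (accept.some((test) => test(data))) return data; // 2. any accept keeps it as-is
  if (reject.some((test) => test(data))) return undefined; // 3. any reject drops it
  return escape.reduce((value, fn) => fn(value), data); // 4. escape
}

// Illustrative option functions.
const options = {
  coerce: [(d) => (typeof d === "string" ? d.trim() : d)],
  accept: [(d) => !d || typeof d === "number" || typeof d === "boolean"],
  reject: [(d) => typeof d === "string" && /javascript:/i.test(d)],
  escape: [(d) => (typeof d === "string" ? d.replace(/&/g, "&amp;").replace(/</g, "&lt;") : d)],
};

console.log(miniClean("  a < b  ", options)); // a &lt; b
console.log(miniClean("javascript:alert(1)", options)); // undefined
console.log(miniClean(42, options)); // 42
```

An accept match short-circuits before any reject function runs, so accept functions should be kept narrow.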

The value attack surface and the prompt attack vector are addressed by rewriting document.createElement and window.prompt to use cleaner. Additionally, every element in the document is protected when the window load event fires.

As you may have guessed, the options are provided in an options object of the form:

{
  coerce: [function[,...]],
  accept: [function[,...]],
  reject: [function[,...]],
  escape: [function[,...]],
  eval: true|false
}

cleaner provides a number of default accept, reject, and escape functions (the default coerce array is empty) and executes the optional step 5 by default. Step 5 runs after rejection and escaping, so its evaluation is only as safe as those earlier steps. It assumes the reject and escape functions have eliminated any blocking code, code that may adversely impact cleaner itself, and code that may inject data into log files.
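Step 5 relies on the fact that Function("return " + text) either produces a value or throws. Here is a minimal sketch of that idea in isolation, using a hypothetical evaluateOrKeep helper. cleaner itself additionally passes the result back through cleaner.

```javascript
// Hypothetical evaluateOrKeep: read a string as a JavaScript expression,
// falling back to the original string when that fails. Function() code
// runs in the global scope, so this is only reasonable after rejection
// and escaping have filtered the input.
function evaluateOrKeep(text) {
  try {
    // A SyntaxError (at construction) or ReferenceError (at call) lands in catch.
    return Function("return " + text)();
  } catch (error) {
    return text;
  }
}

console.log(evaluateOrKeep("42"));         // 42 (a number)
console.log(evaluateOrKeep("[1, 2]"));     // [ 1, 2 ]
console.log(evaluateOrKeep("Safe Value")); // Safe Value (SyntaxError caught)
console.log(evaluateOrKeep("hello"));      // hello (ReferenceError caught)
```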

These defaults are stored on a property of the cleaner function itself called options. You can extend cleaner statically, when it is first loaded, by calling cleaner.extend. You can also extend it on a per-call basis by passing a second argument, extensions, when cleaning data. The defaults can be eliminated completely by passing a replacement options object as the third argument.
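The per-call and static extension mechanisms share one merge rule. Array-valued options are unioned with the defaults, and scalar options such as eval are replaced only when supplied. Here is a standalone sketch of that rule; mergeOptions, isFunction, and hasScriptTag are hypothetical names.

```javascript
// Hypothetical mergeOptions: union array-valued options (defaults first,
// duplicates skipped); override scalar options only when supplied.
// Keys that exist only in the extensions are ignored.
function mergeOptions(options, extensions = {}) {
  const merged = {};
  for (const key of Object.keys(options)) {
    if (Array.isArray(options[key])) {
      merged[key] = options[key].slice();
      for (const item of extensions[key] || []) {
        if (!merged[key].includes(item)) merged[key].push(item);
      }
    } else {
      merged[key] = extensions[key] === undefined ? options[key] : extensions[key];
    }
  }
  return merged;
}

const isFunction = (d) => typeof d === "function";
const hasScriptTag = (d) => typeof d === "string" && /<script/i.test(d);
const defaults = { reject: [isFunction], eval: true };

const merged = mergeOptions(defaults, { reject: [hasScriptTag, isFunction], eval: false });
console.log(merged.reject.length);   // 2 (isFunction is not duplicated)
console.log(merged.eval);            // false
console.log(defaults.reject.length); // 1 (defaults are not mutated)
```

Defaults come first and duplicates are skipped. As a result, an extension can add tests but cannot remove a default one; removing defaults takes the third argument.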

Here is the general structure of cleaner:

const cleaner = (data,extensions={},options=cleaner.options) => {
  options = <merge extensions with options>;
  data = <coerce data>;
  if(<accept>) return data;
  if(<reject>) return;
  if(Array.isArray(data)) return <clean array>;
  if(data && typeof(data)==="object") return <clean object>;
  if(typeof(data)==="string") {
    data = <escape data>;
    if(options.eval) return cleaner(<evaluated data>);
  }
  return data;
};
cleaner.extend = (extensions) => { ... };
cleaner.options = {...};
cleaner.protect = (element) => { ... };

The Cleaner Function

const cleaner = (data,extensions={},options=cleaner.options) => {
  // to exclude the standard options, pass as the third argument
  // {coerce:[],accept:[],reject:[],escape:[],eval:false}

  // merge extensions with options
  options = Object.keys(options).reduce((accum,key) => {
    if(Array.isArray(options[key])) { // use union of arrays
      accum[key] = (extensions[key]||[]).reduce((accum,item) => {
        accum.includes(item) || accum.push(item); return accum;
      },options[key].slice());
    } else if(typeof(extensions[key])==="undefined") {
      accum[key] = options[key];
    } else {
      accum[key] = extensions[key];
    }
    return accum;
  },{});
  // data may be safe if coerced into a proper format
  data = options.coerce.reduce((accum,coercer) => coercer(accum),data);
  // these are always safe
  if(options.accept.some(test => test(data))) return data;
  // these are always unsafe
  if(options.reject.some(test => test(data))) return;
  // recursively clean array items, removing the unsafe ones; the merged
  // options are passed down so per-call extensions apply at every level
  if(Array.isArray(data)) {
    for(let i=data.length-1; i>=0; i--) {
      const cleaned = cleaner(data[i],{},options);
      if(typeof(cleaned)==="undefined") {
        data.splice(i,1);
      } else {
        data[i] = cleaned;
      }
    }
    return data;
  }
  // recursively clean data on objects
  if(data && typeof(data)==="object") {
    for(let key in data) {
      const cleaned = cleaner(data[key],{},options);
      if(typeof(cleaned)==="undefined") {
        delete data[key];
      } else {
        data[key] = cleaned;
      }
    }
    return data;
  }
  if(typeof(data)==="string") {
    // escape the data
    data = options.escape.reduce((accum,escaper) => escaper(accum),data);
    if(options.eval) {
      try {
        // if data can be converted into something that is legal
        // JavaScript, clean the result as well. Function() code runs in
        // the global scope (calling it with null does not sandbox it), so
        // a bare global name such as window evaluates to that object, which
        // would then be cleaned in place; options.reject must already have
        // removed undesirable self-evaluating or blocking code.
        return cleaner(Function("return " + data).call(null),{},options);
      } catch(error) {
        // otherwise, just return it
        return data;
      }
    }
  }
  return data;
};
// statically merge extensions into the default options
cleaner.extend = (extensions) => {
  const options = cleaner.options;
  cleaner.options = Object.keys(options).reduce((accum,key) => {
    if(Array.isArray(options[key])) { // use union of arrays
      accum[key] = (extensions[key]||[]).reduce((accum,item) => {
        accum.includes(item) || accum.push(item); return accum;
      },options[key].slice());
    } else if(typeof(extensions[key])==="undefined") {
      accum[key] = options[key];
    } else {
      accum[key] = extensions[key];
    }
    return accum;
  },{});
};
// default options/support for coerce, accept, reject, escape, eval
cleaner.options = {
  coerce: [],
  accept: [data => !data ||
    ["number","boolean"].includes(typeof(data))],
  reject: [
    // executable data
    data => typeof(data)==="function",
    // possible server execution like <?php
    data => typeof(data)==="string" && data.match(/<\s*\?\s*.*\s*/),
    // direct eval, might block or negatively impact cleaner itself;
    // word boundaries avoid false positives such as "avoid" or "retrieval"
    data => typeof(data)==="string" &&
      data.match(/\b(eval|alert|prompt|dialog|void)\b|\bcleaner\s*\(/),
    // very suspicious
    data => typeof(data)==="string" && data.match(/url\s*\(/),
    // might inject nastiness into logs
    data => typeof(data)==="string" &&
      data.match(/console\.\s*.*\s*\(/),
    // contains javascript
    data => typeof(data)==="string" && data.match(/javascript:/),
    // arrow function
    data => typeof(data)==="string" &&
      data.match(/\(\s*.*\s*\)\s*.*\s*=>/),
    // self eval, might negatively impact cleaner itself
    data => typeof(data)==="string" &&
      data.match(/[Ff]unction\s*.*\s*\(\s*.*\s*\)\s*.*\s*\{\s*.*\s*\}\s*.*\s*\)\s*.*\s*\(\s*.*\s*\)/),
  ],
  escape: [
    data => { // handle possible query strings
      if(typeof(data)==="string" && data[0]==="?") {
        const kept = [];
        // skip the leading "?" so it does not end up in the first key
        data.slice(1).split("&").forEach(part => {
          const [key,value] = decodeURIComponent(part).split("="),
            type = typeof(value),
            // if type is undefined, this may not even be a URL query
            // string, so clean the "key"
            cleaned = (type!=="undefined" ? cleaner(value) : cleaner(key));
          // keep only those parts of the query string that are clean
          if(typeof(cleaned)!=="undefined") {
            kept.push(type!=="undefined" ? `${key}=${cleaned}` : cleaned);
          }
        });
        // joining avoids stray "&" separators when parts are dropped
        return "?" + kept.join("&");
      }
      return data;
    },
    data => { // handle escaping html entities
      if(typeof(data)==="string" && data[0]!=="?"
        && typeof(document)!=="undefined") {
        // on the client, or on a server where a DOM is available
        const div = document.createElement('div');
        div.appendChild(document.createTextNode(data));
        return div.innerHTML;
      }
      return data;
    }
  ],
  eval: true
};
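The default coerce array is empty. As an example of what might go there, here is a hypothetical coerce function, decodePercentEncoding, which is not one of little-cleaner's defaults. It percent-decodes strings so that encoded payloads become visible to the reject tests.

```javascript
// Hypothetical coerce function: percent-decode strings so that payloads
// such as "javascript%3Aalert(1)" are visible to the reject tests.
const decodePercentEncoding = (data) => {
  // leave non-strings, strings without escapes, and query strings alone;
  // the query-string escaper decodes each part of a query string itself
  if (typeof data !== "string" || data[0] === "?" || !data.includes("%")) return data;
  try {
    return decodeURIComponent(data);
  } catch (error) {
    return data; // malformed escape sequence such as a lone "%" (URIError)
  }
};

console.log(decodePercentEncoding("javascript%3Aalert(1)")); // javascript:alert(1)
console.log(decodePercentEncoding("100%"));                  // 100%
console.log(decodePercentEncoding(7));                       // 7
```

Under the same assumptions, it could be registered with cleaner.extend({coerce:[decodePercentEncoding]}). Query strings are skipped because the query-string escaper already decodes each part after splitting on "&". Decoding first would turn an encoded "%26" into a separator.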

Protecting The Value Attribute & setAttribute

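function setAttribute(name,value) {
  // allow attributes that take functions (onclick etc.) to remain uncleaned;
  // an enhanced version might selectively reject suspicious functions
  const cleaned = (name.indexOf("on")===0 ? value : cleaner(value));
  if(typeof(cleaned)!=="undefined") {
    this.__setAttribute__(name,cleaned);
  }
}
cleaner.protect = (el) => {
  if(typeof(el.value)!=="undefined") {
    // find the built-in value accessor on the prototype chain, so the
    // protected property stays tied to what the element really holds
    let proto = Object.getPrototypeOf(el), native;
    while(proto && !(native = Object.getOwnPropertyDescriptor(proto,"value"))) {
      proto = Object.getPrototypeOf(proto);
    }
    if(native && native.get && native.set) {
      // clean the current value, writing it back only if cleaning changed it
      const current = native.get.call(el),
        cleaned = cleaner(current);
      if(cleaned!==current) {
        native.set.call(el,typeof(cleaned)==="undefined" ? "" : cleaned);
      }
      // re-define the value property so that writes are cleaned
      Object.defineProperty(el,"value",{enumerable:true,
        configurable:true,
        get: () => native.get.call(el),
        set: (value) => {
          const cleaned = cleaner(value);
          if(typeof(cleaned)!=="undefined") {
            native.set.call(el,cleaned);
          }
        }});
    }
  }
  // redefine setAttribute so attribute values are cleaned too
  if(el.setAttribute!==setAttribute) {
    Object.defineProperty(el,"__setAttribute__",
      {enumerable:false,
        configurable:true,
        writable:true,
        value:el.setAttribute});
    el.setAttribute = setAttribute;
  }
  for(let child of el.children) {
    cleaner.protect(child);
  }
  return el;
};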
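One way to intercept writes to an element's value is to shadow the inherited accessor with an own accessor that delegates to it. This avoids detaching the property from the element's real state, and the idea can be tried outside a browser. In this sketch FakeInput and guardValue are hypothetical, and the filter argument stands in for cleaner.

```javascript
// Hypothetical stand-in for a DOM element: "value" is an accessor on the
// prototype, as it is on HTMLInputElement.prototype in browsers.
class FakeInput {
  constructor() { this._raw = ""; }
  get value() { return this._raw; }
  set value(v) { this._raw = String(v); }
}

// Shadow the inherited accessor with an own accessor. Reads and accepted
// writes delegate to the inherited one; filter returns undefined to reject.
function guardValue(el, filter) {
  let proto = Object.getPrototypeOf(el);
  let inherited;
  while (proto && !(inherited = Object.getOwnPropertyDescriptor(proto, "value"))) {
    proto = Object.getPrototypeOf(proto);
  }
  if (!inherited || !inherited.get || !inherited.set) return el; // nothing to wrap
  Object.defineProperty(el, "value", {
    enumerable: true,
    configurable: true,
    get: () => inherited.get.call(el),
    set: (v) => {
      const cleaned = filter(v);
      if (cleaned !== undefined) inherited.set.call(el, cleaned);
    },
  });
  return el;
}

const input = guardValue(new FakeInput(), (v) => (/=>/.test(v) ? undefined : v));
input.value = "()=> alert('pow!')";
console.log(JSON.stringify(input.value)); // "" (write rejected)
input.value = "Safe Value";
console.log(input.value);                 // Safe Value
```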

Preventing Injection Via Prompt & Protecting Document

if(typeof(window)!=="undefined") {
  // on the client, or on a server where a pseudo window is available
  if(window.prompt) {
    const _prompt = window.prompt.bind(window);
    window.prompt = function(message,defaultValue) {
      const input = _prompt(message,defaultValue),
        cleaned = cleaner(input);
      if(typeof(cleaned)==="undefined") {
        window.alert("Invalid input: " + input);
      } else {
        return cleaned;
      }
    };
  }
  // protect the elements that exist once the page has loaded
  window.addEventListener("load",() => {
    cleaner.protect(document.head);
    cleaner.protect(document.body);
  });
}
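The overview says document.createElement is rewritten so that newly created elements are protected, but no such wrapper appears in the listings. Here is a minimal sketch of one way to write it. It assumes a protect function that, like cleaner.protect, takes an element and returns it. wrapCreateElement and fakeDocument are hypothetical; the fake document exists only so the sketch runs outside a browser.

```javascript
// Hypothetical sketch: replace doc.createElement with a version that runs
// every new element through protect before returning it.
function wrapCreateElement(doc, protect) {
  const original = doc.createElement.bind(doc);
  doc.createElement = function (tagName, options) {
    return protect(original(tagName, options));
  };
  return doc;
}

// A fake document object so the sketch runs outside a browser.
const fakeDocument = {
  createElement(tagName) {
    return { tagName: tagName.toUpperCase(), protected: false };
  },
};
wrapCreateElement(fakeDocument, (el) => {
  el.protected = true; // stand-in for cleaner.protect(el)
  return el;
});

const el = fakeDocument.createElement("input");
console.log(el.tagName, el.protected); // INPUT true
```

In a browser, the call would presumably be wrapCreateElement(document,cleaner.protect). Elements created by parsing markup, for example through innerHTML, do not go through createElement. After load, those elements are not covered by either mechanism.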

Unit Tests & Usage

These unit tests, written in the Mocha/Chai it/expect style, show how to use the code:

it("clean function",function() {
  expect(cleaner("() => { true; }")).to.equal(undefined);
});
it("clean html",function() {
  expect(cleaner("<div onclick='((event) => console.log(event))(event)'>Test</div>")).to.equal(undefined);
});
it("clean php",function() {
  expect(cleaner("<?php ")).to.equal(undefined);
});
it("clean eval",function() {
  expect(cleaner("eval(alert('ok'))")).to.equal(undefined);
});
it("clean object",function() {
  const object = {
      nested: {
        f: () => true,
        s: "test"
      }
    },
    clean = cleaner(object);
  expect(clean.nested.f).to.equal(undefined);
  expect(clean.nested.s).to.equal("test");
});
it("protect value",function() {
  const el = document.createElement("input");
  el.value = "function() { return true; }";
  expect(el.value==="" || el.value===undefined).to.equal(true);
  el.value = "Safe Value";
  expect(el.value).to.equal("Safe Value");
});
it("setAttribute",function() {
  const el = document.createElement("div");
  el.setAttribute("title","function() { return true; }");
  expect(el.title).to.equal("");
  el.setAttribute("title","Safe Title");
  expect(el.title).to.equal("Safe Title");
});
it("document protected",function() {
  if(typeof(document)!=="undefined") {
    const el = document.body;
    el.setAttribute("title","function() { return true; }");
    expect(el.title).to.equal("");
    el.setAttribute("title","Safe Title");
    expect(el.title).to.equal("Safe Title");
  }
});

Closing Comments

As a convenience, this code is provided via npm as little-cleaner. We call it little-cleaner because:

  1. It is small (about 1.2 KB minified and gzipped with its default capability).
  2. It is just part of what you should be using to secure your application. Adequate security requires a comprehensive set of client- and server-side tactics, both inside your codebase and outside it.

As an exercise, you may wish to write a function that identifies a string as HTML and rejects all HTML input, by adding it to the reject array of an extension object.
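For the exercise above, here is one possible heuristic. looksLikeHtml is a hypothetical name, and the regular expression is only a rough stand-in for real HTML detection.

```javascript
// Hypothetical reject test: flag strings containing something shaped like
// a start or end tag, an HTML comment, or a doctype. This is a heuristic,
// not an HTML parser; "3 < 4" and "a<b" are deliberately not flagged.
const looksLikeHtml = (data) =>
  typeof data === "string" && /<\s*\/?\s*[a-zA-Z][^>]*>|<!--|<!doctype/i.test(data);

console.log(looksLikeHtml("<b>bold</b>")); // true
console.log(looksLikeHtml("3 < 4"));       // false
```

It could then be supplied per call, as in cleaner(input,{reject:[looksLikeHtml]}), or for every call with cleaner.extend({reject:[looksLikeHtml]}).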

If you are doing a lot of HTML generation based on user input, or you need more flexible HTML sanitizing, consider something like DOMPurify to secure your entire DOM. Use it alongside something like little-cleaner, which locks down the value attack surface and the prompt attack vector while also cleaning objects or arrays that may contain foreign data.

If you found this useful, give us a clap. Be safe!

Useful Links
