Class Sanitizer

java.lang.Object
com.oorian.utils.Sanitizer

public final class Sanitizer extends Object
Utility class for sanitizing and escaping user input to prevent XSS (Cross-Site Scripting) and other injection attacks.

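This class provides static methods for escaping content in various contexts:

• HTML text content: escapeHtml(String)
• HTML attribute values: escapeHtmlAttribute(String)
• JavaScript string literals: escapeJavaScript(String)
• URL parameters: escapeUrl(String)
• CSS values: escapeCss(String)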

Usage Examples:


 // Escaping user input for display in HTML
 String userComment = "<script>alert('xss')</script>";
 div.setText(Sanitizer.escapeHtml(userComment));
 // Output: &lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt;

 // Safe attribute values
 String title = "Click \"here\" for more";
 element.addAttribute("title", Sanitizer.escapeHtmlAttribute(title));

 // Safe JavaScript strings
 String message = "User's \"special\" message";
 page.executeJs("alert('" + Sanitizer.escapeJavaScript(message) + "')");

 // URL parameters
 String searchTerm = "foo&bar=baz";
 String url = "/search?q=" + Sanitizer.escapeUrl(searchTerm);
 

Security Note: Always use the appropriate escape method for the context. Using the wrong escape method may leave your application vulnerable to injection attacks.

Since:
2025
Version:
1.0
Author:
Marvin P. Warble Jr.

Method Details

    • escapeHtml

      public static String escapeHtml(String input)
      Escapes HTML special characters to prevent XSS attacks in HTML text content.

      This method escapes the following characters:

      • & → &amp;
      • < → &lt;
      • > → &gt;
      • " → &quot;
      • ' → &#x27;

      Use this method when: Inserting user-provided content into HTML text nodes.

      Parameters:
      input - The string to escape. May be null.
      Returns:
      The escaped string, or an empty string if input is null.
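      A minimal sketch of the character table above, written as a standalone class. HtmlEscapeSketch is a hypothetical name used only for illustration; it is not part of Oorian and is not the library's actual source. A single pass over the string with a switch replaces each of the five characters and copies everything else unchanged:

```java
/** Hypothetical illustration of the documented escapeHtml table; not Oorian's code. */
final class HtmlEscapeSketch {

    private HtmlEscapeSketch() {
    }

    static String escapeHtml(String input) {
        // Mirror the documented null contract: null becomes "".
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Replace each of the five documented characters with its entity.
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeHtml("<script>alert('xss')</script>"));
    }
}
```

      Running main prints &lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt;, the same output shown in the class-level usage example. Handling & as an ordinary case in one pass avoids an ordering pitfall: a chain of String.replace calls would have to replace & first, or it would double-escape the entities it had just produced.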
    • escapeHtmlAttribute

      public static String escapeHtmlAttribute(String input)
      Escapes a string for safe use in HTML attribute values.

      This method performs the same escaping as escapeHtml(String) plus additional characters that could break out of attribute context:

      • ` → &#x60; (backtick - prevents template literal injection)
      • = → &#x3D; (equals sign - prevents attribute injection in some contexts)

      Use this method when: Setting HTML attribute values with user-provided content.

      Parameters:
      input - The string to escape. May be null.
      Returns:
      The escaped string, or an empty string if input is null.
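      Because escapeHtmlAttribute(String) is described as escapeHtml(String) plus two characters, a sketch can reuse the same single-pass switch with two extra cases. AttributeEscapeSketch is a hypothetical name, and the code illustrates the documented table rather than Oorian's implementation:

```java
/** Hypothetical illustration of the documented escapeHtmlAttribute table; not Oorian's code. */
final class AttributeEscapeSketch {

    private AttributeEscapeSketch() {
    }

    static String escapeHtmlAttribute(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                // The five characters shared with escapeHtml(String).
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                // The two extra characters documented for attribute context.
                case '`': out.append("&#x60;"); break;
                case '=': out.append("&#x3D;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: Click &quot;here&quot; for more
        System.out.println(escapeHtmlAttribute("Click \"here\" for more"));
    }
}
```

      Note that neither the documented table nor this sketch escapes spaces, so the escaped value is only safe inside a quoted attribute; in an unquoted attribute a space would still end the value and start a new attribute.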
    • escapeJavaScript

      public static String escapeJavaScript(String input)
      Escapes a string for safe use in JavaScript string literals.

      This method escapes the following characters:

      • \ → \\
      • ' → \'
      • " → \"
      • / → \/ (prevents a </script> sequence in the input from closing the enclosing script element)
      • Newline → \n
      • Carriage return → \r
      • Tab → \t
      • Line separator (U+2028) → \u2028
      • Paragraph separator (U+2029) → \u2029

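      Use this method when: Inserting user-provided content into single- or double-quoted JavaScript string literals. The characters listed above do not include the backtick (`) or the ${ sequence, so confirm the implementation before relying on this method inside template literals.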

      Parameters:
      input - The string to escape. May be null.
      Returns:
      The escaped string, or an empty string if input is null.
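      A minimal sketch of the escape table above. JsEscapeSketch is a hypothetical class, not Oorian's source. Like the documented list, it leaves backticks and ${ untouched, so its output is only suitable for single- or double-quoted literals:

```java
/** Hypothetical illustration of the documented escapeJavaScript table; not Oorian's code. */
final class JsEscapeSketch {

    private JsEscapeSketch() {
    }

    static String escapeJavaScript(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break;
                case '\'': out.append("\\'"); break;
                case '"': out.append("\\\""); break;
                // Escaping '/' keeps a literal </script> from closing an inline script element.
                case '/': out.append("\\/"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                // U+2028 and U+2029 were line terminators inside string literals before ES2019,
                // so emit them as six-character escape sequences.
                case '\u2028': out.append("\\u2028"); break;
                case '\u2029': out.append("\\u2029"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String message = "User's \"special\" message";
        // Rebuilds the alert snippet from the class-level usage example.
        System.out.println("alert('" + escapeJavaScript(message) + "')");
    }
}
```

      Running main prints alert('User\'s \"special\" message'), which a JavaScript engine parses back to the original text inside a single-quoted literal.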
    • escapeUrl

      public static String escapeUrl(String input)
      URL-encodes a string for safe use in URL parameters.

      This method uses UTF-8 encoding to convert the input string to a URL-safe format. All characters except alphanumeric characters and -_.~ are percent-encoded.

      Use this method when: Building URLs with user-provided query parameters or path segments.

      Parameters:
      input - The string to encode. May be null.
      Returns:
      The URL-encoded string, or an empty string if input is null.
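      One way to meet this contract with the JDK is java.net.URLEncoder, although it implements application/x-www-form-urlencoded rather than the unreserved set described above. The sketch below (a hypothetical UrlEscapeSketch class, not Oorian's code) post-processes URLEncoder's output in the three places where the two differ: spaces, *, and ~:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical way to percent-encode everything except A-Z a-z 0-9 - _ . ~ ; not Oorian's code. */
final class UrlEscapeSketch {

    private UrlEscapeSketch() {
    }

    static String escapeUrl(String input) {
        if (input == null) {
            return "";
        }
        return URLEncoder.encode(input, StandardCharsets.UTF_8)
                .replace("+", "%20")   // form encoding turns spaces into '+'
                .replace("*", "%2A")   // URLEncoder leaves '*' unencoded
                .replace("%7E", "~");  // '~' is unreserved, so restore it
    }

    public static void main(String[] args) {
        System.out.println("/search?q=" + escapeUrl("foo&bar=baz"));
    }
}
```

      Running main prints /search?q=foo%26bar%3Dbaz. The post-processing is safe because URLEncoder has already encoded any literal + as %2B and any literal % as %25, so every remaining + came from a space and every %7E came from a ~.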
    • escapeCss

      public static String escapeCss(String input)
      Escapes a string for safe use in CSS values.

      This method escapes characters that could break out of CSS context or enable CSS injection attacks:

      • Backslash, quotes, parentheses, semicolons, colons, etc.
      • Characters that could enable expression() or url() injection

      Use this method when: Setting CSS property values with user-provided content, such as in inline styles or dynamic stylesheets.

      Note: For maximum security, consider using a whitelist approach for CSS values rather than escaping.

      Parameters:
      input - The string to escape. May be null.
      Returns:
      The escaped string, or an empty string if input is null.
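      The list above describes escapeCss(String) only in general terms, so this sketch assumes a deliberately conservative rule: every character other than an ASCII letter or digit becomes a CSS hex escape (a backslash, the code point in hex, and one terminating space that the CSS parser consumes). CssEscapeSketch is a hypothetical name; the real method may escape a narrower set:

```java
/** Hypothetical conservative CSS escaper; not Oorian's code. */
final class CssEscapeSketch {

    private CssEscapeSketch() {
    }

    static String escapeCss(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length() * 2);
        input.codePoints().forEach(cp -> {
            boolean asciiAlnum = (cp >= 'a' && cp <= 'z')
                    || (cp >= 'A' && cp <= 'Z')
                    || (cp >= '0' && cp <= '9');
            if (asciiAlnum) {
                out.appendCodePoint(cp);
            } else {
                // Hex escape: backslash, code point in hex, then a space that ends the escape.
                out.append('\\').append(Integer.toHexString(cp)).append(' ');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeCss("red;background:url(x)"));
    }
}
```

      With this rule, red;background:url(x) has its semicolon, colon, and parentheses escaped, so it can no longer end the declaration or open a url() function. The trade-off is that legitimate values change meaning too: #fff becomes \23 fff, which CSS reads as an identifier rather than a hex color. That is one reason the note above recommends a whitelist for CSS values.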
    • stripHtml

      public static String stripHtml(String input)
      Removes all HTML tags from the input string, leaving only plain text content.

      This method removes:

      • All HTML/XML tags (including attributes)
      • HTML comments

      After tag removal, common HTML entities are decoded:

      • &amp; → &
      • &lt; → <
      • &gt; → >
      • &quot; → "
      • &#x27; and &apos; → '
      • &nbsp; → space

      Use this method when: Extracting plain text content from HTML, such as for search indexing or plain text display.

      Warning: This method should NOT be used as a security measure to sanitize untrusted HTML for display. Use escapeHtml(String) instead.

      Parameters:
      input - The HTML string to strip. May be null.
      Returns:
      The plain text content, or an empty string if input is null.
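      A regular-expression sketch of the two steps above, tag removal followed by entity decoding. StripHtmlSketch is a hypothetical class and Oorian's implementation may differ. It is adequate for extracting text from well-formed markup but, as the warning says, it is not a sanitizer: regular expressions do not parse HTML, so malformed or adversarial markup can survive it:

```java
import java.util.regex.Pattern;

/** Hypothetical regex-based tag stripper; not Oorian's code. */
final class StripHtmlSketch {

    private static final Pattern COMMENTS = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    private static final Pattern TAGS = Pattern.compile("<[^>]*>");

    private StripHtmlSketch() {
    }

    static String stripHtml(String input) {
        if (input == null) {
            return "";
        }
        // Remove comments first; otherwise the tag pattern would stop at the
        // first '>' inside a comment and leave the rest of the comment behind.
        String text = COMMENTS.matcher(input).replaceAll("");
        text = TAGS.matcher(text).replaceAll("");
        // Decode the documented entities. &amp; goes last so that "&amp;lt;"
        // becomes the literal text "&lt;" instead of being decoded twice.
        return text.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&#x27;", "'")
                .replace("&apos;", "'")
                .replace("&nbsp;", " ")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.println(stripHtml("<p>Fish &amp; <b>chips</b><!-- note --></p>"));
    }
}
```

      Running main prints Fish & chips.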
    • containsDangerousHtml

      public static boolean containsDangerousHtml(String input)
      Checks if a string contains potentially dangerous HTML content.

      This method checks for the presence of:

      • Script tags
      • Event handler attributes (onclick, onerror, etc.)
      • JavaScript URLs
      • Data URLs
      • Other potentially dangerous patterns

      Use this method when: You need to validate user input before allowing it in contexts where full escaping is not possible.

      Parameters:
      input - The string to check. May be null.
      Returns:
      true if the string contains potentially dangerous content, false otherwise (including when input is null).
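      A deny-list sketch covering the first four categories in the list above. DangerousHtmlSketch and its patterns are hypothetical, not Oorian's rules, and the catch-all "other potentially dangerous patterns" are not modeled. Checks like this produce false positives (plain text such as one=1 matches the event-handler pattern) and false negatives (entity-encoded URLs such as java&#x09;script: slip past the javascript: pattern), which is why escaping remains the primary defense:

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical deny-list check; not Oorian's code. */
final class DangerousHtmlSketch {

    private static final List<Pattern> PATTERNS = List.of(
            Pattern.compile("<\\s*script", Pattern.CASE_INSENSITIVE),       // script tags
            Pattern.compile("\\bon[a-z]+\\s*=", Pattern.CASE_INSENSITIVE),  // onclick=, onerror=, ...
            Pattern.compile("javascript\\s*:", Pattern.CASE_INSENSITIVE),   // javascript: URLs
            Pattern.compile("\\bdata\\s*:", Pattern.CASE_INSENSITIVE));     // data: URLs

    private DangerousHtmlSketch() {
    }

    static boolean containsDangerousHtml(String input) {
        // Documented contract: null is not dangerous.
        if (input == null) {
            return false;
        }
        for (Pattern p : PATTERNS) {
            if (p.matcher(input).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsDangerousHtml("<img src=x onerror=alert(1)>")); // true
        System.out.println(containsDangerousHtml("<b>hello</b>"));                 // false
    }
}
```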