Data validation and sanitization are critical for keeping your WordPress creations safe from the bad guys. Untrusted data can reach your site from many sources (your users, third-party services, even your own database), and all of it needs to be checked both on input and on output. Code without validation and sanitization may work fine, but it is insecure: attackers can inject malicious data and hack your site. In simple terms, accepting user data without validation and sanitization is like leaving your shop open and unattended.
In this article we look at what validation and sanitization are, why they matter, and which functions WordPress provides to help.
1. Introduction
Data Sanitization
Data sanitization is closely related to data escaping, and the two terms are often used interchangeably (WordPress generally says "sanitize" for cleaning incoming data and "escape" for securing output). Both are filters you apply to data to make it safe in a specific context: depending on where the data will be used, they strip out or encode dangerous tags and characters. For instance, to display HTML code in a textarea you need to replace all the HTML tags with their entity equivalents.
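As a rough illustration of the textarea case, the sketch below uses PHP's built-in htmlspecialchars(), not a WordPress function, to turn markup into entities before it is placed inside a <textarea>. The variable names and sample value are hypothetical. In WordPress itself, esc_textarea() does this job.

```php
<?php
// Hypothetical untrusted value, e.g. a snippet a user pasted into a form.
$user_html = '<b>Hello</b> & "bye"';

// Encode < > & " ' as HTML entities so the browser shows the markup as text
// instead of interpreting it (ENT_QUOTES also encodes single quotes).
$escaped = htmlspecialchars( $user_html, ENT_QUOTES, 'UTF-8' );

echo '<textarea name="snippet">' . $escaped . '</textarea>', "\n";
// <textarea name="snippet">&lt;b&gt;Hello&lt;/b&gt; &amp; &quot;bye&quot;</textarea>
```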
Data Validation
Data validation is a strict check: we test whether the data matches our specification, accept it if it does, and reject it if it does not. For instance, we check that an e-mail looks like an e-mail address, that a date is a real date and that a number is (or is cast to) an integer.
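A minimal sketch of the three checks mentioned above follows. It uses PHP's filter_var() and DateTime instead of WordPress helpers so it runs on its own. The function names are hypothetical, and the YYYY-MM-DD date format is an assumption. WordPress's own is_email() and absint() are described further down.

```php
<?php
// Hypothetical validators: each returns true only if the input matches the spec.

function looks_like_email( string $value ): bool {
    return filter_var( $value, FILTER_VALIDATE_EMAIL ) !== false;
}

function is_valid_date( string $value ): bool {
    // Assumes dates are submitted as YYYY-MM-DD.
    $date = DateTime::createFromFormat( '!Y-m-d', $value );
    // createFromFormat() silently rolls 2023-02-30 over to March, so compare
    // the re-formatted date with the input to reject such overflow.
    return $date !== false && $date->format( 'Y-m-d' ) === $value;
}

function is_integer_value( string $value ): bool {
    return filter_var( $value, FILTER_VALIDATE_INT ) !== false;
}

var_dump( looks_like_email( 'george@example.com' ) ); // bool(true)
var_dump( is_valid_date( '2023-02-30' ) );            // bool(false)
var_dump( is_integer_value( '42abc' ) );              // bool(false)
```

Each check either accepts or rejects the input; none of them tries to "repair" it. That is the difference from sanitization.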
2. Why Are Validation and Sanitization Important?
When writing code, you should not trust anyone: treat all data as invalid unless it can be proven valid. Any data that can be manipulated by a third party should be validated and sanitized before it is processed. If you skip validation and sanitization:
- Attackers can inject scripts, for example via XSS (Cross-Site Scripting)
- Data can break your forms when it is output
- Data can be used to spread malware
Example: even a simple input field can be a potential threat.
// Retrieving the value from the $_POST variable (insecure: no validation or escaping)
$username = $_POST['username'];
echo '<label for="username">' . __( 'Name', 'simplecatch' ) . '</label>';
echo '<input type="text" id="username" name="username" value="' . $username . '" />';
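The code above renders a simple input field for the user's name. Nothing goes wrong if the user enters a name like George, but consider what happens with the following values:
Case A: The user enters a name containing special characters, such as <George> or "George". The value is echoed without escaping, so characters like <, > and " land in the page as raw markup. A double quote, for instance, closes the value attribute early and breaks the form's output in the browser.
Case B: An attacker enters a script payload such as "><script>alert('XSS');</script>. The leading "> closes the input tag, so the browser runs the injected script as if it were part of your site. Instead of a harmless alert, a real attack could use this to act with the victim's privileges and reach sensitive information and pages.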
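The next sketch shows how the field could be hardened. In WordPress you would typically sanitize the submitted value with sanitize_text_field() and escape it on output with esc_attr(). To keep the sketch runnable outside WordPress, it shows only the output step. It uses PHP's htmlspecialchars() with ENT_QUOTES as a stand-in for esc_attr(). The helper name render_username_field() is hypothetical.

```php
<?php
// Hypothetical helper: builds the username field with the value escaped for
// an HTML attribute. In WordPress, esc_attr() would play this role.
function render_username_field( string $username ): string {
    $safe = htmlspecialchars( $username, ENT_QUOTES, 'UTF-8' );
    return '<label for="username">Name</label>'
        . '<input type="text" id="username" name="username" value="' . $safe . '" />';
}

// Payloads like the ones in Case A and Case B are now printed as harmless text.
echo render_username_field( '<George>' ), "\n";
echo render_username_field( "\"><script>alert('XSS');</script>" ), "\n";
```

Because the quote, < and > are all encoded, the attacker's text can no longer close the attribute or open a <script> tag.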
3. Should You Validate or Sanitize Data?
It depends on the situation. If you need to be sure that the data is 100% valid, use validation; if you are only concerned with making it safe, use sanitization.
For example, take an input field where the user must enter their age. Here we can validate the data: accept it only if it is a positive integer, and otherwise reject it and ask the user to re-enter the age. Keep in mind that absint( $int ) converts rather than checks (it turns '-5' into 5 and 'abc' into 0), so on its own it is not a strict validation.
For a text field where the user writes a lengthy text, however, rejecting the whole entry and asking the user to retype it just because it contains a few HTML tags is not a good approach. In such cases, sanitizing the text and stripping the tags is the better option.
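The following sketch contrasts the two strategies. The hypothetical handle_profile() validates the age strictly (reject and ask again) but sanitizes the free-text bio by stripping its tags. It uses PHP's filter_var() and strip_tags() so the snippet runs without WordPress. In a plugin you would more likely reach for WordPress helpers such as absint() and sanitize_textarea_field(). The upper age limit of 150 is an arbitrary assumption.

```php
<?php
// Hypothetical handler for a profile form with an "age" and a "bio" field.
function handle_profile( array $input ): array {
    // Validation: accept only a whole number in a plausible range
    // (150 is an arbitrary upper bound), otherwise reject the input.
    $age = filter_var(
        $input['age'] ?? '',
        FILTER_VALIDATE_INT,
        array( 'options' => array( 'min_range' => 1, 'max_range' => 150 ) )
    );
    if ( $age === false ) {
        return array( 'error' => 'Please re-enter your age as a positive whole number.' );
    }

    // Sanitization: keep the user's text but drop any HTML tags in it.
    $bio = trim( strip_tags( (string) ( $input['bio'] ?? '' ) ) );

    return array( 'age' => $age, 'bio' => $bio );
}

print_r( handle_profile( array( 'age' => '34', 'bio' => 'I <b>love</b> WordPress' ) ) );
print_r( handle_profile( array( 'age' => '-3', 'bio' => 'Hello' ) ) );
```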
4. Functions Commonly Used to Validate or Sanitize Data
esc_html(): Escapes data that may contain HTML so it can be displayed safely on the page. The function encodes special characters (such as < > & and quotes) as HTML entities. It is very similar to esc_attr().
Example: <?php echo esc_html( $text ); ?>
esc_attr(): Escapes data for use inside an HTML attribute. It is similar to esc_html(); the difference is that esc_attr() is meant for values placed inside an attribute, such as value="..." or title="...", rather than between tags.
Example: $attribute = esc_attr( $val );
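To print an escaped dynamic value, echo the result: echo esc_attr( $val );. WordPress also provides esc_attr_e(), but it translates its argument before escaping and echoing it. It is therefore meant for fixed, translatable strings rather than user data.
Example: <input type="submit" value="<?php esc_attr_e( 'Save', 'my-textdomain' ); ?>" /> (my-textdomain stands in for your theme's or plugin's text domain)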
esc_textarea(): Escapes values for an HTML <textarea>. Use it to encode text that will be placed inside a <textarea> form element.
Example: <textarea name="bio"><?php echo esc_textarea( $bio ); ?></textarea>
esc_url(): Validates and sanitizes URLs. It strips out various offending characters and replaces quotes and ampersands with their entity equivalents. It also checks that the URL's protocol is allowed (javascript: is not, by default) and returns an empty string if it is not. Use esc_url() when displaying a URL in a text box, an attribute or on the page. If you want to store the URL in the database or use it to redirect the user, use esc_url_raw() instead, which performs the same cleanup without entity-encoding the result.
Example: <a href="<?php echo esc_url( $url ); ?>">Link</a>
Example: $url_to_store = esc_url_raw( $url ); // for the database or a redirect, not for echoing into HTML
esc_js(): Escapes text strings for use in inline JavaScript, such as an onclick attribute. It escapes single quotes, encodes " < > & as HTML entities, and fixes line endings.
Example: <a href="#" onclick="alert( '<?php echo esc_js( $text ); ?>' ); return false;">Show</a>
intval(): A PHP function (not WordPress-specific) that converts a value to an integer. A string that does not start with a number, such as 'abc', becomes 0, while a string with a leading number, such as '42abc', becomes 42. A common pattern is to check whether the result is 0. If it is, save an empty value to the database; otherwise, save the converted value.
Example: <input type="text" name="number_to_display" value="<?php echo intval( $number ); ?>" />
absint(): A WordPress function that converts a value to a non-negative integer by taking the absolute value of its integer conversion. Non-numeric strings become 0, but negative numbers become positive (-5 becomes 5) rather than 0, so absint() on its own does not reject negative input. The same zero check can then decide whether to save an empty value or the converted number.
Example: <input type="text" name="number_to_display" value="<?php echo absint( $number ); ?>" />
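To make the behaviour of these two functions concrete, here is a plain-PHP sketch. WordPress core defines absint() as abs( (int) $maybeint ). The sketch copies that one-liner under the hypothetical name my_absint() so it runs outside WordPress. It then applies the "zero means empty" save rule from the text through a hypothetical helper, number_to_store().

```php
<?php
// Copy of WordPress core's absint() under a hypothetical name, so the
// sketch runs without WordPress loaded.
function my_absint( $maybeint ): int {
    return abs( (int) $maybeint );
}

var_dump( intval( '42' ) );     // int(42)
var_dump( intval( '42abc' ) );  // int(42)  leading digits are kept
var_dump( intval( 'abc' ) );    // int(0)
var_dump( my_absint( '-5' ) );  // int(5)   negative becomes positive, not 0

// Hypothetical helper for the "treat zero as empty" rule described above.
function number_to_store( $raw ): string {
    $number = my_absint( $raw );
    return $number === 0 ? '' : (string) $number;
}

var_dump( number_to_store( 'abc' ) ); // string(0) ""
var_dump( number_to_store( '7' ) );   // string(1) "7"
```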
sanitize_text_field(): Sanitizes plain-text input. It checks for invalid UTF-8, converts single < characters to entities, strips all tags, removes line breaks, tabs and extra whitespace, and strips percent-encoded octets, returning a clean plain-text string.
Example: $safe_string = sanitize_text_field( $val );
sanitize_title(): Turns a string into a URL slug. It removes HTML and PHP tags, lowercases the text and replaces spaces with hyphens. WordPress uses it when you create a new post, turning the post title into the slug used in the post's URL.
Example: $wordpress_url = sanitize_title( 'Check this now' ); echo $wordpress_url; // check-this-now
sanitize_html_class(): Sanitizes an HTML class name by stripping every character other than A-Z, a-z, 0-9, underscores and hyphens.
Example: echo '<div class="' . esc_attr( sanitize_html_class( $post_class ) ) . '">';
sanitize_file_name(): Sanitizes a file name before it is stored. It removes characters that are not allowed in file names and replaces whitespace with dashes.
Example: $new_filename = sanitize_file_name( $val );
sanitize_email(): Sanitizes an email address by stripping out all characters that are not allowed in one, so the result contains only valid characters.
Example: $email = sanitize_email( $val );
is_email(): Validates an email address. It returns the address itself if it looks valid and false otherwise, so the result can be used directly in a condition.
Example:
if ( is_email( $val ) ) {
    // Valid email address
} else {
    // Invalid email address
}
wp_kses(): Sanitizes untrusted HTML. It keeps only the HTML elements and attributes you explicitly allow and strips out everything else.
Example: $filteredHtml = wp_kses( $string, $allowed_html, $allowed_protocols );
wp_kses() takes three parameters:
$string – The string to filter.
$allowed_html – An array of allowed HTML elements, each mapped to an array of the attributes allowed on it (or the name of a context such as 'post'). For example, array( 'p' => array( 'class' => array() ), 'strong' => array() ) keeps <p> with its class attribute and <strong> with no attributes. Any attribute not listed is stripped.
$allowed_protocols – An optional array of trusted URL protocols.
wp_kses_post(): Works like wp_kses() and sanitizes untrusted HTML, but you do not need to provide the list of allowed tags and attributes. WordPress uses the tags and attributes allowed in regular post content.
Example: $clean_html = wp_kses_post( $data );