This section describes how the device implements in-depth identification of traffic content and blocks or alerts on traffic containing specified keywords.
Data filtering falls into two types: file data filtering and application data filtering.
File data filtering filters uploaded and downloaded files by keyword. You can specify the file transfer protocols or the file types to be filtered.
Application data filtering filters application content by keyword. The content that the device filters depends on the application, as listed in the following table.
| Application Protocol | Filtered Content |
|---|---|
| HTTP | |
| FTP | Name and content of the file to be uploaded or downloaded |
| SMTP | Title, body, and attachment name of the sent mail |
| POP3 | Title, body, and attachment name of the received mail |
| NFS | Uploaded and downloaded files |
| SMB | Uploaded and downloaded files |
| IMAP | Title, body, and attachment name of the received mail |
| RTMPT | Name of the file transmitted using RTMPT |
| FLASH | Name of the Flash file |
| File Sharing | Names of shared files |
A keyword is the content that the device identifies during data filtering. The device performs the specified action on files or applications that contain the keyword. Keywords are generally confidential or illegal information.
Keywords are classified into predefined keywords and user-defined keywords.
Predefined keywords include bank card numbers, credit card numbers, social security numbers, ID card numbers, mobile phone numbers, and confidentiality levels (confidential, secret, and top secret).
User-defined keywords can be texts or regular expressions.
A text or regular expression keyword must match content of at least three bytes. Each ASCII character counts as one byte, and each Chinese character counts as two bytes.
For example, a keyword can match abc, but cannot match a, b, or ab.
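The three-byte minimum comes down to simple arithmetic. The following Python sketch applies the byte rule stated above: one byte per ASCII character. It assumes every non-ASCII character is a two-byte Chinese character. The function names are hypothetical and are not part of any device interface.

```python
MIN_KEYWORD_BYTES = 3  # minimum length of content a keyword can match


def keyword_byte_length(text: str) -> int:
    """Count bytes as described in this section: ASCII characters count as
    one byte; every other character is assumed to be a two-byte Chinese
    character."""
    return sum(1 if ord(ch) < 128 else 2 for ch in text)


def meets_minimum_length(matched: str) -> bool:
    """True if the matched content reaches the three-byte minimum."""
    return keyword_byte_length(matched) >= MIN_KEYWORD_BYTES


for sample in ["abc", "ab", "a", "密", "机密"]:
    print(sample, keyword_byte_length(sample), meets_minimum_length(sample))
```

Under this rule a single Chinese character (two bytes) is too short. Two Chinese characters, such as 机密 ("confidential", four bytes), meet the minimum.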
To configure a text keyword, enter the exact string to be filtered. Text keywords are easy to configure and are matched exactly.
Regular expression keywords provide fuzzy matching capability. For example, "." in "abc.de" can represent any single character. Therefore, "abc.de" can match "abcxde", "abcyde", or "abc8de".
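The difference between exact and fuzzy matching can be shown with Python's `re` module standing in for the device's matching engine. This is an assumption: the device's engine may behave differently, and Python's `.` does not match a newline by default. The helper names are hypothetical.

```python
import re


def text_keyword_matches(keyword: str, content: str) -> bool:
    # Text keyword: the exact string must appear in the content.
    return keyword in content


def regex_keyword_matches(pattern: str, content: str) -> bool:
    # Regular expression keyword: "." stands for any single character.
    return re.search(pattern, content) is not None


for content in ["abcxde", "abcyde", "abc8de", "abc.de", "abcde"]:
    print(content,
          text_keyword_matches("abc.de", content),
          regex_keyword_matches("abc.de", content))
```

Entered as a text keyword, "abc.de" matches only the literal string abc.de. Entered as a regular expression, it matches abcxde, abcyde, abc8de, and abc.de, but not abcde, because `.` must consume exactly one character.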
Regular expression keywords are matched flexibly and efficiently, but they must follow regular expression syntax. Table 2 lists the supported regular expression rules.
| Character | Description |
|---|---|
| `\` | Escape character. Add `\` before a special character to match it literally, for example, `\.`, `\(`, and `\)`. |
| `.` | Matches any single ASCII character or Chinese character. For example, `abc.de` can match abcade, abcyde, and abc8de. A regular expression, and each alternative separated by `\|`, cannot start or end with `.`. For example, `.abc\|def`, `abc.\|def`, `abc\|.def`, `abc\|def.`, and `abc\|def.\|ghi` are invalid. |
| `( )` | Marks the beginning and end of a subexpression. For example, `(abc)+` can match abc and abcabc. |
| `?` | Matches the preceding character or subexpression zero or one time. For example, `abcd?` can match abc and abcd. The expression cannot be `abc?`: if the match count is 0, the matched keyword is ab, which is shorter than the three-byte minimum. Therefore, at least four characters must precede `?`. |
| `*` | Matches the preceding character or subexpression zero or more times. For example, `abcd*` can match abc, abcd, and abcddd. The expression cannot be `abc*`: if the match count is 0, the matched keyword is ab, which is shorter than the three-byte minimum. Therefore, at least four characters must precede `*`. |
| `+` | Matches the preceding character or subexpression one or more times. For example, `abc+` can match abc and abcc, but not ab. |
| `\|` | Matches either the expression before or the expression after the operator. For example, `abc\|defg` can match abc or defg, and `(a\|b)cde` can match acde or bcde. |
| `-` | Specifies a range of characters inside brackets. For example, `[a-z]` matches any single character from a to z, inclusive. |
| `[ ]` | Matches any single character contained in the brackets. For example, `abc[def]` can match abcd, abce, or abcf. |
| `{n}` | Matches the preceding character or subexpression exactly n times. n is a non-negative integer less than 10. For example, `abc{2}` does not match abc in oabco, but matches abcc in oabcco. |
| `{n,m}` | Matches the preceding character or subexpression at least n times and at most m times. n and m are non-negative integers less than or equal to 10, and n is less than m. For example, `abcd{0,3}` can match abc, `abcd{1,3}` can match abcdd, and `(abc){1,5}` can match abcabcabc. |
| `\d` | Matches a single digit. It is equivalent to `[0-9]`. For example, `abc\d` can match abc0 and abc9. |
| `\w` | Matches any single letter, digit, or underscore. For example, `abc\w` can match abc2, abcd, abcA, and abc_. |
When data filtering identifies a keyword, the device performs one of the actions listed in Table 3.
| Action | Description |
|---|---|
| Alert | The device generates logs but does not block the content. |
| Block | The device blocks the content and generates logs. For users, web pages cannot be displayed, file uploads and downloads fail, and mail cannot be sent or received. |
| By Weight | Each keyword has a weight. The device multiplies the weight of each identified keyword by its match count and adds up the results. If the sum is greater than or equal to the alert threshold but less than the block threshold, the device generates an alert. If the sum is greater than or equal to the block threshold, the device blocks the traffic. For example, assume that two keywords are defined on the device: keyword a has a weight of 1, keyword b has a weight of 2, the alert threshold is 1, and the block threshold is 5. If keyword a appears once on a web page that a user browses, the sum of weights is 1, which equals the alert threshold. The device generates a log, but the user can continue browsing the page. If keyword a appears three times and keyword b appears twice on the page, the sum of weights is 7 (3 x 1 + 2 x 2 = 7), which exceeds the block threshold of 5. The device blocks the web page and generates a log, and the page cannot be displayed to the user. |
If traffic passing through the device matches a security policy rule whose action is permit and that references a data filtering profile, the device performs data filtering on the traffic.
The data filtering process is as follows:
The device detects and identifies the traffic content.
For an application, the identified content includes the application type and transmission direction. For a file, the identified content includes the protocol used for transmitting the file, the file type, and transmission direction.
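The trigger condition and the identification step above can be modeled in Python as follows. This is a sketch under stated assumptions. `SecurityRuleMatch`, `IdentifiedContent`, and `needs_data_filtering` are hypothetical names, and the string values used for actions and directions are placeholders rather than device configuration keywords.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SecurityRuleMatch:
    action: str                            # placeholder values: "permit" or "deny"
    data_filtering_profile: Optional[str]  # profile referenced by the rule, if any


@dataclass
class IdentifiedContent:
    # For an application: application type and transmission direction.
    # For a file: transfer protocol, file type, and transmission direction.
    direction: str                         # placeholder values: "upload" or "download"
    application: Optional[str] = None
    protocol: Optional[str] = None
    file_type: Optional[str] = None


def needs_data_filtering(rule: Optional[SecurityRuleMatch]) -> bool:
    """Data filtering applies only when the traffic matches a security rule
    whose action is permit and that references a data filtering profile."""
    return (
        rule is not None
        and rule.action == "permit"
        and rule.data_filtering_profile is not None
    )


print(needs_data_filtering(SecurityRuleMatch("permit", "df_profile")))  # True
print(needs_data_filtering(SecurityRuleMatch("permit", None)))          # False
```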