Difference Between Content and Context in DLP
Content and context are two terms that come up often in discussions of data loss prevention (DLP). IT security administrators need to understand the difference between the two in order to properly evaluate the DLP solutions on the market.
Content awareness is often touted as a standard feature of DLP software. It refers to the software's ability to perform in-depth analysis of content using a broad array of techniques.
Content analysis is different from context analysis. Perhaps the best way to picture the distinction is to think of content as the furniture inside a house, and context as the structure of the house itself and its surroundings.
In a typical organisation, data context encompasses everything from the destination, recipient, source and sender to the size, headers, metadata and file format. In short, context is everything other than the data itself (i.e. the content). Institutions are best served by DLP solutions that provide both content and context analysis.
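To make the distinction concrete, the following Python sketch evaluates a transfer purely on its context: who is sending it, where it is going, how large it is and what format it is in. The `TransferContext` class, the `example.com` domain and the size and format thresholds are all hypothetical, chosen for illustration rather than taken from any DLP product. Note that the file's bytes are never opened.

```python
from dataclasses import dataclass

# Hypothetical context record: every field describes the transfer, not the data itself.
@dataclass
class TransferContext:
    sender: str
    recipient: str
    destination: str   # e.g. "email", "usb", "web-upload"
    size_bytes: int
    file_format: str   # e.g. "xlsx", "pdf"

INTERNAL_DOMAIN = "example.com"        # assumption: the organisation's mail domain
BLOCKED_DESTINATIONS = {"usb"}         # assumption: removable media is not allowed
MAX_EXTERNAL_SIZE = 10 * 1024 * 1024   # assumption: 10 MiB cap for external transfers
RISKY_FORMATS = {"db", "bak"}          # assumption: database dumps and backups

def is_external(address: str) -> bool:
    """True if the address is outside the (assumed) internal domain."""
    return not address.lower().endswith("@" + INTERNAL_DOMAIN)

def context_verdict(ctx: TransferContext) -> str:
    """Decide using context only; the content of the file is never inspected."""
    if ctx.destination in BLOCKED_DESTINATIONS:
        return "block"
    if is_external(ctx.recipient) and ctx.size_bytes > MAX_EXTERNAL_SIZE:
        return "block"
    if is_external(ctx.recipient) and ctx.file_format in RISKY_FORMATS:
        return "quarantine"
    return "allow"

# Usage: a 20 MiB spreadsheet emailed to an outside address is blocked on size alone.
print(context_verdict(TransferContext(
    "alice@example.com", "bob@partner.org", "email", 20 * 1024 * 1024, "xlsx")))
```

Because such a rule never looks inside the file, it is cheap to evaluate, but it would equally allow a small spreadsheet full of customer records to leave by email. That gap is what content analysis addresses.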
Content analysis, by contrast, looks at the content inside data containers. Its distinct advantage over context analysis is that sensitive data is sensitive regardless of where it resides. In other words, once an organisation classifies information as confidential or restricted, that information must be monitored and protected wherever it is located. It is the data that is being protected, not its context.
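A minimal sketch of a content rule, assuming payment card numbers as the sensitive category: a regular expression finds 13- to 19-digit candidates (optionally separated by spaces or hyphens) and the Luhn checksum filters out most random digit strings. The pattern and function names are illustrative, not any vendor's detector. Because the rule inspects the text itself, it fires the same way whether that text sits in an email, on a file share or on a USB stick.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")

def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0

def find_card_numbers(text: str) -> list[str]:
    """Return Luhn-valid card-like numbers found anywhere in the text."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(digits)
    return hits

print(find_card_numbers("Card: 4111 1111 1111 1111, ref 12345"))
```

The checksum step matters: a bare digit-count pattern would flag order numbers and phone numbers, whereas the Luhn test rejects roughly nine out of ten random digit strings.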
In theory, scanning data, classifying it and developing rules to protect it wherever it resides in the organisation sounds simple. However, the sheer quantity and complexity of enterprise data make this process far more time-consuming and difficult than contextual analysis.
Analysing the content of a plain-text email is relatively easy. Once administrators venture into more complicated binary files, however, the process is far less straightforward. This is why DLP solutions include file-cracking functionality, which allows them to open and interpret file content even when it is buried under multiple layers of encoding.
A good illustration of file cracking is the ability of DLP software to read the content of a compressed Microsoft PowerPoint presentation embedded in a Microsoft Excel file. To read it, the software has to open the Excel workbook, locate the compressed PowerPoint file, decompress it and then scan the PowerPoint content for sensitive information.
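The modern Office formats (.xlsx, .pptx) are ZIP packages, so the nesting described above can be sketched as a recursive walk over ZIP containers using Python's standard `zipfile` module. This is a simplified illustration under stated assumptions: it only descends into ZIP nesting (real embedded objects are often wrapped in OLE compound files, which it does not open), it treats every leaf as UTF-8 text, and the `crack` function, `MAX_DEPTH` limit and member paths in the usage example are hypothetical.

```python
import io
import zipfile

MAX_DEPTH = 5  # assumption: limit recursion depth; a real tool would also cap decompressed size

def crack(data: bytes, path: str = "<root>", depth: int = 0):
    """Yield (path, text) for every leaf, descending into nested ZIP containers."""
    buf = io.BytesIO(data)
    if depth < MAX_DEPTH and zipfile.is_zipfile(buf):
        with zipfile.ZipFile(buf) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                member = zf.read(info)  # decompresses the member
                yield from crack(member, f"{path}/{info.filename}", depth + 1)
    else:
        # Leaf: decode as UTF-8; undecodable bytes are replaced rather than dropped.
        yield path, data.decode("utf-8", errors="replace")

# Usage: build a "presentation" ZIP and embed it inside a "workbook" ZIP.
inner = io.BytesIO()
with zipfile.ZipFile(inner, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("ppt/slides/slide1.xml", "<a:t>Account 4111 1111 1111 1111</a:t>")
outer = io.BytesIO()
with zipfile.ZipFile(outer, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("xl/worksheets/sheet1.xml", "<c>42</c>")
    z.writestr("xl/embeddings/deck.pptx", inner.getvalue())

for member_path, text in crack(outer.getvalue()):
    print(member_path, "->", text)
```

Each yielded text can then be passed to a content rule such as a card-number detector; the path records exactly how deep in the nesting the match was found.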
Of course, this example may be a relatively easy scenario for most DLP products to handle. Certain file formats, such as encrypted PDF files and CAD files, are far more complex. Most DLP software nowadays supports more than 300 of the most commonly used file formats, so it is increasingly rare to encounter a format that is not covered.
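Supporting many formats starts with recognising which format a file actually is, ideally from its leading bytes ("magic numbers") rather than from its name. The sketch below uses a small, hypothetical signature table; the signatures themselves are the standard ones for PDF, ZIP, OLE2 compound files and PNG, but a real product would ship a far larger table and hand each type to a dedicated parser.

```python
# Assumption: a deliberately small signature table for illustration.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip-container"),                        # also .docx/.xlsx/.pptx
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2-compound"),  # legacy .doc/.xls/.ppt
    (b"\x89PNG\r\n\x1a\n", "png"),
]

def sniff_format(data: bytes) -> str:
    """Identify a file by its leading bytes rather than by its extension."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"

print(sniff_format(b"%PDF-1.7\n"))
print(sniff_format(b"just some text"))
```

Sniffing by content means that a workbook renamed from `report.xlsx` to `report.txt` is still recognised as a ZIP container and routed to the cracker, while anything that returns `"unknown"` falls through to a generic text-extraction path.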
Effective content analysis must also handle multiple languages (including the double-byte character sets used for Asian languages such as Chinese, Japanese and Korean) and be flexible enough to extract text from unknown file formats. As for encryption, DLP tools will quarantine or block encrypted files that they cannot read.
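The sketch below illustrates both requirements under stated assumptions. It tries a short, hypothetical list of candidate encodings (UTF-8 first, then the Japanese, Chinese and Korean codecs that ship with Python) and treats data with near-maximal byte entropy as unreadable, quarantining it rather than letting it through. Trial decoding and the 7.5 bits-per-byte threshold are simplifying assumptions: a codec that decodes without error is not proof that it is the right one, and compressed data also scores high, so a real tool would crack containers before applying such a test.

```python
import math
from collections import Counter
from typing import List, Optional

# Assumption: candidate encodings tried in order; UTF-8 first, then CJK codecs.
CANDIDATE_ENCODINGS = ["utf-8", "shift_jis", "gbk", "euc-kr"]
ENTROPY_THRESHOLD = 7.5  # assumption: bits per byte above which data looks encrypted

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input, at most 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def decode_text(data: bytes) -> Optional[str]:
    """Return the first successful decoding, or None if every candidate fails."""
    for enc in CANDIDATE_ENCODINGS:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    return None

def content_verdict(data: bytes, keywords: List[str]) -> str:
    if shannon_entropy(data) > ENTROPY_THRESHOLD:
        return "quarantine"  # looks encrypted: content cannot be read, so hold it
    text = decode_text(data)
    if text is None:
        return "quarantine"  # no candidate encoding worked
    return "block" if any(k in text for k in keywords) else "allow"

# Usage: Japanese text stored as Shift_JIS fails UTF-8 but decodes as Shift_JIS.
print(content_verdict("この資料は機密です".encode("shift_jis"), ["機密"]))
```

Ordering matters in the candidate list: permissive codecs placed early would "succeed" on bytes from another encoding and hide the keyword, which is why UTF-8, the strictest of these, is tried first.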