Describe information protection features
Microsoft 365’s information protection and governance features are built on the following principles:
- Data classification
- Data protection
- Data lifecycle management
These principles build on each other to support a holistic approach to data governance.
Data classification
With Microsoft Purview solutions, data classification is accomplished by applyinglabels (or sensitivity labels) to content objects such as files, email messages, and chats. Labels are metadata that act like virtual sticky notes, providing additional information about the content. Labels themselves have no data protection features, as they are only a classifying mechanism.
Labels can be applied manually by end users in many ways—such as during content creation with Microsoft 365 apps such as Word or Outlook or through the SharePoint Online and OneDrive for Business web apps. Labels can also be applied automatically, using sophisticated detection tools such as trainable classifiers and sensitive information types. You can learn more about labels at https://learn.microsoft.com/en-us/purview/sensitivity-labels.
Trainable classifiers are machine learning algorithms that have been trained to identify content based on certain characteristics, such as medical terminology, financial transaction data, or threatening language. Microsoft 365 comes with several built-in classifiers that will automatically scan organizational content and suggest what categories the content matches. See Figure 10.7:
Figure 10.7 – Trainable classifiers
After the scanning process, you can use Content explorer to view detected data and mark content as either matching or not matching a specific classifier. Microsoft 365 will update the detections as you mark data as relevant or not. You can create your own trainable classifiers in addition to using the prebuilt classifiers included with Microsoft 365. You can learn more about trainable classifiers at https://learn.microsoft.com/en-us/purview/classifier-learn-about.
Sensitive information types (also referred to as sensitive info types, or SITs) are content detection mechanisms based primarily on keyword, string, and pattern matches. Currently, Microsoft 365 comes with over 300 sensitive information types. Like trainable classifiers, you can also create your own sensitive information types based on organizational content. For example, if you have specific document control numbers or terminology that is used to describe certain types of content, you can create a custom sensitive information type to detect that content. You can also upload a document template (using a process called document fingerprinting) to define a sensitive information type. You can learn more about sensitive information types at https://learn.microsoft.com/ en-us/purview/sensitive-information-type-learn-about.
Exact data match (EDM) classifiers are another type of classifier that’s used in Microsoft 365. EDM classifiers are a more specific type of sensitive information type. Where trainable classifiers and sensitive information types may be used to detect patterns, EDM uses a finite dataset to determine whether content is a match. For example, an EDM classifier may be built from a list of customer names, product names, and numbers, or serial numbers. Implementing EDM and sensitive information types requires you to periodically refresh the source data that you’ll match against. This is accomplished through an EDM upload agent deployed on a workstation or server that has access to the data source and can transfer a hashed source table to the Microsoft 365 service. You can learn more about EDM sensitive information types at https://learn.microsoft.com/en-us/purview/sit-learn-about-exact-data-match-based-sits.