Synergy: Metadata
Our tagging mechanism employs metadata to classify documents and preserve essential internal information needed for its operation.
Because different file types store metadata in different ways, the agent adapts to each of them. Regardless of the storage mechanism, the information written remains consistent and follows the same formats on both macOS and Windows.
Here is an example of metadata written to a .png file, illustrating the structured way in which information is stored:
{
"GVData": "ew0KICAidGFnc2V0X2YxNGZjMWYxXzg5NTBfNDBkNV84YTI5XzQ1OTA5ZGE5NDdkNl9nZHByL3BpaSI6ICJGYWxzZSIsDQogICJ0YWdzZXRfZjE0ZmMxZjFfODk1MF80MGQ1XzhhMjlfNDU5MDlkYTk0N2Q2X3NlbnNpdGl2ZSI6ICJGYWxzZSIsDQogICJ0YWdzZXRf",
"GVData0": "MDA0ZGVhMzNfODc1MV80Mzk5X2E3NmVfOTVmMzcxY2I0MTE5X2Rpc3RyaWJ1dGlvbiI6ICJJbnRlcm5hbCIsDQogICJ0YWdzZXRfZTE2NDA5YTdfMTcwMF80MTUzXzkwOTBfMzk1NWJjMmYwYWU4X2NsYXNzaWZpY2F0aW9uIjogIkdlbmVyYWwgQnVzaW5lc3MiDQp9",
"GVData1": "(end)",
"Classification": "General Business",
"ClassificationTagSetId": "e16409a7-1700-4153-9090-3955bc2f0ae8",
"ClassificationValue": "General Business",
"DistributionTagSetId": "004dea33-8751-4399-a76e-95f371cb4119",
"DistributionValue": "Internal / \u0645\u0631\u062d\u0628\u0627 \u0628\u0627\u0644\u0639\u0627\u0644\u0645",
"FileId": "e0481ca0-a9e0-e307-07fa-6189581762a8",
"MSIP_Label_d2ee4459-2afa-4136-9d18-4f2ebca102cf_ActionId": "95871ebc-c143-40b9-9b42-ad7bd6bc77df",
"MSIP_Label_d2ee4459-2afa-4136-9d18-4f2ebca102cf_ContentBits": "3",
"MSIP_Label_d2ee4459-2afa-4136-9d18-4f2ebca102cf_Enabled": "true",
"MSIP_Label_d2ee4459-2afa-4136-9d18-4f2ebca102cf_Method": "Privileged",
"MSIP_Label_d2ee4459-2afa-4136-9d18-4f2ebca102cf_Name": "General Business",
"MSIP_Label_d2ee4459-2afa-4136-9d18-4f2ebca102cf_SetDate": "2023-08-02T11:17:28Z",
"MSIP_Label_d2ee4459-2afa-4136-9d18-4f2ebca102cf_SiteId": "ed86fd3a-ab24-4113-a9f8-6cb38f63c190",
"TagDateTime": "2023-08-02T11:17:28Z",
"UserId": "trzec"
}
This example demonstrates how the agent builds the metadata that gets attached to a classified file.
Metadata Entries
The agent is designed to write various types of metadata entries, each serving a specific purpose:
GVData, GVData0, GVData1, etc. - These metadata entries encapsulate internal data used by the agent, such as the document ID and written visual labels. This metadata is Base64-encoded, is integral to the agent's operations, and is not configurable. It is always written by the agent.
MSIP_Label_* - This category of metadata is configurable and is written to ensure compatibility with Microsoft Azure Information Protection (AIP). It allows the agent to align with Microsoft's security and protection frameworks.
Configurable Tags - Apart from the above fixed metadata entries, the agent supports configurable tags. These can be tailored according to specific needs within the agent's configuration, providing flexibility in handling and storing additional information.
This design allows for a high degree of customization in how metadata is written, catering to diverse requirements and integration scenarios.
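As a rough illustration of how the GVData entries could be read back, the sketch below assumes (based only on the .png example above) that the chunks GVData, GVData0, GVData1, ... are concatenated in order, that the literal value "(end)" marks the end of the sequence, and that the joined Base64 string decodes to UTF-8 JSON. The function name decode_gvdata is hypothetical and not part of the agent.

```python
import base64
import json


def decode_gvdata(metadata: dict) -> dict:
    """Join GVData chunks in order and decode the Base64 payload as JSON.

    Assumption: chunk order is GVData, GVData0, GVData1, ... and the
    literal value "(end)" terminates the sequence, as in the .png example.
    """
    chunks = []
    key = "GVData"
    index = 0
    while key in metadata and metadata[key] != "(end)":
        chunks.append(metadata[key])
        key = f"GVData{index}"
        index += 1
    if not chunks:
        # No internal data present in this metadata dictionary.
        return {}
    payload = base64.b64decode("".join(chunks))
    return json.loads(payload.decode("utf-8"))
```

Joining the chunks before decoding means an individual chunk does not have to be valid Base64 on its own.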
AIP Metadata Configuration
The agent is able to write metadata that aligns with AIP's standards, allowing seamless interaction with Microsoft's security framework.
Below is an example of a configuration that demonstrates how the agent can be configured to write AIP metadata.
For agent v4, the syntax differs slightly from older versions but remains very similar. The following goes within the “global” part of the configuration:
"aipConfiguration": {
"siteId": "7bd98728-3093-47f4-921e-0e70b5a54fe6",
"labels": [
{
"labelId": "b0426751-caad-47fa-9f7b-ab3ecdf2a453",
"name": "Public",
"method": "Privileged",
"contentBits": "3"
},
{
"labelId": "e8febc5f-7679-4c23-8457-a2d9c0c83853",
"name": "General Business",
"method": "Privileged",
"contentBits": "3"
},
{
"labelId": "cefd4509-260d-4ab3-9a12-e8e78560da3c",
"name": "Confidential",
"method": "Privileged",
"contentBits": "3"
},
{
"labelId": "9bc3c901-3c35-4a56-ade8-f1e58ef4ecba",
"name": "Restricted",
"method": "Privileged",
"contentBits": "3"
}
],
"mappings": [
{
"classificationTag": "Public",
"labelId": "b0426751-caad-47fa-9f7b-ab3ecdf2a453"
},
{
"classificationTag": "Internal",
"labelId": "e8febc5f-7679-4c23-8457-a2d9c0c83853"
},
{
"classificationTag": "Confidential",
"labelId": "cefd4509-260d-4ab3-9a12-e8e78560da3c"
},
{
"classificationTag": "Restricted",
"labelId": "9bc3c901-3c35-4a56-ade8-f1e58ef4ecba"
}
]
},
This configuration includes specific AIP labels and their corresponding classifications, defining how the agent translates its internal categorizations into a format that AIP can recognize.
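To make the translation concrete, the following sketch looks up a classification tag in "mappings", finds the matching entry in "labels", and builds MSIP_Label_* entries shaped like those in the .png example. The function name build_msip_entries is hypothetical, and the assumptions that ActionId is a fresh UUID per write and that Enabled is always "true" are inferred from the example rather than confirmed by the agent's implementation.

```python
import uuid
from datetime import datetime, timezone


def build_msip_entries(aip_config: dict, classification_tag: str, now: datetime) -> dict:
    """Return MSIP_Label_* entries for a classification tag, or {} if unmapped."""
    mapping = next(
        (m for m in aip_config["mappings"] if m["classificationTag"] == classification_tag),
        None,
    )
    if mapping is None:
        return {}
    label = next(
        (entry for entry in aip_config["labels"] if entry["labelId"] == mapping["labelId"]),
        None,
    )
    if label is None:
        return {}
    prefix = f"MSIP_Label_{label['labelId']}_"
    return {
        prefix + "ActionId": str(uuid.uuid4()),  # assumption: new id per write
        prefix + "ContentBits": label["contentBits"],
        prefix + "Enabled": "true",  # assumption: taken from the example
        prefix + "Method": label["method"],
        prefix + "Name": label["name"],
        prefix + "SetDate": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        prefix + "SiteId": aip_config["siteId"],
    }
```

With the configuration above, the classification tag "Internal" would resolve to the label named "General Business", since the mapping is by labelId rather than by name.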
How to get MIP labels from Azure
We can get the labels configured in the customer’s Azure using the Fetch MIP labels from Azure utility in power tools.
It accepts all the required details from the user (tenantId, appName, clientId, clientSecret, emailId) as inputs and prints the available MIP labels as output.
The output format:
This utility uses application permissions in the Azure app, and the following API permissions need to be granted to it (with admin consent):
Microsoft Graph → InformationProtectionPolicy.Read.All
Microsoft Information Protection Sync Service → UnifiedPolicy.Tenant.Read
Configurable Tags
The agent also provides the functionality to write fully customized metadata entries.
Below is an example of a configuration that defines a variety of custom metadata tags:
The tagHandle field within this configuration supports various placeholders that facilitate dynamic tagging:
{classification} - current classification value of the document
{distribution} - current distribution value of the document
{compliance} - current compliance value of the document
{datetime} - current date and time
{email} - email of the current user (only works when the Outlook plugin is installed)
{user} - id of the current user
{machineid} - id of the current machine
{fileid} - unique file id
{classification_raw} - classification tag value without the tag alias
{compliance_raw} - compliance tag value without the tag alias
{distribution_raw} - distribution tag value without the tag alias
{classification_guid} - unique id generated from the tagset id and tag name
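The placeholder substitution can be pictured with the short sketch below. It is not the agent's implementation: the function name expand_tag_handle and the values dictionary are hypothetical, and leaving unknown placeholders untouched is an assumption made for the illustration.

```python
import re


def expand_tag_handle(template: str, values: dict) -> str:
    """Replace {name} placeholders with entries from values.

    Placeholders without a matching entry are left unchanged (assumption).
    """
    def replace(match: re.Match) -> str:
        name = match.group(1)
        return str(values.get(name, match.group(0)))

    return re.sub(r"\{(\w+)\}", replace, template)
```

For example, expand_tag_handle("{classification} / {user}", {"classification": "Confidential", "user": "trzec"}) yields "Confidential / trzec".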
Writing Metadata to files
As mentioned at the beginning of this page, the agent uses a slightly different approach to write and read metadata for each file type. This is necessary because file types differ in how they store metadata. The following outlines the approaches used for different file categories:
.zip
For .zip files, the agent creates a file named GV_metadata.json within the classified zip file. The format is analogous to the example provided earlier.
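Reading that file back needs nothing beyond a standard zip library. The sketch below assumes GV_metadata.json sits at the root of the archive and contains UTF-8 JSON; the function name read_zip_metadata is hypothetical.

```python
import json
import zipfile


def read_zip_metadata(source):
    """Return the parsed GV_metadata.json from a zip, or None if it is absent.

    source may be a file path or a binary file-like object.
    Assumption: the file is stored at the archive root as UTF-8 JSON.
    """
    with zipfile.ZipFile(source) as archive:
        if "GV_metadata.json" not in archive.namelist():
            return None
        raw = archive.read("GV_metadata.json")
    # utf-8-sig tolerates an optional byte order mark.
    return json.loads(raw.decode("utf-8-sig"))
```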
.avi, .wav
ID3v2 tags are used for .avi and .wav files. These tags are variable in size and are placed at the beginning of the file, so metadata can be loaded immediately, even while the file is being streamed incrementally. An ID3v2 tag comprises multiple optional frames, each holding up to 16 MB of metadata.
The written metadata can be accessed via the tag comment-pol.
.mp4, .m4p, .m4v
These file types make use of the ILST tag format. Further details can be found on the Apple metadata page.
The written metadata can be accessed via the tag comment.
.mov
These files leverage XMP tags. XMP, or Extensible Metadata Platform, is an Adobe-created standard for embedding metadata within digital files. This approach ensures standardized, structured metadata embedding that's readily extendable, encompassing information such as creator, copyright, editing data, and more.
The written metadata can be accessed via the tag classification.
.gif, .jpg, .jpeg, .png, .tiff, .tif
These files also leverage XMP tags to write metadata. The written metadata can be accessed via the tags description and user_comment.
.vsdx, .docx, .xlsx, .xlsm, .pptx
Microsoft Office allows Custom Properties to be written to these documents, stored internally in XML format. The metadata is written as separate key/value pairs instead of JSON.
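In Office Open XML packages, custom properties are stored in the part docProps/custom.xml, so the written key/value pairs can be inspected with a zip reader and an XML parser. The sketch below is one way to do that with the Python standard library; the function name read_custom_properties is hypothetical, and it assumes each property holds a single typed value element.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the custom properties part in Office Open XML packages.
CUSTOM_NS = "{http://schemas.openxmlformats.org/officeDocument/2006/custom-properties}"


def read_custom_properties(source) -> dict:
    """Return custom property names and values from an OOXML document.

    source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        if "docProps/custom.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/custom.xml"))
    properties = {}
    for prop in root.iter(f"{CUSTOM_NS}property"):
        # The value is the text of the first child, e.g. <vt:lpwstr>.
        value = prop[0].text if len(prop) else None
        properties[prop.get("name")] = value or ""
    return properties
```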
.doc, .xls, .ppt
The older binary formats also support Custom Properties, but without XML storage. The metadata is written as separate key/value pairs instead of JSON.
Currently not supported on macOS.
.dxf, .dwg
For CAD files, custom summary info is used to store metadata.
Currently not supported on macOS.
.pdf
For PDF files, the metadata is written into the Document Information Dictionary. The metadata is written as separate key/value pairs instead of JSON.
Classified as Getvisibility - Partner/Customer Confidential