Data Mining Through Metadata Analysis
2026-05-13 09:36
“More than 80 percent of all data in companies exists in unstructured form, and most companies currently have no way to make sustainable use of that data, its content, and, above all, its value. Without a detailed metadata analysis, the data becomes worthless after only a short time because its content can no longer be traced. With the Metadata Hub, the potential of large volumes of data can be tapped quickly and easily,” explains Herbert Grau, Managing Director of GRAU DATA GmbH.
The Metadata Hub detects, analyzes, and processes “embedded” metadata from unstructured data on file systems of any size. It can process over 320 file formats and read more than 50,000 different metadata tags in a matter of seconds. “Embedded” metadata contains much more comprehensive information than standard file system metadata. Unlike solutions that are typically limited to specific file formats, the Metadata Hub allows for cross-company and cross-departmental analysis of all file formats.
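To illustrate the difference between the two kinds of metadata, the following minimal sketch compares what the file system knows about a WAV audio file with what is stored inside the file itself. It uses only the Python standard library and is not the Metadata Hub's implementation; the function names filesystem_metadata and embedded_wav_metadata are hypothetical and chosen for this example only.

```python
import os
import wave


def filesystem_metadata(path):
    """Return what the file system records about a file (size, timestamp)."""
    st = os.stat(path)
    return {"size_bytes": st.st_size, "modified": st.st_mtime}


def embedded_wav_metadata(path):
    """Return metadata stored inside a WAV file's own header."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "sample_rate_hz": w.getframerate(),
            "frames": w.getnframes(),
        }
```

The file system can only report attributes such as size and modification time, while the embedded header describes the content itself, here the channel count, sample rate, and length of the recording.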
Universally applicable and infinitely scalable
The Metadata Hub is platform-independent and can be quickly and easily integrated into virtually any IT infrastructure. It is controlled via a browser-based web interface. The Metadata Hub scales by installing multiple hubs in parallel, all of which are managed via the central WebUI. This allows it to be used in companies of any size and with any volume of files, from traditional mid-sized companies to corporations or large research organizations with billions of files.
The core component of the Metadata Hub is the intelligent file system crawler & harvester (metadata collector), which continuously extracts the embedded metadata from the files. The crawler & harvester retrieves all “embedded” metadata via NFS or SMB and extracts millions of tags in a very short time. The tags are stored in a specially designed database immediately after extraction. The metadata is then available in a structured format, for example for analysis or queries. A GraphQL-based API, a native Python SDK, and a comprehensive command-line interface also enable seamless integration with third-party solutions for automated big data processing.
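The crawl, extract, store, and query flow described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Metadata Hub's crawler, database, or SDK: the functions harvest and files_with_tag and the tags table are hypothetical, SQLite stands in for the product's database, and plain file attributes stand in for embedded tags. An NFS or SMB share mounted on the host would be walked like any other directory.

```python
import os
import sqlite3


def harvest(root, conn):
    """Walk a directory tree and store one row per (file, tag, value)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tags (path TEXT, tag TEXT, value TEXT)"
    )
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            # Stand-in tags; a real harvester would parse each file format.
            rows = [
                (path, "extension", os.path.splitext(name)[1].lower()),
                (path, "size_bytes", str(st.st_size)),
            ]
            conn.executemany("INSERT INTO tags VALUES (?, ?, ?)", rows)
            count += 1
    conn.commit()
    return count


def files_with_tag(conn, tag, value):
    """Return the paths of all files whose tag has the given value."""
    cur = conn.execute(
        "SELECT path FROM tags WHERE tag = ? AND value = ? ORDER BY path",
        (tag, value),
    )
    return [row[0] for row in cur]
```

Once the tags are in a table, a question such as "which files have the extension .pdf" becomes a single query, for example files_with_tag(conn, "extension", ".pdf"), instead of a new scan of the file system.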