Data catalogs have quickly become a core component of modern data management. Organizations with successful data catalog implementations see remarkable changes in the speed and quality of data analysis, and in the engagement and enthusiasm of people who need to perform data analysis. By contrast, organizations without a data catalog often have these questions: What is a data catalog? Why do we need a data catalog? What does a data catalog do? These are all good questions and a logical place to start your data cataloging journey.

What is a Data Catalog?

A data catalog is a collection of metadata, combined with data management and search tools, that helps analysts and other data users to find the data that they need, serves as an inventory of available data, and provides information to evaluate fitness of data for intended uses. This brief definition makes several points about data catalogs (data management, searching, data inventory, and data evaluation), but all depend on the central capability to provide a collection of metadata.

Data catalogs have become the standard for metadata management in the age of big data and self-service business intelligence. The metadata that we need today is more expansive than metadata in the BI era. A data catalog focuses first on datasets (the inventory of available data) and connects those datasets with rich information to inform people who work with data. Figure 1 illustrates the typical metadata subjects contained in a data catalog.

Datasets are the files and tables that data workers need to find and access. They may reside in a data lake, warehouse, master data repository, or any other shared data resource. People metadata describes those who work with data: consumers, curators, stewards, subject matter experts, etc. Search metadata supports tagging and keywords to help people find data. Processing metadata describes transformations and derivations that are applied as data is managed through its lifecycle. Supplier metadata is especially important for data acquired from external sources, informing about sources and subscription or licensing constraints. I’ll dive deeper into catalog metadata in an upcoming blog.

A modern data catalog includes many features and functions that all depend on the core capability of cataloging data: collecting the metadata that identifies and describes the inventory of shareable data. It is impractical to attempt cataloging as a manual effort. Automated discovery of datasets, both for the initial catalog build and for ongoing discovery of new datasets, is essential.
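
To make the metadata subjects described in this post more concrete, here is a minimal Python sketch of a catalog record that groups dataset, people, search, processing, and supplier metadata, plus a simple keyword search. The names `DatasetEntry` and `DataCatalog`, their fields, and the sample datasets are hypothetical and chosen only for illustration; real catalog products model far more than this.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class DatasetEntry:
    """One hypothetical catalog record: a dataset plus its related metadata."""
    name: str
    location: str  # e.g. data lake, warehouse, master data repository
    tags: Set[str] = field(default_factory=set)            # search metadata
    people: Dict[str, str] = field(default_factory=dict)   # person -> role
    processing: List[str] = field(default_factory=list)    # transformations applied
    supplier: Optional[str] = None        # external source, if any
    license_terms: Optional[str] = None   # subscription or licensing constraints


class DataCatalog:
    """Illustrative in-memory inventory of dataset entries."""

    def __init__(self) -> None:
        self._entries: Dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        # Store (or overwrite) the entry under its dataset name.
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> List[str]:
        # Match the keyword against dataset names and tags, case-insensitively.
        kw = keyword.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if kw in e.name.lower() or kw in {t.lower() for t in e.tags}
        )


catalog = DataCatalog()
catalog.register(DatasetEntry(
    name="customer_orders",
    location="warehouse",
    tags={"sales", "orders"},
    people={"A. Steward": "steward"},
    processing=["deduplicated"],
))
catalog.register(DatasetEntry(
    name="weather_feed",
    location="data lake",
    tags={"external"},
    supplier="Example Weather Co.",  # made-up supplier for illustration
    license_terms="subscription",
))

print(catalog.search("sales"))  # ['customer_orders']
```

In practice, entries like these would be populated by automated discovery rather than registered by hand, which is the point the post makes about manual cataloging being impractical.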