07/04: Data Platform Buzzwords: Introduction and So What?

Category: Buz Words
Posted by: bagheljas
Modern Enterprise Data Platforms are a conglomerate of Business Requirements, Architectures, Tools & Technologies, Frameworks, and Processes to provide Data Services.

Data Platforms are mission-critical to an enterprise, irrespective of size and industry sector. Data Platform is the foundation for Business Intelligence & Artificial Intelligence to deliver sustainable competitive advantage for business operations excellence and innovation.

  • Data Store is a repository for persistently storing and managing collections of structured & unstructured data, files, and emails. Data Warehouse, Data Lake, and Data Lakehouse with Data Navigator are the specialized Data Stores implementations to deliver Modern Enterprise Data Platforms. Data Deduplication is a technique for eliminating duplicate copies of repeating data. Successful implementations of Data Deduplication optimize storage & network capacity needs of Data Stores is critical in managing the cost and performance of Modern Enterprise Data Platforms.

    • Data Warehouse enables the categorization and integration of enterprise structured data from heterogeneous sources for future use. The operational schemas are prebuilt for each relevant business requirement, typically following the ETL (Extract-Transform-Load) process in a Data Pipeline. Enterprise Data Warehouse Operational Schemas updates could be challenging and expensive; Serves best for Data Analytics & Business Intelligence but are limited to particular problem-solving.

    • Data Lake is an extensive centralized data store that hosts the collection of semi-structured & unstructured raw data. Data Lakes enable comprehensive analysis of big and small data from a single location. Data is extracted, loaded, and transformed (ELT) at the moment when it is necessary for analysis purposes. Data Lake makes historical data in its original form available to researchers for operations excellence and innovation anytime. Data Lakes integration for Business Intelligence & Data Analytics could be complex; best for Machine Learning and Artificial Intelligence tasks.

    • Data Lakehouse combines the best elements of Data Lakes & Data Warehouses. Data Lakehouse provides Data Storage architecture for organized, semi-structured, and unstructured data in a single location. Data Lakehouse delivers Data Storage services for Business Intelligence, Data Analytics, Machine Learning, and Artificial Intelligence tasks in a single platform.

    • Data Mart is a subset of a Data Warehouse usually focused on a particular line of business, department, or subject area. Data Marts make specific data available to a defined group of users, which allows those users to quickly access critical insights without learning and exposing the Enterprise Data Warehouse.

  • Data Mesh:: The Enterprise Data Platform hosts data in a centralized location using Data Stores such as Data Lake, Data Warehouse, and Data LakeHouse by a specialized enterprise data team. The monolithic Data Centralization approach slows down adoption and innovation. Data Mesh is a sociotechnical approach for building distributed data architecture leveraging Business Data Domain that provides Autonomy to the line of business. Data Mesh enables cloud-native architectures to deliver data services for business agility and innovation at cloud speed. Data Mesh is emerging as a mission-critical data architecture approach for enterprises in the era of Artificial Intelligence. The Data Mesh adoption enables the Enterprise Journey to Data Democratization.

  • Data Pipeline is a method that ingests raw data into Data Store such as Data Lake, Data Warehouse, or Data Lakehouse for analysis from various Enterprise Data Sources. There are two types of Data pipelines; first batch processing and second streaming data. The Data Pipeline Architecture core step consists of Data Ingestion, Data Transformation (sometimes optional), and Data Store.

  • Data Fabric is an architecture and set of data services that provide consistent capabilities that standardize data management practices and practicalities across the on-premises, cloud, and edge devices.

  • Data Ops is a specialized Dev Ops / Dev Sec Ops that demands collaboration among Dev Ops teams with data engineers & scientists for improving the communication, integration, and automation of data flows between data managers and data consumers across an enterprise. Data Ops is emerging as a mission-critical methodology for enterprises in the era of business agility.

  • Data Security needs no introduction. Enterprise Data Platforms must address the Availability and Recoverability of the Data Services for Authorized Users only. Data Security Architecture establishes and governs the Data Storage, Data Vault, Data Availability, Data Access, Data Masking, Data Archive, Data Recovery, and Data Transport policies that comply with Industry and Enterprise mandates.

  • Data Hub architecture enables data sharing by connecting producers with consumers at a central data repository with spokes that radiate to data producers and consumers. Data Hub promotes ease of discovery and integration to consume Data Services.



Disclaimer

The thoughts expressed in the blog are those of the author and do not represent necessarily the official policy or position of any other agency, organization, employer, or company. Assumptions made in the study are not reflective of the point of view(s) of any entity other than the author. Since we are critically thinking human beings, the point of view(s) is always subject to change, revision and rethinking at any time. While reasonable efforts have been made to obtain accurate information, the author makes no warranty, expressed or implied as to its accuracy.