Data fabrics are relatively new constructs. They have their origins largely in data management, but they also tie into storage, platform integration, and other areas. They provide a data architecture framework designed to make data management more agile in a complex, diverse, and distributed environment.
What Is Data Fabric?
To be a fabric, data “must have redundancy of pathways and not be dependent upon a single point to point connection, so if one connection is overloaded with data or otherwise unavailable, there are other pathways to the destination,” according to Datamation, which likens data fabrics to the design of the internet.
That said, data fabric is very much an emerging area of the IT landscape, so architectures and approaches vary widely. Many data fabrics use components such as metadata discovery, metadata activation, data cataloging, knowledge graphs for semantic modeling, and orchestration.
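The redundancy requirement in the Datamation description can be illustrated with a minimal Python sketch. It is not taken from any product: the names `fetch_with_failover`, `PathwayUnavailable`, `overloaded_link`, and `backup_link` are hypothetical, and each "pathway" is just a function standing in for a network route to the same data.

```python
from typing import Callable, Sequence


class PathwayUnavailable(Exception):
    """Raised by a pathway that is overloaded or otherwise unreachable."""


def fetch_with_failover(pathways: Sequence[Callable[[str], str]], key: str) -> str:
    """Try each pathway in order and return the first successful result."""
    errors = []
    for pathway in pathways:
        try:
            return pathway(key)
        except PathwayUnavailable as exc:
            # Remember why this route failed, then fall through to the next one.
            errors.append(str(exc))
    raise PathwayUnavailable(f"all pathways failed for {key!r}: {errors}")


def overloaded_link(key: str) -> str:
    # Hypothetical primary route that is currently unavailable.
    raise PathwayUnavailable("primary link overloaded")


def backup_link(key: str) -> str:
    # Hypothetical secondary route that reaches the same destination.
    return f"value-of-{key}"


result = fetch_with_failover([overloaded_link, backup_link], "customer/42")
print(result)  # value-of-customer/42
```

The request succeeds even though the first route fails, which is the property the definition asks for: no dependence on a single point-to-point connection.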
Here are several applications of data fabric:
- Source data from data warehouses, flat files, XML, or web applications can be cataloged within the data catalog/metadata layer.
- Connected knowledge graphs with analytics can activate metadata within the knowledge graph and metadata analysis layer.
- AI and machine learning algorithms, enriched with activated metadata, can help to simplify and automate data integration design.
- Data can be dynamically integrated for data consumers and delivered to them in ways that satisfy various usage requirements.
- Data scientists, application developers, and BI developers gain clean and accurate data when they need it.
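As a rough illustration of the cataloging step in the list above, the following Python sketch registers a few sources in an in-memory catalog and looks them up by column or tag. It is a simplified assumption of what a metadata layer does, not any vendor's implementation; `CatalogEntry`, `DataCatalog`, and the sample source names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    source_type: str  # e.g. "warehouse", "flat_file", "xml", "web_app"
    columns: list[str]
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """Tiny in-memory metadata catalog keyed by source name."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def find_by_column(self, column: str) -> list[str]:
        # Names of all sources that expose the given column.
        return sorted(e.name for e in self._entries.values() if column in e.columns)

    def find_by_tag(self, tag: str) -> list[str]:
        # Names of all sources carrying the given tag.
        return sorted(e.name for e in self._entries.values() if tag in e.tags)


catalog = DataCatalog()
catalog.register(CatalogEntry("sales_dw", "warehouse", ["customer_id", "order_total"], {"finance"}))
catalog.register(CatalogEntry("crm_export", "flat_file", ["customer_id", "email"], {"pii"}))
catalog.register(CatalogEntry("product_feed", "xml", ["sku", "price"]))

print(catalog.find_by_column("customer_id"))  # ['crm_export', 'sales_dw']
print(catalog.find_by_tag("pii"))             # ['crm_export']
```

A lookup such as `find_by_column("customer_id")` hints at how cataloged metadata can suggest which sources are candidates for integration, the kind of task the list says AI and machine learning can help automate.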
Top Data Fabric Software Solutions
ServerWatch evaluated many different vendors in the data fabric space. Here are our top picks in no particular order.
IBM Cloud Pak for Data simplifies and automates the collection, organization, and analysis of data, accelerating the infusion of AI throughout a business. It can connect data everywhere, run workloads anywhere, and build, deploy, and manage AI at scale in hybrid cloud environments.
- AI-based capabilities for data management and governance
- Orchestrates and optimizes data processing based on workloads, data locality, and data policies
- Metadata-based knowledge core facilitates the discovery of data sources and catalogs, enriches data assets, and performs analysis to extract insight
- The knowledge core also provides semantic search
- Enables data consumption by extracting, virtualizing, transforming, and streaming data
- Curates metadata, defines data policies for privacy, curates data, captures data lineage, and performs other tasks related to security and compliance
- Analytic models in different tools can talk to one another
TerminusDB is a cloud-based document-oriented knowledge graph data management platform. It combines the power of knowledge graphs with the simplicity of documents.
- Build, execute, monitor, and share versioned data products that can be domain-specific and connect to other data products
- Version control to enable data teams to collaborate on the same asset at the same time
- Catalog the knowledge graph with metadata across documents and sub-documents, as well as a full commit history
- Document-oriented knowledge graph can have multiple analytical endpoints and operational APIs to maximize the impact and efficiency of AI and ML
- Immutable data store with the ability to travel to any point of time within the data history
- Ability to build a knowledge graph of linked JSON documents to show data structure and relationships
- Take data from multiple sources, from data lakes to flat files
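To make the ideas of an immutable store, commit history, and linked JSON documents more concrete, here is a minimal Python sketch. It does not use or imitate the TerminusDB API; `VersionedStore`, its methods, and the `employer` link field are hypothetical names chosen for illustration, and storing a full snapshot per commit is a simplifying assumption.

```python
import copy
from typing import Any, Optional


class VersionedStore:
    """Append-only store: each commit is a full snapshot that is never modified."""

    def __init__(self) -> None:
        self._commits: list[tuple[str, dict[str, Any]]] = [("initial", {})]

    def commit(self, message: str, doc_id: str, document: dict[str, Any]) -> int:
        # Copy the latest snapshot, apply the change, and append it as a new commit.
        snapshot = copy.deepcopy(self._commits[-1][1])
        snapshot[doc_id] = copy.deepcopy(document)
        self._commits.append((message, snapshot))
        return len(self._commits) - 1  # commit number

    def get(self, doc_id: str, at: Optional[int] = None) -> Optional[dict[str, Any]]:
        # Read a document as of commit `at` (defaults to the latest commit).
        index = len(self._commits) - 1 if at is None else at
        return copy.deepcopy(self._commits[index][1].get(doc_id))

    def log(self) -> list[str]:
        return [message for message, _ in self._commits]


store = VersionedStore()
c1 = store.commit("add org", "org/acme", {"name": "Acme"})
c2 = store.commit("add person", "person/ada", {"name": "Ada", "employer": "org/acme"})
c3 = store.commit("rename org", "org/acme", {"name": "Acme Corp"})

person = store.get("person/ada")
employer_now = store.get(person["employer"])         # follow the link between documents
employer_then = store.get(person["employer"], at=c1)  # same link, read at an earlier commit

print(employer_now["name"])   # Acme Corp
print(employer_then["name"])  # Acme
print(store.log())            # ['initial', 'add org', 'add person', 'rename org']
```

Because earlier snapshots are never overwritten, reading "as of" any commit stays possible, which is the time-travel behavior described in the list.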
Talend Data Fabric is a unified platform for data ingestion, integration, governance, and sharing that can simplify building a data hub. The Talend Trust Score furthers this focus by providing an at-a-glance assessment of your data’s health — its quality, relevance, and popularity.
- With 1,000+ built-in components and connectors, datasets can be captured, standardized, and cleaned
- Data cataloging techniques discover and automatically profile, document, and categorize the incoming data
- Self-service tools ingest data from almost any source, and integrated preparation functionality ensures that data is rapidly usable
- Data governance is applied to data ownership, certification, stewardship, and remediation
- Automated quality checks and browser-based, point-and-click tools for sharing and capturing
- Data Quality profiles clean and mask data in real time
- Machine learning powers recommendations for addressing data quality issues
- Data can also be enriched through analytics, leveraging tools like Spark, Python, Databricks, RapidMiner, and Qubole
- The Talend Trust Score provides visibility into the reliability of any dataset
NetApp offers the technologies and expertise to build a data fabric and shape a strategy around distinct requirements and goals. It provides visibility into the full stack, with visualizations and tools to remediate problems automatically.
- Integrations bring the endpoints of data fabric together to monitor and manage it all from a single control panel
- Automations reduce the number of manual tasks from dev/test to provisioning
- Tools continuously optimize the infrastructure and enable scaling
- Backup and recovery solutions simplify end-to-end data protection in the event of an outage, disaster, or ransomware attack
- AI is used to deliver security
- Efficient data transfers for cloud-based backup and recovery
- The ability to readily move data to and from hyperscale clouds
- The ability to eliminate silos by enabling data to flow as applications are moved
- A unified software-defined architecture, so data management remains consistent across cloud services
SAP’s enterprise data fabric solution consists of capabilities from the SAP Business Technology Platform, anchored by SAP Data Intelligence and SAP HANA. SAP Data Intelligence enables the management of IoT data streams, the creation of data warehouses, and the operationalization of scalable machine learning. It enables business applications to provide a holistic, unified way to manage, integrate, and process all enterprise data and metadata. It delivers self-service capabilities for data preparation, active metadata management, and data quality.
- Perform column actions, derive new information, harmonize disparate sources, merge or combine multiple datasets
- A data catalog automatically identifies content, such as the data types or personal information, and tags information for these content types when the metadata is extracted
- Graph engines can be used to identify and integrate connected data
- Parse, standardize, and validate attributes such as name and address; perform geocoding; and identify duplicates and relationships between entities
- SAP HANA can virtualize access to data to federate queries to external data sources, such as other databases, Web services, files, Apache Hadoop, and Apache Spark, to perform queries without data movement
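The idea of querying across separate sources without first copying the data can be sketched, very loosely, with Python's built-in `sqlite3` module. This is only an analogy and says nothing about how SAP HANA implements federation: two independent in-memory SQLite databases stand in for an internal and an external source, and the schema name `remote_db` and both tables are hypothetical.

```python
import sqlite3

# The main connection plays the role of the local database.
conn = sqlite3.connect(":memory:")

# ATTACH makes a second, separate database visible under the schema name remote_db.
conn.execute("ATTACH DATABASE ':memory:' AS remote_db")

conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE remote_db.orders (customer_id INTEGER, amount REAL)")

conn.executemany("INSERT INTO main.customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany(
    "INSERT INTO remote_db.orders VALUES (?, ?)",
    [(1, 20.0), (1, 5.0), (2, 12.5)],
)

# One query joins tables that live in two different databases.
rows = conn.execute(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM main.customers AS c
    JOIN remote_db.orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

print(rows)  # [('Ada', 25.0), ('Grace', 12.5)]
conn.close()
```

The caller writes a single query and never copies `orders` into the local database. A production federation layer would additionally handle remote connectivity, differing SQL dialects, and pushing work down to the external source.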
Informatica can scale any enterprise workload with elastic and serverless processing in order to gain fast insights when applying AI and ML to data and metadata. At the heart of it is an AI engine called CLAIRE that can learn the data landscape to automate thousands of manual tasks and augment human activity with recommendations and insights.
- Can run, interoperate, and support any combination of cloud and hybrid infrastructures
- Low-code/no-code experience that offers security, data quality, data governance, and privacy
- 200+ data services offered
- Optimize performance with AI/ML-driven operational monitoring and predictive insights
- End-to-end cloud data management and a microservices architecture
Komprise Intelligent Data Management creates a global metadata index that can analyze storage resources across multiple data centers and clouds. It enables storage teams to collaborate with the owners of the data to determine optimal data management policies, to tier or archive infrequently accessed data, and to migrate or replicate data or identify obsolete data for deletion.
- Can be deployed as SaaS, on premises, or hybrid; the admin console and metadata index are maintained in the cloud, and data movers are deployed adjacent to the data
- Transparent Move Technology (TMT) enables Komprise to move data while maintaining transparent access for end users and applications without the need to install agents or proprietary stubs
- Elastic Grid, made of stateless data movers (Observers), can maintain file system access to data after moving to secondary storage targets, such as cloud storage
- Deep Analytics Actions eliminate the manual effort of finding custom data sets and moving them separately from different storage silos
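A policy for tiering infrequently accessed data can be sketched in a few lines of Python. This is a generic illustration under assumed inputs, not Komprise's logic or API: `FileRecord`, `select_cold_files`, the sample paths, and the 180-day threshold are all hypothetical, and a real metadata index would supply the records.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FileRecord:
    path: str
    size_bytes: int
    last_accessed: datetime


def select_cold_files(
    records: list[FileRecord], now: datetime, max_idle: timedelta
) -> list[FileRecord]:
    """Return files that have not been accessed within the max_idle window."""
    return [r for r in records if now - r.last_accessed > max_idle]


# Sample metadata with a fixed "now" so the result is reproducible.
now = datetime(2024, 1, 1)
records = [
    FileRecord("/proj/a.csv", 1000, datetime(2023, 12, 20)),
    FileRecord("/proj/b.csv", 5000, datetime(2022, 6, 1)),
    FileRecord("/proj/c.log", 200, datetime(2023, 1, 15)),
]

cold = select_cold_files(records, now, timedelta(days=180))
print([r.path for r in cold])           # ['/proj/b.csv', '/proj/c.log']
print(sum(r.size_bytes for r in cold))  # 5200
```

Working from metadata alone means the policy can be evaluated, and its storage savings estimated, before any file is actually moved.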
Storage Made Easy Enterprise File Fabric provides multi-cloud governance, compliance, and cybersecurity. It includes storage connectors and ways to strengthen remote and hybrid storage environments. This includes SMBStream, which is filer agnostic and compatible with on-premises NAS or Windows file shares, as well as cloud-based Server Message Block (SMB) shares such as Nasuni, Amazon FSx, or Azure Files.
- A File Fabric instance can securely connect to SMB shares without being colocated on the same network, enabling cross-site and cross-region connectivity
- Said to be 10x faster than a typical VPN and tolerant of high latency and dropped packets
- Microsoft Teams integration delivers access to existing file shares and other storage services, such as Amazon S3 / FSx, and Azure Blob while directly embedded within Microsoft Teams
- AutoCAD Previewer enables the Web File Manager to preview AutoCAD files stored on remote file storage and object storage
- Data Automation Rules allow automatic transcoding of uploaded or discovered files, and invocation of user-supplied webhooks when qualifying content is detected
- Secure Link Sharing offers named (external) user link sharing, two-step link sharing verifications, and increased policy controls
- Single-sign-on authentication to SMB shared resources for end users
- Full content search of any of the 60+ on-premises and on-cloud storage solutions
What Are the Benefits of a Data Fabric Solution?
Data fabrics have many different benefits for businesses:
- They facilitate better utilization of data in the enterprise by providing instant access.
- Data fabrics are agnostic. They weave together data across many platforms, architectures, processes, uses, and geographies.
- They improve decision making by providing business-ready data for applications, analytics, and various business processes.
- They enable IT to simplify data management and governance even when data resides in complex environments, such as hybrid or multi-cloud landscapes.
- They offer a single view of organizational data, taking data from many sources and cataloging it.
- Because data fabrics have AI and machine learning woven into their layers to determine data needs, query times improve.