Advanced storage architecture to power AI in data centres.

Executive summary.

The rise of artificial intelligence (AI) has driven unprecedented demand for scalable, high-performance, and cost-effective data centre storage solutions. This white paper presents a comprehensive solution combining Supermicro hardware, Seagate Exos hard drives enabled by Seagate’s Mozaic 3+™ technology, which is based on heat-assisted magnetic recording (HAMR), and OSNexus QuantaStor software. This joint solution addresses the explosive growth in AI-driven data storage needs, providing a robust architecture that supports both scale-up and scale-out configurations. Key benefits include enhanced scalability to accommodate growing AI workloads, exceptional performance with high throughput and low latency, optimised cost efficiency through fewer physical drives and power savings, a unified management platform that simplifies operations, advanced security features for compliance, and reduced environmental impact through energy-efficient storage solutions.

Introduction.

The rapid evolution of AI and machine learning (ML) technologies has fundamentally transformed the data storage landscape. Advances in computational power, democratised access for developers, and faster development tools have led to an explosion of AI-driven innovation. As AI models become more advanced, the need for scalable, high-performance storage solutions has never been greater. Data is the backbone of AI, and the ability to store, manage, and access vast amounts of data efficiently is crucial for training AI models and deploying AI applications. Traditional storage solutions often fall short of meeting these demands, necessitating the development of new architectures tailored to the needs of AI workloads.

Evolving AI workloads demand evolving storage solutions.

AI workloads present unique challenges that traditional storage solutions struggle to meet. AI models require vast amounts of data for training, often reaching petabyte scale. This data must be readily accessible, as the efficiency of the training process heavily depends on fast data retrieval. Furthermore, AI applications often involve large-scale data processing tasks which demand high throughput and low latency to deliver real-time insights.

The computational intensity of AI workloads also generates significant amounts of metadata, which must be managed efficiently to prevent bottlenecks. Traditional storage solutions, with their limited scalability and performance, are ill-suited for these demands. They often lack the flexibility to handle dynamic workloads, leading to inefficiencies and increased operational costs.

AI-driven innovation necessitates storage solutions that can scale rapidly, handle large volumes of unstructured data, and provide seamless access to this data. For instance, training a complex AI model involves iterative processing of vast data sets to refine algorithms and improve accuracy. The sheer volume of data required for these iterations can overwhelm traditional storage systems, causing delays and reducing the overall efficiency of AI operations.

Moreover, AI applications are increasingly deployed in real-time environments where immediate data processing is critical. This includes applications such as autonomous vehicles, predictive maintenance and personalised healthcare. These use cases require storage solutions that not only offer high capacity but also deliver exceptional performance to support instantaneous data analysis and decision-making.

Supporting scale-up and scale-out configurations.

The joint solution from Supermicro, Seagate, and OSNexus combines cutting-edge hardware and software to deliver a robust, scalable, and cost-effective storage infrastructure for AI workloads. The core components of this solution include Supermicro servers and JBODs, Seagate Mozaic 3+ hard drives, Seagate Nytro NVMe SSDs and OSNexus QuantaStor software.

The architecture of the joint solution supports both scale-up and scale-out configurations, catering to diverse deployment needs.

Scaling up (or vertical scaling) involves increasing the capacity of a single storage system or server by adding more resources, such as CPUs, memory and/or storage drives. This approach maximises the performance of individual units but has inherent limitations in scalability.

Scaling out (or horizontal scaling), on the other hand, involves adding more storage nodes or servers to a system, distributing the workload across multiple units. This approach allows for virtually unlimited scalability, enabling systems to handle larger, more complex AI workloads by expanding the architecture seamlessly as demand grows.

Scale-up configurations are ideal for smaller, cost-sensitive applications, offering peak throughput in the 5-10 GB/s range. In contrast, scale-out configurations are designed for larger deployments, with performance scaling linearly as additional nodes are incorporated. This scalability allows the solution to achieve hundreds of gigabytes per second in throughput, meeting the demands of intensive AI workloads.
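
As a rough illustration of the linear-scaling claim, the sketch below estimates how many nodes a scale-out cluster would need to reach a target aggregate throughput. The per-node figure of 10 GB/s is a hypothetical placeholder rather than a measured value for this solution, and the function name is illustrative. Real clusters are also bounded by network and client limits, which this sketch ignores.

```python
import math


def nodes_for_throughput(target_gb_per_s: float, per_node_gb_per_s: float) -> int:
    """Nodes needed to reach a target aggregate throughput.

    Assumes ideal linear scaling: aggregate = nodes * per-node throughput.
    """
    if target_gb_per_s <= 0 or per_node_gb_per_s <= 0:
        raise ValueError("throughput values must be positive")
    return math.ceil(target_gb_per_s / per_node_gb_per_s)


# Hypothetical per-node throughput in GB/s; substitute a measured value.
ASSUMED_PER_NODE_GB_PER_S = 10.0

for target in (50, 200, 500):
    nodes = nodes_for_throughput(target, ASSUMED_PER_NODE_GB_PER_S)
    print(f"{target} GB/s target: {nodes} nodes")
```

Rounding up with `math.ceil` reflects that a cluster cannot contain a fraction of a node.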

The seamless integration of Supermicro servers, Seagate drives, and QuantaStor software forms a cohesive and efficient storage solution. This architecture supports both file and object storage, providing organisations with the flexibility to choose the most suitable configuration for their specific needs. The unified management provided by QuantaStor ensures that all components work harmoniously, delivering optimal performance and reliability. The ability to manage both scale-up and scale-out configurations within a single platform simplifies operations and reduces the complexity associated with maintaining multiple storage systems.

Architecture overview.

The architecture comprises Supermicro servers, Seagate Exos Mozaic 3+ hard drives, and Seagate Nytro NVMe SSDs, all orchestrated by OSNexus QuantaStor software. This combination meets the intense demands of AI/ML workloads, which require high throughput, low latency, and the ability to handle massive datasets efficiently.

Deployment infrastructure considerations.

Details about networking and the minimum infrastructure required for success are beyond the scope of this paper, but are critical to architectural decision making.
Key criteria:
- Network speed (determines optimal media & node size)
- Rack specifications (rack depth & U-space)
- Power and cooling budget

Scale-up and scale-out architectures.

Scale-up architecture
- This architecture is ideal for environments that require cost-effective, high-density storage. It utilises dual-port NVMe drives in Supermicro’s 24-bay chassis, which provides high availability and performance by enabling shared access to the underlying drives. The architecture supports expansion through JBODs, allowing up to four JBODs to be connected to the scale-up controllers, thereby supporting configurations with up to seven petabytes of storage with Mozaic 3+ enterprise-class hard drives.
- In scale-up configurations, QuantaStor utilises OpenZFS — the high-performance, enterprise-level file system known for its advanced data protection, scalability, and efficiency, particularly in large-scale storage environments — allowing for efficient data integrity checks and storage optimisation. The architecture is particularly well-suited for smaller-scale AI/ML workloads and environments where minimising cost and maximising density are priorities.

Scale-out architecture
- Scale-out architecture is designed to provide linear performance scalability by adding more nodes. It uses erasure coding and replica techniques across nodes to ensure high availability and data redundancy. The architecture is especially suitable for large-scale AI/ML workloads where performance and capacity needs are continuously growing. For example, training large language models (LLMs), such as GPT (generative pre-trained transformer) or BERT (bidirectional encoder representations from transformers), requires immense computational power and data storage, making scale-out architecture essential for managing the increasing complexity and volume of data. Additionally, AI-driven genomic research, where large-scale processing of genomic data is required for tasks such as variant analysis and gene expression studies, also benefits significantly from the scalability and high availability that scale-out architecture provides.
- This architecture can combine hybrid nodes (mixing NVMe and hard drives) with all-flash nodes, providing flexibility in configuring clusters based on specific performance and capacity requirements. In scale-out configurations, QuantaStor uses its integration with Ceph, which excels at providing distributed storage across a large number of nodes.

Key considerations and design options.

Depending on the specific performance requirements and data capacity needs of AI/ML workloads, different configurations may be necessary to achieve optimal results. Factors such as the volume of data being processed and the speed at which data needs to be accessed will dictate whether a hybrid or all-flash configuration is the best fit for the scenario. Additionally, budget considerations and scalability requirements will influence the design choices for the architecture.

Hybrid configurations
- In hybrid configurations, a combination of NVMe SSDs and high-capacity hard drives is used to balance performance and cost. The architecture supports JBODs holding up to 60 or 90 drives, making it suitable for AI/ML workloads that require both high performance and large capacity in the PB range, such as medical and physics research.
- A typical scale-up hybrid pool might use three NVMe drives per pool for metadata and small file offload, combined with large-capacity hard drives for storing larger data sets. Scale-out hybrid configurations would have three or more NVMe drives per node.
All-flash configurations
- All-flash configurations are recommended for AI/ML workloads that require extremely high performance, such as real-time analytics or intensive data processing tasks.
- In scale-out clusters, all-flash configurations can deliver up to 1 TB/s of throughput by leveraging hundreds of NVMe drives.
Considerations for capacity and performance
- It’s essential to balance storage capacity with performance requirements. For instance, in a scale-out hybrid cluster with a mix of flash and hard drives, about 3% of the total storage might be flash to optimise performance, whereas in a scale-up hybrid cluster the flash storage could be around 1% of the total. With hard drives offering a clear advantage in cost-per-terabyte and TCO — enterprise SSDs carry a 6-to-1 price premium — hard drives remain the preferred choice for mass capacity in data centres.
- The architecture allows for starting with smaller clusters and expanding them as needed by adding more nodes or JBODs, ensuring that the storage infrastructure can grow alongside the AI/ML workloads.
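
The flash-to-capacity ratios above can be turned into a quick sizing estimate. The sketch below is a minimal illustration that uses only the approximate 3% (scale-out hybrid) and 1% (scale-up hybrid) figures from the text. The function name and the example capacities are hypothetical, and an actual design would also account for drive sizes and redundancy.

```python
# Approximate flash share of total capacity, taken from the text above.
FLASH_FRACTION = {
    "scale-out": 0.03,  # about 3% flash in a scale-out hybrid cluster
    "scale-up": 0.01,   # about 1% flash in a scale-up hybrid cluster
}


def flash_capacity_tb(total_tb: float, topology: str) -> float:
    """Estimate flash capacity (TB) for a hybrid cluster of the given total size."""
    if topology not in FLASH_FRACTION:
        raise ValueError(f"unknown topology: {topology!r}")
    if total_tb < 0:
        raise ValueError("total capacity must be non-negative")
    return total_tb * FLASH_FRACTION[topology]


# Hypothetical cluster sizes, for illustration only.
for topology, total_tb in (("scale-up", 2000), ("scale-out", 4000)):
    flash_tb = flash_capacity_tb(total_tb, topology)
    print(f"{topology}: {total_tb} TB total, about {flash_tb:.0f} TB flash")
```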

Management and optimisation.

Effective management and optimisation are critical for ensuring that AI/ML workloads perform at their best within the storage architecture. QuantaStor's advanced management features streamline operations, providing comprehensive control and oversight across diverse configurations.

QuantaStor unified management
- QuantaStor provides a unified control plane that simplifies the management of both scale-up and scale-out architectures. It supports advanced features like auto-tiering, end-to-end encryption, and compliance with industry standards, ensuring that the storage infrastructure is secure and optimised for AI/ML workloads.
- The software’s grid technology allows for the seamless scaling of storage across multiple sites, eliminating the complexity of managing disparate systems.

Use cases and scenarios.

Different AI/ML workloads require tailored storage solutions to achieve optimal performance and cost-efficiency. Depending on the scale and complexity of the workload, scale-up, scale-out, or mixed configurations can be deployed to meet the specific demands of various industries and applications.

Scale-up use cases
- Scale-up configurations are ideal for environments with smaller AI/ML workloads or those that prioritise cost efficiency. They are well-suited for applications like media and entertainment storage, server virtualisation and data archiving.
Scale-out use cases
- Scale-out configurations are designed for high-performance computing, data lakes, and AI/ML environments where the ability to scale both performance and capacity is critical. These configurations are also ideal for large-scale object storage and real-time analytics.
Mixed use cases
- Organisations can deploy both scale-up and scale-out configurations within the same environment, using QuantaStor’s unified management to maintain consistency and optimise performance across different workloads.

Advances in technology.

The technological advancements embodied in this solution are critical to its effectiveness. The Seagate Exos Mozaic 3+ hard drives represent a significant leap forward in storage technology. By utilising HAMR technology, these drives achieve unprecedented areal density, allowing for greater storage capacity within the same physical footprint. This advance not only addresses the need for large-scale data storage but also improves energy efficiency as fewer drives are required to store the same amount of data.

The TCO advantages of Mozaic 3+ hard drives are considerable: 3× the storage capacity in the same data centre footprint at 25% lower cost per TB, 60% lower power consumption per TB, and 70% lower embodied carbon per TB. These figures are relative to 10 TB perpendicular magnetic recording (PMR) drives, a common capacity now due for upgrade in data centres. The drives’ lower power consumption translates to reduced energy costs, while the higher density reduces the need for physical space, leading to savings in data centre infrastructure. Additionally, the drives’ lower embodied carbon makes them a more environmentally friendly option, aligning with sustainability goals that are increasingly important for modern enterprises.
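
To make these percentages concrete, the following sketch compares a 10 TB PMR baseline with a Mozaic 3+ deployment at a fixed target capacity. It uses only the ratios quoted above and reports results relative to the baseline, since no absolute prices or wattages are given. Reading "3× the storage capacity" as three times the capacity per drive is an assumption, as are the function name and the 3,000 TB example.

```python
import math

BASELINE_DRIVE_TB = 10           # PMR baseline named in the text
CAPACITY_MULTIPLIER = 3          # "3x the storage capacity" (assumed per drive)
COST_PER_TB_REDUCTION = 0.25     # 25% lower cost per TB
POWER_PER_TB_REDUCTION = 0.60    # 60% lower power per TB
CARBON_PER_TB_REDUCTION = 0.70   # 70% lower embodied carbon per TB


def compare_to_baseline(target_tb: float) -> dict:
    """Drive counts and relative cost, power and carbon for a fixed capacity."""
    new_drive_tb = BASELINE_DRIVE_TB * CAPACITY_MULTIPLIER
    return {
        "baseline_drives": math.ceil(target_tb / BASELINE_DRIVE_TB),
        "new_drives": math.ceil(target_tb / new_drive_tb),
        # Ratios versus the baseline for the same number of terabytes.
        "relative_cost": 1 - COST_PER_TB_REDUCTION,
        "relative_power": 1 - POWER_PER_TB_REDUCTION,
        "relative_carbon": 1 - CARBON_PER_TB_REDUCTION,
    }


result = compare_to_baseline(3000)  # hypothetical 3,000 TB target
print(f"drives: {result['baseline_drives']} baseline vs {result['new_drives']} new")
print(f"cost: {result['relative_cost']:.0%} of baseline")
print(f"power: {result['relative_power']:.0%} of baseline")
print(f"embodied carbon: {result['relative_carbon']:.0%} of baseline")
```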

The integration of Seagate Nytro NVMe SSDs adds another layer of enhanced performance. These high-speed drives are essential for managing the intensive read and write operations typical of AI workloads. Their low latency ensures that data can be accessed and processed in real time, which is crucial for training AI models and deploying AI applications. The dual-ported design of the SSDs enhances reliability, as it allows for continuous operation even if one port fails.

OSNexus QuantaStor software further enhances the solution by providing intelligent data management and advanced security features. The software's auto-tiering capabilities ensure that data is stored in the most appropriate tier, optimising both performance and cost. The end-to-end encryption and compliance with industry standards help protect data by addressing the security and privacy concerns that are paramount in AI applications, particularly in industries like healthcare and finance where sensitive data is frequently handled.

Benefits of the solution.

The joint solution from Supermicro, Seagate, and OSNexus offers several key benefits that address the specific needs of AI/ML workloads. These benefits include:

Scalability: The solution's ability to scale both up and out ensures that it can grow alongside the increasing demands of AI workloads. Whether an organisation is dealing with a few terabytes or several petabytes of data, the solution can accommodate their needs without requiring a complete overhaul of storage infrastructure.
Performance: The use of Seagate Nytro NVMe SSDs and Mozaic 3+ hard drives, combined with QuantaStor’s management capabilities, delivers exceptional performance. This is particularly important for AI/ML workloads that require high throughput and low latency to function effectively.
Cost efficiency: The solution’s architecture is designed to optimise both capital and operational expenditures. By reducing the number of physical drives needed, lowering power consumption, and offering a flexible, unified management platform, the solution significantly lowers the total cost of ownership (TCO).
Unified management: QuantaStor’s ability to manage both scale-up and scale-out architectures from a single interface simplifies operations and reduces the complexity associated with multi-vendor storage solutions. This unified approach not only saves time but also reduces the potential for errors and increases overall efficiency.
Security and compliance: The solution includes advanced security features that protect data from unauthorised access and ensure compliance with industry standards. This is particularly important for AI applications in regulated industries, where data breaches can result in significant legal and financial penalties.
Environmental impact: The use of Seagate drives built on the Mozaic 3+ platform reduces the environmental impact of data centres by lowering power consumption and reducing the physical space required for storage. This aligns with the growing emphasis on sustainability in the technology sector.

Use cases and applications.

The solution is versatile enough to support a wide range of use cases across various industries. Some examples include:

Healthcare: AI/ML workloads in healthcare, such as predictive analytics and personalised medicine, require the ability to process vast amounts of data quickly and securely. The joint solution offers the scalability, performance, and security needed to support these applications.
Finance: In finance, AI is used for tasks such as fraud detection, algorithmic trading, and risk management. These applications require high-speed data processing and real-time analytics, both of which are supported by the solution’s high-performance storage architecture.
Media and entertainment: The media and entertainment industry generates massive amounts of data, particularly with the increasing use of high-resolution video. The solution’s ability to handle large-scale data storage and provide fast access to files makes it ideal for tasks such as video editing, rendering and archiving.
Manufacturing: AI/ML is used in manufacturing for predictive maintenance, quality control and supply chain optimisation. These applications generate large volumes of data that need to be stored and analysed efficiently. The joint solution provides the scalability and performance needed to support these use cases.
Research and development: AI-driven research in fields like pharmaceuticals, genomics, materials science, and climate modelling requires the ability to store and process large data sets. The solution’s high throughput and low latency make it well-suited for these demanding applications.

Conclusion.

The joint AI solution developed by Supermicro, Seagate, and OSNexus offers a comprehensive, scalable, and cost-effective storage architecture tailored to the unique demands of AI/ML workloads. By combining advanced hardware and software technologies, the solution delivers exceptional performance, reliability, and efficiency, making it an ideal choice for organisations looking to leverage AI to gain a competitive edge. Whether deployed in healthcare, finance, media, manufacturing, or research, this solution provides the robust infrastructure needed to support the next generation of AI applications and pave the way for the future of AI-driven innovation across industries.

Solution table.

Topology	Product	Resiliency Model	Raw Capacity	Usable Capacity	Detailed Specification
Scale-up	SBB hybrid	Triple parity	2,039 TB	1,512 TB	link
Scale-up	SBB all-flash	Double parity (4d+2p)	737 TB	553 TB	link
Scale-out	Hyper all-flash	EC2k+2m/REP3	1,106 TB	533 TB	link
Scale-out	4U/36	EC4k+2m/REP3	3,974 TB	2,513 TB	link
Scale-out	4U/36	EC8k+3m/REP3	8,342 TB	5,786 TB	link
Scale-out	Dual-node top loading	EC8k+3m/REP3	11,981 TB	8,406 TB	link

Acronyms and additional information.
SBB: Storage Bridge Bay.
EC: Erasure Coding.
“Double-parity” and “triple-parity” refer to the number of parity blocks used to provide data redundancy and fault tolerance.
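Numerical strings relate to the resiliency model: in 4d+2p, each stripe holds four data blocks and two parity blocks; in EC notation such as 8k+3m, k is the number of data chunks and m the number of coding chunks. REP3 denotes three-way replication.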
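
The relationship between a resiliency model and usable capacity can be sketched with a short calculation. For erasure coding with k data chunks and m coding chunks, the theoretical storage efficiency is k / (k + m); for replication with n copies it is 1 / n. The example below compares that ceiling with the usable-to-raw ratios of the scale-out rows in the solution table, reading the table figures as whole terabytes. The function names are illustrative. The listed ratios come out somewhat below the theoretical values, which presumably reflects overheads this paper does not describe.

```python
def ec_efficiency(k: int, m: int) -> float:
    """Theoretical usable fraction for erasure coding: k data + m coding chunks."""
    if k <= 0 or m < 0:
        raise ValueError("k must be positive and m non-negative")
    return k / (k + m)


def replica_efficiency(copies: int) -> float:
    """Theoretical usable fraction when every object is stored `copies` times."""
    if copies <= 0:
        raise ValueError("copies must be positive")
    return 1 / copies


# (product, k, m, raw TB, usable TB) from the scale-out rows of the solution table.
SCALE_OUT_ROWS = [
    ("Hyper all-flash", 2, 2, 1106, 533),
    ("4U/36", 4, 2, 3974, 2513),
    ("4U/36", 8, 3, 8342, 5786),
    ("Dual-node top loading", 8, 3, 11981, 8406),
]

for product, k, m, raw_tb, usable_tb in SCALE_OUT_ROWS:
    theoretical = ec_efficiency(k, m)
    listed = usable_tb / raw_tb
    print(f"{product} EC {k}k+{m}m: theoretical {theoretical:.1%}, listed {listed:.1%}")

print(f"REP3: theoretical {replica_efficiency(3):.1%}")
```

An erasure-coded pool with m coding chunks can tolerate the loss of up to m chunks per object, so 8k+3m trades a little efficiency against 4k+2m-style layouts for an extra failure of headroom.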
