// 27 Mar 2024

Clumio and Amazon S3: Resilience for the age of data lakes, AI, and internet-scale applications

Poojan Kumar, CEO and Co-founder
Ari Paul, Director, Product Marketing

As Amazon S3 celebrates its 18th birthday, it’s clear that it has surpassed the realm of storage services; it is now the cornerstone of modern data architectures. It has played a crucial role in the transition from monolithic applications to decoupled, economical storage, profoundly impacting the way software is written today. For the past four years of this journey, we at Clumio have had the privilege to partner and innovate with the Amazon S3 team, bringing resilience, compliance, and governance to hundreds of its customers.

Unbundling storage, accelerating innovation

Amazon S3 has catalyzed the disaggregation of storage from compute, leading to a colossal shift in application architecture. Customers are now able to seamlessly aggregate petabytes of data from diverse sources without being constrained by attached compute. Massive datasets can now be stored and accessed economically, while compute-intensive tasks like training models and running analytics can be scaled separately to meet the demanding requirements of internet-scale applications, data lakes, and AI. Clumio, itself architected as a stateless data processing pipeline that persists most of its data immutably in Amazon S3, helps customers add resilience and protection to these workloads.

Pi Day 2024: Insights into Amazon S3’s critical role in generative AI

The Pi Day event, hosted by AWS annually on 3/14, this year underscored the acceleration of data lakes and their crucial role in generative AI. With over a million data lakes running on S3, solutions like Iceberg and Delta are becoming household names in the AWS ecosystem. To cater to this rising demand, Amazon S3 has also made significant strides in performance optimization through innovations like S3 Express One Zone, crucial for handling the high throughput of billions of objects that generative AI workloads demand. Furthermore, the integrations with Amazon Bedrock and other AI services such as SageMaker and Vector Store exemplify how Amazon S3 is facilitating seamless AI operations.

Amazon S3 Express One Zone makes it possible to use existing data to customize and personalize foundation models, rather than customers having to make several copies of source data. Source

Clumio: Pioneering resilience and compliance for generative AI

The GenAI workflow has distinct storage stages, and Clumio provides resilience for AWS customers across all these stages.

  1. Data ingestion and prep: data lakes and object stores
  2. Training and fine-tuning: high-performance filesystems
  3. Inference: block / edge
  4. Archiving: object store archives

What makes Clumio different is that it’s not built on traditional filesystem-based architectures from the VM-based world. Traditional backup solutions start to exhibit scalability issues at any meaningful size, say, a few petabytes of LLM training data. Clumio helps consolidate, optimize, and streamline disparate data copies and resilience mechanisms into one serverless, scalable platform. For example, before Clumio, LexisNexis had no solution that could protect and restore its billion-object data lake within its target SLAs.

Like the modern applications it backs up, Clumio is architected as a stateless data processing pipeline, persisting data immutably and at scale in Amazon S3.

In addition, with the unstoppable force of generative AI meeting the immovable focus on regulation, there is enormous flux in the domain of AI compliance. This is playing out in front of our eyes: Clumio has some famous genomics customers that are using AI for drug discovery, and they want to do it responsibly because new regulations will require backward traceability for years, perhaps decades. They use Clumio to identify, classify, intelligently back up, and immutably encrypt those datasets that they foresee being subject to regulation.

We’re beginning to see evidence that regulation for AI will emanate from regulation of the source datasets. When it comes to verticalized AI solutions, it will still be SOC 2 and ISO 27001 compliance for most enterprises, HIPAA for life sciences, medical, and healthcare companies, COPPA for edtech and online gaming outfits, various FINRA and SEC regulations for financial services providers, and some very specific regulations created by the regulatory bodies for manufacturing, automotive, and heavy industries.

Looking ahead: Data resilience in the age of AI

At Clumio, we recognize that Amazon S3 is among the defining technological innovations of the 21st century. And its influence will only accelerate in the age of AI, as it becomes the de facto storage substrate for AI development in the cloud. And for all customers of Amazon S3, Clumio is committed to delivering unwavering resilience, automated and at scale, for AI development now and in the future. Not just as a backup provider, but as a partner in innovation.

Next up in our Resilience Rising series

Join Clumio in a chat with AI and IA pioneer, Pascal Bornet, as we discuss:

  • Industries that are adopting intelligent automation the fastest
  • Emerging regulations to govern data powering AI applications
  • Foundations that organizations must set for their data to fuel business growth
  • Critical success factors for adopting AI at scale

Register here

About the author

Poojan brings over 20 years of experience in cloud computing and storage and is known for seeing an opportunity for change, innovating, and capitalizing on it. Poojan founded and built PernixData, which was acquired by Nutanix in 2016, after which he served as Vice President of Engineering and Products. Earlier in his career, he was Head of Data Products at VMware and founder at Oracle Exadata.

About the author

Ari leads product marketing at Clumio. Prior to Clumio, Ari held several product marketing leadership positions at cloud and data companies such as VMware, Cohesity, and Databricks, helping product lines scale 10X. Ari attended Stanford University and R.V. College of Engineering for his master's and bachelor's degrees in engineering, respectively.