Remote

Director of Engineering - Infinia Core

DataDirect Networks

United States

May 27, 2025

Job Locations

US-Remote

Job ID

2025-5258

Country

United States

City

Remote

Worker Type

Regular Full-Time Employee

Overview

This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing.

"DDN's A3I solutions are transforming the landscape of AI infrastructure." - IDC

"The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments" - Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA

DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence.

Our success is driven by our unwavering commitment to innovation, customer-centricity, and a team of passionate professionals who bring their expertise and dedication to every project. This is a chance to make a significant impact at a company that is shaping the future of AI and data management.

Our commitment to innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional looking to make a lasting impact in the world of AI and data storage.

Job Description

We are looking for an experienced and technically driven Director of Engineering to lead the Infinia Core Engineering organization, the foundational team powering DDN's flagship AI-native distributed data platform. In this role, you will oversee engineering teams responsible for the core systems that enable Infinia's performance, scalability, and reliability at global scale. This includes mission-critical components such as task scheduling, distributed tracing, memory management, SPDK data access, profiling, networking, reliability, distributed locking, internal key-value stores, and filesystem clients - all orchestrated within a multi-tenant, high-throughput environment. You will define the strategy, scale execution, and mentor engineering leaders to deliver production-grade systems that meet the demands of AI/ML, high-performance computing, and enterprise analytics.

This is a hands-on technical leadership role at the heart of Infinia's distributed architecture - where decisions today shape how data moves tomorrow.

Key Responsibilities

Core Systems Leadership

Lead and scale multiple engineering teams focused on critical path components of the Infinia platform:

Task scheduling and orchestration
Tracing and observability infrastructure
Memory management and performance tuning
SPDK-based I/O data path
Reliability and fault-tolerance systems
Networking stack optimization and event-driven IO
TDS (Tenant Data Services) and multi-tenant isolation
DLM (Distributed Lock Manager) and concurrency control
Internal KVStore for system metadata and state
FS client for scalable POSIX-like access

Technical Strategy & Execution

Own the end-to-end architecture, roadmap, and execution for all core components.
Guide technical design reviews, enforce performance standards, and align cross-team priorities to platform milestones.
Collaborate with architecture and infrastructure teams to evolve platform interfaces, service contracts, and internal APIs.

Organizational Growth & Team Development

Hire, mentor, and develop engineering managers and senior ICs to build a culture of accountability, innovation, and technical rigor.
Drive a results-oriented mindset focused on high-velocity, high-reliability software delivery.
Set clear goals and foster professional growth through coaching, feedback, and performance management.

Cross-Functional Collaboration

Partner with product management, field engineering, and customer teams to shape feature priorities and ensure core platform needs are anticipated early.
Interface with support and site reliability teams to define SLAs, improve telemetry, and reduce MTTR for platform incidents.
Contribute to platform-wide initiatives in multi-tenancy, fault isolation, observability, and performance benchmarking.

Platform Reliability & Performance

Champion operational excellence across core services - including incident response, regression testing, and release stability.
Optimize memory usage, lock contention, thread scheduling, and task pipelines to deliver microsecond-level performance where required.
Establish strong internal metrics and observability standards to measure system health, responsiveness, and uptime.

Required Qualifications

12+ years of engineering experience in distributed systems, operating systems, or storage platform engineering.
5+ years of experience leading multi-team organizations delivering core systems software in production environments.
Strong expertise in systems programming (C, C++, Rust) and deep knowledge of concurrency, memory models, and network programming.
Proven track record designing and scaling services related to task scheduling, locking, memory, and I/O performance.
Experience managing components at the intersection of infrastructure and application performance, especially in multi-tenant platforms.
Excellent communication, roadmap planning, and cross-functional leadership skills.

Preferred Qualifications

Experience with SPDK, RDMA, DPDK, or high-performance storage stacks.
Knowledge of distributed coordination protocols, key-value stores, or scalable metadata architectures.
Background in AI/ML, HPC, or cloud-native infrastructure (Kubernetes, microservices, etc.).
Familiarity with observability tools (e.g., tracing frameworks, profilers, Prometheus, OpenTelemetry).

Success Metrics - First 30 Days

Strategic Alignment

Ramp up on all core components, existing technical challenges, and roadmap priorities.
Meet with team leads and cross-functional partners to assess execution readiness and architectural cohesion.

Early Impact

Identify 2-3 areas for performance optimization, team structure refinement, or architectural alignment.
Deliver a 90-day strategy plan outlining key initiatives across reliability, latency, and scalability.

Team Integration

Build trust and alignment with engineering managers and ICs.
Assess hiring needs and begin shaping the next phase of team growth.

Success Metrics - Beyond 30 Days

Timely, high-quality delivery of core platform milestones aligned to product roadmap.
Improvements in performance, fault-tolerance, and memory/network efficiency across key subsystems.
Clear reduction in escalations, latency spikes, and cross-component coordination complexity.
Team health, engagement, and velocity aligned with long-term technical and business goals.

Join us to lead the engineering teams responsible for the very heartbeat of a world-class, AI-native data platform - where every task, trace, and lock matters at scale. Apply now to shape the foundation of tomorrow's data intelligence with DDN Infinia.

DDN

Join our dynamic and driven team, where engineering excellence is at the heart of everything we do. We seek individuals who love to challenge themselves and are fueled by curiosity. Here, you'll have the opportunity to work across various areas of the company, thanks to our flat organizational structure that encourages hands-on involvement and direct contributions to our mission. Leadership is earned by those who take initiative and consistently deliver outstanding results, both in their work ethic and deliverables, making strong prioritization skills essential. Additionally, we value strong communication skills in all our engineers and researchers, as they are crucial for the success of our teams and the company as a whole.

Interview Process: After submitting your application, one of our recruiters will review your resume. If your application passes this stage, you will be invited to a 30-minute interview during which a member of our team will ask some basic questions. If you clear the interview, you will enter the main process, which can consist of up to four interviews in total:

Coding assessment: Often in a language of your choice.
Systems design: Translate high-level requirements into a scalable, fault-tolerant service (depending on role).
Real-time problem-solving: Demonstrate practical skills in a live problem-solving session.
Meet and greet with the wider team.

Our goal is to finish the main process in 2-3 weeks at most.

DataDirect Networks (DDN) is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity, gender expression, transgender, sex stereotyping, sexual orientation, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.
