Large Scale Training Engineer
Lightricks
Who we are
Lightricks, an AI-first company, is revolutionizing how visual content is created. With a mission to bridge the gap between imagination and creation, Lightricks is dedicated to bringing cutting-edge technology to the creative and business spaces.
Our AI photo and video generation models, which power our apps and platforms including Facetune, Photoleap, Videoleap, and LTX Studio, allow creators and brands to leverage the latest research breakthroughs, offering endless control over their creative potential. Our influencer marketing platform, Popular Pays, provides creators the ability to monetize their work and offers brands opportunities to scale their content through tailored creator partnerships.
The Core Generative AI team at Lightricks Research is a unified group of researchers and engineers dedicated to developing the generative foundation models that serve LTX Studio, our AI-based video creation platform. Our focus is on creating a controllable, cutting-edge video generative model by merging novel algorithms with exceptional engineering. This involves enhancing the machine learning components of our internal training framework, which is crucial for developing advanced models. We specialize in the research and engineering that enable efficient, scalable training and inference, allowing us to deliver state-of-the-art AI-generated video models.
About the Role
As a Large Scale Training Engineer, you will play a key role in improving the training throughput of our internal framework and enabling researchers to pioneer new model concepts. The role demands excellent engineering skills for designing, implementing, and optimizing cutting-edge AI models, along with the ability to write robust machine learning code and a deep understanding of supercomputer performance. Your expertise in performance optimization, distributed systems, and debugging will be crucial, as our framework runs extensive computations across numerous virtual machines.
This role is designed for individuals who are not only technically proficient but also deeply passionate about pushing the boundaries of AI and machine learning through innovative engineering and collaborative research.
Key Responsibilities
- Profile and optimize the training process, including multimodal data pipelines and data storage methods, to maximize efficiency.
- Develop high-performance TPU/CPU kernels and integrate advanced techniques into our training framework to maximize hardware efficiency (see the kernel sketch after this list).
- Utilize knowledge of hardware features to make aggressive optimizations and advise on hardware/software co-design.
- Collaborate with researchers to develop model architectures that facilitate efficient training and inference.
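For illustration only, here is a minimal sketch of this kind of kernel work in JAX's Pallas API. The fused scale-and-add kernel and every name in it are hypothetical, not part of Lightricks' framework; a production kernel would also specify block shapes and a grid tuned to the accelerator.

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def scale_add_kernel(x_ref, y_ref, o_ref):
    # Hypothetical fused elementwise op: doing the multiply and add in one
    # kernel keeps the intermediate result out of slow off-chip memory.
    o_ref[...] = x_ref[...] * 2.0 + y_ref[...]

@jax.jit
def scale_add(x, y):
    # With no grid/BlockSpec given, the whole array is treated as one block.
    return pl.pallas_call(
        scale_add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
    )(x, y)

x = jnp.ones((1024, 1024), dtype=jnp.bfloat16)
y = jnp.ones((1024, 1024), dtype=jnp.bfloat16)
print(scale_add(x, y)[0, 0])  # 3.0
```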
Your skills and experience
- Industry experience with small- to large-scale ML experiments and multimodal ML pipelines.
- Strong software engineering skills, with proficiency in Python and experience with modern C++.
- Deep understanding of GPU, CPU, TPU, or other AI accelerator architectures.
- Eagerness to dive deep into system implementations to enhance performance and maintainability.
- Passion for driving ML accuracy with low-precision formats and for optimizing compute kernels (see the mixed-precision sketch below).
- Background in JAX/Pallas, Triton, CUDA, OpenCL, or similar technologies is a plus.
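To make the low-precision point above concrete, here is a minimal sketch (with illustrative shapes and names, not Lightricks code) of a common TPU trade-off: storing matmul inputs in bfloat16 for bandwidth while accumulating in float32 to limit rounding-error growth.

```python
import jax
import jax.numpy as jnp

def bf16_matmul(a, b):
    # Cast inputs to bf16, but ask the hardware to accumulate in f32.
    return jax.lax.dot_general(
        a.astype(jnp.bfloat16),
        b.astype(jnp.bfloat16),
        dimension_numbers=(((1,), (0,)), ((), ())),  # plain matmul
        preferred_element_type=jnp.float32,
    )

key = jax.random.PRNGKey(0)
a = jax.random.normal(key, (256, 512))
b = jax.random.normal(key, (512, 128))
full = a @ b               # float32 reference
mixed = bf16_matmul(a, b)  # bf16 inputs, f32 accumulation
print(jnp.max(jnp.abs(full - mixed)))  # small, format-dependent error
```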