// topic

Deep Learning

33 trending in last 90 days · 33 all-time

// ecosystem

LLM 11 · PaddlePaddle 9 · Computer Vision 7 · Inference 6 · Machine Learning 6 · Deep Learning
AI 33

// this week's top 10

01
PaddlePaddle / Paddle
PaddlePaddle is a comprehensive industrial deep learning platform that provides core frameworks, model libraries, and end-to-end development tools. It supports advanced features like unified dynamic and static graphs, automatic parallelism, and high-order differentiation for scientific computing. The platform is designed to facilitate large-scale model training and inference across diverse industrial sectors.
85 · 23,870
02
deepseek-ai / DeepGEMM
DeepGEMM is a unified CUDA library providing high-performance tensor core kernels specifically optimized for modern large language models. It features a lightweight Just-In-Time compilation module that eliminates the need for CUDA compilation during installation. The library delivers expert-tuned performance for various matrix operations, including FP8, FP4, and BF16 GEMMs, as well as fused MoE and MQA scoring.
79 · 7,104
03
PaddlePaddle / PaddleFormers
PaddleFormers is a Transformers-style library built on Baidu's PaddlePaddle framework. It aims to offer training interfaces and features for Large Language Models and Vision-Language Models comparable to those of Hugging Face. By integrating tensor parallelism, pipeline parallelism, and automatic mixed precision, the project reports training performance that surpasses Megatron-LM on mainstream models. It also supports domestic (Chinese-made) computing chips and the Safetensors format, covering the full workflow from pre-training to post-training.
78 · 12,991
04
alibaba / ROLL
ROLL is an efficient, user-friendly library designed for scaling reinforcement learning workflows for large language models across large-scale GPU clusters. It supports diverse training paradigms including RLVR, agentic interaction, and distillation, while integrating advanced backends like Megatron-Core, vLLM, and SGLang. The framework provides robust observability and flexible resource management to enhance performance in complex reasoning and human preference alignment tasks.
75 · 3,120
05
alibaba / rtp-llm
RTP-LLM is a high-performance LLM inference acceleration engine developed by the Alibaba Foundation Model Inference team. This engine has been widely applied in various Alibaba business scenarios such as Taobao and Tmall, supporting multiple mainstream model formats and hardware backends. It provides efficient production-level services for large language models by integrating advanced operator optimization, quantization techniques, and distributed inference capabilities.
70 · 1,107
06
bilibili / Index-anisora
Index-AniSora is a powerful open-source framework designed specifically for high-quality anime video generation and animation production. The system features a comprehensive data processing pipeline, a controllable generation model with spatiotemporal masking, and a specialized evaluation benchmark. It supports diverse creative tasks including character 3D generation, video style transfer, and multimodal guidance for precise motion control.
68 · 2,421
07
alibaba / TorchEasyRec
TorchEasyRec is a PyTorch-based framework designed for developing production-ready deep learning recommendation models. It supports a wide range of tasks including candidate generation, ranking, multi-task learning, and generative recommendation. The framework offers high scalability, flexible data source integration, and seamless deployment options for real-world production environments.
60 · 377
08
PaddlePaddle / PaddleCustomDevice
PaddleCustomDevice is the custom hardware integration solution provided by the PaddlePaddle framework. Through standardized interface design, this project enables developers to integrate various third-party hardware backends into the PaddlePaddle ecosystem. It currently covers support for mainstream hardware platforms including Ascend, Cambricon, Intel GPU, and Apple MPS.
54 · 104
09
bytedance / jaqmc
JaQMC is a modular, JAX-based framework designed for performing neural network quantum Monte Carlo simulations. It utilizes deep neural networks as variational wavefunctions to solve the electronic Schrödinger equation without relying on traditional basis sets. The project supports various quantum systems, including molecules, solids, and fractional quantum Hall states, through a highly configurable and extensible architecture.
53 · 108
10
baidu / vLLM-Kunlun
vLLM Kunlun is a community-maintained hardware plugin that enables the seamless execution of vLLM on Kunlun XPU hardware. It utilizes a hardware-pluggable interface to decouple the integration process, ensuring compatibility with a wide range of open-source models. The project supports various architectures including Transformer-based, Mixture-of-Experts, and multi-modal LLMs on the Kunlun3 P800 platform.
51 · 405

// all-time featured (33)

PaddlePaddle / Paddle
PaddlePaddle is a comprehensive industrial deep learning platform that provides a complete ecosystem of frameworks, model libraries, and development tools. It supports advanced capabilities such as automatic parallelism, unified training and inference, and high-order differentiation for scientific computing. The platform is designed to facilitate AI commercialization across various sectors by offering a flexible, high-performance architecture for diverse model development.
92
PaddlePaddle / PaddleOCR
PaddleOCR is a comprehensive toolkit designed to convert images and PDF documents into structured, LLM-ready data formats like Markdown and JSON. It features state-of-the-art vision-language models and high-performance text recognition engines that support over 100 languages. The platform is widely integrated into major AI agent and RAG frameworks, offering efficient deployment options across various hardware backends.
89
Tencent / ncnn
ncnn is a high-performance neural network inference framework optimized for mobile platforms, designed to simplify the deployment of deep learning algorithms on mobile devices. The framework has no third-party dependencies and is cross-platform; according to the project, it runs faster on mobile CPUs than all other known open-source frameworks. ncnn is widely used in mainstream Tencent applications.
89
Tencent / ncnn
ncnn is a high-performance neural network inference framework deeply optimized for mobile platforms. The framework has no third-party dependencies and is cross-platform; according to the project, it outperforms all known open-source frameworks on mobile CPUs. Developers can use ncnn to port deep learning models to mobile devices and build intelligent applications.
87
PaddlePaddle / Paddle
PaddlePaddle is a comprehensive industrial deep learning platform that provides core frameworks, model libraries, and end-to-end development tools. It supports advanced features like unified dynamic and static graphs, automatic parallelism, and high-order differentiation for scientific computing. The platform is designed to facilitate large-scale model training and inference across diverse industrial sectors.
85
alibaba / MNN
MNN is a high-performance, lightweight deep learning framework designed for efficient model inference and training on mobile and embedded devices. It supports a wide range of neural network architectures and provides versatile tools for model conversion, compression, and general-purpose computation. The framework is widely used in production environments, including various Alibaba applications, to enable device-cloud collaborative machine learning.
81
deepseek-ai / DeepEP
DeepEP is a specialized communication library designed to optimize Mixture-of-Experts and expert parallelism through high-throughput, low-latency GPU kernels. It provides advanced features such as asymmetric-domain bandwidth forwarding and low-precision support to enhance both training and inference performance. The library also includes hook-based mechanisms for communication-computation overlapping to maximize hardware efficiency without occupying additional streaming multiprocessor resources.
80
deepseek-ai / DeepGEMM
DeepGEMM is a unified CUDA library providing high-performance tensor core kernels specifically optimized for modern large language models. It features a lightweight Just-In-Time compilation module that eliminates the need for CUDA compilation during installation. The library delivers expert-tuned performance for various matrix operations, including FP8, FP4, and BF16 GEMMs, as well as fused MoE and MQA scoring.
79
PaddlePaddle / PaddleFormers
PaddleFormers is a Transformers-style library built on Baidu's PaddlePaddle framework. It aims to offer training interfaces and features for Large Language Models and Vision-Language Models comparable to those of Hugging Face. By integrating tensor parallelism, pipeline parallelism, and automatic mixed precision, the project reports training performance that surpasses Megatron-LM on mainstream models. It also supports domestic (Chinese-made) computing chips and the Safetensors format, covering the full workflow from pre-training to post-training.
78
alibaba / ROLL
ROLL is an efficient, user-friendly library designed for scaling reinforcement learning workflows for large language models across large-scale GPU clusters. It supports diverse training paradigms including RLVR, agentic interaction, and distillation, while integrating advanced backends like Megatron-Core, vLLM, and SGLang. The framework provides robust observability and flexible resource management to enhance performance in complex reasoning and human preference alignment tasks.
75
PaddlePaddle / PaddleX
PaddleX 3.0 is a low-code development tool built on the PaddlePaddle framework that bundles a large set of ready-to-use pre-trained models for end-to-end development. Through a minimalist Python API and a graphical interface, the tool covers the path from model training to inference deployment. It is also compatible with mainstream domestic and international hardware, helping developers bring models into industrial use.
72
alibaba / rtp-llm
RTP-LLM is a high-performance LLM inference acceleration engine developed by the Alibaba Foundation Model Inference team. This engine has been widely applied in various Alibaba business scenarios such as Taobao and Tmall, supporting multiple mainstream model formats and hardware backends. It provides efficient production-level services for large language models by integrating advanced operator optimization, quantization techniques, and distributed inference capabilities.
70
alibaba / ROLL
ROLL is an efficient, user-friendly reinforcement learning library specifically designed for training and scaling Large Language Models on large-scale GPU clusters. It utilizes a multi-role distributed architecture powered by Ray to support complex tasks like human preference alignment, reasoning, and agentic interactions. The framework integrates advanced technologies such as Megatron-Core, vLLM, and SGLang to accelerate model training and inference across diverse hardware environments.
70
bilibili / Index-anisora
Index-AniSora is a powerful open-source framework designed specifically for high-quality anime video generation and animation production. The system features a comprehensive data processing pipeline, a controllable generation model with spatiotemporal masking, and a specialized evaluation benchmark. It supports diverse creative tasks including character 3D generation, video style transfer, and multimodal guidance for precise motion control.
68
alibaba / rtp-llm
RTP-LLM is a high-performance large model inference acceleration engine developed by the Alibaba Foundation Model Inference Team, widely used in business scenarios such as Taobao and Tmall. By integrating advanced CUDA kernels and quantization techniques, the engine significantly improves inference performance and efficiency. It is also flexible, supporting multiple model formats, multimodal inputs, and LoRA service deployment.
68
bytedance / Protenix
Protenix is an open-source framework designed for high-accuracy biomolecular structure prediction, offering models that perform competitively with state-of-the-art methods. The project provides multiple versions, including the enhanced Protenix-v2, which demonstrates significant improvements in antibody-antigen structure prediction and ligand-related plausibility. It is released under the Apache 2.0 license, making it freely accessible for both academic and commercial research applications.
66
Tencent / AngelSlim
AngelSlim is a highly integrated toolkit designed to provide efficient compression solutions for large language, vision, and diffusion models. It supports a wide range of techniques including advanced quantization, speculative decoding, and token pruning to optimize model performance. The framework offers developers a unified interface for training, deployment, and performance evaluation across various hardware environments.
63
bilibili / Index-anisora
Index-AniSora is a comprehensive open-source system developed by Bilibili for high-quality anime video generation. The project provides a controllable generation model, a specialized data processing pipeline, and an evaluation benchmark tailored for animation aesthetics. It supports advanced features such as character 3D video generation, video style transfer, and multimodal guidance to facilitate diverse animation production tasks.
61
alibaba / TorchEasyRec
TorchEasyRec is a PyTorch-based framework designed for developing production-ready deep learning recommendation models. It supports a wide range of tasks including candidate generation, ranking, multi-task learning, and generative recommendation. The framework offers high scalability, flexible data source integration, and seamless deployment options for real-world production environments.
60
OpenBMB / VoxCPM
VoxCPM2 is a tokenizer-free, 2B parameter text-to-speech system that utilizes a diffusion autoregressive architecture to generate high-quality, expressive audio. The model supports 30 languages and offers advanced capabilities including voice design, controllable voice cloning, and studio-quality 48kHz output. It is fully open-source under the Apache-2.0 license and provides production-ready deployment options via vLLM-Omni and Nano-vLLM.
56
PaddlePaddle / PaddleCustomDevice
PaddleCustomDevice is the custom hardware integration solution provided by the PaddlePaddle framework. Through standardized interface design, this project enables developers to integrate various third-party hardware backends into the PaddlePaddle ecosystem. It currently covers support for mainstream hardware platforms including Ascend, Cambricon, Intel GPU, and Apple MPS.
54
bytedance / jaqmc
JaQMC is a modular, JAX-based framework designed for performing neural network quantum Monte Carlo simulations. It utilizes deep neural networks as variational wavefunctions to solve the electronic Schrödinger equation without relying on traditional basis sets. The project supports various quantum systems, including molecules, solids, and fractional quantum Hall states, through a highly configurable and extensible architecture.
53
baidu / vLLM-Kunlun
vLLM Kunlun is a community-maintained hardware plugin that enables the seamless execution of vLLM on Kunlun XPU hardware. It utilizes a hardware-pluggable interface to decouple the integration process, ensuring compatibility with a wide range of open-source models. The project supports various architectures including Transformer-based, Mixture-of-Experts, and multi-modal LLMs on the Kunlun3 P800 platform.
51
google / magika
Magika is an AI-powered tool that utilizes deep learning to provide highly accurate file type identification for over 200 content types. It features a highly optimized model that delivers inference results in milliseconds while maintaining approximately 99% accuracy. The project offers a versatile command-line interface and language bindings for Python, JavaScript, and Rust to support diverse developer workflows.
50
k2-fsa / OmniVoice
OmniVoice is an advanced large-scale multilingual zero-shot speech synthesis model based on a diffusion language model architecture, supporting over 600 languages. The model features exceptional inference speed and enables high-quality voice cloning and voice design capabilities. Users can easily perform speech generation via Python API or command-line tools, with support for fine-grained non-linguistic symbols and pronunciation control.
48
microsoft / VibeVoice
VibeVoice is a family of open-source voice AI models that utilizes continuous speech tokenizers and next-token diffusion to achieve high-fidelity audio processing. The framework includes advanced tools for long-form speech recognition and real-time streaming text-to-speech generation. These models are designed for research purposes to advance collaboration and innovation within the speech synthesis community.
43
baidu / vLLM-Kunlun
vLLM Kunlun is a community-maintained hardware plugin that enables the seamless execution of vLLM on Kunlun XPU devices. It functions as a hardware-pluggable interface, allowing users to run various large language and multimodal models without modifying the original vLLM source code. The project supports advanced features like quantization, LoRA fine-tuning, and hardware-accelerated graph optimization to ensure high-performance inference.
40
PaddlePaddle / docs
This repository contains the source files for the official PaddlePaddle documentation platform. It organizes content into specific directories for API references, user guides, and tutorials to support developers. The project also provides CI scripts and build instructions to facilitate local documentation generation and community contributions.
39
PaddlePaddle / PaddleCustomDevice
PaddleCustomDevice is a custom hardware integration solution provided by the PaddlePaddle deep learning framework. This project aims to help developers efficiently integrate various third-party hardware backends into the PaddlePaddle ecosystem. Currently, it supports a variety of mainstream hardware platforms, including Ascend, Cambricon, Intel GPU, and Apple MPS.
38
PaddlePaddle / PaConvert
This tool is officially maintained by Paddle and aims to achieve efficient automated migration from PyTorch code to PaddlePaddle code. It supports one-click conversion of over 1,600 PyTorch APIs and 200 torchvision APIs, maintaining an average conversion rate of over 95% in tests. The conversion process is operated via the command line, preserves the style and structure of the original code, and provides detailed conversion logs and summaries.
34
PaddlePaddle / community
The PaddlePaddle community serves as a central hub for developers to contribute to the framework through code improvements, documentation, and presentations. It provides structured governance, specialized working groups, and various mentorship programs to support active participation. Contributors are recognized through official certifications, release notes, and inclusion in the project's authorship records.
29
shiyu-coder / Kronos
Kronos is an open-source decoder-only foundation model specifically designed to analyze and forecast financial K-line sequences. It utilizes a two-stage framework that quantizes multi-dimensional market data into hierarchical tokens before processing them through an autoregressive Transformer. The project provides a comprehensive suite of pre-trained models and tools for both direct forecasting and domain-specific fine-tuning.
28
rohitg00 / ai-engineering-from-scratch
AI Engineering from Scratch is a comprehensive 320-hour curriculum that guides students from fundamental linear algebra to building autonomous agent swarms. The course emphasizes an AI-native learning approach where students use AI coding agents to test their knowledge and build reusable tools throughout 20 distinct phases. By working across Python, TypeScript, Rust, and Julia, learners develop a professional portfolio of prompts, skills, and agents that can be deployed in real-world environments.
28

// use cases by project

Paddle
  • 01 Automatic distributed parallel training for large-scale models
  • 02 High-order automatic differentiation for scientific computing applications
  • 03 Heterogeneous multi-chip adaptation through a standardized, pluggable architecture
PaddleOCR
  • 01 Intelligent document parsing for LLM-ready structured data extraction
  • 02 Universal multilingual text recognition for natural scene and document analysis
  • 03 Building high-quality datasets for fine-tuning Large Language Models
ncnn
  • 01 Supports a variety of mainstream CNN models, including classification, detection, segmentation, and face recognition algorithms.
  • 02 Provides cross-platform deployment capabilities, supporting environments such as Android, iOS, Windows, Linux, macOS, and WebAssembly.
  • 03 Helps developers port deep learning algorithms to mobile devices through efficient implementation, enabling the rapid deployment of artificial intelligence applications.
ncnn
  • 01 Efficiently deploy deep learning algorithm models on mobile devices
  • 02 Support mainstream CNN networks such as YOLO, MobileNet, and ResNet
  • 03 Achieve high-performance cross-platform neural network inference computation
Paddle
  • 01 Unified dynamic and static graph training with automatic parallelism
  • 02 Integrated large model training and inference workflows
  • 03 High-order differentiation for scientific computing and differential equations
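The high-order differentiation use case listed for Paddle can be illustrated with a small, framework-independent sketch. It does not use PaddlePaddle's API; it is a toy forward-mode implementation with nested dual numbers, and the names `Dual`, `derivative`, and `poly` are hypothetical helpers chosen for illustration. Applying `derivative` twice yields a second derivative, which is the kind of quantity differential-equation solvers need.

```python
class Dual:
    """Number val + eps*e with e**2 == 0; val and eps may themselves be Dual."""

    def __init__(self, val, eps=0.0):
        self.val = val
        self.eps = eps

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (a + b e)(c + d e) = ac + (ad + bc) e
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)

    __rmul__ = __mul__


def _lift(x):
    """Wrap plain numbers so they can be combined with Dual values."""
    return x if isinstance(x, Dual) else Dual(x, 0.0)


def derivative(f):
    """Return a function computing df/dx by seeding the eps part with 1."""
    def df(x):
        return _lift(f(Dual(x, 1.0))).eps
    return df


def poly(x):
    # Illustrative function: f(x) = x^3 + 2x
    return x * x * x + 2.0 * x


first = derivative(poly)(3.0)               # f'(x)  = 3x^2 + 2
second = derivative(derivative(poly))(3.0)  # f''(x) = 6x
print(first, second)
```

Nesting works here because each call to `derivative` wraps its input in a fresh `Dual` layer. A production framework handles many more operations and guards against mixing up perturbation levels, which this sketch does not attempt.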
