// topic

Deep Learning

33 trending in last 90 days · 33 all-time

// new this month

// ecosystem

AI (33)

// recent newcomers

#1 vLLM Kunlun Hardware Plugin 🆕 · 5mo ago · ↗ 12.53/d · ★ 405

// this week's top 10

PaddlePaddle / Paddle

PaddlePaddle is a comprehensive industrial deep learning platform that provides core frameworks, model libraries, and end-to-end development tools. It supports advanced features like unified dynamic and static graphs, automatic parallelism, and high-order differentiation for scientific computing. The platform is designed to facilitate large-scale model training and inference across diverse industrial sectors.

deepseek-ai / DeepGEMM

DeepGEMM is a unified CUDA library providing high-performance tensor core kernels specifically optimized for modern large language models. It features a lightweight Just-In-Time compilation module that eliminates the need for CUDA compilation during installation. The library delivers expert-tuned performance for various matrix operations, including FP8, FP4, and BF16 GEMMs, as well as fused MoE and MQA scoring.

PaddlePaddle / PaddleFormers

PaddleFormers is a Transformers library built on Baidu's PaddlePaddle framework. It aims to give Large Language Models and Vision-Language Models training interfaces and a feature set comparable to Hugging Face's. By integrating tensor parallelism, pipeline parallelism, and automatic mixed precision, the project reports training performance that surpasses Megatron-LM on mainstream models. It also supports domestically produced (Chinese) computing chips and the Safetensors format, covering the full workflow from pre-training to post-training.

alibaba / ROLL

ROLL is an efficient, user-friendly library designed for scaling reinforcement learning workflows for large language models across large-scale GPU clusters. It supports diverse training paradigms including RLVR, agentic interaction, and distillation, while integrating advanced backends like Megatron-Core, vLLM, and SGLang. The framework provides robust observability and flexible resource management to enhance performance in complex reasoning and human preference alignment tasks.

alibaba / rtp-llm

RTP-LLM is a high-performance LLM inference acceleration engine developed by the Alibaba Foundation Model Inference team. This engine has been widely applied in various Alibaba business scenarios such as Taobao and Tmall, supporting multiple mainstream model formats and hardware backends. It provides efficient production-level services for large language models by integrating advanced operator optimization, quantization techniques, and distributed inference capabilities.

bilibili / Index-anisora

Index-AniSora is a powerful open-source framework designed specifically for high-quality anime video generation and animation production. The system features a comprehensive data processing pipeline, a controllable generation model with spatiotemporal masking, and a specialized evaluation benchmark. It supports diverse creative tasks including character 3D generation, video style transfer, and multimodal guidance for precise motion control.

alibaba / TorchEasyRec

TorchEasyRec is a PyTorch-based framework designed for developing production-ready deep learning recommendation models. It supports a wide range of tasks including candidate generation, ranking, multi-task learning, and generative recommendation. The framework offers high scalability, flexible data source integration, and seamless deployment options for real-world production environments.

PaddlePaddle / PaddleCustomDevice

PaddleCustomDevice is the custom hardware integration solution provided by the PaddlePaddle framework. Through standardized interface design, this project enables developers to integrate various third-party hardware backends into the PaddlePaddle ecosystem. It currently covers support for mainstream hardware platforms including Ascend, Cambricon, Intel GPU, and Apple MPS.

bytedance / jaqmc

JaQMC is a modular, JAX-based framework designed for performing neural network quantum Monte Carlo simulations. It utilizes deep neural networks as variational wavefunctions to solve the electronic Schrödinger equation without relying on traditional basis sets. The project supports various quantum systems, including molecules, solids, and fractional quantum Hall states, through a highly configurable and extensible architecture.
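The idea of a variational wavefunction can be illustrated with a toy problem. The sketch below is not JaQMC's API and uses no neural network: it assumes a one-parameter trial wavefunction psi(x) = exp(-alpha * x^2) for the 1D harmonic oscillator and estimates its energy by Metropolis sampling. The function names `local_energy` and `vmc_energy` are hypothetical and chosen only for this illustration.

```python
import math
import random


def local_energy(x: float, alpha: float) -> float:
    """Local energy (H psi) / psi for psi(x) = exp(-alpha * x**2)
    with H = -1/2 d^2/dx^2 + 1/2 x^2 (units where hbar = m = omega = 1)."""
    return alpha + x * x * (0.5 - 2.0 * alpha * alpha)


def vmc_energy(alpha: float, n_steps: int = 20000, step: float = 1.0, seed: int = 0) -> float:
    """Estimate the energy expectation by Metropolis sampling of |psi|^2."""
    rng = random.Random(seed)
    x = 0.0
    total = 0.0
    for _ in range(n_steps):
        proposal = x + rng.uniform(-step, step)
        # Acceptance ratio |psi(proposal)|^2 / |psi(x)|^2
        ratio = math.exp(-2.0 * alpha * (proposal * proposal - x * x))
        if rng.random() < ratio:
            x = proposal
        total += local_energy(x, alpha)
    return total / n_steps


# alpha = 0.5 is the exact ground state, so every sample has local energy 0.5.
exact = vmc_energy(0.5)
# Any other alpha gives a higher energy, by the variational principle.
worse = vmc_energy(0.3)
```

A neural-network approach replaces the single parameter `alpha` with network weights and minimizes the sampled energy with respect to them; the sampling loop itself keeps the same shape.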

baidu / vLLM-Kunlun

vLLM Kunlun is a community-maintained hardware plugin that enables the seamless execution of vLLM on Kunlun XPU hardware. It utilizes a hardware-pluggable interface to decouple the integration process, ensuring compatibility with a wide range of open-source models. The project supports various architectures including Transformer-based, Mixture-of-Experts, and multi-modal LLMs on the Kunlun3 P800 platform.

// all-time featured (33)

PaddlePaddle / Paddle

PaddlePaddle is a comprehensive industrial deep learning platform that provides a complete ecosystem of frameworks, model libraries, and development tools. It supports advanced capabilities such as automatic parallelism, unified training and inference, and high-order differentiation for scientific computing. The platform is designed to facilitate AI commercialization across various sectors by offering a flexible, high-performance architecture for diverse model development.

PaddlePaddle / PaddleOCR

PaddleOCR is a comprehensive toolkit designed to convert images and PDF documents into structured, LLM-ready data formats like Markdown and JSON. It features state-of-the-art vision-language models and high-performance text recognition engines that support over 100 languages. The platform is widely integrated into major AI agent and RAG frameworks, offering efficient deployment options across various hardware backends.

Tencent / ncnn

ncnn is a high-performance neural network inference framework optimized for mobile platforms, designed to simplify the deployment of deep learning algorithms on mobile devices. The framework has no third-party dependencies and is cross-platform, and the project claims to run faster on mobile CPUs than any other known open-source framework. ncnn is used in many of Tencent's mainstream applications.

Tencent / ncnn

ncnn is a high-performance neural network inference framework deeply optimized for mobile platforms. The framework has no third-party dependencies and is cross-platform, and the project reports faster mobile-CPU inference than other known open-source frameworks. Developers can use ncnn to port deep learning models to mobile devices and build intelligent applications.

PaddlePaddle / Paddle

PaddlePaddle is a comprehensive industrial deep learning platform that provides core frameworks, model libraries, and end-to-end development tools. It supports advanced features like unified dynamic and static graphs, automatic parallelism, and high-order differentiation for scientific computing. The platform is designed to facilitate large-scale model training and inference across diverse industrial sectors.

alibaba / MNN

MNN is a high-performance, lightweight deep learning framework designed for efficient model inference and training on mobile and embedded devices. It supports a wide range of neural network architectures and provides versatile tools for model conversion, compression, and general-purpose computation. The framework is widely used in production environments, including various Alibaba applications, to enable device-cloud collaborative machine learning.

deepseek-ai / DeepEP

DeepEP is a specialized communication library designed to optimize Mixture-of-Experts and expert parallelism through high-throughput, low-latency GPU kernels. It provides advanced features such as asymmetric-domain bandwidth forwarding and low-precision support to enhance both training and inference performance. The library also includes hook-based mechanisms for communication-computation overlapping to maximize hardware efficiency without occupying additional streaming multiprocessor resources.
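The dispatch and combine steps that expert parallelism relies on can be sketched in a single process. This is an illustration only and does not use DeepEP's API: the function names `dispatch` and `combine`, the toy experts, and the integer "tokens" are all assumptions, and the grouping that happens here in a dictionary would in practice be an all-to-all exchange between GPUs.

```python
from collections import defaultdict


def dispatch(tokens, expert_ids):
    """Group tokens by assigned expert, remembering each token's original position."""
    buckets = defaultdict(list)
    for position, (token, expert) in enumerate(zip(tokens, expert_ids)):
        buckets[expert].append((position, token))
    return buckets


def combine(buckets, experts, n_tokens):
    """Run each expert on its bucket and scatter the outputs back into token order."""
    outputs = [None] * n_tokens
    for expert, items in buckets.items():
        for position, token in items:
            outputs[position] = experts[expert](token)
    return outputs


# Two toy experts standing in for per-device feed-forward networks.
experts = {0: lambda x: x * 2, 1: lambda x: x + 100}
tokens = [1, 2, 3, 4]
expert_ids = [0, 1, 1, 0]  # routing decision for each token

outputs = combine(dispatch(tokens, expert_ids), experts, len(tokens))
print(outputs)  # [2, 102, 103, 8]
```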

deepseek-ai / DeepGEMM

DeepGEMM is a unified CUDA library providing high-performance tensor core kernels specifically optimized for modern large language models. It features a lightweight Just-In-Time compilation module that eliminates the need for CUDA compilation during installation. The library delivers expert-tuned performance for various matrix operations, including FP8, FP4, and BF16 GEMMs, as well as fused MoE and MQA scoring.

PaddlePaddle / PaddleFormers

PaddleFormers is a Transformers library built on Baidu's PaddlePaddle framework. It aims to give Large Language Models and Vision-Language Models training interfaces and a feature set comparable to Hugging Face's. By integrating tensor parallelism, pipeline parallelism, and automatic mixed precision, the project reports training performance that surpasses Megatron-LM on mainstream models. It also supports domestically produced (Chinese) computing chips and the Safetensors format, covering the full workflow from pre-training to post-training.

alibaba / ROLL

ROLL is an efficient, user-friendly library designed for scaling reinforcement learning workflows for large language models across large-scale GPU clusters. It supports diverse training paradigms including RLVR, agentic interaction, and distillation, while integrating advanced backends like Megatron-Core, vLLM, and SGLang. The framework provides robust observability and flexible resource management to enhance performance in complex reasoning and human preference alignment tasks.

PaddlePaddle / PaddleX

PaddleX 3.0 is a low-code development tool built on the PaddlePaddle framework that bundles a large set of ready-to-use pre-trained models and supports the full development process. Through a minimal Python API and a graphical interface, it covers the path from model training to inference deployment. It is also compatible with mainstream hardware from both Chinese and international vendors, helping developers move models into industrial use efficiently.

alibaba / rtp-llm

RTP-LLM is a high-performance LLM inference acceleration engine developed by the Alibaba Foundation Model Inference team. This engine has been widely applied in various Alibaba business scenarios such as Taobao and Tmall, supporting multiple mainstream model formats and hardware backends. It provides efficient production-level services for large language models by integrating advanced operator optimization, quantization techniques, and distributed inference capabilities.

alibaba / ROLL

ROLL is an efficient, user-friendly reinforcement learning library specifically designed for training and scaling Large Language Models on large-scale GPU clusters. It utilizes a multi-role distributed architecture powered by Ray to support complex tasks like human preference alignment, reasoning, and agentic interactions. The framework integrates advanced technologies such as Megatron-Core, vLLM, and SGLang to accelerate model training and inference across diverse hardware environments.

bilibili / Index-anisora

Index-AniSora is a powerful open-source framework designed specifically for high-quality anime video generation and animation production. The system features a comprehensive data processing pipeline, a controllable generation model with spatiotemporal masking, and a specialized evaluation benchmark. It supports diverse creative tasks including character 3D generation, video style transfer, and multimodal guidance for precise motion control.

alibaba / rtp-llm

RTP-LLM is a high-performance large model inference acceleration engine developed by the Alibaba Foundation Model Inference Team, widely used in various business scenarios such as Taobao and Tmall. By integrating various advanced CUDA kernels and quantization techniques, the engine significantly improves model inference performance and efficiency. Furthermore, it possesses high flexibility, supporting multiple model formats, multimodal inputs, and LoRA service deployment.

bytedance / Protenix

Protenix is an open-source framework designed for high-accuracy biomolecular structure prediction, offering models that perform competitively with state-of-the-art methods. The project provides multiple versions, including the enhanced Protenix-v2, which demonstrates significant improvements in antibody-antigen structure prediction and ligand-related plausibility. It is released under the Apache 2.0 license, making it freely accessible for both academic and commercial research applications.

Tencent / AngelSlim

AngelSlim is a highly integrated toolkit designed to provide efficient compression solutions for large language, vision, and diffusion models. It supports a wide range of techniques including advanced quantization, speculative decoding, and token pruning to optimize model performance. The framework offers developers a unified interface for training, deployment, and performance evaluation across various hardware environments.

bilibili / Index-anisora

Index-AniSora is a comprehensive open-source system developed by Bilibili for high-quality anime video generation. The project provides a controllable generation model, a specialized data processing pipeline, and an evaluation benchmark tailored for animation aesthetics. It supports advanced features such as character 3D video generation, video style transfer, and multimodal guidance to facilitate diverse animation production tasks.

alibaba / TorchEasyRec

TorchEasyRec is a PyTorch-based framework designed for developing production-ready deep learning recommendation models. It supports a wide range of tasks including candidate generation, ranking, multi-task learning, and generative recommendation. The framework offers high scalability, flexible data source integration, and seamless deployment options for real-world production environments.

OpenBMB / VoxCPM

VoxCPM2 is a tokenizer-free, 2B parameter text-to-speech system that utilizes a diffusion autoregressive architecture to generate high-quality, expressive audio. The model supports 30 languages and offers advanced capabilities including voice design, controllable voice cloning, and studio-quality 48kHz output. It is fully open-source under the Apache-2.0 license and provides production-ready deployment options via vLLM-Omni and Nano-vLLM.

PaddlePaddle / PaddleCustomDevice

PaddleCustomDevice is the custom hardware integration solution provided by the PaddlePaddle framework. Through standardized interface design, this project enables developers to integrate various third-party hardware backends into the PaddlePaddle ecosystem. It currently covers support for mainstream hardware platforms including Ascend, Cambricon, Intel GPU, and Apple MPS.

bytedance / jaqmc

JaQMC is a modular, JAX-based framework designed for performing neural network quantum Monte Carlo simulations. It utilizes deep neural networks as variational wavefunctions to solve the electronic Schrödinger equation without relying on traditional basis sets. The project supports various quantum systems, including molecules, solids, and fractional quantum Hall states, through a highly configurable and extensible architecture.

baidu / vLLM-Kunlun

vLLM Kunlun is a community-maintained hardware plugin that enables the seamless execution of vLLM on Kunlun XPU hardware. It utilizes a hardware-pluggable interface to decouple the integration process, ensuring compatibility with a wide range of open-source models. The project supports various architectures including Transformer-based, Mixture-of-Experts, and multi-modal LLMs on the Kunlun3 P800 platform.

google / magika

Magika is an AI-powered tool that utilizes deep learning to provide highly accurate file type identification for over 200 content types. It features a highly optimized model that delivers inference results in milliseconds while maintaining approximately 99% accuracy. The project offers a versatile command-line interface and language bindings for Python, JavaScript, and Rust to support diverse developer workflows.

k2-fsa / OmniVoice

OmniVoice is an advanced large-scale multilingual zero-shot speech synthesis model based on a diffusion language model architecture, supporting over 600 languages. The model features exceptional inference speed and enables high-quality voice cloning and voice design capabilities. Users can easily perform speech generation via Python API or command-line tools, with support for fine-grained non-linguistic symbols and pronunciation control.

microsoft / VibeVoice

VibeVoice is a family of open-source voice AI models that utilizes continuous speech tokenizers and next-token diffusion to achieve high-fidelity audio processing. The framework includes advanced tools for long-form speech recognition and real-time streaming text-to-speech generation. These models are designed for research purposes to advance collaboration and innovation within the speech synthesis community.

baidu / vLLM-Kunlun

vLLM Kunlun is a community-maintained hardware plugin that enables the seamless execution of vLLM on Kunlun XPU devices. It functions as a hardware-pluggable interface, allowing users to run various large language and multimodal models without modifying the original vLLM source code. The project supports advanced features like quantization, LoRA fine-tuning, and hardware-accelerated graph optimization to ensure high-performance inference.

PaddlePaddle / docs

This repository contains the source files for the official PaddlePaddle documentation platform. It organizes content into specific directories for API references, user guides, and tutorials to support developers. The project also provides CI scripts and build instructions to facilitate local documentation generation and community contributions.

PaddlePaddle / PaddleCustomDevice

PaddleCustomDevice is a custom hardware integration solution provided by the PaddlePaddle deep learning framework. This project aims to help developers efficiently integrate various third-party hardware backends into the PaddlePaddle ecosystem. Currently, it supports a variety of mainstream hardware platforms, including Ascend, Cambricon, Intel GPU, and Apple MPS.

PaddlePaddle / PaConvert

PaConvert is officially maintained by the PaddlePaddle team and automates the migration of PyTorch code to PaddlePaddle code. It supports automatic conversion of over 1,600 PyTorch APIs and 200 torchvision APIs, with an average conversion rate of over 95% in tests. The converter runs from the command line, preserves the style and structure of the original code, and produces detailed conversion logs and summaries.

PaddlePaddle / community

The PaddlePaddle community serves as a central hub for developers to contribute to the framework through code improvements, documentation, and presentations. It provides structured governance, specialized working groups, and various mentorship programs to support active participation. Contributors are recognized through official certifications, release notes, and inclusion in the project's authorship records.

shiyu-coder / Kronos

Kronos is an open-source decoder-only foundation model specifically designed to analyze and forecast financial K-line sequences. It utilizes a two-stage framework that quantizes multi-dimensional market data into hierarchical tokens before processing them through an autoregressive Transformer. The project provides a comprehensive suite of pre-trained models and tools for both direct forecasting and domain-specific fine-tuning.
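The phrase "hierarchical tokens" can be made concrete with a toy quantizer. The sketch below is not Kronos's tokenizer: it assumes a simple uniform two-level scheme in which a coarse token picks one of 16 price buckets and a fine token picks one of 16 sub-buckets inside it. The function names, the field names in `candle`, and the price range are all hypothetical.

```python
def quantize(value, low, high, coarse_bins=16, fine_bins=16):
    """Map a value in [low, high] to a (coarse, fine) token pair."""
    unit = (value - low) / (high - low)
    unit = min(max(unit, 0.0), 1.0 - 1e-12)  # clamp into [0, 1)
    scaled = unit * coarse_bins
    coarse = int(scaled)                      # which coarse bucket
    fine = int((scaled - coarse) * fine_bins)  # position inside that bucket
    return coarse, fine


def dequantize(coarse, fine, low, high, coarse_bins=16, fine_bins=16):
    """Return the centre of the sub-bucket identified by (coarse, fine)."""
    unit = (coarse + (fine + 0.5) / fine_bins) / coarse_bins
    return low + unit * (high - low)


def tokenize_candle(candle, low, high):
    """Tokenize the four price fields of one K-line (candlestick)."""
    fields = ("open", "high", "low", "close")
    return [quantize(candle[field], low, high) for field in fields]


candle = {"open": 100.0, "high": 103.8, "low": 99.2, "close": 101.5}
tokens = tokenize_candle(candle, low=95.0, high=105.0)
print(tokens[0])  # (8, 0): 100.0 sits at the start of coarse bucket 8
```

An autoregressive model would then be trained on the resulting token sequence; the coarse token carries most of the information and the fine token refines it.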

rohitg00 / ai-engineering-from-scratch

AI Engineering from Scratch is a comprehensive 320-hour curriculum that guides students from fundamental linear algebra to building autonomous agent swarms. The course emphasizes an AI-native learning approach where students use AI coding agents to test their knowledge and build reusable tools throughout 20 distinct phases. By working across Python, TypeScript, Rust, and Julia, learners develop a professional portfolio of prompts, skills, and agents that can be deployed in real-world environments.

// use cases by project

01 Automatic distributed parallel training for large-scale models
02 High-order automatic differentiation for scientific computing applications
03 Heterogeneous multi-chip adaptation through a standardized, pluggable architecture

01 Intelligent document parsing for LLM-ready structured data extraction
02 Universal multilingual text recognition for natural scene and document analysis
03 Building high-quality datasets for fine-tuning Large Language Models

01 Supports a variety of mainstream CNN models, including classification, detection, segmentation, and face recognition algorithms
02 Provides cross-platform deployment capabilities, supporting environments such as Android, iOS, Windows, Linux, macOS, and WebAssembly
03 Helps developers port deep learning algorithms to mobile devices through efficient implementation, enabling the rapid deployment of artificial intelligence applications

01 Efficiently deploy deep learning algorithm models on mobile devices
02 Support mainstream CNN networks such as YOLO, MobileNet, and ResNet
03 Achieve high-performance cross-platform neural network inference computation

01 Unified dynamic and static graph training with automatic parallelism
02 Integrated large model training and inference workflows
03 High-order differentiation for scientific computing and differential equations

// comparisons

PaddleOCR vs FlashMLA · ncnn vs ncnn · ncnn vs MNN · FastDeploy vs ncnn

// related topics

LLM (11) · PaddlePaddle (9) · Computer Vision (7) · Inference (6) · Machine Learning (6)