// topic

LLM

117 trending in last 90 days · 117 all-time

// new this month

// ecosystem

AI 114

Security 2

Frontend 1

// recent newcomers

#1 ZhangXuefeng.skill: A Thinking Framework Based on Cognitive Models · 🆕 27d ago · ↗ 673.31/d · ★ 6,766
#2 Hermes Agent: The Complete Guide (Orange Book) · 🆕 24d ago · ↗ 366.31/d · ★ 3,426
#3 OpenClaw AI Agent Best Use Cases and Case Collection · 🆕 2mo ago · ↗ 205.61/d · ★ 3,959
#4 Tong Jincheng.skill: An interpersonal relationship analysis tool based on the 'Deep-Feeling Grandmaster' mindset · 🆕 27d ago · ↗ 179.12/d · ★ 1,789
#5 Awesome DeepSeek Agent Integration Guides · 🆕 5d ago · ↗ 151.86/d · ★ 483

// this week's top 10

deepseek-ai / FlashMLA

FlashMLA is a library of high-performance attention kernels specifically designed to power DeepSeek-V3 and DeepSeek-V3.2 models. It provides optimized implementations for both sparse and dense attention mechanisms during prefill and decoding stages. The library supports advanced features like FP8 KV cache and is compatible with various GPU architectures including SM90 and SM100.

BerriAI / litellm

LiteLLM provides a unified interface to interact with over 100 LLM providers using a consistent OpenAI-compatible format. Developers can utilize it as a Python SDK for direct integration or deploy it as a production-ready proxy server. The platform simplifies LLM management by offering features like load balancing, spend tracking, and virtual keys.

TabbyML / tabby

Tabby is a self-hosted, open-source AI coding assistant designed as an on-premises alternative to GitHub Copilot. It operates as a self-contained system that does not require external cloud services or database management. The platform supports consumer-grade GPUs and offers an OpenAPI interface for seamless integration with existing development infrastructure.

deepseek-ai / Thinking-with-Visual-Primitives

Thinking with Visual Primitives introduces a novel approach to Multimodal Large Language Models by interleaving spatial markers directly into the reasoning process. This method addresses the reference gap in complex structural tasks by anchoring abstract language to concrete physical coordinates. The framework achieves frontier-competitive performance while maintaining high visual token efficiency through a compressed architecture.

alibaba / page-agent

Page Agent is a client-side library that enables natural language control of web interfaces directly within the browser. It utilizes text-based DOM manipulation to interact with elements without requiring screenshots or complex headless browser setups. Developers can easily integrate this tool to build AI copilots, automate form filling, or enhance web accessibility.

Khoj is a versatile personal AI application designed to extend your capabilities by integrating with various local and online large language models. It allows users to interact with their personal documents and the internet through a unified interface accessible across multiple platforms. The project is open-source and supports flexible deployment options ranging from private on-device setups to scalable enterprise cloud solutions.

Mininglamp-AI / Mano-P

Mano-P is a GUI-VLA agent project designed to enable autonomous, private task execution on edge devices like Mac mini and MacBook. It utilizes advanced reinforcement learning and edge-native inference to perform complex GUI automation, cross-system data integration, and long-task planning. The project provides a secure, local-first solution that eliminates the need for cloud API calls while maintaining high performance across various benchmarks.

bytedance / deer-flow

DeerFlow is an open-source super agent harness designed to orchestrate sub-agents, memory, and sandboxes for complex task execution. The platform features a ground-up rewrite in version 2.0, offering enhanced extensibility through a modular skill and tool architecture. It supports diverse deployment environments, including local development and Docker-based production setups, with integrated support for multiple messaging channels.

Slime is a specialized post-training framework designed to scale reinforcement learning for large language models. It integrates Megatron-LM for high-performance training with SGLang to provide flexible, efficient data generation workflows. The architecture decouples training and rollout processes, enabling researchers to build and deploy complex agentic RL systems.

deepseek-ai / DeepGEMM

DeepGEMM is a unified CUDA library providing high-performance tensor core kernels specifically optimized for modern large language models. It features a lightweight Just-In-Time compilation module that eliminates the need for CUDA compilation during installation. The library delivers expert-tuned performance for various matrix operations, including FP8, FP4, and BF16 GEMMs, as well as fused MoE and MQA scoring.

// all-time featured (50)

deepseek-ai / FlashMLA

FlashMLA is a library of high-performance attention kernels specifically designed to power DeepSeek-V3 and DeepSeek-V3.2 models. It provides optimized implementations for both sparse and dense attention mechanisms during prefill and decoding stages. The library supports advanced features like FP8 KV cache and is compatible with various GPU architectures including SM90 and SM100.

deepseek-ai / FlashMLA

FlashMLA is a library of high-performance attention kernels developed by DeepSeek to power their V3 and V3.2-Exp models. The repository provides specialized implementations for both sparse and dense attention mechanisms during prefill and decoding stages. These kernels are optimized for NVIDIA GPU architectures, including SM90 and SM100, to achieve significant computational throughput.

BerriAI / litellm

LiteLLM provides a unified interface to interact with over 100 LLM providers using a consistent OpenAI-compatible format. Developers can utilize it as a Python SDK for direct integration or deploy it as a production-ready proxy server. The platform simplifies LLM management by offering features like load balancing, spend tracking, and virtual keys.

PaddlePaddle / PaddleOCR

PaddleOCR is a comprehensive toolkit designed to convert images and PDF documents into structured, LLM-ready data formats like Markdown and JSON. It features state-of-the-art vision-language models and high-performance text recognition engines that support over 100 languages. The platform is widely integrated into major AI agent and RAG frameworks, offering efficient deployment options across various hardware backends.

TabbyML / tabby

Tabby is a self-hosted, open-source AI coding assistant designed as an on-premises alternative to GitHub Copilot. It operates as a self-contained system that does not require external cloud services or database management. The platform supports consumer-grade GPUs and offers an OpenAPI interface for seamless integration with existing development infrastructure.

deepseek-ai / Thinking-with-Visual-Primitives

Thinking with Visual Primitives introduces a novel approach to Multimodal Large Language Models by interleaving spatial markers directly into the reasoning process. This method addresses the reference gap in complex structural tasks by anchoring abstract language to concrete physical coordinates. The framework achieves frontier-competitive performance while maintaining high visual token efficiency through a compressed architecture.

alibaba / page-agent

Page Agent is a client-side library that enables natural language control of web interfaces directly within the browser. It utilizes text-based DOM manipulation to interact with elements without requiring screenshots or complex headless browser setups. Developers can easily integrate this tool to build AI copilots, automate form filling, or enhance web accessibility.

Khoj is a versatile personal AI application designed to extend your capabilities by integrating with various local and online large language models. It allows users to interact with their personal documents and the internet through a unified interface accessible across multiple platforms. The project is open-source and supports flexible deployment options ranging from private on-device setups to scalable enterprise cloud solutions.

Mininglamp-AI / Mano-P

Mano-P is a GUI-VLA agent project designed to enable autonomous, private task execution on edge devices like Mac mini and MacBook. It utilizes advanced reinforcement learning and edge-native inference to perform complex GUI automation, cross-system data integration, and long-task planning. The project provides a secure, local-first solution that eliminates the need for cloud API calls while maintaining high performance across various benchmarks.

bytedance / deer-flow

DeerFlow is an open-source super agent harness designed to orchestrate sub-agents, memory, and sandboxes for complex task execution. The platform features a ground-up rewrite in version 2.0, offering enhanced extensibility through a modular skill and tool architecture. It supports diverse deployment environments, including local development and Docker-based production setups, with integrated support for multiple messaging channels.

deepseek-ai / TileKernels

TileKernels provides a collection of high-performance GPU kernels specifically designed for large language model operations using the TileLang framework. The project includes specialized implementations for Mixture of Experts routing, advanced quantization techniques, and manifold hyper-connection operations. These kernels are built to maximize hardware performance and are currently utilized in internal training and inference workflows.

MNN is a high-performance, lightweight deep learning framework designed for efficient model inference and training on mobile and embedded devices. It supports a wide range of neural network architectures and provides versatile tools for model conversion, compression, and general-purpose computation. The framework is widely used in production environments, including various Alibaba applications, to enable device-cloud collaborative machine learning.

WeaveMindAI / weft

Weft is a programming language designed to integrate LLMs, human interactions, and infrastructure into a unified, visual workflow. It features durable execution to ensure programs survive crashes and supports complex logic through a typed, modular node system. Developers can build and manage sophisticated agentic systems by wiring together native nodes without the need for manual plumbing.

Slime is a specialized post-training framework designed to scale reinforcement learning for large language models. It integrates Megatron-LM for high-performance training with SGLang to provide flexible, efficient data generation workflows. The architecture decouples training and rollout processes, enabling researchers to build and deploy complex agentic RL systems.

deepseek-ai / DeepGEMM

DeepGEMM is a unified CUDA library providing high-performance tensor core kernels specifically optimized for modern large language models. It features a lightweight Just-In-Time compilation module that eliminates the need for CUDA compilation during installation. The library delivers expert-tuned performance for various matrix operations, including FP8, FP4, and BF16 GEMMs, as well as fused MoE and MQA scoring.

bytedance / deer-flow

DeerFlow 2.0 is a ground-up rewrite of an open-source super agent harness designed to orchestrate sub-agents, memory, and sandboxes. It utilizes extensible skills and integrates with various AI models to perform complex tasks through a flexible, containerized architecture. The framework supports multiple deployment modes and provides seamless connectivity with messaging platforms like Slack, Telegram, and Feishu.

PaddlePaddle / PaddleFormers

PaddleFormers is a Transformers library built on Baidu's PaddlePaddle framework, designed to give Large Language Models and Vision-Language Models training interfaces and a feature set comparable to those of Hugging Face. By integrating tensor parallelism, pipeline parallelism, and automatic mixed precision, the project reports training performance that surpasses Megatron-LM on mainstream models. It also fully supports domestic (Chinese) computing chips and is compatible with the Safetensors format, helping developers complete the entire process from pre-training to post-training efficiently.

Tencent / WeKnora

WeKnora is an open-source, LLM-powered framework designed for enterprise-grade document understanding, semantic retrieval, and autonomous reasoning. It features a ReAct agent for complex multi-step tasks and a Wiki mode that distills raw documents into a structured, interlinked knowledge base. The platform supports multi-source data ingestion, various LLM integrations, and flexible deployment options to ensure complete data sovereignty.

nesquena / hermes-webui

Hermes WebUI provides a lightweight, dark-themed browser interface that offers full parity with the Hermes Agent CLI. It features a three-panel layout for chat, file management, and session navigation without requiring complex build steps or frameworks. Users can securely access their self-hosted agent via SSH tunnels or mobile devices while maintaining persistent memory and cross-session context.

farion1231 / cc-switch

CC Switch is a desktop application designed to centralize the management of Claude Code, Codex, Gemini CLI, OpenCode, and OpenClaw. It eliminates the need for manual configuration file editing by providing a visual interface with over 50 built-in provider presets and system tray quick-switching. The tool also features unified management for MCP servers, prompts, and skills, alongside cross-device cloud synchronization.

elder-plinius / CL4R1T4S

CL4R1T4S is a comprehensive repository dedicated to exposing the hidden system prompts, guidelines, and tools used by major AI models and agents. By documenting these unseen instructions, the project aims to provide users with a clearer understanding of the underlying frameworks that shape AI behavior and decision-making. The platform encourages community contributions to maintain an up-to-date collection of extracted system prompts from various industry-leading AI providers.

VoltAgent / awesome-design-md

This repository provides a curated collection of DESIGN.md files that define the visual identity and design systems of popular websites. These markdown-based documents allow AI coding agents to understand and replicate specific UI styles without needing complex tooling or Figma exports. Each entry includes detailed design tokens, typography rules, and component styling to ensure consistent and pixel-perfect AI-generated interfaces.

HKUDS / RAG-Anything

RAG-Anything is a comprehensive framework designed to process and query diverse document types including text, images, tables, and mathematical equations. Built on LightRAG, it provides an end-to-end pipeline that integrates multimodal content into a unified knowledge graph for intelligent retrieval. This system eliminates the need for multiple specialized tools by offering a single, cohesive interface for complex document analysis.

Gitlawb / openclaude

OpenClaude is an open-source coding-agent CLI that supports a wide range of cloud and local model providers. It offers a unified terminal-first workflow featuring tools for file management, bash execution, and agentic tasks. Users can easily integrate various backends, including OpenAI, Ollama, and Gemini, while leveraging advanced features like agent routing and gRPC support.

bytedance / agentkit-samples

AgentKit Code Workshop is a sample repository for the AI agent development platform launched by Volcengine, designed to help developers quickly learn to build and deploy intelligent agents. The project provides code examples ranging from basic introductions to complex scenarios, covering core functions such as multi-agent collaboration, retrieval-augmented generation (RAG), and tool invocation. Developers can use these tutorials to gain an in-depth understanding of the AgentKit development toolchain and integrate it efficiently into business applications.

Slime is an LLM post-training framework designed for reinforcement learning scaling by integrating Megatron for high-performance training and SGLang for efficient rollout generation. The framework utilizes a data buffer to bridge training and generation, enabling flexible and asynchronous workflows for complex RL tasks. It supports a wide range of state-of-the-art models, including the GLM, Qwen, DeepSeek, and Llama series.

ROLL is an efficient, user-friendly library designed for scaling reinforcement learning workflows for large language models across large-scale GPU clusters. It supports diverse training paradigms including RLVR, agentic interaction, and distillation, while integrating advanced backends like Megatron-Core, vLLM, and SGLang. The framework provides robust observability and flexible resource management to enhance performance in complex reasoning and human preference alignment tasks.

XiaoMi / xiaomi-miloco

Xiaomi Miloco is an open-source smart home solution that utilizes on-device large language models to integrate and control IoT devices. By leveraging camera data streams, the system enables natural language interaction for complex home automation and event analysis. It prioritizes user privacy by performing visual understanding and task planning locally on the user's hardware.

alchaincyf / hermes-agent-orange-book

This comprehensive guide provides a detailed walkthrough of the Hermes Agent framework developed by Nous Research. It covers core mechanisms like the self-improving learning loop, memory systems, and automated skill evolution across seventeen chapters. The book serves as a practical resource for developers and AI enthusiasts looking to implement and customize their own intelligent agents.

Tencent / AI-Infra-Guard

AI-Infra-Guard is an open-source red teaming platform developed by Tencent Zhuque Lab to provide comprehensive security self-examination for AI infrastructures. It integrates multiple scanning capabilities, including vulnerability detection for AI components, agent workflow security, and jailbreak evaluation. The platform is designed to be user-friendly, offering a modern web interface and a robust API for seamless integration into security workflows.

Ant Design X provides a comprehensive suite of atomic components and utility APIs designed for building intelligent AI interfaces. The library includes specialized tools for streaming Markdown rendering, dynamic card generation, and managing AI agent data streams. It offers an enterprise-ready ecosystem to help developers efficiently create high-quality, interactive AI conversation applications.

openocta / openocta

OpenOcta is a fully self-developed enterprise-grade AI Agent runtime and control plane that uses a single Go binary to encapsulate the backend and embedded frontend. The project supports intelligent conversation, process automation, and deep integration with business systems, APIs, and toolchains. Users can quickly deploy and connect to internal business systems via CLI, HTTP, or WebSocket.

abi / secret-llama

Secret Llama is an entirely in-browser chatbot that allows users to run open-source models like Llama 3 and Mistral locally. Because the application operates directly within the browser, all conversation data remains private and no server installation is required. The platform provides a user-friendly interface that functions offline while leveraging WebGPU technology for performance.

PaddlePaddle / FastDeploy

FastDeploy is an inference deployment toolkit for large language models and vision-language models based on PaddlePaddle, designed to provide out-of-the-box, production-grade deployment solutions. It supports various mainstream hardware platforms and integrates load-balanced prefill-decode (PD) disaggregation, unified KV cache transmission, and multiple advanced acceleration technologies. Developers can deploy quickly through OpenAI API-compatible interfaces and optimize inference performance using its full quantization format support.

Tencent / AI-Infra-Guard

AI-Infra-Guard is a professional AI red teaming security assessment platform developed by Tencent Zhuque Lab, designed to provide comprehensive AI security risk self-inspection solutions for enterprises and individuals. The platform integrates core functions such as AI infrastructure vulnerability scanning, Agent workflow security assessment, MCP server scanning, and jailbreak testing. Users can deploy it quickly via Docker and utilize its modern Web interface and robust API to achieve efficient security detection and management.

alibaba / rtp-llm

RTP-LLM is a high-performance LLM inference acceleration engine developed by the Alibaba Foundation Model Inference team. This engine has been widely applied in various Alibaba business scenarios such as Taobao and Tmall, supporting multiple mainstream model formats and hardware backends. It provides efficient production-level services for large language models by integrating advanced operator optimization, quantization techniques, and distributed inference capabilities.

ROLL is an efficient, user-friendly reinforcement learning library specifically designed for training and scaling Large Language Models on large-scale GPU clusters. It utilizes a multi-role distributed architecture powered by Ray to support complex tasks like human preference alignment, reasoning, and agentic interactions. The framework integrates advanced technologies such as Megatron-Core, vLLM, and SGLang to accelerate model training and inference across diverse hardware environments.

NousResearch / hermes-agent

Hermes Agent is a self-improving AI assistant designed by Nous Research that creates and refines skills through a built-in learning loop. It supports a wide range of LLM providers and can be deployed across various platforms including Telegram, Discord, and local terminal environments. The system features persistent memory, scheduled automations, and the ability to spawn subagents for complex, parallelized tasks.

PaddlePaddle / FastDeploy

FastDeploy is an inference deployment toolkit for large language models and vision-language models based on PaddlePaddle, aiming to provide out-of-the-box, production-grade deployment solutions. The toolkit supports various mainstream hardware platforms and integrates core technologies such as load-balanced prefill-decode (PD) disaggregation, unified KV cache transmission, and full quantization format support. Its compatibility with the OpenAI API and vLLM interfaces helps developers implement model inference and online service deployment efficiently.

alibaba / rtp-llm

RTP-LLM is a high-performance large model inference acceleration engine developed by the Alibaba Foundation Model Inference Team, widely used in various business scenarios such as Taobao and Tmall. By integrating various advanced CUDA kernels and quantization techniques, the engine significantly improves model inference performance and efficiency. Furthermore, it possesses high flexibility, supporting multiple model formats, multimodal inputs, and LoRA service deployment.

toverainc / willow

The Willow Inference Server allows users to self-host high-speed language inference tasks for various applications. It supports essential features including speech-to-text, text-to-speech, and large language model processing. Users can access official documentation and community support through the project's website and GitHub discussions.

openai / openai-agents-python

The OpenAI Agents SDK is a lightweight framework designed for building complex multi-agent workflows. It supports a wide range of LLMs and provides essential features like tool integration, guardrails, and human-in-the-loop capabilities. Developers can also utilize sandbox agents for long-running tasks and leverage built-in tracing to debug and optimize their agentic applications.

meituan / EvoCUA

EvoCUA is a high-performance open-source multimodal model designed for end-to-end computer automation across various desktop applications. It currently holds the top ranking on the OSWorld benchmark and demonstrates superior cross-OS generalization capabilities. Additionally, the model is recognized for its robust safety profile, exhibiting the lowest unintended-behavior rate among leading computer-use agents.

alchaincyf / zhangxuefeng-skill

ZhangXuefeng.skill is a cognitive operating system built on deep research, designed to provide an executable thinking framework rather than a simple collection of quotes. By distilling core mental models, decision heuristics, and communication style, the project helps users analyze the choice of a college major and career planning from Zhang Xuefeng's perspective. Users can install this skill to obtain targeted decision-making advice and in-depth analysis within Claude Code.

Tencent / AngelSlim

AngelSlim is a highly integrated toolkit designed to provide efficient compression solutions for large language, vision, and diffusion models. It supports a wide range of techniques including advanced quantization, speculative decoding, and token pruning to optimize model performance. The framework offers developers a unified interface for training, deployment, and performance evaluation across various hardware environments.

alibaba / tair-kvcache

Tair KVCache is an Alibaba Cloud system designed to accelerate Large Language Model inference through distributed memory pooling and dynamic multi-level caching. The project provides a centralized manager for global KVCache metadata and storage capacity, ensuring efficient data reliability and resource utilization. Additionally, it includes a high-fidelity simulation tool that allows developers to predict performance metrics without requiring actual GPU resources.

GammaLabTechnologies / harmonist

Harmonist is a portable multi-agent framework that enforces development protocols through mechanical IDE-level hooks rather than relying on LLM prompts. It provides a structured, validated memory system and supply-chain verification to ensure that code changes meet non-negotiable quality and security standards. The framework integrates seamlessly with popular AI coding assistants like Cursor and Claude Code, offering a catalogue of 186 specialized agents without requiring external runtimes or databases.

jnMetaCode / superpowers-zh

superpowers-zh is a Chinese-language enhanced project that provides systematic working methodologies for 17 mainstream AI coding tools. On top of a full localization of 14 core upstream skills, it adds 6 specialized skills designed for Chinese developers. With a single installation command, developers can configure field-tested development workflows for tools like Claude Code and Cursor.

XiaoMi / xiaomi-miloco

Xiaomi Miloco is an exploratory open-source solution that integrates Xiaomi Home cameras with a self-developed LLM to control IoT devices. It uses an on-device model to process visual data for scene understanding while protecting user privacy and security. Users can define complex home rules and interact with their smart home ecosystem using natural language.

deepseek-ai / awesome-deepseek-agent

Awesome DeepSeek Agent is a curated collection of guides for integrating DeepSeek models into various AI coding assistants and agentic tools. Each guide provides step-by-step instructions for installation, configuration, and initial setup to ensure a smooth user experience. Developers can quickly enable DeepSeek-V4-Pro or DeepSeek-V4-Flash within their preferred terminal or editor environments.

// use cases by project

01 Token-level sparse attention for prefill and decoding stages
02 Dense attention kernels for high-performance prefill and decoding
03 FP8 KV cache support for optimized memory and compute efficiency

01 Token-level sparse attention for efficient prefill and decoding stages
02 Dense attention kernels for standard Multi-Head Attention (MHA) operations
03 FP8 KV cache support to optimize memory usage during decoding

01 Unified API for 100+ LLM providers
02 Production-ready AI Gateway with load balancing and guardrails
03 Seamless integration with MCP tools and A2A agents

01 Intelligent document parsing for LLM-ready structured data extraction
02 Universal multilingual text recognition for natural scene and document analysis
03 Building high-quality datasets for fine-tuning Large Language Models

01 Self-hosted AI code completion and generation
02 Internal knowledge retrieval via the Answer Engine
03 Integration with IDEs like VSCode, Vim, and IntelliJ

// comparisons

FlashMLA vs FlashMLA
litellm vs FlashMLA
PaddleOCR vs FlashMLA
voicebox vs willow
voicebox vs neutts
neutts vs willow

// related topics

Automation (29) · Agent (21) · Python (18) · AI Agents (18) · Inference (12)