✍️ NVIDIA Groq 3 LPX: Technical Details of the SRAM-First Low-Latency Inference Accelerator

This article is a commentary on “Inside NVIDIA Groq 3 LPX: The Low-Latency Inference Accelerator for the NVIDIA Vera Rubin Platform”. Summary: The NVIDIA Groq 3 LPU, which NVIDIA developed after acquiring Groq's inference technology IP for $20B in December 2025, is an SRAM-...

27/03/2026 blog tech_blog

NVIDIA Groq LPU +4

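📄 Paper Commentary: Agent Laboratory - Using LLM Agents as Research Assistants

This article is a commentary on arXiv:2501.04306 “Agent Laboratory: Using LLM Agents as Research Assistants”. Abstract: From Schmidgall et al. (January 2025), a framework that uses a team of LLM agents to automate the full scientific research cycle (literature review; experiment design, implementation, and execution; paper writing)...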

27/03/2026 blog paper

Agent-Laboratory LLM-agent research-automation +3

✍️ SkyPilot Blog Commentary: What Happened When Karpathy's AutoResearch Was Scaled to a GPU Cluster

This article is a commentary on the official SkyPilot blog post “Scaling Karpathy’s Autoresearch: What Happens When the Agent Gets a GPU Cluster”. Summary: The SkyPilot team (published March 2026) scaled Andrej Karpathy's AutoResearch from a single GPU to Kubernetes...

27/03/2026 blog tech_blog

SkyPilot autoresearch GPU-cluster +4

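📄 Paper Commentary: LUT-LLM - Accelerating LLM Inference on FPGAs with Lookup Tables

This article is a commentary on “LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs”. Abstract: LUT-LLM proposes a method that exploits the abundant on-chip memory resources of FPGAs to convert the computations in LLM inference into lookup table (LUT) references...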

27/03/2026 blog paper

FPGA LLM inference +3

📄 Paper Commentary: The AI Scientist - Toward Fully Automated Open-Ended Scientific Discovery

This article is a commentary on arXiv:2408.06292 “The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery”. Abstract: Lu, Lu, Lange, Foerster, Clune, Ha (Sakana AI / University of Oxford / Uni...

27/03/2026 blog paper

AI-Scientist autonomous-research LLM +3

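📄 Paper Commentary: FlightLLM - A Complete Mapping Flow for LLM Inference on FPGAs

This article is a commentary on “FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs”. Abstract: FlightLLM makes full use of FPGA-specific resources (DSP48 and a heterogeneous memory hierarchy) to run LLM inference efficiently on FPGAs through a complete mapping...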

27/03/2026 blog paper

FPGA LLM inference +3

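✍️ Google Research Commentary: Titans + MIRAS - A Unified Framework That Gives AI Long-Term Memory

This article is a commentary on “Titans + MIRAS: Helping AI have long-term memory - Google Research Blog”. Summary: “Titans”, a new architecture for giving large language models (LLMs) long-term memory, and “MIRAS”, a unified framework that generalizes it, on the official Google Research blog...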

25/03/2026 blog tech_blog

LLM memory Google-Research +3

📄 Paper Commentary: vLLM-MLX - Faster LLM Inference Using Apple Silicon Unified Memory

This article is a commentary on “Native LLM and MLLM Inference at Scale on Apple Silicon” (arXiv:2601.19139). Abstract: vLLM-MLX is an LLM/MLLM inference framework that uses the MLX framework for Apple Silicon as its native backend. For text models, relative to llama.cpp, 21 to...

25/03/2026 blog paper

llm localllm llamacpp +2

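📄 Paper Commentary: Mini-Omni - A Parallel-Decoding LLM with Streaming Speech Output

This article is a commentary on “Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming” (arXiv:2407.06225). Abstract: Mini-Omni is an “any-to-any” LLM that generates streaming speech output from speech input without an intermediate conversion to text. In the authors' approach, text tokens and...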

25/03/2026 blog paper

streaming-speech parallel-decoding voice-adapter +3

📄 NeurIPS 2024 Paper Commentary: Three Optimization Techniques for Efficient MoE Inference

Paper overview: This article is a commentary on the paper “Toward Efficient Inference for Mixture of Experts”, presented at NeurIPS 2024. Paper URL: https://papers.nips.cc/paper_files/paper/2024/hash/98bf3b8505c611ac21055dd9d355c66e-Abstract-Co...

25/03/2026 blog paper

MoE dynamic-gating expert-buffering +7
