✍️ Blog explainer: A guide to optimizing hybrid CPU+GPU inference for MoE models with llama.cpp

This article explains "Performant local mixture-of-experts CPU inference with GPU acceleration in llama.cpp". Blog overview (Summary): Published by Doctor Shotgun and Geechan on the Hugging Face Blog on January 30, 2026, this guide on Mixture-of-Experts (...

25/03/2026 blog tech_blog

llama-cpp MoE CPU-offloading +7

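✍️ Google Developers Blog explainer: Technical foundations and design philosophy of the Gemini 2.0 Multimodal Live API

This article explains "Gemini 2.0: Level Up Your Apps with Real-Time Multimodal Interactions - Google Developers Blog". Blog overview (Summary): Google has announced the Multimodal Live API for the Gemini 2.0 generation. Across text, audio, and video, this API bidirectionally and in real...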

25/03/2026 blog tech_blog

gemini google multimodal +5

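📄 Paper explainer: FluxMem, a framework for adaptive memory structure selection using a Beta mixture model

Paper overview (Abstract): This article explains https://arxiv.org/abs/2602.14038. FluxMem is a framework for LLM agent memory systems that dynamically selects which structure to store memories in, depending on the conversation context. Combining a three-tier memory hierarchy (short-term, mid-term, long-term) with three memory structures (linear, graph, hierarchical), and with a Beta mixture model (BMM)...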

25/03/2026 blog paper

LLM agent memory +2

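📄 Paper explainer: HOBBIT, accelerating MoE inference with mixed-precision expert offloading

This article explains "HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference" (arXiv:2411.01433). Paper overview (Abstract): For MoE model inference, HOBBIT takes the slowdown incurred when expert weights are offloaded out of GPU memory (to CPU RAM/SSD) and, through mixed-precision...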

25/03/2026 blog paper

llm mixture-of-experts localllm +2

✍️ OpenAI Realtime API explainer: Production-ready real-time voice agents with the gpt-realtime model

This article explains "OpenAI: Introducing gpt-realtime and Realtime API updates for production voice agents" and "Updates for developers building with voice". Blog overview (Summary): In October 2024, OpenAI's Realtime API public...

25/03/2026 blog tech_blog

openai realtime-api voice-agent +4

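📄 Paper explainer: Memori, a high-accuracy, low-cost memory layer for LLM agents built on semantic triples

This article explains "Memori: A Persistent Memory Layer for Efficient, Context-Aware LLM Agents" (arXiv:2603.19935). Paper overview (Abstract): Memori is a persistent memory layer that reframes long-term memory for LLM agents as a data structuring problem. The authors take unstructured dialogue history and, via semantic triples...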

25/03/2026 blog paper

LLM agent memory +2

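📄 Paper explainer: Is SSD offloading for MoE models harmful to energy efficiency?

This article explains "SSD Offloading for LLM Mixture-of-Experts Weights Considered Harmful in Energy Efficiency" (Kyung, Yun, Ahn, 2025). Paper overview: Because Mixture-of-Experts (MoE) models activate only a small fraction of their experts at inference time relative to their parameter count, S...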

25/03/2026 blog paper

MoE SSD-offloading energy-efficiency +7

📄 Paper explainer: DeepSeekMoE, making MoE more efficient with fine-grained expert segmentation and shared experts

This article explains "DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models" (arXiv:2401.06066). Paper overview (Abstract): For the experts in conventional MoE (Mixture of Experts) architectures, DeepSeekMoE...

25/03/2026 blog paper

llm mixture-of-experts qwen +2

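📄 Paper explainer: A-MAC, adaptive memory admission control for LLM agents via five-factor structured decisions

This article explains "Adaptive Memory Admission Control for LLM Agents" (Zhang, Jiang et al., 2026). Paper overview: For the memory admission control problem, that is, deciding which information an LLM agent should save to long-term memory, this paper uses five interpretable factors for adaptive...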

25/03/2026 blog paper

LLM agent memory +2

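📄 Paper explainer: VITA-1.5, GPT-4o-level real-time vision and speech interaction in a 7B model

This article explains "VITA-1.5: Towards GPT-4o Level Real-time Vision and Speech Interaction" (arXiv:2501.12186). Paper overview (Abstract): VITA-1.5 is an open-source multimodal LLM that integrates and processes four modalities (video, image, audio, and text) in real time. Regarding its predecessor, the authors...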

25/03/2026 blog paper

multimodal-llm real-time-interaction vision-speech +3
