✍️ Vespa Official Blog Explained: Matryoshka × Binary Quantization, an Implementation Technique That Cuts Vector Search Costs by 64x

This article is a commentary on the official Vespa blog post "Matryoshka + Binary vectors: Slash vector search costs with Vespa". Blog summary: This blog post, written by Jo Kristian Bergum, Chief Scientist at Vespa, Matryoshka Representation Learnin...

31/05/2026 blog tech_blog

vespa vectordb matryoshka +3

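📄 ICML 2025 Paper Explained: EPD Disaggregation, Splitting Multimodal Model Inference into Three Stages for Efficiency

Paper overview: This article is a commentary on the paper "Efficiently Serving Large Multimodal Models Using EPD Disaggregation" (Singh et al.), accepted at ICML 2025. The author of this article did not run the experiments; the article quotes and explains the paper's content. Large Multimodal Models (LMMs), in addition to text, images, ...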

31/05/2026 blog paper

multimodal llm-serving inference-optimization +2

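✍️ Qdrant Official Blog Explained: Binary Quantization, a Quantization Technique That Speeds Up Vector Search 40x

This article is a commentary on the official Qdrant blog post "Binary Quantization - Vector Search, 40x Faster". Blog summary: This blog post by Nirant Kasliwal of Qdrant reports the technical details and benchmark results of implementing Binary Quantization (BQ) in the Qdrant vector DB. BQ fl...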

31/05/2026 blog tech_blog

qdrant vectordb binary-quantization +3

✍️ Google Cloud Official Blog Explained: Cutting Cold Starts by Up to 50% with Cloud Run's startup CPU boost

About this article: This is a technical commentary on the official Google Cloud blog post "New startup CPU boost improves cold starts in Cloud Run, Cloud Functions" (author: Steren Giannini, Google Cloud Director of Product Management; published September 27, 2022). ...

31/05/2026 blog tech_blog

cloud-run cold-start gcp +2

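📄 Paper Explained: Optimizing Embedding Storage for RAG Systems, a Systematic Evaluation of Quantization × Dimensionality Reduction

This article is a commentary on "Optimization of embeddings storage for RAG systems using quantization and dimensionality reduction techniques" (arXiv:2505.00105). Paper overview (Abstract): This paper, RAG (Retrieval-Augmented Generati...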

31/05/2026 blog paper

embedding quantization rag +3

📄 ICDCS 2025 Paper Explained: SlimStart, Reducing Serverless Cold Starts with Profile-Guided Optimization

Paper overview: This article covers a paper accepted at ICDCS 2025 (45th IEEE International Conference on Distributed Computing Systems), “Efficient Serverless Cold Start: Reducing Library Loading Overhead by Profile-guided Optimization...

31/05/2026 blog paper

serverless cold-start profile-guided-optimization +2

📄 EMNLP 2025 Paper Explained: SMEC, Joint Optimization of Matryoshka Representation Learning and Quantization

This article is a commentary on "SMEC: Rethinking Matryoshka Representation Learning for Retrieval Embedding Compression" (arXiv:2510.12474). Paper overview (Abstract): SMEC (Sequential Matryoshka Embedding Compression), Matryosh...

31/05/2026 blog paper

embedding matryoshka quantization +3

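📄 Paper Explained: ModServe, Modality-Aware Resource Disaggregation for Multimodal Model Serving

Paper overview: This article is a commentary on "ModServe: Modality- and Stage-Aware Resource Disaggregation for Scalable Multimodal Model Serving" (Qiu et al., ACM SoCC 2025). It quotes and explains the paper; the author did not run independent experiments. ModServe, large-scale ...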

31/05/2026 blog paper

multimodal llm-serving inference-optimization +2

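📄 Paper Explained: Serverless Cold Starts and Where to Find Them

Paper overview: This article is a commentary on "Serverless Cold Starts and Where to Find Them" (arXiv:2410.06145). It quotes and explains the paper; the author did not run any experiments. Joosen et al., 8.5 billion requests collected over 31 days across five data center regions from Huawei's production serverless platform (YuanRong)...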

31/05/2026 blog paper

serverless cold-start cloud-run +2

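📄 NeurIPS 2022 Paper Explained: Matryoshka Representation Learning

This article is a commentary on "Matryoshka Representation Learning" (arXiv:2205.13147). Paper overview (Abstract): Matryoshka Representation Learning (MRL) is a method that simultaneously learns vectors at multiple dimensional granularities from a single embedding model. The authors, with only minimal changes to the usual training pipeline...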

31/05/2026 blog paper

embedding matryoshka representation-learning +3
