Sunday, November 23, 2025

# Integrating the Swervin Curvin Blog into the Adaptive Knowledge‑Fusion Engine

*A 15‑minute end‑to‑end demo*  


*By the AK‑F team – November 2025*  



## Why add a personal blog?


The Adaptive Knowledge‑Fusion (AK‑F) engine already blends domain‑specific knowledge graphs, vector‑based retrieval, and a mixture‑of‑experts (MoE) router.  

Bringing in an external, high‑quality source—like Swervin Curvin’s October & November 2025 archives—demonstrates:


* **Rapid ingestion** of semi‑structured web content.  

* **Seamless enrichment** (entity extraction, embeddings) that feeds both the graph and the vector store.  

* **Dynamic routing** that surfaces blog references only when they are relevant.  


The whole pipeline runs in about 15 minutes on a modest Kubernetes cluster.

 


## Overview of the demo


| Step | What we do | Approx. time |
|------|------------|--------------|
| 1️⃣  | Set up the Python environment | 2 min |
| 2️⃣  | Crawl the two archive months with Scrapy | 3 min |
| 3️⃣  | Enrich, embed, and store vectors in FAISS | 4 min |
| 4️⃣  | Load the chunks into Neo4j (knowledge graph) | 3 min |
| 5️⃣  | Register a new **BlogExpert** in the MoE router | 1 min |
| 6️⃣  | Live query the engine and see blog references | 2 min |
| 7️⃣  | (Optional) quick performance check | < 1 min |



## 1️⃣ Prepare the workspace  


```bash
# Clone the demo repo (contains spider, pipelines, and k8s manifests)
git clone https://github.com/yourorg/akf-blog-demo.git
cd akf-blog-demo

# Create a virtual environment and install dependencies
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```


`requirements.txt` bundles:


* `scrapy` – crawling  

* `tiktoken` – token‑based chunking  

* `sentence-transformers` – embedding model  

* `neo4j` – graph driver  

* `requests` – simple HTTP calls  


## 2️⃣ Crawl the blog archives  


The spider walks the month‑archive pages (`/2025/10/` and `/2025/11/`), follows each article link, and emits **JSON‑Lines** records—one per 2 k‑token chunk.


```bash
./run_ingest.sh
```


You’ll see log lines such as:


```
2025-10-12 08:00:00 INFO  Scraped article: LoRA adapters in transformers – step‑by‑step
2025-10-12 08:00:00 INFO  Produced 7 chunks for article a1b2c3…
```


Result: `data/blog_chunks.jsonl` (≈ 1–2 MB for two months).
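
The chunking step can be sketched roughly as follows. This is a minimal illustration, not the demo repo's spider: the record field names, the hash-based `article_id`, and the helper names `chunk_tokens` and `article_to_jsonl` are all assumptions. The tokenizer is passed in as a pair of functions so the sketch runs without extra dependencies.

```python
import hashlib
import json
from typing import Callable, Iterator, Sequence


def chunk_tokens(tokens: Sequence, max_tokens: int = 2000) -> Iterator[Sequence]:
    """Yield consecutive, non-overlapping windows of at most max_tokens tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    for start in range(0, len(tokens), max_tokens):
        yield tokens[start:start + max_tokens]


def article_to_jsonl(
    article: dict,
    encode: Callable[[str], list],
    decode: Callable[[list], str],
    max_tokens: int = 2000,
) -> list[str]:
    """Return one JSON-Lines string per chunk of article["text"]."""
    # Hypothetical ID scheme: a short hash of the article URL.
    article_id = hashlib.sha1(article["url"].encode("utf-8")).hexdigest()[:12]
    lines = []
    for index, window in enumerate(chunk_tokens(encode(article["text"]), max_tokens)):
        record = {
            "article_id": article_id,
            "chunk_id": f"{article_id}-{index}",
            "index": index,
            "title": article["title"],
            "url": article["url"],
            "date": article["date"],
            "text": decode(window),
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines


if __name__ == "__main__":
    # Whitespace "tokens" keep the demo dependency-free.
    demo = {
        "title": "Demo post",
        "url": "https://example.com/2025/10/demo.html",
        "date": "2025-10-12",
        "text": "a b c d e",
    }
    for line in article_to_jsonl(demo, str.split, " ".join, max_tokens=2):
        print(line)
```

To chunk on real tokens, pass the `encode` and `decode` methods of a `tiktoken` encoding (for example `tiktoken.get_encoding("cl100k_base")`); which encoding the demo actually uses is not stated here.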



## 3️⃣ Enrich, embed, and store vectors  


```bash
python enrich_and_store.py \
    --input data/blog_chunks.jsonl \
    --vector-store faiss \
    --embedding-model sentence-transformers/all-mpnet-base-v2 \
    --output data/faiss_index
```


What happens under the hood:


| Sub‑step | Action |
|----------|--------|
| **Entity extraction** | spaCy (`en_core_web_sm`) adds a list of entities to each record. |
| **Embedding** | `model.encode(text, normalize_embeddings=True)` → 768‑dim vector. |
| **FAISS index** | `IndexIVFFlat` (`nlist=100`, `nprobe=10`). |
| **Persistence** | Writes the index and a `metadata.pkl` (UUID ↔ metadata). |

Note that `all-mpnet-base-v2` truncates inputs longer than 384 tokens by default, so only the opening part of each 2 k‑token chunk contributes to its embedding.


When finished you’ll see:


```
✅  Processed 1 842 chunks
✅  FAISS index written to data/faiss_index/
```
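
To make the `nlist` and `nprobe` settings concrete, here is a toy inverted-file index in plain Python. It is not FAISS and not what `enrich_and_store.py` runs: the `ToyIVF` class is hypothetical, and the centroids are supplied by hand, whereas FAISS learns them with k-means during training. It only illustrates the idea that vectors are bucketed by nearest centroid and a query scans just the `nprobe` closest buckets. Vectors are normalized first, so the inner product equals cosine similarity.

```python
import math


def normalize(v):
    """Scale v to unit length so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class ToyIVF:
    """Inverted-file index: each vector lives in the list of its nearest centroid."""

    def __init__(self, centroids):
        self.centroids = [normalize(c) for c in centroids]  # nlist == len(centroids)
        self.lists = [[] for _ in self.centroids]           # one posting list per centroid

    def _rank_centroids(self, v):
        # Centroid indices ordered from most to least similar to v.
        return sorted(range(len(self.centroids)),
                      key=lambda i: dot(v, self.centroids[i]), reverse=True)

    def add(self, vec_id, v):
        v = normalize(v)
        self.lists[self._rank_centroids(v)[0]].append((vec_id, v))

    def search(self, query, k=3, nprobe=1):
        q = normalize(query)
        candidates = []
        for i in self._rank_centroids(q)[:nprobe]:          # scan only nprobe lists
            candidates.extend((dot(q, v), vec_id) for vec_id, v in self.lists[i])
        return [vec_id for _, vec_id in sorted(candidates, reverse=True)[:k]]


if __name__ == "__main__":
    index = ToyIVF([[1.0, 0.0], [0.0, 1.0]])
    index.add("a", [1.0, 0.1])
    index.add("b", [0.9, 0.2])
    index.add("c", [0.1, 1.0])
    print(index.search([1.0, 0.0], k=3, nprobe=1))  # only the first list is scanned
    print(index.search([1.0, 0.0], k=3, nprobe=2))  # both lists are scanned
```

A larger `nprobe` trades speed for recall: with `nprobe=1` the vector `"c"` is never considered because it sits in the other list. In FAISS itself the corresponding knobs are the `nlist` argument of `IndexIVFFlat` and the index's `nprobe` attribute.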


## 4️⃣ Load the chunks into Neo4j  


```bash
# Quote the password so special characters survive shell expansion
python load_into_neo4j.py \
    --uri bolt://neo4j.my-cluster.svc.cluster.local:7687 \
    --user neo4j \
    --password "$NEO4J_PASS" \
    --jsonl data/blog_chunks.jsonl
```


The script batches rows (5 k per batch) and runs three `MERGE` clauses for each row:


```cypher
MERGE (a:Article {id: $article_id})   // title, url, date
MERGE (c:Chunk   {id: $chunk_id})     // text, index
MERGE (a)-[:HAS_CHUNK]->(c);
```


It also creates indexes on `Article.id` and `Chunk.id` for fast look‑ups. Sample output:


```
✔️  Imported 1 842 rows (batch 1)
✅  Neo4j import complete
```
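
One way the batching could look is sketched below. The `UNWIND` form of the query, the row field names, and the helpers `batched` and `load_rows` are assumptions, not the contents of `load_into_neo4j.py`. The `run` argument is any callable with the shape of the Neo4j Python driver's `session.run(query, **parameters)`, which keeps the sketch runnable without a database.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

# Assumed batched form of the three MERGE clauses: one round trip per batch.
BATCH_QUERY = """
UNWIND $rows AS row
MERGE (a:Article {id: row.article_id})
  ON CREATE SET a.title = row.title, a.url = row.url, a.date = row.date
MERGE (c:Chunk {id: row.chunk_id})
  ON CREATE SET c.text = row.text, c.index = row.index
MERGE (a)-[:HAS_CHUNK]->(c)
"""


def batched(rows: Iterable[dict], size: int) -> Iterator[list]:
    """Yield lists of at most `size` rows until the input is exhausted."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def load_rows(rows: Iterable[dict], run: Callable[..., object], batch_size: int = 5000) -> int:
    """Send rows to the database in batches and return how many were sent."""
    total = 0
    for number, batch in enumerate(batched(rows, batch_size), start=1):
        run(BATCH_QUERY, rows=batch)
        total += len(batch)
        print(f"Imported {len(batch)} rows (batch {number})")
    return total


if __name__ == "__main__":
    # Stand-in for session.run so the example needs no running Neo4j instance.
    def fake_run(query, **params):
        return None

    demo_rows = [{"article_id": "a1", "chunk_id": f"a1-{i}"} for i in range(7)]
    load_rows(demo_rows, fake_run, batch_size=3)
```

Because every clause is a `MERGE`, re-running the import on the same file should not create duplicate nodes or relationships.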


## 5️⃣ Register the **BlogExpert** in the MoE router  


The MoE router reads its expert list from a ConfigMap called `moe-router-config`. We append a lightweight retrieval expert that formats blog references.


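```bash
# A JSON Patch "add" on an existing key replaces its value, so read the
# current expert list first and write it back with the new entry appended.
current=$(kubectl -n akf-engine get configmap moe-router-config -o jsonpath='{.data.experts}')
updated=$(printf '%s\n- name: BlogExpert\n  type: retrieval\n  weight: 0.05' "$current")
kubectl -n akf-engine patch configmap moe-router-config \
  --type=merge -p "$(jq -n --arg e "$updated" '{data: {experts: $e}}')"

# Restart the router so it picks up the new config
kubectl -n akf-engine rollout restart deployment/moe-router
```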


`weight: 0.05` means the router will allocate roughly 5 % of its routing capacity to this expert—enough to surface blog citations when the query hints at them.
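
The append itself is a small text operation on the YAML list stored in the ConfigMap. The sketch below shows an idempotent version in Python; the function name `append_expert` and the `GraphExpert` entry in the demo are hypothetical, and the sketch assumes the expert list is a flat YAML sequence held as a string.

```python
def append_expert(experts_yaml: str, name: str, kind: str, weight: float) -> str:
    """Append an expert entry to a YAML list held as text, unless it is already there."""
    if f"- name: {name}\n" in experts_yaml + "\n":
        return experts_yaml                      # already registered: leave untouched
    entry = f"- name: {name}\n  type: {kind}\n  weight: {weight}"
    body = experts_yaml.rstrip("\n")
    return f"{body}\n{entry}\n" if body else f"{entry}\n"


if __name__ == "__main__":
    # Hypothetical existing expert list.
    existing = "- name: GraphExpert\n  type: graph\n  weight: 0.6\n"
    print(append_expert(existing, "BlogExpert", "retrieval", 0.05))
```

Checking for an existing entry first means the registration step can be re-run without adding `BlogExpert` twice.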


## 6️⃣ Live query – see the blog reference in action  


```bash
curl -X POST http://<gateway-svc>/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What does Swervin Curvin say about LoRA adapters?"}]}'
```


**Typical response (truncated):**


```json
{
  "role":"assistant",
  "content":"Swervin Curvin’s October‑2025 post “LoRA adapters in transformers – step‑by‑step” explains that …\n\n**References**\n- LoRA adapters in transformers – step‑by‑step – https://swervincurvin.blogspot.com/2025/10/lora-adapters.html"
}
```


The **reference block** is generated by `BlogExpert`.  


A non‑blog query (e.g., “latest NVIDIA H100 specs”) still returns the core knowledge‑base answer without any blog citation, confirming that the new expert only fires when appropriate.
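
A reference block like the one in the response could be produced along these lines. This is a sketch under assumptions, not the actual `BlogExpert`: the `Hit` type, the `min_score` threshold, and the `limit` are hypothetical. It shows one simple way to emit citations only when a retrieved chunk is similar enough to the query, and to list each article once even when several of its chunks match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    title: str
    url: str
    score: float  # similarity from the vector search, higher is better


def format_references(hits, min_score=0.5, limit=3):
    """Return a Markdown reference block, or "" when nothing is relevant enough."""
    seen = set()
    lines = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if hit.score < min_score or hit.url in seen:
            continue
        seen.add(hit.url)  # several chunks may point at the same article
        lines.append(f"- {hit.title} – {hit.url}")
        if len(lines) == limit:
            break
    return ("**References**\n" + "\n".join(lines)) if lines else ""


if __name__ == "__main__":
    hits = [
        Hit("LoRA adapters in transformers", "https://example.com/lora", 0.91),
        Hit("LoRA adapters in transformers", "https://example.com/lora", 0.84),
        Hit("Unrelated post", "https://example.com/other", 0.12),
    ]
    print(format_references(hits))
```

Returning an empty string for low-scoring hits is what lets an unrelated query pass through without a citation.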


## 7️⃣ Quick performance sanity check (optional)  


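```bash
# Average latency = rate of total seconds / rate of request count
# (assumes the matching _count series is exported alongside _sum)
curl -s http://prometheus.monitoring.svc:9090/api/v1/query \
  -G --data-urlencode 'query=rate(ram_vector_search_seconds_sum[1m]) / rate(ram_vector_search_seconds_count[1m])' | jq .
```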


You should see an average latency of around **0.12 s**, which suggests the extra FAISS lookup adds negligible overhead and fits within the existing autoscaling headroom.
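
If you would rather check the number in a script than read the `jq` output, the Prometheus instant-query response can be unpacked as below. The response layout follows the Prometheus HTTP API; the function name `vector_values` and the `pod` label in the sample are hypothetical.

```python
def vector_values(payload: dict) -> dict:
    """Map each series' label set to its float sample from an instant-query response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected an instant vector, got {data['resultType']}")
    values = {}
    for series in data["result"]:
        labels = tuple(sorted(series["metric"].items()))
        _timestamp, sample = series["value"]  # the sample arrives as a string
        values[labels] = float(sample)
    return values


if __name__ == "__main__":
    # Hand-written sample in the shape Prometheus returns; the label is made up.
    sample = {
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [{"metric": {"pod": "router-0"}, "value": [1763856000.0, "0.12"]}],
        },
    }
    for labels, seconds in vector_values(sample).items():
        print(dict(labels), f"{seconds:.2f} s")
```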


## TL;DR – One‑click summary  


```bash
# 1️⃣ Setup
git clone https://github.com/yourorg/akf-blog-demo.git && cd akf-blog-demo
python -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt

# 2️⃣ Crawl
./run_ingest.sh

# 3️⃣ Enrich & embed
python enrich_and_store.py --input data/blog_chunks.jsonl --vector-store faiss \
    --embedding-model sentence-transformers/all-mpnet-base-v2 --output data/faiss_index

# 4️⃣ Load into Neo4j
python load_into_neo4j.py --uri bolt://neo4j.my-cluster.svc.cluster.local:7687 \
    --user neo4j --password "$NEO4J_PASS" --jsonl data/blog_chunks.jsonl

# 5️⃣ Register BlogExpert (read the current list and append; a JSON Patch "add" would overwrite it)
current=$(kubectl -n akf-engine get configmap moe-router-config -o jsonpath='{.data.experts}')
updated=$(printf '%s\n- name: BlogExpert\n  type: retrieval\n  weight: 0.05' "$current")
kubectl -n akf-engine patch configmap moe-router-config \
  --type=merge -p "$(jq -n --arg e "$updated" '{data: {experts: $e}}')"
kubectl -n akf-engine rollout restart deployment/moe-router

# 6️⃣ Query
curl -X POST http://<gateway-svc>/v1/chat -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What does Swervin Curvin say about LoRA adapters?"}]}'
```


## License  


The demo code (spider, pipelines, Kubernetes manifests, and helper scripts) is released under the **MIT License**. Feel free to copy, modify, and redistribute—just keep the copyright notice and license text attached.  


