Swervin’ Curvin: Integrating the Swervin Curvin Blog into the Adaptive Knowledge‑Fusion Engine

*A 15‑minute end‑to‑end demo*

*By the AK‑F team – November 2025*

## Why add a personal blog?

The Adaptive Knowledge‑Fusion (AK‑F) engine already blends domain‑specific knowledge graphs, vector‑based retrieval, and a mixture‑of‑experts (MoE) router.

Bringing in an external, high‑quality source—like Swervin Curvin’s October & November 2025 archives—demonstrates:

* **Rapid ingestion** of semi‑structured web content.

* **Seamless enrichment** (entity extraction, embeddings) that feeds both the graph and the vector store.

* **Dynamic routing** that surfaces blog references only when they are relevant.

The whole pipeline runs in about 15 minutes on a modest Kubernetes cluster.

## Overview of the demo

| Step | What we do | Approx. time |
|------|------------|--------------|
| 1️⃣ | Set up the Python environment | 2 min |
| 2️⃣ | Crawl the two archive months with Scrapy | 3 min |
| 3️⃣ | Enrich, embed, and store vectors in FAISS | 4 min |
| 4️⃣ | Load the chunks into Neo4j (knowledge graph) | 3 min |
| 5️⃣ | Register a new **BlogExpert** in the MoE router | 1 min |
| 6️⃣ | Live query the engine and see blog references | 2 min |
| 7️⃣ | (Optional) quick performance check | < 1 min |

## 1️⃣ Prepare the workspace

```bash
# Clone the demo repo (contains spider, pipelines, and k8s manifests)
git clone https://github.com/yourorg/akf-blog-demo.git
cd akf-blog-demo

# Create a virtual environment and install dependencies
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

`requirements.txt` bundles:

* `scrapy` – crawling

* `tiktoken` – token‑based chunking

* `sentence-transformers` – embedding model

* `neo4j` – graph driver

* `requests` – simple HTTP calls

## 2️⃣ Crawl the blog archives

The spider walks the month‑archive pages (`/2025/10/` and `/2025/11/`), follows each article link, and emits **JSON‑Lines** records—one per 2 k‑token chunk.

```bash
./run_ingest.sh
```

You’ll see log lines such as:

```
2025-10-12 08:00:00 INFO Scraped article: LoRA adapters in transformers – step‑by‑step
2025-10-12 08:00:00 INFO Produced 7 chunks for article a1b2c3…
```

Result: `data/blog_chunks.jsonl` (≈ 1–2 MB for two months).
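To make the "one record per chunk" idea concrete, here is a minimal sketch of how an article body could be split and written as JSON Lines. It is not the spider's actual code: the function names and the record fields (`article_id`, `chunk_id`, `index`, `title`, `url`, `text`) are assumptions for illustration, and whitespace splitting stands in for a real tokenizer such as `tiktoken`.

```python
import json
import uuid


def chunk_tokens(tokens, chunk_size=2000):
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def article_to_records(article, chunk_size=2000):
    """Yield one JSON-serialisable record per chunk of the article body."""
    # Assumption: whitespace splitting stands in for a real tokenizer (e.g. tiktoken).
    tokens = article["text"].split()
    for index, chunk in enumerate(chunk_tokens(tokens, chunk_size)):
        yield {
            "article_id": article["id"],
            "chunk_id": str(uuid.uuid4()),  # hypothetical ID scheme
            "index": index,
            "title": article["title"],
            "url": article["url"],
            "text": " ".join(chunk),
        }


def write_jsonl(records, path):
    """Write records to path, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
```

A 4,500-token article would yield three records under these assumptions (2,000 + 2,000 + 500 tokens). Embedding models truncate inputs longer than their maximum sequence length, so it may be worth checking the chunk size against the model chosen in the next step.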

## 3️⃣ Enrich, embed, and store vectors

```bash
python enrich_and_store.py \
  --input data/blog_chunks.jsonl \
  --vector-store faiss \
  --embedding-model sentence-transformers/all-mpnet-base-v2 \
  --output data/faiss_index
```

What happens under the hood:

| Sub‑step | Action |
|----------|--------|
| **Entity extraction** | spaCy (`en_core_web_sm`) adds a list of entities to each record. |
| **Embedding** | `model.encode(text, normalize_embeddings=True)` → 768‑dim vector. |
| **FAISS index** | `IndexIVFFlat` (`nlist=100`, `nprobe=10`). |
| **Persistence** | Writes the index and a `metadata.pkl` (UUID ↔ metadata). |

When finished you’ll see:

```
✅ Processed 1 842 chunks
✅ FAISS index written to data/faiss_index/
```
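For readers unfamiliar with what `nlist` and `nprobe` control, the following pure-Python toy shows the inverted-file idea behind an IVF index: vectors are grouped into `nlist` buckets around centroids, and a query scans only the `nprobe` closest buckets. This is an illustrative sketch, not FAISS itself. The class name `TinyIVF` is made up, and centroids are sampled from the data rather than trained with k-means as a real index would do. Vectors are normalised first, so the dot product equals cosine similarity.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class TinyIVF:
    """Toy inverted-file index: nlist buckets, nprobe buckets scanned per query."""

    def __init__(self, vectors, nlist, seed=0):
        rng = random.Random(seed)
        self.vectors = [normalize(v) for v in vectors]
        # Simplification: real IVF indexes train centroids with k-means;
        # sampling stored vectors as centroids keeps this sketch short.
        self.centroids = rng.sample(self.vectors, nlist)
        self.lists = [[] for _ in range(nlist)]
        for vec_id, vec in enumerate(self.vectors):
            self.lists[self._rank_centroids(vec)[0]].append(vec_id)

    def _rank_centroids(self, vec):
        """Return centroid ids ordered from most to least similar to vec."""
        scores = [dot(vec, c) for c in self.centroids]
        return sorted(range(len(scores)), key=lambda i: -scores[i])

    def search(self, query, k, nprobe):
        """Scan only the nprobe closest buckets; return the top-k (score, id) pairs."""
        q = normalize(query)
        candidates = [
            vec_id
            for bucket in self._rank_centroids(q)[:nprobe]
            for vec_id in self.lists[bucket]
        ]
        scored = sorted(((dot(q, self.vectors[i]), i) for i in candidates), reverse=True)
        return scored[:k]
```

Raising `nprobe` toward `nlist` trades speed for recall; when `nprobe == nlist` the search degenerates to an exhaustive scan.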

## 4️⃣ Load the chunks into Neo4j

```bash
# Quote the password so special characters survive shell expansion
python load_into_neo4j.py \
  --uri bolt://neo4j.my-cluster.svc.cluster.local:7687 \
  --user neo4j \
  --password "$NEO4J_PASS" \
  --jsonl data/blog_chunks.jsonl
```

The script batches rows (5 k per batch) and runs three `MERGE` statements:

```cypher
MERGE (a:Article {id: $article_id}) // title, url, date
MERGE (c:Chunk {id: $chunk_id}) // text, index
MERGE (a)-[:HAS_CHUNK]->(c);
```

It also creates indexes on `Article.id` and `Chunk.id` for fast look‑ups. Sample output:

```
✔️ Imported 5 000 rows (batch 1)
…
✅ Neo4j import complete
```
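The batching step can be sketched as follows. This is a hedged illustration rather than the contents of `load_into_neo4j.py`: the helper names, the `UNWIND`-based query, and the `article_id` / `chunk_id` row fields are assumptions. The `run_query` argument is any callable that accepts a Cypher string plus a `rows` keyword, which keeps the sketch runnable without a database.

```python
import json
from itertools import islice


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSON-Lines file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def batched(rows, batch_size=5000):
    """Group an iterable of rows into lists of at most batch_size items."""
    iterator = iter(rows)
    while batch := list(islice(iterator, batch_size)):
        yield batch


# Hypothetical batched form of the three MERGE statements: UNWIND expands
# the $rows list so a single round trip handles a whole batch.
MERGE_BATCH = """
UNWIND $rows AS row
MERGE (a:Article {id: row.article_id})
MERGE (c:Chunk {id: row.chunk_id})
MERGE (a)-[:HAS_CHUNK]->(c)
"""


def import_rows(rows, run_query, batch_size=5000):
    """Send rows to run_query(cypher, rows=batch) one batch at a time."""
    total = 0
    for number, batch in enumerate(batched(rows, batch_size), start=1):
        run_query(MERGE_BATCH, rows=batch)
        total += len(batch)
        print(f"Imported {len(batch)} rows (batch {number})")
    return total
```

With the official `neo4j` driver, `session.run` could be passed as `run_query`, for example `import_rows(read_jsonl("data/blog_chunks.jsonl"), session.run)`.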

## 5️⃣ Register the **BlogExpert** in the MoE router

The MoE router reads its expert list from a ConfigMap called `moe-router-config`. We add a lightweight retrieval expert that formats blog references.

```bash
# Note: a JSON-patch "add" on an existing key replaces its value, so if
# /data/experts already lists experts, include them in "value" as well.
kubectl -n akf-engine patch configmap moe-router-config \
  --type=json \
  -p='[{"op":"add","path":"/data/experts","value":"- name: BlogExpert\n  type: retrieval\n  weight: 0.05"}]'

# Restart the router so it picks up the new config
kubectl -n akf-engine rollout restart deployment/moe-router
```

`weight: 0.05` means the router will allocate roughly 5 % of its routing capacity to this expert—enough to surface blog citations when the query hints at them.
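Hand-escaping YAML inside JSON inside a shell string is easy to get wrong, so one option is to build the patch programmatically. The sketch below assumes, as the patch in this step implies, that `experts` is stored as a YAML string under the ConfigMap's `data`. The function names are hypothetical. Because a JSON-patch `add` on an existing key replaces its value, the helper takes the current expert list as input and emits the combined list.

```python
import json
import shlex

# YAML snippet for the new expert; nested keys use two-space indentation.
BLOG_EXPERT_YAML = "- name: BlogExpert\n  type: retrieval\n  weight: 0.05"


def build_patch(existing_experts_yaml=""):
    """Return a JSON-patch string that sets /data/experts to existing + BlogExpert."""
    parts = [existing_experts_yaml.rstrip("\n"), BLOG_EXPERT_YAML]
    value = "\n".join(p for p in parts if p)
    return json.dumps([{"op": "add", "path": "/data/experts", "value": value}])


def kubectl_command(patch):
    """Render the kubectl invocation with the patch safely shell-quoted."""
    args = ["kubectl", "-n", "akf-engine", "patch", "configmap",
            "moe-router-config", "--type=json", "-p", patch]
    return " ".join(shlex.quote(a) for a in args)
```

The current value could be read beforehand with `kubectl -n akf-engine get configmap moe-router-config -o jsonpath='{.data.experts}'` and passed to `build_patch`.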

## 6️⃣ Live query – see the blog reference in action

```bash
curl -X POST http://<gateway-svc>/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What does Swervin Curvin say about LoRA adapters?"}]}'
```

**Typical response (truncated):**

```json
{
  "role": "assistant",
  "content": "Swervin Curvin’s October‑2025 post “LoRA adapters in transformers – step‑by‑step” explains that …\n\n**References**\n- LoRA adapters in transformers – step‑by‑step – https://swervincurvin.blogspot.com/2025/10/lora-adapters.html"
}
```

The **reference block** is generated by `BlogExpert`.

A non‑blog query (e.g., “latest NVIDIA H100 specs”) still returns the core knowledge‑base answer without any blog citation, confirming that the new expert only fires when appropriate.
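As a rough illustration of how such a reference block could be assembled, the sketch below formats retrieved `(title, url)` pairs in the same shape as the sample response and appends nothing when no blog chunk was retrieved. The function names and the de-duplication by URL are assumptions, not the actual `BlogExpert` code.

```python
def format_references(hits):
    """Render retrieved (title, url) pairs as a Markdown reference block."""
    seen = set()
    lines = []
    for title, url in hits:
        if url in seen:  # several chunks of one article share a URL
            continue
        seen.add(url)
        lines.append(f"- {title} – {url}")
    if not lines:
        return ""
    return "**References**\n" + "\n".join(lines)


def append_references(answer, hits):
    """Attach the reference block to an answer, or return the answer unchanged."""
    block = format_references(hits)
    return f"{answer}\n\n{block}" if block else answer
```

Returning the answer untouched when `hits` is empty mirrors the behaviour described above for non-blog queries.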

## 7️⃣ Quick performance sanity check (optional)

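```bash
# Average latency = rate of total seconds / rate of request count.
# Assumes the metric also exposes the standard *_count series.
curl -s http://prometheus.monitoring.svc:9090/api/v1/query \
  -G --data-urlencode 'query=rate(ram_vector_search_seconds_sum[1m]) / rate(ram_vector_search_seconds_count[1m])' | jq .
```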

You should see an average latency around **0.12 s**, which suggests that the extra FAISS lookup adds negligible overhead and fits within the existing autoscaling headroom.
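If you would rather script this check than read the `jq` output by eye, a small parser over the Prometheus instant-query response is enough. The sketch below relies on the documented response shape (`status`, `data.result[].value`). The function names and the 0.2 s budget are assumptions chosen for illustration.

```python
import json


def instant_values(response_text):
    """Return the float sample values from a Prometheus instant-query response."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    # Each vector sample looks like {"metric": {...}, "value": [timestamp, "0.12"]}.
    return [float(sample["value"][1]) for sample in payload["data"]["result"]]


def check_latency(response_text, budget_seconds=0.2):
    """True when at least one series is returned and all are within the budget."""
    values = instant_values(response_text)
    return bool(values) and all(v <= budget_seconds for v in values)
```

An empty result (for example, when the metric has not been scraped yet) is treated as a failed check rather than a pass.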

## TL;DR – One‑click summary

```bash
# 1️⃣ Setup
git clone https://github.com/yourorg/akf-blog-demo.git && cd akf-blog-demo
python -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt

# 2️⃣ Crawl
./run_ingest.sh

# 3️⃣ Enrich & embed
python enrich_and_store.py --input data/blog_chunks.jsonl --vector-store faiss \
  --embedding-model sentence-transformers/all-mpnet-base-v2 --output data/faiss_index

# 4️⃣ Load into Neo4j
python load_into_neo4j.py --uri bolt://neo4j.my-cluster.svc.cluster.local:7687 \
  --user neo4j --password "$NEO4J_PASS" --jsonl data/blog_chunks.jsonl

# 5️⃣ Register BlogExpert (replaces any existing /data/experts value)
kubectl -n akf-engine patch configmap moe-router-config \
  --type=json -p='[{"op":"add","path":"/data/experts","value":"- name: BlogExpert\n  type: retrieval\n  weight: 0.05"}]'
kubectl -n akf-engine rollout restart deployment/moe-router

# 6️⃣ Query
curl -X POST http://<gateway-svc>/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What does Swervin Curvin say about LoRA adapters?"}]}'
```

## License

The demo code (spider, pipelines, Kubernetes manifests, and helper scripts) is released under the **MIT License**. Feel free to copy, modify, and redistribute—just keep the copyright notice and license text attached.

Swervin’ Curvin

Sunday, November 23, 2025
