Galetech RAG - How a Query Works

Technician Asks a Question

The user types a question into the chat interface. They can optionally pre-select a site and turbine type using the filter dropdowns above the chat, which narrows all lookups to that context.

Example 1: "What does alarm 176 mean on a VMP5000?"
Example 2: "I am at Kilronan with a V39 and I have got error code 1"
Example 3: "What was found on site for error 176 at DKIT?"
Example 4: [Filter: Site=DKIT, Turbine=Vestas V52] "What about error 176?"

Extract Key Information

The system combines filter selections (if any) with six extractors that scan the question text:

Filter overrides — if the user selected a site or turbine type from the dropdowns, those take priority over text extraction
extractSite() checks the text against all known windfarm names from MMS_Data.csv
extractTurbineType() matches full names like "Vestas V39" or short codes like "V39", and also recognises VMP references like "VMP5000"
extractController() matches VMP-prefixed controller types (e.g. "VMP5000", "VMP3500") or bare numbers (e.g. "5000") and resolves them to the canonical VMP form
extractMake() identifies manufacturer names like "Gamesa", "Enercon", "GE", "Vestas", or "NEG-Micon" (including aliases)
extractTurbineNumber() identifies specific turbine identifiers like "T1", "T02", "turbine 3" from the text
extractErrorCodes() finds 1-4 digit numbers, but first strips out model references (V39, VMP5000) and turbine numbers so they are not mistaken for error codes

User Query	Filters	Site	Turbine	Controller	Make	Codes
"alarm 176 on VMP5000"	-	-	VMP5000	VMP5000	-	[176]
"error 176?"	Site=DKIT	DKIT	-	-	-	[176]
"which sites have vmp3500"	-	-	VMP3500	VMP3500	-	-
"what gamesa sites do we have"	-	-	-	-	Gamesa	-
"Kilronan V39 error 1"	-	Kilronan	V39	VMP4400	-	[1]
"3 turbines with error 42"	-	-	-	-	-	[42] ← "3" ignored

Look Up Data in All 3 Sources

The lookup(site, turbineType, errorCode, keywords, controller, make, turbineNumber) function searches all three data sources simultaneously using all extracted parameters. A unified VMP controller naming system (e.g. VMP5000, VMP3500) links across all sources:

Sites

Manual (alarm codes & suggestions)

Technician Comments (MMS history)

Columns

Site, Make, Model, Controller, Count

How it's searched

Filters by site name, turbine type (Model or Make), controller (VMP type), and manufacturer.

Sample data

Site	Make	Model	Controller	Count
DKIT	Vestas	V52	VMP5000	1
Tursillagh I	Vestas	V47	VMP3500	23
Lahanaght	Vestas	V52	VMP5000.2	3

Columns

Control manufacturer, Turbine manufacturer, Type, Alarm code, Alarm code meaning, Comment, Description, Vestas Alarm suggestion, On-Site suggestion, Link to document, Status

How it's searched

Filters by error code (matched against Alarm code) and controller type (matched against Type, e.g. VMP5000). When no specific error code is given, the system prioritises manual entries for the most common MMS errors so the LLM has meanings for the errors it's reporting on. Reference documents (PDFs from the Vestas Service DVD) are resolved to clickable Google Drive links.

Sample data

Type	Alarm code	Alarm code meaning	On-Site suggestion	Documents
VMP5000	176	US error, Turbine paused	Check US windsensor, cable, CT3218 pos x3	176 - US error.pdf
VMP5000	38	No comm. with hub	Check voltage supply of T54C, ARCNET, hub processor	-

Columns

windfarm, turbine, date, turbine_type, error_code, error, comment, Controller

How it's searched

Filters by site, turbine type, error code, controller (VMP type), turbine number (e.g. T1, T02), and keywords in comments. When no specific error code is given and many results exist, the system computes an error frequency summary (top 15 errors by occurrence count) so the LLM can answer "most common" questions from real data.

Sample data

windfarm	turbine	date	turbine_type	error_code	comment	Controller
DKIT	T1	2012-02-02	Vestas V52	176	Found loose connection on auxiliary switch in top cabinet	VMP5000
Arigna	T2	2012-09-06	Vestas V42	182	New contactor ABB A16-30-10	VMP3500

Frequency summary (example for VMP3500)

Error	Meaning	Occurrences
181	Feedback Yaw CW	7
80	Low gear oil pressure	4
166	Thermoerror hydraulicmotor	4

Did We Find Data?

The system checks if the strict search (all filters combined) returned any results from any of the 3 sources.

Yes — Data Found

Combine results from all matching sources into one context block. For multi-code queries (e.g. "difference between 169 and 172"), each code's results are appended separately.

Confidence: HIGH

No — No Match

Broaden the search by progressively dropping filters:

Try 1: site + controller + make (drop turbine type)
Try 2: turbine + controller + make (drop site)
Try 3: site + make (drop controller & turbine)
Try 4: controller only (drop site, turbine, make)
Try 5: error code only (drop all filters)

Uses the first combination that returns results. If all fail, a comment keyword search is tried (preserving site/controller context first, then broadening). Confidence level is communicated to the LLM so it can inform the technician.

Build the Prompt

The system assembles the full message to send to the LLM, combining:

System Prompt — the rules the LLM must follow (use friendly source names: "Manual", "Technician Comments", "Sites")
Retrieved Data — all the results from the lookup step, including resolved PDF document links
Confidence Note — tells the LLM whether results are exact or broadened
Chat History + Question — previous messages plus the current question

You are a Galetech wind turbine fault diagnosis assistant.
You help technicians diagnose and resolve turbine alarm
codes using ONLY the data provided to you.

Rules:
1. ONLY answer based on the data provided. You CAN reason
   over it (count, compare, summarise), but NEVER invent
   facts. If unsure, say "I don't have enough information."

1b. NEVER make vague generalisations like "technicians
    frequently check...", "comments often mention...", or
    "common issues include..." unless directly quoting or
    summarising specific records. Every claim must be
    traceable to a specific record. If you have a frequency
    summary, use exact counts — do not estimate or round.
    If you don't have frequency data, say so.

2. It is ALWAYS better to say you don't know than to
   fabricate an answer. If the data is insufficient, ask
   the technician to rephrase with an alarm code, site,
   or turbine type.

3. Be direct and practical. Give alarm meaning, what to
   check, and suggestions from the data.

4. EVERY answer must end with source tags:
   (Source: Manual), (Source: Technician Comments),
   (Source: Sites). Never use raw CSV filenames.

5. Include relevant technician comments from MMS history.
5b. MUST include ALL reference document links as clickable
    markdown. NEVER omit them.

6. Acknowledge CONFIDENCE NOTES (MEDIUM/LOW).

7. End with 2-3 follow-up questions phrased as if the
   technician is asking (first person).

For query: "What does alarm 176 mean on a VMP5000?"

[System Message]
  {System Prompt rules above}

  --- RETRIEVED DATA ---
  --- Data for error code 176 ---
  ## manual_data.csv
  ### Alarm 176 - US error, Turbine paused
  - Type: VMP5000
  - Description: Communication error from the
    US "ultra sonic" windsensor to the top controller.
  - On-Site suggestion: US windsensor, Cable to the
    windsensor, CT modul CT3218 Pos x3...
  - Reference documents: [176 - US error.pdf](https://drive.google.com/...),
    [925482.pdf](https://drive.google.com/...)

[User Message]
  What does alarm 176 mean on a VMP5000?

Send to OpenAI (GPT-4.1)

The assembled prompt is sent to OpenAI's GPT-4.1 model via the API. The LLM reads the retrieved data and generates an answer grounded only in that data. The response streams back to the user word by word.

Model: gpt-4.1
Temperature: 0.3 (low = more factual, less creative)
Max tokens: 2048
Streaming: enabled (answer appears word by word)

Technician Gets the Answer

The answer appears in the chat interface with:

Clickable source badges — "(Source: Manual)" and "(Source: Technician Comments)" become interactive buttons that open a modal showing the raw data rows that were used
Reference document links — PDF filenames from the Vestas Service DVD are rendered as clickable links that open directly in Google Drive
Follow-up suggestions — 2-3 suggested questions appear as buttons below the answer for quick follow-up

The technician can also browse all data directly in the TechNotes page, which provides filterable tables for all three data sources with date range filtering, controller filtering, and more.

Worked Examples

Example 1: Simple alarm code lookup with document links

User asks:

"What does alarm 176 mean on a VMP5000?"

Extraction:

Site: (none) | Turbine: VMP5000 | Error codes: [176]

Lookup: lookup("", "VMP5000", "176") finds a match in Manual. Reference documents resolved from Google Drive. Confidence: HIGH.

LLM Answer:

Alarm 176 is a communication error between the ultrasonic wind sensor and the top controller. It is an auto-reset short fault. Check the US windsensor, cable, and CT3218.

Reference documents: 176 - US error, Turbine Stopped.pdf, 925482.pdf

(Source: Manual)

Example 2: Site + maintenance history

User asks:

"What was found on site for error 176 at DKIT?"

Extraction:

Site: DKIT | Turbine: (none) | Error codes: [176]

Lookup: lookup("DKIT", "", "176") finds results in all 3 sources. Confidence: HIGH.

Sites → DKIT, Vestas V52, Controller VMP5000, 1 turbine
Manual → Alarm 176 meanings for VMP5000 + document links
Technician Comments → DKIT T1, 2012-02-02, Controller VMP5000: "Found loose connection on auxiliary switch"

LLM Answer:

At DKIT T1, a loose connection was found on the auxiliary switch in the top cabinet. A new offline filter motor was also fitted and bled while on site. (Source: Technician Comments)

Example 3: Using chat filters

User selects filters: Site = DKIT, Turbine = Vestas V52

Then asks:

"What about error 176?"

Extraction: Filter overrides take priority.

Site: DKIT (from filter) | Turbine: Vestas V52 (from filter) | Error codes: [176]

Lookup: lookup("DKIT", "Vestas V52", "176") — all three parameters narrow the search. Results are specific to DKIT with V52 turbines. Confidence: HIGH.

LLM Answer: The answer is tailored to DKIT and V52 specifically, rather than showing results across all sites and turbine types.

Example 4: Broadened search fallback

User asks:

"I am at Kilronan with a V39 and I have got error code 1"

Extraction:

Site: Kilronan | Turbine: Vestas V39 | Error codes: [1]

Strict lookup: lookup("Kilronan", "Vestas V39", "1") → no results

Broadened search: Confidence drops to MEDIUM.

Try 1: lookup("Kilronan", "", "1") → found! Returns Manual data for alarm 1 + Kilronan site data

LLM Answer:

Alarm 1 is an "Illegal InitVal data in PROM" error. The turbine enters emergency mode. On-site: reboot the controller, upload new PROM software, and upload parameters with the parameter wizard. Note: this result is from a broadened search and may not be specific to Vestas V39. (Source: Manual)

Example 5: Controller cross-reference with frequency data

User asks:

"Which sites have VMP3500 and what are the most common errors there?"

Extraction:

Site: (none) | Turbine: VMP3500 | Controller: VMP3500 | Error codes: (none)

Lookup: lookup("", "VMP3500", "", ..., "VMP3500", "") finds data in all 3 sources.

Sites → Tursillagh I (23), Arigna (2), Largan Hill (9), Arigna8 (8) — all VMP3500
MMS → filtered to VMP3500 only + frequency summary: Error 181 (7x), Error 80 (4x), Error 166 (4x)...
Manual → prioritised to show meanings for the top MMS errors (181, 80, 166, etc.) with doc links

LLM Answer:

Sites with VMP3500: Tursillagh I (23 turbines), Arigna (2), Largan Hill (9), Arigna8 (8).

Most common errors:
1. Error 181 — Yaw CW feedback (7 occurrences). Technician: "Fitted new yaw contactor."
2. Error 80 — Low gear oil pressure (4 occurrences).
3. Error 166 — Thermoerror hydraulic motor (4 occurrences). Technician: "Found one phase dead on contactor."

(Source: Sites) (Source: Manual) (Source: Technician Comments)

How It All Fits Together

RAG = Retrieval Augmented Generation

Instead of the LLM making up answers, we first retrieve relevant data from Galetech's actual records, then the LLM generates an answer using only that data.

Why Not Just Use the LLM Directly?

GPT-4 does not know Galetech's fault codes, site names, or maintenance history. By feeding it the real data, we get accurate, grounded answers.

Document Links via Google Drive

387 Vestas Service DVD PDFs are hosted on Google Drive. When the Manual references a document, the system resolves the filename to a clickable Drive link so technicians can open the original PDF directly.

Confidence Signals

When an exact match fails, the system progressively relaxes filters and tells both the LLM and the technician how confident the result is.

Smart Extraction

Six extractors parse site, turbine type, controller (VMP), manufacturer, turbine number (e.g. T1), and error codes. "VMP5000", "5000", and "V52" all resolve to the same controller type.

Unified Controller Names

All data sources use standardised VMP-prefixed controller names (VMP3500, VMP5000, VMP5000.2). This lets the system cross-reference manual meanings, technician comments, and site data through a single controller identifier.

Frequency-Based Answers

For "most common" questions, the system computes actual error frequency counts from MMS data and prioritises manual entries to match, so answers are data-driven, not guesswork.

TechNotes: Browse Data Directly

Technicians can also browse all data in filterable tables via the TechNotes page, with filters for site, turbine type, controller, manufacturer, error code, and date range.

How the Chatbot Works

Technician Asks a Question

Extract Key Information

Look Up Data in All 3 Sources

Columns

How it's searched

Sample data

Columns

How it's searched

Sample data

Columns

How it's searched

Sample data

Frequency summary (example for VMP3500)

Did We Find Data?

Build the Prompt

For query: "What does alarm 176 mean on a VMP5000?"

Send to OpenAI (GPT-4.1)

Technician Gets the Answer

Worked Examples

How It All Fits Together

RAG = Retrieval Augmented Generation

Why Not Just Use the LLM Directly?

Document Links via Google Drive

Confidence Signals

Smart Extraction

Unified Controller Names

Frequency-Based Answers

TechNotes: Browse Data Directly