Intelligent Data Automation with LLMs

Philipp Hartenfeller

$whoami

Philipp Hartenfeller

  • DΓΌsseldorf, Germany
  • Oracle APEX Dev since 2016
  • Product Lead at United Codes
  • Web Dev and Databases

Agenda

(Shift + A: jump to Agenda, Shift + O: overview)

The pitch

  • LLMs are not just chatbots
  • They turn unstructured text and images into structured data
  • Perfect for relational databases
  • No Python, we can do it from PL/SQL
  • DB devs have perfect opportunities to become AI engineers

LLM quickstart

What is an LLM?

  • Large Language Model: generates text
  • Simple flow: text input → processing → text output
  • No memory, no learning — that all happens around the LLM
  • One specific instance is called a model; the companies that train them are providers

Conversations without memory?

  • The whole history gets sent again with every request
  • Request 1 → : πŸ‘±πŸΌ Capital France?
  • Request 1 ← : πŸ€– Paris
  • Request 2 → : πŸ‘±πŸΌ Capital France? + πŸ€– Paris + πŸ‘±πŸΌ River?
  • Request 2 ← : πŸ€– Seine

Tokens & context window

  • Text is measured in tokens, not characters or words
  • ~0.75 words per token in English
  • The context window = how much text the model can consider at once (history + system prompt + user message + response)
  • You pay per input token + per output token

Model providers

πŸ‡ΊπŸ‡Έ πŸ‡¨πŸ‡³ πŸ‡ͺπŸ‡Ί
OpenAI: GPT Alibaba: Qwen Mistral
Anthropic: Claude DeepSeek
Google: Gemini MoonshotAI: Kimi
xAI: Grok MiniMax
Amazon: Nova

Models excel at different things — don't lock yourself into one provider.

First PL/SQL call

The inference API

You send a JSON array of messages, you get a text response back. This is just the messages part β€” the full request also includes model, and other parameters. Exact structure differs between providers.

            
              
            
          

Frameworks do the heavy lifting

  • You don't want to handle this manually for each provider
  • Python: LangChain, LlamaIndex
  • JavaScript: Vercel AI
  • Java: LangChain4j, Spring AI
  • Oracle: UC AI, Select AI (23ai/26ai native), APEX AI

Simplest possible call

            
              
            
          

Our demo company: ACME Insurance

  • Fictional mid-size insurance company
  • Offers home, auto, and life insurance products
  • Handles thousands of customer inquiries and claims daily
  • Uses Oracle APEX + PL/SQL as their core platform
  • We will follow ACME through several real-world LLM use cases

Domain Q&A from PL/SQL

            
              
            
          

Output

=== What is a deductible? ===
  A deductible in insurance is the amount of money you must pay
  out-of-pocket before your insurance company starts to cover the
  costs of a claim. For example, if your policy has a $500
  deductible and you file a claim for $2,000, you would pay the
  first $500 and the insurer would cover the remaining $1,500.
  Deductibles are common in auto, health, and homeowners insurance,
  and choosing a higher deductible typically lowers your premium.

Free-form text. Great for chat — but what about our relational world?

Structured output

The problem

  • Free-form text is unpredictable and hard to parse
  • For databases and automation we want structure
  • How do we tell the LLM exactly what shape we want back?

JSON schema in, JSON out

  • You provide a JSON schema describing the desired output
  • The model is constrained to return JSON matching that schema
  • JSON Schema is a pre-existing standard
  • Common use cases: data extraction, classification, content tagging, decision support

Schema for customer extraction

            
              
            
          

The PL/SQL call

            
              
            
          

Output

            
              
            
          

Reasoning

What is reasoning?

  • New capability in advanced LLMs (2024+)
  • Model thinks step-by-step before producing the final answer
  • Generates internal "thinking tokens" to decompose, explore, and validate
  • Effective for complex logic, multi-step analysis, math, code
  • Cost: 3–10x more tokens, 2–10s slower
  • Don't use for: simple tasks, real-time, cost-sensitive ops

Why fraud detection needs reasoning

  • No single red flag: risk comes from combining many weak signals
  • Cross-checking claim text, history, timing, amounts, locations
  • Weighing contradictions: plausible story vs. suspicious pattern
  • Decision has real cost: false positives anger customers, false negatives pay out fraud

Enable reasoning + structured output

            
              
            
          

A peek at the reasoning

=== MODEL REASONING (excerpt) ===
Claim is $9,800 against a $10,000 limit β€” that's 98% of coverage.
"Max-out" claims correlate strongly with fraud; flagging as a signal.

Policy started 18 days ago. Industry baseline for first claim is ~180+
days. Early claims after policy inception are a classic indicator.

No witnesses and no police report for a collision claim. Not proof of
fraud on its own β€” could be a minor parking incident β€” but combined
with the above, the pattern strengthens.

Customer risk score is 85 vs. typical 40-50 for this segment. Worth
weighting but shouldn't dominate the decision alone.

None of these is conclusive individually. Together: 3 strong signals +
1 elevated baseline β†’ cannot auto-approve, but evidence is not strong
enough for outright denial. Right call is MANUAL_INVESTIGATION_REQUIRED
and route to SIU.

Output: high-risk claim

=== FRAUD RISK ASSESSMENT ===
  Overall Risk: HIGH
  Fraud Score: 82/100
  Recommendation: MANUAL_INVESTIGATION_REQUIRED

  Risk Factors Identified:
  1. [HIGH] Claim amount near coverage limit
  $9,800 claimed against $10,000 limit β€” fits "max-out" pattern.
  2. [HIGH] Very early claim after policy start
  Filed 18 days in vs. industry average of 180+ days.
  3. [MEDIUM] Elevated customer risk score (85 vs. typical 40-50)
  4. [MEDIUM] No witnesses, no police report

  Suggested Actions:
  1. Assign to special investigations unit (SIU)
  2. Request police report and any dashcam/road camera footage
  3. Cross-check prior claims across carriers via ISO ClaimSearch
  4. Hold payment pending investigation

File input

Multimodal models

  • Modern frontier models accept text and images, PDFs (some also audio/video)
  • Send a text prompt plus files; get a text answer back
  • PDFs: extracted as text and rendered as page screenshots
  • For CSV / Excel / DOCX: prepare locally as text and send as text
  • Token-priced like everything else; big images get scaled down automatically

Common use cases

  • Invoice processing: extract vendor, amount, date, line items
  • Receipt scanning for expense reports
  • Contract analysis: review and summarize
  • Diagram understanding: convert architecture diagrams to docs

The claim photo

Send a BLOB from your DB

            
              
            
          

Output (gpt-5.4)

=== VEHICLE DAMAGE ASSESSMENT ===
Vehicle Identification:
Brand (guess): Renault
Model (guess): Megane Scenic
Color: Black

Damaged Parts:
1. Engine bay / front structural [SEVERE] Est. $5,000
Engine compartment crushed; likely radiator and frame damage
2. Roof front / A-pillar area [SEVERE] Est. $2,500
Buckled from impact, possible structural deformation
3. Front suspension / alignment [SEVERE] Est. $2,200
4. Front fenders [SEVERE] Est. $1,400
5. Hood [SEVERE] Est. $1,200
6. Front bumper [SEVERE] Est. $900
7. Windshield [SEVERE] Est. $700
8. Headlights, grille, doors… Est. $2,750

Total Estimated Repair Cost: $16,650
Overall Severity: TOTAL_LOSS
          

Tools

Let the LLM call your PL/SQL functions

Generative vs agentic

  • So far: LLM produces text
  • What if the LLM could trigger an action — query a table, call an API, write data?
  • That's function calling, also known as tools
  • We define tools (name, description, parameter schema) and pass them with the request
  • The LLM asks us to call a tool with specific arguments — we execute it and return the result
  • Then it can ask again, or produce a final answer

The tool loop

Tool design principles

  • Names matter — clear, action-oriented: search_customers not tool_1
  • Descriptions are prompts — the LLM picks tools based on description quality
  • Right granularity — not too broad (do_everything), not too narrow (get_user_first_name)
  • Limit count — each definition costs input tokens; 20+ tools degrades accuracy
  • Filter per context — only send relevant tools (use tags)
  • Never raise — return descriptive error strings; LLMs recover and retry

Register a tool

            
              
            
          

Enable tools and ask

            
              
            
          

What happened under the hood

Message flow:
β†’ System prompt
β†’ User asked question
β†’ AI decided to call tool(s): DEMO_GET_CUSTOMER_INFO
β†’ System returned tool result for: DEMO_GET_CUSTOMER_INFO
β†’ AI provided final answer

AI Final Response:
Ashley Taylor (CUST-1000037), customer since 2022. Lives in
Phoenix, AZ. Risk score: 38 (low). 2 active policies:
β€’ POL-AUTO-2024-00012 β€” Auto, $50k coverage
β€’ POL-HOME-2024-00018 β€” Home, $400k coverage
Recent claims: 1 (status: closed, paid).

Tool calls made: 1
          

Agentic AI

End-to-end claim processing from a single email

From one tool to a suite

  • An agent = LLM + tools + loop
  • Give it a goal, it decides what to call and when
  • Multiple tools = multiple business operations — the model orchestrates them
  • Tip: pair with reasoning for complex workflows

The claim-handling agent

  • Four PL/SQL tools, one autonomous workflow:
  • DEMO_GET_POLICY_INFO — validate the policy exists and is active
  • DEMO_FILE_CLAIM — create the claim, return claim number
  • DEMO_ASSIGN_CLAIM — route to a specialist with appropriate priority
  • DEMO_SEND_EMAIL — confirm with the customer

One call, full workflow

            
              
            
          

What the agent did, on its own

Tool Calls Made by Agent:
───────────────────────────────────────
Tool: DEMO_GET_POLICY_INFO
Args: { "policy_number": "POL-HOME-2024-00001" }
Result: Active. Coverage $400k. Holder: Robert Anderson.

Tool: DEMO_FILE_CLAIM
Args: { "policy_number": "POL-HOME-2024-00001",
"claim_type": "Water Damage",
"description": "Basement flood from burst pipe..." }
Result: CLM-1000089 filed.

Tool: DEMO_ASSIGN_CLAIM
Args: { "claim_number": "CLM-1000089",
"specialization": "Property",
"priority": "HIGH" }
Result: Assigned to Maria Lopez (Property).

Tool: DEMO_SEND_EMAIL
Args: { "to_email": "robert.anderson@email.com",
"subject": "Claim CLM-1000089 received - urgent water
  damage",
"body": "Hi Robert, we've filed your claim..." }
Result: Email sent.
          

You only need PL/SQL

  • βœ“ Any modern Oracle DB
  • βœ“ An API key from a provider of your choice
  • βœ“ UC AI (open source) / APEX 26.1
  • βœ— No Python Β· No MCP servers Β· No Autonomous DB

Resources

Thank you! — Q&A

philipp@united-codes.com

Slides + demos available after the talk