Tags: AWS Bedrock · Amazon Athena · Generative AI · LLM · Text-to-SQL

Solving Deterministic Questions with Generative AI: A Practical Approach

Learn how to overcome LLM limitations for exact numerical queries by combining Amazon Bedrock with Amazon Athena: a hybrid architecture that automatically classifies queries, routing deterministic questions to SQL and non-deterministic ones to the knowledge base.

Akarui Senior Architect
Co-Founder & Principal AWS Architect
November 29, 2024 · 14 min read

Introduction

In today's dynamic generative AI landscape, large language models (LLMs) have radically transformed how we interact with technology. These models demonstrate exceptional capabilities in tasks like text generation, sentiment analysis, and contextual understanding. However, when faced with scenarios that demand absolute precision and deterministic results, we encounter inherent limitations that require innovative approaches.

This article walks through a practical architecture we've implemented for enterprise clients — one that solves the exact/approximate problem by combining the interpretive power of LLMs with the precision of structured query engines.

The Non-Deterministic Model Problem

How LLMs Actually Work

Large language models operate through a sophisticated probabilistic system. At their core, these models:

  1. Contextual Prediction: Analyze prior context to predict the next most probable word or sequence.
  2. Probability Distribution: Generate a probability distribution across different response options.
  3. Temperature and Randomness: Use parameters like temperature to control creativity versus determinism in their responses.

This probabilistic nature is precisely what makes LLMs so versatile for creative and analytical tasks — but it's also what makes them unreliable for queries requiring exact numerical accuracy.

From a Failed POC to an Innovative Solution

During multiple generative AI workshops with enterprise clients, a recurring pattern emerged. A development team shared their frustration with a proof of concept they considered a failure. Their problem: their generative AI implementation for support ticket analysis produced inconsistent results.

Digging into the case, an interesting pattern emerged:

What Worked Well:

  • "Analyze support ticket X"
  • "What is the summary of case Y?"
  • "What does this incident report suggest?"

These questions, which required contextual understanding and qualitative analysis, received precise and useful answers.

What Consistently Failed:

  • "Which department has the most open tickets?"
  • "How many tickets were handled last month?"
  • "What is the average resolution time?"

Questions requiring numerical precision and exact calculations never delivered reliable results.

The Key Revelation

The reason for the failure became clear once we understood the fundamental nature of LLMs: they are inherently non-deterministic. Their strength lies in natural language processing and probability-based content generation — not in performing precise calculations or executing exact queries on structured data.

This insight led to reformulating the core question:

How can we answer deterministic questions when an LLM, by its very nature, is not designed to do so?

The answer came from recognizing that we don't need to force the LLM to do something it wasn't designed for. Instead, we can:

  1. Use the LLM for what it does best: understanding the intent behind the question.
  2. Translate that intent into structured queries when necessary.
  3. Use specialized tools for precise calculations.
  4. Present results in a coherent and natural way.

The Gap Between Precision and Probability: Implementing the Solution

Once the core problem was identified, we designed a system that first determines the nature of the query, then applies the appropriate processing path.

Query Classification

Deterministic Queries:

Characteristics:

  • Require exact, reproducible counts.
  • Involve aggregations over specific ticket fields.
  • Operate on the schema defined in Athena.

Real-world examples:

  1. "Which department has the most open tickets?" Generated SQL:

    SELECT departamento, COUNT(*) as total
    FROM tickets
    WHERE estado != 'CLOSED'
    GROUP BY departamento
    ORDER BY total DESC
    
  2. "What is the leading cause of registered incidents?" Generated SQL:

    SELECT causante, COUNT(*) as total_incidentes
    FROM tickets
    WHERE solicitudes = 'Incidentes'
    GROUP BY causante
    ORDER BY total_incidentes DESC
    LIMIT 1
    

Non-Deterministic Queries:

Characteristics:

  • Require contextual analysis of ticket content.
  • Benefit from natural language processing.
  • Handled by the Bedrock Knowledge Base.

Examples:

  1. Analysis of specific ticket content.
  2. Case summaries.
  3. Interpretation of patterns in reports.

Processing Flow

The flow is divided into three straightforward steps:

  1. Initial Evaluation

    • Uses the classification prompt to determine whether the query is deterministic; in this step, the LLM itself decides whether the user's question is deterministic by nature.
    • When deterministic, the LLM generates the appropriate SQL within <SQL> tags. This is based on the Athena table definition and a data dictionary.
  2. Processing

    • Deterministic queries: Executed through Athena, which runs the LLM-generated SQL that satisfies the user's query.
    • Non-deterministic queries: Processed via Amazon Bedrock Knowledge Base. This knowledge base contains the same CSV file used in Athena.
  3. Response Formatting

    • Athena results are capped at 25 records (to prevent a single question from returning the entire database).
    • The LLM converts the results into natural-language responses.
    • The original question's language is preserved.
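
As a sketch, the three steps above can be wired together with the handlers injected as functions, so the routing logic itself can be exercised without AWS credentials. The class and handler names here (HybridRouter, sqlPath, kbPath) are illustrative assumptions, not part of the production code:

```java
import java.util.function.Function;

/** Sketch of the three-step flow. The classifier returns the raw model output;
 *  sqlPath and kbPath stand in for the Athena and Knowledge Base calls. */
public class HybridRouter {
    private final Function<String, String> classifier; // step 1: LLM verdict
    private final Function<String, String> sqlPath;    // step 2a: Athena execution
    private final Function<String, String> kbPath;     // step 2b: Knowledge Base

    public HybridRouter(Function<String, String> classifier,
                        Function<String, String> sqlPath,
                        Function<String, String> kbPath) {
        this.classifier = classifier;
        this.sqlPath = sqlPath;
        this.kbPath = kbPath;
    }

    public String answer(String question) {
        String verdict = classifier.apply(question);
        int open = verdict.indexOf("<SQL>");
        int close = verdict.indexOf("</SQL>");
        if (verdict.startsWith("DETERMINISTIC") && open >= 0 && close > open) {
            // Deterministic: extract the generated SQL and execute it via Athena
            String sql = verdict.substring(open + "<SQL>".length(), close).trim();
            return sqlPath.apply(sql);
        }
        // Non-deterministic: route the original question to the Knowledge Base
        return kbPath.apply(question);
    }
}
```

Injecting the handlers keeps the branching decision trivially testable; the production versions simply wrap the Bedrock and Athena clients shown later in this article.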

Solution Architecture

The implemented architecture resolves the deterministic query challenge through a strategic combination of AWS services and LLM processing. Let's examine each component and its detailed implementation.

1. Data Storage and Preparation Layer

1.1 Base Data Structure

The system operates on a CSV file hosted in S3 that contains ticket records. Preparing this data is critical and requires:

CREATE EXTERNAL TABLE IF NOT EXISTS `default`.`tickets` (
  `fechaResolucion` string,
  `asignado` string,
  `solicitudes` string,
  `producto` string,
  `departamento` string,
  -- [remaining fields]
)
COMMENT "Sample Tickets Table"
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim' = ';')
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://MyBucket/'
TBLPROPERTIES ('classification' = 'csv');

This DDL is critical because:

  • It defines the exact structure Athena will use for queries.
  • Specifies the ; delimiter for correct CSV interpretation.
  • Establishes the S3 location where data resides.
  • Configures input/output format to optimize processing.

1.2 Data Dictionary

Alongside the structure, we maintain a detailed data dictionary that the LLM uses to understand each field's context. For example:

fechaResolucion: Field indicating the ticket resolution date and time.
                Format: month/day/year hour:minute
causante: Categorical field indicating whether the ticket was raised by A or B
departamento: Calculated descriptive field of the department that handled the ticket
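
To show how the DDL and dictionary reach the model, here is a minimal sketch of assembling the classification context. The PromptContext helper and its method name are assumptions for illustration; the production code builds the same string with a StringBuilder, as shown in the next section:

```java
/** Sketch: assembles the classification prompt from the Athena DDL, the data
 *  dictionary, and the user's question. Names are illustrative. */
public class PromptContext {
    public static String build(String tableDdl, String dataDictionary, String question) {
        return new StringBuilder()
            .append("You are an expert in ticket analysis. ")
            .append("The Athena table is defined as:\n").append(tableDdl).append('\n')
            .append("Field glossary:\n").append(dataDictionary).append('\n')
            .append("Question: ").append(question)
            .toString();
    }
}
```

Keeping the DDL and dictionary in one place means a schema change only requires updating this context, not the routing logic.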

2. Query Classification System

2.1 Classification Prompt

The first critical step is determining whether a query is deterministic. We implement this through a specific prompt:

StringBuilder prompt = new StringBuilder(
   "You are an expert in ticket analysis. Analyze the question I provide. " +
   "If the question cannot be answered by an LLM (because it is deterministic), " +
   "respond ONLY with the word 'DETERMINISTIC' followed by an SQL query inside " +
   "a <SQL> tag that satisfies the following Athena table definition and its glossary..."
);

This prompt is critical because:

  • It defines the model's specific role.
  • Establishes the exact expected response format.
  • Includes schema context and data dictionary.
  • Forces a structured, parseable response.
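
Because the response format is fixed, parsing it is straightforward. A minimal sketch, with hypothetical names, of extracting the SQL from the classifier's reply:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: parses the classifier's structured reply. If the model answered
 *  "DETERMINISTIC" with a <SQL> tag, the SQL is extracted for Athena;
 *  otherwise the query falls through to the Knowledge Base path. */
public class ClassifierOutput {
    private static final Pattern SQL_TAG =
        Pattern.compile("<SQL>(.*?)</SQL>", Pattern.DOTALL);

    public static Optional<String> extractSql(String modelResponse) {
        if (modelResponse == null || !modelResponse.contains("DETERMINISTIC")) {
            return Optional.empty(); // non-deterministic: no SQL to run
        }
        Matcher m = SQL_TAG.matcher(modelResponse);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }
}
```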

2.2 SQL Generation via LLM

Once the system identifies the query as deterministic, it returns the SQL to be sent to Athena for execution. This works because the prior prompt included the table definition and data dictionary.

2.2.1 Model Configuration and Invocation
var message = Message.builder()
      .content(ContentBlock.fromText(prompt.toString()))
      .role(ConversationRole.USER)
      .build();

try {
  var client = BedrockRuntimeClient.builder()
      .credentialsProvider(DefaultCredentialsProvider.create())
      .region(Region.US_EAST_1)
      .build();

  // Send the message with a basic inference configuration.
  ConverseResponse response = client.converse(request -> request
          .modelId(FOUNDATIONAL_MODEL)
          .messages(message)
          .inferenceConfig(config -> config
                  .maxTokens(512)    // Sufficient for complex SQL queries
                  .temperature(0.5F) // Low for greater precision
                  .topP(0.9F)));     // High coherence in structure

  // Retrieve the generated text from Bedrock's response object.
  var responseText = response.output().message().content().get(0).text();
  client.close();

  return responseText;

} catch (SdkClientException e) {
  System.err.printf("ERROR: Can't invoke '%s'. Reason: %s%n", FOUNDATIONAL_MODEL, e.getMessage());
  return "Unable to answer that question";
}
2.2.2 Complete Flow Example

To illustrate the process, consider the question: "Which department has the most open tickets?"

  1. Input Processed by the Model:

    [All prior context + schema + data dictionary]
    Question: Which department has the most open tickets?
    
  2. Generated SQL:

    SELECT
       departamento,
       COUNT(*) as total_tickets
    FROM tickets
    WHERE fechaResolucion IS NULL
    GROUP BY departamento
    ORDER BY total_tickets DESC
    LIMIT 25

The generated SQL is sent directly to Athena for execution. The model already knows the exact table structure and the meaning of each field thanks to the provided context.

The key to this approach's success lies in the precision of the context provided to the model and the consistency of the requested response format — enabling reliable SQL generation that exactly matches the schema.

3. Deterministic Query Processing

3.1 Athena Query Execution

Once a deterministic query is identified, the system executes the generated SQL:

public String executeAthenaQuery(String query, String database) {
  try (AthenaClient athenaClient = AthenaClient.builder()
    .region(Region.US_EAST_1)
    .credentialsProvider(DefaultCredentialsProvider.create())
    .build()) {

    // Configure the query request
    StartQueryExecutionRequest startQueryExecutionRequest = StartQueryExecutionRequest.builder()
      .queryString(query)
      .queryExecutionContext(QueryExecutionContext.builder()
        .database(database)
        .build())
      .resultConfiguration(ResultConfiguration.builder()
        .build())
      .build();

    // Start the query
    StartQueryExecutionResponse startQueryExecutionResponse =
        athenaClient.startQueryExecution(startQueryExecutionRequest);
    String queryExecutionId = startQueryExecutionResponse.queryExecutionId();

    // Wait for the query to complete
    waitForQueryToComplete(athenaClient, queryExecutionId);

    // Retrieve the query results
    return getQueryResults(athenaClient, queryExecutionId);

  } catch (Exception e) {
    e.printStackTrace();
    throw new RuntimeException("Error executing Athena query", e);
  }
}

This code:

  • Establishes a secure connection with Athena.
  • Starts the query asynchronously, then blocks until it completes.
  • Uses the execution ID to track the query and retrieve its results.
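
The waitForQueryToComplete helper is not shown above; its core is a polling loop against the query's execution status, which Athena reports as QUEUED or RUNNING until it reaches SUCCEEDED, FAILED, or CANCELLED. Here is a sketch of just that decision logic, with the AWS call abstracted behind a Supplier so it runs without credentials; in production the supplier would wrap a GetQueryExecution call and sleep between attempts:

```java
import java.util.Set;
import java.util.function.Supplier;

/** Sketch of the polling logic inside waitForQueryToComplete. */
public class AthenaPolling {
    // Terminal states of an Athena query execution
    private static final Set<String> TERMINAL = Set.of("SUCCEEDED", "FAILED", "CANCELLED");

    /** Polls stateSupplier until a terminal state or maxAttempts is reached;
     *  returns the last state observed. */
    public static String waitForTerminalState(Supplier<String> stateSupplier, int maxAttempts) {
        String state = "QUEUED";
        for (int i = 0; i < maxAttempts; i++) {
            state = stateSupplier.get();
            if (TERMINAL.contains(state)) {
                return state;
            }
            // in production: Thread.sleep(backoff) between attempts
        }
        return state;
    }
}
```

Capping the attempts keeps a stuck query from blocking the user request indefinitely; the caller should treat any non-SUCCEEDED result as a failure.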

4. Response Formatting

The final step involves transforming technical results into comprehensible answers:

StringBuilder prompt = new StringBuilder(
   "You are an expert at answering queries. Respond professionally, " +
   "concisely, and clearly. The question asked was: " +
   userQuestion + " and the database response is: " +
   dbResponse);

This formatting:

  • Maintains the context of the original question.
  • Structures the response naturally.
  • Preserves the precision of the obtained data.
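
The 25-record cap mentioned earlier is applied before results are embedded in this formatting prompt. A sketch, assuming rows have already been flattened to plain string lists rather than Athena Row objects; the class and method names are illustrative:

```java
import java.util.List;
import java.util.stream.Collectors;

/** Sketch: serializes Athena results for the formatting prompt,
 *  capped at 25 rows so a single question cannot dump the whole table. */
public class ResultFormatter {
    static final int MAX_ROWS = 25;

    public static String toPromptText(List<String> header, List<List<String>> rows) {
        String body = rows.stream()
            .limit(MAX_ROWS)                  // enforce the 25-record cap
            .map(r -> String.join(" | ", r))
            .collect(Collectors.joining("\n"));
        return String.join(" | ", header) + "\n" + body;
    }
}
```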

5. Handling Non-Deterministic Queries

When the system identifies a query as non-deterministic, it means the query requires contextual or interpretive analysis that cannot be resolved through a direct SQL query. In this case, the system uses Amazon Bedrock's Knowledge Base directly.

5.1 Identification and Processing

Identification occurs in the first step, when the model does not return the word "DETERMINISTIC" followed by SQL. The system then processes the query using Bedrock's Knowledge Base.

5.2 Model Configuration

For these queries, we use the base configuration with Anthropic Claude 3.5 Sonnet v2:

RetrieveAndGenerateInput input = RetrieveAndGenerateInput.builder()
    .text(prompt)
    .build();

KnowledgeBaseRetrieveAndGenerateConfiguration knowledgeConfig =
    KnowledgeBaseRetrieveAndGenerateConfiguration.builder()
        .knowledgeBaseId(KNOWLEDGE_BASE_ID)
        .modelArn(MODEL_ARN)
        .build();

RetrieveAndGenerateConfiguration retrieveConfig =
    RetrieveAndGenerateConfiguration.builder()
        .knowledgeBaseConfiguration(knowledgeConfig)
        .type("KNOWLEDGE_BASE")
        .build();

RetrieveAndGenerateRequest request = RetrieveAndGenerateRequest.builder()
    .retrieveAndGenerateConfiguration(retrieveConfig)
    .input(input)
    .build();

RetrieveAndGenerateResponse response =
    bedrockAgentRuntimeClient.retrieveAndGenerate(request);

5.3 Examples of Non-Deterministic Queries

The following are typical queries the system processes interpretively:

  1. Content Analysis:

    Question: "What are the common patterns in connection error tickets?"
    
  2. Case Interpretation:

    Question: "How was a similar case resolved last time?"
    
  3. Contextual Summaries:

    Question: "Summarize the main problem in ticket #12345"
    

In these cases, the system:

  • Does not attempt to generate SQL.
  • Processes the query directly through the model.
  • Provides a response based on available context and information.
  • Maintains a consistent format and tone matching the original question.

The response is delivered directly to the user, preserving the conversational nature and context of the original question.

Enterprise Implementation Considerations

Beyond the core technical architecture, enterprise deployments require additional attention to:

Security and Access Control

  • Athena query results should be scoped to authorized data — use AWS IAM policies to restrict which S3 prefixes the service account can access.
  • Consider using Athena workgroups to isolate query execution environments and enforce result location policies.

Cost Management

  • Athena bills by the amount of data scanned; partition your data by date and category to minimize scan costs on filtered queries.
  • Cache frequent deterministic query results in ElastiCache or DynamoDB to reduce both cost and latency.

Prompt Injection Prevention

  • Sanitize user input before incorporating into the classification prompt. An adversarial user could attempt to inject SQL that exfiltrates unauthorized data.
  • Validate generated SQL against a whitelist of allowed tables and columns before sending to Athena.
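
A minimal sketch of such a validation step, assuming a single allowed table. The class name and the regex-based table extraction are illustrative only; a production validator would use a real SQL parser:

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: defensive checks on LLM-generated SQL before it reaches Athena. */
public class SqlGuard {
    private static final Set<String> ALLOWED_TABLES = Set.of("tickets");
    private static final Pattern FROM_TABLE =
        Pattern.compile("\\bfrom\\s+([a-z_][a-z0-9_]*)");

    public static boolean isSafe(String sql) {
        String s = sql.trim().toLowerCase(Locale.ROOT);
        if (!s.startsWith("select")) return false; // read-only statements only
        if (s.contains(";")) return false;         // reject multi-statement payloads
        Matcher m = FROM_TABLE.matcher(s);
        while (m.find()) {
            if (!ALLOWED_TABLES.contains(m.group(1))) return false; // whitelist tables
        }
        return true;
    }
}
```

Rejecting anything that is not a single SELECT over whitelisted tables closes the most obvious exfiltration path, even if a prompt injection slips past the classification step.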

Observability

  • Log the classification decision (deterministic vs. non-deterministic) alongside the generated SQL and Athena execution ID.
  • Track classification accuracy over time — periodic review reveals prompt drift and schema changes that require prompt updates.

Conclusions and Next Steps

This hybrid system — combining the precision of SQL queries with the interpretive capability of language models — represents just the beginning of what's achievable with Generative AI in enterprise data analysis.

Key Reflections

  • Automatic distinction between deterministic and non-deterministic queries lets us leverage the best of both worlds: the accuracy of relational databases and the contextual understanding of LLMs.
  • The implemented architecture proves that it's possible to maintain the precision required in enterprise environments while significantly improving user experience.
  • Using modern services like Amazon Bedrock allows implementing advanced AI solutions without managing complex infrastructure, with access to state-of-the-art LLMs.

The architecture described here forms the foundation for more sophisticated patterns: adding conversation history, supporting multi-turn clarification when queries are ambiguous, and extending the classification system to handle time-series analytics and cross-table joins.

GenAI is transforming how we interact with data — and the organizations that will benefit most are those that design with its strengths and limitations clearly in mind.


Want to Implement This in Your Organization?

At Akarui, we architect and implement production-grade AI systems on AWS. Whether you're building a text-to-SQL analytics layer, a hybrid RAG system, or a full multi-agent platform, our senior architects have done it at enterprise scale.

Schedule a Free Architecture Review — no sales pitch, just a direct technical conversation with the architects who build these systems.
