GPT-4 T vs. 4o Mini vs. 3.5 T: Which LLM Works Best for Data Mapping?
LLMs can greatly speed up prototyping for tasks like extracting structured data from PDFs, normalizing it, and mapping it to predefined schemas. I recently tested three models from OpenAI for this purpose, evaluating accuracy, cost, and speed. Here’s a brief summary of my findings.
My Approach
- Data Extraction: Copy structured text directly from ERP-generated PDF offers (no OCR needed).
- Schema Validation: Use Zod to ensure data integrity.
- LLM-Powered Normalization & Mapping: Map product categories, identify brands, and extract key information.
To compare model performance, I set up a small test suite with different PDF documents, each containing various formatting styles, product descriptions, and pricing structures. This allowed me to systematically measure how well each model handled the same inputs across different scenarios.
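A harness for this kind of comparison can be quite small. The sketch below is a hypothetical version of my setup: `Fixture` and the injected `normalize` callback stand in for the real extraction pipeline, and it simply scores how often each model produces the expected category:

```typescript
// One test case: raw offer text plus the category we expect back.
type Fixture = { text: string; expectedCategory: string };

// Models under comparison (OpenAI SDK names).
const models = ["gpt-4-0125-preview", "gpt-4o-mini", "gpt-3.5-turbo-0125"];

// Runs every fixture through one model and returns the fraction of
// correct category mappings. normalize() is injected so the harness
// stays independent of the actual API call.
async function scoreModel(
  model: string,
  fixtures: Fixture[],
  normalize: (model: string, text: string) => Promise<{ category: string }>
): Promise<number> {
  let correct = 0;
  for (const f of fixtures) {
    try {
      const result = await normalize(model, f.text);
      if (result.category === f.expectedCategory) correct++;
    } catch {
      // Schema or JSON-parse failures count as misses.
    }
  }
  return correct / fixtures.length;
}
```

Injecting `normalize` also makes the harness trivially testable with a stub before spending any tokens on real API calls.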
Models Tested
| Model | OpenAI SDK Name | Quality | Speed (vs. GPT-3.5) | Cost (vs. GPT-3.5) |
|---|---|---|---|---|
| GPT-4 Turbo | gpt-4-0125-preview | ✅✅✅ Best | ~2x slower | ~3x more expensive |
| GPT-4o Mini | gpt-4o-mini | ✅ Decent | ~1.5x slower | ~2x more expensive |
| GPT-3.5 Turbo | gpt-3.5-turbo-0125 | ✅ Decent | Fastest (baseline) | Cheapest |
Prototypical Setup in TypeScript
A simple setup for sending structured data to OpenAI’s API and normalizing it with Zod:
```typescript
import { OpenAI } from "openai";
import { z } from "zod";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Zod schema that every model response must satisfy.
const productSchema = z.object({
  name: z.string(),
  brand: z.string().optional(),
  category: z.string(),
  price: z.number(),
});

async function normalizeProductData(text: string) {
  const response = await openai.chat.completions.create({
    model: "gpt-4-0125-preview",
    // JSON mode guarantees syntactically valid JSON output; note that
    // the prompt must mention "JSON" for the API to accept this option.
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "Extract structured product data from the user's text. Respond with JSON containing name, brand (optional), category, and price (as a number).",
      },
      { role: "user", content: text },
    ],
  });

  const rawData = JSON.parse(response.choices[0].message.content ?? "{}");
  // parse() throws a descriptive ZodError if the model's output
  // doesn't match the schema.
  return productSchema.parse(rawData);
}

// Example usage
normalizeProductData("Trek Domane SL 6, Carbon Road Bike, 3499€")
  .then(console.log)
  .catch(console.error);
```
Key Takeaways
- GPT-3.5 Turbo was surprisingly good for schema validation but struggled with very complex mappings (e.g., identifying product categories from descriptions).
- GPT-4 Turbo was significantly more accurate and got by with simpler prompts, but at a higher cost and slower response time (~2x slower, ~3x costlier).
- GPT-4o Mini landed between the two in both quality and cost, but in my tests its accuracy was closer to GPT-3.5 Turbo's than to GPT-4 Turbo's, making it a less compelling option for me.
My Recommendation
For fast and cost-efficient prototyping, use GPT-3.5 Turbo and fall back to GPT-4 Turbo only when necessary (e.g., for ambiguous mappings). If cost is less of a concern, go directly with GPT-4 Turbo for the best accuracy.