Come for the Tucker Decompositions, get lost in the messy data, and leave with a respect for big data tools and AI.
Repo market data taught me something last week: throwing raw data at an LLM is like asking a poet to do your accounting. It’s not a good idea.
That got me thinking about markets and data. If LLMs are for unstructured data, what’s the equivalent for structured data?
Large Language Models vs. Large Data Models
When it comes to market volumes, prices, and time series, I discovered last week that indexing, filtering, and pre-aggregating the data are essential. Only after that can an LLM step in to do what it does well — summarising, navigating, or visualising the data.
This week’s blog is researching a new area: are there Large Data Models (LDMs) — instead of LLMs — that can help me understand patterns in the market? I wanted something that could handle structured data natively — ideally something like a tensor model — and, with help from ChatGPT, I set out to build one.
The Tensor Model Approach
I asked ChatGPT how to implement a tensor-like model that links swap volumes with rate changes. The input would be time-series data; the output, latent relationships between price moves and volumes.
I chose two primary sources:
- CFTC Weekly Swaps Report — specifically Table 8c, giving weekly USD OIS volumes by tenor.
- SOFR OIS Rates — daily data across standard tenors (1Y, 2Y, 5Y, 10Y, 30Y) from Bloomberg.
The Painful Set-Up
I didn’t want to touch Excel with this project. I was hoping ChatGPT could take the Swaps Report and match it to my Bloomberg data, creating a clean weekly dataset with volume and rate metrics.
Spoiler: It couldn’t.
Bloomberg’s time-series data has quirks: different tenors sometimes have different date stamps. This confused ChatGPT entirely. Then there’s the CFTC report — published on Mondays but covering activity from two weeks prior. That mismatch between report date and trading period wasn’t correctly handled.
The result? ChatGPT couldn’t generate a reliable dataset with:
- Weekly reporting date
- Change in rates
- Average rate per tenor
This was disappointing. It’s exactly the kind of repetitive task I want LLMs to take off my plate.
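For what it is worth, the alignment problem described above can be sketched in a few lines of pandas. This is a minimal sketch under stated assumptions, not the code ChatGPT produced or the Excel fix I actually used: the column names (`report_date`, `date`, `tenor`, `rate`), the helper names and the two-week lag default are all hypothetical and would need checking against the real files.

```python
import pandas as pd


def weekly_rates(daily: pd.DataFrame) -> pd.DataFrame:
    """Collapse daily rates (long format: date, tenor, rate) to one row per week."""
    # One column per tenor; a date missing for one tenor simply becomes NaN.
    wide = daily.pivot_table(index="date", columns="tenor", values="rate")
    wide.index = pd.to_datetime(wide.index)
    weekly = wide.resample("W-FRI")  # weeks labelled by their Friday
    avg = weekly.mean().add_suffix("_avg")         # average rate per tenor
    chg = weekly.last().diff().add_suffix("_chg")  # week-on-week change in last rate
    return avg.join(chg)


def trade_week_end(report_date, lag_weeks: int = 2) -> pd.Timestamp:
    """Map a report date to the Friday of the trading week it covers.

    Assumption: the report covers the week starting `lag_weeks` before the
    Monday of the publication week.
    """
    report_date = pd.Timestamp(report_date)
    monday = report_date - pd.Timedelta(days=report_date.weekday())
    return monday - pd.Timedelta(weeks=lag_weeks) + pd.Timedelta(days=4)


def align(volumes: pd.DataFrame, daily: pd.DataFrame, lag_weeks: int = 2) -> pd.DataFrame:
    """Attach weekly rate metrics to each report row, failing loudly on gaps."""
    out = volumes.copy()
    out["week_end"] = [trade_week_end(d, lag_weeks) for d in out["report_date"]]
    rates = weekly_rates(daily)
    out = out.join(rates, on="week_end", how="left")
    # Validation step: never trust a merge that silently produced blanks.
    missing = out[out[rates.columns].isna().any(axis=1)]
    if not missing.empty:
        raise ValueError(f"No complete rate data for reports: {missing['report_date'].tolist()}")
    return out
```

The explicit `ValueError` is the point: a merge that fails loudly is much easier to trust than one that quietly returns a "clean" file.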
A Quick Fix
Luckily, I’ve built these files dozens of times. So I knew how to patch it up manually (in Excel – booo).
Lesson One: Always check the input data. Just because an LLM says it’s “clean, final, aligned” doesn’t mean it is. The file below is clean (you can export it if you want), because I built it myself!
Lesson Two: Asking an LLM to merge and standardise structured data is like using a sledgehammer to crack a walnut. The code might look slick, but without validation steps, things go wrong fast. Agentic workflows or better prompt engineering might help — but for now, it’s on the user.
Large Data Models
On to the heavy stuff:
Hey, ChatGPT – do me some large data models on swaps markets please!
As we know from all of these blogs, that is not how we go about things. Scoping the task first with your new “colleague” is the way to go:

This is where LLMs really start to shine. With structured data all cleaned up, I can ask ChatGPT to help me run a Tucker Decomposition. I had no idea what that meant 24 hours earlier.
Thanks to ChatGPT, I would soon be decomposing my dataset into latent components, viewing a core tensor and feature factor matrices, and analysing the underlying structure in swap data.
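To make "core tensor and factor matrices" concrete, here is a minimal NumPy sketch of a truncated higher-order SVD (HOSVD), one simple, non-iterative way to compute a Tucker decomposition. It is not necessarily the routine ChatGPT wrote for me, and the function names are my own. A 3D tensor is approximated by a small core tensor multiplied along each axis by a factor matrix with orthonormal columns.

```python
import numpy as np


def unfold(tensor: np.ndarray, mode: int) -> np.ndarray:
    """Matricise the tensor: rows index `mode`, columns index every other axis."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_dot(tensor: np.ndarray, matrix: np.ndarray, mode: int) -> np.ndarray:
    """Multiply `tensor` by `matrix` (shape J x I_mode) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, -1)  # bring the target axis last
    return np.moveaxis(moved @ matrix.T, -1, mode)


def hosvd(tensor: np.ndarray, ranks):
    """Truncated HOSVD: returns (core, factors) with core.shape == tuple(ranks)."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding become that mode's factor.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_dot(core, u.T, mode)  # project onto each factor basis
    return core, factors


def reconstruct(core: np.ndarray, factors) -> np.ndarray:
    """Rebuild the (approximate) tensor from the core and factor matrices."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_dot(out, u, mode)
    return out
```

With full ranks the reconstruction is exact; with smaller ranks the factor matrices hold the feature loadings and the core describes how the components interact across axes.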
Lesson Three: This was one of those “AI unlock” moments. I couldn’t have done this on my own.
But Then… Memory Woes
Ironically, ChatGPT remembered that I had worked on a CFTC Swaps extractor project before — even referencing MySQL. But when it wrote a new extractor, it completely ignored those learnings. It recreated earlier mistakes: wrong URLs, missing worksheet names, incorrect logic.
I’m now faster at spotting those errors, but it’s frustrating.
Lesson Four: LLMs need better memory across sessions, which would make the experience radically better.
Visualising the Results
Soon I had everything I needed:
- A merged dataset of volumes and rates
- A Tensor_3D.npy file capturing the structured data
- A corresponding list of feature names
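As a rough illustration of how such a pair of files might be produced, the sketch below stacks a weekly table into a 3D array and saves it alongside its feature names. The axis layout (weeks x tenors x metrics), the `TENOR_metric` column naming and the `feature_names.json` file name are assumptions for the example, not a description of the actual files.

```python
import json

import numpy as np


def build_tensor(table: dict, tenors: list, metrics: list):
    """Stack columns named 'TENOR_metric' into an array of shape (weeks, tenors, metrics)."""
    n_weeks = len(next(iter(table.values())))
    tensor = np.empty((n_weeks, len(tenors), len(metrics)))
    names = []
    for i, tenor in enumerate(tenors):
        for j, metric in enumerate(metrics):
            key = f"{tenor}_{metric}"
            tensor[:, i, j] = table[key]  # one weekly series per (tenor, metric) cell
            names.append(key)
    return tensor, names


def save_tensor(tensor, names, tensor_path="Tensor_3D.npy", names_path="feature_names.json"):
    """Persist the array and its feature names side by side."""
    np.save(tensor_path, tensor)
    with open(names_path, "w") as fh:
        json.dump(names, fh)
```

Keeping the names next to the array matters later: the decomposition only returns numbers, and the labels are what make the loadings readable.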
When I ran the decomposition, the result was impressive. The model generated five distinct components, each with associated feature loadings. I didn’t know what any of them meant at first — but I could ask ChatGPT to explain the below chart:

What does the chart show? Let’s use ChatGPT’s own words:

Lesson Five: LLMs are great at explaining technical concepts and charts. Don’t just produce a chart – use an LLM to learn new data modelling techniques!
It became clear that ChatGPT has read and absorbed the relevant research, so it can guide me towards particular outcomes:

Component Interpretation
I asked ChatGPT to “label” Component 0. It came back with:
🧠 “Short-End Rate Sensitivity”
Driven primarily by changes in 1Y and 2Y OIS, the component appears to capture how front-end OIS reprices week-to-week — likely tied to policy moves or front-end volatility.
These kinds of interpretations are incredibly powerful. The model reveals latent structure in the data, but I can then lean on ChatGPT to give those structures real-world meaning.
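And guess what? We’re doing something very close to Principal Component Analysis (PCA) of swap curves, a well-established technique I’ve seen before in markets. Tucker decomposition is often described as a higher-order generalisation of PCA, so the resemblance is no accident. I didn’t set out to replicate PCA, but I ended up rediscovering it, guided by data and model logic.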
When the User Brings the Value
Inspired by the PCA-like outcome, I expanded the dataset to include slope and curvature measures — 2s10s, 5s30s, and 2s10s30s butterfly metrics. This gave me 25 features across 32 weeks of data — a decent foundation for market structure analysis.
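The slope and curvature measures are simple arithmetic on the tenor rates. A small sketch, assuming rates quoted in percent and one common sign convention for the butterfly (twice the belly minus the wings); desks differ on the sign and on weighting, so treat the convention and the function name as assumptions.

```python
def curve_metrics(rates: dict) -> dict:
    """rates: tenor -> rate in percent. Returns curve measures in basis points."""
    bp = 100.0  # 1 percentage point = 100 basis points
    return {
        "2s10s": (rates["10Y"] - rates["2Y"]) * bp,   # slope, front end to belly
        "5s30s": (rates["30Y"] - rates["5Y"]) * bp,   # slope, belly to long end
        # Butterfly: belly rich/cheap versus the wings (assumed convention).
        "2s10s30s": (2 * rates["10Y"] - rates["2Y"] - rates["30Y"]) * bp,
    }
```

For example, with 2Y at 4.00%, 5Y at 3.90%, 10Y at 4.10% and 30Y at 4.30%, this gives a 2s10s of 10bp, a 5s30s of 40bp and a butterfly of -10bp (up to floating-point rounding).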

While the heatmap of component loadings is visually striking, I found bar charts of top feature loadings per component more useful for interpretation:
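The ranking behind a "top loadings" bar chart is easy to reproduce. A minimal sketch, assuming the feature factor matrix has one row per feature and one column per component; the function name is hypothetical.

```python
import numpy as np


def top_loadings(factor: np.ndarray, names: list, k: int = 5) -> list:
    """For each component, return the k (feature, loading) pairs largest in absolute value."""
    result = []
    for c in range(factor.shape[1]):
        col = factor[:, c]
        order = np.argsort(-np.abs(col))[:k]  # biggest magnitude first, sign kept
        result.append([(names[i], float(col[i])) for i in order])
    return result
```

Sorting by absolute value but keeping the sign matters: a large negative loading is as informative as a large positive one when putting a label on a component.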

ChatGPT even auto-labeled components using language drawn from swap market literature. These labels felt directionally right — but not always accurate. That’s something to refine.
Still, it felt like a big step forward.
The fact I could build, run, and interpret a Tucker Decomposition of swap data — with no prior knowledge — is testament to what LLMs can unlock.
Key Takeaway
“Use LLMs for LLM-type tasks.”
They’re not built to be data engineers. But once the structured work is done, they are very capable research assistants, able to bridge the gap between data analysis and market meaning.
In Summary
- Tensor models are now accessible: I built a Large Data Model using swap volumes and SOFR rates. With clean inputs and guidance from ChatGPT, I ran a Tucker Decomposition to extract latent market signals.
- LLMs amplify the modelling process — if guided: They don’t always get input formats right, but they do help interpret complex models and relate them to real-world market structures.
- The real value is collaboration: I brought the market knowledge; ChatGPT brought the tools and logic. I couldn’t have done this in Excel or coded it solo. But working together, we created something useful and relevant.

