On February 2nd, OpenAI introduced Deep Research, a tool designed to take complex questions, gather relevant information, and generate in-depth reports. Powered by their latest model, o3, it promises to deliver well-researched insights within minutes or hours. But how well does it actually perform? I decided to put it to the test.

The reports from Deep Research can be really impressive. Here are some reactions compiled by Zain Khan and published on LinkedIn:
- Biomedical scientist Derya Unutmaz analyzed a pair of cancer cases he’s working on, and “both reports were simply impeccable, like something only a specialist MD could write.”
- AI educator Mckay Wrigley created a “one-stop daily news report” that’s personalized to the topics and sources he’s interested in.
- Johns Hopkins physician Banda Khalifa was “mind blown” when Deep Research crafted a comprehensive literature review about the HPV vaccine in under five minutes.
- HubSpot co-founder Dharmesh Shah generated an 11,000-word industry report filled with “genuinely great insights — including some [he] hadn’t really thought of before.”
Deep Research isn’t available to everyone (yet), only to the ChatGPT Pro tier. Luckily, I have connections who helped me get a report that I could evaluate myself. (Thanks to Fredrik Ahlgren!)
Task: How do AI advancements affect Swedish society?
I believe that it’s wise to use AI (or other tools) for authentic tasks when evaluating them. It makes it easier to spot useful and useless things, and you’ll learn more in the process.
A question I’m genuinely interested in is how AI advancements will affect Swedish society on a 5–10 year horizon. So I asked for a report on that topic.
You can view the full reply on ChatGPT and in this copy-pasted Word document. (They are both in Swedish, but you will find a way to translate them.) It’s a 23-page document of dense text, and I read it eagerly.
I’ve spent a lot of time learning and thinking about AI’s impact on Sweden, in particular when it comes to education, so I am fairly well qualified to evaluate the resulting report.
This is what I think.
In general good and useful results, but not great
The report has a good structure, is easy to read, and has an overall quality well above “useful”. Deep Research does a really good job of summarizing the current state of AI in Sweden, and it cites a few resources I hadn’t seen before but will happily use in the future.
It does not perform as well when it comes to analyzing trends and drawing conclusions about the future. It is still OK and useful, but not much beyond that.
When it comes to the area where I have the most expertise – AI and education – I find Deep Research’s conclusions pretty shallow. Some of the actions it recommends are good, none were new to me, and a few were quite bad (such as the suggestion that all students should learn basic programming).
Another area where I’ve spent quite a bit of time is democracy and the information ecosystem. I couldn’t spot any bad results from Deep Research there, but I was kind of disappointed when it suggested rules for marking AI-generated content – and didn’t even mention the approach of signing content digitally.
In all, I was somewhat disappointed in what Deep Research had to say in the areas where I know the most. I suspect that its conclusions in other areas are just as shallow, but that I lack the expertise to see it. My overall assessment of the report is thus good, but not great.
Great Language
Another area of expertise of mine is language and writing (in Swedish), and I have to say that I’m impressed by how Deep Research uses the Swedish language. The language was better than in most (published) reports I’ve read, and the report was free of some of the mistakes I often see from both humans and ChatGPT. In the 23 pages I found three mistakes that a human could have made, two that only an AI would make, and two that were an effect of the English language spilling over into Swedish in a bad way.
One of the mistakes was pretty serious, since the word “billion” stands for a number with three more zeroes when used in Swedish. (See “long and short scale” if you want to know more.)
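To make the size of that mistake concrete, here is a minimal Python sketch of the two naming systems. The dictionary and function names are my own illustration and are not taken from the report. The exponents are the standard short-scale (English) and long-scale (Swedish) values.

```python
# Power of ten behind each number word.
# English uses the short scale, Swedish uses the long scale.
SHORT_SCALE_EN = {"million": 6, "billion": 9, "trillion": 12}
LONG_SCALE_SV = {"miljon": 6, "miljard": 9, "biljon": 12, "biljard": 15, "triljon": 18}

# Look-alike Swedish words that are tempting but wrong (illustrative mapping).
FALSE_FRIENDS = {"billion": "biljon", "trillion": "triljon"}

# Correct Swedish words for the same quantities.
CORRECT = {"million": "miljon", "billion": "miljard", "trillion": "biljon"}


def mistranslation_factor(english_word: str) -> int:
    """Return how many times too large the number becomes
    when the look-alike Swedish word is used for the English one."""
    lookalike = FALSE_FRIENDS[english_word]
    return 10 ** (LONG_SCALE_SV[lookalike] - SHORT_SCALE_EN[english_word])


if __name__ == "__main__":
    for word in FALSE_FRIENDS:
        print(word, "->", FALSE_FRIENDS[word], "is off by a factor of",
              mistranslation_factor(word))
```

In other words, an English “billion” should become “miljard” in Swedish. Writing “biljon” instead inflates the figure a thousandfold.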
Conclusions
I was a bit surprised by the good-but-not-great results from Deep Research, when others have been so impressed. This is of course only a single report, but I would still venture the hypothesis that Deep Research performs better at collecting and summarizing information than at drawing conclusions, extrapolating and making well-founded guesses. This would explain the difference in quality between how it describes the current state of AI in Sweden and its prognoses and recommendations for the future. It also fits well with what we know about the strengths of large language models.
In many areas, just making a good compilation of current research – and a synthesis of what it says – is extremely useful. In other areas, such as forecasting, it is less useful.
The results of Deep Research are still impressive. I’m a pretty smart guy, and I’ve spent a bit more than two years obsessively trying to understand AI’s impact on education. The results in this area felt pretty lame to me, but they are definitely on par with most talks on the topic at various conferences and webinars. For someone who hasn’t spent a lot of time on the topic, the report from Deep Research would provide a start and some useful insights – but also one or two leads that I think are dead ends or counterproductive.
This makes Deep Research very useful in areas where you have the expertise to critically evaluate its findings, but less useful in other areas. I suspect that you can trust it more on less speculative tasks, but that’s a hypothesis that needs more evidence. I’m looking forward to learning more, as well as experimenting with ways to improve automation in analyses involving speculation.
What are your experiences of Deep Research? Please share in a comment.