// podcast · masters of search

Where ChatGPT gets its products | Tom Wells, Researcher @ Peec AI

Niklas Buschner, Founder & CEO

March 24, 2026 · 56 min read

My guest in this episode is Tom Wells, AI Search Researcher at Peec AI. Tom has spent over a decade decoding ranking signals at companies like Semrush and Searchmetrics, and he recently published some fascinating research that got a lot of attention in the ecommerce SEO and AI Search world.

Check out Tom’s research:

Check out the full episode

Check out the episode on YouTube, Spotify, or Apple Podcasts.

Key learnings from this episode

ChatGPT scrapes Google Shopping

83% of products in ChatGPT’s product carousel are sourced from Google’s organic shopping results, confirmed across 43,000 unique products and ~100 subcategories
Over 40% were exact title matches, making the evidence overwhelming
The scraping parameters are visible in ChatGPT’s source code — you can decode a Base64 field to reconstruct the exact Google Shopping URL for any product
Bing was ruled out as a source through a negative control (only ~10% match, mostly overlapping with Google anyway)
Google Maps data is similarly scraped, with parameters left in the source code

How ChatGPT retrieves products

ChatGPT uses third-party scraping providers to get Google Shopping results, not direct API access
Shopping-specific fan-out queries are short and target product categories, unlike contextual fan-outs which are longer and target articles/buying guides
The scraping introduces latency issues: prices can be outdated, products may be out of stock, and images may not match (e.g., wrong color)
ChatGPT failed to keep up with live pricing during Black Friday

ChatGPT 5.4 introduces brand bias

In earlier versions, shopping fan-outs were brand-neutral (e.g., “best running shoes under $500”)
In 5.4, the model shows inherent brand bias visible in the reasoning steps — it pre-selects brands before searching, which cascades through the entire retrieval chain
This may be connected to OpenAI onboarding retailer product feeds (Dick’s Sporting Goods, Best Buy, Etsy)

Practical implications for e-commerce

Feed optimization is foundational: make sure you appear in the top 5-10 of Google organic shopping results for your key terms — this directly correlates with carousel position
Google Ads won’t help: ChatGPT only scrapes organic results, not paid, because organic is more stable and available across geos
Third-party mentions matter: the buying guide content in ChatGPT responses is built from contextual web sources — your brand/product sentiment across editorial content, affiliate sites, and review platforms feeds directly into AI recommendations
Map your source landscape: identify which third-party sources rank for your category, check if you’re mentioned, and pursue organic or paid placement where relevant
Build your own data moat: rather than relying on ChatGPT as the discovery layer, brands with well-organized product data can create their own agentic experiences (e.g., an AI shopping assistant on top of their Shopify data) for a better user experience

Upcoming research

Peec AI is studying how Google’s dynamic meta description rewrites influence ChatGPT’s content ranking — since ChatGPT ingests the rewritten snippet (not the original), this small text may have outsized impact on how ChatGPT thinks about and ranks content

Auto-generated transcript

Niklas Buschner (00:02.069)
My guest today is Tom Wells, AI search researcher at Peec AI. Tom has spent over a decade decoding ranking signals at companies like Semrush and Searchmetrics, and he recently published some fascinating research that got a lot of attention in the e-commerce SEO and AI search world. Excited to dive into this, so thanks for coming on the podcast, Tom.

Tom – Peec AI (00:23.877)
Thanks for inviting me, Niklas. Super happy to be here.

Niklas Buschner (00:26.616)
Thanks for taking the time. Tom, let’s do a quick intro for everybody that does not know you yet. So who are you and what do you do at Peec AI? And how did you end up in this whole AI search SEO space?

Tom – Peec AI (00:39.185)
So yeah, actually my story goes all the way back to Searchmetrics, which believe it or not used to be a rival of Semrush. When I tell people this, they’re like, I don’t know the name, or maybe I heard it and it was kind of famous like 10 years ago, but then what happened, you know, which is fair enough. But believe it or not, we were rivals with BrightEdge and with Semrush and we were actually doing pretty well. And how it started there was I entered the marketing department in 2015.

And what I really loved was, this sounds quite funny, but the actual office space. So it was open plan and everybody could speak with every other team on the same floor. And I remember being really fascinated with the data science team and everything that they were doing. And I noticed that one of the key kind of gaps at that time, 10 years ago, at least, was the divide between the marketing departments who at that time in 2015 were focused on market, so email marketing, if we can even remember this,

and data science, who had all of this gold mine of information, and I felt the channel between the two was not as good as it could be, at least at that time. And so I tried to basically be the bridge between both, and this led to doing things like conference presentations and famous studies like the ranking factor study back then, which was pretty well downloaded. And then I just increased my data science knowledge from there, leading on to things like doing a

master of data science degree from Georgia Tech and really kind of obsessing with the data side of stuff, having started more in the marketing side. So yeah, I kind of have both elements of it and I kind of love both elements of that world. And from then on I went on to do consulting, yes, still in SEO as well, but also in kind of AI pipelines, from building them from scratch and stuff like this,

returning a while ago to Semrush, particularly with the interest in AI search that was going on there when Perplexity launched, doing this for a couple of years and then seeing the opportunity to move across to Peec AI. And what really inspired me to make the move, or you could call it make the jump, from a thousand-person organization where I’m doing very high-level research with many resources at my disposal to Peec AI

Tom – Peec AI (03:00.292)
was exactly the opposite. Now I’m responsible for effectively running the research unit as good as possible with slightly fewer resources, but I actually found that quite an empowering idea, basically. So yeah, that’s why I’m there.

Niklas Buschner (03:14.326)
Sounds pretty much like the motivation from Malte when he decided to leave Idealo and then go to Peec AI. Did you also have the chance to work with him back then when he was at Searchmetrics, or did you two overlap?

Tom – Peec AI (03:28.462)
We did overlap, especially in the 2015-2016 period, so already ancient history now. Malte was also trying to bridge the gap between data science and marketing, particularly with a view to primary research. So I helped him there. And we also reconnected in a funny way when he, at Idealo, went effectively on a mission to try and prove that Google Shopping was in some way manipulating the ecosystem.

And there was a famous lawsuit in Germany where Idealo sued Google successfully. And one of the pieces of primary research, I was actually the author of this for Malte. So while I was still at Searchmetrics, I did a study looking at basically Google Shopping vendors and proving that the majority of them were basically still somehow a Google entity or somehow influencing the Google rankings. And this was part of a successful lawsuit against Google at the time.

Niklas Buschner (04:27.116)
So Peec AI is doing a pretty good job in pulling together a lot of brain power from the whole SEO space and like OG SEOs. For people that might also be interested in like this whole research thing, because I see more and more companies also investing resources into research and publishing their own studies, et cetera. Can you just take us through a normal day, like super regular day of

Niklas Buschner (04:56.266)
your job as a researcher at Peec AI?

Tom – Peec AI (04:58.86)
Absolutely. I mean, I don’t know if I could say there is a regular day at Peec AI right now as we’re in a scale growth phase. So the days are very different, but maybe I would just preface this response so people in the audience know a little bit about why a research department is even a thing in such a small startup. Because, you know, people can build apps and execute now very, very quickly with AI coding software and actually come up with pretty good stuff and get it on the market very, very quickly.

However, many of them don’t even have a research department and never even considered this. For me, this is kind of crazy because I always say in the world of AI search, we’re in the business of understanding how AI search works. And this is impossible in a probabilistic system without actually researching how it works. We can do the basic level of throw in some prompts and see the responses, but…

Does this actually tell you how ChatGPT works under the hood? I would answer probably no. And so having this quality primary research of how AI search works across all threads of this is actually critical for the business understanding, the product understanding, and also potentially educating the industry as well. So this is kind of why I believe even in a small scale startup, having a good research unit, even if it’s one or two people,

can make a huge difference to what you’re actually able to achieve basically. A normal day, so, you know, we split our research into quite a few different buckets. So there can be short-term trends where we have to basically react to something. A couple of weeks ago, for example, one of the big things on LinkedIn was the claim that ChatGPT query fan-outs had disappeared. So I spent then…

three days researching this to actually get to the truth and the answer is it’s nuanced and more complicated than what some people claimed on LinkedIn, let’s say, and I published then a response to give a more detailed opinion on how this is. This is reactive. Then we have long-term research, which I’m sure is one of the things we’ll talk about in detail today, where an idea can become an obsession or a quest to prove or disprove something and it can

Tom – Peec AI (07:22.596)
take anywhere from one or two weeks to even four months in the case of the shopping studies. Yeah. And of course we collaborate. Some of our great clients come up with actually amazing research ideas. So instead of just showing them the door and saying, hey, we are the ones with all the knowledge, we have an open door policy to people who want to collaborate with us. And I evaluate every idea as if it was brought to me by the same person. There’s no hierarchy there. So

we have a few research pieces coming up that were actually inspired by our clients, who said, hey, could you actually check out this? Wouldn’t it be nice to know the answer to this question? And then we decide internally, hey, can we actually do this? It would be really cool if we could. And that’s more or less how we work at the moment.

Niklas Buschner (08:09.93)
Awesome. And if there’s someone in the audience that also fancied starting their own research in any way, like let’s even say we, for example, at Radiant, so we already thought about, how can we do some cool research, for example, around, I’m spoiling an idea here, some cool research around how customized the responses from ChatGPT or other AI search interfaces really are,

depending on the persona that we are. So we’re simulating different personas, like let’s say a CEO or a CMO or like an SEO manager. And if so, we think that this will be loaded from the memory of ChatGPT, how different are the responses? This is something that we, for example, just thought about, hey, this would be cool to have some solid takes on. How would you recommend

starting with research, if someone sees the value and says, OK, yeah, Tom, you convinced me. How do I start, like today?

Tom – Peec AI (09:09.104)
That’s a great question, Niklas. I really actually like this question because what you touched on is actually what I would call internal business intelligence, but also combined with a bit of instinct. So your instinct that this might be interesting, I would say 90% of the time for clients, it’s actually correct that they have this question that they’ve kind of been obsessing with for months and it’s been at the back of their mind and they maybe didn’t know how to surface it as a question or a well-formed research study.

I do find it interesting that we do have some innate instincts to actually access what is interesting to us as a business. I think that is kind of the first step. Can you ask or find interesting questions that you want to solve? Because if yes, you have really nailed down step one for great research. From there, this is of course where it gets more complicated and of course where it might not be so easy, but…

If you ask the question, one really good thing is to effectively design the optimum version of the study that you would like to see at the end and then work backwards from there. So in your case, I would say, okay, personas for Radiant, maybe we have these five personas that we know in our ICP, we can actually describe them really well. Our ideal study would be very detailed at giving information on all of these.

Then I would say, okay, how do we then access this information? So in the prompt universe, you know, this is one of the exercises that any AI search company is able to do is we translate the idea of a study from the abstract into prompts. So we would say, okay, maybe for this persona, they would ask prompts in a different way on the backend, then we can measure the difference of the information. Of course, there are a lot of different technical skills that I’m not going to go too deep into now.

However, I would say the core part is asking a great, interesting question and sort of even imagining like, how could we get this data back? And then that is already at the stage where you could give it to a developer or a researcher or someone and they could actually deliver that for you.

Niklas Buschner (11:17.867)
Awesome. I think that’s very helpful. And I also see my idea now being validated. But honestly, I have to give a shout-out to our internal AI search strategy lead, Janek, who came up with the idea because we were fascinated by the research, for example, that Peec AI already put out and we wanted to do something similar around things that we just felt are interesting. And obviously we also feel like we can’t just offload all our ideas on you because you only have so much time in the day.

Niklas Buschner (11:46.756)
But let’s talk about some of the research that you actually published because I already mentioned in the intro that it’s about e-commerce, SEO and AI search. For everybody listening, we will also put all the relevant links in the video description or in the show notes. So you can also check the very extensive write-ups from Tom and the team. Give us a quick, like very brief intro.

What was the idea that you had with this research and how did you like how did you also come up with the idea?

Tom – Peec AI (12:20.749)
Yeah, so in October last year, this is when it started, I noticed that ChatGPT was, it seemed to me, leaning on Google Shopping results. So it did actually start as a fully formed idea. More interesting is maybe how I noticed this. There were very few researchers, one of them being Olivier de Sagonzac from…

in France, me and him had some messages with each other about, are you seeing this? Why does no one care? And the trick for me was that I just did a random sample, literally on my personal ChatGPT account, asking questions about e-commerce products. And I saw that, hey, the price and the star rating look the same as if I asked the same question on Google Shopping. Like, is it pulling from Google Shopping? Is it so advanced that it can actually pull the live data? Wouldn’t that be actually pretty cool?

And then I thought, well, now I think about it a bit more carefully. There’s no way that Google would give open access to their shopping graph to ChatGPT, who could be considered a rival at this stage. Then I decided that the only logical conclusion is that they are scraping these results via a third party scraper. So for people in the audience who don’t know really about web scraping, it’s effectively where you pay a provider to get the results emulated as if you were a real person. So if I say,

I want the Google Shopping results for best running shoes under $500 in an area of the US, let’s say. You can then pay a scraping provider like SerpApi or SearchApi.io, to name just two, and then effectively get those results back. Now, it’s kind of a complicated explanation of the study, but as simply as I could frame it, it started with the belief that ChatGPT was scraping Google Shopping, which is already a very big finding if it would prove to be true.

Niklas Buschner (14:17.985)
Nice. And can you talk a little bit about the key findings that you made in this study without spoiling too much, because we will go into the details one by one, but maybe like high level, what was it that you found?

Tom – Peec AI (14:35.469)
Yeah, so the study confirmed what I heavily suspected, that the vast majority of products that make their way into the ChatGPT product carousel that we’ve probably already seen are in fact sourced from the Google organic shopping results. And it makes me happy to say this because the journey to prove this to be true took me three months or four months of work, essentially.

Niklas Buschner (15:03.107)
And I still see a lot of people claiming that ChatGPT sources its information, if they do web search or if they do grounding, from Bing. So did you also check, okay, maybe this comes from Google, but maybe it also comes from Bing and it’s just a coincidence. So did you also think about that?

Tom – Peec AI (15:21.069)
Yeah. So actually a shout out here to the Search Engine Land editorial team. So part of the long process of getting published in a reputable journal or on a reputable website is that the editors actually have to recreate the study with their data science team, or at least take it apart and look at it from a conceptual point of view. Like they will not publish something that they truly don’t believe to be the case when it’s a headline claim like ChatGPT sources from Google Shopping.

And they actually came up with the idea when I first spoke with them early in January that it would be really good to add a negative control. A negative control in an experiment is exactly what you say, Niklas, that you must prove that it’s not random chance. And for the viewers thinking like, but it would be easy to prove, no, it’s not trivial actually, because if you imagine a very wide set of products like running shoes, watches with

heart rate monitors or any e-commerce products that you could possibly imagine, effectively, we want as diverse a set of products as possible to prove that it’s true across all of them. If you ask questions to Bing and to Google, you might assume that probably the top brands and the top few products would appear on both of them. That’s a very logical and reasonable assumption. It turns out that’s actually not true, but that was also not obvious.

We ran the exact same data pipeline on both Google Shopping organic and Bing Shopping organic. So yeah, we did actually do the negative control. And while we found about 10% of products matching the ChatGPT carousel from Bing, of those that matched, very, very few were not also found in Google.

So we were able to sufficiently prove that no, they are not using Bing as a source basically.
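The negative-control logic Tom describes can be sketched in a few lines of Python. This is an illustrative sketch, not the study’s pipeline: the function name, the set-based representation, and the toy product IDs are assumptions, and the numbers it produces are not the study’s figures.

```python
def negative_control(chatgpt_products, google_matches, bing_matches):
    """Compare how often carousel products are matched in each source.

    All three arguments are sets of product identifiers (hypothetical;
    the episode does not say how products were keyed).
    """
    total = len(chatgpt_products)
    if total == 0:
        raise ValueError("need at least one ChatGPT product")

    google_hits = chatgpt_products & google_matches
    bing_hits = chatgpt_products & bing_matches
    # Products matched in Bing that Google does NOT also explain.
    bing_only = bing_hits - google_hits

    return {
        "google_rate": len(google_hits) / total,
        "bing_rate": len(bing_hits) / total,
        "bing_only_rate": len(bing_only) / total,
    }


# Toy data, invented for illustration only.
chatgpt = {f"p{i}" for i in range(1, 11)}  # 10 carousel products
google = {f"p{i}" for i in range(1, 9)}    # p1..p8 matched in Google
bing = {"p1", "p9"}                        # 2 matched in Bing

rates = negative_control(chatgpt, google, bing)
```

If the Bing-only rate stays close to zero while the Google rate is high, Bing adds almost nothing that Google does not already explain, which is the argument made in the episode.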

Niklas Buschner (17:17.124)
And can you talk a little bit about the methodology? I mean, people would probably think, okay, but is this true across various verticals? So did they really check a couple of different product categories or is it maybe a bias in certain categories? How did they look at the matching? Like, what does it mean that the vast majority of products are matched? So how can I actually think about this? So can you just…

share a little bit about how you approached it, maybe without being too technical, although I think the audience is probably, it’s very intelligent people, I know that, so you can go down a little bit into the rabbit hole.

Tom – Peec AI (17:50.508)
Yep, of course.

Tom – Peec AI (18:03.072)
Yeah, as you say, without going into the rabbit hole, whenever you do a study, and I would encourage any research team to sort of maybe follow these one or two pieces of advice. It’s not me preaching. If they’re already good at research, probably they know this already. But if you’re just getting started, this is a really important thing: when you first create, let’s say in this case, the product set that you want to check,

not only do you want it to be very diverse across categories, you want what’s called data harmonization. You ideally want a similar number of things in a similar number of categories. So you then don’t have some, let’s say, running shoe bias in the data. If we have 20,000 queries about running shoes, my findings wouldn’t be very good, right? Like this is obvious. So we had somewhere in the region of a hundred subcategories of products

and a total of 43,000 unique products that we checked in ChatGPT, 250,000 products in Google organic shopping, and 250,000 products in Bing Shopping.
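One simple way to get the balanced query set described above is to cap every category at the same number of queries. The Python sketch below assumes that approach; the function name, the `(category, query)` tuple format, and the sample data are hypothetical, since the episode does not describe how the balancing was implemented.

```python
import random
from collections import defaultdict


def balance_by_category(items, per_category, seed=0):
    """Keep at most `per_category` queries per category.

    `items` is an iterable of (category, query) tuples. Sampling is
    seeded so the same input always yields the same balanced set.
    """
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for category, query in items:
        by_category[category].append(query)

    balanced = {}
    for category, queries in by_category.items():
        k = min(per_category, len(queries))
        balanced[category] = rng.sample(queries, k)
    return balanced


# Invented sample: running shoes are over-represented on purpose.
items = [("running shoes", f"shoe query {i}") for i in range(5)]
items += [("smart watches", "watch query 0"), ("smart watches", "watch query 1")]

balanced = balance_by_category(items, per_category=2)
```

After balancing, each category contributes the same number of queries, so no single category dominates the aggregate match rate.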

Niklas Buschner (19:14.242)
Okay, and how did you look at the match? Because I saw in the Search Engine Land piece that you actually have a very sophisticated way of thinking about the matching, and honestly, I think I generally understand what you did there, but just on a very distant level. So maybe you can share a little bit about that.

Tom – Peec AI (19:24.258)
Yeah.

Tom – Peec AI (19:38.019)
So firstly, why in Search Engine Land we were so robust with the methodology, I will share this completely openly with all the viewers, is that because I’m employed by Peec AI as a researcher, the chance of me being able to cherry pick or influence the data and publish it on Search Engine Land is more than your average researcher. However, I take my job very, very seriously. I’m an absolute neutral person.

And as I said, the obsession began before my job at Peec AI, already in November. So we published a lot of methodology, appreciating in advance that many readers might not fully understand it, but it was important for me to have it there to show what was done. So also someone could recreate this study if they have the ability to do this, which I hope they would do one day. The ideal case would be if all of the product titles

from ChatGPT exactly matched Google Shopping. However, Google Shopping, and ChatGPT for that matter, have the ability to dynamically rewrite small amounts of the product title. So if you do a test, you can ask the same question to Google Shopping 100 times and you’ll see slight variations in the titles that are retrieved. Sometimes, if you’re looking for smart TVs,

it might include the dimensions already in the product title or it might not. However, the dimensions of that product don’t actually count toward a match, because they have nothing to do with the brand or the model. So to create a way to actually efficiently get past this dynamic process, because we’re dealing with two probabilistic systems and we must somehow try and match them, I created a three-step algorithm to actually do this. And when people hear the word algorithm, they get a bit scared.

However, it’s just a repetitive process that works across the whole dataset. So step one, exact match, very easy. So if the strings or the text match exactly, it’s a hit. This is the easy case. Then we did a couple of more advanced things. So we basically said, okay, what about if the word order is mixed, but the words are the same? This is of course, you know, if you imagine a silly example would be iPhone 15 and we had

Tom – Peec AI (22:02.286)
15 iPhone for whatever reason. This is a case that wouldn’t come up in the data, but you can imagine that sometimes when the words are switched, it still counts as a match. And as a third step, we actually counted the number of similar tokens, as it’s called, by subdividing the words into different parts and then matching them. There was also random sampling, manual checking, many other controls in place. And effectively then with all of this, we decided

we want to set a threshold of matching where the brand and the product is the same. And so we set a threshold of basically 0.8. So that allows for those things where there’s a slight deviation in the title, but we are still very confident that the brand and the product are the same. And this is where we can then say, as the results of the study, 83 % make this threshold. Interestingly enough,

already above 40 % made it into the exact match category, which is also a huge finding in itself.
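The three steps described above (exact match, same words in a different order, then similarity of word parts against a 0.8 threshold) can be sketched in Python as follows. Only the three-step structure and the 0.8 threshold come from the episode. The normalization, the use of character trigrams as the "parts", and the Dice coefficient as the score are assumptions made for illustration, not the study’s actual implementation.

```python
import re

THRESHOLD = 0.8  # threshold mentioned in the episode


def words(title):
    """Lowercase the title and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", title.lower())


def word_parts(title, n=3):
    """Split each word into character n-grams (assumed subdivision)."""
    parts = set()
    for word in words(title):
        if len(word) <= n:
            parts.add(word)
        else:
            parts.update(word[i:i + n] for i in range(len(word) - n + 1))
    return parts


def match_titles(a, b):
    """Return (step, score) for two product titles."""
    # Step 1: the strings match exactly.
    if a.strip() == b.strip():
        return "exact", 1.0
    # Step 2: same words, different order.
    if sorted(words(a)) == sorted(words(b)):
        return "reordered", 1.0
    # Step 3: overlap of word parts (Dice coefficient, an assumption).
    parts_a, parts_b = word_parts(a), word_parts(b)
    if not parts_a or not parts_b:
        return "none", 0.0
    score = 2 * len(parts_a & parts_b) / (len(parts_a) + len(parts_b))
    return ("similar" if score >= THRESHOLD else "none"), score
```

For example, `match_titles("iPhone 15", "15 iPhone")` falls into the reordered step, while a title that only differs by an added size, such as "Samsung 55 Inch Smart TV QLED" versus "Samsung Smart TV QLED", is caught by the third step. A production matcher would likely also need brand and model extraction, which this sketch leaves out.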

Niklas Buschner (23:02.928)
So summarizing this in a very simple way, you’re saying that based on 45 % of products that show up in ChatGPT that are exactly the same in Google organic, this is such strong evidence with also then more than 37 % of products being almost an exact match that there’s no other

reasonable explanation that these products come from the organic shopping results from Google. Is that correct? Feeling like a lawyer here like in court.

Tom – Peec AI (23:43.522)
This is correct from the interpretive side, just to give a few more details on why in this case it’s very rare in research that we come up with these confident claims unless we are very sure. The real reason going back already to November is, if you remember I said at the top of the show that I looked and I basically saw that the prices and the ratings seem to be the same. Then I…

I went a step deeper and I looked at the source code of ChatGPT to see what is happening there. Like what is happening in the background? Like what is it searching for? How is it thinking about these products? And as is stated in the Search Engine Land piece, there was a field which is a product token. And when you decoded this field, magically you saw what looked exactly like Google Shopping parameters. Now, using these parameters, I could then reconstruct

the exact URL that would link to a product on Google Shopping. Every time, 100% of the time. Now, this still does not prove what I wanted to prove in the study. This just says, I already knew then that this must be a method of scraping Google. What it didn’t prove is, you know, how many Google organic shopping products does it look at? How does it source them? How does it decide who makes it first, second, third, fourth in the carousel? It didn’t prove any of this.

And it also didn’t prove it at the scale where basically the international research community would accept it. However, it still remains to this day that it is in the source code, so everybody can actually look at it and get the Google Shopping URL from within ChatGPT’s source code.
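The decode-and-reconstruct step can be illustrated with a short Python sketch. The episode does not give the field’s name, the payload layout, or the Google Shopping URL pattern, so everything specific here is hypothetical: the payload is assumed to be a URL-style query string, the parameter names are invented, and the base URL is a placeholder.

```python
import base64
from urllib.parse import parse_qsl, urlencode


def decode_token(token: str) -> dict[str, str]:
    """Decode a Base64 token into a dict of parameters.

    Assumes the decoded payload is a query string such as "a=1&b=2"
    (an assumption; the real payload format is not described).
    """
    padded = token + "=" * (-len(token) % 4)  # restore stripped padding
    raw = base64.urlsafe_b64decode(padded).decode("utf-8")
    return dict(parse_qsl(raw))


def rebuild_url(base_url: str, params: dict[str, str]) -> str:
    """Attach the decoded parameters to a base URL."""
    return f"{base_url}?{urlencode(params)}"


# Synthetic token built from an invented payload, for illustration only.
example_payload = "product_id=12345&hl=en&gl=us"
example_token = (
    base64.urlsafe_b64encode(example_payload.encode("utf-8"))
    .decode("ascii")
    .rstrip("=")
)

params = decode_token(example_token)
url = rebuild_url("https://example.com/shopping/product", params)
```

Anyone reproducing the finding would need to substitute the real field and the real URL pattern observed in the page source. The sketch only shows that decoding Base64 and reassembling a URL takes a few lines.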

Niklas Buschner (25:23.087)
Now two questions. First is, do you think that this finding matters to OpenAI in any way? Like, is it something that they would have rather, I know this is subject to interpretation. If you’re not happy to share interpretation, totally fine. But is this something that they feel like, we are not so happy with this now being public? And then secondly, did you already get a letter from OpenAI lawyers?

Tom – Peec AI (25:50.062)
Amazingly, I didn’t get anything in the Peec AI inbox and honestly, I take it as a compliment to the methodology of the research because I believe using the steps in the Search Engine Land article, if you were to go out today, get 40,000 products on ChatGPT carousels and run it through using the exact shopping query fan-outs, which is something we’ll talk about a bit later, and then map that all back,

Tom – Peec AI (26:18.753)
you would arrive at similar results. I don’t think OpenAI have time to care about my research in the sense that they have a lot of other things to do. However, it does fit in with the history of when they develop very fast and push features. I have seen in other areas this over-reliance on the Google method. So for example,

I haven’t checked for a while if it’s still the case, but certainly if you look at the map feature of things near you, they actually left Google Maps scraping parameters hidden in the source code as well. And it’s more symptomatic of, you know, we can ask like, yeah, but why would they do this? Don’t they have the greatest engineers in the world and raised the biggest fundraising round in the world? Like, do they really need to do this? Well, I would say to that that it took Google

10 or 15 or 20 years to make the Google Shopping Graph what it is today. It’s one of the most difficult feats of engineering, you know, taking entities, live pricing, putting it all together. And it still, of course, has its flaws. For ChatGPT to recreate that in the space of a few months would be very, very challenging, I think.

Niklas Buschner (27:36.005)
Hmm, got it. Do you feel like this is a supporting argument for the people that sometimes make the claim that AEO, GEO, AI search is basically just SEO?

Tom – Peec AI (27:53.654)
I would say that it’s very different in the different AI models. You know, Gemini, Google AI mode potentially has the ability to pull directly in from shopping graphs. So just to give you an example, although I was able to match all of these products, if you actually went to the carousel and then checked, because it pulls also the image that is from the third party scraper,

you can then see basically deficiencies in ChatGPT where if you actually go through and you’re trying to buy a carry-on suitcase for your next business trip and you want it in black, there are cases where, if you click on the ChatGPT link, it might be pink and it might be out of stock. Why? Latency, scraping latency. So often to save cost, any scraping provider will tell you that if you basically…

pay to have the thing scraped overnight, it will be much cheaper when you collect it in the morning, but it might be a day late. So I also tested this across Black Friday and the ChatGPT carousel could not keep up with the live pricing data of Google Shopping. So you’ll see sometimes differences also in the price. So a scraping layer will always be worse than the real first party actual shopping thing. So that’s just basically from a user perspective. However,

what AI can do, and I believe this is the reason that users do like it, is summarize information from a variety of sources very quickly and make it into a buying guide in the style that you would like, for your very specific queries. So it turns out that it’s actually the combination of, you know, your brand and product sentiment, how you’re mentioned in third party sources, mapped in with how you appear on shopping feeds like Google Shopping, and taking that all together as

you know, how your overall e-comm presence would be in AI search, which of course is more complicated than it used to be. So I would say that if anything, the discipline has become more complicated. I speak with SEO veterans of 10 years every day for work and, you know, they are struggling still to understand and it’s quite right and quite easy to understand that, you know, things are not…

Tom – Peec AI (30:06.25)
move beyond the ten blue links and the keywords now and it’s a much more complicated system that is operating, I would say.

Niklas Buschner (30:13.783)
Now, talking about fan-outs, you already mentioned fan-out queries. Let’s look a little bit at how ChatGPT actually pulls in the information. So how do they go from a user prompt to the actual product carousel and everything that I see as a response? Because you also found something very interesting there. Not only that the vast majority of products come from

Google organic shopping, but there’s more to this whole process, right?

Tom – Peec AI (30:44.586)
Absolutely. Yeah, fan-outs has been a much discussed term in the online community and effectively you can just think of it as the type of web search that the AI does to get the information that you want. I think this is a fairly simple explanation. We looked at query fan-outs in a lot of detail at Peec over the last few months and in around November time, the same time I was interested in the shopping topic,

there was actually a doubling of the average length, in words, of the search query. And this was a surprising finding to many people. They thought, why would ChatGPT suddenly make these fan-outs very, very long? The reason is that the longer the fan-out query, typically the more long tail or the more specific the web search results will be. And potentially, though it’s not always true, closer to what the user has actually…

asked and basically ChatGPT learned that if you ask a longer question you get more specific answers. This is part one of the answer. Part two of the answer is once you have more specific results related to the user query it’s actually easier for the AI to differentiate between them and pick the right ones to answer the question. This is on the level of normal search fan-outs. This is one of the trends happening there. Now in the

field that I decoded that I mentioned, it was what’s called Base64 encoded. It just means it’s not human readable. You have to use something to basically convert it back to something you can read. Within this field was the specific query that ChatGPT was using to retrieve the products from Google Shopping, encoded in this field, and it still is today. Both on 5.4 and 5.3 this still applies. Now I find this really interesting because if you

just check Google Shopping with the same prompt that the user used, your accuracy would be very low on whether you can target the same products. But within this field, ChatGPT also encodes the location that it uses to search for the products. If people are interested, it’s the field UULE. And if you decode this, it gives you the location. So New York, New York, or whatever.
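The decoding step described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transcript does not name the Base64 field or show its contents, so the payload below is a made-up URL carrying a `q` and a `uule` parameter, and the `uule` layout used (a fixed `w+CAIQICI` prefix, one length character, then the Base64 of the location name) is the commonly documented form of that parameter, not something confirmed in the episode. The helpers `encode_uule` and `make_sample_field` exist only to build test input.

```python
import base64
from urllib.parse import parse_qs, urlencode, urlparse

UULE_PREFIX = "w+CAIQICI"  # assumed fixed prefix of the uule parameter
LENGTH_KEYS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def b64decode_text(value: str) -> str:
    """Decode Base64 text, tolerating URL-safe characters and missing padding."""
    value = value.strip().replace("-", "+").replace("_", "/")
    value += "=" * (-len(value) % 4)
    return base64.b64decode(value).decode("utf-8")


def decode_uule(uule: str) -> str:
    """Return the location name stored in a uule value (assumed layout)."""
    uule = uule.replace(" ", "+")  # parse_qs turns a literal "+" into a space
    if not uule.startswith(UULE_PREFIX):
        raise ValueError("unexpected uule layout")
    # Skip the prefix and the single length character that follows it.
    return b64decode_text(uule[len(UULE_PREFIX) + 1:])


def decode_shopping_field(field: str) -> dict:
    """Decode a Base64 field that is assumed to contain a search URL."""
    url = b64decode_text(field)
    params = parse_qs(urlparse(url).query)
    uule = params.get("uule", [None])[0]
    return {
        "url": url,
        "query": params.get("q", [""])[0],
        "location": decode_uule(uule) if uule else None,
    }


def encode_uule(location: str) -> str:
    """Hypothetical helper: build a uule value for demo input only."""
    if len(location) >= len(LENGTH_KEYS):
        raise ValueError("location name too long for this sketch")
    body = base64.b64encode(location.encode("utf-8")).decode("ascii")
    return UULE_PREFIX + LENGTH_KEYS[len(location)] + body


def make_sample_field(query: str, location: str) -> str:
    """Hypothetical helper: build a made-up Base64 payload for demo input."""
    url = "https://www.google.com/search?" + urlencode(
        {"q": query, "uule": encode_uule(location)}
    )
    return base64.b64encode(url.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    field = make_sample_field("carry-on suitcase", "New York,New York,United States")
    print(decode_shopping_field(field))
```

Running the decoded query from the decoded location, instead of the user's original prompt, is what the comparison described in the interview relies on.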

Tom – Peec AI (33:00.748)
If you combine the location with the actual query that it uses, the accuracy of what products ChatGPT is retrieving suddenly increases greatly. And this is how you can then do a study like what I showed. So effectively ChatGPT’s source code tells us exactly what it is doing in the background. And so specifically about the shopping fan-outs, what we can say is that they are much shorter and they’re specifically targeting

product categories, whereas in a contextual normal search fan-out, it might ask something like, what do I need to consider when buying running shoes? Some long query where it’s targeting articles where it can take chunks of text. And so the AI itself can understand, oh, you need to understand, are the shoes waterproof? Are you training for a marathon? This is what it uses to basically pad that text of buying guide content. But specifically when it’s pulling the products, the query is much shorter, targeted, and actually

attacking a specific product category.
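The difference between short shopping fan-outs and longer contextual fan-outs can be illustrated with a simple word count. This is a rough sketch, not the method used in the study: the six-word cutoff is an arbitrary assumption chosen for the example, and the two sample queries are the ones quoted in this conversation.

```python
from statistics import mean


def word_count(query: str) -> int:
    """Number of whitespace-separated words in a fan-out query."""
    return len(query.split())


def average_word_count(queries: list[str]) -> float:
    """Average query length in words across a set of fan-outs."""
    return mean(word_count(q) for q in queries)


def split_fanouts(queries: list[str], max_shopping_words: int = 6) -> dict:
    """Bucket queries by length; the threshold is an assumed, illustrative value."""
    buckets = {"shopping_like": [], "contextual_like": []}
    for query in queries:
        key = "shopping_like" if word_count(query) <= max_shopping_words else "contextual_like"
        buckets[key].append(query)
    return buckets


if __name__ == "__main__":
    fanouts = [
        "best running shoes under $500",
        "what do I need to consider when buying running shoes",
    ]
    print(average_word_count(fanouts))
    print(split_fanouts(fanouts))
```

Tracking the average over time is one way to spot a shift like the doubling in query length mentioned above.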

Niklas Buschner (34:03.058)
Now, something I’m wondering is that ChatGPT also introduced their own product feed, I think already quite some time ago. So why do you think, or do you have any information, or did you in any way see that this product feed that ChatGPT has launched themselves is also part of how products are recommended? Is this something that is only relevant

for ChatGPT ads? So what’s going on with the whole ChatGPT product feed?

Tom – Peec AI (34:34.314)
Yeah, so the data for the study for Search Engine Land was pulled in around mid to end of January. And in this case, back then, a couple of months ago now, I did not see ChatGPT’s own feed be a relevant source across the 43,000 products, because if I had, I wouldn’t have been able to match 83% of them or whatever the exact figure was. What’s happening there is if you…

think of retailers like Etsy, there’s an increased move of ChatGPT to basically onboard big brands’ feeds such as Dick’s Sporting Goods, Best Buy, this type of thing, where instead of doing the Google scraping search methodology, it would then in the background try and retrieve directly from other providers who have their own effectively data set or their own shopping graph, if you like. The added complexity of this becomes

from a simple user query, how would ChatGPT decide which of those networks to pull the data from? So as of now, as ChatGPT is still onboarding these providers and doing its testing, you see some volatility in the data while it’s doing this testing and while it’s onboarding. However, if you look at the, it was announced recently that ChatGPT decided to move the instant checkout feed outside of this.

This was one case where they pushed very quickly to try and basically do this and it ended up not really working very well. Now with onboarding the different clients and their own product feed, there’s quite a technical challenge to actually do this. Then also if they do this, these clients need to see some kind of benefit that they actually spent infrastructure resources on actually somehow integrating. And this is also very, very tricky because

In a query, it kind of introduces a type of bias where ChatGPT says, well, we have these 10 providers. You know, you can buy your running shoes from Google Shopping or Dick’s Sporting Goods. However, if you think about Google Shopping, is it a democracy? Not really, but it’s more of a democracy than going straight to Dick’s Sporting Goods because, you know, let’s say the most optimized feed, whatever that means, are the products that appear on top.

Tom – Peec AI (36:58.078)
and you do at least get a variety of providers or retailers who you could buy from. If you then go down the road of saying, well, ChatGPT has something like preferred partners or preferred places where it searches, then it solves the live data issue and the live pricing issue because you’re essentially then hooking straight into Dick’s Sporting Goods’ database and you can then confirm that it’s in stock and you have the right price, which already

corrects the big deficit. However, in terms of the diversity of the search results, it would be very, very interesting to see what happens. So to add a little bit more to this, with ChatGPT 5.3 and 5.4 that were launched very recently, particularly with ChatGPT 5.4, what I did see already in the data now, just to be clear, because people on LinkedIn are very quick to attack this angle, ChatGPT

5.4 has not been rolled out for logged out users. This data can only be asked if you do it in your own personal logged in sessions, which is what I did in my own case where I analyze effectively my own streaming data. So there’s no data privacy issues there at all. I effectively recreated asking for different product carousels and I just wanted to see what happened. Now what I saw and what I’ve posted about publicly on LinkedIn already is that now with ChatGPT 5.4

ChatGPT actually has an inherent brand bias. It started to… Whereas in my study, the shopping query fan-outs did not mention a brand typically, or only very, very rarely. So it would say, best running shoes under $500 or best smartphone under $400 or something like this. So if you notice there, it doesn’t name some specific brand and therefore doesn’t really influence the Google Shopping results that are returned.

In 5.4, this is actually different. In 5.4, there is a brand bias and you can see it in the reasoning steps that it does. So when it starts to think, it will often already be thinking of specific brands. And this affects the whole kind of like dominoes falling over it. It affects the whole chain of, if it’s already thinking that these are the best providers in this field, this affects what products it retrieves, this affects what context it retrieves. And with this, you can see kind of how sensitive

Tom – Peec AI (39:23.936)
the model is. And I do wonder, is this sort of super brand bias that’s coming into 5.4 related to ChatGPT kind of making a move to onboarding clients and their own product feeds. So yeah, it’s just food for thought. I don’t know the answer to it already.

Niklas Buschner (39:39.964)
I think there’s a lot to process and a lot of food for thought. A very simple question. So thanks so much for explaining it in so much detail. A simple question around the whole agentic checkout or like integrated checkout. Also, do you think that OpenAI just like identified that people like to use ChatGPT for product discovery? This is why they also have this carousel feature, et cetera, but

Tom – Peec AI (39:42.028)
For sure.

Niklas Buschner (40:08.903)
that people are still hesitant to basically check out from there, or do you feel like maybe it’s just too much effort for them to get it right technically so they had to deprioritize it? So just a little bit of your thoughts, because you’ve really been deep into the whole topic.

Tom – Peec AI (40:25.449)
Yeah, it’s a great question. So it’s a bit like the question for you as a user, if you know that you shop regularly on, let’s say, Amazon or Best Buy, if we take the US data. So if I go to ChatGPT and I drop in the Best Buy search plugin or however it’s integrated now, because it does actually change over time. And I know that with this, I get the live product data, the live review data, and effectively more integration than if I asked ChatGPT to do it natively.

For me then it’s kind of like a no-brainer that me as a user, if I’m aware of the benefit, I would do this. And so for me, effectively the retailers will always have better data than effectively someone scraping the results. For me it’s quite, it definitely kind of has to happen that they move it more towards the retailers because they were not able already in the last two.

one or two years to kind of come up with their own in-house solution that says, we’ll aggregate all these diverse feeds of data. No, they tried this, they kind of failed. They scraped Google Shopping and it doesn’t really produce necessarily the best user experience if pricing is off or it’s not technically live data, which is kind of strange if you’re missing out on offers, coupon codes, discounts, things like this. And so I do think it makes complete sense from a user perspective that

the goal of the team must be to basically give better shopping results to the users. So yeah, this is part one of the answer, part two. I would say that, yes, there is data that shows that while a lot of users use AI for the discovery and research phase, there is still a little bit of lack of trustworthiness and that maybe users as a second step might go to the brand website and check, hey, or they might read reviews on various social sites and they might…

themselves put all of the information together before making the purchase decision. They might watch a review video on YouTube and so on. So I think that having it all in-house in one AI window, I don’t think there’s enough users trusting this end-to-end workflow yet, which is why we saw Agents Ignite really basically take off.

Niklas Buschner (42:40.135)
Before we go into the practical implications for e-commerce managers, because you also put together a great piece on what people can actually take from this research in order to optimize their presence. I just like to ask you if there’s anything like surprising for you or something where you feel like, Hey, this is something that I don’t want to be missed if we talk about the whole research and analysis part before moving over.

Tom – Peec AI (43:10.175)
Yeah, sure. So, yeah, I just want to be so loud and clear that the study does prove that ChatGPT scrapes Google Shopping. And for me, this is like the scene in The Wizard of Oz where someone peers behind the curtain and you see that it’s not actually magic anymore. It is just scraping. And too often in the case of OpenAI, the answer is just scraping and often connected with Google. And, you know, it’s

I can’t even call it laziness from the side of the engineers. It’s an incredibly hard task to just build a shopping graph. However, for a company with such funding, I think a lot of people, from personal LinkedIn messages and congratulations I’ve received on the study, were surprised just how blatant and obvious this scraping was. This is one part. And the other part is if you are using AI as a user to do your shopping, I would say, you know, keep a little bit of that

untrusting side of you and do go out and verify the information, the links, do your own product research, use AI as a tool but also actually make sure that you are checking the information is accurate yourself. Those are kind of the main points.

Niklas Buschner (44:24.09)
I love the Wizard of Oz comparison. I think that that is really a great visual to have in mind. So now talking about the practical implications, because obviously now everybody is like, okay, I thought it was magic. Now I know it’s not. So then let’s get to work. How can I work with this? So what would you recommend to people that work in ecom, SEO, AI search?

Tom – Peec AI (44:26.265)
Ha ha ha.

Niklas Buschner (44:52.392)
They want to get their products into the product carousels of ChatGPT. Where should they start?

Tom – Peec AI (45:01.994)
So the first part to check, I would say, is what I would call feed optimization. So that simply means, for the key product terms or the key search terms, are you appearing in the top few results of Google Shopping? If you want to take it to another level, you can then check how high up the results you’re appearing. Another level is to check if you’re only interested about your product showing up

and your products show up across multiple merchants and multiple retailers, that’s another level to check as well. So it really depends on your use case, right? So if your use case is we are the brand and we are the retailer, that’s quite specific. You need to kind of go out and check firstly on Google shopping feed for the locations that you’re interested in, for the product terms and search terms you’re interested in, do you show up? And that is basically, you know,

feed hygiene: are your prices up to date, is your Merchant Center account up to date. Basically all the basics of Google Shopping done very well. This is sort of like level one. The other part is using platforms like Peec AI or similar, and also checking, if you are appearing already in Google Shopping, does this translate as I predicted it would to carousel positions?

Are you regularly appearing in the top five positions in Google Shopping? If so, you could quite rightly expect to appear in the top few positions of the carousel, but you actually need to go out and check and confirm that that’s the case. If it is the case, probably you’re doing quite a few things right already, and there’s maybe less for you to worry about, but you can always optimize further. Actually collecting that data is something that you need to take the time to do to basically check that.

In the study we saw a position correlation, so if you do rank higher in Google Shopping you do rank higher in the ChatGPT carousel as well, so it’s not just the, let’s say, top 20 results, but ideally top 5 would be great. Your chance of making it into the carousel for the related query then would be pretty high. Consistency, you know, are you regularly showing up in the top few results of Google Shopping or is it changing over time? Is it a very dynamic market that you’re in?
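A check of this kind can be sketched as follows. This is an illustrative approach, not the study's actual methodology: it matches carousel products to Google Shopping results by exact title after simple normalization, then computes a Spearman rank correlation on the matched positions. The product titles are invented placeholders, and the simple Spearman formula used here assumes there are no tied positions.

```python
def normalize_title(title: str) -> str:
    """Lowercase and collapse whitespace so trivially different titles match."""
    return " ".join(title.lower().split())


def match_positions(carousel: list[str], shopping: list[str]) -> list[tuple[int, int]]:
    """Return (carousel_position, shopping_position) pairs for exact title matches."""
    shopping_index: dict[str, int] = {}
    for position, title in enumerate(shopping, start=1):
        shopping_index.setdefault(normalize_title(title), position)
    pairs = []
    for position, title in enumerate(carousel, start=1):
        shopping_position = shopping_index.get(normalize_title(title))
        if shopping_position is not None:
            pairs.append((position, shopping_position))
    return pairs


def match_rate(carousel: list[str], shopping: list[str]) -> float:
    """Share of carousel products found in the shopping results."""
    if not carousel:
        return 0.0
    return len(match_positions(carousel, shopping)) / len(carousel)


def spearman(pairs: list[tuple[int, int]]) -> float:
    """Spearman rank correlation for matched pairs (assumes no ties)."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two matched products")

    def ranks(values: list[int]) -> list[int]:
        order = sorted(range(n), key=lambda i: values[i])
        result = [0] * n
        for rank, index in enumerate(order, start=1):
            result[index] = rank
        return result

    carousel_ranks = ranks([pair[0] for pair in pairs])
    shopping_ranks = ranks([pair[1] for pair in pairs])
    squared_diff = sum((a - b) ** 2 for a, b in zip(carousel_ranks, shopping_ranks))
    return 1 - 6 * squared_diff / (n * (n * n - 1))


if __name__ == "__main__":
    shopping = ["Trail Shoe A", "Road Shoe B", "Race Shoe C", "Walking Shoe D"]
    carousel = ["road shoe b", "Race  Shoe C", "Unknown Shoe Z"]
    print(match_rate(carousel, shopping))
    print(spearman(match_positions(carousel, shopping)))
```

Repeating the check on several dates also gives a rough view of the consistency question raised above.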

Tom – Peec AI (47:22.154)
So that’s kind of where I would start with that. And the second part is it’s not just the organic shopping results that are influencing the carousel, but also all supporting information. You could call it, essentially, all of the sources that influence those context queries. So buying guide style content.

Is your product and brand mentioned there regularly? And if so, is it mentioned more than your competitors? And if so, are you mentioned in a positive way? And with this information, you effectively then have the roadmap of what you need to optimize over the next weeks and months basically.
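The first two questions, how often you are mentioned and whether that is more than competitors, can be approximated with a simple mention count across the source texts. This is a minimal sketch with invented brand names and sample texts; judging whether a mention is positive would need a separate sentiment step that is not shown here.

```python
import re


def mention_share(documents: list[str], brands: list[str]) -> dict[str, float]:
    """Fraction of documents that mention each brand at least once."""
    total = len(documents)
    shares = {}
    for brand in brands:
        # Whole-word, case-insensitive match on the brand name.
        pattern = re.compile(rf"\b{re.escape(brand)}\b", flags=re.IGNORECASE)
        hits = sum(1 for text in documents if pattern.search(text))
        shares[brand] = hits / total if total else 0.0
    return shares


if __name__ == "__main__":
    # Placeholder texts standing in for buying guides retrieved by context queries.
    guides = [
        "Acme and Globex both make great ultrawide monitors.",
        "We tested the ACME ultrawide for a month.",
        "Dual monitors versus one ultrawide: what to consider.",
    ]
    print(mention_share(guides, ["Acme", "Globex"]))
```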

Niklas Buschner (48:02.124)
So if I, for example, so if I’m a shop that sells products that are not necessarily from myself, so I’m not the manufacturer, but maybe just, just in quotes, a retailer, and I’m selling, for example, ultra wide or like high quality monitors. So basically I should make sure that my products and that my like Google Merchant Center feed, that it’s well optimized and that I’m also

like even showing up in Google’s organic shopping results, because I contributed to the piece you published. I have seen quite some Google Merchant Centers where there is this big red error if you check for your free listings approval status and it just tells you not approved due to the whole comparison shopping service, et cetera thing where people think they can save a lot of money on ads, which

Malte probably has also a lot of opinions on, probably you too. But this is piece number one. And then piece number two is I could create content on my own site, like a comparison guide: should you go with dual monitors or with an ultrawide monitor? And I should check how am I mentioned or how are these products mentioned across the web, maybe on Reddit, maybe YouTube, et cetera. Is that a correct summarization?

Tom – Peec AI (49:25.524)
So you mentioned on your own site, I would recommend to first look at the niche that you’re in. So if I’m in, let’s say, a high authority, low content volume niche, which means something like cybersecurity, it takes a lot of knowledge to write about how to install a mesh VPN securely on your enterprise software or a lot of the articles are quite long tail and you actually need a cybersecurity expert to even check that it’s correct before publishing.

Niklas Buschner (49:31.453)
Mm-hmm.

Tom – Peec AI (49:55.115)
In those types of niches we see brands both being the source and the mentioned brand, which is sort of like they control the narrative from end to end. You know, an example for people to check, I’m not affiliated with them in any way but I use it as an example a lot, is the cybersecurity brand SentinelOne. They produce a lot of that type of content and you see them both mentioned as a top cybersecurity brand and a top source. However, if you’re in the industry of fragrances and beauty

you know, you’re selling perfume or clothes, this is probably impossible to achieve. So I would say that 90% of the content in a sort of highly competitive market is going to come from third-party sources. Now, that means that you want to find a way to influence how your brand and products appear on those sources. And that will give you the product and brand mention and sentiment uptick that then feeds into the AI creating its buyer guide content and so on.

Now, there’s another level to this that is often missed and overlooked. So you need to understand the sort of market landscape of those third party sources that are regularly being mentioned. So firstly, you must know who are the top sources. Are they more high quality editorial pieces? Are they more maybe let’s say medium quality listicles to low quality listicles? Like the fact is whichever sources are ranking are influencing your market. So it’s not really our place to necessarily judge them. It’s our place to then basically

see if we can get mentioned or our clients mentioned in those, ideally in an organic way if it’s not pay-to-play, however in the industry of fragrance and beauty you’ll find that the percentage of affiliate content is incredibly high which actually gives all of those brands the lever to pull where if they’re not mentioned they can actually then have a paid strategy to become mentioned and they will then see a direct uptick in how often they are mentioned in that retrieval content which combines together

in the magic recipe to then hopefully be featured more in the product selection.

Niklas Buschner (51:56.086)
Now maybe a non-obvious stupid question. Can I also get into the ChatGPT product carousel by just running Google Ads? So you always said it’s about the Google organic shopping results, so the non-paid. Is there a… Yeah, let’s… Because I was wondering why is ChatGPT not looking at the ads?

Tom – Peec AI (52:11.304)
I tell you why.

Tom – Peec AI (52:18.674)
Yeah, so this is because ads have, well, there’s a few reasons really, but the true reason is it scrapes the organic results because it gives this nice, fairly low bias selection of 40 products to choose from. In the study, I went to 40 organic products because what I did was I mirrored the exact settings that ChatGPT was using.

with its scraping provider to get the most accurate results possible. So that was the top 40 organic results. If you go into Google Shopping with a US VPN, that will be the part where you see either browse all products or all products, completely skipping the ads. With ads, because they dynamically change so, so often, and the makeup of these ads changes so, so often, if ChatGPT just scrapes the ad results,

it would probably have a worse selection set for the products it wants to feed back into the carousel. The organic results are more stable over time than, of course, dynamic bidding and ads and things like this. And I believe that ChatGPT made the decision that this selection of organic products, this 40, is a better approximation of what the user may select from than the top few ads in Google Shopping. The other part of that is

geo-specific stuff. I mean geo not as in the GEO that we do, but geo as in location-wise. Google Shopping is different in different countries. However, what is fairly non-controversial is organic products can be shown in most locations, whereas ads, maybe not. It’s not clear that ads can be shown in every country. So they want to choose something that they can scrape for the majority of locations, which is almost always organic.

Niklas Buschner (54:07.561)
So no way to buy myself into the ChatGPT carousel by running Google Shopping ads.

Tom – Peec AI (54:14.823)
Technically no, because as far as I’ve seen from now, it’s not scraping from paid feeds, if you like. It would be a better way to be in the top five to 10 on the Google shopping feed and have a lot of supporting content that says if you’re really trying to rank a specific product for a category, that says that your product best fits that solution, if you like.

Niklas Buschner (54:42.055)
Anything else besides or beyond you mentioned that people working in ecom either for a brand, for retailer, selling their own products, selling others products that they should think about, care about, maybe take from the study if they want to, if they generally want to win, if you want to call it like this, the AI search presence.

Tom – Peec AI (55:10.451)
Yeah. So one thing is how surprisingly easy it would be to sit AI agents on top of well-organized data. So in the era of Claude Code, nano-claw and running agents relatively harmlessly in Docker containers and so on every day, it’s becoming a little bit easier for people to do that. So it’s a little bit of hopefully a word of inspiration for retailers and brands that I truly believe that

their added value is the data moats that they create themselves. And I’ve been using this example, so you have to bear with me why I choose an abstract example, but it has a reason. If you think of people shopping for products that are kind of hard to find and where you need a certain human in the loop, almost like approval, to be confident that you’re getting it, imagine you’re looking for luxury Italian furniture. It’s a high ticket item.

But would you trust ChatGPT to deliver your results? Because if it scrapes Google Shopping or scrapes the general internet, which is of course by definition not specific, the results aren’t going to be good. However, if there’s a brand out there who has taken the time and the energy to collect this information and systemize it and effectively make their own data moats, if you imagine an agent sitting on top of there and helping you make personal decisions, likely it’s going to be much, much better. So…

I really believe that I would love to see effectively rather than ChatGPT being the default starting place, brands begin to win back a bit of their own visibility by creating amazing, agentic experiences on top of the data and their own expertise. And I think that that is really one of the things I would love to see this year, because if you have well-organized live price data that you’re either already integrated with a Shopify store,

you have 90% of the recipe that you effectively need to have your own agent and then it becomes about things like brand awareness, you know, why should they go to you rather than someone else? But in the era of X, LinkedIn, social media, it’s actually easier than probably before to get a lot of eyeballs in front of amazing UX experiences that you create. Yeah, just to reiterate that the data moat that you create yourself is actually often the value and just to give it away,

Tom – Peec AI (57:31.795)
for the sake of an agentic checkout within an AI app may not necessarily be the best decision for your category.

Niklas Buschner (57:39.233)
Tom, that’s a very great summarization and I really appreciate you sharing such details about the study. Now the obvious, maybe not final question, but like an obvious question I have to ask is what’s on the horizon? Like what new research are you working on where you can maybe give us a little bit of a teaser, a little bit of a spoiler so that people, I don’t know if Peec AI already has like a newsletter or something, but I think you already have a newsletter.

then people definitely subscribe to the newsletter or want to follow you to be the first to read and see what you guys are working on.

Tom – Peec AI (58:16.967)
If they follow any of the research teams, so that’s me, Malte and Tomek Rucki on LinkedIn, you will get the direct feed to all of the research that we publish, because for now at least it comes through one of these three channels, typically on LinkedIn, so people can also message me and ask me questions about the research, which they love to do, and I love to also answer these questions. But yeah, a sneak peek of what’s next. I can give you a sneak peek of what I’m working on right now, and I find it really…

of course really interesting. We talked about Google dynamically changing content and the example that we had in this podcast was it can change the product title or the product description ever so slightly by dynamically rewriting it. The reason it does that is it believes that it offers the users a closer answer to what the question is that they’re asking. And so it does the same with meta descriptions. So that little description that you add about your blog article that Google uses to effectively rank it,

it rewrites these very, very regularly. Why is this interesting? Well, because we know that ChatGPT scrapes these web results, it means by definition it ingests the rewritten description, not the actual description that’s in the source code. So what Google chooses to rewrite your meta description to is what actually ChatGPT uses to effectively rerank its content. So I decided to do a study on this.

I won’t reveal too much, but the idea of the study is firstly, this rewritten meta description idea, to see how much of ChatGPT’s answer relies purely on this rewritten meta description. My belief is quite a high percent. The complexity to study this is pretty tricky, I will tell you. So that is effectively what we’re doing because it’s really valuable information because if we can see that there’s a high percentage really

based on quite a small snippet, we see a very high reliance on how ChatGPT actually begins to think about things. So this is going to be one of the next big research pieces that we’re working on.

Niklas Buschner (01:00:24.438)
Awesome. So people, if you are listening to this and you want to hear more about the research, definitely go follow Tom, Tomek and Malte from the Peec AI team and watch out for what they’re doing. And I will also make sure that Tom makes the time to come on the podcast again when they publish the next piece. Yeah. So we get basically the behind the scenes, also the reasoning and a little bit more of the nitty gritty details, which obviously

Tom – Peec AI (01:00:42.184)
Thanks, Niklas.

Niklas Buschner (01:00:53.11)
people can also read, but you know how people are. They sometimes just want it to be read to them. So Tom, thanks so much for taking the time for sharing this in so much detail. I am very much looking forward to every new research piece that’s there to come because so far, Peec AI or the Peec AI research team did not disappoint, to say the least.

Tom – Peec AI (01:00:58.226)
That’s true. That’s true.

Tom – Peec AI (01:01:18.984)
Thank you, Niklas. We’re really trying our best to produce super high quality research and also want to hopefully inspire other research teams to push it to another level, make it potentially peer reviewed or at least have the option to look into the methodologies in a lot of detail. Something that people like Ethan Smith are also pushing for as well. So we really want to go in this direction as much as possible, add some transparency to how we do research in the industry as a whole. So yeah, thanks, Niklas. It was a really super fun interview.

Niklas Buschner (01:01:48.983)
Thanks so much, Tom. Have a great day and see you soon. Bye bye.

Tom – Peec AI (01:01:52.274)
Thanks, you too. Bye for now.
