
Beyond the Spreadsheet: The Science of Turning Clickstream Data into Customers

Nikola 23.01.2026

In the modern digital landscape, we have solved the problem of data collection. If you run an e-commerce platform, you have no shortage of information. You have session IDs, timestamps, cart abandonment rates, and conversion funnels.

But we haven’t solved the problem of understanding.

This has always been the catch.

Traditional analytics excel at answering “what” and “when.” They can tell you that Cluster 4 (a segment, a cohort) consists of 32,000 users who visit frequently but rarely buy.

But they cannot tell you why. Is Cluster 4 made up of “Window Shoppers” enjoying the visual experience? Are they “Price-Sensitive Waiters” holding out for a sale? Or are they “Frustrated Users” encountering a UX hurdle?

At SYMAR (previously OpinioAI), we believe that the next generation of market research isn’t about gathering more data; it’s about extracting human narratives from the data you already have. It’s about modeling humans through data that already exists.

To validate this belief, I teamed up with academic researchers on a rigorous study.

We recently published our findings in the peer-reviewed journal Telematics and Informatics in a paper titled “Deepening digital user understanding through large language model analysis of clickstream-based segments.” 1

Here is a look at what we did, what we found, and why it changes the way businesses should look at their user base.

The Study: Analyzing 250,000 Digital Journeys

We aimed to bridge the gap between hard statistics (like K-means clustering) and qualitative strategy. To do this, we utilized a massive dataset from a real-world online store specializing in fashion and lingerie.

The scope was significant:

  • 255,334 unique customers
  • 631,976 unique actions (clicks, views, cart additions)
  • 3 months of continuous behavioral data

In a traditional workflow, an analyst would group these users into segments and then spend weeks manually reviewing the data to guess the characteristics of each group. We wanted to see if AI could automate, and deepen, this process.

We fed aggregated behavioral data into the model: not personally identifiable information, but statistical summaries of how different groups acted. We then tasked the AI with three sophisticated challenges: identifying new metrics, describing the segments in human terms, and recommending strategic actions.
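To make the workflow concrete, here is a minimal sketch of the segmentation step in Python. The column names, feature choices, and number of clusters are illustrative assumptions rather than the paper's exact setup; the point is simply that raw events are aggregated per user, clustered with K-means, and summarized per segment before anything reaches the language model.

```python
# Minimal sketch: cluster users by aggregated behavioral features, then
# summarize each cluster. Column names and k=5 are illustrative assumptions.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

events = pd.read_csv("clickstream_events.csv")  # hypothetical event export

# Aggregate raw events into one row per user.
features = events.groupby("user_id").agg(
    sessions=("session_id", "nunique"),
    page_views=("event_type", lambda s: (s == "view").sum()),
    cart_adds=("event_type", lambda s: (s == "add_to_cart").sum()),
    purchases=("event_type", lambda s: (s == "purchase").sum()),
    revenue=("order_value", "sum"),
)

# Standardize and cluster with K-means, as referenced in the study.
X = StandardScaler().fit_transform(features)
features["cluster"] = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(X)

# Statistical summaries per cluster -- these aggregates, not raw PII,
# are what get passed to the language model for interpretation.
segment_summaries = features.groupby("cluster").agg(["mean", "std", "count"])
print(segment_summaries)
```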

Finding 1: AI Can “Invent” Meaningful Metrics

One of the most limiting factors in standard analytics is that we often rely on the same old metrics: Recency, Frequency, and Monetary value (RFM).

In our research, we asked the AI to propose new ways to measure these customer segments based on their behavior. The results were striking. The model didn’t just calculate averages; it identified behavioral nuances. It proposed sophisticated metrics such as:

  • Revenue Variability: Distinguishing between consistent spenders and erratic “whale” buyers.
  • Engagement Duration: Measuring the quality of attention, not just the click count.
  • Checkout Completion Rate: Isolating the friction points in the final mile of the purchase.

The study showed that the AI could look at a segment of users and identify that while their spending was low, their engagement was high, effectively reclassifying them from “low value” to “high potential.”
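To ground what such metrics look like in practice, here is an illustrative sketch of how segment-level measures of this kind could be computed from an event log. The formulas are assumptions for demonstration (the paper's exact definitions may differ), and the code reuses the hypothetical `features` and `events` frames from the earlier sketch.

```python
# Illustrative segment-level metrics; formulas are assumptions, not the
# paper's exact definitions. Assumes the `features` and `events` frames
# from the previous sketch, plus a `timestamp` column on the event log.
import numpy as np
import pandas as pd

def segment_metrics(features: pd.DataFrame, events: pd.DataFrame) -> pd.DataFrame:
    rows = {}
    for cluster_id, group in features.groupby("cluster"):
        seg_events = events[events["user_id"].isin(group.index)]

        # Revenue variability: spread of spend relative to the mean
        # (coefficient of variation) separates steady spenders from "whales".
        rev = group["revenue"]
        revenue_variability = rev.std() / rev.mean() if rev.mean() > 0 else np.nan

        # Engagement duration: average session length in minutes.
        times = pd.to_datetime(seg_events["timestamp"])
        span = (times.groupby(seg_events["session_id"]).max()
                - times.groupby(seg_events["session_id"]).min())
        engagement_minutes = span.dt.total_seconds().mean() / 60

        # Checkout completion rate: purchases per checkout start.
        starts = (seg_events["event_type"] == "begin_checkout").sum()
        completions = (seg_events["event_type"] == "purchase").sum()
        checkout_completion = completions / starts if starts else np.nan

        rows[cluster_id] = {
            "revenue_variability": revenue_variability,
            "engagement_minutes": engagement_minutes,
            "checkout_completion": checkout_completion,
        }
    return pd.DataFrame(rows).T

print(segment_metrics(features, events))
```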

Finding 2: The “Cluster 0” vs. “Cluster 2” Distinction

The power of this approach became clear when we compared the segments.

Cluster 0 was our largest group (over 118,000 users). Traditional tools might just label them “Low Converters.” The AI, analyzing their browsing patterns and lack of cart activity, accurately described them as “Casual Browsers”: users who are there for the content, not the commerce.

Contrast this with Cluster 2, a smaller group of 17,000 users. They had high page views and interaction rates. The AI identified them as “High-Intent Researchers.”

The recommendations the AI generated for these groups were radically different, and strategically sound.

  • For the Casual Browsers, the AI suggested “low-friction” content marketing to keep them in the ecosystem without aggressive sales pressure.
  • For the High-Intent Researchers, it recommended detailed product comparisons and reassurance on return policies to close the deal.

This proved that Synthetic Market Research can move beyond generic “blast” marketing to hyper-personalized strategies at scale.

Finding 3: Stability and the “Human” Element

Perhaps the most critical part of our research for building SYMAR was testing the reliability of these insights.

Ultimately, can we trust what the AI says?

We used a statistical measure called the Cosine Similarity Index to check if the AI changed its mind when asked the same question multiple times.

We found that the AI is incredibly consistent when scoring users on a structured scale (e.g., rating “Buying Potential” from 1 to 5). However, the textual justifications (the paragraphs explaining the “why”) showed more variability.
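As an illustration of the kind of check involved, the sketch below compares several answers to the same prompt using pairwise cosine similarity. The responses are invented for the example, and TF-IDF vectors stand in for whatever text representation the study actually used.

```python
# Sketch of the consistency check: generate the same segment description
# several times, vectorize the texts, and compare them pairwise with cosine
# similarity. Responses below are invented; TF-IDF is a simple stand-in
# for the text representation used in the study.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

responses = [
    "Cluster 2 users browse extensively and compare products before buying.",
    "This segment researches products in depth and shows strong purchase intent.",
    "High page views and frequent comparisons suggest deliberate, high-intent shoppers.",
]  # hypothetical repeated answers to the same prompt

vectors = TfidfVectorizer().fit_transform(responses)
pairwise = cosine_similarity(vectors)

# Mean of the off-diagonal entries: values near 1.0 mean the model tells
# essentially the same story each time it is asked.
n = len(responses)
consistency = (pairwise.sum() - n) / (n * (n - 1))
print(f"Mean pairwise similarity: {consistency:.2f}")
```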

This is a feature, not a bug, but it requires management.

It highlights why businesses cannot simply paste data into a generic chatbot and hope for the best. To get reliable, actionable business intelligence, you need a system that structures the inquiry and anchors the AI to the data.
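As a minimal sketch of what “structuring the inquiry” can look like in practice, the example below embeds the segment's statistics directly in the prompt and constrains the answer to a fixed schema. The statistics, prompt wording, and `call_llm` stub are all hypothetical illustrations, not SYMAR's production implementation.

```python
# Sketch: anchor the prompt to the segment's actual statistics and constrain
# the answer to a fixed schema. All values and wording are illustrative.
import json

def call_llm(prompt: str) -> str:
    """Stub for whichever model API you use; swap in a real call here."""
    return json.dumps({
        "persona_label": "High-Intent Researchers",
        "buying_potential": 4,
        "justification": "High page views and a 34% checkout completion rate "
                         "suggest deliberate comparison shopping before purchase.",
    })

segment_stats = {  # illustrative aggregates for one cluster
    "cluster": 2,
    "users": 17_000,
    "avg_page_views": 38.4,
    "cart_add_rate": 0.21,
    "checkout_completion": 0.34,
}

prompt = f"""
You are analyzing one customer segment from an e-commerce clickstream study.
Use ONLY the statistics below; do not invent numbers.

Segment statistics:
{json.dumps(segment_stats, indent=2)}

Return JSON with exactly these keys:
- "persona_label": a short human-readable name for the segment
- "buying_potential": an integer from 1 to 5
- "justification": two sentences grounded in the statistics above
"""

result = json.loads(call_llm(prompt))  # structured, scoreable output
print(result["persona_label"], result["buying_potential"])
```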

From Academic Theory to Business Reality

We conducted this research to prove that Synthetic Market Research rests on rigorous science.

The ability to process hundreds of thousands of user journeys and distill them into actionable, human-readable personas is no longer a theoretical concept. It is a reality that we have built into the core of SYMAR.

Our platform takes the principles validated in this paper (advanced segmentation and persona synthesis) and makes them accessible to businesses.

You don’t need to run t-SNE visualizations or calculate standard deviations. You simply connect your data, and SYMAR generates the “Synthetic Personas” that represent your actual customers.

You can then talk to them. You can ask the “High-Intent Researchers” what is stopping them from buying. You can ask the “Casual Browsers” what kind of content they enjoy.

The data has always been there. Now, we finally have the technology to listen to the story it is trying to tell.

To read more about our research, check the citation below.

A Personal Note

I am incredibly grateful to have collaborated with professors Adam Wasilewski and Yash Chawla on this research. Working alongside them was an immense learning opportunity, bridging the gap between rigorous academic analysis and practical business innovation.

This paper represents just the beginning of our journey. I look forward to further exploring the intersection of AI, behavioral data, and human understanding, and continuing to translate these scientific discoveries into value for SYMAR users.


  1. Wasilewski, A., Chawla, Y., & Kozuljevic, N. (2025). Deepening digital user understanding through large language model analysis of clickstream-based segments. Telematics and Informatics, 104, 102359. https://doi.org/10.1016/j.tele.2025.102359