Predicting Who Starts the Next Buying Spree!

Check out the full thesis here:

Time-Aware Centralities and Embeddings of Nodes for Influence Prediction in Evolving Socio-Financial Networks

https://drive.google.com/file/d/16-wYIRD4KyFTpqw_6ASdLtwb-dBN-t-m/view?usp=sharing

Predicting Who Starts the Next Buying Spree: Influence in Socio-Financial Networks

If a handful of traders on a social investing app start buying the same stock, how far does that wave travel through the network? And more importantly: can we predict who is likely to trigger the next wave?

That is the core idea behind my thesis: “Time-Aware Centralities and Embeddings of Nodes for Influence Prediction in Evolving Socio-Financial Networks.”

In plain terms: I built a graph-based, time-aware ML pipeline to forecast how influential individual traders will be in triggering cascades of trades in a real social trading platform.

The setting: a real socio-financial network

The data comes from Invstr’s “Fantasy Finance” app, a gamified social network for trading and financial literacy with over 1.5M downloads globally. I work on an anonymized subset focused on trade posts: when a user executes a trade and shares it with followers, that creates a time-stamped interaction.

I model each trade post as a directed temporal edge:

(TradedBy, ReceivedBy, ReceivedOn)

Over a 3-month window at the peak of the 2021 bull market (Oct–Dec 2021), that yields:

39 million shared trade posts

72k active traders

200k post-receiving users

I slice this evolving network into 6-hour snapshots and study how information and behavior propagate through it over time.
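To make the snapshotting concrete, here is a minimal sketch of how time-stamped edges could be bucketed into 6-hour windows. The function names, user IDs, and timestamps are hypothetical and only illustrate the idea; this is not the actual pipeline code.

```python
from collections import defaultdict
from datetime import datetime, timezone

SNAPSHOT_SECONDS = 6 * 60 * 60  # 6-hour snapshots, as described in the text


def snapshot_index(received_on: datetime, window_start: datetime) -> int:
    """Return the 0-based index of the 6-hour snapshot containing a timestamp."""
    elapsed = (received_on - window_start).total_seconds()
    return int(elapsed // SNAPSHOT_SECONDS)


def slice_into_snapshots(edges, window_start):
    """Group (traded_by, received_by, received_on) edges by snapshot index."""
    snapshots = defaultdict(list)
    for traded_by, received_by, received_on in edges:
        index = snapshot_index(received_on, window_start)
        snapshots[index].append((traded_by, received_by, received_on))
    return dict(snapshots)


# Toy data: user IDs and timestamps are made up for illustration.
start = datetime(2021, 10, 1, tzinfo=timezone.utc)
toy_edges = [
    ("u1", "u2", datetime(2021, 10, 1, 1, 30, tzinfo=timezone.utc)),
    ("u1", "u3", datetime(2021, 10, 1, 5, 59, tzinfo=timezone.utc)),
    ("u2", "u4", datetime(2021, 10, 1, 6, 0, tzinfo=timezone.utc)),
    ("u3", "u4", datetime(2021, 10, 2, 0, 0, tzinfo=timezone.utc)),
]
snapshots = slice_into_snapshots(toy_edges, start)
```

Each snapshot then becomes one graph in the sequence that the centrality computations and models consume.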

Four pillars of the approach

The thesis is built around four conceptual pillars that eventually feed into one prediction task: node-level influence forecasting.

Centrality indices

Classic and temporal network measures for “how important” a node is:

Static: in-degree, out-degree, eigenvector, PageRank, closeness, betweenness

Temporal: causal (time-respecting) closeness and betweenness, computed on time-respecting paths rather than static shortest paths

A causal measure of influence

I define causal cascade size: starting from a trader’s post, I follow all time-respecting causal paths through the network (under a time delta constraint), build causal trees, and count how many unique users are reached downstream. That count is the trader’s influence score in that time window.
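A simplified sketch of that computation is below. It assumes numeric timestamps, a single time delta `delta`, and that the direct receivers of the seed post count as reached; the exact rules in the thesis may differ, and the function name and toy edges are made up for illustration.

```python
from collections import defaultdict, deque


def causal_cascade_size(edges, seed, post_time, delta):
    """Count unique users reached downstream of one post by `seed`.

    edges: iterable of (traded_by, received_by, received_on), numeric times.
    A hop v -> w at time t continues the cascade only if v was reached at
    some time s with s < t <= s + delta (time-respecting, within delta).
    """
    out_edges = defaultdict(list)
    for src, dst, t in edges:
        out_edges[src].append((dst, t))

    # The seed's post itself: every edge the seed emits at exactly post_time.
    frontier = deque((dst, t) for dst, t in out_edges[seed] if t == post_time)
    seen_states = set(frontier)  # (user, arrival time) pairs already expanded
    reached = {dst for dst, _ in frontier}

    while frontier:
        node, arrival = frontier.popleft()
        for dst, t in out_edges[node]:
            state = (dst, t)
            if arrival < t <= arrival + delta and state not in seen_states:
                seen_states.add(state)
                reached.add(dst)
                frontier.append(state)

    reached.discard(seed)  # the seed is not part of its own audience
    return len(reached)


# Toy edges (made up): a posts at t=0, b reacts at t=5, c reacts too late.
toy_edges = [
    ("a", "b", 0), ("a", "c", 0),
    ("b", "d", 5),
    ("c", "e", 20),
    ("d", "a", 8),
    ("d", "f", 30),
]
size = causal_cascade_size(toy_edges, "a", post_time=0, delta=10)  # 3
```

Tracking (user, arrival time) pairs rather than users alone matters here: with a delta constraint, a later arrival at the same user can open paths that an earlier arrival could not.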

Node embeddings

I learn representations of traders in a latent space:

Static: node2vec-style embeddings on aggregated graphs

Time-aware: embeddings produced by spatio-temporal GNNs that see both graph structure and its evolution over time

Spatio-temporal graph neural networks

I treat influence prediction as a spatio-temporal node regression problem:

Inputs per snapshot: centrality features + node embeddings

Target: the node’s future causal cascade size over an extended window (current + next snapshot), which captures both current influence and whether the node stays relevant and influential.

Concretely, I compare several architectures:

Baseline: feed-forward network (FFNN) with centralities only

Static embedding model: FFNN + node2vec

Spatio-temporal GNNs:

GCRN-GRU, GCRN-LSTM
T-GCN
EvolveGCN-H and EvolveGCN-O

First: what does trading actually look like?

Before predicting influence, I answer two descriptive questions about the network.

RQ1 - What traits does trading activity exhibit?

Some key patterns in the 3-month window:

Weekly cycles

Trading peaks on Mondays and declines toward the weekend
Weekends have consistently low activity

Intraday cycles

Peak activity around 14:00–15:00 UTC, aligning with the US market open and execution of queued orders
Very low activity between 21:00 and 12:00 UTC, aligning with market close and typical sleep hours

Concentration of assets

Out of 4,165 tradable instruments, a small fraction drives most trading volume

Asset symbol	Number of trades	Share of all trades
[All trades]	1,629,882	100%
[Top 10]	690,477	42.4%
BTCUSD	202,083	12.4%
ETHUSD	118,164	7.3%
Silver	75,936	4.7%
Gold	67,671	4.2%
TSLA	67,256	4.1%
AAPL	64,599	4.0%
SNAP	46,022	2.8%
AMZN	45,638	2.8%

Macro link

Trading activity is positively correlated with BTC absolute daily returns, BTC trading volume, and BTC volume momentum

📈

So the network is not just noisy social chatter. It is tightly coupled to both market microstructure (timing) and crypto market conditions.

RQ2 - How do individual users behave?

Looking at the time between subsequent trades, per user and globally:

Inter-trade durations are well modeled by a Weibull distribution with shape k < 1, which implies a decreasing hazard rate: the longer a user has gone without trading, the less likely they are to trade in the next moment.

This is driven by the app’s mechanics: users get a one-time virtual 1M USD allocation with no recurring top-ups, so many become “holders” rather than constant churners.

The same decreasing-trade-rate pattern appears when aggregating across all users, so individual behavior scales up to the global level.
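To see why a shape below 1 means a falling trade rate, it helps to look at the Weibull hazard function, h(t) = (k / λ) · (t / λ)^(k − 1). The sketch below evaluates it for illustrative parameters; k = 0.5 and λ = 1 are placeholders, not the fitted values from the thesis.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Hazard h(t) = (k / lam) * (t / lam) ** (k - 1) of a Weibull(k, lam)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Illustrative parameters only; the fitted values are not given in this post.
k, lam = 0.5, 1.0
hazards = [weibull_hazard(t, k, lam) for t in (0.5, 1.0, 2.0, 4.0)]
# With k < 1 the exponent (k - 1) is negative, so each value is smaller than
# the one before it: the instantaneous chance of trading decays with waiting time.
```

With k = 1 the hazard would be constant (an exponential distribution), and with k > 1 it would rise with waiting time.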

For cascades themselves:

The majority of trade posts generate no cascade at all.

The distribution of positive cascade sizes follows an approximate power law:

Most cascades are tiny
A small minority grow large, reaching up to size 24 and depth 36 generations in the causal tree.

🔥

In other words: most posts are duds, but a few behave like viral memes.
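One common way to put a number on such a heavy tail is a maximum-likelihood estimate of the power-law exponent. The sketch below uses the standard continuous approximation for discrete data, α ≈ 1 + n / Σ ln(x / (x_min − 0.5)); it is an illustration on made-up cascade sizes, not the fitting procedure used in the thesis.

```python
import math


def power_law_alpha(sizes, x_min=1):
    """Approximate MLE of the power-law exponent for discrete x >= x_min."""
    tail = [s for s in sizes if s >= x_min]
    log_sum = sum(math.log(s / (x_min - 0.5)) for s in tail)
    return 1.0 + len(tail) / log_sum


# Made-up positive cascade sizes: mostly 1, a few larger ones.
toy_sizes = [1] * 80 + [2] * 12 + [3] * 5 + [5] * 2 + [12]
alpha = power_law_alpha(toy_sizes)
```

A smaller exponent means a heavier tail, that is, relatively more large cascades.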

Do classic centralities actually predict influence?

This leads to the next pair of questions:

RQ3: Can centrality measures be used as node features to predict influence?

RQ4: Which centralities are effective?

To probe this, I treat each trader’s centrality time series and causal cascade size time series as signals and run Granger causality tests.

Key result:

All explored static and temporal centralities (in-degree, out-degree, eigenvector, PageRank, closeness, betweenness, temporal closeness, temporal betweenness) Granger-cause future cascade sizes at the 5% significance level.

Type	Centrality	Lag 1	Lag 2	Lag 3
Static	In-degree	46%	47%	50.4%
Static	Out-degree	100%	100%	99.9%
Static	PageRank	91.2%	92.1%	90.8%
Static	Closeness	39.4%	38.2%	43.4%
Static	Betweenness	86.7%	80.7%	78.6%
Static	Eigenvector	27.6%	28.6%	29.3%
Temporal	Closeness	45.7%	46.4%	50.1%
Temporal	Betweenness	76.2%	71.4%	69.4%

When I temporally shuffle the centrality series (destroying time structure), this predictive signal disappears and p-values jump back to around 5 percent.

💡

So classic centrality is not just a descriptive toy here. It carries real predictive information about who will spark larger cascades.
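For readers unfamiliar with Granger causality, the idea is to check whether past values of one series (a centrality) improve the prediction of another (cascade size) beyond what that series' own past already explains. Below is a self-contained sketch of the lag-1 F statistic on synthetic data, including the shuffle check. The data, coefficients, and function names are made up; in practice a library routine such as `grangercausalitytests` from statsmodels would also return p-values.

```python
import random


def ols_rss(rows, y):
    """Residual sum of squares of an OLS fit of y on the columns in rows."""
    p = len(rows[0])
    # Normal equations (X'X) b = X'y, solved by Gauss-Jordan elimination.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def granger_f_lag1(x, y):
    """F statistic for 'x Granger-causes y' with a single lag."""
    target = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    dof = len(target) - 3  # observations minus unrestricted parameters
    return (rss_r - rss_u) / (rss_u / dof)


# Synthetic series: y depends on the previous value of x plus small noise.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 200)]
f_real = granger_f_lag1(x, y)

# Shuffling x destroys its time ordering, so the lagged signal vanishes.
x_shuffled = x[:]
rng.shuffle(x_shuffled)
f_shuffled = granger_f_lag1(x_shuffled, y)
```

On this synthetic data the F statistic is large for the true series and collapses once the series is shuffled, which is the same logic as the shuffle check described above.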

Adding time: temporal centralities and time-aware embeddings

The last two research questions ask whether explicit time-awareness buys us anything beyond static graphs.

RQ5 - Do temporal centralities help beyond static ones?

I compare models trained with:

Only static centralities, vs.

Static + temporal centralities

Result: adding temporal centralities improves at least one metric for 5 out of 7 models.

Notable MAE improvements when adding temporal centralities:

FFNN: 27.9%

node2vec + FFNN: 20.0%

GCRN-GRU: 7.33%

⌛

So knowing when a node sits on time-respecting paths (not just where it sits in the static graph) helps the model.
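For reference, this is how an MAE improvement of this kind can be computed, assuming the percentages are relative reductions in MAE against the static-only variant of the same model. That reading is my assumption for this sketch, and the numbers are toy values, not thesis results.

```python
def mae(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(mae_baseline, mae_new):
    """Percentage reduction in MAE relative to the baseline."""
    return 100.0 * (mae_baseline - mae_new) / mae_baseline


# Toy cascade sizes and predictions (made up for illustration).
y_true = [0, 2, 5, 1]
pred_static = [1, 1, 3, 1]    # model with static centralities only
pred_temporal = [0, 1, 4, 1]  # model with static + temporal centralities

mae_static = mae(y_true, pred_static)      # 1.0
mae_temporal = mae(y_true, pred_temporal)  # 0.5
gain = relative_improvement(mae_static, mae_temporal)  # 50.0
```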

RQ6 - Do time-aware node embeddings improve influence prediction?

Here I move from centralities into representation learning and ask whether spatio-temporal embeddings beat static ones.

Benchmarks:

No embeddings: FFNN + centralities only

Static embeddings: node2vec + FFNN

Time-aware embeddings: spatio-temporal GNNs (GCRN variants, T-GCN, EvolveGCN-H/O)

Two layers of comparison:

All embedding-based models vs FFNN (no embeddings)

Every node2vec and spatio-temporal GNN model significantly outperforms the FFNN in MAE, with improvements like:

node2vec: ~28–35%
T-GCN: ~36–68%
EvolveGCN-H: ~39–61%
EvolveGCN-O: ~34–61%

Spatio-temporal GNNs vs node2vec (static)

Almost all spatio-temporal models significantly improve MAE over node2vec:

GCRN-LSTM: up to 56% better
T-GCN: ~10–50% better
EvolveGCN-H/O: ~15–40% better in many setups

✅

Overall conclusion: time-aware node embeddings that learn both graph structure and its evolution in time materially improve influence prediction over static embeddings.

Why this matters

From a research perspective, this work shows that:

📈

Peer effects in financial markets can be quantified through causal cascades on real social trading data.

⌛

Classic network tools still matter, but their temporal variants and learned embeddings amplify predictive power.

🚗

Spatio-temporal graph neural networks are not just for traffic prediction and sensor networks; they are effective for socio-financial influence modeling too.

From a practical perspective, this type of modeling could support:

🫧

Early detection of herding and bubble-like cascades

🧠

Better understanding of who to monitor as potential “super-spreaders” of trading sentiment

⛔

Risk management and platform design that actively account for network-driven contagion, not just fundamentals

Limitations and next steps

There are some important caveats:

The analysis uses only 3 months out of 3.5 years of data, chosen for computational tractability and because they coincide with a bull market. The dynamics may differ in bear or sideways regimes.

Time is rescaled from millisecond to minute precision, which may miss ultra-fast reactions, especially from bots.

A natural next step is a contagion study: explicitly seeding influential users and simulating how far and how fast influence spreads under different market regimes, and contrasting simple vs complex contagion in financial peer influence.

If you work at the intersection of ML, graph theory, and finance, the takeaway is simple:

📈

Do not just ask what people are trading. Ask who is connected to whom, when they act, and how that behavior propagates through the network.

Because the next buying spree might already be visible as a pattern of influence in your socio-financial graph.