Agentic AI Nowcasting – How LLM Agents Build Real-Time Factor Scores

You scan 50 stocks before the open. You check news, read filings, skim social sentiment, and end up with a shortlist of 5 names. An LLM agent does this across 1,000 stocks every day, synthesizing web sources into a single attractiveness score per ticker. A January 2026 paper by Zefeng Chen and Darcy Pu tested exactly this on the Russell 1000. The results are specific enough to be useful and limited enough to be honest about.

The concept is agentic AI nowcasting: an autonomous LLM searching the web in real time, filtering sources without human curation, and outputting quantitative stock predictions at the current edge of time. No curated news feeds. No structured data pipelines. Just an LLM with web access, a universe of tickers, and a scoring prompt.

What the Research Actually Tested

Chen and Pu deployed a state-of-the-art LLM (the specific model is not named in the paper) to evaluate every Russell 1000 constituent daily, starting April 2025. The agent autonomously searched the web, filtered sources, and synthesized information into an attractiveness score for each stock. No human fed the model news, disclosures, or curated text.

The design eliminates look-ahead bias by construction. Predictions were collected at the current edge of time. Once the information environment passes, it cannot be recreated. This makes the dataset irreproducible in both directions: you cannot backtest it, and you cannot reconstruct what the model saw last Tuesday.

The key result: going long the 20 highest-ranked stocks generated a daily Fama-French five-factor plus momentum alpha of 18.4 basis points, with an annualized Sharpe ratio of 2.43. Transaction costs represented less than 10% of gross alpha because these are liquid Russell 1000 names.
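That alpha figure is the intercept from regressing the portfolio's daily excess returns on the five Fama-French factors plus momentum. A minimal sketch of the estimation, using synthetic placeholder data (the factor series and betas below are randomly generated for illustration, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 252

# Six daily factor return series: MKT-RF, SMB, HML, RMW, CMA, MOM
# (synthetic placeholders, not real factor data)
factors = rng.normal(0.0, 0.01, size=(n_days, 6))

# Synthetic portfolio excess returns: factor exposure + 18.4 bps/day alpha + noise
true_betas = np.array([1.0, 0.2, -0.1, 0.1, 0.0, 0.3])
alpha_true = 0.00184
port_excess = alpha_true + factors @ true_betas + rng.normal(0.0, 0.002, n_days)

# OLS: regress portfolio excess returns on a constant plus the six factors;
# the intercept is the daily six-factor alpha
X = np.column_stack([np.ones(n_days), factors])
coefs, *_ = np.linalg.lstsq(X, port_excess, rcond=None)
alpha_hat = coefs[0]

print(f"Estimated daily alpha: {alpha_hat * 1e4:.1f} bps")
```

With real data you would pull the factor series from a factor data provider and use the strategy's realized daily returns; the mechanics are identical.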

I run my own multi-factor screens on US large caps. A daily alpha of 18.4 bps is extraordinary. For context, most quantitative equity strategies target 3-8% annualized alpha. This result implies roughly 46% annualized. That number alone should make you skeptical enough to read the fine print.
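The back-of-envelope conversion behind that 46% figure, assuming roughly 252 trading days and simple (non-compounded) scaling:

```python
daily_alpha_bps = 18.4
trading_days = 252

# Simple annualization: 18.4 bps/day across ~252 trading days
annualized_pct = daily_alpha_bps / 1e4 * trading_days * 100
print(f"Annualized alpha: {annualized_pct:.1f}%")  # -> 46.4%
```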

Why Only the Top 20 Mattered

The actionable finding is not “AI predicts stock returns.” It is: AI predicts the top winners, and nothing else. Expanding the portfolio beyond the top 20 rapidly diluted alpha. Bottom-ranked stocks showed returns statistically indistinguishable from the market. The model cannot identify losers.

Chen and Pu hypothesize this asymmetry reflects online information structure. Genuinely positive news generates coherent, consistent web signals. Negative news is contaminated by strategic corporate obfuscation (investor relations spin, vague forward guidance) and social media noise. The LLM can synthesize a clear bullish case when one exists. It cannot reliably synthesize a bearish case from deliberately muddied waters.

This matches what I see in my own sector rotation and relative strength work. Strong stocks produce clean signals. Weak stocks produce conflicting ones. If you have ever tried to short a stock that “should” drop based on fundamentals, you know the noise problem firsthand.

What “Agentic” Means in Practice

The word “agentic” matters here. This is not a model reading a structured dataset. The LLM receives a ticker symbol and independently decides what to search, which sources to trust, and how to weight conflicting information. It builds the information set from scratch every day.

For a directional trader, the closest analogy is your morning research routine, automated and scaled. You check earnings calendars, scan headlines, read analyst notes, look at social sentiment. The agent does this across 1,000 names simultaneously. The output is a ranked list, not a trade recommendation.

What the agent does NOT do: it does not look at price charts. It does not read order flow. It does not have access to Level 2 data or options positioning. It processes text-based information from the open web. That means the scoring captures fundamental and sentiment signals, not technical structure. You still need volume confirmation and chart structure before acting on any name the model surfaces.
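Structurally, the loop is simple: score every ticker, rank descending, keep the top slice. The paper does not publish its model, prompt, or scoring scale, so the sketch below shows only the ranking skeleton; the agent itself is a hypothetical injected function (`score_fn`), which in production would wrap a web-search-enabled LLM call:

```python
from typing import Callable

def nowcast_scores(tickers: list[str],
                   score_fn: Callable[[str], float]) -> list[tuple[str, float]]:
    """Score each ticker and rank descending by attractiveness.

    score_fn stands in for the agent: given a ticker, it would search the
    web, filter sources, and return a score. Hypothetical placeholder --
    the paper does not disclose its model or prompt.
    """
    scored = [(t, score_fn(t)) for t in tickers]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def top_k(ranked: list[tuple[str, float]], k: int = 20) -> list[str]:
    """Keep only the top-k names -- the slice where the paper found alpha."""
    return [t for t, _ in ranked[:k]]
```

Leaving the scorer as an injected dependency keeps the ranking logic testable without network access or API spend, and makes it easy to swap models later.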

Practical Use as a Screening Layer

The right way to think about agentic AI nowcasting is as a narrow screening layer, not a trading system. The output is: “These 15-20 names have the most coherent bullish information environment right now.” Your job is to decide which of those names also have favorable chart setups, manageable risk, and appropriate position sizing.

I use a similar approach with my drawdown-quality momentum filter. The filter narrows a broad universe to a short list. Then I apply technical criteria. The agentic nowcast would sit in that same workflow position: upstream of your chart work, downstream of your universe definition.

What traders commonly get wrong here: treating the AI score as a signal rather than a filter. A score of “highly attractive” means the public information environment is bullish. It does not mean the stock will go up tomorrow. Corporate insiders, institutional flows, and technical exhaustion are all invisible to a web-scraping agent.

Concentration Risk You Cannot Ignore

The research shows alpha concentrated in exactly 20 names out of 1,000. That is a 2% hit rate. If you are trading a concentrated 20-stock portfolio with daily turnover, you face several practical problems.

First, daily rebalancing across 20 names generates tax events, execution costs, and operational overhead that the paper acknowledges but does not fully model beyond the 10% transaction cost haircut. For a retail account, the effective cost is higher due to wider spreads on smaller order sizes and short-term capital gains treatment.

Second, concentration risk at this level means a single stock blow-up can destroy a month of accumulated alpha. The paper reports average results. Individual days will produce drawdowns that feel nothing like a 2.43 Sharpe strategy. I have seen traders blow up concentrated momentum books precisely because the average performance masked the tail risk of holding 20 correlated names during a sector rotation.
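A back-of-envelope check on that tail risk, assuming 20 equal-weight positions, ~21 trading days per month, and the paper's 18.4 bps daily alpha:

```python
n_names = 20
weight = 1 / n_names              # 5% equal-weight position
daily_alpha = 0.00184             # 18.4 bps
monthly_alpha = daily_alpha * 21  # ~21 trading days per month

# Single-stock drop needed to erase a full month of alpha
breakeven_drop = monthly_alpha / weight
print(f"Monthly alpha: {monthly_alpha * 1e4:.0f} bps")
print(f"One name must fall {breakeven_drop:.0%} to wipe a month of alpha")

# A more ordinary 30% single-name drawdown costs how many days of alpha?
days_lost = (0.30 * weight) / daily_alpha
print(f"A 30% single-name drop costs ~{days_lost:.0f} days of alpha")
```

So a genuine blow-up (a 70-80% single-name collapse) erases roughly a month of accumulated alpha, and even a routine 30% drawdown in one name costs more than a week of it.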

Third, as more participants deploy similar agentic systems, the coherent information signal that drives alpha gets priced faster. This is network momentum working in reverse: when enough agents read the same web and reach the same conclusion simultaneously, the edge compresses to the speed of execution.

Compute Cost and API Economics

Running a state-of-the-art LLM with web search across 1,000 stocks daily is not cheap. Each evaluation requires multiple web searches, source retrieval, and synthesis. At current API pricing for frontier models, a single stock evaluation with web search might cost $0.50-2.00 depending on token count and search calls. Scale that to 1,000 stocks and you are looking at $500-2,000 per day in API costs alone.

The paper does not disclose exact compute costs. But the economics are clear: this strategy only works at scale if the alpha per dollar of compute justifies the infrastructure. For a $1M account generating 18.4 bps daily (roughly $1,840 gross), $500-2,000 in daily compute eats anywhere from roughly 27% to over 100% of gross returns. The strategy needs institutional capital to be economically viable at full Russell 1000 scale.
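The break-even arithmetic, using the paper's 18.4 bps daily alpha and the cost range estimated above (the API pricing is my assumption, not a disclosed figure):

```python
daily_alpha = 0.00184                              # 18.4 bps gross
compute_low, compute_high = 500, 2000              # assumed daily API cost range, USD

# Account size at which daily compute cost equals daily gross alpha
breakeven_low = compute_low / daily_alpha
breakeven_high = compute_high / daily_alpha
print(f"Break-even account size: ${breakeven_low:,.0f} to ${breakeven_high:,.0f}")

# Account size needed to keep compute under 10% of gross alpha at the high end
target_share = 0.10
required = compute_high / (daily_alpha * target_share)
print(f"For compute <= 10% of gross: ${required:,.0f}")
```

On these assumptions, compute alone swallows all gross alpha below roughly a $270K-$1.1M account, and keeping it under a 10% haircut at the high end takes around $11M.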

A practical retail alternative: run the agent on a subset. Score 100-200 stocks from a pre-filtered universe (your sector, your watchlist, names already showing relative strength) rather than the full 1,000. This cuts compute by 80-90% while preserving the concentrated top-tier selection approach.

What This Does Not Tell You

The paper covers April 2025 onward. That period included specific market conditions that may or may not persist. A strategy with a 2.43 Sharpe over one year of bull-market conditions tells you less than the same Sharpe observed across a full cycle including a bear market and a volatility event.

The model is unnamed. You cannot replicate this today without knowing which LLM was used, what prompting strategy was applied, and how “attractiveness” was defined and scaled. The paper demonstrates the concept works. It does not hand you a deployable system.

Short selling does not work with this approach. The bottom-ranked stocks are not identifiable losers. They are noise. Do not use an agentic nowcast to build a short book. The information asymmetry the authors describe means short signals from web text are structurally unreliable.

I have tested simpler sentiment scoring on my own watchlist using GPT-4 class models with web access. The results are directionally consistent with Chen and Pu: bullish consensus from web sources correlates with near-term outperformance better than bearish consensus correlates with underperformance. The asymmetry is real and observable even at hobby scale.

Where This Fits in Your Workflow

If you trade a quality-momentum approach on US large caps, agentic nowcasting adds a layer you likely cannot replicate manually. You can read 10 stocks deeply each morning. You cannot synthesize web information across 200 names before the open. The LLM can.

The workflow would be: define your universe (Russell 1000 or a sector subset), run the agentic score daily, take the top 15-20 names as your morning research list, then apply your existing technical criteria. Discard names without chart confirmation. Size positions as you normally would. The AI narrows; you decide.
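That last filtering step can be sketched as a simple intersection: take the agent's ranked list, keep only names that pass your own technical confirmation. The `chart_ok` check below is a placeholder for whatever criteria you already use (trend, volume, structure):

```python
from typing import Callable

def morning_shortlist(ai_ranked: list[str],
                      chart_ok: Callable[[str], bool],
                      max_names: int = 20) -> list[str]:
    """Intersect the agent's top-ranked names with your technical criteria.

    ai_ranked is the agent's descending-score list; chart_ok is your own
    confirmation check. The AI narrows; the technical filter decides.
    """
    return [ticker for ticker in ai_ranked[:max_names] if chart_ok(ticker)]
```

Names that fail `chart_ok` simply drop off the list, which enforces the "discard names without chart confirmation" rule mechanically rather than by discipline alone.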

What fails: using the score as a standalone buy signal, running it on illiquid names where the information environment is thin, or trusting the model on the short side. Stick to the long-only, concentrated, large-cap application that the research actually validated.

The Honest Assessment for Directional Traders

Agentic AI nowcasting is a legitimate screening tool, not a trading system. The Chen and Pu paper demonstrates that an autonomous LLM can identify stocks with coherent bullish information environments, and that those stocks outperform on a risk-adjusted basis over a specific period. The alpha is real but concentrated, expensive to generate, and validated under one set of market conditions.

For a swing trader running a 5-20 name book on US large caps, the practical takeaway is narrower than the headline suggests. You want the top-tier long list. You do not want the full ranking. You do not want the short side. You want it as input to your existing process, not a replacement for it. And you need to watch the economics carefully, because compute costs scale linearly while alpha may not.

Educational content only. Not investment advice. Trading involves risk. You are responsible for your decisions.