Key Findings
- Gemini and Grok name specific law firms in over half of all responses -- 53% and 52% respectively. ChatGPT does so 26% of the time. Claude names firms in only 12% of responses.
- Grok cites URLs far more than any other provider, including links in 40% of responses, compared to 26% for ChatGPT, 20% for Gemini, and just 2% for Claude.
- Gemini disclaims the most aggressively. 71% of Gemini responses include a disclaimer -- almost entirely AI disclosure notices. Grok follows at 66%. ChatGPT adds disclaimers only 10% of the time.
- Avvo dominates directory references. Avvo appeared in 60 of 200 responses, followed by Martindale-Hubbell (41) and Super Lawyers (37). Grok alone referenced Avvo 34 times.
- Grok references legal directories in 74% of responses -- more than any other provider. Claude mentions directories in only 10%.
- Query type is the strongest predictor of firm naming. City + practice area queries trigger firm names 70% of the time; comparison/evaluation queries trigger them only 10% of the time.
- Morgan & Morgan is the most frequently named firm across all four LLMs, appearing 10 times in 200 responses. No other individual firm appeared more than 4 times.
- Response length varies more than 4x across providers. Gemini averages 686 words per response; Claude averages just 153.
About This Research
Constellate Legal is a digital agency that provides website and SEO services exclusively for law firms. We conducted this study to understand how AI platforms are reshaping the way potential clients discover legal services -- and what that shift means for law firm marketing strategy. Every data point in this article comes from our own primary research, collected and analyzed in-house.
Potential clients are asking AI chatbots to recommend lawyers. Not Googling. Not browsing directories. Typing questions like "I need a personal injury lawyer in Houston" into ChatGPT, Gemini, Grok, or Claude and getting direct answers -- sometimes with specific firm names, sometimes without.
We wanted to know exactly how each major LLM handles these queries. Not speculation. Not anecdotes. Data.
So we ran 200 queries -- 50 per provider -- across all four platforms and measured everything: which ones name firms, which ones cite URLs, which ones add disclaimers, which ones reference directories, and how their behavior changes depending on what you ask.
This is what we found.
Methodology
We submitted 50 queries to four LLMs: ChatGPT 5.2 (gpt-5.2 via OpenAI API), Gemini 3.1 Pro (gemini-3.1-pro-preview via Google GenAI SDK), Grok 4.1 (grok-4-1-fast via xAI API), and Claude Opus 4.6 (via Claude Code CLI). All queries used temperature 0 for deterministic, reproducible output.
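A minimal sketch of this kind of query harness, using the OpenAI Python SDK as the example provider (the other three follow the same shape). The helper function and its names are ours, not the study's actual scripts; only the model ID and the temperature-0 setting come from the methodology above.

```python
# Illustrative harness sketch -- not the study's actual code. Every request
# pins temperature to 0 so output is deterministic and reproducible.

def build_request(model: str, query: str) -> dict:
    """Chat-completion parameters shared by every run in the study design."""
    return {
        "model": model,
        "temperature": 0,
        "messages": [{"role": "user", "content": query}],
    }

# With the OpenAI SDK (requires OPENAI_API_KEY), a single run would look like:
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.chat.completions.create(**build_request("gpt-5.2", query))
#   text = resp.choices[0].message.content
```

The same request shape maps onto the Google GenAI SDK and xAI API with their respective model IDs.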
The 50 queries were split evenly across five categories -- 10 per category -- designed to reflect how real consumers search for legal services:
- City + Practice Area -- Direct lawyer-finding queries like "I need a personal injury lawyer in Houston"
- State + Legal Question -- Jurisdiction-specific process questions like "How do I file for bankruptcy in California"
- Generic Best/Top -- Reputation-seeking queries like "Best personal injury law firm near me"
- Specific Scenario -- Situation descriptions like "I was rear-ended and the other driver has no insurance"
- Comparison/Evaluation -- Cost and assessment queries like "How much does a criminal defense lawyer cost"
We chose these four models because they represent the four most-used consumer AI chatbots as of February 2026. Each has a different approach to safety, citation, and recommendation behavior -- making them ideal for comparative analysis.
Each response was analyzed programmatically for: whether it named specific law firms (using pattern matching for common legal firm naming conventions), whether it cited URLs, whether it included disclaimers (categorized as "not legal advice," "AI disclaimer," or "consult a lawyer"), which legal directories it mentioned, total word count, and specificity level (generic, location-specific, or firm-specific). Of 200 total queries, 198 returned valid responses (1 Gemini error, 1 Claude error) for a 1% error rate.
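The per-response analysis described above can be sketched as a small routine. This is an illustrative reconstruction, not the study's actual pipeline; the disclaimer phrases, directory list, and URL pattern below are our assumptions based on the categories the methodology names.

```python
import re

# Subset of the 12 tracked directories, for illustration.
DIRECTORIES = ["Avvo", "Martindale-Hubbell", "Super Lawyers", "Yelp", "Justia"]

# Assumed keyword patterns for the three disclaimer types in the study.
DISCLAIMER_PATTERNS = {
    "not_legal_advice": r"not legal advice",
    "ai_disclaimer": r"\b(as an AI|I am an AI|AI model)\b",
    "consult_lawyer": r"consult (a|an|with a) (licensed |qualified )?(lawyer|attorney)",
}

URL_RE = re.compile(r"https?://\S+")

def analyze(text: str) -> dict:
    """Extract URLs, word count, disclaimer types, and directory mentions."""
    lower = text.lower()
    return {
        "urls": URL_RE.findall(text),
        "word_count": len(text.split()),
        "disclaimers": [k for k, p in DISCLAIMER_PATTERNS.items()
                        if re.search(p, text, re.IGNORECASE)],
        "directories": [d for d in DIRECTORIES if d.lower() in lower],
    }
```

Each response's dict would then be aggregated per provider and per query category to produce the percentages reported below.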
Finding 1: Which LLMs Name Specific Law Firms
The most critical question for any law firm: does the AI actually name you?
Gemini and Grok are nearly tied as the most willing to name specific firms. When someone asks "I need a personal injury lawyer in Houston," Gemini names actual firms 100% of the time. Grok does so in 80% of city + practice area queries. ChatGPT names firms in 90% of those queries -- its highest rate for any category. Claude names firms in only 10% of city-specific queries.
The gap between providers is significant. If you are a law firm wondering whether AI is sending clients to your competitors, the answer depends entirely on which AI your potential client is using. A Gemini or Grok user asking a city-specific lawyer query will almost always receive specific firm recommendations; a Claude user will almost always receive general guidance instead.
Firm naming by query type
| Query Type | ChatGPT | Gemini | Grok | Claude |
|---|---|---|---|---|
| City + Practice Area | 90% | 100% | 80% | 10% |
| Generic Best/Top | 20% | 70% | 90% | 22% |
| State + Legal Question | 10% | 60% | 40% | 10% |
| Specific Scenario | 10% | 22% | 30% | 10% |
| Comparison/Evaluation | 0% | 10% | 20% | 10% |
The pattern is clear: the more geographic and practice-area-specific the query, the more likely the LLM is to name actual firms. Generic evaluation questions ("how much does a lawyer cost") rarely produce firm names from any provider. City + practice area queries trigger firm names at an overall rate of 70%, while comparison/evaluation queries produce them only 10% of the time.
Most frequently named firms
Morgan & Morgan was the most frequently named individual firm across all four LLMs, appearing 10 times in 200 responses. No other firm came close. The next most frequent matches, "The Law Offices" and "The Legal" (4 appearances each), are likely partial matches from our pattern-based detection rather than individual firms; Arnold & Itkin LLP, Georgia Legal, and Immigration Legal each appeared 3 times. National brand recognition translates directly to AI recommendation frequency.
Finding 2: URL Citations and Source Attribution
Do AI chatbots actually link to the firms or resources they mention?
Grok leads URL citation by a significant margin, providing links in 40% of all responses. This is consistent with Grok's overall approach of giving detailed, heavily sourced answers. ChatGPT follows at 26%, and Gemini at 20%. Claude almost never provides URLs -- just 2% of responses.
URL rate by query type
| Query Type | ChatGPT | Gemini | Grok | Claude |
|---|---|---|---|---|
| City + Practice Area | 80% | 30% | 80% | 10% |
| State + Legal Question | 20% | 30% | 50% | 0% |
| Generic Best/Top | 10% | 10% | 50% | 0% |
| Specific Scenario | 20% | 22% | 10% | 0% |
| Comparison/Evaluation | 0% | 10% | 10% | 0% |
City + practice area queries produce the most URLs overall (50%), especially from ChatGPT and Grok, which both hit 80% for this category. For comparison/evaluation queries, URLs are rare across all providers -- only 5% overall.
This has a significant implication for law firm marketing. When an AI names your firm but provides no link -- which happens in nearly all Claude responses and most Gemini responses -- the user has nothing but the name. If they Google that name and your website is slow, ugly, or hard to find, you lose the referral that AI already handed you.
Finding 3: Disclaimers and Hedging
How often do LLMs tell users to "consult a real lawyer" or disclose that they are AI?
Gemini is by far the most aggressive at disclaiming, adding some form of disclaimer to 71% of all legal responses. The overwhelming majority of Gemini's disclaimers (36 of 38) are AI disclosure notices -- essentially telling the user "I am an AI." Grok follows at 66%, with a more balanced mix of "not legal advice" warnings (17), AI disclaimers (16), and "consult a lawyer" suggestions (5).
ChatGPT is the least likely to disclaim at just 10%, and when it does, it uses "not legal advice" language. Claude sits at 22%, primarily using AI disclaimers (9) and "not legal advice" statements (5).
Disclaimers by query type
| Query Type | ChatGPT | Gemini | Grok | Claude |
|---|---|---|---|---|
| State + Legal Question | 30% | 100% | 90% | 60% |
| Specific Scenario | 10% | 100% | 70% | 20% |
| City + Practice Area | 10% | 80% | 70% | 10% |
| Comparison/Evaluation | 0% | 30% | 60% | 20% |
| Generic Best/Top | 0% | 50% | 40% | 0% |
State + legal question queries trigger the most disclaimers overall (70%), which makes sense -- asking about specific legal processes comes closest to requesting actual legal advice. Gemini hits 100% disclaimer rate for both state legal questions and specific scenario queries. Generic best/top queries produce the fewest disclaimers at 23% overall.
Finding 4: The Role of Legal Directories
Legal directories are the backbone of how LLMs discover and reference law firms. Across 200 responses, Grok referenced directories in 74% of its answers, Gemini in 47%, ChatGPT in 22%, and Claude in only 10%.
| Directory | Total Mentions | ChatGPT | Gemini | Grok | Claude |
|---|---|---|---|---|---|
| Avvo | 60 | 6 | 17 | 34 | 3 |
| Martindale-Hubbell | 41 | 9 | 12 | 19 | 1 |
| Super Lawyers | 37 | 4 | 15 | 17 | 1 |
| Yelp | 18 | 0 | 0 | 17 | 1 |
| Best Lawyers | 17 | 5 | 4 | 8 | 0 |
| Justia | 17 | 0 | 0 | 16 | 1 |
|  | 14 | 3 | 5 | 6 | 0 |
| Nolo | 14 | 0 | 0 | 12 | 2 |
| LegalZoom | 7 | 0 | 0 | 7 | 0 |
| Lawyers.com | 5 | 0 | 0 | 5 | 0 |
| FindLaw | 5 | 0 | 0 | 3 | 2 |
| BBB | 4 | 0 | 0 | 4 | 0 |
Avvo dominates. It appeared in 60 of 200 responses, making it the single most important external signal that LLMs use when discussing law firm quality and reputation. Grok alone referenced Avvo 34 times -- more than any other provider-directory combination in the entire study. If your firm does not have a strong, up-to-date Avvo profile, you are invisible to the most common AI discovery mechanism.
Martindale-Hubbell is second at 41 mentions, and Super Lawyers third at 37. Together, the top three directories accounted for 138 of all directory references -- nearly 60% of the total.
Grok is the clear directory champion. It referenced Yelp (17), Justia (16), Nolo (12), LegalZoom (7), Lawyers.com (5), and BBB (4) -- directories that the other three providers essentially ignored.
Directory mentions by query type
| Query Type | ChatGPT | Gemini | Grok | Claude |
|---|---|---|---|---|
| City + Practice Area | 40% | 100% | 100% | 20% |
| Generic Best/Top | 60% | 80% | 100% | 11% |
| Comparison/Evaluation | 10% | 40% | 90% | 20% |
| Specific Scenario | 0% | 11% | 70% | 0% |
| State + Legal Question | 0% | 0% | 10% | 0% |
City + practice area and generic best/top queries are the strongest directory triggers. Gemini and Grok both hit 100% directory reference rate for city + practice area queries. State + legal question queries almost never mention directories -- only Grok does, and only 10% of the time.
Finding 5: Geographic Specificity
We classified each response as generic, location-specific, or firm-specific. The results reveal how differently each model approaches geographic context.
| Provider | Generic | Location-Specific | Firm-Specific |
|---|---|---|---|
| Grok | 0% | 32% | 68% |
| Gemini | 0% | 43% | 57% |
| ChatGPT | 0% | 66% | 34% |
| Claude | 18% | 67% | 14% |
Grok, Gemini, and ChatGPT never produce fully generic responses -- every single response from these three providers includes at least location-specific context. Claude, on the other hand, returns generic responses 18% of the time -- responses that do not reference any specific location or firm.
The firm-specific rate broadly tracks the overall firm-naming rate, which makes sense. Grok leads at 68%, followed by Gemini at 57%, ChatGPT at 34%, and Claude at 14%. For law firms, this means geographic signals on your website are critical. When models produce location-specific responses, they are pulling location data from somewhere. If your firm's website clearly establishes your service area -- specific city and county names, structured data with geographic coordinates, and location-relevant content -- you increase the probability of appearing in these location-aware responses.
Side-by-Side Provider Comparison
| Metric | ChatGPT | Gemini | Grok | Claude |
|---|---|---|---|---|
| Names firms | 26% | 53% | 52% | 12% |
| Cites URLs | 26% | 20% | 40% | 2% |
| Adds disclaimers | 10% | 71% | 66% | 22% |
| Mentions directories | 22% | 47% | 74% | 10% |
| Avg. word count | 408 | 686 | 418 | 153 |
| Firm-specific responses | 34% | 57% | 68% | 14% |
Grok is the most aggressive recommender and the directory reference champion. It names firms in 52% of responses, provides the most URLs (40%), and mentions directories more than any other provider (74%). Grok references Avvo, Martindale-Hubbell, Super Lawyers, Yelp, and Justia at rates the other providers do not come close to matching.
Gemini is the most willing to name specific firms (53%) and the most verbose, averaging 686 words per response. It also disclaims the most heavily, at 71%, driven almost entirely by AI disclosure notices. When Gemini recommends firms, it does so thoroughly -- but usually with a disclaimer attached.
ChatGPT sits in the middle. It names firms at 26%, provides URLs at 26%, and mentions directories at 22%. Its disclaimer rate of 10% is the lowest of all four providers. Given ChatGPT's massive user base, even its moderate recommendation rate produces meaningful referral volume.
Claude is the most conservative across every metric. It names firms only 12% of the time, provides URLs in just 2% of responses, and mentions directories only 10% of the time. Claude produces the shortest responses at 153 words average -- roughly a quarter of Gemini's output. Claude treats legal queries as requests for information, not as requests for referrals.
What This Means for Law Firms
This data points to several concrete actions law firms should take immediately.
1. Your directory profiles are more important than ever
Avvo appeared in 60 of 200 AI responses. Martindale-Hubbell appeared 41 times. Super Lawyers appeared 37 times. These directories are not just for consumers browsing directly -- they are primary data sources that LLMs use to identify and evaluate law firms. A strong, complete, well-reviewed Avvo profile is no longer optional. It is a prerequisite for AI visibility.
2. Geographic and practice area signals must be explicit
City + practice area queries triggered firm names 70% of the time -- the highest rate of any query category. Your website needs to make these signals unmistakable -- through page titles, headers, structured data, and content that explicitly names your service cities, counties, and states alongside your practice areas. The models that name firms most often (Gemini at 100%, Grok at 80% for city queries) are clearly pulling geographic data from web content.
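One way to make those signals machine-readable is schema.org `LegalService` markup. The sketch below (plain Python emitting JSON-LD) uses entirely hypothetical firm details -- name, coordinates, and practice areas are placeholders, not data from the study -- while the property names are standard schema.org vocabulary.

```python
import json

# Hypothetical example of geographic + practice-area structured data.
# Every value below is a placeholder, not a real firm from this study.
markup = {
    "@context": "https://schema.org",
    "@type": "LegalService",
    "name": "Example Injury Law Firm",  # placeholder
    "areaServed": ["Houston", "Harris County", "Texas"],
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Houston",
        "addressRegion": "TX",
    },
    "geo": {"@type": "GeoCoordinates", "latitude": 29.76, "longitude": -95.36},
    "knowsAbout": ["Personal injury", "Car accidents"],
}

# Embed the output in a <script type="application/ld+json"> tag on the page.
print(json.dumps(markup, indent=2))
```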
3. Brand recognition compounds in AI
Morgan & Morgan appeared 10 times across 200 responses. No other individual firm came close. In AI-generated answers, brand awareness compounds: the more an LLM has seen a firm name in its training data, the more likely it is to recommend that firm. This creates a flywheel effect where established brands get more AI recommendations, which drives more web traffic, which produces more content, which further increases AI recognition.
4. Your website must convert name-only referrals
While Grok provides URLs 40% of the time, Claude does so only 2% of the time -- and Claude is used by millions. When an AI names your firm without a link, the user will Google you next. If your website is slow, if your Google Business Profile is incomplete, if your homepage does not immediately communicate what you do and where you do it, you lose the referral that AI already handed you.
5. Different LLMs require different strategies
Optimizing for Grok (which values directories and geographic specificity) is different from optimizing for Gemini (which names firms often but always disclaims). And both are different from optimizing for Claude (which values authoritative general content over specific recommendations). A comprehensive AI Optimization strategy accounts for all four major platforms, not just one.
Limitations & Future Research
This study represents a snapshot of LLM behavior in February 2026. AI models are updated frequently, and response patterns may shift as providers adjust their safety guidelines, training data, and inference configurations.
Our sample size of 50 queries per provider (200 total) provides a solid directional signal but is not large enough to claim statistical significance for every sub-category comparison. Some category breakdowns involve only 10 queries per provider, meaning a single response shift changes the percentage by 10 points.
Firm name detection used pattern matching for common legal firm naming conventions (e.g., "[Name] Law Firm," "[Name] & Associates," "[Name] LLP"). It is possible that some unconventionally named firms were missed, or that generic references to "a law firm" produced false positives despite our filtering.
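For illustration, detection along the conventions named above might look like the following regular expressions. These are our reconstruction for this article, not the study's actual patterns.

```python
import re

# Illustrative firm-name patterns following the naming conventions mentioned
# above ("[Name] Law Firm", "[Name] & Associates", "[Name] LLP"). These are
# assumptions, not the study's published patterns.
FIRM_PATTERNS = [
    r"\b[A-Z][a-zA-Z']+ (?:& [A-Z][a-zA-Z']+ )?Law Firm\b",
    r"\b[A-Z][a-zA-Z']+ & (?:[A-Z][a-zA-Z']+,? )?Associates\b",
    r"\b[A-Z][a-zA-Z']+(?: & [A-Z][a-zA-Z']+)?,? LLP\b",
]

def find_firms(text: str) -> list[str]:
    """Return every substring that matches a firm-name convention."""
    hits = []
    for pat in FIRM_PATTERNS:
        hits.extend(re.findall(pat, text))
    return hits
```

Patterns like these miss unconventionally named firms and can over-match generic phrases, which is exactly the false-positive risk described above.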
All queries used temperature 0 for reproducibility. Real user interactions with these models typically use higher temperature settings, which would introduce variation in responses. Our results represent the models' most deterministic output, not the full range of possible responses.
We plan to expand this study with larger sample sizes, additional providers, and longitudinal tracking to measure how AI recommendation behavior changes over time.
Full Methodology Details
This study was conducted in February 2026 using the following setup:
- ChatGPT: GPT-5.2 (model ID: gpt-5.2) via OpenAI API, temperature 0
- Gemini: Gemini 3.1 Pro (model ID: gemini-3.1-pro-preview) via Google GenAI SDK, temperature 0
- Grok: Grok 4.1 (model ID: grok-4-1-fast) via xAI API, temperature 0
- Claude: Claude Opus 4.6 (model ID: claude-opus-4-6) via Claude Code CLI, default settings
- Total queries: 200 (50 per provider)
- Valid responses: 198 (1 Gemini error, 1 Claude error)
- Error rate: 1%
- Query categories: 5 categories, 10 queries each
- Analysis pipeline: Automated JSON extraction, pattern-based firm name detection, URL parsing, disclaimer classification (3 types: not_legal_advice, ai_disclaimer, consult_lawyer), directory name matching (12 directories tracked), word count, and specificity classification
All queries were designed to reflect realistic consumer search behavior when looking for legal services. No queries were designed to game a specific provider's response patterns. The raw data and analysis scripts are available for review.