Handling bots and crawlers
Unwanted crawlers and bots can cause problems in your Elevate integration. While some crawlers (like Googlebot) are legitimate and beneficial, others can artificially increase traffic and session counts, leading to:
- Increased resource usage
- Inflated usage metrics (more sessions than actual users)
- Skewed analytics and statistics
By following these recommendations, you can mitigate the impact of bots and crawlers by identifying, filtering, and restricting unwanted traffic sources.
Note
Elevate can exclude certain traffic in client-side integrations, but for server-side integrations we have no visibility into the traffic source and therefore cannot assist with traffic exclusion.
Allow “good” bots¶
Some crawlers are necessary for SEO or platform integration.
Examples include Googlebot, Bingbot, LinkedInBot, and FacebookExternalHit.
Maintain an allow list of known user agents that should be permitted to access your site. These bots help with indexing and link previews and generally behave predictably.
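As a rough illustration, the TypeScript sketch below checks a request's User-Agent against such an allow list; the patterns and the isAllowedBot helper are illustrative assumptions, not part of Elevate.

```typescript
// Minimal sketch: match a request's User-Agent against an allow list of
// known "good" bots. The patterns below are illustrative, not exhaustive.
const ALLOWED_BOT_PATTERNS: RegExp[] = [
  /Googlebot/i,
  /Bingbot/i,
  /LinkedInBot/i,
  /facebookexternalhit/i,
];

function isAllowedBot(userAgent: string | undefined): boolean {
  if (!userAgent) return false;
  return ALLOWED_BOT_PATTERNS.some((pattern) => pattern.test(userAgent));
}

// Example usage
console.log(isAllowedBot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")); // true
console.log(isAllowedBot("curl/8.4.0")); // false
```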
Manage crawler traffic¶
Not all automated traffic is harmful, but each type of crawler needs to be handled appropriately to avoid inflated usage metrics.
Guide well-behaved crawlers using robots.txt¶
Use robots.txt to tell compliant crawlers which URLs on your own domain should not be crawled. This helps reduce unnecessary load from pages that would otherwise trigger Elevate API calls (e.g., internal search or utility URLs).
User-agent: *
Disallow: /search
Disallow: /internal-endpoint/
Note
robots.txt only affects cooperative crawlers such as Googlebot. It applies only to the pages on your site, not external domains like *.elevate-api.cloud. It does not block bots from executing your JavaScript or making API calls if they choose to ignore the file.
Resources: Robots.txt
Block or filter unwanted / aggressive bots¶
Some crawlers ignore robots.txt entirely and may generate large numbers of Elevate sessions. These should be filtered or blocked server-side before any Elevate API call is made.
1. Inspect the User-Agent header¶
Inspect the User-Agent header and block requests that look suspicious. Requests with a missing, malformed, or clearly fake User-Agent should be rejected immediately to prevent unwanted automated access.
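A minimal sketch of such a check, assuming an Express-based server-side integration (the blocked patterns and routes are illustrative only):

```typescript
import express from "express";

const app = express();

// Patterns that commonly indicate scripted clients (illustrative, not exhaustive).
const blockedUserAgents: RegExp[] = [/curl/i, /python-requests/i, /scrapy/i, /wget/i];

// Reject requests with a missing or clearly automated User-Agent before any
// downstream work (including Elevate API calls) happens.
app.use((req, res, next) => {
  const userAgent = req.get("user-agent") ?? "";
  if (userAgent.length === 0 || blockedUserAgents.some((pattern) => pattern.test(userAgent))) {
    res.status(403).send("Forbidden");
    return;
  }
  next();
});

app.get("/search", (_req, res) => {
  // Requests reaching this handler passed the User-Agent check;
  // only now would the Elevate API be called and results rendered.
  res.send("ok");
});

app.listen(3000);
```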
2. Rate-limit suspicious traffic¶
Throttle excessive traffic from individual IPs or networks to reduce the risk of abuse, denial-of-service attempts, and automated attacks.
Resources: Rate limiting AI scrapers with NGINX, Rate limiting with NGINX
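If rate limiting at the proxy layer (as in the NGINX resources above) is not an option, a simple application-level limiter can serve as a starting point. The sketch below uses a fixed-window, in-memory counter per IP; the window size and request limit are assumed values:

```typescript
// Minimal sketch of a fixed-window, per-IP rate limiter kept in memory.
// In production this is usually done at the proxy layer or backed by a
// shared store; this only illustrates the idea.
const WINDOW_MS = 60_000;   // 1-minute window (assumed value)
const MAX_REQUESTS = 120;   // allowed requests per window per IP (assumed value)

const counters = new Map<string, { windowStart: number; count: number }>();

function isRateLimited(ip: string, now: number = Date.now()): boolean {
  const entry = counters.get(ip);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    // Start a new window for this IP.
    counters.set(ip, { windowStart: now, count: 1 });
    return false;
  }
  entry.count += 1;
  return entry.count > MAX_REQUESTS;
}

// Example: respond with HTTP 429 and skip any Elevate call
// when isRateLimited(clientIp) returns true.
```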
3. Monitor traffic sources¶
Regularly review logs for spikes from:
- unknown IP ranges
- unusual geographies
- known hosting providers or bot networks
This helps identify bots that bypass user-agent filtering.
Resources: Logging
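As one possible starting point, the sketch below aggregates request counts per client IP from a common-log-format access log; the log path and spike threshold are assumptions, and a log analytics platform would typically do this for you:

```typescript
import { readFileSync } from "node:fs";

// Count requests per client IP in an access log that follows the common log
// format (client IP is the first field on each line).
const LOG_PATH = "/var/log/nginx/access.log"; // assumed location
const SPIKE_THRESHOLD = 1000;                 // assumed request count worth reviewing

const counts = new Map<string, number>();
for (const line of readFileSync(LOG_PATH, "utf8").split("\n")) {
  const ip = line.split(" ")[0];
  if (!ip) continue;
  counts.set(ip, (counts.get(ip) ?? 0) + 1);
}

// Print IPs whose request volume looks suspicious, highest first.
[...counts.entries()]
  .filter(([, count]) => count >= SPIKE_THRESHOLD)
  .sort((a, b) => b[1] - a[1])
  .forEach(([ip, count]) => console.log(`${ip}\t${count}`));
```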
Generalised approach to handling automated traffic¶
- Classify trusted automated clients: identify and explicitly allow well-known, legitimate automated clients that are required for core business functions, such as search engine indexing or integrations.
- Detect and flag unknown or suspicious clients: requests that do not match trusted automated clients or typical human-driven browsers should be treated as potentially automated and flagged for special handling.
- Gracefully degrade functionality for suspected automation: instead of blocking these clients outright, reduce the level of functionality and resource usage associated with their requests. This helps limit impact while avoiding unnecessary disruption.
- Propagate classification to downstream systems: ensure that the client classification is available throughout the request lifecycle so that frontend behavior and third-party integrations can adapt accordingly.
- Disable non-essential processing for flagged sessions: turn off analytics, tracking, personalization, or other secondary services for suspected automated traffic to prevent amplification effects and traffic bursts. These steps are illustrated in the sketch after this list.
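The sketch below illustrates these steps under the assumption of an Express-based server-side integration; the classifyClient heuristic and the route are hypothetical, not an Elevate API:

```typescript
import express from "express";

type ClientClass = "human" | "trusted-bot" | "suspected-bot";

const app = express();

// Hypothetical classifier: allow-listed bots and typical browsers pass,
// everything else is treated as suspected automation.
function classifyClient(userAgent: string | undefined): ClientClass {
  if (!userAgent) return "suspected-bot";
  if (/Googlebot|Bingbot|LinkedInBot|facebookexternalhit/i.test(userAgent)) return "trusted-bot";
  if (/Mozilla\//.test(userAgent)) return "human";
  return "suspected-bot";
}

// Propagate the classification so later handlers and templates can adapt.
app.use((req, res, next) => {
  res.locals.clientClass = classifyClient(req.get("user-agent"));
  next();
});

app.get("/product/:id", (req, res) => {
  const clientClass = res.locals.clientClass as ClientClass;

  // Gracefully degrade: skip analytics, personalization, and other
  // non-essential calls (including Elevate tracking) for suspected bots,
  // while still serving the page content.
  const enableTracking = clientClass !== "suspected-bot";

  res.send(`<!-- tracking enabled: ${enableTracking} -->product ${req.params.id}`);
});

app.listen(3000);
```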