
Handling bots and crawlers

Unwanted crawlers and bots can cause problems in your Elevate integration. While some crawlers (like Googlebot) are legitimate and beneficial, others can artificially increase traffic and session counts, leading to:

  • Increased resource usage
  • Inflated usage metrics (more sessions than actual users)
  • Skewed analytics and statistics

By following these recommendations you can mitigate the impact of bots and crawlers by identifying, filtering, and restricting unwanted traffic sources.

Note

Elevate can exclude certain traffic in client-side integrations, but for server-side integrations we have no visibility into the traffic source and therefore cannot assist with traffic exclusion.

Allow “good” bots

Some crawlers are necessary for SEO or platform integration.

Examples include: Googlebot, Bingbot, LinkedInBot, FacebookExternalHit

Maintain an allow list of known user agents that should be permitted to access your site. These bots help with indexing and link previews and generally behave predictably.
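
A minimal sketch of such an allow-list check, written in TypeScript; the bot names and the isGoodBot helper are illustrative assumptions, not part of Elevate:

// Illustrative allow list of well-known, legitimate crawler user agents.
const GOOD_BOTS = ["googlebot", "bingbot", "linkedinbot", "facebookexternalhit"];

// Returns true if the User-Agent matches a known good bot.
function isGoodBot(userAgent: string | undefined): boolean {
  if (!userAgent) {
    return false;
  }
  const ua = userAgent.toLowerCase();
  return GOOD_BOTS.some((bot) => ua.includes(bot));
}

Since the User-Agent string is easy to spoof, stricter verification (for example reverse DNS lookups for Googlebot) can be layered on top of a simple string match.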

Manage crawler traffic

Not all automated traffic is harmful, but different types of crawlers must be handled differently to avoid inflated usage metrics.

Guide well-behaved crawlers using robots.txt

Use robots.txt to tell compliant crawlers which URLs on your own domain should not be crawled. This helps reduce unnecessary load from pages that would otherwise trigger Elevate API calls (e.g., internal search or utility URLs).

User-agent: *
Disallow: /search
Disallow: /internal-endpoint/

Note

robots.txt only affects cooperative crawlers such as Googlebot. It applies only to pages on your own site, not to external domains like *.elevate-api.cloud, and it does not stop bots from executing your JavaScript or making API calls if they choose to ignore the file.

Resources: Robots.txt

Block or filter unwanted or aggressive bots

Some crawlers ignore robots.txt entirely and may generate large numbers of Elevate sessions. These should be filtered or blocked server-side before any Elevate API call is made.

1. Inspect the User-Agent header

Check the User-Agent header of each request and block those that look suspicious. Requests with a missing, malformed, or clearly fake User-Agent should be denied immediately to prevent unauthorized or automated access.
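
A minimal sketch of such a check in a Node.js server, assuming a server-side integration; the regular expression and port are illustrative:

import { createServer } from "http";

// Patterns that commonly indicate scripted or headless clients (illustrative).
const SUSPICIOUS_UA = /(curl|wget|python-requests|scrapy|headless)/i;

const server = createServer((req, res) => {
  const userAgent = req.headers["user-agent"];

  // Deny requests with a missing or clearly automated User-Agent
  // before any Elevate API call is made.
  if (!userAgent || SUSPICIOUS_UA.test(userAgent)) {
    res.statusCode = 403;
    res.end("Forbidden");
    return;
  }

  // ...normal request handling, including Elevate calls, continues here.
  res.end("ok");
});

server.listen(8080);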

2. Rate-limit suspicious traffic

Throttle excessive traffic from individual IPs or networks to reduce the risk of abuse, denial-of-service attempts, and automated attacks.
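
As one possible approach, a small fixed-window counter per IP can throttle bursts before they reach Elevate; the window size and limit below are illustrative, and a reverse proxy such as NGINX is often a better place to enforce this:

// Fixed-window rate limiter keyed by client IP (illustrative values).
const WINDOW_MS = 60_000;   // one-minute window
const MAX_REQUESTS = 100;   // allowed requests per IP per window

const counters = new Map<string, { windowStart: number; count: number }>();

// Returns true if this IP has exceeded its budget for the current window.
function isRateLimited(ip: string, now: number = Date.now()): boolean {
  const entry = counters.get(ip);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    counters.set(ip, { windowStart: now, count: 1 });
    return false;
  }
  entry.count += 1;
  return entry.count > MAX_REQUESTS;
}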

Resources: Rate limiting AI scrapers with NGINX, Rate limiting with NGINX

3. Monitor traffic sources

Regularly review logs for spikes from:

  • unknown IP ranges
  • unusual geographies
  • known hosting providers or bot networks

This helps identify bots that bypass user-agent filtering.
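
A small sketch of one way to spot such spikes, assuming an NGINX-style access log where the client IP is the first field; the log path is an assumption:

import { readFileSync } from "fs";

// Count requests per client IP in an access log where the IP is the
// first space-separated field (path and format are assumptions).
const lines = readFileSync("/var/log/nginx/access.log", "utf8").split("\n");

const requestsPerIp = new Map<string, number>();
for (const line of lines) {
  const ip = line.split(" ")[0];
  if (!ip) {
    continue;
  }
  requestsPerIp.set(ip, (requestsPerIp.get(ip) ?? 0) + 1);
}

// Print the ten busiest IPs so unusual spikes stand out.
const top = [...requestsPerIp.entries()].sort((a, b) => b[1] - a[1]).slice(0, 10);
for (const [ip, count] of top) {
  console.log(`${count}\t${ip}`);
}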

Resources: Logging

Generalised approach to handling automated traffic

  1. Classify trusted automated clients
    Identify and explicitly allow well-known, legitimate automated clients that are required for core business functions, such as search engine indexing or integrations.

  2. Detect and flag unknown or suspicious clients
    Requests that do not match trusted automated clients or typical human-driven browsers should be treated as potentially automated and flagged for special handling.

  3. Gracefully degrade functionality for suspected automation
    Instead of blocking these clients outright, reduce the level of functionality and resource usage associated with their requests. This helps limit impact while avoiding unnecessary disruption.

  4. Propagate classification to downstream systems
    Ensure that the client classification is available throughout the request lifecycle so that frontend behavior and third-party integrations can adapt accordingly.

  5. Disable non-essential processing for flagged sessions
    Turn off analytics, tracking, personalization, or other secondary services for suspected automated traffic to prevent amplification effects and traffic bursts (see the sketch after this list).
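
A minimal sketch of this flow in a Node.js server; the classification rules, header name, and renderPage helper are illustrative assumptions, not part of Elevate:

import { createServer } from "http";

type ClientClass = "trusted-bot" | "suspected-bot" | "human";

// Illustrative classification rules based on the User-Agent header.
const TRUSTED_BOTS = /(googlebot|bingbot|linkedinbot|facebookexternalhit)/i;
const SUSPECT_BOTS = /(curl|wget|python-requests|scrapy|headless)/i;

function classify(userAgent: string | undefined): ClientClass {
  if (!userAgent || SUSPECT_BOTS.test(userAgent)) {
    return "suspected-bot";
  }
  if (TRUSTED_BOTS.test(userAgent)) {
    return "trusted-bot";
  }
  return "human";
}

// Only human traffic gets the analytics/tracking snippet.
function renderPage(includeTracking: boolean): string {
  const tracking = includeTracking ? '<script src="/tracking.js"></script>' : "";
  return `<!doctype html><html><body>Hello${tracking}</body></html>`;
}

const server = createServer((req, res) => {
  const clientClass = classify(req.headers["user-agent"]);

  // Propagate the classification so downstream systems and the frontend
  // can adapt (the header name is an assumption, not an Elevate API).
  res.setHeader("X-Client-Class", clientClass);

  // Gracefully degrade: serve the page, but disable non-essential
  // processing such as tracking for anything that is not a human.
  res.setHeader("Content-Type", "text/html");
  res.end(renderPage(clientClass === "human"));
});

server.listen(8080);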
