Improving SEO with Dynamic Rendering for WebAssembly Sites
Bot detection and server side routing
To make sure search engine bots receive the generated static pages, we place the static files directory produced by the web-scraping script from the previous page in the WebRoot directory, next to Blazor's index.html.
For an Apache HTTP Server setup, we modify the .htaccess file to route all traffic through an index.php script. This script handles routing: it decides whether a request should be answered with static content (for bots) or with index.html (for users). The .htaccess file also enforces automatic redirection to HTTPS.
.htaccess Rewrite redirection
With this setup, all requests for the Blazor application are funneled through the index.php file. In index.php, we analyze the requested URL and either serve the static files to bots or redirect users to index.html, where Blazor’s client-side routing takes over. The sections further down explain this in more depth.
Index.php script
How the script works
The PHP script uses the CrawlerDetect library to analyze the user-agent of each incoming request and determine whether it comes from a bot or a human user. CrawlerDetect identifies a wide variety of bots and web crawlers based on their user-agent strings. If the request is from a bot, the script analyzes the requested URL and attempts to resolve it to a corresponding static file in the static directory.
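The detection step could look roughly like the sketch below. It assumes CrawlerDetect was installed through Composer as jaybizzle/crawler-detect and that the autoloader has already been loaded. The function name isBotRequest and the small fallback pattern are hypothetical and only there so the sketch also runs without the library; the fallback is far less complete than CrawlerDetect's own list.

```php
<?php
// Sketch: decide whether a user-agent string belongs to a crawler.
// Assumption: CrawlerDetect is the Composer package jaybizzle/crawler-detect
// and vendor/autoload.php has already been required by the caller.
function isBotRequest(string $userAgent): bool
{
    $class = 'Jaybizzle\\CrawlerDetect\\CrawlerDetect';
    if (class_exists($class)) {
        // isCrawler() accepts an explicit user-agent string.
        return (new $class())->isCrawler($userAgent);
    }
    // Hypothetical fallback for illustration only.
    return preg_match('/bot|crawl|spider|slurp/i', $userAgent) === 1;
}

// Possible call site in index.php:
// $isBot = isBotRequest($_SERVER['HTTP_USER_AGENT'] ?? '');
```

Because user-agent strings can be spoofed, a check like this only decides which content is served; it is not a security boundary.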
If the static file exists, it is served directly to the bot via PHP's readfile function. No additional client-side redirect is involved, so the bot receives the static content directly. This matters for SEO because search engines can then index the static HTML content, which improves visibility and ranking.
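A minimal sketch of the serving step, assuming the static pages are UTF-8 HTML files. The function name serveStaticHtml is hypothetical and not taken from the original script.

```php
<?php
// Sketch: stream a prerendered HTML file to the client with readfile().
// Returns false when the file is missing so the caller can react.
function serveStaticHtml(string $file): bool
{
    if (!is_file($file) || !is_readable($file)) {
        return false;
    }
    if (!headers_sent()) {
        // Assumption: the generated pages are UTF-8 encoded HTML.
        header('Content-Type: text/html; charset=utf-8');
    }
    // readfile() writes the file contents straight to the output.
    readfile($file);
    return true;
}
```

Since the file is written directly into the response, the bot gets the full HTML for the URL it requested in a single round trip.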
Security Measures against malicious paths
The script implements security measures to keep malicious bots from exploiting the file lookup. It sanitizes the requested path by removing suspicious sequences such as .., *, or //, which could otherwise let a bot escape the static directory and reach other parts of the server's file system. After sanitization, it also verifies that the requested directory lies within the static directory. Together, these checks keep the requested file path strictly inside the static directory, so bots cannot bypass the .htaccess rules and gain unauthorized access to sensitive files. If a bot attempts to access a location outside the allowed directory or provides an invalid path, the script returns a warning message: "Invalid bot/crawler navigation, please follow only links."
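The two checks described above can be sketched as follows. This is not the original script: the function name resolveStaticFile is hypothetical, and the layout of one index.html per route directory inside the static directory is an assumption.

```php
<?php
// Sketch: map a requested URL to a file inside the static directory.
// Assumption: each route is stored as <staticDir>/<route>/index.html.
function resolveStaticFile(string $requestUri, string $staticDir): ?string
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // malformed URL or no path component
    }
    $path = rawurldecode($path);

    // Remove sequences that could be used to climb out of the directory.
    // '..' is stripped last so that removing other characters cannot re-create it.
    $path = str_replace(['*', "\0", '\\', '..'], '', $path);
    // Collapse repeated slashes and drop leading/trailing ones.
    $path = trim((string) preg_replace('#/+#', '/', $path), '/');

    $base = realpath($staticDir);
    if ($base === false) {
        return null;
    }
    $candidate = $path === '' ? "$base/index.html" : "$base/$path/index.html";
    $real = realpath($candidate);
    if ($real === false || !is_file($real)) {
        return null;
    }
    // Final check after sanitizing: the resolved file must live under $base.
    $prefix = $base . DIRECTORY_SEPARATOR;
    return strncmp($real, $prefix, strlen($prefix)) === 0 ? $real : null;
}
```

The realpath comparison is the part that matters most, because it also catches cases the string cleanup misses, such as a symbolic link that points outside the static directory.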
Serving humans
If, on the other hand, the request comes from a human user (not a bot), the script redirects the user to the index.html file, which is the entry point for the Blazor application. From this point on, Blazor takes over: client-side routing and rendering are handled by the Blazor WebAssembly runtime in the browser, with no additional server-side intervention required.
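Taken together, the routing decision can be sketched as a small function. All names here are hypothetical. Treating a bot request without a matching static file as invalid navigation is an assumption; the text above only covers the cases of an existing file and an invalid path.

```php
<?php
// Sketch of the routing decision in index.php. All names are hypothetical.
// $staticFile is the already validated static file for this URL, or null.
function chooseResponse(bool $isBot, ?string $staticFile, string $appEntry): array
{
    if (!$isBot) {
        // Humans always get the Blazor entry point; client-side routing follows.
        return ['type' => 'app', 'file' => $appEntry];
    }
    if ($staticFile !== null) {
        // Bots get the prerendered page for the requested URL.
        return ['type' => 'static', 'file' => $staticFile];
    }
    // Assumption: a bot without a matching static file gets the warning.
    return [
        'type' => 'invalid',
        'message' => 'Invalid bot/crawler navigation, please follow only links.',
    ];
}
```

The caller would then output the chosen file or print the message. Keeping the decision separate from the output makes it easy to test without a web server.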
Final Thoughts on script and approach
This setup enables the website to deliver static content directly to search engine bots for improved indexing, while providing the full interactive experience to human users via Blazor. The PHP script distinguishes between bots and users and keeps bot requests confined to the static directory, which prevents unauthorized file access.
Despite its advantages, I’m curious why this solution isn’t more widely adopted. Is there a flaw, or is it considered overly complex? To me, it seems like a straightforward and effective solution to host client-side heavy applications on a traditional web host. I’ll need to test it further to determine if it’s truly as efficient as it appears.
I also want to note that this script and approach are not specific to Blazor; they should work with any client-side heavy application that needs to be prerendered for SEO.
In the end, the result is that Google can correctly index our website from the static files, as seen in the picture below:
[2]
Bot detection and server side routing
2024-12-01 |