Making a Google scraper to track SEO rankings

Picture this: you are the creator of the great base64decode.org, and you want to set up an alert system to see if anyone is trying to steal your #1 spot on Google. This guide will show you how to make a scraper to monitor search rankings, complete with JS challenge handling.

The finished example is available here

An example of the module's output, sent to a Discord webhook

Starting the project

1

Create a new automation in Futura

From the dashboard, simply enable developer mode and click the plus button to open the automation creation prompt.

2

Follow the onboarding to create our module

For this guide, we will only need one module, which we will call seomonitor

3

Start the development environment

To iterate quickly during development, we will use the live development feature.

To do this, we will first need to log in to the dev branch with:
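The exact syntax depends on your version of the ftr CLI; a hypothetical invocation might look like:

```shell
# Hypothetical — consult `ftr --help` for the real login syntax.
ftr login --branch dev
```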

then simply run:
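Again, the command name here is an assumption; something along the lines of:

```shell
# Hypothetical — starts the live development watcher,
# redeploying the module on each change.
ftr dev
```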

Now our boilerplate code should be deployed, and the ftr CLI should be watching for further changes to deploy.

Prepare our parameters

1

Define the module's parameters

First off, let's define what parameters this module needs to accomplish its task.

This module will monitor a Google search results page for ranking changes. This means we will need a search term.

At first glance, a string parameter would work for this. However, to monitor multiple search terms it would be more convenient to input a list of terms, which can be accomplished by creating and using a basic group.

2

Create the "search term" basic group type

Basic groups are just implementations of the basicgroupsprotocol.Parsable interface, so let's make a type that implements it in searchTermGroup.go.

A "search term" is just a string, so implementing the parsing/serialization is simple:
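A minimal sketch of such a type. The interface shape shown here is an assumption made so the example stands alone; check Futura's basicgroupsprotocol docs for the real signatures:

```go
package main

import "fmt"

// Assumed shape of basicgroupsprotocol.Parsable — a stand-in
// for illustration, not Futura's actual definition.
type Parsable interface {
	Parse(data []byte) error
	Serialize() ([]byte, error)
}

// searchTermGroup wraps a single search term string.
type searchTermGroup struct {
	Term string
}

// Compile-time check that the interface is satisfied.
var _ Parsable = (*searchTermGroup)(nil)

// Parse reads the raw entry bytes into the term.
func (s *searchTermGroup) Parse(data []byte) error {
	s.Term = string(data)
	return nil
}

// Serialize writes the term back out as bytes.
func (s *searchTermGroup) Serialize() ([]byte, error) {
	return []byte(s.Term), nil
}

func main() {
	var g searchTermGroup
	_ = g.Parse([]byte("base64 decode"))
	out, _ := g.Serialize()
	fmt.Println(string(out))
}
```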

3

Accept this basic group as a parameter

Now to accept entries from this basic group as a parameter, we will use the basicgroupsprotocol.EntryProvided[T] type, like so:
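A sketch of the parameter definition. EntryProvided is stubbed out below so the example compiles on its own; the real generic lives in Futura's basicgroupsprotocol package, and the field name is an assumption:

```go
package main

import "fmt"

// Stand-in for basicgroupsprotocol.EntryProvided[T]; the real type
// renders a form input for selecting a basic group whose entries
// parse into T.
type EntryProvided[T any] struct {
	Entries []T
}

type searchTermGroup struct{ Term string }

// Params holds the module's user-facing parameters.
type Params struct {
	SearchTerms EntryProvided[searchTermGroup]
}

func main() {
	p := Params{SearchTerms: EntryProvided[searchTermGroup]{
		Entries: []searchTermGroup{{Term: "base64 decode"}},
	}}
	fmt.Println(len(p.SearchTerms.Entries))
}
```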

Now we have a form input for a search term group:

And a place to manage search term groups:

Implement the steps

1

Create step 1: Initializing the session

Google has recently stepped up its anti-scraping game, likely in response to the rise in LLM-adjacent scraping.

This means that in order to make a search, you need to solve a JS challenge first.

Reverse engineering this challenge is one option, but we don't need that kind of performance for our use case, so let's sandbox it with a browser.

Luckily, Futura has a utility for exactly this!

We will make the browser navigate to a random search page, and the JS challenge will be solved automatically.

Once it has navigated, the valid session cookies can be exported for use in our much more performant net client, and the browser can be closed.

And let's make sure to update our constructor to use this function:

2

Create step 2: fetch the search page

Now that we have a session prepared, we need to fetch the search page in order to have data to report on.

First, let's add a field to the Task struct to store the search rankings as a slice of *url.URL. We will call it topLevelSearchResults because we only want the top-level results, not nested links like this:

Our task struct will now look like this:

Now let's implement the step for this.

Let's start with a basic HTTP GET request so we have the response available:

Then let's parse the response, handling the possibility of another JS challenge:
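A self-contained sketch of the parsing. Both the link pattern and the challenge markers are assumptions — Google's markup changes often, so treat them as starting points to verify against a live response:

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// errChallenge signals that Google served another JS challenge
// instead of results, so the session step should run again.
var errChallenge = fmt.Errorf("hit another JS challenge")

// resultLinkRe matches redirect-style result links (assumed markup).
var resultLinkRe = regexp.MustCompile(`href="/url\?q=([^&"]+)`)

// parseTopLevelResults extracts top-level result URLs from the page.
func parseTopLevelResults(body string) ([]*url.URL, error) {
	// Assumed challenge markers — verify against a real response.
	if strings.Contains(body, "enablejs") || strings.Contains(body, "unusual traffic") {
		return nil, errChallenge
	}
	var results []*url.URL
	for _, m := range resultLinkRe.FindAllStringSubmatch(body, -1) {
		raw, err := url.QueryUnescape(m[1])
		if err != nil {
			continue
		}
		u, err := url.Parse(raw)
		if err != nil || u.Host == "" {
			continue
		}
		// Keep only top-level links: scheme + host, no nested paths.
		if u.Path == "" || u.Path == "/" {
			results = append(results, u)
		}
	}
	return results, nil
}

func main() {
	html := `<a href="/url?q=https://base64decode.org/&sa=U">x</a>`
	res, _ := parseTopLevelResults(html)
	fmt.Println(res[0].Host)
}
```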

3

Create step 3: report the success, and loop back

Now that we have the data, let's make use of it.

This step will simply detect rank changes for any of the sites, then post those changes to the webhook if there were any:

The finished example is available here
