Distributed Google Maps scraping

How to scrape data from Google Maps using Golang and Kubernetes

Introduction

In this post, I will show you how to use Kubernetes to scrape data from Google Maps without an API key.

For this tutorial, I will deploy to DigitalOcean as an example, but the same steps work with any managed Kubernetes provider.

Getting the scraper up and running should take no more than 20 minutes, so give it a try.

Prerequisites

  • Create a DigitalOcean account. If you do not have one yet, I recommend creating it via the referral link: you get $200 of credit and I may also get $25 (depending on whether you keep using DigitalOcean). This way you can try the tutorial for free.
    Note: To get the $200 credit, you need to add a payment method.

  • Install kubectl on your local machine by following the official instructions.

Create a K8s Cluster

  1. Log in to your DigitalOcean account and click Create in the top-right corner.


In the menu that pops up, select Kubernetes.


Clicking Kubernetes opens the cluster creation page.

For the purposes of the tutorial, leave the defaults. In a real-life scenario, you would pick the desired region and configure the nodes to suit your workload. Keep in mind that the scraper starts headless web browsers, so the nodes need enough memory and CPU.

If you registered with DigitalOcean via the referral link, don't worry about costs for now.


Please wait until the cluster initializes. This can take around 5 minutes.


Once the cluster is provisioned, download the Kubernetes configuration file.


Take note of where you save the configuration file. For the purposes of the tutorial, we assume it is located at `$HOME/k8s.config.yaml` (in my case `/home/giorgos/k8s.config.yaml`).

Let's check that we can connect:

kubectl --kubeconfig=$HOME/k8s.config.yaml get pods && echo $?

You should get output like:

No resources found in default namespace.
0

Create a PostgreSQL database

In your DigitalOcean dashboard, click Databases in the left panel (or follow DigitalOcean's guide to creating a database).

Select PostgreSQL and, on the next page, leave the defaults (the lowest tier).

Then click Create.

Again, wait a bit until it is provisioned.

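Once the database is ready, we need a user and a database. For simplicity, this tutorial uses the default `doadmin` user and `defaultdb` database that DigitalOcean provisions with every managed PostgreSQL cluster.

Connect to the database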


First, open a terminal (or your favorite GUI tool) and connect to your database:

psql -p 25060 -h db-postgresql-sfo3-81615-do-user-14100026-0.b.db.ondigitalocean.com -U doadmin -d defaultdb

(Replace the host with your own; psql will prompt for the `doadmin` password.)

If you managed to connect, you can move on to the next step.

Create tables

    CREATE TABLE gmaps_jobs(
        id UUID PRIMARY KEY,
        priority SMALLINT NOT NULL,
        payload_type TEXT NOT NULL,
        payload BYTEA NOT NULL,
        created_at TIMESTAMP WITH TIME ZONE NOT NULL,
        status TEXT NOT NULL
    );

    CREATE TABLE results(
        id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        title TEXT NOT NULL,
        category TEXT NOT NULL,
        address TEXT NOT NULL,
        openhours TEXT NOT NULL,
        website TEXT NOT NULL,
        phone TEXT NOT NULL,
        pluscode TEXT  NOT NULL,
        review_count INT NOT NULL,
        rating NUMERIC NOT NULL
    );

Execute the above queries in your database client.

Google Maps scraper deployment

First, create a file with your search queries, one per line. For example:

bars in Athens
bars in Berlin
restaurants in Rome

Save it as queries.txt.

Then run the scraper with the `-produce` flag to insert the queries as jobs into the database:

docker run -v $PWD/queries.txt:/queries.txt gosom/google-maps-scraper:v0.9.3  -depth 5 -input /queries.txt  -dsn "postgres://doadmin:{yourPassword}@{yourHost}:25060/defaultdb" -produce -lang en

(Replace with your password and your host)
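If your password contains characters such as `@` or `:`, pasting it directly into the `-dsn` URL breaks the URL. A minimal sketch that builds the connection string with Go's standard `net/url` package, which percent-encodes the user info; the function name `buildDSN` and the host/password values are placeholders of my own, not part of the scraper.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// buildDSN (hypothetical helper) assembles a postgres:// URL in the same
// shape as the -dsn value above. url.UserPassword percent-encodes
// characters like '@' and ':' in the password.
func buildDSN(user, password, host string, port int, dbname string) string {
	u := url.URL{
		Scheme: "postgres",
		User:   url.UserPassword(user, password),
		Host:   net.JoinHostPort(host, fmt.Sprint(port)),
		Path:   "/" + dbname,
	}
	return u.String()
}

func main() {
	// Placeholder values; use the connection details of your own database.
	fmt.Println(buildDSN("doadmin", "p@ss:word", "db.example.com", 25060, "defaultdb"))
	// postgres://doadmin:p%40ss%3Aword@db.example.com:25060/defaultdb
}
```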

Be patient: the image is around 1 GB, so it takes a while to download.

Once the command finishes, verify that the jobs were inserted into the database:

select count(1) from gmaps_jobs;

Run the above query in your database client. It should return 3 if you use my sample file.

We are now ready to start our scrapers.

Create a file named gmaps.deployment.yaml with the following Kubernetes deployment configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: google-maps-scraper
spec:
  selector:
    matchLabels:
      app: google-maps-scraper
  replicas: 2
  template:
    metadata:
      labels:
        app: google-maps-scraper
    spec:
      containers:
      - name: google-maps-scraper
        image: gosom/google-maps-scraper:v0.9.3
        imagePullPolicy: IfNotPresent
        args: ["-c", "1", "-depth", "5", "-dsn", "postgres://doadmin:{YourPassword}@{YourHost}:25060/defaultdb"]

(Replace the placeholders with your password and host.)

Then apply the configuration:

kubectl --kubeconfig=$HOME/k8s.config.yaml apply -f gmaps.deployment.yaml

Give it some time, since the image also needs to be downloaded on the cluster nodes.

Check the status of the pods:

giorgos@gtp:~$ kubectl --kubeconfig=$HOME/k8s.config.yaml get pods
NAME                                   READY   STATUS    RESTARTS   AGE
google-maps-scraper-6489d96b84-7nltl   1/1     Running   0          68s
google-maps-scraper-6489d96b84-vvx6c   1/1     Running   0          116s
giorgos@gtp:~$
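If you script this check, you only need the STATUS column. A minimal sketch, assuming the default `kubectl get pods` table layout shown above (NAME, READY, STATUS, RESTARTS, AGE); `countRunning` is a hypothetical helper, not part of kubectl or the scraper. For anything beyond a quick check, `kubectl get pods -l app=google-maps-scraper -o json` (filtering on the deployment's label) is more robust than parsing the table.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countRunning (hypothetical helper) parses the default table printed by
// `kubectl get pods` and counts pods whose name starts with prefix and
// whose STATUS column (third field) is "Running".
func countRunning(output, prefix string) int {
	running := 0
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 3 || fields[0] == "NAME" {
			continue // skip the header row and malformed lines
		}
		if strings.HasPrefix(fields[0], prefix) && fields[2] == "Running" {
			running++
		}
	}
	return running
}

func main() {
	// Output copied from the pod listing above.
	out := `NAME                                   READY   STATUS    RESTARTS   AGE
google-maps-scraper-6489d96b84-7nltl   1/1     Running   0          68s
google-maps-scraper-6489d96b84-vvx6c   1/1     Running   0          116s`
	fmt.Println(countRunning(out, "google-maps-scraper-")) // 2
}
```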

Meanwhile, periodically check the results table:

 select count(1) from results;

The scrapers will gradually populate the results table.

defaultdb=> select * from results limit 5;
-[ RECORD 1 ]+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id           | 1
title        | Athens Sports Bar
category     | Sports bar
address      | Veikou 3a, Athina 117 42, Greece
openhours    | Sunday, 10 AM to 12 AM; Monday, 10 AM to 12 AM; Tuesday, 10 AM to 12 AM; Wednesday, 10 AM to 12 AM; Thursday, 10 AM to 12 AM; Friday, 10 AM to 12 AM; Saturday, 10 AM to 12 AM. Hide open hours for the week
website      | http://www.athenssportsbar.gr/
phone        | +302109235811
pluscode     | XP8H+V9 Athens, Greece
review_count | 1
rating       | 4.4
-[ RECORD 2 ]+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id           | 2
title        | 360 Cocktail bar
category     | Bar
address      | Ifestou 2, Athina 105 55, Greece
openhours    | Sunday, 9 AM to 4 AM; Monday, 9 AM to 3 AM; Tuesday, 9 AM to 3 AM; Wednesday, 9 AM to 3 AM; Thursday, 9 AM to 3 AM; Friday, 9 AM to 3 AM; Saturday, 9 AM to 4 AM. Hide open hours for the week
website      | http://www.three-sixty.gr/
phone        | +302103210006
pluscode     | XPGG+H6 Athens, Greece
review_count | 8
rating       | 4.4
-[ RECORD 3 ]+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id           | 3
title        | Teddy Boy
category     | Bar
address      | Taki 18, Athina 105 54, Greece
openhours    | 
website      | https://m.facebook.com/teddyboy.bar
phone        | +306951116651
pluscode     | XPHF+8F Athens, Greece
review_count | 489
rating       | 4.5
-[ RECORD 4 ]+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id           | 4
title        | Revolt street bar
category     | Bar
address      | Koletti 25-27, Athina 106 77, Greece
openhours    | Sunday, 11 AM to 2 AM; Monday, 11 AM to 2 AM; Tuesday, 11 AM to 2 AM; Wednesday, 11 AM to 2 AM; Thursday, 11 AM to 2 AM; Friday, 11 AM to 3 AM; Saturday, 11 AM to 3 AM. Hide open hours for the week
website      | https://www.facebook.com/Revoltstreetbar/
phone        | +302103800016
pluscode     | XPPM+85 Athens, Greece
review_count | 461
rating       | 4.5
-[ RECORD 5 ]+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id           | 5
title        | 42 Barstronomy Athens
category     | Cocktail bar
address      | Kolokotroni 3, Athina 105 62, Greece
openhours    | 
website      | https://42barstronomy.gr/
phone        | +302130052153
pluscode     | XPGM+Q8 Athens, Greece
review_count | 1
rating       | 4.5

defaultdb=>
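As the sample rows show, the `openhours` column stores the text as it appeared on the page, including the trailing "Hide open hours for the week" label, and it can be empty. If you need per-day hours, a minimal post-processing sketch is below. `parseOpenHours` is a hypothetical helper, and it assumes the "Day, hours; Day, hours" layout seen in these rows; other layouts may need extra handling.

```go
package main

import (
	"fmt"
	"strings"
)

// parseOpenHours (hypothetical helper) turns an openhours value such as
// "Sunday, 9 AM to 4 AM; ... Saturday, 9 AM to 4 AM. Hide open hours for the week"
// into a day -> hours map. Empty input yields an empty map.
func parseOpenHours(s string) map[string]string {
	hours := make(map[string]string)
	// Drop the trailing UI label and the period that precedes it.
	s = strings.TrimSpace(strings.TrimSuffix(strings.TrimSpace(s), "Hide open hours for the week"))
	s = strings.TrimSuffix(s, ".")
	if s == "" {
		return hours
	}
	for _, entry := range strings.Split(s, ";") {
		// Split only on the first comma so "hours" may itself contain commas.
		day, span, ok := strings.Cut(strings.TrimSpace(entry), ",")
		if !ok {
			continue // not in "Day, hours" form
		}
		hours[day] = strings.TrimSpace(span)
	}
	return hours
}

func main() {
	// openhours value of the "360 Cocktail bar" sample row above.
	raw := "Sunday, 9 AM to 4 AM; Monday, 9 AM to 3 AM; Tuesday, 9 AM to 3 AM; " +
		"Wednesday, 9 AM to 3 AM; Thursday, 9 AM to 3 AM; Friday, 9 AM to 3 AM; " +
		"Saturday, 9 AM to 4 AM. Hide open hours for the week"
	hours := parseOpenHours(raw)
	fmt.Println(len(hours), hours["Saturday"]) // 7 9 AM to 4 AM
}
```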

Conclusion

In this tutorial, I showed you how to run google-maps-scraper on Kubernetes to automate and scale the scraping of Google Maps results.

Note: Once you are done with this tutorial, please clean up the resources in your DigitalOcean account to avoid unwanted charges.