• Blog
  • How to Build Open Source Cloud Asset Inventory with CloudQuery and Apache Superset (Preset)

How to Build Open Source Cloud Asset Inventory with CloudQuery and Apache Superset (Preset)

June 1, 2022

Yevgeny Pats
Name
Yevgeny Pats
Twitter
@yevgenypats

In this blog post, we will walk you through how to setup CloudQuery to build your cloud asset inventory in PostgreSQL and connect it to Apache Superset (or a hosted version such as preset.io) for visualization, monitoring and reporting.

General Architecture

  • ETL (Extract-Transform-Load) ingestion layer: CloudQuery
  • Datastore: PostgreSQL
  • Data Visualization and Exploration Platform: Apache Superset

What you will get

  • Raw SQL access to all your cloud asset inventory to create views or explore any questions or connection between resources.
  • Multi-Cloud Asset Inventory: Ingest configuration from all your clouds to a single datastore with a unified structure.
  • Avoid yet-another-dashboard fatigue: Use you existing Apache Superset setup as a best in class visualization platform to build cloud asset inventory.

Walkthrough

Step 1: Install or Deploy CloudQuery

If it’s your first time using CloudQuery we suggest you first run it locally to get familiar with the tool, take a look at our quickstart guide and AWS recipe.

If you are already familiar with CloudQuery, take a look at how to deploy it to AWS on Amazon Aurora and EKS at here.

Step 2: Install or sign up to Superset

Apache Superset is an open source Data Visualization and Data Exploration Platform so there are a number of ways to deploy it:

Step 3: Connecting Apache Superset (Preset) to PostgreSQL

By default RDS Aurora instances are not accessible from the public internet. In order to enable access by Preset you are going to have to update your security groups to include the IP ranges that Preset publishes (or alternatively look at the publicly_accessible variable in our terraform modules - aws , GCP). If you deploy it in your own VPC you might be able to connect it in your private network.

Now you can connect preset to your PostgreSQL database by clicking “New Database”, choosing PostgreSQL and filling-in the following form:

If all your credentials are correct, click connect and then finish

Step 4: Create a Dataset

To be able to create charts that you will later add to a dashboard you first need to create a dataset (for full details see Creating Charts and Dashboards in Superset).

In this guide we will create a dataset from a view (also, see blog) we already created on our database but you can also create a dataset from any complex query you can think off in Superset SQL Lab.

Step 5: Create your first chart!

Now you are ready to create your first chart on top of our AWS Resource View (Also, see blog)! Choose the dataset you created in the last step (in this case aws_resources) and choose the chart type you want to create (we will take the Bar Chart but you can also change this in the chart screen).

If you are familiar with the Superset UI this step is pretty easy where you just choose or drag and drop the columns you want to visualize (and you can always revert to plain SQL if needed).

Step 6: Save & Add to dashboard

Click Save and choose the Dashboard you want to add the chart (or create a new one if you don’t have one) and then click Save again or Save & Go to Dashboard

You can now repeat steps 4-5 or 3-5 to create all charts you need and add them to your dashboard/s.

Step 7: Edit Dashboard, Add Filters and Share and Alert!

Now we are at the final stage where you can edit, resize and tweak the charts/widgets so it looks something like this:

You can add Filters to make the dashboard more interactive and you can now share this with other team members, send links, image of dashboards as well as periodic alerts and reports via Superset.

Summary

In this post we showed you how to build an open-source cloud asset inventory with CloudQuery as the ETL (Extract-Transform-Load) / data-ingestion layer and Apache Superset as the visualization and monitoring platforms. This approach eliminates the yet-another-dashboard fatigue and gives you the ability to pick the best-in-class visualization tools or reuse your current stack.