When we invest at Series A, startups are often at the point where Business Intelligence is still primarily run with a combination of Excel and engineers running SQL queries against operational databases.
One of the frequent questions we get is “how do we progress to a modern business intelligence stack?” Fortunately, startups have converged on a fairly well-understood set of best practices and tools, which means this is easier than ever.
The typical business intelligence stack breaks into three components:
- ETL — Getting the data
- Data Warehousing — Storing the data
- Business Intelligence Tools — Analyzing the data
ETL — Getting the data
ETL (extract-transform-load) is essentially about moving data from its operational home (third-party tools, operational databases, etc.) to the data warehouse in a form suitable for analysis. This can range from simply replicating the data to applying complex transformations that make it easier to analyze.
Historically this generally meant replicating data from internal databases into a central repository. With the explosion in SaaS usage for critical functions, however, core data now often lives in third-party systems like Hubspot, Intercom, Zendesk or Salesforce. Extracting data from these services generally means using their APIs to pull the data before inserting it into the warehouse.
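To make the “transform” step concrete, here is a minimal sketch in Python. The payload shape and field names are hypothetical (loosely CRM-flavored), not any real vendor’s API schema:

```python
def transform_contacts(payload):
    """Flatten a nested, API-style response into warehouse-ready rows."""
    rows = []
    for record in payload["results"]:
        rows.append({
            "contact_id": record["id"],
            "email": record["properties"].get("email"),
            "created_at": record["properties"].get("createdate"),
        })
    return rows

# In a real pipeline this payload would come from the vendor's API,
# and the flattened rows would be bulk-loaded into the warehouse.
payload = {
    "results": [
        {"id": 1, "properties": {"email": "a@example.com", "createdate": "2019-01-01"}},
        {"id": 2, "properties": {"email": "b@example.com"}},
    ]
}
rows = transform_contacts(payload)
print(rows[0]["email"])  # a@example.com
```

The SaaS ETL vendors mentioned above do exactly this kind of flattening (plus scheduling, retries and schema management) so you don’t have to maintain it yourself.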
While custom-building integrations and replication used to be the norm, a number of vendors now provide these integrations as SaaS offerings. Segment, Stitch and Fivetran are all commonly used in the startup world.
Unlike most other parts of the stack, ETL is often mix-and-match, with startups frequently using different vendors for different data sources and combining them with internal ETL where off-the-shelf solutions aren’t viable.
Data Warehousing — Storing the data
A data warehouse is essentially a high-powered database optimized for analysis, as opposed to traditional databases (MySQL, Postgres, etc.), which are designed for day-to-day operational use.
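The kind of query a warehouse is built for scans many rows but touches few columns. A small sketch, using SQLite as a stand-in for the warehouse and a hypothetical `orders` table:

```python
import sqlite3

# SQLite stands in for the warehouse here; the schema is made up for
# illustration. Real warehouses run this pattern over billions of rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL, month TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 30.0, "2019-01"), (1, 20.0, "2019-02"), (2, 50.0, "2019-01")],
)

# An aggregate over the whole table, reading only two columns -- the
# workload that columnar, analysis-oriented stores are optimized for.
monthly = conn.execute(
    "SELECT month, SUM(amount) FROM orders GROUP BY month ORDER BY month"
).fetchall()
print(monthly)  # [('2019-01', 80.0), ('2019-02', 20.0)]
```

An operational database answers “fetch order 42” quickly; a warehouse answers “total revenue per month across all orders” quickly, which is why the two are kept separate.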
Amazon Redshift is the dominant player in this space, and if you’re already on the AWS stack it’s the natural choice. If you have high volumes of data, Snowflake is worth considering for its higher-performing, self-tuning system.
If you’re already on the Google Cloud Platform stack then Google BigQuery is worth considering.
Business Intelligence Tools — Analyzing the data
Mode is closer to a high-powered SQL tool than a traditional BI tool. Analysts write SQL to extract the data they want, choose from a variety of visualizations, and share the results via dashboards with other users within the company.
If your underlying business data is relatively straightforward (i.e. your SQL statements don’t get horrendously messy) and everyone producing reports knows SQL, then Mode can be a good choice.
Looker is a more sophisticated product, designed to let most users work without writing SQL via an interface akin to Excel’s pivot tables. It also comes with its own modeling language, LookML, which enables analysts to perform data transformations within the data warehouse (rather than at the ETL stage) and to abstract complexities in the data model away from end users.
If you want business users to use your BI tool directly, or don’t want your analysts to have to be SQL experts, then Looker is the obvious choice, with the proviso that its additional capabilities come at a price. That said, Looker’s pricing is still significantly cheaper than the cost of one additional analyst.
Put all the pieces together and you have yourself a modern BI stack!
At Blossom we love backing data-smart startups both in consumer and enterprise across Europe. If you’re building one we’d love to hear from you!