Analyse Ethereum transactions with Google BigQuery

Featured image: Exemplary DAI Query

When you work with large data set you know the limits and challenges of traditional database management systems. You pay the price for the total cost of ownership. You manage and maintain the server infrastructure and need to make sure that data is accessible and up and running 24/7. As soon you are required to share your data with different parties within your organization or partners, data storage needs to be distributed and redundant in order to guarantee not only availability but high performance to return speedy query result sets. Sounds familiar, right?

About Google BigQuery

Besides developing new features, Google and other folks in the community made publicly available data accessible via Google BigQuery. At the time of writing, there are 130 different types of data sets available for you.

Image: Publicly available data sets of Google BigQuery

Work with Health data from the World Bank, Github Activities or real estate listings from Uruguay.

How to Load Data into Google Cloud Storage

Query Ethereum Blockchain Implementation with Google BigQuery

So what does it look like to query Ethereum in Google BigQuery?

Image: Ethereum data set

How to search Ethereum Transaction Structures in Google BigQuery?

SELECT
logs.block_timestamp AS block_timestamp
,logs.block_number AS block_number
,logs.transaction_hash AS transaction_hash
,logs.log_index AS log_index
,logs.block_hash AS block_hash
,logs.data AS data
,token_transfer.from_address AS from_address
,token_transfer.to_address AS to_address
FROM `bigquery-public-data.crypto_ethereum.logs` AS logs
JOIN `bigquery-public-data.crypto_ethereum.token_transfers` token_transfer ON logs.transaction_hash = token_transfer.transaction_hash
WHERE address = '0x89d24a6b4ccb1b6faa2625fe562bdd9a23260359'
AND topics[SAFE_OFFSET(0)] = '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef' order by logs.block_timestamp limit 10;

When you look into the Ethereum BigQuery data model, the data for the events is still hashed in the data column. In order to find the corresponding sender and receiver address, we need to join the token_transfer table with the logs table. The join slows down the query to 25.8 sec compared to 2 sec 801 ms on my working machine (MacBook Pro 2,7 GHz Quad-Core Intel Core i7, 16 GB 2133 MHz LPDDR3) with DataGrip an IDE for databases.

Image: Ethereum BigQuery data model

Use the shared query, if you would like to play with Google BigQuery UI and the Ethereum data set. Make sure to not exceed the monthly 1 TB, in case you opt-in for the on-demand pricing. The next 1 TB would cost you 5.00 USD. Did you know that we offer a monthly flat fee for all Ethereum-based production networks? For test networks, you’ll even get access for free!

Ethereum Google BigQuery REST API

Besides using SQL, you can work with the provided BigQuery REST API and code your own client in the programming language of your choice. In case you work with very large data sets a REST API might not be the ideal choice as the queries are running synchronously with the request. That means you need to deal with timeouts and slow responses for larger results sets. That’s the reason why Google recommends using the provided Google Cloud Client libraries.

Do you know Elasticsearch? Elasticsearch provides a REST API, too. You are able to define your DSL (Domain Specific Language) query in a JSON format and Elasticsearch will take care that you’ll get splendid super-fast response times. That’s the reason why we love Elasticsearch and indexed all the Ethereum network blockchains to our own Elasticsearch instance. So you can take advantage of the Elasticsearch performance.

Besides the client library, you can query Ethereum transactions and smart contracts from connectors. Install the Google BigQuery connectors and query data via the command line, Hadoop, Excel or Google dataflow.

Ethereum Visualization: Why should you query and visualize Ethereum data at all?

“A visualization like this (and the underpinning database query) is useful for making business decisions, such as prioritizing improvements to the Ethereum architecture itself (is the system running close to capacity and due for an upgrade?) to balance sheet adjustments (how quickly can a wallet be rebalanced?).”

We totally agree! It is essential to find improvements as early as possible when you work with distributed ledger technologies. Fixing a bug and re-deploying a new version is troublesome and costly! Sometimes even irreversible. Do you remember the DAO hack?

Google BigQuery is limited to the Ethereum Mainnet. We know many projects and partners which extensively test on Ethereum test networks. What about analyzing and monitoring those test networks? We understand your need! That’s why we are indexing Kovan, Rinkeby, Robsten, Görli and some other Ethereum- and Bitcoin-based networks as well. In total 22. You want to detect bottlenecks and issues before deploying them on the Mainnet, don’t you?

Image: Networks supported by Anyblock Analytics

Shall I use Google BigQuery for Ethereum?

1. Transaction ID Ethereum — Log Data Not Decoded

The value of the data column in the log table is not decoded. That requires you to decode the data at your end which slows you down.

Image: Ethereum log data table

2. Solidity smart contract as bytecode

Smart contracts are available as bytecode only. That means you are not able to query code, strings and functions.

Image: Solidity smart contract bytecode note

3. Transaction events not indexed

Ethereum transaction events are not indexed which is crucial in case you want to monitor and be alerted when specific events have occurred.

4. Join data tables

You need to create SQL views or joins in order to map data to one table. That makes the query significantly slower and requires additional scripting at your end.

You can use Google BigQuery, if you want to have general insights about Ethereum transactions and traces without running your own (archive) node. As soon you need to dig deeper into the world of Ethereum data, you need to do extra work. Decode the data, structure and store it, before you can start with your actual data science work.

Consider working with our Anyblocks Index instead?

1. More than 90% of the data is decoded. Avoid additional time-consuming processing and coding time and query the data right away.
If your smart contract is not decoded? Send us your ABI via contact@anyblockanalytics.com and we’ll take care of it.

2. Smart contracts are stored in plain full text

3. Events and internal transactions (traces) are indexed

4. Get super-fast response times with our Elasticsearch API

5. Use our same tech stack and data model across all test networks and other Ethereum-based blockchains.
It is even available as a virtual appliance if you need for private/permissioned consortia networks.

6. You can visualize the data easily via our hosted Metabase dashboard software.
Or connect to your existing Business Intelligence (e.g. Tableau etc.) or IT Monitoring systems (e.g. Nagios etc.) via our ready-to-go connectors.

These benefits are six substantial advantages compared to Google BigQuery when you work with the Anyblocks Index.

But maybe even more important: we are a small, approachable team of experts that are here to help you! We have deep blockchain know-how and smart contract specific data science experience and can help extract and visualize the relevant data technically but also with the eye of users.
For enterprises we can offer professional service-level-agreements, but generally we answer any questions by anybody, as we love to work with all sorts of projects and teams in the Ethereum ecosystem.

Have a look at all the features and services we are offering. Take also a glimpse into the monitoring use cases and get some inspiration what you can do with the Anyblock Index and check out our dashboard examples.

Image: Exemplary TenX Token dashboard

Now it’s time to get your hands dirty. Take a leap to Google BigQuery and use the public Ethereum data set and play around. Or create a free account for Anyblocks Index or request a free demo from us: contact@anyblockanalytics.com

Freddy Zwanzger — Co-Founder of Anyblock

Interested or questions?

Follow us:
Twitter, LinkedIn,
Medium, YouTube

Originally published at https://www.anyblockanalytics.com on February 10, 2020.

Anyblock Analytics is a German blockchain solution provider. We offer consulting, tools and data to integrate business processes with blockchain.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store