This post gives an overview of the new architecture supporting the next release of our personal data management platform. As is often the case for scrappy startups (and behemoths alike), we incurred significant technical debt while prioritizing our offerings: ensuring we had a functional enterprise insights dashboard and native mobile app for our operations leading up to the ICO. Since then, however, we have prioritized and invested in building a solid technical foundation on which to build a developer-friendly ecosystem that can scale efficiently.
Gone are the days of a monolith running on DigitalOcean. Our architecture now spans multiple AWS availability zones, with layers of redundancy and an ever-growing number of services. For the most part, these services separate logically into the following areas:
The rest of this post will provide an overview into each of these areas.
Here we have an overview of our architecture, divided into the three aforementioned areas.
We run our services on AWS and have opted for their managed offerings where applicable, such as RDS and ElastiCache. For now, we are trading multi-cloud portability for keeping operational overhead to a minimum and the peace of mind that piggybacking off Amazon’s battle-tested infrastructure provides.
Here’s a quick list of the technologies we use:
Now we turn to each area of our architecture.
Our forward-facing architecture consists of Node.js REST services sitting behind an AWS Application Load Balancer. These services communicate with our application databases and caches to handle payments, sign-ups, and CRUD operations. They run on an autoscaling EC2 cluster that is slightly overprovisioned so we can scale the number of containers quickly in case of sudden load spikes.
One of the main services we provide to our users is the ability to extract insights from data in a fully secure and permissioned way. Currently we offer three kinds of analyses:
However, we plan to massively expand these offerings by building an open ecosystem of plug-and-play analyses where any developer or company can submit an analysis available for purchase and use by anyone on the Datawallet platform.
To do this, we needed to provide a way for developers to build their analyses against their own local data and then submit them to the marketplace. We use Docker to package these apps and run them on demand in a locked-down environment, which keeps development, submission, and deployment straightforward.
In the above diagram we can see how the user-api cluster submits analysis requests to an SQS queue. Worker nodes pick up these requests, query the necessary data, and run the Docker images containing the packaged analyses, with the data mounted as a volume inside the container. The worker nodes store the result in our databases and ping back the user-api cluster to let the user know that their analysis has finished.
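As a rough illustration of this flow, here is a minimal Python sketch of what one worker iteration could look like. The helper names (`fetch_user_data`, `store_result`, `notify_user_api`), the message schema, and the container flags are all hypothetical assumptions for the example, not our production code; the point is the shape of the locked-down `docker run` invocation and the pick-up/run/store/notify sequence.

```python
import json
import subprocess
import tempfile

def build_docker_command(image, data_dir):
    """Build a locked-down `docker run` invocation: no network access,
    and the user's data mounted read-only at /data inside the container."""
    return [
        "docker", "run", "--rm",
        "--network", "none",           # analyses cannot phone home
        "-v", f"{data_dir}:/data:ro",  # data volume, read-only
        image,
    ]

def handle_request(message_body):
    """Process one analysis request pulled off the SQS queue.

    `fetch_user_data`, `store_result`, and `notify_user_api` are
    hypothetical helpers standing in for the real data and API layers.
    """
    request = json.loads(message_body)
    with tempfile.TemporaryDirectory() as data_dir:
        fetch_user_data(request["user_id"], data_dir)
        cmd = build_docker_command(request["image"], data_dir)
        result = subprocess.run(cmd, capture_output=True, check=True)
    store_result(request["request_id"], result.stdout)
    notify_user_api(request["request_id"])
```

Disabling networking and mounting the data read-only is one way to keep an untrusted, third-party analysis from exfiltrating or mutating user data while still letting it compute freely over the mounted volume.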
One of the most interesting pieces of the architecture here at Datawallet is our data processing layer. It handles all sourcing and storing of the data, from data lake to warehouse. In practice, it is made up of a few key components including our preprocessing services and an Apache Airflow cluster which manages the data pipelines.
Processing requests first come in through the application queue, from which they are read by the preprocessors and placed on the respective data ingestion pipeline queues. These requests can be metadata pointing to files, references to data source APIs, or actual pieces of data that need to be ingested.
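The routing step above can be sketched as a small dispatch function. The queue names and the `type` field in the message payload are assumptions made up for this example; the real preprocessors are more involved, but the core decision is the same: inspect the request and pick the matching ingestion pipeline queue.

```python
import json

# Hypothetical per-source ingestion pipeline queues.
PIPELINE_QUEUES = {
    "file_metadata": "ingest-files",  # metadata pointing to files
    "api_source": "ingest-api",       # references to data source APIs
    "raw_data": "ingest-raw",         # actual pieces of data
}

def route_request(message_body):
    """Read one request off the application queue and return the
    ingestion pipeline queue it belongs on, plus the parsed payload."""
    request = json.loads(message_body)
    kind = request["type"]
    try:
        queue = PIPELINE_QUEUES[kind]
    except KeyError:
        raise ValueError(f"unknown request type: {kind}")
    return queue, request
```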
From here, the Airflow master node pulls from the pipeline queues and decides which pipelines to run on the worker nodes, taking into account both queue congestion and the concurrency limits of the APIs we request against.
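To make that scheduling trade-off concrete, here is one possible priority heuristic, written as a toy Python function. The scoring formula and parameter names are illustrative assumptions, not the actual scheduler: deeper queues raise a pipeline's priority, while a pipeline whose upstream API is already at its concurrency limit is held back entirely.

```python
def pipeline_priority(queue_depth, api_inflight, api_limit):
    """Score a pipeline for scheduling.

    queue_depth  -- messages waiting on this pipeline's queue
    api_inflight -- requests currently outstanding against its API
    api_limit    -- that API's concurrency ceiling
    """
    if api_inflight >= api_limit:
        return 0.0  # at the API's limit: don't schedule more work
    headroom = (api_limit - api_inflight) / api_limit
    return queue_depth * headroom  # congested queues with API headroom win
```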
That’s all for today; we will be doing deeper dives into these components in subsequent posts. Stay tuned to hear more about our highly available Airflow setup, blockchain infrastructure, distributed storage mechanisms, and more!